TUTCRIS - Tampere University of Technology


Low-Latency Sound Source Separation Using Deep Neural Networks

Research output: peer-reviewed

Details

Original language: English
Title of host publication: IEEE Global Conference on Signal and Information Processing, 2016
Publisher: IEEE
Pages: 272-276
ISBN (electronic): 978-1-5090-4544-0
DOI - permanent links
Status: Published - 2016
Ministry of Education publication type: A4 Article in a conference publication
Event: IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING
Duration: 1 January 1900 → …


Abstract

Sound source separation at low latency requires that each incoming frame of audio data be processed with very low delay and output as soon as possible. For practical purposes involving human listeners, an algorithmic delay of 20 ms is the upper limit that remains comfortable to the listener. In this paper, we propose a low-latency (algorithmic delay < 20 ms) source separation method based on a deep neural network (DNN). The proposed method takes advantage of an extended past context and outputs soft time-frequency masking filters, which are then applied to incoming audio frames, giving better separation performance than an NMF baseline. Acoustic mixtures of five pairs of speakers from the CMU Arctic database [1] were used for the experiments. An average improvement of at least 1 dB in source-to-distortion ratio (SDR) was observed for our DNN-based system over a low-latency NMF baseline across different processing and analysis frame lengths. Incorporating previous temporal context into the DNN inputs yielded significant SDR improvements for short processing frame lengths.
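The core idea described in the abstract (a network sees the current frame plus several past frames and emits a soft mask that is multiplied onto the current frame's spectrum) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the network, the context length, and the frame features are stand-in assumptions, with the trained DNN replaced by a caller-supplied `predict_mask` function.

```python
import numpy as np

def separate_frame(mixture_frames, predict_mask, context=5):
    """Estimate one source from the newest frame of a mixture.

    mixture_frames: list of per-frame magnitude spectra (newest last).
    predict_mask:   stand-in for the trained DNN; maps the stacked
                    context features to a soft mask in [0, 1] per bin.
    context:        number of past frames appended to the current one
                    (hypothetical value; the paper studies this choice).
    """
    # Stack the current frame together with `context` past frames,
    # giving the network an extended past context as input.
    ctx = np.concatenate(mixture_frames[-(context + 1):])
    mask = predict_mask(ctx)          # soft time-frequency mask, values in [0, 1]
    return mask * mixture_frames[-1]  # mask only the newest frame -> low latency

# Toy usage with a dummy "network" that outputs a constant 0.5 mask.
frames = [np.ones(4) * k for k in range(1, 7)]
dummy = lambda x: np.full(4, 0.5)
est = separate_frame(frames, dummy, context=2)
```

Because only the newest frame is filtered and past frames are used solely as input features, the algorithmic delay stays at one frame, which is what keeps the method under the 20 ms budget.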