TUTCRIS - Tampere University of Technology

Time-frequency masking strategies for single-channel low-latency speech enhancement using neural networks

Research output: peer-reviewed

Details

Original language: English
Title of host publication: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018
Publisher: IEEE
Pages: 51-55
Number of pages: 5
ISBN (electronic): 9781538681510
Status: Published - 2 November 2018
OKM publication type: A4 Article in conference proceedings
Event: International Workshop on Acoustic Signal Enhancement - Tokyo, Japan
Duration: 17 September 2018 to 20 September 2018

Conference

Conference: International Workshop on Acoustic Signal Enhancement
Country: Japan
City: Tokyo
Period: 17/09/18 to 20/09/18

Abstract

This paper presents a low-latency, neural-network-based speech enhancement system. Low-latency operation is critical for speech communication applications. The system uses the time-frequency (TF) masking approach to retain speech and remove non-speech content from the observed signal. The ideal TF masks are obtained by supervised training of neural networks. As the main contribution, different neural network models are experimentally compared to investigate computational complexity and speech enhancement performance. The proposed system is trained and tested on noisy speech data where the signal-to-noise ratio (SNR) ranges from -5 dB to +5 dB. The results show a significant reduction of non-speech content in the resulting signal while still meeting the low-latency operation criterion, which is here considered to be less than 20 ms.
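
The TF masking idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's system: the mask here is an oracle ideal ratio mask computed from known speech and noise, which is the kind of target a network would be trained to predict, and no network is involved. The sample rate, frame length, hop size, the sine-wave stand-in for speech, and all function names are assumptions chosen for illustration. The frame length is used only as a rough proxy for algorithmic latency.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz (assumed, not stated in the abstract)
FRAME_LEN = 256      # samples per frame (assumed): 16 ms at 16 kHz
HOP = 128            # samples between frames (assumed 50 % overlap)


def stft(x, frame_len=FRAME_LEN, hop=HOP):
    """Split x into Hann-windowed frames and return their one-sided FFTs."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale noise so that the speech-to-noise power ratio equals snr_db."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return gain * noise


def ideal_ratio_mask(speech_spec, noise_spec, eps=1e-12):
    """Oracle mask in [0, 1]: speech power over speech-plus-noise power."""
    s = np.abs(speech_spec) ** 2
    n = np.abs(noise_spec) ** 2
    return s / (s + n + eps)


rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
speech = np.sin(2 * np.pi * 440 * t)  # toy stand-in for a speech signal
noise = scale_noise_to_snr(speech, rng.standard_normal(len(t)), snr_db=0.0)
mixture = speech + noise

S, N, Y = stft(speech), stft(noise), stft(mixture)
mask = ideal_ratio_mask(S, N)
enhanced = mask * Y  # element-wise masking of the noisy spectrogram

# Mean squared spectral error relative to the clean signal, before and after.
mixture_err = np.mean(np.abs(Y - S) ** 2)
enhanced_err = np.mean(np.abs(enhanced - S) ** 2)

# Frame length as a rough proxy for algorithmic latency.
latency_ms = 1000.0 * FRAME_LEN / SAMPLE_RATE
print(f"error before: {mixture_err:.3f}, after: {enhanced_err:.3f}")
print(f"frame latency: {latency_ms} ms")
```

With these assumed settings the frame spans 16 ms, which falls under the 20 ms criterion mentioned in the abstract. A real system would also need to account for model computation time and any look-ahead.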
