Tampere University of Technology

TUTCRIS Research Portal

Time-frequency masking strategies for single-channel low-latency speech enhancement using neural networks

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

Standard

Time-frequency masking strategies for single-channel low-latency speech enhancement using neural networks. / Parviainen, Mikko; Pertila, Pasi; Virtanen, Tuomas; Grosche, Peter.

16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018. IEEE, 2018. pp. 51-55.

Harvard

Parviainen, M, Pertila, P, Virtanen, T & Grosche, P 2018, Time-frequency masking strategies for single-channel low-latency speech enhancement using neural networks. in 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018. IEEE, pp. 51-55, International Workshop on Acoustic Signal Enhancement, Tokyo, Japan, 17/09/18. https://doi.org/10.1109/IWAENC.2018.8521400

Author

Parviainen, Mikko ; Pertila, Pasi ; Virtanen, Tuomas ; Grosche, Peter. / Time-frequency masking strategies for single-channel low-latency speech enhancement using neural networks. 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018. IEEE, 2018. pp. 51-55

BibTeX

@inproceedings{e2e2170d56484caa8a9a09ce4934109a,
title = "Time-frequency masking strategies for single-channel low-latency speech enhancement using neural networks",
abstract = "This paper presents a low-latency neural-network-based speech enhancement system. Low-latency operation is critical for speech communication applications. The system uses the time-frequency (TF) masking approach to retain speech and remove the non-speech content from the observed signal. The ideal TF masks are obtained by supervised training of neural networks. As the main contribution, different neural network models are experimentally compared to investigate computational complexity and speech enhancement performance. The proposed system is trained and tested on noisy speech data where the signal-to-noise ratio (SNR) ranges from -5 dB to +5 dB. The results show a significant reduction of non-speech content in the resulting signal while still meeting the low-latency operation criterion, which is here considered to be less than 20 ms.",
keywords = "Neural networks, Speech enhancement",
author = "Mikko Parviainen and Pasi Pertila and Tuomas Virtanen and Peter Grosche",
year = "2018",
month = "11",
day = "2",
doi = "10.1109/IWAENC.2018.8521400",
language = "English",
pages = "51--55",
booktitle = "16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018",
publisher = "IEEE",

}

RIS (suitable for import to EndNote)

TY - GEN

T1 - Time-frequency masking strategies for single-channel low-latency speech enhancement using neural networks

AU - Parviainen, Mikko

AU - Pertila, Pasi

AU - Virtanen, Tuomas

AU - Grosche, Peter

PY - 2018/11/2

Y1 - 2018/11/2

AB - This paper presents a low-latency neural-network-based speech enhancement system. Low-latency operation is critical for speech communication applications. The system uses the time-frequency (TF) masking approach to retain speech and remove the non-speech content from the observed signal. The ideal TF masks are obtained by supervised training of neural networks. As the main contribution, different neural network models are experimentally compared to investigate computational complexity and speech enhancement performance. The proposed system is trained and tested on noisy speech data where the signal-to-noise ratio (SNR) ranges from -5 dB to +5 dB. The results show a significant reduction of non-speech content in the resulting signal while still meeting the low-latency operation criterion, which is here considered to be less than 20 ms.

KW - Neural networks

KW - Speech enhancement

U2 - 10.1109/IWAENC.2018.8521400

DO - 10.1109/IWAENC.2018.8521400

M3 - Conference contribution

SP - 51

EP - 55

BT - 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018

PB - IEEE

ER -