Tampere University of Technology

TUTCRIS Research Portal

Exemplar-based speech enhancement for deep neural network based automatic speech recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Standard

Exemplar-based speech enhancement for deep neural network based automatic speech recognition. / Baby, Deepak; Gemmeke, Jort F.; Virtanen, Tuomas; Van Hamme, Hugo.

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. The Institute of Electrical and Electronics Engineers, Inc., 2015. pp. 4485-4489.


Harvard

Baby, D, Gemmeke, JF, Virtanen, T & Van Hamme, H 2015, Exemplar-based speech enhancement for deep neural network based automatic speech recognition. in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. The Institute of Electrical and Electronics Engineers, Inc., pp. 4485-4489, IEEE International Conference on Acoustics, Speech and Signal Processing. https://doi.org/10.1109/ICASSP.2015.7178819

APA

Baby, D., Gemmeke, J. F., Virtanen, T., & Van Hamme, H. (2015). Exemplar-based speech enhancement for deep neural network based automatic speech recognition. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 4485-4489). The Institute of Electrical and Electronics Engineers, Inc. https://doi.org/10.1109/ICASSP.2015.7178819

Vancouver

Baby D, Gemmeke JF, Virtanen T, Van Hamme H. Exemplar-based speech enhancement for deep neural network based automatic speech recognition. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. The Institute of Electrical and Electronics Engineers, Inc. 2015. p. 4485-4489 https://doi.org/10.1109/ICASSP.2015.7178819

Author

Baby, Deepak ; Gemmeke, Jort F. ; Virtanen, Tuomas ; Van Hamme, Hugo. / Exemplar-based speech enhancement for deep neural network based automatic speech recognition. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. The Institute of Electrical and Electronics Engineers, Inc., 2015. pp. 4485-4489

BibTeX

@inproceedings{6daacbf921394c6087a3fac5a412a47a,
title = "Exemplar-based speech enhancement for deep neural network based automatic speech recognition",
abstract = "Deep neural network (DNN) based acoustic modelling has been successfully used for a variety of automatic speech recognition (ASR) tasks, thanks to its ability to learn higher-level information using multiple hidden layers. This paper investigates the recently proposed exemplar-based speech enhancement technique using coupled dictionaries as a pre-processing stage for DNN-based systems. In this setting, the noisy speech is decomposed as a weighted sum of atoms in an input dictionary containing exemplars sampled from a domain of choice, and the resulting weights are applied to a coupled output dictionary containing exemplars sampled in the short-time Fourier transform (STFT) domain to directly obtain the speech and noise estimates for speech enhancement. In this work, settings using an input dictionary of exemplars sampled from the STFT, Mel-integrated magnitude STFT and modulation envelope spectra are evaluated. Experiments performed on the AURORA-4 database revealed that these pre-processing stages can improve the performance of the DNN-HMM-based ASR systems with both clean and multi-condition training.",
keywords = "coupled dictionaries, deep neural networks, modulation envelope, non-negative matrix factorisation, speech enhancement",
author = "Deepak Baby and Gemmeke, {Jort F.} and Tuomas Virtanen and {Van Hamme}, Hugo",
year = "2015",
month = "8",
day = "4",
doi = "10.1109/ICASSP.2015.7178819",
language = "English",
isbn = "9781467369978",
pages = "4485--4489",
booktitle = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
publisher = "The Institute of Electrical and Electronics Engineers, Inc.",

}
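The abstract describes decomposing a noisy utterance as a non-negative weighted sum of exemplars in an input dictionary, then applying those weights to a coupled STFT-domain dictionary to obtain speech and noise estimates. As a rough, hypothetical NumPy sketch of that coupled-dictionary idea (toy random data, toy dimensions, and a standard KL-divergence multiplicative NMF update; not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): F frequency bins, T frames,
# Ks speech exemplars, Kn noise exemplars.
F, T, Ks, Kn = 20, 30, 8, 8

# Input dictionary of speech and noise exemplars (here plain magnitude
# spectra for simplicity; the paper evaluates STFT, Mel-integrated STFT
# and modulation-envelope input domains). The coupled output dictionary
# holds the same exemplars sampled in the STFT magnitude domain; in this
# toy the two domains coincide, so the dictionaries are identical.
A_in = np.abs(rng.standard_normal((F, Ks + Kn)))
A_out = A_in.copy()
Y = np.abs(rng.standard_normal((F, T)))  # noisy magnitude spectrogram

# Non-negative sparse coding of Y over the input dictionary, using the
# standard multiplicative update that minimises KL divergence with the
# dictionary held fixed.
X = np.full((Ks + Kn, T), 0.5)
eps = 1e-12
col_sums = A_in.sum(axis=0)[:, None] + eps
for _ in range(200):
    V = A_in @ X + eps                       # current reconstruction
    X *= (A_in.T @ (Y / V)) / col_sums       # multiplicative KL update

# Apply the activations to the coupled output dictionary, split the
# reconstruction into speech and noise parts, and enhance with a
# Wiener-like mask.
S_hat = A_out[:, :Ks] @ X[:Ks]
N_hat = A_out[:, Ks:] @ X[Ks:]
mask = S_hat / (S_hat + N_hat + eps)
enhanced = mask * Y
```

In the paper's pipeline the enhanced spectrogram would then be fed to the DNN-HMM recogniser as a pre-processing stage; the sketch above only illustrates the decomposition and masking step.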

RIS (suitable for import to EndNote)

TY - GEN

T1 - Exemplar-based speech enhancement for deep neural network based automatic speech recognition

AU - Baby, Deepak

AU - Gemmeke, Jort F.

AU - Virtanen, Tuomas

AU - Van Hamme, Hugo

PY - 2015/8/4

Y1 - 2015/8/4

N2 - Deep neural network (DNN) based acoustic modelling has been successfully used for a variety of automatic speech recognition (ASR) tasks, thanks to its ability to learn higher-level information using multiple hidden layers. This paper investigates the recently proposed exemplar-based speech enhancement technique using coupled dictionaries as a pre-processing stage for DNN-based systems. In this setting, the noisy speech is decomposed as a weighted sum of atoms in an input dictionary containing exemplars sampled from a domain of choice, and the resulting weights are applied to a coupled output dictionary containing exemplars sampled in the short-time Fourier transform (STFT) domain to directly obtain the speech and noise estimates for speech enhancement. In this work, settings using an input dictionary of exemplars sampled from the STFT, Mel-integrated magnitude STFT and modulation envelope spectra are evaluated. Experiments performed on the AURORA-4 database revealed that these pre-processing stages can improve the performance of the DNN-HMM-based ASR systems with both clean and multi-condition training.

AB - Deep neural network (DNN) based acoustic modelling has been successfully used for a variety of automatic speech recognition (ASR) tasks, thanks to its ability to learn higher-level information using multiple hidden layers. This paper investigates the recently proposed exemplar-based speech enhancement technique using coupled dictionaries as a pre-processing stage for DNN-based systems. In this setting, the noisy speech is decomposed as a weighted sum of atoms in an input dictionary containing exemplars sampled from a domain of choice, and the resulting weights are applied to a coupled output dictionary containing exemplars sampled in the short-time Fourier transform (STFT) domain to directly obtain the speech and noise estimates for speech enhancement. In this work, settings using an input dictionary of exemplars sampled from the STFT, Mel-integrated magnitude STFT and modulation envelope spectra are evaluated. Experiments performed on the AURORA-4 database revealed that these pre-processing stages can improve the performance of the DNN-HMM-based ASR systems with both clean and multi-condition training.

KW - coupled dictionaries

KW - deep neural networks

KW - modulation envelope

KW - non-negative matrix factorisation

KW - speech enhancement

U2 - 10.1109/ICASSP.2015.7178819

DO - 10.1109/ICASSP.2015.7178819

M3 - Conference contribution

SN - 9781467369978

SP - 4485

EP - 4489

BT - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

PB - The Institute of Electrical and Electronics Engineers, Inc.

ER -