Tampere University of Technology

TUTCRIS Research Portal

Coupled dictionaries for exemplar-based speech enhancement and automatic speech recognition

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

Coupled dictionaries for exemplar-based speech enhancement and automatic speech recognition. / Baby, Deepak; Virtanen, Tuomas; Gemmeke, Jort F.; Van hamme, Hugo.

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 23, No. 11, 01.11.2015, p. 1788-1799.

Harvard

Baby, D, Virtanen, T, Gemmeke, JF & Van hamme, H 2015, 'Coupled dictionaries for exemplar-based speech enhancement and automatic speech recognition', IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, pp. 1788-1799. https://doi.org/10.1109/TASLP.2015.2450491

APA

Baby, D., Virtanen, T., Gemmeke, J. F., & Van hamme, H. (2015). Coupled dictionaries for exemplar-based speech enhancement and automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(11), 1788-1799. https://doi.org/10.1109/TASLP.2015.2450491

Vancouver

Baby D, Virtanen T, Gemmeke JF, Van hamme H. Coupled dictionaries for exemplar-based speech enhancement and automatic speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2015 Nov 1;23(11):1788-1799. https://doi.org/10.1109/TASLP.2015.2450491

Author

Baby, Deepak ; Virtanen, Tuomas ; Gemmeke, Jort F. ; Van hamme, Hugo. / Coupled dictionaries for exemplar-based speech enhancement and automatic speech recognition. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2015 ; Vol. 23, No. 11. pp. 1788-1799.

BibTeX

@article{14c5d324b1c145a2a0d704783f5178dc,
title = "Coupled dictionaries for exemplar-based speech enhancement and automatic speech recognition",
abstract = "Exemplar-based speech enhancement systems work by decomposing the noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary and use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain to enhance the noisy speech. To obtain the decomposition, exemplars sampled in lower-dimensional spaces are preferred over the full-resolution frequency domain for their reduced computational complexity and the ability to better generalize to unseen cases. But the resulting filter may be sub-optimal as the mapping of the obtained speech and noise estimates to the full-resolution frequency domain yields a low-rank approximation. This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries: an input dictionary containing atoms from the desired exemplar space to obtain the decomposition and a coupled output dictionary containing exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for the exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and yielded improved speech enhancement performance on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also results in improved word error rates (WERs) for the speech recognition tasks using HMM-GMM and deep neural network (DNN)-based systems.",
keywords = "Exemplar-based, Modulation envelope, Noise robust automatic speech recognition, Non-negative sparse coding",
author = "Deepak Baby and Tuomas Virtanen and Gemmeke, {Jort F.} and {Van hamme}, Hugo",
year = "2015",
month = "11",
day = "1",
doi = "10.1109/TASLP.2015.2450491",
language = "English",
volume = "23",
pages = "1788--1799",
journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
issn = "2329-9290",
publisher = "IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC",
number = "11",

}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Coupled dictionaries for exemplar-based speech enhancement and automatic speech recognition

AU - Baby, Deepak

AU - Virtanen, Tuomas

AU - Gemmeke, Jort F.

AU - Van hamme, Hugo

PY - 2015/11/1

Y1 - 2015/11/1

N2 - Exemplar-based speech enhancement systems work by decomposing the noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary and use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain to enhance the noisy speech. To obtain the decomposition, exemplars sampled in lower-dimensional spaces are preferred over the full-resolution frequency domain for their reduced computational complexity and the ability to better generalize to unseen cases. But the resulting filter may be sub-optimal as the mapping of the obtained speech and noise estimates to the full-resolution frequency domain yields a low-rank approximation. This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries: an input dictionary containing atoms from the desired exemplar space to obtain the decomposition and a coupled output dictionary containing exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for the exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and yielded improved speech enhancement performance on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also results in improved word error rates (WERs) for the speech recognition tasks using HMM-GMM and deep neural network (DNN)-based systems.

AB - Exemplar-based speech enhancement systems work by decomposing the noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary and use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain to enhance the noisy speech. To obtain the decomposition, exemplars sampled in lower-dimensional spaces are preferred over the full-resolution frequency domain for their reduced computational complexity and the ability to better generalize to unseen cases. But the resulting filter may be sub-optimal as the mapping of the obtained speech and noise estimates to the full-resolution frequency domain yields a low-rank approximation. This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries: an input dictionary containing atoms from the desired exemplar space to obtain the decomposition and a coupled output dictionary containing exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for the exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and yielded improved speech enhancement performance on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also results in improved word error rates (WERs) for the speech recognition tasks using HMM-GMM and deep neural network (DNN)-based systems.

KW - Exemplar-based

KW - Modulation envelope

KW - Noise robust automatic speech recognition

KW - Non-negative sparse coding

U2 - 10.1109/TASLP.2015.2450491

DO - 10.1109/TASLP.2015.2450491

M3 - Article

VL - 23

SP - 1788

EP - 1799

JO - IEEE/ACM Trans. Audio Speech Lang. Process.

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

SN - 2329-9290

IS - 11

ER -
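As a rough illustration of the coupled-dictionary idea summarized in the abstract, the Python sketch below decomposes one toy frame with an input dictionary and then rebuilds the speech and noise estimates with the coupled full-resolution output dictionary. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the two-atom toy dictionaries, the bin-pooling map used to derive the input dictionary, the multiplicative update for generalised KL divergence with an L1 sparsity weight, and the Wiener-style gain s / (s + n) are all choices made here for illustration.

```python
# Hypothetical toy sketch of coupled-dictionary enhancement for one frame.
# Not the paper's implementation: the update rule and gain are assumptions.
from typing import List, Tuple

Matrix = List[List[float]]  # row-major: matrix[row][column]
Vector = List[float]
EPS = 1e-12  # guards against division by zero


def matvec(a: Matrix, x: Vector) -> Vector:
    """Return a @ x."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]


def matvec_t(a: Matrix, v: Vector) -> Vector:
    """Return a.T @ v."""
    return [sum(a[i][k] * v[i] for i in range(len(a))) for k in range(len(a[0]))]


def sparse_activations(a_in: Matrix, y_in: Vector,
                       sparsity: float = 0.0, n_iter: int = 200) -> Vector:
    """Non-negative x with y_in ~ a_in @ x, via multiplicative updates that
    minimise generalised KL divergence + sparsity * sum(x)."""
    x = [1.0] * len(a_in[0])
    col_sums = matvec_t(a_in, [1.0] * len(a_in))  # a_in.T @ 1
    for _ in range(n_iter):
        approx = matvec(a_in, x)
        ratio = [y / max(r, EPS) for y, r in zip(y_in, approx)]
        numer = matvec_t(a_in, ratio)  # a_in.T @ (y_in / approx)
        x = [xk * nk / (ck + sparsity) for xk, nk, ck in zip(x, numer, col_sums)]
    return x


def coupled_enhance(a_in: Matrix, a_out: Matrix, n_speech: int,
                    y_in: Vector, y_full: Vector,
                    sparsity: float = 0.0, n_iter: int = 200) -> Tuple[Vector, Vector]:
    """Decompose in the input space, rebuild speech/noise with the output dictionary.
    The first n_speech columns of both dictionaries are speech atoms, the rest noise."""
    x = sparse_activations(a_in, y_in, sparsity, n_iter)
    n_noise = len(x) - n_speech
    # The same activations drive the coupled full-resolution atoms.
    s_full = matvec(a_out, x[:n_speech] + [0.0] * n_noise)
    n_full = matvec(a_out, [0.0] * n_speech + x[n_speech:])
    gain = [s / max(s + n, EPS) for s, n in zip(s_full, n_full)]  # Wiener-style gain
    enhanced = [g * y for g, y in zip(gain, y_full)]
    return enhanced, gain


if __name__ == "__main__":
    # Columns of a_out: one speech exemplar and one noise exemplar (4 bins).
    a_out = [[1.0, 0.0], [0.5, 0.2], [0.2, 0.5], [0.0, 1.0]]
    # Assumed low-dimensional map: sum adjacent pairs of bins (4 -> 2).
    pool = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    # Coupled input dictionary a_in = pool @ a_out; column k pairs with a_out column k.
    a_in = [[sum(p * a_out[j][k] for j, p in enumerate(prow)) for k in range(2)]
            for prow in pool]
    y_full = [2.0, 1.2, 0.9, 1.0]  # 2 * speech atom + 1 * noise atom
    y_in = matvec(pool, y_full)
    enhanced, gain = coupled_enhance(a_in, a_out, 1, y_in, y_full)
    print([round(v, 3) for v in enhanced])  # [2.0, 1.0, 0.4, 0.0]
```

Because the same activations are applied to both dictionaries, the gain is computed from full-resolution speech and noise estimates directly rather than from a low-dimensional reconstruction mapped back to the full-resolution frequency domain, which is the low-rank limitation the abstract describes.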