TUTCRIS - Tampere University of Technology


Data-Dependent Ensemble of Magnitude Spectrum Predictions for Single Channel Speech Enhancement

Research output: peer-reviewed

Standard

Data-Dependent Ensemble of Magnitude Spectrum Predictions for Single Channel Speech Enhancement. / Pertilä, Pasi.

2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP). IEEE, 2019. (IEEE International Workshop on Multimedia Signal Processing).

Research output: peer-reviewed

Harvard

Pertilä, P 2019, Data-Dependent Ensemble of Magnitude Spectrum Predictions for Single Channel Speech Enhancement. in 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP). IEEE International Workshop on Multimedia Signal Processing, IEEE, IEEE International Workshop on Multimedia Signal Processing. https://doi.org/10.1109/MMSP.2019.8901800

APA

Pertilä, P. (2019). Data-Dependent Ensemble of Magnitude Spectrum Predictions for Single Channel Speech Enhancement. In 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP) (IEEE International Workshop on Multimedia Signal Processing). IEEE. https://doi.org/10.1109/MMSP.2019.8901800

Vancouver

Pertilä P. Data-Dependent Ensemble of Magnitude Spectrum Predictions for Single Channel Speech Enhancement. In 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP). IEEE. 2019. (IEEE International Workshop on Multimedia Signal Processing). https://doi.org/10.1109/MMSP.2019.8901800

Author

Pertilä, Pasi. / Data-Dependent Ensemble of Magnitude Spectrum Predictions for Single Channel Speech Enhancement. 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP). IEEE, 2019. (IEEE International Workshop on Multimedia Signal Processing).

BibTeX

@inproceedings{c7f1c25f09e14643af7ad59de5af2c70,
title = "Data-Dependent Ensemble of Magnitude Spectrum Predictions for Single Channel Speech Enhancement",
abstract = "The time-frequency mask and the magnitude spectrum are two common targets for deep learning-based speech enhancement. Both the ensemble and the neural network fusion of magnitude spectra obtained with these approaches have been shown to improve the objective perceptual quality with synthetic mixtures of data. This work generalizes the ensemble approach by proposing neural network layers to predict time-frequency varying weights for the combination of the two magnitude spectra. In order to combine the best individual magnitude spectrum estimates, the weight prediction network is trained after the time-frequency mask and magnitude spectrum sub-networks have been separately trained for their corresponding objectives and their weights have been frozen. Using the publicly available CHiME3 challenge data, which consists of both simulated and real speech recordings in everyday environments with noise and interference, the proposed approach leads to significantly higher noise suppression in terms of segmental source-to-distortion ratio over the alternative approaches. In addition, the approach achieves similar improvements in the average objective instrumentally measured intelligibility scores with respect to the best achieved scores.",
author = "Pasi Pertil{\"a}",
year = "2019",
month = "9",
doi = "10.1109/MMSP.2019.8901800",
language = "English",
isbn = "978-1-7281-1818-5",
series = "IEEE International Workshop on Multimedia Signal Processing",
publisher = "IEEE",
booktitle = "2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP)",

}

RIS (suitable for import to EndNote)

TY  - GEN
T1  - Data-Dependent Ensemble of Magnitude Spectrum Predictions for Single Channel Speech Enhancement
AU  - Pertilä, Pasi
PY  - 2019/9
Y1  - 2019/9
N2  - The time-frequency mask and the magnitude spectrum are two common targets for deep learning-based speech enhancement. Both the ensemble and the neural network fusion of magnitude spectra obtained with these approaches have been shown to improve the objective perceptual quality with synthetic mixtures of data. This work generalizes the ensemble approach by proposing neural network layers to predict time-frequency varying weights for the combination of the two magnitude spectra. In order to combine the best individual magnitude spectrum estimates, the weight prediction network is trained after the time-frequency mask and magnitude spectrum sub-networks have been separately trained for their corresponding objectives and their weights have been frozen. Using the publicly available CHiME3 challenge data, which consists of both simulated and real speech recordings in everyday environments with noise and interference, the proposed approach leads to significantly higher noise suppression in terms of segmental source-to-distortion ratio over the alternative approaches. In addition, the approach achieves similar improvements in the average objective instrumentally measured intelligibility scores with respect to the best achieved scores.
AB  - The time-frequency mask and the magnitude spectrum are two common targets for deep learning-based speech enhancement. Both the ensemble and the neural network fusion of magnitude spectra obtained with these approaches have been shown to improve the objective perceptual quality with synthetic mixtures of data. This work generalizes the ensemble approach by proposing neural network layers to predict time-frequency varying weights for the combination of the two magnitude spectra. In order to combine the best individual magnitude spectrum estimates, the weight prediction network is trained after the time-frequency mask and magnitude spectrum sub-networks have been separately trained for their corresponding objectives and their weights have been frozen. Using the publicly available CHiME3 challenge data, which consists of both simulated and real speech recordings in everyday environments with noise and interference, the proposed approach leads to significantly higher noise suppression in terms of segmental source-to-distortion ratio over the alternative approaches. In addition, the approach achieves similar improvements in the average objective instrumentally measured intelligibility scores with respect to the best achieved scores.
U2  - 10.1109/MMSP.2019.8901800
DO  - 10.1109/MMSP.2019.8901800
M3  - Conference contribution
SN  - 978-1-7281-1818-5
T3  - IEEE International Workshop on Multimedia Signal Processing
BT  - 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP)
PB  - IEEE
ER  - 
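The abstract describes combining a mask-based magnitude estimate and a directly predicted magnitude spectrum with time-frequency varying weights. The following minimal Python sketch illustrates only that combination step, under stated assumptions: the weights are taken to form a per-bin convex combination, w * mask_estimate + (1 - w) * direct_estimate, with w obtained by squashing a logit through a sigmoid. The function names (mask_magnitude, weighted_ensemble), the sigmoid parameterization and the toy values are hypothetical and not taken from the paper, and the weight-prediction layers themselves are replaced here by fixed logits.

```python
import math
from typing import List

# Magnitude spectrogram stored as [frames][frequency bins]; values are non-negative.
Spectrogram = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a logit to a weight in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mask_magnitude(noisy: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Estimate from the mask sub-network: the mask applied bin-wise to the noisy magnitude."""
    return [[m * x for m, x in zip(m_row, x_row)] for m_row, x_row in zip(mask, noisy)]


def weighted_ensemble(est_mask: Spectrogram, est_direct: Spectrogram,
                      weight_logits: Spectrogram) -> Spectrogram:
    """Per-bin convex combination w * est_mask + (1 - w) * est_direct, with w = sigmoid(logit).

    Assumption: in a real system the logits would come from trainable weight-prediction
    layers; here they are passed in directly for illustration.
    """
    combined = []
    for a_row, b_row, l_row in zip(est_mask, est_direct, weight_logits):
        row = []
        for a, b, logit in zip(a_row, b_row, l_row):
            w = sigmoid(logit)
            row.append(w * a + (1.0 - w) * b)
        combined.append(row)
    return combined


if __name__ == "__main__":
    noisy = [[1.0, 2.0], [0.5, 4.0]]      # toy noisy magnitude |X(t, f)|
    mask = [[0.8, 0.5], [1.0, 0.25]]      # toy output of the (frozen) mask sub-network
    direct = [[0.6, 1.2], [0.4, 0.8]]     # toy output of the (frozen) magnitude sub-network
    logits = [[0.0, 10.0], [-10.0, 0.0]]  # hypothetical weight-layer output
    est_mask = mask_magnitude(noisy, mask)
    print(weighted_ensemble(est_mask, direct, logits))
```

A logit of 0 weights both estimates equally, while large positive or negative logits select almost entirely the mask-based or the direct estimate in that bin. Keeping w in (0, 1) guarantees that each output bin lies between the two input estimates. Because the abstract states that the two sub-networks are trained first and then frozen, their outputs can be treated as fixed inputs to this step, so that only the weight-prediction layers would receive gradient updates.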