Tampere University of Technology

TUTCRIS Research Portal

Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms. / Barker, Tom; Virtanen, Tuomas.

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 24, No. 12, 01.12.2016, p. 2377-2389.

Harvard

Barker, T & Virtanen, T 2016, 'Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms', IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 12, pp. 2377-2389. https://doi.org/10.1109/TASLP.2016.2602546

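APA

Barker, T., & Virtanen, T. (2016). Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(12), 2377-2389. https://doi.org/10.1109/TASLP.2016.2602546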

Vancouver

Barker T, Virtanen T. Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2016 Dec 1;24(12):2377-2389. https://doi.org/10.1109/TASLP.2016.2602546

Author

Barker, Tom ; Virtanen, Tuomas. / Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2016 ; Vol. 24, No. 12. pp. 2377-2389.

BibTeX

@article{e8d6449c27874804bcd20d5b041f9422,
title = "Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms",
abstract = "This paper presents an algorithm for unsupervised single-channel source separation of audio mixtures. The approach specifically addresses the challenging case of separation where no training data are available. By representing mixtures in the modulation spectrogram (MS) domain, we exploit underlying similarities in patterns present across frequency. A three-dimensional tensor factorization is able to take advantage of these redundant patterns, and is used to separate a mixture into an approximated sum of components by minimizing a divergence cost. Furthermore, we show that the basic tensor factorization can be extended with convolution in time being used to improve separation results and provide update rules to learn components in such a manner. Following factorization, sources are reconstructed in the audio domain from estimated components using a novel approach based on reconstruction masks that are learned using MS activations, and then applied to a mixture spectrogram. We demonstrate that the proposed method produces superior separation performance to a spectrally based nonnegative matrix factorization approach, in terms of source-to-distortion ratio. We also compare separation with the perceptually motivated interference-related perceptual score metric and identify cases with higher performance.",
keywords = "Factorization, nonnegative matrix factorization (NMF), source separation, speech enhancement",
author = "Tom Barker and Tuomas Virtanen",
year = "2016",
month = "12",
day = "1",
doi = "10.1109/TASLP.2016.2602546",
language = "English",
volume = "24",
pages = "2377--2389",
journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
issn = "2329-9290",
publisher = "IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC",
number = "12",
}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms

AU - Barker, Tom

AU - Virtanen, Tuomas

PY - 2016/12/1

Y1 - 2016/12/1

AB - This paper presents an algorithm for unsupervised single-channel source separation of audio mixtures. The approach specifically addresses the challenging case of separation where no training data are available. By representing mixtures in the modulation spectrogram (MS) domain, we exploit underlying similarities in patterns present across frequency. A three-dimensional tensor factorization is able to take advantage of these redundant patterns, and is used to separate a mixture into an approximated sum of components by minimizing a divergence cost. Furthermore, we show that the basic tensor factorization can be extended with convolution in time being used to improve separation results and provide update rules to learn components in such a manner. Following factorization, sources are reconstructed in the audio domain from estimated components using a novel approach based on reconstruction masks that are learned using MS activations, and then applied to a mixture spectrogram. We demonstrate that the proposed method produces superior separation performance to a spectrally based nonnegative matrix factorization approach, in terms of source-to-distortion ratio. We also compare separation with the perceptually motivated interference-related perceptual score metric and identify cases with higher performance.

KW - Factorization

KW - nonnegative matrix factorization (NMF)

KW - source separation

KW - speech enhancement

U2 - 10.1109/TASLP.2016.2602546

DO - 10.1109/TASLP.2016.2602546

M3 - Article

VL - 24

SP - 2377

EP - 2389

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

SN - 2329-9290

IS - 12

ER -
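
Illustrative sketch

The abstract describes separating a mixture into a sum of components with a three-dimensional nonnegative tensor factorization that minimizes a divergence cost. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes the generalized Kullback-Leibler divergence as the cost and uses the standard multiplicative updates for a rank-K three-way model. The function names (`ntf_kl`, `kl_divergence`), the factor names (`G`, `A`, `S`), the tensor sizes, and the synthetic input are all hypothetical. The modulation spectrogram front end, the convolutive extension, and the mask-based reconstruction mentioned in the abstract are not shown.

```python
import numpy as np


def kl_divergence(X, Y, eps=1e-12):
    """Generalized Kullback-Leibler divergence between nonnegative arrays."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))


def ntf_kl(X, n_components, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative 3-way tensor X[i, j, t] by
    sum_k G[i, k] * A[j, k] * S[t, k] using multiplicative updates
    that do not increase the generalized KL divergence (assumed cost)."""
    rng = np.random.default_rng(seed)
    I, J, T = X.shape
    # Random strictly positive initialisation of the three factor matrices.
    G = rng.random((I, n_components)) + eps
    A = rng.random((J, n_components)) + eps
    S = rng.random((T, n_components)) + eps

    def model():
        # Current reconstruction; eps keeps the ratio X / model() finite.
        return np.einsum('ik,jk,tk->ijt', G, A, S) + eps

    for _ in range(n_iter):
        # Each factor is scaled by (ratio-weighted sum) / (plain sum)
        # while the other two factors are held fixed.
        R = X / model()
        G *= np.einsum('ijt,jk,tk->ik', R, A, S) / (A.sum(0) * S.sum(0) + eps)
        R = X / model()
        A *= np.einsum('ijt,ik,tk->jk', R, G, S) / (G.sum(0) * S.sum(0) + eps)
        R = X / model()
        S *= np.einsum('ijt,ik,jk->tk', R, G, A) / (G.sum(0) * A.sum(0) + eps)
    return G, A, S


# Synthetic rank-2 nonnegative tensor standing in for a modulation spectrogram.
rng = np.random.default_rng(1)
X = np.einsum('ik,jk,tk->ijt',
              rng.random((6, 2)), rng.random((5, 2)), rng.random((20, 2)))

G, A, S = ntf_kl(X, n_components=2)
X_hat = np.einsum('ik,jk,tk->ijt', G, A, S)
```

In this sketch each column k of `G`, `A`, and `S` defines one component; grouping components into sources and returning to the audio domain would need the further steps the abstract describes.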