TUTCRIS - Tampere University of Technology

Semi-supervised non-negative tensor factorisation of modulation spectrograms for monaural speech separation

Research output: peer-reviewed

Details

Original language: English
Title of host publication: Neural Networks (IJCNN), 2014 International Joint Conference on
Pages: 3556-3561
Number of pages: 6
Status: Published - 1 July 2014
Ministry of Education and Culture (OKM) publication type: A4 Article in conference proceedings
Event: INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS
Duration: 1 January 1900 → …

Abstract

This paper details a semi-supervised approach to audio source separation. When a model is available for only one of the sources, the model for the unknown source must be estimated. The mixture signal is separated by factorising a feature tensor based on the modulation spectrogram. Harmonically related components tend to modulate in a similar fashion, and the factorisation can isolate this redundancy of patterns. This feature representation requires fewer parameters than spectrally based methods and therefore reduces overfitting. Following the tensor factorisation, the separated signals are reconstructed by learning Wiener-filter spectral parameters that are constrained by the activation parameters learned in the first stage. For two-speaker mixtures, separation performance exceeded that of the benchmark methods: the proposed semi-supervised method outperformed both semi-supervised non-negative matrix factorisation and blind non-negative modulation spectrum tensor factorisation.
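The semi-supervised idea shared with the semi-supervised NMF benchmark named in the abstract can be sketched on a plain non-negative spectrogram. This is a minimal NumPy sketch under stated assumptions, not the paper's modulation-spectrogram tensor method. A basis for the known speaker is learned offline and held fixed. Extra free components absorb the unknown speaker, and a Wiener-style soft mask built from the two model spectra splits the mixture. The generalised-KL multiplicative updates, the hypothetical function names `nmf_kl` and `separate_semi_supervised`, the power-domain mask and the toy random data are all illustrative assumptions.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero


def nmf_kl(V, n_free, n_iter=200, W_fixed=None, seed=0):
    """Multiplicative-update NMF for the generalised KL divergence (sketch).

    If W_fixed (F x K) is given, those basis columns stay constant and only
    n_free extra columns are learned; all activations H are learned.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W_free = rng.random((F, n_free)) + EPS
    if W_fixed is None:
        k_fixed, W = 0, W_free
    else:
        k_fixed = W_fixed.shape[1]
        W = np.hstack([np.asarray(W_fixed, dtype=float), W_free])  # copy
    H = rng.random((W.shape[1], T)) + EPS
    ones = np.ones_like(V, dtype=float)
    for _ in range(n_iter):
        # Update the activations of every component.
        H *= (W.T @ (V / (W @ H + EPS))) / (W.T @ ones + EPS)
        # Update only the free (unknown-source) basis columns.
        ratio = V / (W @ H + EPS)
        W_upd = W * (ratio @ H.T) / (ones @ H.T + EPS)
        W[:, k_fixed:] = W_upd[:, k_fixed:]
    return W, H


def separate_semi_supervised(X, W_known, n_unknown, n_iter=200):
    """Split non-negative spectrogram X into (known, unknown) estimates."""
    W, H = nmf_kl(X, n_unknown, n_iter, W_fixed=W_known)
    k = W_known.shape[1]
    P_known = (W[:, :k] @ H[:k]) ** 2    # model power, known source
    P_unknown = (W[:, k:] @ H[k:]) ** 2  # model power, unknown source
    mask = P_known / (P_known + P_unknown + EPS)  # Wiener-style soft mask
    return mask * X, (1.0 - mask) * X


# Hypothetical toy data: 64 frequency bins, random non-negative sources.
rng = np.random.default_rng(1)
W_true = rng.random((64, 8))                  # "speaker A" spectral shapes
train_A = W_true @ rng.random((8, 100))       # training data for speaker A
W_A, _ = nmf_kl(train_A, n_free=8)            # pre-trained known-source model
test_A = W_true @ rng.random((8, 50))
test_B = rng.random((64, 6)) @ rng.random((6, 50))  # unseen speaker B
mix = test_A + test_B
est_A, est_B = separate_semi_supervised(mix, W_A, n_unknown=6)
```

Because the two masks sum to one, the estimates always add back up to the mixture. The paper instead factorises a three-way modulation-spectrogram tensor and then learns constrained Wiener-filter parameters, so a faithful version would replace the `W @ H` model with a non-negative tensor (PARAFAC-style) model.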
