Tampere University of Technology

TUTCRIS Research Portal

MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

Details

Original language: English
Title of host publication: 2018 International Joint Conference on Neural Networks (IJCNN)
Publisher: IEEE
ISBN (Electronic): 978-1-5090-6014-6
ISBN (Print): 978-1-5090-6015-3
DOIs
Publication status: Published - 10 Jul 2018
Publication type: A4 Article in a conference publication
Event: International Joint Conference on Neural Networks
Duration: 1 Jan 1900 → …

Publication series

Name
ISSN (Electronic): 2161-4407

Conference

Conference: International Joint Conference on Neural Networks
Period: 1/01/00 → …

Abstract

The monaural singing voice separation task focuses on predicting the singing voice from a single-channel music mixture signal. Current state-of-the-art (SOTA) results in monaural singing voice separation are obtained with deep-learning-based methods. In this work, we present a novel recurrent neural approach that learns long-term temporal patterns and structures of a musical piece. We build upon the recently proposed Masker-Denoiser (MaD) architecture and enhance it with Twin Networks, a technique for regularizing a recurrent generative network using a backward-running copy of the network. We evaluate our method on the Demixing Secret Dataset and obtain an increase of 0.37 dB in signal-to-distortion ratio (SDR) and of 0.23 dB in signal-to-interference ratio (SIR) compared to previous SOTA results.
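To give a rough sense of scale for the reported SDR figure, the sketch below uses a simplified energy-ratio form of SDR, in which the distortion is taken to be the plain difference between target and estimate. This is an assumption made for illustration only: the abstract does not state how the metrics were computed, and standard separation toolkits decompose the error into several terms before forming the ratios. The function names `simple_sdr_db` and `distortion_energy_ratio` are hypothetical.

```python
import math


def simple_sdr_db(target, estimate):
    """Simplified SDR in dB, assuming distortion = target - estimate."""
    signal_energy = sum(t * t for t in target)
    distortion_energy = sum((t - e) ** 2 for t, e in zip(target, estimate))
    if distortion_energy == 0.0:
        return math.inf   # perfect estimate
    if signal_energy == 0.0:
        return -math.inf  # silent target, non-zero error
    return 10.0 * math.log10(signal_energy / distortion_energy)


def distortion_energy_ratio(gain_db):
    """Factor by which distortion energy changes for an SDR gain of
    gain_db, with the target energy held fixed (simplified model)."""
    return 10.0 ** (-gain_db / 10.0)


# Toy signals (made up for illustration, not data from the paper).
target = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
estimate = [t + 0.1 for t in target]

print(f"toy SDR: {simple_sdr_db(target, estimate):.2f} dB")
print(f"distortion energy factor for +0.37 dB: {distortion_energy_ratio(0.37):.3f}")
```

Under this simplified model, a gain of 0.37 dB corresponds to the distortion energy shrinking by roughly 8% at fixed target energy. The exact correspondence depends on the metric definition actually used in the evaluation.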
