Tampere University of Technology

Harmonic-Percussive Source Separation with Deep Neural Networks and Phase Recovery

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review


Original language: English
Title of host publication: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018
ISBN (Electronic): 9781538681510
Publication status: Published - Nov 2018
Publication type: A4 Article in a conference publication
Event: International Workshop on Acoustic Signal Enhancement (iWAENC) - Hitotsubashi Hall, Tokyo, Japan
Duration: 17 Sep 2018 to 20 Sep 2018


Conference: International Workshop on Acoustic Signal Enhancement (iWAENC)


Harmonic/percussive source separation (HPSS) consists in separating the pitched instruments from the percussive parts in a music mixture. In this paper, we propose to apply the recently introduced Masker-Denoiser with twin networks (MaD TwinNet) system to this task. MaD TwinNet is a deep learning architecture that has reached state-of-the-art results in monaural singing voice separation. Herein, we apply it to HPSS by using it to estimate the magnitude spectrogram of the percussive source. Then, we retrieve the complex-valued short-term Fourier transform of the sources by means of a phase recovery algorithm, which minimizes the reconstruction error and enforces the phase of the harmonic part to follow a sinusoidal phase model. Experiments conducted on realistic music mixtures show that this novel separation system outperforms the previous state-of-the-art kernel additive model approach.
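
To make the second stage more concrete, the following is a minimal sketch of the simplest way to turn an estimated percussive magnitude into complex-valued source estimates: reuse the mixture phase for the percussive part and take the remainder as the harmonic part, so that the two estimates sum exactly to the mixture. This is a common baseline, not the phase recovery algorithm of the paper, and it does not impose any sinusoidal phase model on the harmonic part. The function name `separate_with_mixture_phase`, the toy values, and the list-of-lists layout are assumptions made for illustration; in practice the percussive magnitudes would come from a network such as MaD TwinNet and the data would be stored as arrays.

```python
import cmath


def separate_with_mixture_phase(mix_stft, perc_mag):
    """Baseline sketch: split a mixture STFT into percussive and harmonic parts.

    mix_stft: rows of complex STFT coefficients of the mixture.
    perc_mag: same shape, estimated percussive magnitudes
              (hypothetical stand-in for a network output).
    """
    percussive, harmonic = [], []
    for x_row, p_row in zip(mix_stft, perc_mag):
        p_out, h_out = [], []
        for x, p in zip(x_row, p_row):
            # Clip the estimate to the range [0, |x|] so it never exceeds the mixture.
            p = min(max(p, 0.0), abs(x))
            # Assumption: reuse the mixture phase for the percussive estimate.
            p_hat = cmath.rect(p, cmath.phase(x))
            p_out.append(p_hat)
            # The harmonic estimate is the remainder, so both estimates sum to x.
            h_out.append(x - p_hat)
        percussive.append(p_out)
        harmonic.append(h_out)
    return percussive, harmonic


# Toy 2x2 "spectrogram" with made-up values, for illustration only.
mix = [[1 + 1j, 2 - 1j], [0.5j, -3 + 0j]]
perc = [[0.5, 1.0], [0.25, 4.0]]
P, H = separate_with_mixture_phase(mix, perc)
```

Because the harmonic estimate is defined as the remainder, the reconstruction error of this baseline is zero by construction; a dedicated phase recovery step instead trades off that error against a phase model for the harmonic source.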
