Tampere University of Technology

TUTCRIS Research Portal

Learning vocal mode classifiers from heterogeneous data sources

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Details

Original language: English
Title of host publication: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Publisher: IEEE Computer Society
Pages: 16–20
ISBN (Print): 978-1-5386-1631-4
Publication status: Published, 2017
Publication type: A4 Article in a conference publication
Event: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

Abstract

This paper targets a generalized vocal mode classifier (speech/singing) that works on audio data from an arbitrary data source. Previous studies on sound classification, however, commonly rely on cross-validation within a single dataset and do not consider cases where the training and testing data are recorded in mismatched conditions. Experiments on a new dataset, TUT-vocal-2016, revealed a large performance gap between the homogeneous and heterogeneous recognition scenarios. In the homogeneous scenario, cross-validation on TUT-vocal-2016 gave a classification accuracy of 95.5%. In the heterogeneous scenario, where seven existing datasets were used for training and TUT-vocal-2016 for testing, the accuracy was only 69.6%. Several feature normalization methods were tested to improve performance in the heterogeneous scenario. The best accuracy (96.8%) was obtained with the proposed subdataset-wise normalization.
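The abstract names subdataset-wise normalization but does not spell out its procedure. As a rough illustration only, the sketch below assumes it means z-score standardization of each feature dimension computed separately within each subdataset (taken here to be a group of recordings sharing one source or recording condition), instead of using a single global mean and variance. The function name `subdataset_wise_normalize`, the `eps` guard, and the group labels are hypothetical, and the paper's actual method may differ.

```python
import numpy as np


def subdataset_wise_normalize(features, subdataset_ids, eps=1e-8):
    """Z-score each feature dimension separately within every subdataset.

    Assumption (not from the paper): a "subdataset" is any group of
    recordings identified by a shared label in `subdataset_ids`.

    features: array of shape (n_samples, n_features)
    subdataset_ids: array of length n_samples with one group label per sample
    eps: small constant that avoids division by zero for constant features
    """
    features = np.asarray(features, dtype=float)
    subdataset_ids = np.asarray(subdataset_ids)
    normalized = np.empty_like(features)
    for sid in np.unique(subdataset_ids):
        mask = subdataset_ids == sid
        block = features[mask]
        # Statistics come from this group's features only (no class labels),
        # so the same step can also be applied to unlabeled test data.
        mean = block.mean(axis=0)
        std = block.std(axis=0)
        normalized[mask] = (block - mean) / (std + eps)
    return normalized


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical sources whose features differ in offset and scale,
    # standing in for mismatched recording conditions.
    source_a = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
    source_b = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
    X = np.vstack([source_a, source_b])
    ids = np.array(["source_a"] * 100 + ["source_b"] * 100)

    Xn = subdataset_wise_normalize(X, ids)
    for sid in np.unique(ids):
        group = Xn[ids == sid]
        print(sid, "means:", group.mean(axis=0), "stds:", group.std(axis=0))
```

With a single global normalization, the offset between `source_a` and `source_b` would survive in the normalized features. Per-group statistics remove it, which is one plausible reason such a scheme could help when training and test data come from different sources.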

Keywords

  • sound classification, vocal mode, heterogeneous data sources, feature normalization
