TUTCRIS - Tampere University of Technology

Learning vocal mode classifiers from heterogeneous data sources

Research output: peer-reviewed

Details

Original language: English
Title of host publication: 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Publisher: IEEE Computer Society
Pages: 16–20
ISBN (print): 978-1-5386-1631-4
Status: Published - 2017
OKM publication type: A4 Article in conference proceedings
Event: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Duration: 1 January 1900 → …

Abstract

This paper targets a generalized vocal mode classifier (speech/singing) that works on audio data from an arbitrary data source. Previous studies on sound classification, however, are commonly based on cross-validation using a single dataset, without considering cases in which the training and testing data are recorded under mismatched conditions. Experiments using a new dataset, TUT-vocal-2016, revealed a large difference between the homogeneous and heterogeneous recognition scenarios. In the homogeneous recognition scenario, the classification accuracy using cross-validation on TUT-vocal-2016 was 95.5%. In the heterogeneous recognition scenario, where seven existing datasets were used as training material and TUT-vocal-2016 was used for testing, the classification accuracy was only 69.6%. Several feature normalization methods were tested to improve the performance in the heterogeneous recognition scenario. The best performance (96.8%) was obtained using the proposed subdataset-wise normalization.
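
The abstract does not define subdataset-wise normalization in detail. As a minimal sketch of one plausible reading, assume it means standardizing each feature dimension with the mean and standard deviation of the subdataset a recording belongs to, rather than with global statistics. The function name `normalize_per_subdataset`, the `subdataset_ids` labels, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def normalize_per_subdataset(features, subdataset_ids, eps=1e-8):
    """Z-score each feature dimension using statistics of its own subdataset.

    features: list of equal-length feature vectors (lists of floats).
    subdataset_ids: one (hypothetical) group label per feature vector.
    """
    # Group row indices by the subdataset they came from.
    groups = defaultdict(list)
    for row, key in enumerate(subdataset_ids):
        groups[key].append(row)

    normalized = [None] * len(features)
    for rows in groups.values():
        # Per-dimension mean and standard deviation within this subdataset only.
        columns = list(zip(*(features[r] for r in rows)))
        means = [fmean(col) for col in columns]
        stds = [pstdev(col) for col in columns]
        for r in rows:
            normalized[r] = [
                (x - m) / (s + eps) for x, m, s in zip(features[r], means, stds)
            ]
    return normalized


# Toy data: two "subdatasets" whose features differ by a constant offset,
# mimicking a mismatch in recording conditions.
features = [[1.0, 10.0], [3.0, 30.0], [101.0, 110.0], [103.0, 130.0]]
subdataset_ids = ["a", "a", "b", "b"]
normalized = normalize_per_subdataset(features, subdataset_ids)
```

Under this assumption, the constant offset between the two toy subdatasets disappears after normalization, so a classifier trained on one group would see features on the same scale as the other. Whether the paper computes the statistics exactly this way is not stated in the abstract.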
