TUTCRIS - Tampere University of Technology

Filterbank Learning for Deep Neural Network Based Polyphonic Sound Event Detection

Research output: peer-reviewed

Details

Original language: English
Title of host publication: 2016 International Joint Conference on Neural Networks (IJCNN)
Publisher: IEEE
ISBN (electronic): 978-1-5090-0620-5
DOI - permanent links
Status: Published - 3 November 2016
OKM publication type: A4 Article in conference proceedings
Event: INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS
Duration: 1 January 1900 → …

Publication series

Name
ISSN (electronic): 2161-4407

Abstract

Deep learning techniques such as deep feedforward neural networks and deep convolutional neural networks have recently been shown to improve sound event detection performance compared to traditional methods such as Gaussian mixture models. One of the key factors in this improvement is the capability of deep architectures to automatically learn higher-level acoustic features in each layer. In this work, we aim to combine the feature learning capabilities of deep architectures with the empirical knowledge of human perception. We use the first layer of a deep neural network to learn a mapping from a high-resolution magnitude spectrum to a smaller number of frequency bands, which effectively learns a filterbank for the sound event detection task. We initialize the first hidden layer weights to match the perceptually motivated mel filterbank magnitude response. We also integrate this initialization scheme with context windowing by using an appropriately constrained deep convolutional neural network. The proposed method not only results in better detection accuracy, but also provides insight into the frequencies deemed essential for better discrimination of the given sound events.
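The following is a minimal, illustrative sketch of the core idea described above, not the authors' code: the first linear layer of the network acts as a learnable filterbank whose weights are initialized from a mel filterbank magnitude response and then fine-tuned together with the rest of the network. It assumes PyTorch and librosa, and the parameter values (sampling rate, FFT size, number of mel bands, layer sizes, number of classes) are placeholders rather than the settings used in the paper.

import librosa
import torch
import torch.nn as nn

# Assumed analysis parameters (illustrative only, not from the paper).
sr, n_fft, n_mels = 44100, 1024, 40
n_bins = n_fft // 2 + 1  # size of the high-resolution magnitude spectrum

# Mel filterbank magnitude response, shape (n_mels, n_bins).
mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)

class FilterbankDNN(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        # First hidden layer: a learnable filterbank mapping the magnitude
        # spectrum to n_mels bands, initialized to the mel filterbank response.
        self.filterbank = nn.Linear(n_bins, n_mels, bias=False)
        with torch.no_grad():
            self.filterbank.weight.copy_(torch.from_numpy(mel_fb))
        # Remaining layers: an ordinary feedforward classifier with sigmoid
        # outputs for multi-label (polyphonic) sound event detection.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(n_mels, 256), nn.ReLU(),
            nn.Linear(256, n_classes), nn.Sigmoid(),
        )

    def forward(self, magnitude_spectrum):  # magnitude_spectrum: (batch, n_bins)
        return self.classifier(self.filterbank(magnitude_spectrum))

model = FilterbankDNN()
# Training the model end-to-end adapts the mel-initialized first layer
# (the learned filterbank) to the sound event detection task.

Training the whole network end-to-end then lets the mel-initialized layer deviate from the fixed mel response wherever that improves detection, which is what makes the learned weights informative about the frequency regions relevant to the given sound events.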
