Tampere University of Technology

TUTCRIS Research Portal

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. / Cakir, Emre; Parascandolo, Giambattista; Heittola, Toni; Huttunen, Heikki; Virtanen, Tuomas.

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 25, No. 6, 06.2017, p. 1291-1303.

Research output: Contribution to journal › Article › Scientific › peer-review

Harvard

Cakir, E, Parascandolo, G, Heittola, T, Huttunen, H & Virtanen, T 2017, 'Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection', IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 6, pp. 1291-1303. https://doi.org/10.1109/TASLP.2017.2690575

APA

Cakir, E., Parascandolo, G., Heittola, T., Huttunen, H., & Virtanen, T. (2017). Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(6), 1291-1303. https://doi.org/10.1109/TASLP.2017.2690575

Vancouver

Cakir E, Parascandolo G, Heittola T, Huttunen H, Virtanen T. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2017 Jun;25(6):1291-1303. https://doi.org/10.1109/TASLP.2017.2690575

Author

Cakir, Emre ; Parascandolo, Giambattista ; Heittola, Toni ; Huttunen, Heikki ; Virtanen, Tuomas. / Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2017 ; Vol. 25, No. 6. pp. 1291-1303.

BibTeX

@article{2b3f984ea3c14fa1b0307f8e979739a6,
title = "Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection",
abstract = "Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer term temporal context in the audio signals. CNNs and RNNs as classifiers have recently shown improved performances over established methods in various sound recognition tasks. We combine these two approaches in a convolutional recurrent neural network (CRNN) and apply it on a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement for four different datasets consisting of everyday sound events.",
keywords = "Convolutional neural networks (CNNs), deep neural networks, recurrent neural networks (RNNs), sound event detection, RECOGNITION",
author = "Emre Cakir and Giambattista Parascandolo and Toni Heittola and Heikki Huttunen and Tuomas Virtanen",
year = "2017",
month = jun,
doi = "10.1109/TASLP.2017.2690575",
language = "English",
volume = "25",
pages = "1291--1303",
journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
issn = "2329-9290",
publisher = "IEEE",
number = "6",

}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

AU - Cakir, Emre

AU - Parascandolo, Giambattista

AU - Heittola, Toni

AU - Huttunen, Heikki

AU - Virtanen, Tuomas

PY - 2017/6

Y1 - 2017/6

N2 - Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer term temporal context in the audio signals. CNNs and RNNs as classifiers have recently shown improved performances over established methods in various sound recognition tasks. We combine these two approaches in a convolutional recurrent neural network (CRNN) and apply it on a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement for four different datasets consisting of everyday sound events.

AB - Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer term temporal context in the audio signals. CNNs and RNNs as classifiers have recently shown improved performances over established methods in various sound recognition tasks. We combine these two approaches in a convolutional recurrent neural network (CRNN) and apply it on a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement for four different datasets consisting of everyday sound events.

KW - Convolutional neural networks (CNNs)

KW - deep neural networks

KW - recurrent neural networks (RNNs)

KW - sound event detection

KW - RECOGNITION

U2 - 10.1109/TASLP.2017.2690575

DO - 10.1109/TASLP.2017.2690575

M3 - Article

VL - 25

SP - 1291

EP - 1303

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

SN - 2329-9290

IS - 6

ER -
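
The abstract above describes the CRNN architecture only at a high level. The sketch below is one minimal, illustrative reading of that description, not the authors' published configuration: it assumes PyTorch, a log-mel spectrogram input, GRU recurrent layers, frame-wise sigmoid outputs, and arbitrary layer sizes.

# Minimal CRNN sketch for polyphonic sound event detection (illustration only).
# Assumptions (not taken from the paper): PyTorch, log-mel spectrogram input of
# shape (batch, 1, n_frames, n_mels), frame-level sigmoid outputs per event class.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels=40, n_classes=6, rnn_hidden=32):
        super().__init__()
        # CNN front end: extracts features invariant to local spectro-temporal shifts.
        # Pooling is applied only along the frequency axis so the frame rate is kept.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 5)),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 4)),
        )
        freq_out = n_mels // 5 // 4          # frequency bins left after pooling
        # RNN back end: models longer-term temporal context across frames.
        self.rnn = nn.GRU(64 * freq_out, rnn_hidden, batch_first=True, bidirectional=True)
        # Frame-wise sigmoid outputs: several classes can be active at once (polyphony).
        self.fc = nn.Linear(2 * rnn_hidden, n_classes)

    def forward(self, x):                     # x: (batch, 1, n_frames, n_mels)
        h = self.cnn(x)                       # (batch, 64, n_frames, freq_out)
        h = h.permute(0, 2, 1, 3).flatten(2)  # (batch, n_frames, 64 * freq_out)
        h, _ = self.rnn(h)                    # (batch, n_frames, 2 * rnn_hidden)
        return torch.sigmoid(self.fc(h))      # event activity per frame and class

# Example: 100 frames of a 40-band log-mel spectrogram -> per-frame class probabilities.
model = CRNN()
probs = model(torch.randn(8, 1, 100, 40))     # shape: (8, 100, 6)

Pooling only along the frequency axis keeps the original frame rate, so each output frame can be compared directly against frame-level event annotations; the sigmoid outputs (rather than a softmax) allow multiple simultaneous events, which is what makes the detection task polyphonic.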