Tampere University of Technology

TUTCRIS Research Portal

Sound Event Detection in the DCASE 2017 Challenge

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

Sound Event Detection in the DCASE 2017 Challenge. / Mesaros, Annamaria; Diment, Aleksandr; Elizalde, Benjamin; Heittola, Toni; Vincent, Emmanuel; Raj, Bhiksha; Virtanen, Tuomas.

In: IEEE/ACM Transactions on Audio Speech and Language Processing, Vol. 27, No. 6, 01.06.2019, p. 992-1006.

Harvard

Mesaros, A, Diment, A, Elizalde, B, Heittola, T, Vincent, E, Raj, B & Virtanen, T 2019, 'Sound Event Detection in the DCASE 2017 Challenge', IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 27, no. 6, pp. 992-1006. https://doi.org/10.1109/TASLP.2019.2907016

APA

Mesaros, A., Diment, A., Elizalde, B., Heittola, T., Vincent, E., Raj, B., & Virtanen, T. (2019). Sound Event Detection in the DCASE 2017 Challenge. IEEE/ACM Transactions on Audio Speech and Language Processing, 27(6), 992-1006. https://doi.org/10.1109/TASLP.2019.2907016

Vancouver

Mesaros A, Diment A, Elizalde B, Heittola T, Vincent E, Raj B et al. Sound Event Detection in the DCASE 2017 Challenge. IEEE/ACM Transactions on Audio Speech and Language Processing. 2019 Jun 1;27(6):992-1006. https://doi.org/10.1109/TASLP.2019.2907016

Author

Mesaros, Annamaria ; Diment, Aleksandr ; Elizalde, Benjamin ; Heittola, Toni ; Vincent, Emmanuel ; Raj, Bhiksha ; Virtanen, Tuomas. / Sound Event Detection in the DCASE 2017 Challenge. In: IEEE/ACM Transactions on Audio Speech and Language Processing. 2019 ; Vol. 27, No. 6. pp. 992-1006.

BibTeX

@article{7e067a017bf547369090f231c3137df3,
title = "Sound Event Detection in the DCASE 2017 Challenge",
abstract = "Each edition of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) contained several tasks involving sound event detection in different setups. DCASE 2017 presented participants with three such tasks, each having specific datasets and detection requirements: Task 2, in which target sound events were very rare in both training and testing data, Task 3 having overlapping events annotated in real-life audio, and Task 4, in which only weakly labeled data were available for training. In this paper, we present three tasks, including the datasets and baseline systems, and analyze the challenge entries for each task. We observe the popularity of methods using deep neural networks, and the still widely used mel frequency-based representations, with only few approaches standing out as radically different. Analysis of the systems behavior reveals that task-specific optimization has a big role in producing good performance; however, often this optimization closely follows the ranking metric, and its maximization/minimization does not result in universally good performance. We also introduce the calculation of confidence intervals based on a jackknife resampling procedure to perform statistical analysis of the challenge results. The analysis indicates that while the 95{\%} confidence intervals for many systems overlap, there are significant differences in performance between the top systems and the baseline for all tasks.",
keywords = "confidence intervals, jackknife estimates, pattern recognition, Sound event detection, weak labels",
author = "Annamaria Mesaros and Aleksandr Diment and Benjamin Elizalde and Toni Heittola and Emmanuel Vincent and Bhiksha Raj and Tuomas Virtanen",
year = "2019",
month = "6",
day = "1",
doi = "10.1109/TASLP.2019.2907016",
language = "English",
volume = "27",
pages = "992--1006",
journal = "IEEE/ACM Transactions on Audio Speech and Language Processing",
issn = "2329-9290",
publisher = "IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC",
number = "6",

}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Sound Event Detection in the DCASE 2017 Challenge

AU - Mesaros, Annamaria

AU - Diment, Aleksandr

AU - Elizalde, Benjamin

AU - Heittola, Toni

AU - Vincent, Emmanuel

AU - Raj, Bhiksha

AU - Virtanen, Tuomas

PY - 2019/6/1

Y1 - 2019/6/1

AB - Each edition of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) contained several tasks involving sound event detection in different setups. DCASE 2017 presented participants with three such tasks, each having specific datasets and detection requirements: Task 2, in which target sound events were very rare in both training and testing data, Task 3 having overlapping events annotated in real-life audio, and Task 4, in which only weakly labeled data were available for training. In this paper, we present three tasks, including the datasets and baseline systems, and analyze the challenge entries for each task. We observe the popularity of methods using deep neural networks, and the still widely used mel frequency-based representations, with only few approaches standing out as radically different. Analysis of the systems behavior reveals that task-specific optimization has a big role in producing good performance; however, often this optimization closely follows the ranking metric, and its maximization/minimization does not result in universally good performance. We also introduce the calculation of confidence intervals based on a jackknife resampling procedure to perform statistical analysis of the challenge results. The analysis indicates that while the 95% confidence intervals for many systems overlap, there are significant differences in performance between the top systems and the baseline for all tasks.

KW - confidence intervals

KW - jackknife estimates

KW - pattern recognition

KW - Sound event detection

KW - weak labels

U2 - 10.1109/TASLP.2019.2907016

DO - 10.1109/TASLP.2019.2907016

M3 - Article

VL - 27

SP - 992

EP - 1006

JO - IEEE/ACM Transactions on Audio Speech and Language Processing

JF - IEEE/ACM Transactions on Audio Speech and Language Processing

SN - 2329-9290

IS - 6

ER -