TUTCRIS - Tampere University of Technology

Using sequential information in polyphonic sound event detection

Research output: peer-reviewed

Details

Original language: English
Title of host publication: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018
Publisher: IEEE
Pages: 291-295
Number of pages: 5
ISBN (electronic): 9781538681510
DOI - permanent links
Status: Published - 2 November 2018
OKM publication type: A4 Article in conference proceedings
Event: International Workshop on Acoustic Signal Enhancement - Tokyo, Japan
Duration: 17 September 2018 to 20 September 2018

Conference

Conference: International Workshop on Acoustic Signal Enhancement
Country: Japan
City: Tokyo
Period: 17/09/18 to 20/09/18

Abstract

Detecting the class and the start and end times of sound events in real-world recordings is a challenging task. Current computer systems often show relatively high frame-wise accuracy but low event-wise accuracy. In this paper, we attempt to bridge this gap by explicitly including sequential information to improve the performance of a state-of-the-art polyphonic sound event detection system. We propose to 1) use delayed predictions of event activities as additional input features that are fed back to the neural network; 2) build N-grams to model the co-occurrence probabilities of different events; 3) use sequential loss to train neural networks. Our experiments on a corpus of real-world recordings show that the N-grams could smooth the spiky output of a state-of-the-art neural network system and improve both the frame-wise and the event-wise metrics.
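
As a rough illustration of how sequence statistics can smooth spiky frame-wise output, the sketch below estimates a bigram (the simplest N-gram) over the on/off activity of a single event class and uses Viterbi decoding to pick the most likely activity path. This is not the system described in the paper: the function names `estimate_bigram` and `viterbi_smooth`, the two-state single-class model, the uniform prior, the add-one smoothing, and the use of network probabilities directly as emission scores are all assumptions made for this example.

```python
import numpy as np


def estimate_bigram(labels, eps=1.0):
    """Estimate P(state_t | state_{t-1}) from a binary activity sequence.

    `eps` is an assumed add-eps smoothing constant so no transition gets
    probability zero.
    """
    counts = np.full((2, 2), eps)
    for prev, cur in zip(labels[:-1], labels[1:]):
        counts[prev, cur] += 1
    # Normalize each row so it is a distribution over the next state.
    return counts / counts.sum(axis=1, keepdims=True)


def viterbi_smooth(probs, trans, prior=(0.5, 0.5)):
    """Return the most likely 0/1 activity path for one event class.

    probs: frame-wise probabilities that the event is active.
    trans: 2x2 bigram matrix, trans[i, j] = P(next = j | previous = i).
    Simplification (assumption): the probabilities are used directly as
    emission scores.
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-6, 1 - 1e-6)
    emit = np.log(np.stack([1 - probs, probs], axis=1))  # shape (T, 2)
    log_trans = np.log(trans)
    n_frames = len(probs)

    score = np.log(np.asarray(prior)) + emit[0]
    back = np.zeros((n_frames, 2), dtype=int)
    for t in range(1, n_frames):
        # cand[i, j]: best score ending in state i at t-1, then moving to j.
        cand = score[:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emit[t]

    # Trace the best path backwards from the best final state.
    path = np.zeros(n_frames, dtype=int)
    path[-1] = score.argmax()
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


if __name__ == "__main__":
    # Toy annotation: one long event surrounded by silence.
    train_labels = [0] * 20 + [1] * 20 + [0] * 20
    trans = estimate_bigram(train_labels)

    # Toy "spiky" network output with a one-frame dropout and a one-frame spike.
    spiky = [0.1, 0.1, 0.9, 0.8, 0.3, 0.9, 0.9, 0.1, 0.1, 0.6, 0.1, 0.1]
    print("thresholded:", (np.array(spiky) > 0.5).astype(int).tolist())
    print("smoothed:   ", viterbi_smooth(spiky, trans).tolist())
```

Because the bigram learned from long events makes on/off switches unlikely, isolated single-frame dropouts and spikes are penalized, which tends to help event-wise scoring more than plain per-frame thresholding does.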

Research areas

Publication Forum level