Tampere University of Technology

TUTCRIS Research Portal

Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review


Original language: English
Title of host publication: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)
ISBN (Electronic): 978-0-578-59596-2
Publication status: Published - Oct 2019
Publication type: A4 Article in a conference publication
Event: Workshop on Detection and Classification of Acoustic Scenes and Events - New York, United States
Duration: 25 Oct 2019 - 26 Oct 2019


Workshop: Workshop on Detection and Classification of Acoustic Scenes and Events
Abbreviated title: DCASE
Country: United States
City: New York


This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN). We use a CRNN previously proposed for the localization and detection of stationary sources, and show that the recurrent layers enable the spatial tracking of moving sources when trained with dynamic scenes. The tracking performance of the CRNN is compared with a stand-alone tracking method that combines a multi-source direction-of-arrival (DOA) estimator and a particle filter. Their respective performance is evaluated in various acoustic conditions such as anechoic and reverberant scenarios, stationary and moving sources at several angular velocities, and with a varying number of overlapping sources. The results show that the CRNN manages to track multiple sources more consistently than the parametric method across acoustic scenarios, but at the cost of higher localization error.
