Tampere University of Technology

TUTCRIS Research Portal

City Classification from Multiple Real-World Sound Scenes

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Standard

City Classification from Multiple Real-World Sound Scenes. / Bear, Helen L.; Heittola, Toni; Mesaros, Annamaria; Benetos, Emmanouil; Virtanen, Tuomas.

2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2019. p. 11-15 (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Harvard

Bear, HL, Heittola, T, Mesaros, A, Benetos, E & Virtanen, T 2019, City Classification from Multiple Real-World Sound Scenes. in 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE, pp. 11-15, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 10/2019. https://doi.org/10.1109/WASPAA.2019.8937271

APA

Bear, H. L., Heittola, T., Mesaros, A., Benetos, E., & Virtanen, T. (2019). City Classification from Multiple Real-World Sound Scenes. In 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 11-15). (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics). IEEE. https://doi.org/10.1109/WASPAA.2019.8937271

Vancouver

Bear HL, Heittola T, Mesaros A, Benetos E, Virtanen T. City Classification from Multiple Real-World Sound Scenes. In 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE. 2019. p. 11-15. (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics). https://doi.org/10.1109/WASPAA.2019.8937271

Author

Bear, Helen L. ; Heittola, Toni ; Mesaros, Annamaria ; Benetos, Emmanouil ; Virtanen, Tuomas. / City Classification from Multiple Real-World Sound Scenes. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2019. pp. 11-15 (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics).

BibTeX

@inproceedings{436172fb9c8a4228adebf56cd668cf82,
title = "City Classification from Multiple Real-World Sound Scenes",
abstract = "The majority of sound scene analysis work focuses on one of two clearly defined tasks: acoustic scene classification or sound event detection. Whilst this separation of tasks is useful for problem definition, it inherently ignores some subtleties of the real world, in particular how humans vary in how they describe a scene. Some will describe the weather and features within it, others will use a holistic descriptor like ‘park’, and others still will use unique identifiers such as cities or names. In this paper, we undertake the task of automatic city classification to ask whether we can recognize a city from a set of sound scenes. In this problem each city has recordings from multiple scenes. We test a series of methods for this novel task and show that a simple convolutional neural network (CNN) can achieve an accuracy of 50{\%}. This is less than the acoustic scene classification task baseline in the DCASE 2018 ASC challenge on the same data. With a simple adaptation to the class labels, pairing city labels with grouped scenes, accuracy increases to 52{\%}, closer to the simpler scene classification task. Finally, we also formulate the problem in a multi-task learning framework and achieve an accuracy of 56{\%}, outperforming the aforementioned approaches.",
keywords = "Acoustic scene classification, location identification, city classification, computational sound scene analysis",
author = "Bear, {Helen L.} and Toni Heittola and Annamaria Mesaros and Emmanouil Benetos and Tuomas Virtanen",
year = "2019",
month = "10",
doi = "10.1109/WASPAA.2019.8937271",
language = "English",
isbn = "978-1-7281-1124-7",
series = "IEEE Workshop on Applications of Signal Processing to Audio and Acoustics",
publisher = "IEEE",
pages = "11--15",
booktitle = "2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)",

}
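
The abstract above describes a simple CNN for city classification and a multi-task variant that jointly predicts city and scene labels. The following is a minimal, hypothetical sketch of such a multi-task setup, written here in Keras; the log-mel input shape, layer sizes, class counts, and training settings are illustrative assumptions only, not the architecture reported in the paper.

# Hypothetical sketch of a multi-task CNN: a shared convolutional trunk with
# two softmax heads, one for the city label and one for the acoustic scene label.
# All shapes and hyperparameters below are assumptions for illustration.
import tensorflow as tf
from tensorflow.keras import layers, Model

N_CITIES = 6                 # e.g. the six recording cities in the DCASE 2018 ASC data
N_SCENES = 10                # e.g. the ten scene classes in the same data
INPUT_SHAPE = (40, 500, 1)   # assumed log-mel spectrogram: 40 bands x 500 frames

def build_multitask_cnn():
    inputs = layers.Input(shape=INPUT_SHAPE)
    # Shared convolutional trunk (sizes are illustrative only).
    x = layers.Conv2D(32, (7, 7), activation="relu", padding="same")(inputs)
    x = layers.MaxPooling2D((5, 5))(x)
    x = layers.Conv2D(64, (7, 7), activation="relu", padding="same")(x)
    x = layers.MaxPooling2D((4, 10))(x)
    x = layers.Flatten()(x)
    x = layers.Dense(100, activation="relu")(x)
    # Two task-specific heads trained jointly on the shared representation.
    city_out = layers.Dense(N_CITIES, activation="softmax", name="city")(x)
    scene_out = layers.Dense(N_SCENES, activation="softmax", name="scene")(x)
    model = Model(inputs, [city_out, scene_out])
    model.compile(
        optimizer="adam",
        loss={"city": "sparse_categorical_crossentropy",
              "scene": "sparse_categorical_crossentropy"},
        metrics={"city": "accuracy", "scene": "accuracy"},
    )
    return model

if __name__ == "__main__":
    build_multitask_cnn().summary()

The paired-label variant mentioned in the abstract (52% accuracy) could instead be realised with a single softmax head whose classes are city/scene-group combinations, rather than two separate heads.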

RIS (suitable for import to EndNote)

TY - GEN

T1 - City Classification from Multiple Real-World Sound Scenes

AU - Bear, Helen L.

AU - Heittola, Toni

AU - Mesaros, Annamaria

AU - Benetos, Emmanouil

AU - Virtanen, Tuomas

PY - 2019/10

Y1 - 2019/10

N2 - The majority of sound scene analysis work focuses on one of two clearly defined tasks: acoustic scene classification or sound event detection. Whilst this separation of tasks is useful for problem definition, it inherently ignores some subtleties of the real world, in particular how humans vary in how they describe a scene. Some will describe the weather and features within it, others will use a holistic descriptor like ‘park’, and others still will use unique identifiers such as cities or names. In this paper, we undertake the task of automatic city classification to ask whether we can recognize a city from a set of sound scenes. In this problem each city has recordings from multiple scenes. We test a series of methods for this novel task and show that a simple convolutional neural network (CNN) can achieve an accuracy of 50%. This is less than the acoustic scene classification task baseline in the DCASE 2018 ASC challenge on the same data. With a simple adaptation to the class labels, pairing city labels with grouped scenes, accuracy increases to 52%, closer to the simpler scene classification task. Finally, we also formulate the problem in a multi-task learning framework and achieve an accuracy of 56%, outperforming the aforementioned approaches.

AB - The majority of sound scene analysis work focuses on one of two clearly defined tasks: acoustic scene classification or sound event detection. Whilst this separation of tasks is useful for problem definition, it inherently ignores some subtleties of the real world, in particular how humans vary in how they describe a scene. Some will describe the weather and features within it, others will use a holistic descriptor like ‘park’, and others still will use unique identifiers such as cities or names. In this paper, we undertake the task of automatic city classification to ask whether we can recognize a city from a set of sound scenes. In this problem each city has recordings from multiple scenes. We test a series of methods for this novel task and show that a simple convolutional neural network (CNN) can achieve an accuracy of 50%. This is less than the acoustic scene classification task baseline in the DCASE 2018 ASC challenge on the same data. With a simple adaptation to the class labels, pairing city labels with grouped scenes, accuracy increases to 52%, closer to the simpler scene classification task. Finally, we also formulate the problem in a multi-task learning framework and achieve an accuracy of 56%, outperforming the aforementioned approaches.

KW - Acoustic scene classification

KW - location identification

KW - city classification

KW - computational sound scene analysis

U2 - 10.1109/WASPAA.2019.8937271

DO - 10.1109/WASPAA.2019.8937271

M3 - Conference contribution

SN - 978-1-7281-1124-7

T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

SP - 11

EP - 15

BT - 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

PB - IEEE

ER -