TUTCRIS - Tampere University of Technology

Zero-Shot Audio Classification Based On Class Label Embeddings

Research output: peer-reviewed

Standard

Zero-Shot Audio Classification Based On Class Label Embeddings. / Xie, Huang; Virtanen, Tuomas.

2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2019. pp. 264-267 (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics).

Harvard

Xie, H & Virtanen, T 2019, Zero-Shot Audio Classification Based On Class Label Embeddings. in 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE, pp. 264-267, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. https://doi.org/10.1109/WASPAA.2019.8937283

APA

Xie, H., & Virtanen, T. (2019). Zero-Shot Audio Classification Based On Class Label Embeddings. In 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 264-267). (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics). IEEE. https://doi.org/10.1109/WASPAA.2019.8937283

Vancouver

Xie H, Virtanen T. Zero-Shot Audio Classification Based On Class Label Embeddings. In 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE. 2019. p. 264-267. (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics). https://doi.org/10.1109/WASPAA.2019.8937283

Author

Xie, Huang ; Virtanen, Tuomas. / Zero-Shot Audio Classification Based On Class Label Embeddings. 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2019. pp. 264-267 (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics).

BibTeX

@inproceedings{8245be2fb8bd4eceb4748a151c033861,
title = "Zero-Shot Audio Classification Based On Class Label Embeddings",
abstract = "This paper proposes a zero-shot learning approach for audio classification based on the textual information about class labels without any audio samples from target classes. We propose an audio classification system built on the bilinear model, which takes audio feature embeddings and semantic class label embeddings as input, and measures the compatibility between an audio feature embedding and a class label embedding. We use VGGish to extract audio feature embeddings from audio recordings. We treat textual labels as semantic side information of audio classes, and use Word2Vec to generate class label embeddings. Results on the ESC-50 dataset show that the proposed system can perform zero-shot audio classification with a small training dataset. It can achieve accuracy (26 {\%} on average) better than random guess (10 {\%}) on each audio category. Particularly, it reaches up to 39.7 {\%} for the category of natural audio classes.",
keywords = "zero-shot learning, audio classification, class label embedding",
author = "Huang Xie and Tuomas Virtanen",
note = "INT=COMP, {"}Xie, Huang{"}",
year = "2019",
month = "10",
doi = "10.1109/WASPAA.2019.8937283",
language = "English",
isbn = "978-1-7281-1124-7",
series = "IEEE Workshop on Applications of Signal Processing to Audio and Acoustics",
publisher = "IEEE",
pages = "264--267",
booktitle = "2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)",

}

RIS (suitable for import to EndNote)

TY - GEN

T1 - Zero-Shot Audio Classification Based On Class Label Embeddings

AU - Xie, Huang

AU - Virtanen, Tuomas

N1 - INT=COMP, "Xie, Huang"

PY - 2019/10

Y1 - 2019/10

AB - This paper proposes a zero-shot learning approach for audio classification based on the textual information about class labels without any audio samples from target classes. We propose an audio classification system built on the bilinear model, which takes audio feature embeddings and semantic class label embeddings as input, and measures the compatibility between an audio feature embedding and a class label embedding. We use VGGish to extract audio feature embeddings from audio recordings. We treat textual labels as semantic side information of audio classes, and use Word2Vec to generate class label embeddings. Results on the ESC-50 dataset show that the proposed system can perform zero-shot audio classification with a small training dataset. It can achieve accuracy (26 % on average) better than random guess (10 %) on each audio category. Particularly, it reaches up to 39.7 % for the category of natural audio classes.

KW - zero-shot learning

KW - audio classification

KW - class label embedding

U2 - 10.1109/WASPAA.2019.8937283

DO - 10.1109/WASPAA.2019.8937283

M3 - Conference contribution

SN - 978-1-7281-1124-7

T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

SP - 264

EP - 267

BT - 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

PB - IEEE

ER -
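
Illustrative sketch

The abstract describes a bilinear model that scores the compatibility between an audio feature embedding and a class label embedding. The following is a minimal sketch of that scoring step only, not the authors' implementation. The function names, the random placeholder matrix W (standing in for a learned one), the 300-dimensional label embeddings, and the choice of 10 candidate classes are all assumptions made for illustration; only the 128-dimensional size of VGGish embeddings is a known property of that model. Training of W is not covered.

```python
import numpy as np


def compatibility(audio_emb, label_embs, W):
    """Bilinear score x^T W y of one audio embedding against every class label embedding.

    audio_emb:  shape (audio_dim,)
    label_embs: shape (n_classes, label_dim)
    W:          shape (audio_dim, label_dim), assumed to be learned elsewhere
    Returns an array of shape (n_classes,).
    """
    return audio_emb @ W @ label_embs.T


def predict(audio_emb, label_embs, W):
    """Return the index of the class whose label embedding is most compatible."""
    return int(np.argmax(compatibility(audio_emb, label_embs, W)))


# Placeholder data: random vectors stand in for real VGGish and Word2Vec outputs.
rng = np.random.default_rng(0)
audio_dim, label_dim, n_classes = 128, 300, 10  # label_dim and n_classes are assumptions
W = rng.normal(size=(audio_dim, label_dim))
label_embs = rng.normal(size=(n_classes, label_dim))
audio_emb = rng.normal(size=audio_dim)

scores = compatibility(audio_emb, label_embs, W)
pred = predict(audio_emb, label_embs, W)
```

Because the classifier only needs a label embedding for each candidate class, classes with no training audio can be scored in the same way as seen classes, which is what makes the zero-shot setting possible.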