TUTCRIS - Tampere University of Technology


Visual Voice Activity Detection in the Wild

Research output: Peer-reviewed

Details

Original language: English
Pages: 967-977
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Volume: 18
Issue number: 6
State: Published - 1 June 2016
OKM publication type: A1 Original article

Abstract

This paper investigates the visual voice activity detection (V-VAD) problem in unconstrained environments. A novel method for V-VAD in the wild is proposed: facial video segments are described by local shape and motion information extracted at spatiotemporal locations of interest, and are represented using the bag-of-words model. The resulting segment representations are then classified using state-of-the-art classification algorithms. Experimental results on a publicly available V-VAD dataset demonstrate the effectiveness of the proposed method, which generalizes better to unseen users than recently proposed state-of-the-art methods. Additional results on a new unconstrained dataset provide evidence that the proposed method can be effective even in cases where other existing methods fail.
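The abstract outlines a pipeline of local description, bag-of-words representation, and classification. Below is a minimal Python sketch of the bag-of-words stage, assuming the local shape and motion descriptors have already been extracted at spatiotemporal interest points and are available as fixed-length vectors. Learning the codebook with plain k-means is a common bag-of-words choice assumed here, not a detail taken from the paper. The nearest-class-mean classifier is a hypothetical stand-in for the state-of-the-art classifiers the paper uses. All function names and the toy data are illustrative.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(v: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the center (codeword) closest to v; ties go to the lowest index."""
    return min(range(len(centers)), key=lambda j: squared_distance(v, centers[j]))


def learn_codebook(descriptors: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: k codewords via plain k-means (Lloyd's algorithm) on pooled descriptors."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_index(d, centers)].append(d)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def encode_segment(descriptors: List[Vector], codebook: List[Vector]) -> Vector:
    """Bag-of-words vector: L1-normalised histogram of codeword assignments for one segment."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_index(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def fit_class_means(histograms: List[Vector], labels: List[str]) -> Dict[str, Vector]:
    """Stand-in classifier (nearest class mean); NOT the classifiers used in the paper."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for h, y in zip(histograms, labels):
        if y not in sums:
            sums[y] = [0.0] * len(h)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], h)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(hist: Vector, means: Dict[str, Vector]) -> str:
    """Assign the label whose class-mean histogram is closest to hist."""
    return min(means, key=lambda y: squared_distance(hist, means[y]))


# Toy 2-D "descriptors"; real shape/motion descriptors would be much higher-dimensional.
train_segments = [
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],
    [[0.1, 0.1], [0.0, 0.0], [0.2, 0.1]],
    [[1.0, 1.0], [0.9, 1.0], [1.0, 0.9]],
    [[1.1, 1.0], [1.0, 1.1], [0.9, 0.9]],
]
train_labels = ["silent", "silent", "speaking", "speaking"]

pooled = [d for seg in train_segments for d in seg]
codebook = learn_codebook(pooled, k=2)
train_hists = [encode_segment(seg, codebook) for seg in train_segments]
means = fit_class_means(train_hists, train_labels)

print(predict(encode_segment([[0.05, 0.05], [0.0, 0.1]], codebook), means))  # prints: silent
print(predict(encode_segment([[1.0, 1.0], [0.95, 1.05]], codebook), means))  # prints: speaking
```

The histogram always has length k, no matter how many interest points a segment contains. This fixed-length encoding is what lets segments with varying amounts of local activity be passed to a standard classifier. With real data, each segment would contribute many high-dimensional descriptors, and k would typically be much larger than 2.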