TUTCRIS - Tampere University of Technology

Speech Detection on Broadcast Audio

Research output: peer-reviewed

Details

Original language: English
Title: 18th European Signal Processing Conference (EUSIPCO-2010)
Editors: B Kleijn, J Larsen
Place of publication: KESSARIANI
Publisher: EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP
Pages: 85-89
Number of pages: 5
Status: Published - 2010
Published externally: Yes
OKM publication type: A4 Article in conference proceedings
Event: 18th European Signal Processing Conference (EUSIPCO) - Aalborg, Denmark
Duration: 23 August 2010 to 27 August 2010

Publication series

Name: European Signal Processing Conference
Publisher: EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP
Volume: 18
ISSN (print): 2076-1465

Conference

Conference: 18th European Signal Processing Conference (EUSIPCO)
Country: Denmark
Period: 23/08/10 to 27/08/10

Abstract

Speech boundary detection contributes to the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in three steps. In the first step, audio data is segmented into homogeneous regions in an unsupervised manner. After an ACTIVITY/NON-ACTIVITY decision is made for each region, ACTIVITY regions are classified as Speech/Non-speech via Gaussian Mixture Model (GMM) based classification. The GMMs are trained using a novel feature, Spectral Flow Direction (SFD), and an improved multi-band harmonicity feature, in addition to the widely used Mel Frequency Cepstral Coefficients (MFCCs).
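
The last two steps of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the audio has already been segmented into regions, uses a mean-energy threshold as a stand-in for the ACTIVITY/NON-ACTIVITY decision (the abstract does not say how that decision is made), and takes already-trained diagonal-covariance GMMs as plain lists of (weight, mean, variance) tuples. All function names, the region layout, and the threshold are hypothetical, and the feature vectors stand in for whatever SFD, harmonicity, and MFCC values a real front end would produce.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a GMM given as [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp trick for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_region(frames, speech_gmm, nonspeech_gmm):
    """Compare the summed frame log-likelihoods under the two models."""
    speech = sum(gmm_log_likelihood(f, speech_gmm) for f in frames)
    other = sum(gmm_log_likelihood(f, nonspeech_gmm) for f in frames)
    return "Speech" if speech > other else "Non-speech"


def label_regions(regions, speech_gmm, nonspeech_gmm, energy_threshold):
    """Label pre-segmented regions.

    Each region is a dict with per-frame "energy" values and per-frame
    "features" vectors (hypothetical layout). The energy threshold is an
    assumed stand-in for the ACTIVITY/NON-ACTIVITY decision.
    """
    labels = []
    for region in regions:
        mean_energy = sum(region["energy"]) / len(region["energy"])
        if mean_energy < energy_threshold:
            labels.append("NON-ACTIVITY")
        else:
            labels.append(
                classify_region(region["features"], speech_gmm, nonspeech_gmm)
            )
    return labels
```

Summing frame log-likelihoods treats the frames of a region as independent, which is a common simplification rather than something the abstract states.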

Research areas