Speech Detection on Broadcast Audio
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Standard
Speech Detection on Broadcast Audio. / Zubari, Unal; Ozan, Ezgi Can; Acar, Banu Oskay; Ciloglu, Tolga; Esen, Ersin; Ates, Tugrul K.; Onur, Duygu Oskay.
18th European Signal Processing Conference (EUSIPCO-2010). ed. / B Kleijn; J Larsen. KESSARIANI : EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP, 2010. p. 85-89 (European Signal Processing Conference; Vol. 18).
RIS (suitable for import to EndNote)
TY - GEN
T1 - Speech Detection on Broadcast Audio
AU - Zubari, Unal
AU - Ozan, Ezgi Can
AU - Acar, Banu Oskay
AU - Ciloglu, Tolga
AU - Esen, Ersin
AU - Ates, Tugrul K.
AU - Onur, Duygu Oskay
PY - 2010
Y1 - 2010
N2 - Speech boundary detection contributes to the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study works on broadcast audio as a pre-processor module of a keyword spotter. Speech boundary detection is handled in three steps. In the first step, audio data is segmented into homogeneous regions in an unsupervised manner. After an ACTIVITY/NON-ACTIVITY decision is made for each region, ACTIVITY regions are classified as Speech/Non-speech via Gaussian Mixture Model (GMM) based classification. The GMMs are trained using a novel feature, Spectral Flow Direction (SFD), and an improved multi-band harmonicity feature in addition to the widely used Mel Frequency Cepstral Coefficients (MFCCs).
KW - CLASSIFICATION
KW - RETRIEVAL
KW - MUSIC
M3 - Conference contribution
T3 - European Signal Processing Conference
SP - 85
EP - 89
BT - 18th European Signal Processing Conference (EUSIPCO-2010)
A2 - Kleijn, B
A2 - Larsen, J
PB - EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP
CY - KESSARIANI
ER -
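
The abstract above describes a pipeline in which each region first receives an ACTIVITY/NON-ACTIVITY decision and ACTIVITY regions are then labelled Speech/Non-speech by comparing class models. The following is only a minimal sketch of that decision flow, not the authors' implementation. Several things here are assumptions made for illustration: the regions are taken as already segmented (the unsupervised segmentation step is not shown), the activity decision is a simple mean-energy threshold, each class is modelled by a single diagonal-covariance Gaussian standing in for a full GMM, and the 2-D feature vectors are toy values standing in for MFCC, SFD, and harmonicity features. All function, class, and parameter names are hypothetical.

```python
import math


def mean_energy(samples):
    """Mean squared amplitude of a region's audio samples."""
    return sum(x * x for x in samples) / len(samples)


def is_active(samples, threshold=1e-3):
    """ACTIVITY/NON-ACTIVITY decision (assumed: plain energy threshold)."""
    return mean_energy(samples) > threshold


class DiagonalGaussian:
    """Single diagonal-covariance Gaussian; a one-component stand-in for a GMM."""

    def __init__(self, vectors):
        n, dim = len(vectors), len(vectors[0])
        self.mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
        # Floor the variance so the log-likelihood stays finite.
        self.var = [
            max(sum((v[d] - self.mean[d]) ** 2 for v in vectors) / n, 1e-6)
            for d in range(dim)
        ]

    def log_likelihood(self, vector):
        """Log density of one feature vector under this model."""
        return sum(
            -0.5 * (math.log(2.0 * math.pi * s) + (x - m) ** 2 / s)
            for x, m, s in zip(vector, self.mean, self.var)
        )


def classify_region(features, speech_model, nonspeech_model):
    """Compare the summed log-likelihood of a region's feature vectors."""
    speech_score = sum(speech_model.log_likelihood(v) for v in features)
    nonspeech_score = sum(nonspeech_model.log_likelihood(v) for v in features)
    return "Speech" if speech_score > nonspeech_score else "Non-speech"


def label_region(samples, features, speech_model, nonspeech_model, threshold=1e-3):
    """Activity decision first; only ACTIVITY regions reach the classifier."""
    if not is_active(samples, threshold):
        return "NON-ACTIVITY"
    return classify_region(features, speech_model, nonspeech_model)


# Toy training vectors (hypothetical values, not real acoustic features).
speech_model = DiagonalGaussian([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [0.1, 0.2]])
nonspeech_model = DiagonalGaussian([[5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.2]])

silent_region = [0.0] * 100        # zero energy
loud_region = [0.5, -0.5] * 50     # mean energy 0.25

print(label_region(silent_region, [[0.0, 0.0]], speech_model, nonspeech_model))
print(label_region(loud_region, [[0.05, 0.05]], speech_model, nonspeech_model))
print(label_region(loud_region, [[5.0, 5.0]], speech_model, nonspeech_model))
```

In practice a multi-component mixture fitted with expectation-maximisation would replace `DiagonalGaussian`; the single Gaussian is used here only to keep the decision logic visible.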