Tampere University of Technology

TUTCRIS Research Portal

Speech Detection on Broadcast Audio

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Details

Original language: English
Title of host publication: 18th European Signal Processing Conference (EUSIPCO-2010)
Editors: B. Kleijn, J. Larsen
Place of publication: Kessariani
Publisher: European Assoc. Signal Speech & Image Processing (EURASIP)
Pages: 85–89
Number of pages: 5
Publication status: Published, 2010
Externally published: Yes
Publication type: A4 Article in a conference publication
Event: 18th European Signal Processing Conference (EUSIPCO), Aalborg, Denmark
Duration: 23 Aug 2010 – 27 Aug 2010

Publication series

Name: European Signal Processing Conference
Publisher: European Assoc. Signal Speech & Image Processing (EURASIP)
Volume: 18
ISSN (Print): 2076-1465

Conference

Conference: 18th European Signal Processing Conference (EUSIPCO)
Country: Denmark
Period: 23/08/10 – 27/08/10

Abstract

Speech boundary detection helps improve the performance of speech-based applications such as speech recognition and speaker recognition. The speech boundary detector implemented in this study operates on broadcast audio as a pre-processing module for a keyword spotter. Detection is carried out in three steps. First, the audio data is segmented into homogeneous regions in an unsupervised manner. Second, an ACTIVITY/NON-ACTIVITY decision is made for each region. Third, ACTIVITY regions are classified as speech or non-speech using Gaussian Mixture Model (GMM) based classification. The GMMs are trained on a novel feature, Spectral Flow Direction (SFD), and an improved multi-band harmonicity feature, in addition to the widely used Mel Frequency Cepstral Coefficients (MFCCs).
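The second and third steps can be sketched as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation. It assumes the unsupervised segmentation has already produced regions. It also assumes each region comes with per-frame energies (in dB) and per-frame feature vectors, for example MFCC, SFD and harmonicity values. The GMMs use toy diagonal-covariance parameters in place of trained models. The abstract does not say how the ACTIVITY/NON-ACTIVITY decision is made, so a hypothetical mean-energy threshold stands in for it. The names `DiagGMM`, `is_activity`, `classify_region`, `label_regions` and `threshold_db` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DiagGMM:
    """GMM with diagonal covariances (toy, hypothetical parameters, not trained models)."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def frame_log_likelihood(self, x: Sequence[float]) -> float:
        # log p(x) = logsumexp_k [ log w_k + log N(x; mu_k, diag(var_k)) ]
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_n = 0.0
            for xi, mi, vi in zip(x, mu, var):
                log_n += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            terms.append(math.log(w) + log_n)
        m = max(terms)  # subtract the max for a numerically stable logsumexp
        return m + math.log(sum(math.exp(t - m) for t in terms))


def is_activity(frame_energies_db: Sequence[float], threshold_db: float = -40.0) -> bool:
    """Hypothetical ACTIVITY test: mean frame energy above a fixed threshold."""
    return sum(frame_energies_db) / len(frame_energies_db) > threshold_db


def classify_region(frames, speech_gmm: DiagGMM, nonspeech_gmm: DiagGMM) -> str:
    """Label a region by comparing summed frame log-likelihoods under the two GMMs."""
    ll_speech = sum(speech_gmm.frame_log_likelihood(f) for f in frames)
    ll_non = sum(nonspeech_gmm.frame_log_likelihood(f) for f in frames)
    return "speech" if ll_speech > ll_non else "non-speech"


def label_regions(regions: List[Tuple[List[float], List[List[float]]]],
                  speech_gmm: DiagGMM, nonspeech_gmm: DiagGMM,
                  threshold_db: float = -40.0) -> List[str]:
    """Apply the activity test, then GMM classification, to pre-segmented regions."""
    labels = []
    for energies, feats in regions:
        if not is_activity(energies, threshold_db):
            labels.append("non-activity")
        else:
            labels.append(classify_region(feats, speech_gmm, nonspeech_gmm))
    return labels


# Toy usage with 2-D feature vectors (illustrative values only).
speech = DiagGMM(weights=[0.5, 0.5], means=[[1.0, 1.0], [2.0, 0.5]],
                 variances=[[0.5, 0.5], [0.5, 0.5]])
nonspeech = DiagGMM(weights=[1.0], means=[[-1.0, -1.0]], variances=[[0.5, 0.5]])
regions = [
    ([-20.0, -25.0], [[1.1, 0.9], [1.8, 0.6]]),      # loud, speech-like features
    ([-60.0, -55.0], [[1.0, 1.0]]),                  # quiet: NON-ACTIVITY
    ([-22.0, -18.0], [[-1.1, -0.8], [-0.9, -1.2]]),  # loud, non-speech-like features
]
print(label_regions(regions, speech, nonspeech))
# -> ['speech', 'non-activity', 'non-speech']
```

Comparing summed frame log-likelihoods amounts to a maximum-likelihood decision with equal class priors. A bias term added to one side would shift the decision toward speech or non-speech.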

Keywords

  • Classification
  • Retrieval
  • Music