Tampere University of Technology

TUTCRIS Research Portal

Cycle-consistent Adversarial Networks for Non-parallel Vocal Effort Based Speaking Style Conversion

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Details

Original language: English
Title of host publication: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Pages: 6835-6839
Number of pages: 5
ISBN (Electronic): 978-1-4799-8131-1
ISBN (Print): 978-1-4799-8132-8
Publication status: Published - 17 Apr 2019
Publication type: A4 Article in a conference publication
Event: IEEE International Conference on Acoustics, Speech and Signal Processing

Publication series

Name: IEEE International Conference on Acoustics, Speech and Signal Processing
ISSN (Print): 1520-6149
ISSN (Electronic): 2379-190X

Conference

Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Abstract

Speaking style conversion (SSC) is the technology of converting natural speech signals from one speaking style to another. In this study, we propose the use of cycle-consistent adversarial networks (CycleGANs) for converting between styles that differ in vocal effort, and we focus on conversion between normal and Lombard speech as a case study of this problem. We propose a parametric approach that uses the Pulse Model in Log domain (PML) vocoder to extract speech features. The CycleGAN maps these features from utterances in the source style to the corresponding features of the target style. Finally, the PML vocoder synthesizes a Lombard speech waveform from the mapped features. In subjective listening tests, the CycleGAN was compared with two other standard mapping methods used in conversion, and it performed best both in speech quality and in the magnitude of the perceptual change between the two styles.
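The feature-mapping step relies on the CycleGAN training objective. As a rough illustration only, the sketch below computes the generator-side objective of a generic CycleGAN (least-squares adversarial terms plus an L1 cycle-consistency term weighted by lambda_cyc, as in the original CycleGAN formulation) on frame-level feature matrices. The function names, the feature dimension of 40, the weight of 10 and the toy linear mappings are hypothetical choices for illustration: the abstract does not specify the network architecture, the loss weights or the exact PML feature set used in this work, and the discriminator update is omitted.

```python
import numpy as np


def l1(a, b):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(a - b)))


def lsgan_generator_loss(d_scores):
    """Least-squares adversarial loss for a generator: pushes D(fake) toward 1."""
    return float(np.mean((d_scores - 1.0) ** 2))


def cyclegan_generator_objective(x, y, G, F, D_X, D_Y, lambda_cyc=10.0):
    """Generator-side CycleGAN objective for one batch (a generic sketch).

    x: source-style feature frames (e.g. normal speech), shape (T, dim)
    y: target-style feature frames (e.g. Lombard speech), shape (T, dim)
    G: mapping X -> Y, F: mapping Y -> X
    D_X, D_Y: discriminators returning one realness score per frame
    lambda_cyc: cycle-consistency weight (hypothetical default of 10)
    """
    fake_y = G(x)
    fake_x = F(y)
    # Adversarial terms: each generator tries to make its discriminator output "real" (1).
    adv = lsgan_generator_loss(D_Y(fake_y)) + lsgan_generator_loss(D_X(fake_x))
    # Cycle consistency: X -> Y -> X and Y -> X -> Y should reconstruct the inputs,
    # which is what lets x and y come from non-parallel (unaligned) utterances.
    cyc = l1(F(fake_y), x) + l1(G(fake_x), y)
    return adv + lambda_cyc * cyc, adv, cyc


# Toy demo with hypothetical stand-ins for the trained networks.
rng = np.random.default_rng(0)
dim = 40  # hypothetical number of features per frame
W = 2.0 * np.eye(dim)
W_inv = 0.5 * np.eye(dim)  # exact inverse of W


def G(x):
    return x @ W


def F(y):
    return y @ W_inv


def D(z):
    # A stub discriminator that scores every frame as "real".
    return np.ones(z.shape[0])


x = rng.standard_normal((100, dim))
y = rng.standard_normal((100, dim))
total, adv, cyc = cyclegan_generator_objective(x, y, G, F, D, D)
print(f"total={total:.4f} adv={adv:.4f} cyc={cyc:.4f}")
```

Because the toy mappings are exact inverses and the stub discriminator scores every frame as real, both terms are zero in this demo. In actual training, G, F, D_X and D_Y would be neural networks updated alternately, and the cycle term is what allows training on non-parallel normal and Lombard utterances.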

Keywords

  • feature extraction, speech processing, vocoders, cycle-consistent adversarial networks, non-parallel vocal effort, CycleGAN, PML, Lombard speech waveform, natural speech signal conversion, standard mapping methods, SSC technology, speaking style conversion technology, pulse model in log domain vocoder, speech feature extraction, Training, Standards, Speech, style conversion, vocal effort, Lombard speech
