Tampere University of Technology

TUTCRIS Research Portal

Subjective responses to synthesised speech with lexical emotional content: The effect of the naturalness of the synthetic voice

Research output: Contribution to journal, Review Article, Scientific, peer-review

Details

Original language: English
Pages (from-to): 117-131
Number of pages: 15
Journal: Behaviour and Information Technology
Volume: 32
Issue number: 2
Publication status: Published - 1 Feb 2013
Publication type: A2 Review article in a scientific journal

Abstract

This study aimed to investigate how the degree of naturalness and lexical emotional content of synthesised speech affect the subjective ratings of emotional experiences and how the naturalness of the voice affects the ratings of voice quality. Twenty-four participants listened to a set of affective words produced by three different speech synthesis techniques: formant synthesis, diphone synthesis and unit selection synthesis. The participants' task was to rate their experiences evoked by the speech samples using three emotion-related bipolar scales for valence, arousal and approachability. The pleasantness, naturalness and clarity of the voices were also rated. The results showed that the affective words produced by the synthesisers evoked congruent emotion-related ratings in the participants. The ratings of the experienced valence and approachability were statistically significantly stronger when the affective words were produced by the more humanlike voices than when they were produced by the more machinelike voice. The more humanlike voices were also rated as statistically significantly more natural, pleasant and clear than the less humanlike voice. Thus, our findings suggest that even machinelike voices can be used to communicate affective messages, but that increasing the level of naturalness enhances positive feelings about synthetic voices and strengthens emotional communication between computers and humans.