TUTCRIS - Tampere University of Technology


Subjective responses to synthesised speech with lexical emotional content: The effect of the naturalness of the synthetic voice



Publication: Behaviour and Information Technology
Status: Published - 1 February 2013
Ministry of Education and Culture (OKM) publication type: A2 Review article


This study investigated how the degree of naturalness and the lexical emotional content of synthesised speech affect subjective ratings of emotional experiences, and how the naturalness of the voice affects ratings of voice quality. Twenty-four participants listened to a set of affective words produced by three different speech synthesis techniques: formant synthesis, diphone synthesis and unit selection synthesis. The participants' task was to rate the experiences evoked by the speech samples on three emotion-related bipolar scales: valence, arousal and approachability. They also rated the pleasantness, naturalness and clarity of the voices. The results showed that the affective words produced by the synthesisers evoked congruent emotion-related ratings in the participants. Ratings of experienced valence and approachability were statistically significantly stronger when the affective words were produced by the more humanlike voices than by the more machinelike voice. The more humanlike voices were also rated as statistically significantly more natural, pleasant and clear than the less humanlike voice. Thus, our findings suggest that even machinelike voices can be used to communicate affective messages, but that increasing the level of naturalness enhances positive feelings about synthetic voices and strengthens emotional communication between computers and humans.