Tampere University of Technology

TUTCRIS Research Portal

Multilingual text-to-speech system for mobile devices: development and applications

Research output: Doctoral Thesis (Collection of articles)


Original language: English
Place of Publication: Tampere
Publisher: Tampere University of Technology
Number of pages: 71
ISBN (Electronic): 978-952-15-1751-8
ISBN (Print): 978-952-1763-1
State: Published - 27 Apr 2007
Publication type: G5 Doctoral dissertation (article)

Publication series

Name: Tampere University of Technology. Publication
Publisher: Tampere University of Technology
ISSN (Print): 1459-2045


A multilingual text-to-speech system differs from a collection of language-specific synthesizers in that it applies the same procedures and techniques to all the languages it supports. Ideally, all language-specific information should be stored in data tables and structures, and all algorithms should be shared by all languages. In practice this is hard to achieve: languages differ considerably in their requirements for synthesis techniques and text analysis, and extending a method that suits one language to new languages is not straightforward. Multilinguality therefore presents a number of technical challenges when designing a new text-to-speech system.
One of the main problems in current text-to-speech systems is the time-consuming internationalization of the synthesis technology. Development requires knowledge of human speech production and of the languages being developed. Developing, implementing and integrating a fully functional system requires multidisciplinary skills, such as signal processing, language processing and phonetics, as well as software programming. It is therefore important to separate language creation from the development of the speech synthesis engine itself.
This thesis presents methods and techniques to improve the language development process in a speech synthesis system. The main idea is to separate the language-independent synthesis engine from the language-specific data, and to provide a framework and tools, including an integrated development environment, that ease the language creation process. In multilingual text-to-speech, common algorithms and techniques should be applicable to multiple languages. A multilingual rule-based number expansion framework is proposed in the thesis. The framework is also extended to cover additional text normalization tasks.
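To make the separation of shared algorithm and language-specific data concrete, the following is a minimal sketch of table-driven number expansion. It is not the framework proposed in the thesis: the names `NumberData`, `make_words` and `expand_number`, the data layout, and the restriction to 0..99 are all assumptions made for illustration. The point is only that the same function serves both languages, and everything that differs between them lives in data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NumberData:
    """Language-specific data only; no logic lives here (hypothetical layout)."""
    words: dict[int, str]   # numbers that are single lexical items
    compound: str           # template that combines a tens word and a unit word
    unit_in_compound: dict[int, str] = field(default_factory=dict)  # overrides


def make_words(small: list[str], tens: list[str]) -> dict[int, str]:
    """Build the lookup table from words for 0..19 and for 20, 30, ..., 90."""
    table = dict(enumerate(small))
    table.update({10 * (i + 2): word for i, word in enumerate(tens)})
    return table


ENGLISH = NumberData(
    words=make_words(
        "zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split(),
        "twenty thirty forty fifty sixty seventy eighty ninety".split(),
    ),
    compound="{tens}-{unit}",
)

GERMAN = NumberData(
    words=make_words(
        "null eins zwei drei vier fünf sechs sieben acht neun zehn elf zwölf "
        "dreizehn vierzehn fünfzehn sechzehn siebzehn achtzehn neunzehn".split(),
        "zwanzig dreißig vierzig fünfzig sechzig siebzig achtzig neunzig".split(),
    ),
    compound="{unit}und{tens}",       # German puts the unit before the tens
    unit_in_compound={1: "ein"},      # "eins" becomes "ein" inside a compound
)


def expand_number(n: int, data: NumberData) -> str:
    """Language-independent expansion; all language knowledge comes from data."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0..99")
    if n in data.words:
        return data.words[n]
    tens, unit = divmod(n, 10)
    unit_word = data.unit_in_compound.get(unit, data.words[unit])
    return data.compound.format(tens=data.words[tens * 10], unit=unit_word)
```

With these tables, `expand_number(21, ENGLISH)` yields "twenty-one" and `expand_number(21, GERMAN)` yields "einundzwanzig". Adding a language in this sketch means adding a new `NumberData` value, not new code.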
The thesis also presents a text-to-speech framework that has been successfully localized for over 40 languages. The system consists of a language-independent synthesizer, a rule interpreter, a data-configurable prosody model, and language-specific data that controls the speech synthesis. The system is especially suitable for devices with limited memory resources, such as mobile phones.
The size of the synthesizer increases every time a new language is added. Moreover, the most memory-intensive parts of a text-to-speech system are those that contain the language-specific information, for example lexicons and, in concatenative speech synthesis, speech databases. For devices with limited memory, support for multiple languages is therefore a major design and implementation challenge. The thesis presents a novel technique that reduces memory consumption by using an existing synthesis language to approximate a new language on a phonetic level. The technique can also be useful when the language portfolio has to be expanded rapidly.
The last part of the thesis discusses the application of a text-to-speech system as part of a voice user interface. The role of automatic speech recognition in some applications is also briefly covered. A preliminary usability study and evaluation of using a concatenative text-to-speech system to read text messages is presented. The synthesis quality of the system is found to be suitable for reading text messages. Furthermore, text-to-speech can be especially useful in situations where eyes-free operation of the device is needed.
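The memory-saving idea mentioned in the abstract, approximating a new language with the phoneme inventory of an existing synthesis language, can be illustrated with a small sketch. The abstract does not describe how the thesis performs the mapping, so the nearest-phoneme rule below is a swapped-in assumption, and the feature table, `EXISTING_INVENTORY`, `nearest` and `approximate` are hypothetical names chosen for the example.

```python
# Hypothetical articulatory features for a handful of phonemes.
FEATURES = {
    "θ": frozenset({"consonant", "fricative", "dental", "voiceless"}),
    "s": frozenset({"consonant", "fricative", "alveolar", "voiceless"}),
    "t": frozenset({"consonant", "plosive", "alveolar", "voiceless"}),
    "n": frozenset({"consonant", "nasal", "alveolar", "voiced"}),
    "ɪ": frozenset({"vowel", "front", "close", "lax"}),
    "i": frozenset({"vowel", "front", "close", "tense"}),
    "a": frozenset({"vowel", "front", "open", "tense"}),
}

# Phonemes for which the existing voice already has speech data (assumed).
EXISTING_INVENTORY = {"s", "t", "n", "i", "a"}


def nearest(phoneme: str, inventory: set[str]) -> str:
    """Return the inventory phoneme that differs in the fewest features."""
    if phoneme in inventory:
        return phoneme
    # Sorting makes the choice deterministic if two candidates tie.
    return min(
        sorted(inventory),
        key=lambda candidate: len(FEATURES[phoneme] ^ FEATURES[candidate]),
    )


def approximate(phonemes: list[str], inventory: set[str]) -> list[str]:
    """Rewrite a phoneme sequence so it uses only the existing inventory."""
    return [nearest(p, inventory) for p in phonemes]
```

For instance, `approximate(["θ", "ɪ", "n"], EXISTING_INVENTORY)` returns `["s", "i", "n"]`: the new language reuses the existing speech database and needs only a mapping table, at the cost of a less accurate pronunciation.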
