Text-to-speech (TTS) synthesis is the oldest speech technology, dating back to the 18th century, when the first "speaking machines" appeared. Since then, the field has developed tremendously, mostly owing to advances in computer technology over recent decades. Developing this technology is a multidisciplinary problem whose solution requires knowledge from a range of fields, such as acoustics, phonetics and linguistics, as well as mathematics, telecommunications and signal processing.

Of all speech technologies, this is the one most dependent on language, and solutions developed for one language generally cannot be reused for others. Adaptation is possible (but still painstaking) only in the case of extremely similar languages.

The aim of speech synthesis is to generate intelligible speech from textual input. Intelligibility implies a certain level of naturalness, achieved by manipulating lexical and sentence intonation as well as phonetic content, in much the same way humans do. Naturalness of synthesized speech is not only a matter of aesthetics; it also makes synthesized speech easier to understand by helping the listener to separate it into words.

Applications of speech synthesisers are numerous. In computer telephony, such systems are an indispensable tool for providing information to a large number of callers, especially when large quantities of information change frequently (e.g. newly received e-mails), which makes it impractical or impossible to engage a voice talent to read it.

Speech synthesisers are extremely helpful to people with disabilities. This particularly applies to the speech impaired, who can use them to engage in conversations, even by telephone, and to the visually impaired, who gain the ability to use PCs independently and thus engage in everyday life more easily.

Read more about the AlfaNum software component for speech synthesis.

 
