
Voice-building Process

The voice development process for Cepstral's Theta synthesizer is based on Festival, which is well documented through the Festvox project [1]. A set of sentences that covers the phonetic, prosodic, and lexical space of the language is selected from a corpus of naturally occurring text or transcribed speech. For a task such as this one, in which something is known of the target domain, the sentences are selected from domain-relevant text, but general-domain language such as greetings must also be included.
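
The selection algorithm is not specified here. As an illustration only, the sketch below shows one common way to choose a covering set: greedily pick the sentence that adds the most not-yet-covered units. It assumes diphones (adjacent phone pairs) as the coverage unit and addresses phonetic coverage only, not prosodic or lexical coverage. The names `diphones`, `select_sentences`, and `toy_corpus`, as well as the toy phone strings, are hypothetical.

```python
def diphones(phones: list[str]) -> set[tuple[str, str]]:
    """Return the set of adjacent phone pairs in one sentence."""
    return set(zip(phones, phones[1:]))


def select_sentences(corpus: dict[str, list[str]], max_sentences: int) -> list[str]:
    """Greedily pick sentences that add the most not-yet-covered diphones.

    `corpus` maps a sentence id to its phone sequence (assumed to come
    from a lexicon or letter-to-sound rules, which are not shown here).
    """
    units = {sid: diphones(phones) for sid, phones in corpus.items()}
    covered: set[tuple[str, str]] = set()
    chosen: list[str] = []
    while len(chosen) < max_sentences:
        # Candidates are visited in sorted order so ties resolve deterministically.
        candidates = [sid for sid in sorted(units) if sid not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda sid: len(units[sid] - covered))
        if not units[best] - covered:
            break  # no remaining sentence adds new coverage
        chosen.append(best)
        covered |= units[best]
    return chosen


# Toy data: hypothetical phone strings, not taken from the actual prompt set.
toy_corpus = {
    "s1": ["m", "a", "r", "h", "a", "b", "a"],
    "s2": ["m", "a", "r"],
    "s3": ["s", "a", "l", "a", "m"],
}
selected = select_sentences(toy_corpus, max_sentences=3)
```

On this toy data the loop stops after two sentences, because `s2` contributes no diphone that `s1` has not already covered. In practice, required general-domain sentences such as greetings could simply be added to the chosen list before the greedy loop runs.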

Initial phoneme labels are generated automatically by building database-specific acoustic models with the CMU SphinxTrain package and then running forced alignment with the CMU Sphinx recognition system. The labels are then corrected by hand. For this voice, the manual correction was not done by native speakers of Arabic. The labeling team, while highly experienced in phonetic annotation, had no knowledge of Arabic beyond a basic introduction to the writing system and phoneme inventory. They could consult native speakers, but in most cases had very little difficulty placing boundaries and identifying speech errors and autolabeling errors. Most of the problems referred to native speakers involved the labeling of the pharyngeal fricative `ayn and the incorrect transcription of doubled consonants and vowel length.

Once the labels have been hand-corrected, the voice can be built and evaluated.


Alan W Black 2003-10-27