A COMPREHENSIVE REVIEW ON CONCATENATION BASED TEXT TO SPEECH SYNTHESIS FOR INDIAN LANGUAGE

Arun Kumar C and Shreekanth T
Department of E&C, SJCE, Mysore, Karnataka, India.

Abstract—The goal of a Text-To-Speech (TTS) synthesis system is to convert arbitrary input text into intelligible, natural-sounding speech so as to transmit information from a machine to a person. In the present world of human-computer interaction, the visually impaired community in India and other developing countries is deprived of technologies that could help its members communicate with the sighted world. With this in view, many TTS systems have been developed. This review traces earlier work on the development of TTS systems based on concatenative speech synthesis. Concatenative speech synthesis systems form utterances by concatenating pre-recorded speech samples of different unit lengths. Speech synthesized by approximate matching of syllables and direct waveform concatenation is of better quality and more natural than speech produced with the Time Domain Pitch Synchronous Overlap and Add (TD-PSOLA) and Harmonic plus Noise Model (HNM) techniques.

Index Terms—Speech synthesis, Concatenative synthesis, Dynamic Time Warping (DTW), Frequency Domain (FD), Harmonic plus Noise Model (HNM), Mean Opinion Score (MOS), Pitch Synchronous Overlap and Add (PSOLA), Time Domain (TD)

Cite: Arun Kumar C and Shreekanth T, "A COMPREHENSIVE REVIEW ON CONCATENATION BASED TEXT TO SPEECH SYNTHESIS FOR INDIAN LANGUAGE," International Journal of Electrical and Electronic Engineering & Telecommunications, Vol. 3, No. 2, pp. 17-25, April 2014.