VOIP.com - Internet Phone Service


Voip Blog

How Will Text To Speech Voice Sound Over VoIP?

Friday, April 04, 2008

You may have seen this video floating around the Intertubes:

While it might sound real, I can tell it's not. It's a text-to-speech system by the folks at IVONA. It is certainly better than a lot of the text-to-speech stuff I've heard over the years, but I wouldn't have a hard time telling it apart from a human.

One obvious application for this text-to-speech technology is interactive voice response systems. You know, those systems that you run into when you call a corporation's 800 number. Even voip.com has one of these systems answering its 800 number.

Right now, a person has to record the prompts. A company may have an in-house person speak the prompts, or it may hire a professional such as Allison Smith to do it. Either way, it costs time, money, or both.

Imagine an IP-based PBX where you could simply type in what you want the PBX to say. It would speak it for you and it would sound close enough to real for most people. No more paying for a professional or spending hours getting your voice prompts right. Just type it in, and you're done.

With this hypothetical IP PBX, calls would come in by a number of methods, including IP. Depending on what codecs are used for that particular call, the voice may sound perfectly natural, or it may sound like crap.

The thing is, to transmit your voice over IP, it must first be sampled and then encoded by a codec. The codecs often leave something to be desired, making the voice sound metallic or worse. If the voice isn't perfect to begin with, putting it through a VoIP call is going to amplify any imperfections.
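To make the sampling-and-codec step a bit more concrete, here is a minimal Python sketch. It rests on several assumptions of mine: a pure 440 Hz tone stands in for a voice, the sample rate is the 8 kHz used in narrowband telephony, and the codec is the textbook continuous mu-law formula rather than the exact segmented tables that G.711 uses. The function names are made up for illustration.

```python
import math

MU = 255            # companding constant of mu-law (the North American G.711 flavor)
SAMPLE_RATE = 8000  # samples per second, typical for narrowband telephony


def mulaw_encode(x: float) -> int:
    """Compress one sample in [-1, 1] into an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round((y + 1) / 2 * 255)


def mulaw_decode(code: int) -> float:
    """Expand an 8-bit code back into an approximate sample in [-1, 1]."""
    y = code / 255 * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def snr_db(original, processed):
    """Signal-to-noise ratio in dB; higher means closer to the original."""
    signal = sum(s * s for s in original)
    noise = sum((s - p) ** 2 for s, p in zip(original, processed))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


# "Sample" a 440 Hz tone: 800 samples is a tenth of a second at 8 kHz.
original = [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(800)]

codes = [mulaw_encode(s) for s in original]   # what would go on the wire
decoded = [mulaw_decode(c) for c in codes]    # what the listener hears

print(f"SNR after one mu-law pass: {snr_db(original, decoded):.1f} dB")
```

The printed figure is finite rather than infinite, which is the point: even one pass through a fairly gentle codec leaves a measurable error. A synthetic voice has its own artifacts before it ever reaches this step, so the codec error lands on top of them.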

To give a real-life example of this, when I call home on my cell phone and talk to my young daughter, I have a difficult time understanding her on the phone, even though I can understand her fine in person. While a mobile phone doesn't necessarily imply VoIP, the same principle of codecs applies since an analog voice must be converted into something digital so it can be transmitted over the mobile networks. The process of encoding and decoding the voice--sometimes multiple times--degrades the voice quality.
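The "sometimes multiple times" part can be sketched the same way. The snippet below is an illustration under stated assumptions, not a model of any real network: it pushes a test tone through three hops, where the 8-bit hops are mu-law style quantizers and the 6-bit hop is a hypothetical stand-in for a coarser low-bitrate codec somewhere in the middle of the call path. The `codec_pass` helper and the hop list are invented for this example.

```python
import math

MU = 255
SAMPLE_RATE = 8000


def codec_pass(x: float, bits: int) -> float:
    """One encode/decode round trip through a mu-law style quantizer."""
    levels = (1 << bits) - 1
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    code = round((y + 1) / 2 * levels)   # encode: pick the nearest code
    y_hat = code / levels * 2 - 1        # decode: back to the companded domain
    return math.copysign(math.expm1(abs(y_hat) * math.log1p(MU)) / MU, y_hat)


def snr_db(original, processed):
    """Signal-to-noise ratio in dB, always measured against the original."""
    signal = sum(s * s for s in original)
    noise = sum((s - p) ** 2 for s, p in zip(original, processed))
    return float("inf") if noise == 0 else 10 * math.log10(signal / noise)


original = [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(800)]

hops = [8, 6, 8]  # bits per sample at each hop (hypothetical call path)
signal = original
snr_per_hop = []
for bits in hops:
    signal = [codec_pass(s, bits) for s in signal]
    snr_per_hop.append(snr_db(original, signal))

for number, (bits, snr) in enumerate(zip(hops, snr_per_hop), start=1):
    print(f"after hop {number} ({bits}-bit): {snr:.1f} dB")
```

Once the coarse hop has done its damage, going back to the better codec on the last hop does not recover what was lost. That matches the cell phone experience above: the worst codec in the chain sets the ceiling on quality.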

How is this stuff going to sound over VoIP? Hard to tell. If anyone can do any real-world testing of this, leave some feedback.