Building Text-to-Speech Applications Using the Asterisk Open Source VoIP Platform

Introduction

Asterisk is a software implementation of a telephone private branch exchange (PBX); it allows attached telephones to make calls to one another, and to connect to other telephone services, such as the public switched telephone network (PSTN) and Voice over Internet Protocol (VoIP) services.

Features

The Asterisk software includes many features available in proprietary PBX systems: voice mail, conference calling, interactive voice response (phone menus), and automatic call distribution. Users can create new functionality by writing dialplan scripts in several of Asterisk’s own extension languages, by adding custom loadable modules written in C, or by implementing Asterisk Gateway Interface (AGI) programs in any programming language that can communicate through the standard streams (stdin and stdout) or over network TCP sockets.
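As a rough illustration of the stdin/stdout variant of AGI, the Python sketch below reads the `agi_*` variables that Asterisk writes when it starts the program (a block of `name: value` lines ended by a blank line), sends one command on stdout, and parses the `200 result=...` reply line. It is a minimal sketch, not a complete AGI library: the helper names `read_agi_env`, `parse_response`, and `send_command` are hypothetical, and only the `VERBOSE` command is exercised.

```python
#!/usr/bin/env python3
"""Minimal AGI sketch: read the AGI environment, send one command, parse the reply."""
import re
import sys


def read_agi_env(stream):
    """Read 'agi_key: value' lines until the blank line that ends the header."""
    env = {}
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            break  # a blank line marks the end of the AGI environment block
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def parse_response(line):
    """Return (status_code, result) from a reply such as '200 result=1'."""
    match = re.match(r"^(\d{3})\s+result=(-?\d+)", line)
    if not match:
        raise ValueError(f"unexpected AGI reply: {line!r}")
    return int(match.group(1)), int(match.group(2))


def send_command(command, stdin=sys.stdin, stdout=sys.stdout):
    """Write one AGI command on stdout and read Asterisk's reply from stdin."""
    stdout.write(command + "\n")
    stdout.flush()  # Asterisk will not see the command until it is flushed
    return parse_response(stdin.readline())


if __name__ == "__main__":
    env = read_agi_env(sys.stdin)
    extension = env.get("agi_extension", "unknown")
    send_command(f'VERBOSE "AGI script called from extension {extension}" 2')
```

To try it, you would make the script executable, place it in Asterisk's AGI directory (commonly /var/lib/asterisk/agi-bin/), and call it from the dialplan with something like `same => n,AGI(hello.py)`; the file name is illustrative.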

Asterisk supports several standard voice over IP protocols, including the Session Initiation Protocol (SIP), the Media Gateway Control Protocol (MGCP), and H.323. Asterisk supports most SIP telephones, acting both as registrar and back-to-back user agent, and can serve as a gateway between IP phones and the public switched telephone network (PSTN) via T- or E-carrier interfaces or analog FXO cards. The Inter-Asterisk eXchange (IAX) protocol, RFC 5456, native to Asterisk, provides efficient trunking of calls among Asterisk PBXes, in addition to distributing some configuration logic. Many VoIP service providers support it for call completion into the PSTN, often because it is easy to interface with.

Text-to-Speech Utilities

Text-to-speech utilities convert strings of words into audio that can be played to your callers. Text-to-speech has been around for many years and has been continually improving. While we can’t recommend text-to-speech utilities as a replacement for professionally recorded prompts, they are useful in applications that need to communicate dynamic data to a caller.

Festival

Festival is one of the longest-running text-to-speech applications on Linux. While its quality is not sufficient for us to recommend it for production use, it is certainly a useful way to test a text-to-speech-based application. If your application requires a more polished sound, we recommend you look at Cepstral (covered next).

Installing Festival on CentOS

Installing Festival and its dependencies on CentOS is straightforward. Simply use yum to install the festival package:

$ sudo yum install festival

Installing Festival on Ubuntu

To install Festival and its dependencies on Ubuntu, simply use apt-get to install the festival package:

$ sudo apt-get install festival

Using Festival with Asterisk

With Festival installed, we need to modify the festival.scm file so that Asterisk can connect to the Festival server. On both CentOS and Ubuntu, the file is located in /usr/share/festival/. Open the file and place the following text just above its last line, which reads (provide 'festival):

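(define (tts_textasterisk string mode)
  "(tts_textasterisk STRING MODE)
Apply tts to STRING. This function is specifically designed for
use in server mode so a single function call may synthesize the string.
This function name may be added to the server safe functions."
  ; Synthesize the whole string as a single utterance
  (let ((wholeutt (utt.synth (eval (list 'Utterance 'Text string)))))
    ; Resample to 8000 Hz, the sample rate used on telephone channels
    (utt.wave.resample wholeutt 8000)
    ; Boost the volume by a factor of 5
    (utt.wave.rescale wholeutt 5)
    ; Send the resulting waveform back to the connected client (Asterisk)
    (utt.send.wave.client wholeutt)))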

After adding that, you need to start the Festival server:

$ sudo festival_server > /dev/null 2>&1 &
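Because festival_server runs in the background with its output discarded, it can be handy to confirm that it is actually listening before testing from Asterisk. The sketch below simply attempts a TCP connection, assuming Festival's default server port of 1314 on the local machine; adjust `host` and `port` if you run the server elsewhere. The function name `festival_server_listening` is hypothetical, and a successful connection only shows that something is listening on that port, not that it is Festival.

```python
import socket


def festival_server_listening(host="localhost", port=1314, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable host, or timeout
        return False


if __name__ == "__main__":
    up = festival_server_listening()
    print("Festival server is", "listening" if up else "NOT listening", "on port 1314")
```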

Using menuselect from your Asterisk source directory, verify that the app_festival application has been selected under the Applications heading. If it was not already selected, be sure to run make install after selecting it to install the Festival() dialplan application.

Before you can use the Festival() application, you need to tell Asterisk how to connect to the Festival server. The festival.conf file is used for controlling how Asterisk connects to and interacts with the Festival server. The sample festival.conf file located in the Asterisk source directory is a good place to start, so copy festival.conf.sample from the configs/ subdirectory of your Asterisk source to the /etc/asterisk/ configuration directory now:

$ cp ~/asterisk-complete/asterisk/1.8/configs/festival.conf.sample \
     /etc/asterisk/festival.conf

The default configuration is typically enough to connect to the Festival server running on the local machine, but you can optionally configure parameters such as the host where the Festival server is running (if remote), the port to connect to, whether to enable caching of files (defaults to no), the location of the cache directory (defaults to /tmp), and the command Asterisk passes to the Festival server.
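To see roughly what Asterisk sends over that connection, the sketch below builds the command string and sends it to the Festival server, reading the raw reply until the connection closes. It assumes the default command documented in festival.conf.sample, `(tts_textasterisk "%s" 'file)(quit)\n`, where `%s` is replaced by the text to speak; check the sample file shipped with your Asterisk version. Escaping backslashes and double quotes in the text is a precaution added in this sketch, not necessarily something Asterisk itself does, and the raw reply (which wraps the audio in Festival's own framing) is not decoded here. The names `build_festival_command` and `fetch_raw_reply` are hypothetical.

```python
import socket

# Assumed default from festival.conf.sample; %s is replaced by the text to speak.
DEFAULT_FESTIVAL_COMMAND = "(tts_textasterisk \"%s\" 'file)(quit)\n"


def build_festival_command(text, template=DEFAULT_FESTIVAL_COMMAND):
    """Substitute text into the command, escaping it as a Scheme string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return template.replace("%s", escaped, 1)


def fetch_raw_reply(text, host="localhost", port=1314, timeout=10.0):
    """Send one command and return every byte the server sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_festival_command(text).encode("utf-8"))
        # The trailing (quit) should make the server close the connection;
        # if it does not, recv() raises socket.timeout after `timeout` seconds.
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    reply = fetch_raw_reply("Hello World")
    print(f"received {len(reply)} bytes; first marker: {reply[:3]!r}")
```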

You can verify that the Festival() dialplan application is accessible by running core show application festival from the Asterisk console:

*CLI> core show application festival

If you don’t get output, you may need to load the app_festival.so module:

*CLI> module load app_festival.so

If you’re still having trouble loading the module, verify that app_festival.so exists in /usr/lib/asterisk/modules/.

After loading the Festival() application into Asterisk, you need to create a test dialplan extension to verify that Festival() is working:

[LocalSets]
exten => 203,1,Verbose(2,This is a Festival test)
   same => n,Answer()
   same => n,Playback(silence/1)
   same => n,Festival(Hello World)
   same => n,Hangup()

Reload the dialplan with the dialplan reload command from the Asterisk console, and test out the connection to Festival by dialing extension 203.

Alternatively, if you’re having issues with the Festival server, you could use the following method to generate files with the text2wave application supplied with the festival package:

exten => 202,1,Verbose(2,Trying out Festival)
   same => n,Answer()
; *** This line should not have any line breaks
   same => n,System(echo "This is a test of Festival" | /usr/bin/text2wave -scale 1.5 -F 8000 -o /tmp/festival.wav)
   same => n,Playback(/tmp/festival)
   same => n,System(rm -f /tmp/festival.wav)
   same => n,Hangup()
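The System() approach above hands the text to a shell and always writes to the same /tmp/festival.wav, so two simultaneous calls could overwrite each other's audio. A sketch of the same text2wave invocation from Python (for example, inside an AGI script) could pass the text on stdin without any shell and write to a uniquely named temporary file instead. It reuses only the text2wave options shown in the dialplan above (-scale, -F, -o) and assumes, as the echo pipeline does, that text2wave reads text from stdin when no input file is given. The helper names `text2wave_argv` and `synthesize_to_temp` are hypothetical.

```python
import os
import subprocess
import tempfile


def text2wave_argv(out_path, scale=1.5, rate=8000, binary="/usr/bin/text2wave"):
    """Build the text2wave argument list used in the dialplan example."""
    return [binary, "-scale", str(scale), "-F", str(rate), "-o", out_path]


def synthesize_to_temp(text, directory="/tmp"):
    """Render text to a uniquely named WAV file and return its full path."""
    # mkstemp creates the file atomically, so concurrent calls never share a name.
    fd, path = tempfile.mkstemp(prefix="festival-", suffix=".wav", dir=directory)
    os.close(fd)
    try:
        # The text goes in on stdin; no shell is involved, so quotes in it are harmless.
        subprocess.run(text2wave_argv(path), input=text.encode("utf-8"), check=True)
    except Exception:
        os.remove(path)  # do not leave an empty file behind on failure
        raise
    return path


if __name__ == "__main__":
    wav = synthesize_to_temp("This is a test of Festival")
    print("Name to pass to Playback():", os.path.splitext(wav)[0])
```

Asterisk's Playback() expects the file name without its extension, so you would pass `os.path.splitext(path)[0]` to it and delete the file afterward, as the dialplan's rm step does.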

You should now have enough to get started with generating text-to-speech audio for your Asterisk system. The audio quality is not brilliant, and the generated speech is not always easy to understand over a telephone, but for development and testing Festival can fill the gap until you’re ready for a more professional-sounding text-to-speech generator such as Cepstral.

Cepstral

Cepstral is a text-to-speech engine that works in a similar manner to the Festival() application in the dialplan, but produces much higher-quality sound. Not only is the quality significantly better, but Cepstral has also developed a voice that emulates Allison’s, the voice of the English sound files that ship with Asterisk by default, so your text-to-speech output can sound the same as your other prompts and give callers a consistent experience.

Cepstral is a commercial module, but for around $30 you can have a text-to-speech engine that is clearer, is more consistent with the other sound prompts on your system, and provides a more pleasant experience for your callers. The Cepstral software and installation instructions can be downloaded from the Digium.com webstore.

With the guidelines above, you should be able to use Festival or Cepstral in your Asterisk setup.