IoT Text to Speech (TTS)

Make your Pi say things it doesn’t mean

Overview

Festival
Flite
eSpeak
Amazon Polly
Others
More on IoT

This page describes ways to get your Pi to turn text into speech you can hear on a speaker.

Cloud-based services leverage powerful servers to provide the most precise speech synthesis.

But speech synthesis programs can also run locally, on board the computer itself.

Programs that run on-board the Pi to output voice include Festival and its derivative Flite.

Festival

Festival is written by The Centre for Speech Technology Research at the University of Edinburgh (UK). It offers a framework for building speech synthesis systems. It offers full text to speech through a number of APIs: from shell level, via a command interpreter, as a C++ library, from Java, and through an Emacs editor interface.

Festival is multi-lingual (currently British English, American English, and Spanish). Other groups are working to release new languages for the system. Festival is in the package manager for the Raspberry Pi, making it very easy to install.

Install

sudo apt-get install festival -y

-y skips the confirmation prompt about 19.2 MB of disk space usage.
Identify voices from

https://packages.debian.org/jessie/festival-voice

NOTE: A 16 kHz sample rate is clearer than 8 kHz, but requires more disk space and takes up more CPU.
Install a voice file specifically for processing by Festival on Debian:

sudo apt-get install festvox-rablpc16k -y

This “British English male speaker” voice takes 9 MB.

PROTIP: Only one voice is needed at a time. Save disk space by keeping only the voice you use, then download another when you need it.
Send from command line:

echo "Hello Wilson!" | festival --tts

NOTE: There may be some “electrical” sound behind a robot talking quickly.
Use a Chrome browser to see the documentation and on-line demo:

http://www.cstr.ed.ac.uk/projects/festival

The Firefox browser needs a plug-in to be installed.

Python code to invoke TTS from text in a variable and in a file:

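import subprocess

# Speak text held in a variable by piping it to Festival's standard input.
# Passing an argument list (no shell) avoids quoting problems in the text.
text = "Hello world"
subprocess.run(["festival", "--tts"], input=text, text=True, check=True)

# Speak text stored in a file by passing the file name to Festival.
text = "You are listening to text to speech synthesis using the Festival package from the University of Edinburgh in the UK."
filename = "hello"
with open(filename, "w") as f:
    f.write(text)
subprocess.run(["festival", "--tts", filename], check=True)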

Flite

Flite (from Carnegie Mellon University) is a lighter version of Festival built specifically for embedded systems. It runs faster than Festival because it doesn’t have Festival’s complex scripting language or phoneme handling.
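Flite can be called from Python in much the same way as Festival. The following is a minimal sketch, assuming the flite package is installed (for example with sudo apt-get install flite) and that its -t (text to speak) and -o (output WAV file) options are available on your build. The helper names flite_command and speak are made up for illustration.

```python
import shutil
import subprocess


def flite_command(text, wav_path=None):
    """Build the argument list for a flite call (hypothetical helper)."""
    command = ["flite", "-t", text]
    if wav_path is not None:
        # -o writes a WAV file instead of playing through the speaker.
        command += ["-o", wav_path]
    return command


def speak(text, wav_path=None):
    """Run flite if it is installed; return True when it exits cleanly."""
    if shutil.which("flite") is None:
        print("flite is not installed")
        return False
    return subprocess.run(flite_command(text, wav_path)).returncode == 0


if __name__ == "__main__":
    speak("Hello from Flite")
```

Passing the text as a list element rather than building a shell string means quotes or other shell characters in the text need no escaping.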

eSpeak

http://espeak.sourceforge.net/
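eSpeak can be driven from Python too. This is a rough sketch, assuming the espeak package is installed and that its -v (voice), -s (speed in words per minute), and -w (write a WAV file) options behave as described in its help text. The helper name espeak_command and the default values are assumptions chosen for illustration.

```python
import shutil
import subprocess


def espeak_command(text, voice="en-us", words_per_minute=140, wav_path=None):
    """Build the argument list for an espeak call (hypothetical helper)."""
    command = ["espeak", "-v", voice, "-s", str(words_per_minute)]
    if wav_path is not None:
        # -w writes a WAV file instead of playing through the speaker.
        command += ["-w", wav_path]
    command.append(text)
    return command


if __name__ == "__main__":
    if shutil.which("espeak") is None:
        print("espeak is not installed")
    else:
        subprocess.run(espeak_command("Hello from eSpeak"))
```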

Amazon Polly

https://aws.amazon.com/polly is a service that uses advanced deep learning technologies to synthesize speech across 24 languages. It emits sounds using 47 lifelike human voices.

Type text, select a language and voice, then click to hear the speech at
https://console.aws.amazon.com/polly/home/SynthesizeSpeech

The default English, US has more voices than other languages:

Ivy sounds like a young girl
Justin sounds like a young boy
Joey

I like hearing British Amy, who has a more breathy voice than British Emma.

Select English | Indian, and Raveena speaks with an Indian accent.

Edit the SSML to vary the sound or upload the whole lexicon, which can be up to 4,000 characters and 1,000 rules.
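To give a feel for what edited SSML looks like, here is a small sketch that wraps plain text in SSML markup. It assumes the standard speak root element and the prosody element with a rate attribute, and it escapes characters that are special in XML. The function name to_ssml is hypothetical.

```python
from xml.sax.saxutils import escape


def to_ssml(text, rate="slow"):
    """Wrap plain text in SSML that changes the speaking rate."""
    # escape() replaces &, < and > so the text cannot break the markup.
    return '<speak><prosody rate="{}">{}</prosody></speak>'.format(rate, escape(text))


if __name__ == "__main__":
    print(to_ssml("Tom & Jerry"))
```

A string like this would be submitted as SSML rather than as plain text.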

What amazed me is that English text is translated before being spoken.

Polly is within Amazon’s Artificial Intelligence services that include Lex (to build chatbots), Rekognition (to recognize objects and scenes), and Machine Learning.

Payment is by the number of characters converted to speech. Reusing saved sound files does not incur an additional cost. Sounds can be saved in MP3, OGG (open source), and PCM (high definition) formats (at 8,000, 16,000, and 22,050 Hz).
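Saving a sound file once and replaying it could look roughly like the sketch below. It assumes the boto3 package (the AWS SDK for Python) is installed and that AWS credentials and a region are already configured on the Pi. The helper names polly_request and synthesize_to_file are made up for illustration, and the defaults (voice Amy, MP3 at 22,050 Hz) are just one choice.

```python
def polly_request(text, voice_id="Amy", output_format="mp3", sample_rate="22050"):
    """Build the keyword arguments for a Polly synthesize_speech call."""
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": output_format,  # "mp3", "ogg_vorbis", or "pcm"
        "SampleRate": sample_rate,      # Polly expects the rate as a string
    }


def synthesize_to_file(text, path, **options):
    """Ask Polly for speech and save the audio so it can be replayed later."""
    import boto3  # third-party; imported here so the helper above works without it

    client = boto3.client("polly")
    response = client.synthesize_speech(**polly_request(text, **options))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return path


if __name__ == "__main__":
    print(polly_request("Hello from Polly"))
```

Calling synthesize_to_file("Hello from Polly", "hello.mp3") would make one billable request. Playing hello.mp3 again afterwards would not call the service.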

Use AWS Lambda to generate pre-signed Polly URLs based on events from the AWS IoT rules engine, then use Device Gateway to send these URLs to your IoT devices to allow them to request lifelike speech.

https://portal.aws.amazon.com/gp/aws/developer/registration/index.html

Others

AT&T
IBM Watson

https://dzone.com/articles/integrating-watson-text-to-speech-into-an-android
Google
Microsoft

More on IoT

This is one of a series on IoT:

NOTE: Pages about GE’s Predix have been removed.

Wilson Mar