Make your Pi say things it doesn’t mean
This page describes ways to get your Pi to turn text into speech that you hear on a speaker.
Cloud-based services leverage powerful servers to provide the most precise speech synthesis.
But speech programs can also run locally, on board the Pi itself.
Programs that run on-board the Pi to output voice include Festival and its derivative Flite.
Festival is written by The Centre for Speech Technology Research at the University of Edinburgh (UK). It offers a framework for building speech synthesis systems. It offers full text to speech through a number of APIs: from shell level, via a command interpreter, as a C++ library, from Java, and through an Emacs editor interface.
Festival is multi-lingual (currently British English, American English, and Spanish). Other groups work to release new languages for the system. Festival is in the package manager for the Raspberry Pi, making it very easy to install.
sudo apt-get install festival -y
The -y flag skips the confirmation prompt about 19.2 MB of disk space usage.
Identify voices from
NOTE: A 16 kHz sample rate is clearer than 8 kHz, but requires more disk space and takes up more CPU.
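To put rough numbers on that trade-off, here is a small sketch of how uncompressed audio grows with sample rate. It assumes 16-bit mono PCM, which is an assumption made for illustration and not a statement about how Festival voice files are actually stored; the function name pcm_bytes is hypothetical.

```python
def pcm_bytes(sample_rate_hz, seconds, bytes_per_sample=2, channels=1):
    """Size in bytes of uncompressed PCM audio (16-bit mono assumed by default)."""
    return sample_rate_hz * seconds * bytes_per_sample * channels


# One minute of audio at each of the two sample rates mentioned above.
one_minute_8k = pcm_bytes(8000, 60)
one_minute_16k = pcm_bytes(16000, 60)

print(one_minute_8k)                    # bytes at 8 kHz
print(one_minute_16k)                   # bytes at 16 kHz
print(one_minute_16k / one_minute_8k)   # growth factor
```

Under these assumptions, doubling the sample rate doubles the number of samples to store and process.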
Install a voice file specifically for processing by Festival on Debian:
sudo apt-get install festvox-rablpc16k -y
This “British English male speaker” voice takes 9 MB.
PROTIP: Only one voice is needed.
Send from command line:
echo "Hello Wilson!" | festival --tts
NOTE: There may be some “electrical” sound behind a robot talking quickly.
Use a Chrome browser to see the documentation and on-line demo:
The Firefox browser needs a plug-in to be installed.
Python code to invoke TTS from text in a variable and in a file:
import subprocess

# Speak text held in a variable by piping it to Festival
text = '"Hello world"'
subprocess.call('echo ' + text + ' | festival --tts', shell=True)

# Speak text that has been written to a file
text = '"You are listening to text to speech synthesis using the Festival package from the University of Edinburgh in the UK."'
filename = 'hello'
with open(filename, 'w') as file:
    file.write(text)
subprocess.call('festival --tts ' + filename, shell=True)
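Building a shell command string by concatenation can break when the text itself contains quotes or other shell characters. One possible alternative, sketched here, is to skip the shell and send the text to Festival on standard input. The function names speak and festival_command, and the runner and which parameters (there so the function can be exercised without Festival installed), are hypothetical helpers, not part of Festival.

```python
import shutil
import subprocess


def festival_command():
    """Command line that makes Festival read the text to speak from stdin."""
    return ["festival", "--tts"]


def speak(text, runner=subprocess.run, which=shutil.which):
    """Send text to Festival on stdin.

    Returns False when the festival binary is not found on the PATH,
    True after the command has been run.
    """
    if which("festival") is None:
        return False
    # No shell is involved, so quotes in the text need no escaping.
    runner(festival_command(), input=text.encode("utf-8"), check=True)
    return True


if __name__ == "__main__":
    if not speak('Hello "Wilson"!'):
        print("festival is not installed")
```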
Flite is a lighter version of Festival built specifically for embedded systems. It runs faster than Festival because it doesn’t have Festival’s complex scripting language or phoneme handling.
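As a rough sketch of calling Flite from Python, the helper below builds a command around the flite -t option (speak the given text) and the -o option (write a WAV file instead of playing). It assumes the flite binary is already installed and on the PATH; the function names and the runner parameter are hypothetical.

```python
import subprocess


def flite_command(text, wav_path=None):
    """Build a flite command line; with wav_path, audio goes to that file."""
    cmd = ["flite", "-t", text]
    if wav_path is not None:
        cmd += ["-o", wav_path]
    return cmd


def speak_with_flite(text, wav_path=None, runner=subprocess.run):
    """Run flite on the given text (requires flite to be installed)."""
    runner(flite_command(text, wav_path), check=True)


if __name__ == "__main__":
    print(flite_command("Hello Wilson!"))
    print(flite_command("Hello Wilson!", "hello.wav"))
```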
Amazon Polly (https://aws.amazon.com/polly/) is a service that uses advanced deep learning technologies to synthesize speech across 24 languages. It emits sounds using 47 lifelike human voices.
Type text, select a language and voice, then click to speech at
The default English, US has more voices than other languages:
- Ivy sounds like a young female
- Justin sounds like a young male
I like hearing British Amy, who has a more breathy voice than British Emma.
Select English, Indian and Raveena speaks with an Indian accent.
Edit the SSML to vary the sound or upload the whole lexicon, which can be up to 4,000 characters and 1,000 rules.
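To give a feel for what edited SSML might look like, this sketch wraps plain text in a prosody element to slow the speaking rate and adds a pause with a break element. The helper name slow_ssml and its default values are hypothetical; the text is XML-escaped so that characters such as & do not break the markup.

```python
from xml.sax.saxutils import escape


def slow_ssml(text, rate="slow", pause_ms=500):
    """Wrap text in SSML that slows the speaking rate and ends with a pause."""
    body = escape(text)  # turn &, < and > into XML entities
    return (
        "<speak>"
        f'<prosody rate="{rate}">{body}</prosody>'
        f'<break time="{pause_ms}ms"/>'
        "</speak>"
    )


print(slow_ssml("Tea & biscuits"))
# <speak><prosody rate="slow">Tea &amp; biscuits</prosody><break time="500ms"/></speak>
```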
What amazed me is that English text is translated before being spoken.
Payment is by the number of characters converted to speech. Sound files reused do not incur a cost. Sounds can be saved in MP3, OGG, and PCM formats (at 8,000, 16,000, and 22,050 Hz).
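Because charges are per character converted and reused sound files cost nothing, it can pay to keep each synthesized phrase on disk and only call Polly for text not seen before. The following is a minimal sketch of that idea, assuming the boto3 package is installed and AWS credentials are configured. The function names, the speech_cache directory, and the choice of MP3 are illustrative assumptions; Amy is one of the voices mentioned above.

```python
import hashlib
from pathlib import Path


def polly_synthesize(text, voice_id="Amy"):
    """Ask Amazon Polly for MP3 audio (assumes boto3 and AWS credentials)."""
    import boto3  # imported lazily so the cache logic works without it

    response = boto3.client("polly").synthesize_speech(
        Text=text, VoiceId=voice_id, OutputFormat="mp3"
    )
    return response["AudioStream"].read()


def cached_speech(text, voice_id="Amy", cache_dir="speech_cache",
                  synthesize=polly_synthesize):
    """Return the path of an MP3 for text, synthesizing only on a cache miss."""
    key = hashlib.sha256(f"{voice_id}:{text}".encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.mp3"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(synthesize(text, voice_id))
    return path
```

Calling cached_speech("Hello Wilson!") twice would reach Polly only the first time; the second call returns the file already on disk.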
Use AWS Lambda to generate pre-signed Polly URLs based on events from the AWS IoT rules engine, then use Device Gateway to send these URLs to your IoT devices to allow them to request lifelike speech.
Polly is one of Amazon’s Artificial Intelligence services, which also include Lex (to build chatbots), Rekognition (to recognize objects and scenes), and Machine Learning.
More on IoT
This is one of a series on IoT:
- IoT Apprentice school curriculum
- IoT use cases
- IoT reminders prevent dead mobile battery
- IoT text to speech synthesis
- IoT AWS button
- Intel IoT
- IoT Raspberry hardware
- IoT Clouds
- Predix basics
- Predix installation
- Predix services
- Predix programming