Wilson Mar


Make your Pi say things it doesn’t mean

Overview

This page describes ways to get your Raspberry Pi to turn text into speech you can hear on a speaker.

Cloud-based services use powerful servers to provide the most accurate speech synthesis.

But programs can also run locally on the Pi itself.

Programs that run on board the Pi to output voice include Festival and its derivative Flite.
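Before choosing between a local program and a cloud service, it can help to check which local engines are already installed. The following is a minimal sketch, not a definitive implementation. It assumes the executables are named festival, flite, and espeak on the PATH. The function name available_engines and its which parameter are made up for this illustration.

```python
import shutil

# Command names of the local engines mentioned on this page (assumed names).
LOCAL_ENGINES = ("festival", "flite", "espeak")


def available_engines(which=shutil.which):
    """Return the local TTS engines whose executables are found on PATH."""
    return [name for name in LOCAL_ENGINES if which(name) is not None]


if __name__ == "__main__":
    found = available_engines()
    if found:
        print("Installed speech engines:", ", ".join(found))
    else:
        print("No local speech engine found; install one or use a cloud service.")
```

The which parameter exists only so the lookup can be replaced in a test; in normal use, call available_engines() with no arguments.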

Festival

Festival is written by The Centre for Speech Technology Research at the University of Edinburgh (UK). It offers a framework for building speech synthesis systems, with full text to speech through a number of APIs: from the shell, via a command interpreter, as a C++ library, from Java, and through an Emacs editor interface.

Festival is multi-lingual (currently British English, American English, and Spanish). Other groups are working to release new languages for the system. Festival is in the package manager for the Raspberry Pi, which makes it very easy to install.

  1. Install

    sudo apt-get install festival -y

    The -y flag skips the prompt asking you to confirm 19.2 MB of disk space usage.

  2. Choose a voice from the list at

    https://packages.debian.org/jessie/festival-voice

    NOTE: A 16 kHz sample rate is clearer than 8 kHz, but requires more disk space and uses more CPU.

  3. Install a voice file specifically for processing by Festival on Debian:

    sudo apt-get install festvox-rablpc16k -y

    This “British English male speaker” voice takes 9 MB.

    PROTIP: Only one voice is needed at a time. To save disk space, keep only the voice you use and download another when you need it.

  4. Send from command line:

    echo 'Hello Wilson!' | festival --tts

    NOTE: The voice sounds like a robot talking quickly, and there may be some “electrical” noise behind it.

  5. Use a Chrome browser to see the documentation and on-line demo:

    http://www.cstr.ed.ac.uk/projects/festival

    The Firefox browser needs a plug-in to be installed.

  6. Python code to invoke TTS from text in a variable and in a file:

    import subprocess

    # Speak text held in a variable by piping it to Festival's standard input.
    text = 'Hello world'
    subprocess.run(['festival', '--tts'], input=text, text=True)

    # Write the text to a file, then have Festival read that file aloud.
    text = ('You are listening to text to speech synthesis using the Festival '
            'package from the University of Edinburgh in the UK.')
    filename = 'hello'
    with open(filename, 'w') as f:
        f.write(text)
    subprocess.run(['festival', '--tts', filename])

Flite

Flite (from Carnegie Mellon University) is a lighter version of Festival built specifically for embedded systems. It runs faster than Festival because it doesn’t have Festival’s complex scripting language or phoneme handling.
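As a rough sketch of calling Flite from Python: the flite command accepts the text to speak with -t and an optional output WAV file with -o. This assumes Flite is already installed (for example from a package assumed to be named flite). The helper names flite_command and speak_with_flite are made up for this illustration.

```python
import subprocess


def flite_command(text, wav_path=None):
    """Build the argument list for the flite command-line tool."""
    cmd = ["flite", "-t", text]
    if wav_path is not None:
        # -o writes the speech to a WAV file instead of playing it.
        cmd += ["-o", wav_path]
    return cmd


def speak_with_flite(text, wav_path=None):
    """Run flite; raises FileNotFoundError if flite is not installed."""
    return subprocess.run(flite_command(text, wav_path), check=True)


if __name__ == "__main__":
    # Only prints the command; call speak_with_flite() on a Pi with flite installed.
    print(flite_command("Hello from Flite"))
```

Passing the text as a list element rather than through a shell string avoids quoting problems when the text contains quotes or other special characters.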

eSpeak

http://espeak.sourceforge.net/

Amazon Polly

Amazon Polly (https://aws.amazon.com/polly) is a service that uses deep learning to synthesize speech in 24 languages, using 47 lifelike human voices.

To try it, type text, select a language and voice, then listen to the speech at
https://console.aws.amazon.com/polly/home/SynthesizeSpeech

The default language, English (US), has more voices than other languages:

  • Ivy sounds like a young girl
  • Justin sounds like a young boy
  • Joey

I like hearing British Amy, who has a more breathy voice than British Emma.

Select English | Indian, and Raveena speaks with an Indian accent.

Edit the SSML to vary the sound or upload the whole lexicon, which can be up to 4,000 characters and 1,000 rules.
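To illustrate what editing the SSML can look like, here is a small sketch that wraps plain text in a speak element with a prosody tag to change the speaking rate. The function name to_ssml is made up for this illustration, and the rate value is only one of several adjustments SSML allows.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text, rate="slow"):
    """Wrap plain text in SSML that asks for a given speaking rate."""
    # escape() protects characters such as & and < that would break the markup.
    return "<speak><prosody rate={}>{}</prosody></speak>".format(
        quoteattr(rate), escape(text)
    )


if __name__ == "__main__":
    print(to_ssml("Fish & chips"))
```

The resulting string can be pasted into the SSML tab of the console, or sent through the API with the text type set to SSML.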

What amazed me is that English text is translated before being spoken.

Polly is one of Amazon’s Artificial Intelligence services, which also include Lex (to build chatbots), Rekognition (to recognize objects and scenes), and Machine Learning.

Payment is based on the number of characters converted to speech. Reusing a saved sound file does not incur a further cost. Sounds can be saved in MP3, OGG (open source), and PCM (high definition) formats (at 8,000, 16,000, and 22,050 Hz).
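Because saved files can be replayed without another charge, one approach is to synthesize a phrase once and keep the audio on the Pi. The sketch below assumes the boto3 AWS SDK for Python with credentials and a region already configured. The function name synthesize_to_file and the file name hello.mp3 are made up for this illustration. The client is passed in as a parameter so the function can be tried without contacting AWS.

```python
def synthesize_to_file(client, text, path, voice_id="Amy",
                       output_format="mp3", sample_rate="22050"):
    """Ask Polly to speak `text` and save the returned audio to `path`."""
    response = client.synthesize_speech(
        Text=text,
        VoiceId=voice_id,
        OutputFormat=output_format,  # "mp3", "ogg_vorbis", or "pcm"
        SampleRate=sample_rate,      # passed as a string, e.g. "22050"
    )
    audio = response["AudioStream"].read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of bytes written


def main():
    # Not called automatically: needs boto3 installed and AWS credentials.
    import boto3

    polly = boto3.client("polly")
    size = synthesize_to_file(polly, "Hello from Polly", "hello.mp3")
    print("Wrote", size, "bytes to hello.mp3")
```

Call main() on a machine with AWS credentials, then play hello.mp3 with any audio player on the Pi.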

Use AWS Lambda to generate pre-signed Polly URLs based on events from the AWS IoT rules engine, then use Device Gateway to send these URLs to your IoT devices to allow them to request lifelike speech.

https://portal.aws.amazon.com/gp/aws/developer/registration/index.html

Others


More on IoT

This is one of a series on IoT:

  1. IoT Acronyms and Abbreviations on Quizlet

  2. IoT Home Assistant system

  3. IoT Apprentice school curriculum
  4. IoT use cases
  5. IoT reminders prevent dead mobile battery
  6. IoT barn feeder

  7. IoT text to speech synthesis
  8. IoT AWS button
  9. Intel IoT
  10. IoT Raspberry hardware
  11. IoT Raspberry installation

  12. IoT Clouds
  13. Samsung IoT Cloud

NOTE: Pages about GE’s Predix have been removed.