I just published the Piper Voices TTS installer, a script that turns a Linux machine into a local, offline voice generator for phone systems. It downloads the rhasspy/piper neural TTS binary and a set of pre-selected voices, then renders each test phrase twice: once as a regular WAV file, and once as an 8kHz mono μ-law file that drops straight into a SIP or PBX prompt directory. No cloud accounts, no API keys, no per-character billing.

See: https://github.com/mevdschee/piper-voices-tts

Why local TTS for telephony

Most IVR vendors push you towards cloud TTS, which is fine until the audio prompts of your customer support line depend on a third party staying online, keeping its prices stable, and not changing its API. For a telephony system that runs in your own datacenter, a local model is a better fit. Piper is fast enough to run on a small VM, the voices are good enough for production prompts, and the output is fully deterministic, so you can commit the generated WAV files to git and reproduce them later.

What the script installs

The piper-voices.sh script is idempotent and installs three voices that I personally find usable for IVR. Each is listed below with the test phrase it renders:

Dutch (nl_NL-alex-medium, 109 KB)

Welkom bij ons telefoonsysteem. Dit is een test van de piper TTS stemmen. Ik ben een computerstem, maar ik probeer zo natuurlijk mogelijk te klinken. Bedankt voor het luisteren naar deze test van de piper TTS stemmen. Ik hoop dat je het leuk vond!

English US (en_US-lessac-high, 109 KB)

Welcome to our phone system. This is a test of the piper TTS voices. I am a computer voice, but I try to sound as natural as possible. Thank you for listening to this test of the piper TTS voices. I hope you enjoyed it!

German (de_DE-thorsten-high, 121 KB)

Willkommen in unserem Telefonsystem. Dies ist ein Test der piper TTS Stimmen. Ich bin eine Computerstimme, aber ich versuche so natürlich wie möglich zu klingen. Vielen Dank, dass Sie sich diesen Test der piper TTS Stimmen angehört haben. Ich hoffe, es hat Ihnen gefallen!

The Piper binary lands in /opt/piper-tts, the voice models in /usr/share/piper-voices, and the test phrases are written to the current directory. Re-running the script skips anything that is already installed, so it is safe to call from provisioning tools.
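The skip-if-installed behavior is what makes a provisioning script safe to re-run. The bash sketch below shows one way to get it. It is not the code of piper-voices.sh, and the fetch_once name is hypothetical. It assumes curl is available and treats any non-empty file at the destination as already installed.

```bash
# Hypothetical helper showing the "skip what is already installed" pattern.
# This is NOT the code of piper-voices.sh; the function name is made up.

# fetch_once URL DEST: download URL to DEST unless DEST already exists and is non-empty.
fetch_once() {
  local url=$1 dest=$2
  if [[ -s $dest ]]; then
    echo "skip: $dest"
    return 0
  fi
  mkdir -p "$(dirname "$dest")" || return 1
  # Download to a temporary name first, so an interrupted run never leaves a
  # partial file that the next run would mistake for a finished install.
  if ! curl -fsSL -o "$dest.part" "$url"; then
    rm -f "$dest.part"
    return 1
  fi
  mv "$dest.part" "$dest" && echo "installed: $dest"
}

# Usage (MODEL_URL is a placeholder for wherever the voice is hosted):
#   fetch_once "$MODEL_URL" /usr/share/piper-voices/en_US-lessac-high/en_US-lessac-high.onnx
```

Writing to a .part file and renaming it only after a successful download matters here, because the "already installed" check looks at nothing but the final file.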

Telephone bandwidth in one sox call

The piece that took the longest to get right was not the synthesis but the post-processing. Phone systems expect 8kHz mono μ-law audio with everything outside the speech band rolled off. Otherwise the prompt sounds tinny or muffled once it passes through the codec. The script does this in a single sox invocation:

sox input.wav -r 8000 -c 1 -e mu-law output_mulaw.wav \
    highpass 300 lowpass 3400

The 300Hz highpass and 3400Hz lowpass match the classic 300 to 3400Hz telephone voice band used with G.711, so what you hear in your headphones is close to what your callers will hear.
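Before you copy a file into a PBX prompt directory, it can be worth checking that it really is 8kHz mono μ-law. The following is a minimal bash sketch. It assumes the "fmt " chunk directly follows the 12-byte RIFF/WAVE header, which is the common layout but is not guaranteed by the WAV format. It reads the format tag, channel count and sample rate with od and accepts only tag 7 (WAVE_FORMAT_MULAW), one channel and 8000 Hz. The function names are hypothetical.

```bash
# Sketch: print "format-tag channels sample-rate" from a WAV header.
# Assumes the "fmt " chunk starts right after the 12-byte RIFF/WAVE header,
# so its fields sit at fixed byte offsets 20..27 (little-endian).
wav_params() {
  local -a b
  # Cheap sanity check on the RIFF magic and the position of the fmt chunk.
  [[ $(head -c 4 "$1") == RIFF && $(head -c 16 "$1" | tail -c 4) == "fmt " ]] || return 1
  # Read bytes 20..27 as unsigned decimals and assemble them little-endian,
  # which works regardless of the host's byte order.
  read -r -a b < <(od -An -tu1 -j20 -N8 "$1")
  (( ${#b[@]} == 8 )) || return 1    # file too short to hold these fields
  echo "$(( b[0] | b[1] << 8 )) $(( b[2] | b[3] << 8 )) $(( b[4] | b[5] << 8 | b[6] << 16 | b[7] << 24 ))"
}

# True when the file is μ-law (WAV format tag 7), mono and 8000 Hz.
is_phone_ready() {
  [[ $(wav_params "$1") == "7 1 8000" ]]
}

# Usage:
#   is_phone_ready hold_mulaw.wav && cp hold_mulaw.wav /path/to/prompts/
```

If sox is installed anyway, soxi reports the same details without parsing bytes by hand. The sketch above is only useful where you want a check with no audio tooling on the target machine.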

Generating a prompt

Once everything is installed, generating a new prompt is a one-liner:

echo "Your call is important to us, please hold." | \
  /opt/piper-tts/piper/piper \
    --model /usr/share/piper-voices/en_US-lessac-high/en_US-lessac-high.onnx \
    --output_file hold.wav

Run the resulting hold.wav through the same sox command and you have a phone-ready prompt. For Dutch I use a --length_scale of 1.2, which slows the voice down a touch and makes it noticeably easier to follow over a low-bitrate codec.
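Rendering more than a handful of prompts by hand gets tedious. Here is a bash sketch that chains the Piper command and the sox filter for one prompt, with an optional length scale. Several pieces are assumptions and not part of the installer: the make_prompt name, the PIPER, SOX and VOICES override variables, and the <voice>/<voice>.onnx directory layout for voices other than en_US-lessac-high.

```bash
# Hypothetical helper, not part of piper-voices.sh: render one prompt with
# Piper, then convert it to 8 kHz mono μ-law with the sox filter shown earlier.
# PIPER, SOX and VOICES override the defaults, which follow the install paths
# above; the <voice>/<voice>.onnx layout is assumed from the English example.
make_prompt() {
  local voice=$1 name=$2 text=$3
  local model=${VOICES:-/usr/share/piper-voices}/$voice/$voice.onnx
  local -a extra=()
  # Only pass --length_scale when asked, so Piper's default applies otherwise.
  if [[ -n ${4:-} ]]; then
    extra=(--length_scale "$4")
  fi

  # Piper reads the text to speak from stdin.
  printf '%s\n' "$text" |
    "${PIPER:-/opt/piper-tts/piper/piper}" --model "$model" "${extra[@]}" \
      --output_file "$name.wav" || return

  # Telephone band: 8 kHz, mono, μ-law, 300 to 3400 Hz.
  "${SOX:-sox}" "$name.wav" -r 8000 -c 1 -e mu-law "${name}_mulaw.wav" \
    highpass 300 lowpass 3400
}

# Usage (assumed voice directory names):
#   make_prompt en_US-lessac-high hold "Your call is important to us, please hold."
#   make_prompt nl_NL-alex-medium welkom "Welkom bij ons telefoonsysteem." 1.2
```

The helper keeps the intermediate WAV next to the μ-law file, which mirrors the installer's two outputs per phrase. It also leaves you a full-bandwidth version to listen to when a prompt sounds off.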

Edge’s cloud TTS service

Local Piper covers the common case, but sometimes you need a voice that Piper just does not have, for example a specific regional accent or a language with no good open model. For those situations I maintain a fork of surfaceyu/edge-tts-go at mevdschee/edge-tts-go, a small Go client for Microsoft Edge’s online TTS service. It exposes hundreds of neural voices in dozens of languages and is trivial to script:

edge-tts-go --voice nl-NL-FennaNeural \
  --text "Welkom bij ons telefoonsysteem." \
  --write-media welcome.mp3

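Run that MP3 through the same sox filter and you have a μ-law prompt in any voice Microsoft offers. This needs a sox build with MP3 support, such as the libsox-fmt-mp3 package on Debian and Ubuntu. I reach for Piper first and fall back to edge-tts-go when no local voice fits the bill.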
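The "Piper first, edge-tts-go as fallback" rule can be scripted by checking whether the local model file exists. This bash sketch relies on similar assumptions. The function name, the override variables and the model path layout are hypothetical. The edge-tts-go flags are the ones from the example above.

```bash
# Hypothetical sketch of "Piper first, edge-tts-go as fallback": use the local
# Piper model when it is installed, otherwise fetch the prompt from Edge's
# online TTS. PIPER, EDGE_TTS, SOX and VOICES override the assumed defaults.
prompt_with_fallback() {
  local piper_voice=$1 edge_voice=$2 name=$3 text=$4 src
  local model=${VOICES:-/usr/share/piper-voices}/$piper_voice/$piper_voice.onnx

  if [[ -f $model ]]; then
    # Local, offline path.
    src=$name.wav
    printf '%s\n' "$text" |
      "${PIPER:-/opt/piper-tts/piper/piper}" --model "$model" --output_file "$src" || return
  else
    # Cloud fallback; edge-tts-go writes MP3.
    src=$name.mp3
    "${EDGE_TTS:-edge-tts-go}" --voice "$edge_voice" --text "$text" \
      --write-media "$src" || return
  fi

  # Same telephone-band conversion for either source.
  "${SOX:-sox}" "$src" -r 8000 -c 1 -e mu-law "${name}_mulaw.wav" \
    highpass 300 lowpass 3400
}

# Usage (voice names are examples):
#   prompt_with_fallback nl_NL-alex-medium nl-NL-FennaNeural welkom "Welkom bij ons telefoonsysteem."
```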

Enjoy!