Use Google Text-to-Speech

Andreas

2023-02-28 19:00

For a project idea, I want to test how well Google TTS works as a reader of a longer story.

Again, the most complicated part is getting a key file that is allowed to call the API. Thankfully, the quickstart guide helps here:

# replace YOUR_PROJECT with the correct value for your project

# create a service account named "tts-quickstart"
gcloud iam service-accounts create tts-quickstart --project YOUR_PROJECT

# grant the new service account the role "roles/viewer"
gcloud projects add-iam-policy-binding YOUR_PROJECT --member \
  serviceAccount:tts-quickstart@YOUR_PROJECT.iam.gserviceaccount.com \
  --role roles/viewer

# export the key as json file
gcloud iam service-accounts keys create tts-key.json \
  --iam-account tts-quickstart@YOUR_PROJECT.iam.gserviceaccount.com

# and now set it in the environment for the script below to actually use it
export GOOGLE_APPLICATION_CREDENTIALS=tts-key.json
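Before running the script, it can help to check that GOOGLE_APPLICATION_CREDENTIALS points to a readable service-account key. The following is a minimal sketch using only the standard library. The helper name `check_key_file` is hypothetical, and it only inspects the `type` and `client_email` fields that service-account key files contain; it does not verify the key with Google.

```python
import json
import os


def check_key_file(path=None):
    """Return the service-account e-mail from a key file (hypothetical helper).

    Falls back to GOOGLE_APPLICATION_CREDENTIALS when no path is given.
    Raises ValueError if the file does not look like a service-account key.
    """
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    with open(path, encoding="utf-8") as fp:
        data = json.load(fp)
    # Keys created with `gcloud iam service-accounts keys create` contain
    # "type": "service_account" and the account's e-mail address.
    if data.get("type") != "service_account" or "client_email" not in data:
        raise ValueError(f"{path} does not look like a service-account key")
    return data["client_email"]
```

With the variable exported as above, `check_key_file()` should return the `tts-quickstart@YOUR_PROJECT.iam.gserviceaccount.com` address.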

Next, the actual Python code that calls the API to generate an MP3 file for a given text:

from google.cloud import texttospeech_v1beta1 as tts

client = tts.TextToSpeechClient()

request = tts.SynthesizeSpeechRequest(
    input=tts.SynthesisInput(
        text="There was a stream at the foot of the hill. "
        "They filled their bottles and the small camping "
        "kettle at a little fall where the water fell a few "
        "feet over an outcrop of grey stone. "
        "It was icy cold; and they spluttered and puffed "
        "as they bathed their faces and hands."
    ),
    voice=tts.VoiceSelectionParams(
        language_code="en-GB", name="en-GB-Neural2-D"
    ),
    audio_config=tts.AudioConfig(
        audio_encoding=tts.AudioEncoding.MP3
    ),
)

response = client.synthesize_speech(request=request)
with open("output.mp3", "wb") as fp:
    fp.write(response.audio_content)
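The example above sends a single short paragraph. A whole story will eventually exceed the per-request input limit (5,000 bytes of text according to Google's quota documentation at the time of writing; treat that number as an assumption and check the current limits). A minimal sketch, assuming that limit: split the text at sentence ends into chunks below the limit, synthesize each chunk with the same settings as above, and append the MP3 bytes to one file. The names `split_text` and `synthesize_story` are hypothetical, and simply concatenating MP3 responses is an assumption that plays in common players but does not produce a file with a single clean header.

```python
import re

MAX_BYTES = 5000  # assumed per-request limit; check the current quota docs


def split_text(text, max_bytes=MAX_BYTES):
    """Split text at sentence ends into chunks of at most max_bytes (UTF-8)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        if len(sentence.encode("utf-8")) > max_bytes:
            raise ValueError("a single sentence exceeds max_bytes")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) > max_bytes:
            # Current chunk is full: store it and start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def synthesize_story(text, path="story.mp3", voice_name="en-GB-Neural2-D"):
    # Imported here so split_text can be used without the client library.
    from google.cloud import texttospeech_v1beta1 as tts

    client = tts.TextToSpeechClient()
    voice = tts.VoiceSelectionParams(language_code="en-GB", name=voice_name)
    audio_config = tts.AudioConfig(audio_encoding=tts.AudioEncoding.MP3)
    with open(path, "wb") as fp:
        for chunk in split_text(text):
            response = client.synthesize_speech(
                request=tts.SynthesizeSpeechRequest(
                    input=tts.SynthesisInput(text=chunk),
                    voice=voice,
                    audio_config=audio_config,
                )
            )
            # MP3 responses are appended back to back into one file.
            fp.write(response.audio_content)
```

For example, `synthesize_story(open("story.txt", encoding="utf-8").read())` would write `story.mp3`, where `story.txt` is a placeholder for the story file. Splitting at sentence ends keeps each request's prosody natural, since the voice never has to start in the middle of a sentence.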

I selected the voice en-GB-Neural2-D on the Google TTS demo website. The quality seems good enough for my use case.
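Instead of browsing the demo website, the available voices can also be listed through the API. A minimal sketch, assuming the same client library: `list_voices` accepts a `language_code` filter and returns voices with a `name` field. The helper names and the "Neural2" substring filter are my own choices for illustration.

```python
def neural2_names(voices):
    """Return the sorted names of voices whose name contains "Neural2"."""
    return sorted(v.name for v in voices if "Neural2" in v.name)


def list_neural2_voices(language_code="en-GB"):
    # Imported lazily so neural2_names works without the client library.
    from google.cloud import texttospeech_v1beta1 as tts

    client = tts.TextToSpeechClient()
    response = client.list_voices(language_code=language_code)
    return neural2_names(response.voices)
```

Calling `list_neural2_voices()` should include `en-GB-Neural2-D` among the returned names, which makes it easy to try the other Neural2 voices for comparison.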