For a project idea, I want to evaluate how well Google TTS works as a narrator for a longer story.
Once again, the most complicated part is getting the correct key file that is allowed to call the API. Thankfully, the quickstart guide helps here:
```shell
# replace YOUR_PROJECT with the correct value for your project

# create a service account named "tts-quickstart"
gcloud iam service-accounts create tts-quickstart --project YOUR_PROJECT

# grant the new service account the role "roles/viewer"
gcloud projects add-iam-policy-binding YOUR_PROJECT --member \
    serviceAccount:tts-quickstart@YOUR_PROJECT.iam.gserviceaccount.com \
    --role roles/viewer

# export the key as a JSON file
gcloud iam service-accounts keys create tts-key.json \
    --iam-account tts-quickstart@YOUR_PROJECT.iam.gserviceaccount.com

# and now set it in the environment so the script below actually uses it
export GOOGLE_APPLICATION_CREDENTIALS=tts-key.json
```
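Before running the script, a quick sanity check of the key file can save some confusion. The sketch below is my own addition, not part of the quickstart: it reads the file that `GOOGLE_APPLICATION_CREDENTIALS` points to and confirms it looks like a service-account key. The function name `check_credentials` is hypothetical; `type` and `client_email` are standard fields of a service-account JSON key. Note that a relative path such as `tts-key.json` is resolved against the current working directory.

```python
import json
import os
from pathlib import Path


def check_credentials(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Hypothetical helper: return the service-account email from the key file."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")

    # relative paths are resolved against the current working directory
    key_file = Path(path)
    if not key_file.is_file():
        raise RuntimeError(f"{env_var} points to a missing file: {key_file}")

    data = json.loads(key_file.read_text())
    # service-account keys created with `gcloud iam service-accounts keys create`
    # carry "type": "service_account" and the account's email address
    if data.get("type") != "service_account":
        raise RuntimeError(f"{key_file} does not look like a service-account key")
    return data["client_email"]
```

Calling `check_credentials()` in the same shell where the `export` ran should print something like `tts-quickstart@YOUR_PROJECT.iam.gserviceaccount.com`; an exception points at the environment rather than at the TTS code.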
Next, the actual Python code that calls the API to generate an MP3 file from a given text:
```python
from google.cloud import texttospeech_v1beta1 as tts

# the client picks up the key file from GOOGLE_APPLICATION_CREDENTIALS
client = tts.TextToSpeechClient()

request = tts.SynthesizeSpeechRequest(
    # the text to be read aloud
    input=tts.SynthesisInput(
        text="There was a stream at the foot of the hill. "
        "They filled their bottles and the small camping "
        "kettle at a little fall where the water fell a few "
        "feet over an outcrop of grey stone. "
        "It was icy cold; and they spluttered and puffed "
        "as they bathed their faces and hands."
    ),
    # a British English voice
    voice=tts.VoiceSelectionParams(
        language_code="en-GB",
        name="en-GB-Neural2-D",
    ),
    # return the audio encoded as MP3
    audio_config=tts.AudioConfig(
        audio_encoding=tts.AudioEncoding.MP3,
    ),
)

response = client.synthesize_speech(request=request)

# audio_content holds the raw MP3 bytes
with open("output.mp3", "wb") as fp:
    fp.write(response.audio_content)
```
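A whole story will not fit into a single request: the API caps the input size per request (around 5,000 bytes at the time of writing; treat that number as an assumption and check the current quota page). A minimal sketch, assuming splitting at sentence ends is good enough for narration, is to cut the text into chunks below that limit. The function `split_into_chunks` and the constant `MAX_INPUT_BYTES` are hypothetical names, not part of the client library.

```python
import re

# Assumption: per-request input limit of roughly 5000 bytes; verify against
# the current Text-to-Speech quota documentation.
MAX_INPUT_BYTES = 5000


def split_into_chunks(text: str, max_bytes: int = MAX_INPUT_BYTES) -> list[str]:
    """Split text at sentence boundaries into chunks of at most max_bytes (UTF-8)."""
    # naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        if len(sentence.encode("utf-8")) > max_bytes:
            raise ValueError("a single sentence exceeds max_bytes")
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate.encode("utf-8")) <= max_bytes:
            # still fits: keep growing the current chunk
            current = candidate
        else:
            # would overflow: close the current chunk and start a new one
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as `text` to `tts.SynthesisInput` in the request above, with the returned `audio_content` bytes appended to the output file in order. Simply concatenating MP3 responses usually plays back fine in common players, but that is an observation rather than a guarantee; a tool such as ffmpeg can re-mux the result if needed.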
I picked the voice en-GB-Neural2-D using the demo on the Google TTS website.
The quality seems good enough for my use case.
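Instead of the demo website, the available voices can also be listed through the API with `client.list_voices(language_code="en-GB")`, whose response has a `voices` list where each entry carries a `name`. The hypothetical helpers below parse and filter such names by voice type. They assume names follow the `<language>-<REGION>-<Type>-<Variant>` pattern seen in `en-GB-Neural2-D`; that pattern is an observation, not a documented guarantee, so names that do not match are skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceName:
    language_code: str  # e.g. "en-GB"
    voice_type: str     # e.g. "Neural2"
    variant: str        # e.g. "D"


def parse_voice_name(name: str) -> VoiceName:
    """Hypothetical parser for names like 'en-GB-Neural2-D' (assumed pattern)."""
    parts = name.split("-")
    if len(parts) < 4:
        raise ValueError(f"unexpected voice name: {name!r}")
    # first two parts form the language code, last part is the variant,
    # everything in between is treated as the voice type
    return VoiceName(
        language_code="-".join(parts[:2]),
        voice_type="-".join(parts[2:-1]),
        variant=parts[-1],
    )


def filter_voice_names(names: list[str], voice_type: str) -> list[str]:
    """Return the sorted names whose parsed type equals voice_type."""
    selected = []
    for name in names:
        try:
            if parse_voice_name(name).voice_type == voice_type:
                selected.append(name)
        except ValueError:
            # skip names that do not follow the assumed pattern
            continue
    return sorted(selected)
```

With a client as in the script above, `filter_voice_names([v.name for v in client.list_voices(language_code="en-GB").voices], "Neural2")` would list the British Neural2 voices to try alongside en-GB-Neural2-D.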