Google Cloud Translate API Batch requests
After spending some time getting batch requests to the Google Cloud Translate API running, I am documenting the steps here for my future self.
This step-by-step post assumes the Google Cloud SDK is installed!
First, the API needs to be enabled in your Google Cloud project.
Second, we need a way to authenticate. I chose a service account with permission to use the Translate API and to write to Google Cloud Storage. The service account key is downloaded as a JSON file, and the path to that file has to be set in the GOOGLE_APPLICATION_CREDENTIALS environment variable.
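Before sending any request, it can save some head-scratching to check that the environment variable really points at a usable key file. Below is a minimal Python sketch of such a check. The function name `load_service_account` is my own invention, and the check relies only on the fact that service account key files contain `"type": "service_account"`.

```python
import json
import os


def load_service_account(env=os.environ):
    """Return the parsed key file that GOOGLE_APPLICATION_CREDENTIALS points to.

    Hypothetical helper, not part of any Google library.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    with open(path, encoding="utf-8") as fh:
        key = json.load(fh)
    # Service account key files carry "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return key
```

Calling `load_service_account()["client_email"]` then shows which service account the following commands will run as.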
The API request body is a JSON file too, and it has to follow a fixed structure. Mine looked like this:
{
  "sourceLanguageCode": "en",
  "targetLanguageCodes": ["ja"],
  "inputConfigs": [
    {
      "gcsSource": {
        "inputUri": "gs://YOUR-STORAGE-BUCKET/input/inputdata.tsv"
      }
    }
  ],
  "outputConfig": {
    "gcsDestination": {
      "outputUriPrefix": "gs://YOUR-STORAGE-BUCKET/output/"
    }
  }
}
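If you need this file for more than one bucket or language pair, generating it is less error-prone than editing it by hand. The following Python sketch builds the same structure as my request above. The names `build_request` and `write_request` and their parameters are made up for illustration, and the defaults simply mirror my file.

```python
import json


def build_request(bucket, source="en", targets=("ja",),
                  input_name="input/inputdata.tsv", output_prefix="output/"):
    """Build a batch translation request body (hypothetical helper)."""
    return {
        "sourceLanguageCode": source,
        "targetLanguageCodes": list(targets),
        "inputConfigs": [
            {"gcsSource": {"inputUri": f"gs://{bucket}/{input_name}"}}
        ],
        "outputConfig": {
            "gcsDestination": {
                "outputUriPrefix": f"gs://{bucket}/{output_prefix}"
            }
        },
    }


def write_request(path, request):
    """Write the request body to a JSON file that curl can send with -d @path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(request, fh, indent=2)
```

For example, `write_request("request.json", build_request("YOUR-STORAGE-BUCKET"))` would reproduce the file shown above.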
Then I uploaded inputdata.tsv to Google Cloud Storage. I used the web interface, but this should work too:

gsutil -m cp inputdata.tsv gs://YOUR-STORAGE-BUCKET/input/
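In case the input file still has to be created: here is a small Python sketch that writes a TSV. It assumes the two-column layout (an ID in the first column, the text to translate in the second), so please check that against the current API documentation. The function `write_tsv` is a hypothetical helper. It also collapses tabs and newlines inside the text, since those would break the column layout.

```python
def write_tsv(path, rows):
    """Write (id, text) pairs as a two-column TSV file (assumed input layout)."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        for row_id, text in rows:
            # Collapse tabs, newlines and repeated spaces into single spaces
            # so that every record stays on one line with exactly one tab.
            clean = " ".join(str(text).split())
            fh.write(f"{row_id}\t{clean}\n")
```

Something like `write_tsv("inputdata.tsv", [(1, "Hello world"), (2, "Good morning")])` produces a file that can then be uploaded as described above.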
And now, finally, the request to translate the TSV file:
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  <PROJECT_ID>/locations/us-central1:batchTranslateText
Replace request.json with the filename of your JSON file (see above) and <PROJECT_ID> with the ID of your Google Cloud project.
The command returns a response whose name field ends with the operation ID, e.g.
{
  "name": "projects/123456/locations/us-central1/operations/20210406-15021617746540-606bd714-0000-2d87-9290-001a114b3fbf",
  "metadata": {
    "@type": "",
    "state": "RUNNING"
  }
}
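The operation ID is the last path segment of the name field. If you save the response and want to pull the ID out in a script, a Python sketch could look like this (`operation_id` is a name I made up):

```python
import json


def operation_id(response_text):
    """Return the last path segment of the "name" field of the response."""
    name = json.loads(response_text)["name"]
    return name.rsplit("/", 1)[-1]
```

Fed with the response above, this returns the part after `operations/`.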
This operation ID can be used to get the status of the translation request:
curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  <PROJECT_ID>/locations/us-central1/operations/20210406-15021617746540-606bd714-0000-2d87-9290-001a114b3fbf
For example:
{
  "name": "projects/123456/locations/us-central1/operations/20210406-15021617746540-606bd714-0000-2d87-9290-001a114b3fbf",
  "metadata": {
    "@type": "",
    "state": "RUNNING",
    "totalCharacters": "19121",
    "submitTime": "2021-04-06T22:11:31Z"
  }
}
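Rather than re-running the status command by hand, the check can be wrapped in a loop. The Python sketch below only contains the waiting logic. `fetch` is a hypothetical callable that you supply yourself: it performs the status request and returns the parsed JSON as a dict. The loop stops once the state is no longer RUNNING, or once the response carries `"done": true`, which is how Google's long-running operations normally signal completion.

```python
import time


def wait_for_operation(fetch, interval=30.0, max_polls=120, sleep=time.sleep):
    """Poll fetch() until the operation is no longer running.

    fetch is a caller-supplied function (an assumption of this sketch) that
    returns the operation status as a dict, like the JSON shown above.
    """
    for _ in range(max_polls):
        op = fetch()
        state = op.get("metadata", {}).get("state")
        # Finished when "done" is set, or when a state other than RUNNING
        # (for example SUCCEEDED or FAILED) is reported.
        if op.get("done") or state not in (None, "RUNNING"):
            return op
        sleep(interval)
    raise TimeoutError(f"operation still running after {max_polls} polls")
```

The `sleep` parameter exists only so the loop can be tried out without actually waiting.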
When the operation has finished, the result can be downloaded from Google Cloud Storage via gsutil, e.g.