Google Cloud Translate API Batch requests

After spending some time getting batch requests to the Google Cloud Translate API running, I am documenting the steps here for my future self.

This step-by-step post requires the Google Cloud SDK to be installed!

First, the API needs to be activated for your Google Cloud project, for example with gcloud services enable translate.googleapis.com.

Second, we need a way to authenticate. I chose a service account with the rights to use the Translate API and to write to Google Cloud Storage. The service account key is downloaded as a JSON file, and the path to that file has to be set as an environment variable, e.g.

export GOOGLE_APPLICATION_CREDENTIALS=your-projectid-123456-d6835a365891.json
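Since the example above uses a relative filename, the variable only resolves correctly from the directory that contains the key file. A minimal sketch of a sanity check, assuming a hypothetical helper named check_credentials (my own name, not part of the Cloud SDK):

```shell
# check_credentials is a hypothetical helper, not a Cloud SDK command.
# It only checks that the variable is set and points at a readable file;
# it does not validate the key itself.
check_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "Key file not readable: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
  echo "Using key file: $GOOGLE_APPLICATION_CREDENTIALS"
}
```

Calling check_credentials before the curl commands further down should catch a missing or mistyped path early.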

The body of the API request is a JSON file too, and it has to follow a specified structure. Mine looked like this:

{
  "sourceLanguageCode": "en",
  "targetLanguageCodes": ["ja"],
  "inputConfigs": [
    {
      "gcsSource": {
        "inputUri": "gs://YOUR-STORAGE-BUCKET/input/inputdata.tsv"
      }
    }
  ],
  "outputConfig": {
    "gcsDestination": {
      "outputUriPrefix": "gs://YOUR-STORAGE-BUCKET/output/"
    }
  }
}
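To avoid editing the bucket name in two places, the same file could be generated from shell variables. This is only a sketch: BUCKET, SOURCE_LANG and TARGET_LANG are variable names I made up, and the values are the placeholders from the request above.

```shell
# Assumed placeholder values; replace them with your own.
BUCKET="YOUR-STORAGE-BUCKET"
SOURCE_LANG="en"
TARGET_LANG="ja"

# Write request.json into the current directory.
# The unquoted EOF lets the shell expand the variables inside the here-document.
cat > request.json <<EOF
{
  "sourceLanguageCode": "${SOURCE_LANG}",
  "targetLanguageCodes": ["${TARGET_LANG}"],
  "inputConfigs": [
    {
      "gcsSource": {
        "inputUri": "gs://${BUCKET}/input/inputdata.tsv"
      }
    }
  ],
  "outputConfig": {
    "gcsDestination": {
      "outputUriPrefix": "gs://${BUCKET}/output/"
    }
  }
}
EOF
```

The result is the same request body as above, written to request.json.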

Then I uploaded inputdata.tsv to Google Cloud Storage. I used the web interface, but gsutil -m cp inputdata.tsv gs://YOUR-STORAGE-BUCKET/input/ should work too.

And now, finally, the request to translate the TSV file:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://translation.googleapis.com/v3/projects/<PROJECT_ID>/locations/us-central1:batchTranslateText"

Replace request.json with the filename of your JSON file (see above) and <PROJECT_ID> with the ID of your Google Cloud project.
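Instead of editing the URL by hand, the endpoint could be assembled from variables. A small sketch, assuming a hypothetical helper batch_translate_url and a made-up project ID:

```shell
# Assumed placeholder values; replace them with your own project and location.
PROJECT_ID="your-projectid-123456"
LOCATION="us-central1"

# batch_translate_url is a hypothetical helper that assembles the v3 endpoint
# from a project ID ($1) and a location ($2).
batch_translate_url() {
  printf 'https://translation.googleapis.com/v3/projects/%s/locations/%s:batchTranslateText\n' "$1" "$2"
}

ENDPOINT="$(batch_translate_url "$PROJECT_ID" "$LOCATION")"
echo "$ENDPOINT"
```

"$ENDPOINT" can then be passed as the last argument to curl. Keeping it in double quotes matters: a literal, unquoted <PROJECT_ID> would be read by the shell as a redirection.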

The command returns the name of the operation, which ends in the operation ID, e.g.

{
  "name": "projects/123456/locations/us-central1/operations/20210406-15021617746540-606bd714-0000-2d87-9290-001a114b3fbf",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.translation.v3.BatchTranslateMetadata",
    "state": "RUNNING"
  }
}
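To avoid copying the operation name by hand, something like the following could pull it out of the response. This is a rough sketch: extract_operation_name is a helper I made up, and the sed pattern assumes the response looks like the example above. A real JSON tool such as jq would be more robust if it is installed.

```shell
# extract_operation_name is a hypothetical helper.
# It reads a JSON response on stdin, joins it into one line (the "name" key
# and its value may sit on different lines) and prints the value of "name".
extract_operation_name() {
  tr -d '\n' | sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample response taken from the example above (metadata shortened).
RESPONSE='{
  "name":
      "projects/123456/locations/us-central1/operations/20210406-15021617746540-606bd714-0000-2d87-9290-001a114b3fbf",
  "metadata": { "state": "RUNNING" }
}'

OPERATION="$(printf '%s' "$RESPONSE" | extract_operation_name)"
echo "$OPERATION"
```

In practice the curl output would be piped into extract_operation_name instead of the sample string.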

The operation name can be used to get the status of the translation request:

curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  "https://translation.googleapis.com/v3/projects/<PROJECT_ID>/locations/us-central1/operations/20210406-15021617746540-606bd714-0000-2d87-9290-001a114b3fbf"

For example:

{
  "name": "projects/123456/locations/us-central1/operations/20210406-15021617746540-606bd714-0000-2d87-9290-001a114b3fbf",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.translation.v3.BatchTranslateMetadata",
    "state": "RUNNING",
    "totalCharacters": "19121",
    "submitTime": "2021-04-06T22:11:31Z"
  }
}
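For scripting, the state could be read from such a response and checked against the terminal states. A minimal sketch: extract_state and is_finished are hypothetical helpers, and the state names other than RUNNING (SUCCEEDED, FAILED, CANCELLED) are taken from my reading of the v3 API reference rather than from the output above, so treat them as an assumption.

```shell
# extract_state is a hypothetical helper: it reads a JSON status response
# on stdin and prints the value of the "state" field.
extract_state() {
  tr -d '\n' | sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p'
}

# is_finished succeeds for states assumed to be terminal, fails otherwise.
is_finished() {
  case "$1" in
    SUCCEEDED|FAILED|CANCELLED) return 0 ;;
    *) return 1 ;;
  esac
}

# Shortened sample of the status response shown above.
SAMPLE='{ "metadata": { "state": "RUNNING", "totalCharacters": "19121" } }'
STATE="$(printf '%s' "$SAMPLE" | extract_state)"

if is_finished "$STATE"; then
  echo "finished with state $STATE"
else
  echo "still in progress: $STATE"
fi
```

Wrapped in a loop with a sleep between status requests, this would wait until the translation is done.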

When the operation has finished, the result can be downloaded from Google Cloud Storage via gsutil, e.g.

gsutil -m cp \
  "gs://YOUR-STORAGE-BUCKET/output/index.csv" \
  "gs://YOUR-STORAGE-BUCKET/output/YOUR-STORAGE-BUCKET_input_inputdata_ja_translations.tsv" \
  .
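The name of the translated file seems to follow a pattern: the bucket and path of the input file joined with underscores, followed by the target language and _translations. A sketch of that rule, inferred only from the example above; output_name is a hypothetical helper and assumes the input file has an extension:

```shell
# output_name is a hypothetical helper.
# $1: input URI (gs://bucket/path/file.ext), $2: target language code.
# The function body runs in a subshell, so its variables stay local.
output_name() (
  path="${1#gs://}"    # strip the scheme: bucket/path/file.ext
  ext="${path##*.}"    # file extension: ext
  stem="${path%.*}"    # everything before the extension: bucket/path/file
  flat="$(printf '%s' "$stem" | tr '/' '_')"   # slashes become underscores
  printf '%s_%s_translations.%s\n' "$flat" "$2" "$ext"
)

output_name "gs://YOUR-STORAGE-BUCKET/input/inputdata.tsv" "ja"
```

For the request used in this post, this prints the same filename as in the gsutil command above.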