Google Cloud Translate API Batch requests

After spending some time getting Google Cloud Translate API with batch requests running, I document this here for future me.

This step-by-step post requires the Google Cloud SDK to be installed!

First the API needs to be activated.

Second, we need a way to authenticate. I chose a service account with the rights to use the Translate API and to write to Google Cloud Storage. The service account key is downloaded as a JSON file, and the filename has to be set as an environment variable, i.e.

export GOOGLE_APPLICATION_CREDENTIALS=your-projectid-123456-d6835a365891.json

The API request is a JSON file too. This file has a specified structure. Mine looked like this:

{
  "sourceLanguageCode": "en",
  "targetLanguageCodes": ["ja"],
  "inputConfigs": [
    {
      "gcsSource": {
        "inputUri": "gs://YOUR-STORAGE-BUCKET/input/inputdata.tsv"
      }
    }
  ],
  "outputConfig": {
    "gcsDestination": {
      "outputUriPrefix": "gs://YOUR-STORAGE-BUCKET/output/"
    }
  }
}

Then I uploaded inputdata.tsv to Google Cloud Storage. I used the web interface, but gsutil -m cp inputdata.tsv gs://YOUR-STORAGE-BUCKET/input/ should work too.

And now finally the request to translate the tsv file.

curl -X POST \
  -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  https://translation.googleapis.com/v3/projects/<PROJECT_ID>/locations/us-central1:batchTranslateText

Replace request.json with the filename of your JSON file (see above) and <PROJECT_ID> with the ID of your Google Cloud project.
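If you generate the request file programmatically, a minimal Python sketch might look like this; the bucket path and language codes are just the placeholders from the example above:

```python
import json

def build_batch_request(source, targets, input_uri, output_prefix):
    """Build the request body for batchTranslateText (v3 REST shape)."""
    return {
        "sourceLanguageCode": source,
        "targetLanguageCodes": targets,
        "inputConfigs": [{"gcsSource": {"inputUri": input_uri}}],
        "outputConfig": {"gcsDestination": {"outputUriPrefix": output_prefix}},
    }

request = build_batch_request(
    "en", ["ja"],
    "gs://YOUR-STORAGE-BUCKET/input/inputdata.tsv",
    "gs://YOUR-STORAGE-BUCKET/output/",
)

# Write it out as request.json for the curl call above.
with open("request.json", "w") as f:
    json.dump(request, f, indent=2)
```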

The command returns the operation-id, i.e.

{
  "metadata": {
    "@type": "",
    "state": "RUNNING"
  }
}

This operation-id can be used to get the status of the translation request:

curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
  https://translation.googleapis.com/v3/projects/<PROJECT_ID>/locations/us-central1/operations/20210406-15021617746540-606bd714-0000-2d87-9290-001a114b3fbf

For example:

  "name": "projects/123456/locations/us-central1/operations/20210406-15021617746540-606bd714-0000-2d87-9290-001a114b3fbf",
  "metadata": {
    "@type": "",
    "state": "RUNNING",
    "totalCharacters": "19121",
    "submitTime": "2021-04-06T22:11:31Z"

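For polling in a script, a small helper can decide from the operation JSON whether the job is still running. The response shape follows the example above; the top-level "done" flag is part of Google's long-running operations convention, but treat the exact fields as an assumption and check against your own responses:

```python
def is_running(operation: dict) -> bool:
    """Return True while the batch translation is still in progress."""
    state = operation.get("metadata", {}).get("state")
    return state == "RUNNING" and not operation.get("done", False)

# Illustrative operation response, shaped like the example above:
op = {
    "name": "projects/123456/locations/us-central1/operations/20210406-...",
    "metadata": {"state": "RUNNING", "totalCharacters": "19121"},
}
```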
When finished, the results can be downloaded from Google Cloud Storage via gsutil, i.e.

gsutil -m cp \
  "gs://YOUR-STORAGE-BUCKET/output/index.csv" \
  "gs://YOUR-STORAGE-BUCKET/output/YOUR-STORAGE-BUCKET_input_inputdata_ja_translations.tsv" \
  .

Google Cloud Storage cleanup

My storage usage has grown quite a bit since I started using Google Cloud Run for a growing number of applications over the last months.
So I need to clean up a bit. Regularly.

If you have only a few applications, this can be done manually on the Container Registry website.
But I want to automate this.

The first step is to find out what is there:

gcloud container images list --repository=eu.gcr.io/$PROJECT_ID

This lists all services with container images in my europe registry folder.

Next list all images in one of the folders:

gcloud container images list-tags $IMAGE --limit=unlimited

$IMAGE is the full name of the folder, i.e. eu.gcr.io/$PROJECT_ID/helloworld

This returns a list of images with tags and dates, i.e.:

101211eecef5  latest  2021-01-19T14:01:04
83e3f7541cf5          2021-01-19T13:55:24

To delete an image we need the SHA256 digest of the image.

Same command as before, but with a --format flag at the end:

gcloud container images list-tags $IMAGE --limit=unlimited --format='get(digest)'

The result is only the sha256 digests, without tags and dates. So it is quite hard to decide if I want to delete an image. :(

Now delete one of the images, for safety reasons I chose the oldest one:

gcloud container images delete -q --force-delete-tags \
  $IMAGE@$DIGEST

Doing this manually feels wrong and is quite time consuming.

But others had this problem before.
A good starting point for me was this gist.
I made it a bit more verbose, but kept the approach of deleting images older than a given date.
Some forks improved the gist to keep a given number of images.
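The date-based selection can be sketched in Python. The (digest, timestamp) pairs would come from parsing the list-tags output; the digests and the exact input shape below are assumptions for illustration, not what the gist itself does:

```python
from datetime import datetime

def digests_older_than(images, cutoff):
    """Return digests of images created before the cutoff date.

    `images` is a list of (digest, iso_timestamp) pairs, e.g. parsed
    from `gcloud container images list-tags --format=json` (the exact
    JSON shape is an assumption here -- check your gcloud version).
    """
    limit = datetime.fromisoformat(cutoff)
    return [
        digest
        for digest, created in images
        if datetime.fromisoformat(created) < limit
    ]

# Made-up digests with the dates from the listing above:
images = [
    ("sha256:101211eecef5", "2021-01-19T14:01:04"),
    ("sha256:83e3f7541cf5", "2021-01-19T13:55:24"),
]
old = digests_older_than(images, "2021-01-19T14:00:00")
# Each returned digest can then be passed to:
#   gcloud container images delete -q --force-delete-tags $IMAGE@<digest>
```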

IMDB Ratings to SQLite

A few weeks ago I demoed Datasette at a small private conference.
One of my examples was the IMDB ratings CSV file converted to SQLite using csvs-to-sqlite.
The resulting SQLite database was usable but I was missing some facets and filtering possibilities.
To fix this I created imdb-to-sqlite.

imdb-to-sqlite adds the genres and directors as array fields.
Additionally, it adds year_rating and weekday_rating fields to make filtering by year and weekday easier.
These filters would be possible using SQL, but adding them as fields seems easier.
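The derived fields can be computed from the rating date with plain Python. The column and field names below are my assumptions for illustration, not necessarily what imdb-to-sqlite uses internally:

```python
from datetime import datetime

def derived_fields(date_rated: str) -> dict:
    """Split a rating date like '2021-04-06' into year and weekday fields."""
    d = datetime.strptime(date_rated, "%Y-%m-%d")
    return {
        "year_rating": d.year,
        "weekday_rating": d.strftime("%A"),  # weekday name, e.g. "Tuesday"
    }
```

Storing these as plain columns lets Datasette offer them directly as facets, instead of requiring a strftime() expression in every query.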

Try imdb-to-sqlite with your IMDB ratings and open issues if something should be added.