Run a compute job on fly.io

Using fly.io machines as compute instances is more of a "how would this work" experiment. For real tasks I mainly use GitHub Actions, AWS CodeBuild, AWS EC2 instances and of course "PCs" idling in the office.

The plan

Start a fly.io machine with a script that downloads the data from somewhere (could be via HTTP, S3, SCP, ...). The results have to be uploaded somewhere too. Because the results need to be written back, I chose Google Cloud Storage for both input and result data.

First step: Storage

Create a bucket. I chose Google Cloud, but there are alternatives: AWS S3, Azure, Digital Ocean Spaces, …. The bucket needs no public access, and one region is enough.

Then create a service account that can only read/write this bucket. A good tutorial on limiting access to a single bucket is https://tsmx.net/accessing-a-single-bucket-in-gcs/. Then create a private key (the json one) for the new IAM service account and save the json file in your project folder.
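
If you prefer the command line over the console UI, the same setup can be sketched with gcloud/gsutil — the project, bucket, and service account names below are placeholders for whatever you chose:

# create the bucket in a single region
gsutil mb -l europe-west3 gs://mfa-compute-demo

# create a dedicated service account
gcloud iam service-accounts create compute-demo

# grant read/write access on this bucket only
gsutil iam ch serviceAccount:compute-demo@YOUR-PROJECT.iam.gserviceaccount.com:roles/storage.objectAdmin gs://mfa-compute-demo

# create and download the json key
gcloud iam service-accounts keys create compute-demo.json --iam-account compute-demo@YOUR-PROJECT.iam.gserviceaccount.com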

Second step: the compute code

We start with uploading files to the bucket. This requires the google-cloud-storage package.
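
Install it first:

pip install google-cloud-storage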

The minimal script to upload:

from pathlib import Path
from google.cloud import storage

def upload_blob(bucket_name, file_name):
    # authenticates via GOOGLE_APPLICATION_CREDENTIALS
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    # store all input files under the data/ prefix
    blob = bucket.blob("data/" + file_name.name)
    blob.upload_from_filename(str(file_name))

if __name__ == "__main__":
    # upload every jpg in the current folder
    for fn in Path(".").glob("*.jpg"):
        upload_blob("mfa-compute-demo", fn)

Replace mfa-compute-demo with your bucket name! Then run the script with the first json file found as the credentials file, passed in an environment variable:

GOOGLE_APPLICATION_CREDENTIALS=$(ls *json) python upload.py

The full code for the compute job does the following:

  • downloads every file in the bucket with prefix="data/" (one at a time)

  • processes the file

  • uploads the result file to results/

  • moves the input file from data/ to done/

The full example code, using opencv to generate a histogram of the image, is in this repository: https://github.com/mfa/compute-fly.
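
A rough sketch of that loop, assuming the bucket layout from above — process is only a stub here, the repository contains the real opencv code:

from pathlib import Path
from google.cloud import storage

def process(path):
    # stub: the real code generates an opencv histogram;
    # this only writes a placeholder result file next to the input
    result = path.with_suffix(".txt")
    result.write_text(f"processed {path.name}\n")
    return result

def main(bucket_name):
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    for blob in client.list_blobs(bucket_name, prefix="data/"):
        if blob.name.endswith("/"):
            continue  # skip folder placeholder objects
        local = Path("/tmp") / Path(blob.name).name
        blob.download_to_filename(str(local))
        print("process", blob.name)
        result = process(local)
        bucket.blob("results/" + result.name).upload_from_filename(str(result))
        # rename copies the object to done/ and deletes the original in data/
        bucket.rename_blob(blob, "done/" + Path(blob.name).name)

if __name__ == "__main__":
    main("mfa-compute-demo")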

Third step: run on fly.io machines

We need a Dockerfile, for example:

FROM python:3.11-slim-bullseye

# system libraries needed by opencv (probably more than necessary, but it works)
RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6

COPY . /app
RUN pip install -r /app/requirements.txt
WORKDIR /app

CMD ["sh", "start.sh"]

and a start script:

#!/bin/sh

# use the first json file found as the credentials file
GOOGLE_APPLICATION_CREDENTIALS=$(ls *json) python compute.py

We should not package unnecessary files from the current folder, so add a .dockerignore:

# ignore everything
**

# except
!compute.py
!start.sh
!*json
!requirements.txt
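
For completeness, the requirements.txt referenced in the Dockerfile could be as small as this (opencv-python is my assumption; check the repository for the exact dependencies):

google-cloud-storage
opencv-python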

Then create a fly machines app with:

fly apps create --machines

and run it:

fly m run . -a broken-sun-3007 --restart no --rm

The parameter -a broken-sun-3007 is the name your app got in the create command. The other parameters make the machine run only once and remove it after a successful run.

On the monitoring page in the fly.io dashboard everything seems to work out:

app[4d891d64c05618] ams [info] Starting init (commit: ed64554)...
app[4d891d64c05618] ams [info] Preparing to run: `sh start.sh` as root
app[4d891d64c05618] ams [info] 2023/04/16 14:33:13 listening on [fdaa:0:2d6a:a7b:10e:1572:a783:2]:22 (DNS: [fdaa::3]:53)
app[4d891d64c05618] ams [info] process data/PXL_20220822_160749634.jpg
app[4d891d64c05618] ams [info] process data/PXL_20220830_132336336.jpg
app[4d891d64c05618] ams [info] process data/PXL_20220902_082232091.jpg
app[4d891d64c05618] ams [info] Starting clean up.
app[4d891d64c05618] ams [info] [ 6.131131] reboot: Restarting system
runner[4d891d64c05618] ams [info] machine restart policy set to 'no', not restarting

Conclusion

Running jobs on fly.io machines, similar to AWS CodeBuild or GitHub Actions, works. But maybe this is not the way fly.io wants us to use their infrastructure.

Add OpenStreetMap login to Datasette

I plan a project with geodata and some user interaction. Of course the plan is to use Datasette to build this idea. The first step is to add some kind of authentication, and because of the geodata domain, using OpenStreetMap OAuth2 seems like the logical choice.

Writing a plugin to log in with OSM was mostly copy/paste and some minor changes to the code from a project by Simon Willison: datasette-auth0.

The readme for the datasette-auth-osm plugin contains all the setup instructions, and a demo shows what it looks like. After a successful login the OpenStreetMap username is shown at the top right of the navbar.
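
For reference, a minimal configuration sketch — the exact setting names are in the plugin readme; client_id and client_secret here are my assumption, by analogy with datasette-auth0:

pip install datasette-auth-osm

Then in metadata.json (the $env syntax keeps the secrets out of the file):

{
    "plugins": {
        "datasette-auth-osm": {
            "client_id": {"$env": "OSM_CLIENT_ID"},
            "client_secret": {"$env": "OSM_CLIENT_SECRET"}
        }
    }
}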

Wayland screenshot

In a previous post I described how I upload screenshots to a bepasty-server. Since then I switched from xorg/i3 to wayland/sway and the screenshot script changed quite a bit.

My current takescreenshot.sh looks like this:

#!/bin/bash

# select an area (slurp) and take the screenshot (grim)
grim -g "$(slurp)" /tmp/screen.png
if [ "$?" -eq "0" ]; then
    # put the image into the clipboard, e.g. for pasting into slack
    wl-copy < /tmp/screen.png
    result=$(python ~/bin/bepasty-upload-image.py -p PASSWORD -u https://some-paste-bin-url.example -f /tmp/screen.png)
    if [ "$?" -ne "0" ]; then
        echo "$result"; exit 1
    fi
    # open the paste-bin URL (first line of the upload output) in a browser
    xdg-open "$(echo "$result" | head -n 1)"
fi

The tools used are slurp to select the screen or an area of the screen and grim to save the selection as an image. The image is saved in /tmp, and if the return code from grim is 0, the copy and upload are started. This return-code check allows me to press Escape to abort without triggering the upload. My current most frequent workflow is to paste the image into Slack, so the image is put into the clipboard via wl-copy. Then the image is uploaded to the paste-bin and a browser window is opened showing the image in the paste-bin.
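
To trigger the script with a key, a sway binding like this works (the Print keysym is just an example):

# in ~/.config/sway/config
bindsym Print exec ~/bin/takescreenshot.sh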