Run a compute job on fly.io

Using fly.io machines as compute instances is more of a "how would this work" experiment. For real tasks I mainly use GitHub Actions, AWS CodeBuild, AWS EC2 instances, and of course "PCs" idling in the office.

The plan

Start a fly.io machine with a script that downloads the data from somewhere (could be via HTTP, S3, SCP, ...). The results have to be uploaded somewhere too. Because results need to be written back, I chose Google Cloud Storage for both input and result data.

First step: Storage

Create a bucket. I chose Google Cloud, but there are alternatives: AWS S3, Azure, Digital Ocean Spaces, …. The bucket needs no public access, and one region is enough.
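If you prefer code over the console, a minimal sketch for creating the bucket (the region is just an example; this runs with your own credentials, not the service account created below):

from google.cloud import storage

# create the bucket in a single region; location is an example
client = storage.Client()
client.create_bucket("mfa-compute-demo", location="europe-west1")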

Then create a service account that can only read/write this bucket. A good tutorial on limiting access to a single bucket is https://tsmx.net/accessing-a-single-bucket-in-gcs/. Then create a private key (the JSON one) for the new IAM service account and save the JSON file in your project folder.
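A quick way to verify that the key works and can reach the bucket (the file name compute-demo.json is an assumption; use your downloaded key file):

from google.cloud import storage

# list the bucket's objects using only the service account's key;
# this fails if the key or the bucket permissions are wrong
client = storage.Client.from_service_account_json("compute-demo.json")
for blob in client.list_blobs("mfa-compute-demo"):
    print(blob.name)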

Second step: the compute code

We start with uploading files to the bucket. This requires the google-cloud-storage package.

The minimal script to upload:

from pathlib import Path
from google.cloud import storage

def upload_blob(bucket_name, file_name):
    # upload a local file into the bucket under the "data/" prefix
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob("data/" + file_name.name)
    blob.upload_from_filename(str(file_name))

if __name__ == "__main__":
    # upload every jpg in the current folder
    for fn in Path(".").glob("*.jpg"):
        upload_blob("mfa-compute-demo", fn)

Replace mfa-compute-demo with your bucket name! Then run the script with the JSON key file as credentials, passed via an environment variable:

GOOGLE_APPLICATION_CREDENTIALS=$(ls *json) python upload.py

The full code for the compute job does the following (sketched below):

  • downloads every file in the bucket with prefix="data/" (one at a time)

  • processes the file

  • uploads the result file to results/

  • moves the input file from data/ to done/

The full example code using opencv to generate a histogram of the image is in this repository: https://github.com/mfa/compute-fly.
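Condensed into a sketch, the loop could look roughly like this (simplified, not the exact repository code; process() is a stub for the opencv histogram part):

from pathlib import Path
from google.cloud import storage

BUCKET = "mfa-compute-demo"  # replace with your bucket

def process(file_name):
    # stub: the repository generates a histogram with opencv here
    result_name = file_name.with_suffix(".result.png")
    ...
    return result_name

def main():
    client = storage.Client()
    bucket = client.bucket(BUCKET)
    for blob in client.list_blobs(BUCKET, prefix="data/"):
        local_file = Path(Path(blob.name).name)
        blob.download_to_filename(str(local_file))
        print("process", blob.name)
        result_file = process(local_file)
        bucket.blob("results/" + result_file.name).upload_from_filename(str(result_file))
        # "move" is a rename inside the bucket (copy + delete under the hood)
        bucket.rename_blob(blob, blob.name.replace("data/", "done/", 1))

if __name__ == "__main__":
    main()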

Third step: run on fly.io machines

We need a Dockerfile, for example:

FROM python:3.11-slim-bullseye

# for opencv (pulls in a lot of packages; maybe too much, but works)
RUN apt-get update && apt-get install -y ffmpeg libsm6 libxext6

COPY . /app
RUN pip install -r /app/requirements.txt
WORKDIR /app

CMD ["sh", "start.sh"]

and a start script:

#!/bin/sh

# use the first json file in the folder as credentials
GOOGLE_APPLICATION_CREDENTIALS=$(ls *json) python compute.py
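The requirements.txt referenced in the Dockerfile is not shown here; presumably it contains at least (exact versions up to you):

google-cloud-storage
opencv-python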

We should not package unnecessary files from the current folder, so add a .dockerignore:

# ignore everything
**

# except
!compute.py
!start.sh
!*json
!requirements.txt

Then create a fly machines app with:

fly apps create --machines

and run it:

fly m run . -a broken-sun-3007 --restart no --rm

The parameter -a broken-sun-3007 is the app name generated by the create command. The other parameters ensure the machine runs only once (--restart no) and is removed after a successful run (--rm).

On the Monitoring page of the fly.io dashboard everything seems to work out:

app[4d891d64c05618] ams [info] Starting init (commit: ed64554)...
app[4d891d64c05618] ams [info] Preparing to run: `sh start.sh` as root
app[4d891d64c05618] ams [info] 2023/04/16 14:33:13 listening on [fdaa:0:2d6a:a7b:10e:1572:a783:2]:22 (DNS: [fdaa::3]:53)
app[4d891d64c05618] ams [info] process data/PXL_20220822_160749634.jpg
app[4d891d64c05618] ams [info] process data/PXL_20220830_132336336.jpg
app[4d891d64c05618] ams [info] process data/PXL_20220902_082232091.jpg
app[4d891d64c05618] ams [info] Starting clean up.
app[4d891d64c05618] ams [info] [ 6.131131] reboot: Restarting system
runner[4d891d64c05618] ams [info] machine restart policy set to 'no', not restarting

Conclusion

Running jobs on fly.io machines, similar to AWS CodeBuild or GitHub Actions, works. But maybe this is not how fly.io wants us to use their infrastructure.