Deploy infrequently used site to Google Cloud Run

I am running a proxy that converts aclanthology people pages to Atom feeds at https://acl-feed.madflex.de/. The site gets next to no traffic and is purely CPU bound. A dedicated fly.io instance is a waste of resources because it runs all the time, so let's move the app over to Google Cloud Run and run it only when called. I may have to revert this and move elsewhere, because the Cloud Run free egress limits are not that high.

The first step is to replace the Procfile with a Dockerfile, because Google Cloud Run deploys container images. For this app the Dockerfile could look like this:

FROM python:3.11

# install the dependencies first so this layer is cached between builds
COPY requirements.txt /tmp
RUN pip install --no-cache-dir -r /tmp/requirements.txt

COPY app /app

# PORT is only a default for local runs; Cloud Run overrides it at runtime
ENV PORT=8000
EXPOSE 8000
WORKDIR /

CMD ["gunicorn", "app.main:app"]

No further changes to the code are needed: Cloud Run injects the PORT environment variable at runtime, and gunicorn defaults its bind address to 0.0.0.0:$PORT whenever PORT is set.
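Before deploying, the image can be smoke-tested locally (assuming Docker is installed; the image tag is arbitrary):

docker build -t acl-feed .
docker run --rm -p 8000:8000 acl-feed

The app should then answer on http://localhost:8000/.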

The second step is to build the container image and deploy it to Cloud Run:

gcloud builds submit --tag eu.gcr.io/<PROJECT_ID>/acl-feed
gcloud run deploy acl-feed --image eu.gcr.io/<PROJECT_ID>/acl-feed --allow-unauthenticated

The last command returns the URL of the deployed service.
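The custom domain mapping was set up in the Google UI here; presumably the same can be done on the command line with something like this (the command group may still require beta, depending on the gcloud version):

gcloud beta run domain-mappings create --service acl-feed --domain acl-feed.madflex.de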

The final step took the longest: setting up DNS. Adding the CNAME record is easy, but waiting until Google picks up the change can take a while. After 15 minutes I got impatient, deleted the domain mapping in the Google UI, and added it again. The custom domain then turned green in the Google console, but browsing the site returned:

Secure Connection Failed
An error occurred during a connection to acl-feed.madflex.de.
Error code: PR_END_OF_FILE_ERROR

Some more waiting fixed this. So the move was NOT zero downtime for my app. That is no issue for my small pet project here, but it can be one for "real" websites.
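Instead of hitting reload in the browser, a small poll loop can watch for the mapping to come up (a sketch using requests; the interval is arbitrary):

import time

import requests

# poll the custom domain until the TLS certificate is served and the app answers
while True:
    try:
        r = requests.get("https://acl-feed.madflex.de/", timeout=10)
        print("up, status:", r.status_code)
        break
    except requests.exceptions.RequestException as exc:
        print("not ready yet:", exc)
        time.sleep(60)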

Deploying via a GitHub Action, the same way as for fly.io, seems possible but looks like a lot more work to set up. Maybe another time or for another project.

Prune old statuses

I am running a gotosocial instance for my personal experiments. One of the bots running there is stuendlich@cress.space, which toots the current time every hour to give a feeling for time in my timeline.

I wanted to prune the old toots of this bot, because they are useless once they are more than a few days old. Mastodon has a built-in feature to prune old statuses, but gotosocial is currently missing it. There is a ready-to-use Python script for Mastodon named ephemetoot, but it fails for gotosocial because the server version reported is not bigger than 1.0 (the minimum Mastodon version ephemetoot expects). So I had to write code against the API myself (again).

First, get the account id of the currently logged-in user (the headers carry the usual Bearer token):

import os
import requests

# API token of the bot account; the environment variable name is arbitrary
headers = {"Authorization": f"Bearer {os.environ['GTS_TOKEN']}"}
r = requests.get(
    url="https://fedi.cress.space/api/v1/accounts/verify_credentials",
    headers=headers,
)
account_id = r.json()["id"]

Then get the next 30 statuses (30 per page is the gotosocial default):

max_id = None  # None fetches the newest page; requests drops None params
r = requests.get(
    url=f"https://fedi.cress.space/api/v1/accounts/{account_id}/statuses",
    headers=headers,
    params={"max_id": max_id},
)

The max_id has to be set to the id of the last status of the previous page to get the next 30 statuses. This is how the Mastodon API does pagination.
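Put together, the pagination loop could look like this (a sketch; it simply collects everything until a page comes back empty):

statuses = []
max_id = None
while True:
    r = requests.get(
        url=f"https://fedi.cress.space/api/v1/accounts/{account_id}/statuses",
        headers=headers,
        params={"max_id": max_id},
    )
    page = r.json()
    if not page:
        break  # no older statuses left
    statuses.extend(page)
    max_id = page[-1]["id"]  # continue after the last status seen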

And finally, delete a status:

r = requests.delete(
    url=f"https://fedi.cress.space/api/v1/statuses/{status_id}",
    headers=headers,
)
assert r.status_code == 200

A successful delete returns status code 200; when the rate limit kicks in, the API returns 429.

Deleting a lot of statuses obviously runs into the rate limit. Which is good! So more than one run (after waiting for 15 minutes) may be needed to prune all old toots. The script stops on purpose when it hits the rate limit.
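Combined with the pagination loop above, the pruning boils down to something like this (a sketch, not the exact script; the three-day cutoff is an assumption for "a few days"):

from datetime import datetime, timedelta, timezone

cutoff = datetime.now(timezone.utc) - timedelta(days=3)  # assumed cutoff

for status in statuses:
    created_at = datetime.fromisoformat(status["created_at"].replace("Z", "+00:00"))
    if created_at > cutoff:
        continue  # still recent enough to keep
    r = requests.delete(
        url=f"https://fedi.cress.space/api/v1/statuses/{status['id']}",
        headers=headers,
    )
    if r.status_code == 429:
        raise SystemExit("rate limited, run the script again later")
    assert r.status_code == 200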

The full code of the script is in the stuendlich-bot GitHub repo.

The script runs in a GitHub Action once a day.
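The scheduling part of such a workflow is tiny (the time of day is an assumption):

on:
  schedule:
    - cron: "0 5 * * *"  # once a day at 05:00 UTC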

Monitor paths with Systemd

After experimenting with AWS S3 triggering events on PutObject, it is time to revisit and document event triggering with systemd.

In the past I used this to trigger image recognition on images uploaded into a folder. This is done with systemd path units. Two unit files are needed: one monitors a path, and the other is the service that is executed when the first one fires. Assume we have a folder that receives uploaded images and a script that should run whenever a new image is added.

First, a path unit (image.path) watches the /somewhere/inbox folder for changes:

[Unit]
Description=watch inbox folder on changes

[Path]
PathChanged=/somewhere/inbox
Unit=image.service

[Install]
WantedBy=multi-user.target

The unit named by Unit= is the service that is started on a change. Saved as image.service, it could look like this:

[Unit]
Description=Run image processing

[Service]
Type=simple
ExecStart=/somewhere/processing.sh

[Install]
WantedBy=multi-user.target

Both files have to live in /etc/systemd/system/ (or be symlinked from there to a versioned folder). Now activate the two units:

sudo systemctl enable image.path image.service
sudo systemctl start image.path

The filename is not sent to the service, so the service has to glob the watched folder for new files and should claim each file by moving it before processing. A file could still be processed twice if two events fire in quick succession: by default, systemd allows at most 200 triggers per 2 seconds, and this can be tuned with TriggerLimitIntervalSec= and TriggerLimitBurst=. Processing an image twice would be no problem here and is very unlikely anyway, because images currently arrive about once per hour.
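For example, tightening the limit to at most 10 activations per minute would look like this in the [Path] section (an illustration, not the config used here):

[Path]
TriggerLimitIntervalSec=60
TriggerLimitBurst=10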

An example processing script may look like this:

#!/bin/sh

IN=/somewhere/inbox
OUT=/somewhere/processing

# -p: do not fail if the folder already exists
mkdir -p "$OUT"
for fn in "$IN"/*; do
    [ -f "$fn" ] || continue
    base_fn=$(basename "$fn")
    # claim the file by moving it out of the watched folder first
    mv "$fn" "$OUT/"
    echo "process: $base_fn"
    python process.py "$OUT/$base_fn"
done

Everything written to stdout (echo, print, ...) ends up in the systemd journal and can be read with journalctl -u image.service.