Adding an Atom Feed to Authors in ACL Anthology

Andreas

2022-02-20 19:00

Reading every abstract of every paper in the Computational Linguistics field is a full-time job. So I wanted to filter the papers down to the authors in my particular subfield. Even when they submit a paper to a workshop or a conference I am not actively following, I want to get notified. An RSS or Atom feed seems like the best choice for this.

Unfortunately there is no author feed for the ACL Anthology. Adding one was discussed in an issue on GitHub. Since I have no idea how to implement this feature in Hugo, I solved the problem by writing my own feed generator. It is called acl-feed; it reads the HTML page of an author on the Anthology and converts their publications into an Atom feed.

The code is written in Python using (async) Flask, beautifulsoup and feedgen.
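A minimal sketch of the parsing step follows. It assumes, hypothetically, that each paper title on an author page is a link with the CSS class align-middle; the real Anthology markup may differ and should be checked in the browser first. To stay dependency-free it uses the standard library's html.parser instead of beautifulsoup, and the names PaperLinkParser and parse_papers are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://aclanthology.org"


class PaperLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs from links with a given CSS class.

    HYPOTHETICAL markup assumption: every paper title on an author page is
    an <a> element whose class list contains `link_class`.
    """

    def __init__(self, link_class="align-middle"):
        super().__init__()
        self.link_class = link_class
        self.papers = []
        self._href = None  # href of the paper link we are currently inside
        self._text = []    # text chunks seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.link_class in classes and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace; titles may contain nested tags like <i>
            title = " ".join("".join(self._text).split())
            self.papers.append((title, urljoin(BASE_URL, self._href)))
            self._href = None


def parse_papers(html, link_class="align-middle"):
    """Return the (title, url) pairs found in an author page's HTML."""
    parser = PaperLinkParser(link_class)
    parser.feed(html)
    parser.close()
    return parser.papers
```

With beautifulsoup the same lookup would roughly be a CSS selector such as soup.select("a.align-middle"), which is shorter but adds a dependency.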

The live version is hosted at https://acl-feed.madflex.de/ and looks like this:

Since the HTML page contains no publication dates, the updated timestamp in the feed is always the current date and time. This could only be solved by generating the feed from the full Anthology data. So once the ACL Anthology generates author feeds itself, I will happily switch this service off and use theirs.
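What this means for the generated feed can be sketched as follows: the feed and every entry get the same current timestamp as their updated value. This illustration uses xml.etree.ElementTree rather than feedgen (which acl-feed actually uses), and build_atom_feed, its parameters and any feed URL passed to it are hypothetical. Atom (RFC 4287) requires id, title and updated on the feed and on each entry, plus an author on the feed unless every entry has one.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def build_atom_feed(feed_url, title, author_name, papers, now=None):
    """Return an Atom feed (bytes) for a list of (title, url) tuples.

    The author page has no publication dates, so every <updated> element,
    on the feed and on each entry, is set to the same timestamp `now`.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.isoformat(timespec="seconds")

    def sub(parent, tag, text=None, **attrs):
        el = ET.SubElement(parent, f"{{{ATOM_NS}}}{tag}", attrs)
        if text is not None:
            el.text = text
        return el

    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    sub(feed, "id", feed_url)
    sub(feed, "title", title)
    sub(feed, "updated", stamp)
    sub(feed, "link", href=feed_url, rel="self")
    author = sub(feed, "author")
    sub(author, "name", author_name)
    for paper_title, url in papers:
        entry = sub(feed, "entry")
        sub(entry, "id", url)  # the paper URL doubles as a stable entry id
        sub(entry, "title", paper_title)
        sub(entry, "updated", stamp)
        sub(entry, "link", href=url)
    return ET.tostring(feed, encoding="utf-8", xml_declaration=True)
```

Using the paper URL as the entry id keeps ids stable between requests, so a feed reader should still be able to recognize papers it has already seen even though the updated timestamps change on every fetch.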

Sort files after glob in Pathlib

Andreas

2022-02-09 19:00

The data from all my sensors is stored in CSV files. Today I wanted to create a dashboard showing the latest values of some sensors. The current CO2 value is especially interesting here, to see when some fresh air would be a good idea.

So I experimented with glob in pathlib to get the newest file for a sensor.

>>> from pathlib import Path
>>>
>>> # glob yields paths in arbitrary (filesystem) order - not helpful here
>>> list(Path(".").glob("**/*SCD30.csv"))[0]
PosixPath('2021/07/24/2021-07-24_SCD30.csv')
>>>
>>> # sorted sorts by folder/filename. this works fine if the filenames are sortable
>>> sorted(Path(".").glob("**/*SCD30.csv"))[-1]
PosixPath('2022/02/09/2022-02-09_SCD30.csv')
>>>
>>> # reverse the sorting to get the newest on top
>>> sorted(Path(".").glob("**/*SCD30.csv"), reverse=True)[0]
PosixPath('2022/02/09/2022-02-09_SCD30.csv')
>>>
>>> # sort by modification time and get the newest first
>>> sorted(Path(".").glob("**/*SCD30.csv"), key=lambda x: x.stat().st_mtime, reverse=True)[0]
PosixPath('2022/02/09/2022-02-09_SCD30.csv')
>>>

For my dashboard code the filename-based sorting is good enough, but it is good to know that sorting by modification time could handle a more chaotic folder structure.
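Since only the single newest file is needed, max() with the same key does the job in one pass without sorting the whole list. A small sketch; newest_file and its by_mtime switch are hypothetical names, not part of the dashboard code:

```python
from pathlib import Path


def newest_file(folder, pattern, by_mtime=False):
    """Return the newest file matching `pattern` below `folder`, or None.

    By default this relies on sortable, date-based paths such as
    2022/02/09/2022-02-09_SCD30.csv. With by_mtime=True it compares
    modification times instead, which also works for messier layouts.
    """
    key = (lambda p: p.stat().st_mtime) if by_mtime else None
    # default=None avoids a ValueError when nothing matches
    return max(Path(folder).glob(pattern), key=key, default=None)
```

For example, newest_file(".", "**/*SCD30.csv") would return the same path as the sorted(...)[-1] call above.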

Image tagger to train image tag recognition

Andreas

2022-02-06 18:00

For the second part of the machine learning workshop I need to tag images. My first solution was to display each image in fullscreen and then manually copy it to a folder named after the tag. But this workflow feels very inefficient, so I built a pygame-based tool for it: https://github.com/mfa/image-tagger.

The tags are defined in a YAML file (the tagset), and the tags of an image are saved when the next image is shown (right arrow key). When tagging is finished, the files with a specific tag are copied by a simple Python script: https://github.com/mfa/image-tagger/blob/main/copy_filtered.py.
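The post does not describe how the tags are stored, so this sketch of the copy step assumes a hypothetical JSON file mapping image file names to lists of tags. copy_tagged and that file layout are illustrations, not the actual contents of copy_filtered.py.

```python
import json
import shutil
from pathlib import Path


def copy_tagged(tags_file, image_dir, target_dir, tag):
    """Copy every image tagged with `tag` into target_dir.

    HYPOTHETICAL storage format: tags_file is JSON mapping image file
    names to lists of tags, e.g. {"img_001.jpg": ["cat", "outdoor"]}.
    Returns the copied file names in sorted order.
    """
    tags = json.loads(Path(tags_file).read_text())
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name, image_tags in sorted(tags.items()):
        if tag in image_tags:
            # copy2 also keeps file metadata such as the modification time
            shutil.copy2(Path(image_dir) / name, target / name)
            copied.append(name)
    return copied
```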

Using pygame, I learned a lot about events and redrawing images. Because drawing performance isn't that important here, I haven't optimized it. Limiting the frame rate with clock.tick(30) was primarily meant to avoid unnecessary CPU usage.
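A stripped-down sketch of such an event loop, under assumptions: the right arrow key saves the tags and moves to the next image (as described above), while toggling tags with the number keys 1 to 9 is a hypothetical binding. The tagging logic sits in a plain TaggingState class (a made-up name) so it can be tested without a display, and pygame is only imported inside run(). Images are drawn unscaled and loaded from disk only when the index changes.

```python
from dataclasses import dataclass, field


@dataclass
class TaggingState:
    """Pure tagging logic, kept separate from pygame so it is easy to test."""
    images: list
    tagset: list
    index: int = 0
    current: set = field(default_factory=set)
    saved: dict = field(default_factory=dict)

    def toggle(self, number):
        """Toggle the tag at 0-based position `number` in the tagset."""
        if 0 <= number < len(self.tagset):
            self.current ^= {self.tagset[number]}

    def next_image(self):
        """Save tags for the current image, then advance; False at the end."""
        self.saved[self.images[self.index]] = sorted(self.current)
        if self.index + 1 >= len(self.images):
            return False
        self.index += 1
        self.current = set(self.saved.get(self.images[self.index], []))
        return True


def run(state, size=(1024, 768)):
    import pygame  # imported lazily so TaggingState works without pygame

    pygame.init()
    screen = pygame.display.set_mode(size)
    font = pygame.font.Font(None, 32)  # pygame's default font
    clock = pygame.time.Clock()
    shown, image = None, None
    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
            elif event.type == pygame.KEYDOWN:
                if event.key == pygame.K_RIGHT:
                    if not state.next_image():
                        running = False  # last image tagged, stop
                elif pygame.K_1 <= event.key <= pygame.K_9:
                    state.toggle(event.key - pygame.K_1)
        if shown != state.index:  # only hit the disk when the image changes
            image = pygame.image.load(state.images[state.index])
            shown = state.index
        screen.fill((0, 0, 0))
        screen.blit(image, (0, 0))
        for row, tag in enumerate(state.tagset):
            color = (0, 255, 0) if tag in state.current else (128, 128, 128)
            label = font.render(f"{row + 1}: {tag}", True, color)
            screen.blit(label, (10, 10 + row * 30))
        pygame.display.flip()
        clock.tick(30)  # cap at 30 fps so the loop doesn't busy-spin the CPU
    pygame.quit()
    return state.saved
```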

Screenshot of the current state of the tool:

screenshot