Meinsack.click 2023 update
With the new year the city of Stuttgart changed the company that collects the recyclable waste (Gelber Sack). Unlike the city's version, my version of this dataset (meinsack.click) should not break historical data. My solution was to crawl all calendar files from the new website and cluster them to find the 15 districts that share the same collection dates, then match those clusters to the old districts and verify a few of them manually.
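The clustering step can be sketched like this: districts whose calendars contain exactly the same set of collection dates land in the same group. The district names and dates below are illustrative placeholders, not the real crawled data.

```python
from datetime import date

# Hypothetical input: district slug -> collection dates parsed from its
# calendar file (the real crawl has one entry per city-website calendar).
calendars = {
    "vaihingen-mitte": [date(2023, 1, 5), date(2023, 1, 19)],
    "vaihingen-sued": [date(2023, 1, 5), date(2023, 1, 19)],
    "degerloch": [date(2023, 1, 6), date(2023, 1, 20)],
}

# Cluster: the frozenset of dates acts as the group key, so districts
# with identical collection schedules end up in one cluster.
clusters: dict[frozenset, list[str]] = {}
for district, dates in calendars.items():
    clusters.setdefault(frozenset(dates), []).append(district)

for dates, districts in clusters.items():
    print(sorted(districts))
```

With the real data this yields the 15 clusters mentioned above; each cluster is then matched against an old district by comparing its dates.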
The calendar files generated by the city website are not RFC 5545 compliant and therefore could not be parsed by the ics.py library. I had to add the missing PRODID line to the files after loading them, like this:
```python
import httpx
from ics import Calendar

response = httpx.get(
    "https://www.gelbersack-stuttgart.de/abfuhrplan/export/vaihingen-mitte",
    params={"type": 201},
)
data = response.text.split("\n")
# insert PRODID right after BEGIN:VCALENDAR, needed for the
# ical file to be RFC 5545 compliant
data.insert(1, "PRODID:-//placeholder//text//EN")
cal = Calendar("\n".join(data))
```
The results after clustering and mapping are a lot easier to import into a SQL database. No more fiddling with manually created PDFs (and the very subtle differences between them).
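As a minimal sketch of that import step, one row per district and collection date is enough; the table and column names here are illustrative, not the actual meinsack.click schema.

```python
import sqlite3

# In-memory database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pickup (district TEXT, date TEXT)")

# Hypothetical cluster output: (district, ISO date) pairs.
rows = [
    ("vaihingen-mitte", "2023-01-05"),
    ("vaihingen-mitte", "2023-01-19"),
]
conn.executemany("INSERT INTO pickup (district, date) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM pickup").fetchone()[0]
print(count)  # 2
```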