Import Images from Tarfile into SQLite

After processing all my image tarfiles and saving a thumbnail tarfile per folder, it is time to store all the thumbnails in an SQLite database.

The great sqlite-utils library from Simon Willison already has an insert-files command, but it needs files on disk (or stdin magic). Reading images from a tarfile is probably too niche to add as a feature, so the code below uses the Python API of sqlite-utils instead of the CLI.

Reading images from a tarfile and writing them into SQLite in Python:

import tarfile
from pathlib import Path

import click
from sqlite_utils.db import Database

@click.command()
@click.option("-f", "--thumbfile")
def main(thumbfile):
    db = Database("gopro_images.db")
    table = db["thumbnails"]

    thumb_fn = Path(thumbfile)
    folder_name = thumb_fn.parent.name
    with tarfile.open(thumb_fn, "r") as tar_thumbs:
        for member in tar_thumbs:
            # skip anything that is not a regular file (e.g. directories)
            if not member.isfile():
                continue
            name = member.name.split("/")[-1]
            fp = tar_thumbs.extractfile(member)
            contents = fp.read()

            # upsert keyed on "<folder>/<image name>" so re-runs don't create duplicates
            table.upsert({
                "id": f"{folder_name}/{name}",
                "content": contents,
                "size": len(contents),
            }, pk="id")

if __name__ == "__main__":
    main()

The "id" field is the foldername and the name of the image from the tarfile.

Thanks to the datasette-render-images plugin, the images (stored as binary content) can be rendered within Datasette. Example screenshot with some metadata (date and bike):

/images/render_gopro_thumbnails.jpg

Update CPU cooling after some years

I still use the PC I built in 2013. Back then the CPU was an AMD FX-4350. A few years later, in 2017, I replaced it with a then cheap AMD FX-8350 with 8 cores at 4 GHz. That was the maximum possible for the AM3+ mainboard, but it needed better cooling than the previous CPU. So I added a "be quiet! BK009" cooler and some thermal paste. Of course I have changed other things since then too: the GPU, more memory, SSDs, …

Now, in 2023, the summers are getting warmer and the CPU was running at 80°C while gaming. Time to renew the thermal paste. This time I used MX-4 from Arctic.

CPU after removing the cooler (1), clean (2), and after applying the new thermal paste (3):
(images: before, clean, after)

Now, after renewing the thermal paste and remounting the cooler, the CPU runs at 20°C at idle (which is below room temperature at the moment), and even under maximum load (playing an AAA game) I didn't get the temperature above 70°C.

Parse Wikidata Entries

Introduction

Parsing Wikidata entries is not that trivial. Every piece of information associated with an Entry is stored as claims, which are identified by a Property id. So either build your own lookup for the Properties you need and correlate them with the JSON export of an Entry, or find a library that does that.

For example, the Entry for the old Stuttgart train station, the Bonatzbau (Q613766):
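
The raw JSON export of an Entry can be fetched from the Special:EntityData endpoint, for example with requests (a minimal sketch, not necessarily how you would do it in production):

import requests

# the Special:EntityData endpoint returns the full Entry as JSON
url = "https://www.wikidata.org/wiki/Special:EntityData/Q613766.json"
entry = requests.get(url).json()
claims = entry["entities"]["Q613766"]["claims"]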

The inception year is "P571" which looks in the JSON like this (references removed):

"P571": [
  {
    "mainsnak": {
      "snaktype": "value",
      "property": "P571",
      "hash": "51b760062c35d828aa817b95777ea0830b3c21ba",
      "datavalue": {
        "value": {
          "time": "+1922-00-00T00:00:00Z",
          "timezone": 0,
          "before": 0,
          "after": 0,
          "precision": 9,
          "calendarmodel": "http://www.wikidata.org/entity/Q1985727"
        },
        "type": "time"
      },
      "datatype": "time"
    },
    "type": "statement",
    "id": "Q613766$5A06D69F-843D-4761-9228-537E0F56DB53",
    "rank": "normal",
    "references": [
       "<snip>"
    ]
  }
]

But the only thing I need is the year, which is defined by the timestamp and the precision (9 stands for year). Figuring this out for every Property seems like a lot of work.
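
For a single Property like P571 the extraction itself is short. A sketch against the claims fetched above, handling only the simple case of a positive four-digit year:

# take the first (and only) P571 statement
value = claims["P571"][0]["mainsnak"]["datavalue"]["value"]

# precision 9 means the timestamp is only accurate to the year
if value["precision"] == 9:
    print(int(value["time"][:5]))  # sign plus four digits of "+1922-..." -> 1922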

But thanks to open source, someone has done this already.

Wikipedia tools (for Humans)

Link: https://github.com/siznax/wptools

Example usage:

import wptools

page = wptools.page(wikibase="Q613766", silent=True)
page.get_wikidata()

# inception year
print(page.data["claims"]["P571"])
# -> ['+1922-00-00T00:00:00Z']

# show label of P571
print(page.data['labels']["P571"])
# -> inception

This is a lot easier than parsing the JSON, but some information is missing, for example the precision of the year.

qwikidata

Link: https://github.com/kensho-technologies/qwikidata

Example usage:

from qwikidata.linked_data_interface import get_entity_dict_from_api
from qwikidata.entity import WikidataItem

entity = WikidataItem(get_entity_dict_from_api("Q613766"))

# first P571 claim
claim = entity.get_truthy_claim_group("P571")[0]

# datavalue
datavalue = claim.mainsnak.datavalue

# value
print(datavalue.value)
# -> {'time': '+1922-00-00T00:00:00Z', 'timezone': 0, 'before': 0, 'after': 0,
#     'precision': 9, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}

# parsed value
print(datavalue.get_parsed_datetime_dict())
# -> {'year': 1922, 'month': 0, 'day': 0, 'hour': 0, 'minute': 0, 'second': 0}

The precision is not applied automatically, but all the information is there. The structure feels very similar to the JSON export, so it doesn't feel as magic as wptools.
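
Applying the precision by hand is straightforward, building on the datavalue object from the example above (Wikidata uses 9 for year, 10 for month, 11 for day):

parsed = datavalue.get_parsed_datetime_dict()

# only the year is meaningful at precision 9
if datavalue.value["precision"] == 9:
    print(parsed["year"])
# -> 1922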

Conclusion

Depending on your needs "wptools" is abstracted a bit more and needs less knowledge about the interna of wikidata json. But some information is omitted. On the other hand "qwikidata" is pretty close to wikidata json but everything is there.