Parse Wikidata Entries

Introduction

Parsing Wikidata entries is not trivial. Every piece of information associated with an entry is stored as a claim, and each claim is identified by a property id. So you either build your own lookup for the properties you need and correlate it with the JSON export of an entry, or you find a library that does that for you.
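A hand-rolled lookup could be sketched roughly as follows. The names `PROPERTY_LABELS` and `extract_known_claims` are made up for illustration, the selection of properties is arbitrary, and the sample `claims` dict is a trimmed-down stand-in for the "claims" section of a real JSON export.

```python
# Hypothetical hand-picked lookup: property id -> readable name.
PROPERTY_LABELS = {
    "P571": "inception",
    "P31": "instance of",
}


def extract_known_claims(claims, labels=PROPERTY_LABELS):
    """Map readable property names to the raw values of their claims."""
    result = {}
    for prop_id, statements in claims.items():
        label = labels.get(prop_id)
        if label is None:
            continue  # property not in our lookup, ignore it
        values = []
        for statement in statements:
            snak = statement["mainsnak"]
            # Only snaks of type "value" carry a datavalue.
            if snak["snaktype"] == "value":
                values.append(snak["datavalue"]["value"])
        result[label] = values
    return result


# Trimmed-down stand-in for the "claims" part of an entry's JSON export.
claims = {
    "P571": [
        {
            "mainsnak": {
                "snaktype": "value",
                "property": "P571",
                "datavalue": {
                    "value": {"time": "+1922-00-00T00:00:00Z", "precision": 9},
                    "type": "time",
                },
            }
        }
    ]
}

print(extract_known_claims(claims))
# -> {'inception': [{'time': '+1922-00-00T00:00:00Z', 'precision': 9}]}
```

Even in this small form, the values still come back in their raw shape, so every datatype needs its own handling afterwards.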

For example, take the entry for the old Stuttgart train station, the Bonatzbau (Q613766):

The inception date is property "P571", which looks like this in the JSON (references removed):

"P571": [
  {
    "mainsnak": {
      "snaktype": "value",
      "property": "P571",
      "hash": "51b760062c35d828aa817b95777ea0830b3c21ba",
      "datavalue": {
        "value": {
          "time": "+1922-00-00T00:00:00Z",
          "timezone": 0,
          "before": 0,
          "after": 0,
          "precision": 9,
          "calendarmodel": "http://www.wikidata.org/entity/Q1985727"
        },
        "type": "time"
      },
      "datatype": "time"
    },
    "type": "statement",
    "id": "Q613766$5A06D69F-843D-4761-9228-537E0F56DB53",
    "rank": "normal",
    "references": [
       "<snip>"
    ]
  }
]

All I need, though, is the year, which follows from the timestamp together with the precision (9 means year). Working this out by hand for every property is a lot of work.

But thanks to open source, someone has done this already.
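To give an idea of the effort for just this one datatype, here is a minimal sketch that reduces a time value to the parts its precision covers. The function name `simplify_time` is made up, and the sketch only handles the precisions 9 (year), 10 (month) and 11 (day); coarser ones such as decade or century would need rounding and are rejected here.

```python
import re

# Wikidata precision codes handled by this sketch.
PRECISION_YEAR = 9
PRECISION_MONTH = 10
PRECISION_DAY = 11

# Matches the date part of timestamps like "+1922-00-00T00:00:00Z".
_TIME_RE = re.compile(r"^([+-])(\d+)-(\d{2})-(\d{2})T")


def simplify_time(value):
    """Reduce a Wikidata time value dict to the parts its precision covers."""
    match = _TIME_RE.match(value["time"])
    if match is None:
        raise ValueError(f"unexpected time format: {value['time']!r}")
    sign, year, month, day = match.groups()
    precision = value["precision"]
    if not PRECISION_YEAR <= precision <= PRECISION_DAY:
        raise ValueError(f"precision {precision} is not handled by this sketch")

    result = {"year": -int(year) if sign == "-" else int(year)}
    if precision >= PRECISION_MONTH:
        result["month"] = int(month)
    if precision >= PRECISION_DAY:
        result["day"] = int(day)
    return result


value = {
    "time": "+1922-00-00T00:00:00Z",
    "timezone": 0,
    "before": 0,
    "after": 0,
    "precision": 9,
    "calendarmodel": "http://www.wikidata.org/entity/Q1985727",
}
print(simplify_time(value))
# -> {'year': 1922}
```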

Wikipedia tools (for Humans)

Link: https://github.com/siznax/wptools

Example usage:

import wptools

page = wptools.page(wikibase="Q613766", silent=True)
page.get_wikidata()

# inception year
print(page.data["claims"]["P571"])
# -> ['+1922-00-00T00:00:00Z']

# show label of P571
print(page.data['labels']["P571"])
# -> inception

This is a lot easier than parsing the JSON, but some information is missing, for example the precision of the date.

qwikidata

Link: https://github.com/kensho-technologies/qwikidata

Example usage:

from qwikidata.linked_data_interface import get_entity_dict_from_api
from qwikidata.entity import WikidataItem

entity = WikidataItem(get_entity_dict_from_api("Q613766"))

# first P571 claim
claim = entity.get_truthy_claim_group("P571")[0]

# datavalue
datavalue = claim.mainsnak.datavalue

# value
print(datavalue.value)
# -> {'time': '+1922-00-00T00:00:00Z', 'timezone': 0, 'before': 0, 'after': 0,
#     'precision': 9, 'calendarmodel': 'http://www.wikidata.org/entity/Q1985727'}

# parsed value
print(datavalue.get_parsed_datetime_dict())
# -> {'year': 1922, 'month': 0, 'day': 0, 'hour': 0, 'minute': 0, 'second': 0}

The precision is not applied automatically, but all the information is there. The structure feels very similar to the JSON export, so this is less magic than wptools.
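Applying the precision yourself takes only a few lines. The helper below is a sketch, not part of qwikidata: `apply_precision` is a made-up name, and it takes the parsed dict and the "precision" field from the raw value as plain arguments so that it runs without a network call.

```python
# Fields of the parsed datetime dict, ordered from coarse to fine.
# Precision 9 covers "year", 10 adds "month", and so on up to 14 for "second".
FIELDS = ["year", "month", "day", "hour", "minute", "second"]


def apply_precision(parsed, precision):
    """Keep only the fields of a parsed datetime dict that the precision covers."""
    if not 9 <= precision <= 14:
        # Coarser precisions (decade, century, ...) would need rounding.
        raise ValueError(f"precision {precision} is not handled by this sketch")
    return {field: parsed[field] for field in FIELDS[: precision - 8]}


# Values as printed in the qwikidata example for P571 of Q613766.
parsed = {"year": 1922, "month": 0, "day": 0, "hour": 0, "minute": 0, "second": 0}
print(apply_precision(parsed, 9))
# -> {'year': 1922}
```

With the objects from the example, the call would be `apply_precision(datavalue.get_parsed_datetime_dict(), datavalue.value["precision"])`.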

Conclusion

Which one fits depends on your needs. "wptools" abstracts a bit more and requires less knowledge of the internals of the Wikidata JSON, but it omits some information. "qwikidata", on the other hand, stays close to the Wikidata JSON, and everything is there.