Spa towns near Rivers

Andreas

2023-08-27 20:00

As followup to the previous post, I was asked if spa towns are near rivers. Let's see if we can answer this by using overpass, shapely and geopandas.

In the previous blog post I used a bounding box to get all spa towns in Germany. There is a better way by using area ids queried via Nominatim. For Example select all rivers in Germany this way:

from OSMPythonTools import overpass
from OSMPythonTools.nominatim import Nominatim

nominatim = Nominatim()
# get areaId for Germany
area_id = nominatim.query("Germany").areaId()

api = overpass.Overpass()
query = overpass.overpassQueryBuilder(
    # use area_id instead of bbox
    area=area_id,
    # some rivers are relations
    elementType=["way", "relation"],
    # only natural rivers
    selector='"natural"="water"',
    out="body",
    includeGeometry=True,
)
result = api.query(query, timeout=300)

# write to json
json.dump(result.toJSON()["elements"], open("rivers.json", "w"), indent=2, ensure_ascii=False)

The selector is "natural=water", more information about waterways in the openstreetmap wiki. This query will take a bit of time to run and writes a lot of files in the cache folder. To count the datasets we could use jq:

jq '.[].id' rivers.json | wc -l
# returns: 403851

A lot of natural waterways!

For testing I selected the first 3 rivers and wrote them into an extra file to speed up development of the following Python code:

jq '.[0:2]' rivers.json > 3rivers.json

Next calculate the distance between every river and every spa town. We have latitude/longitude from Openstreetmap and can calculate which river is the nearest, but the resulting values are the euclidian distance (and not in meters).

The following code is a non-optimized brute-force approach and would be a lot faster by limiting the rivers with their bounding boxes. But for now start with the least code possible.

import json

import geopandas as gpd
from shapely import LineString, Point, Polygon
from tqdm import tqdm


# loading spas
def _spas():
    for d in json.load(open("spa_towns.json")):
        yield {
            "id": d["id"],
            "name": d.get("tags", {}).get("name", f"--{d['id']}"),
            "geometry": Point(d["lat"], d["lon"]),
        }


df = gpd.GeoDataFrame.from_records(_spas())

print("loading rivers")
rivers = json.load(open("rivers.json"))


def get_nearest(row):
    print(row["name"])
    shortest_distance = 100
    nearest_river = None
    for river in tqdm(rivers):
        # filter everything that has no wikidata entry
        if river.get("tags", {}).get("wikidata") is None:
            continue

        river_name = river.get("tags", {}).get("name", f"noname-{river['id']}")

        if river.get("type") == "way":
            ways = [river]
        elif river.get("type") == "relation":
            # relations contain more than one way
            ways = river.get("members")
        else:
            # don't calculate anything
            continue

        for way in ways:
            if not way.get("geometry"):
                continue
            d = row["geometry"].distance(
                LineString([(i["lat"], i["lon"]) for i in way.get("geometry")])
            )
            if d < shortest_distance:
                nearest_river = river_name
                shortest_distance = d
    return f"{nearest_river} ({shortest_distance})"


# fill a new column with nearest_river and distance in brackets
df["nearest_river"] = df.apply(lambda row: get_nearest(row), axis=1)
# write data as GeoJson
df.to_file("spa_river.json", driver="GeoJSON")

The code filters on only rivers/lakes with a wikidata entry, to get rid of a lot of "Kneipp-Bad" or "Teich mit Enten" or for Bad Canstatt the "Brillenpinguine" in the local zoo.

Looking at some of the results:

Bad Wildbad is next to the river "Enz" but the nearest river here is Nagoldtalsperre (0.19372965182449883) which is not that far away, but not the "Enz". The Enz is actually missing because it is not tagged with natural="water". Bad Honnef is next to the "Rhein" (Rhine) but the value we get is Blauer See (0.11373349109158684). Bad Teinach should be next to the river "Nagold", but the value is Gültlinger See (0.09095129123228435).

So the result is not good enough to answer the question if a river is near every spa town. The rivers would have to be manually filtered and maybe even selected beforehand. For automatic selection with only Openstreetmap there is not enough information.