Using Datasette JSON Write API

In my previous post I described how I used GPT-4o-mini to detect digits on a display. To evaluate the accuracy of the LLM I wanted to manually verify the GPT responses. For this I needed to see the image next to the value from the LLM, so a page in a browser was the obvious choice. All my data is already in a Datasette, so I looked for the easiest way to solve this there. Datasette 1.0 will get a new write API, which is a perfect fit for writing an update to a single field in the SQLite database. So I installed datasette==1.0a14, the current version at the time of writing this post.

The JSON write API needs a token, which I paste into a field on the page (seen in the screenshot below). To create this token the Datasette instance needs a secret for signing.

I created my token this way (obviously with a real secret):

DATASETTE_SECRET=SOME_SECRET_HERE datasette create-token root --database minol update-row

Then I started Datasette with the same secret and a templates folder containing pages/label.html:

DATASETTE_SECRET=SOME_SECRET_HERE datasette minol.db --template-dir templates
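
With the server running, a quick way to check that the token is accepted is Datasette's /-/actor.json endpoint, which returns the actor a request is authenticated as. A minimal sketch, assuming the instance listens on the default http://localhost:8001 (the same URL label.html uses below):

import json
import urllib.request

TOKEN = "paste-token-here"  # the token printed by the create-token command above

req = urllib.request.Request(
    "http://localhost:8001/-/actor.json",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
with urllib.request.urlopen(req) as response:
    # prints the actor the token resolves to, e.g. {"actor": {"id": "root", ...}}
    print(json.loads(response.read()))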

A screenshot of the labeling interface in action:

/images/using-datasette-json-api.png

The full code of my label.html, which fetches the next entry and updates it on submit:

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8" /><title>Label</title>
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bulma@0.9.4/css/bulma.min.css">
    </head>
    <body>
      <div class="container">
        <div class="is-full hero">
          <div class="hero-title">
            <h2 class="is-size-1">Labeling</h2>
            <form autocomplete="off">
              <div class="columns">
                <div class="column is-12">
                  <div class="field has-addons">
                    <div class="control is-expanded">
                      <input id="token" class="input" type="text" placeholder="Insert token">
                    </div> </div> </div> </div>
              <div class="columns">
                <div class="column is-3">
                  Id: <span id="id"></span>
                </div>
                <div class="column is-3">
                  GPT 4o mini response: <span id="gpt_response"></span>
                </div>
                <div class="column is-6"> <span id="image"></span>
                </div> </div>
              <div class="columns mt-5">
                <div class="column is-3">
                  <div class="field has-addons">
                    <div class="control is-expanded">
                      <input id="digits" class="input" type="text">
                    </div>
                    <div class="control">
                      <button id="submit" class="button is-info is-primary">Submit</button>
                    </div> </div> </div> </div>
            </form> </div> </div> </div>
      <script type="text/javascript">
      function get_next() {
          return fetch('http://localhost:8001/minol/images.json?display_image__notblank=1&gpt_response__notblank=1&human_response__isnull=1&_shape=array&_nocol=thumbnail&_size=1')
              .then((response) => {
                  response.json().then((data) => {
                      document.getElementById('digits').value = data[0]['gpt_response']
                      document.getElementById('id').innerHTML = data[0]['id']
                      document.getElementById('gpt_response').innerHTML = data[0]['gpt_response']
                      document.getElementById('image').innerHTML = "<img src='data:image/jpeg;base64," + data[0]["display_image"]["encoded"] + "'/>"
                  })})
              .catch(error => console.log(error))}

      function update() {
          const id=document.getElementById('id').innerHTML
          console.log(document.getElementById('token').value)
          fetch('/minol/images/'+id+'/-/update', {
              method: 'post',
              headers: {
                  'Content-Type': 'application/json',
                  'Authorization': 'Bearer ' + document.getElementById('token').value
              },
              body: JSON.stringify({
                  "update": {
                      "human_response": document.getElementById('digits').value,
                  }})})
              .then((response) => {
                  if(response.status === 200){
                      get_next()
                  } else {
                      response.json().then((data) => {
                          console.log(data)
                      })}})
              .catch(error => console.log(error))}

      window.onload = function () {
          get_next()
          document.getElementById("submit").onclick = function() {
              event.preventDefault()
              update()}
      }
      </script>
    </body>
</html>

The JSON write API call in update() is exactly as described in the docs. If the response status is 200, a new row is fetched; otherwise the error is printed to the console, which is good enough for me.

Because the GPT value is already prefilled, the whole labeling procedure was fast. Of course I approved a wrong value once. To fix this I set the human_response back to NULL in a sqlite3 shell, so that row will come up again eventually. If this happens more often, a way to reopen the previous row would be nice. I may add it, but only if it annoys me enough.
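
For reference, the reset is a single UPDATE statement; here it is as a small Python sketch instead of the sqlite3 shell, assuming the table is called images (as in the API calls above) and using a made-up row id:

import sqlite3

row_id = 42  # hypothetical id of the wrongly approved row

conn = sqlite3.connect("minol.db")
# clear the human label so the labeling page serves this row again
conn.execute("UPDATE images SET human_response = NULL WHERE id = ?", (row_id,))
conn.commit()
conn.close()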

Detect Numbers from a Display with LLMs

Years ago I trained a CNN to classify the digits cut out from the display of my Minol heat usage meter. I plan to retrain and reevaluate that code, but in 2024 I have to try the current GPT-4 model and the image-capable Gemini models for this.

Gemini 1.5 Flash

So I did some exploration of how well this works and started with Gemini 1.5 Flash. This failed horrendously: most of the time Gemini returned 255 as the number on the display, which did not encourage me to experiment further with Gemini.

Examples:

/images/digits-llm-gemini.png
/images/digits-llm-gemini-2.png

GPT-4o-mini

Next I tried OpenAI's GPT-4o-mini and the results felt really good. Even when nothing was on the display it didn't hallucinate:

/images/digits-llm-gpt4o.png
/images/digits-llm-gpt4o-no-result.png

Automate

So I decided that a few Euros are worth it to automate this and run it on a reasonable sample of images.

Calling OpenAI completion with an image inline:

import base64

from openai import OpenAI


def query_gpt(img_binary):
    # encode the raw JPEG bytes as a base64 data URL
    img_base64 = base64.b64encode(img_binary).decode("utf-8")
    img_str = f"data:image/jpeg;base64,{img_base64}"

    client = OpenAI(api_key=API_KEY)

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What are the numbers on the display? Please answer only with the number.",
                    },
                    {"type": "image_url", "image_url": {"url": img_str}},
                ],
            }
        ],
        max_tokens=300,
    )
    return response.choices[0].message.content

The img_binary is either the BLOB field from the SQLite table or the fp.read() of an actual image file.
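
As a rough usage sketch covering both paths (the file name is a placeholder, and the query that picks and updates an unprocessed row is only my guess at how the automation loop could look):

import sqlite3

# from an image file on disk
with open("display.jpg", "rb") as fp:
    print(query_gpt(fp.read()))

# or directly from the BLOB column in the SQLite table
conn = sqlite3.connect("minol.db")
row = conn.execute(
    "SELECT id, display_image FROM images WHERE gpt_response IS NULL LIMIT 1"
).fetchone()
if row is not None:
    row_id, img_binary = row
    conn.execute(
        "UPDATE images SET gpt_response = ? WHERE id = ?",
        (query_gpt(img_binary), row_id),
    )
    conn.commit()
conn.close()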

Scrolling over the results I saw errors, so it was time to evaluate this properly. I manually labeled the images by writing the correct value into a new field in the SQLite table. To do this I used a bit of JavaScript and the new Datasette JSON write API from the current 1.0 alpha version of Datasette. More on this (including some code) in the next blog post.

After manually either accepting the GPT value or setting a new one, the results are not as good as they felt when scrolling over them: only 57% of the GPT answers are correct.

/images/gpt4-minol-results.png
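
For reference, the number can be reproduced with a single query over the labeled rows; a sketch using the column names from the label page above:

import sqlite3

conn = sqlite3.connect("minol.db")
# count rows where the GPT answer matches the manual label
correct, total = conn.execute(
    "SELECT SUM(gpt_response = human_response), COUNT(*) "
    "FROM images WHERE human_response IS NOT NULL"
).fetchone()
print(f"{correct}/{total} correct ({correct / total:.0%})")
conn.close()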

The manual labeling was really fast, and I didn't need to fiddle around with the messy images: some were cut off, some not perfectly aligned, and the ones that are really good got correct values from the LLM. So I will use GPT-4o-mini to label the rest of the images and then decide whether to correct them manually or not. To get a trend it is enough to have one good value per day. Filtering should not be that hard, because each value should be higher than the previous one and the difference per day cannot be too large.
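
A minimal sketch of that filter; the example readings and the max_daily_increase threshold are made up:

def filter_readings(daily_readings, max_daily_increase=10.0):
    """Keep only plausible meter values, one reading per day.

    daily_readings: list of (date, value) tuples ordered by date. A value is
    dropped if it is lower than the last accepted one (the meter only counts
    up) or if it jumps by more than max_daily_increase.
    """
    accepted = []
    last_value = None
    for date, value in daily_readings:
        if last_value is not None and (
            value < last_value or value - last_value > max_daily_increase
        ):
            continue  # implausible reading, skip it
        accepted.append((date, value))
        last_value = value
    return accepted

# filter_readings([("2024-08-01", 123.4), ("2024-08-02", 999.9), ("2024-08-03", 123.6)])
# -> [("2024-08-01", 123.4), ("2024-08-03", 123.6)]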

Shrinking a lot of JPGs

The images from the previous post do not use any JPEG compression. I fiddled a bit with the quality setting of raspistill, but its scale does not seem to be aligned with the usual JPEG quality values. So I use raspistill without any compression and compress the images later with ImageMagick.

First iteration

This script converts all images into a new folder, keeping my folder structure (year/month):

#!/bin/bash

# example input
# images/2017/12/1512699901.jpg

# example output
# images.smaller/2017/12/1512699901.jpg

# input files
files=$(find images -name \*jpg)

for fn in $files;
do
    echo "$fn"
    # target folder name
    target=images.smaller/$(dirname "$fn" | cut -d"/" -f2,3)
    # generate folder if needed
    mkdir -p "$target"
    # actual image processing; shrinking the image
    magick -quality 90 "$fn" "$target/$(basename "$fn")"
# add progressbar; pv -l counts the echoed lines, -s sets the expected total (number of input files)
done | pv -l -s "$(wc -l <<< "$files")" > /dev/null

Because I have more than 50k images, a progress bar using pv was added. This takes quite some time: single-threaded, about 10 images per second are converted. Clearly not fast enough.

Second iteration

Using GNU parallel.

The default is to run the given bash function in as many parallel jobs as your computer has cores. I wanted to limit this, so I set it to 8 with -j 8. There is a lot of magic possible for filename replacement and processing, but because I needed a bash function for the mkdir, I didn't bother reading up on a more parallel-native solution. Everything happens in the bash function.

#!/bin/bash

convert_func() {
    target=images.smaller/$(dirname "$1" | cut -d"/" -f2,3)
    mkdir -p "$target"
    magick -quality 90 "$1" "$target/$(basename "$1")"
}
export -f convert_func

find images -name \*jpg | parallel --progress -j 8 convert_func

This runs a lot faster than the first iteration by keeping 8 magick calls running concurrently. In the end it needed about 20 minutes to process all 50k images.