Compare Image Labels

Once other people besides me had used the image labeller tool, the resulting YAML files needed to be compared.

The image labeller is a pygame-based tool that shows images and lets you attach boolean labels to them. It is described in this blog post: https://madflex.de/image-tagging/.

To compare the files, I wrote a script that reads the tags.yml from every labeller and exports a CSV that looks like this:

screenshot

The screenshot shows the CSV file after uploading it to GitHub, which renders it as a table for easier preview.
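The core of such a comparison can be sketched in a few lines of Python. This is not the code from compare_tags.py, only a minimal illustration under assumptions: the function names, the column layout, and the input shape (one dict per labeller mapping a filename to True for "blurred") are all made up for this example, and the real CSV has a different layout. Loading each tags.yml into such a dict with a YAML parser is left out.

```python
import csv
import io


def compare_labels(tags_by_labeller):
    """Build one row per image with every labeller's vote and the majority verdict.

    Assumed input: {labeller_name: {filename: True if blurred else False}}.
    """
    labellers = sorted(tags_by_labeller)
    filenames = sorted({name for tags in tags_by_labeller.values() for name in tags})
    rows = []
    for filename in filenames:
        # None marks an image this labeller did not label.
        votes = [tags_by_labeller[who].get(filename) for who in labellers]
        cast = [v for v in votes if v is not None]
        yes, no = cast.count(True), cast.count(False)
        if yes > no:
            verdict = "blurred"
        elif no > yes:
            verdict = "not_blurred"
        else:
            verdict = ""  # tie or no votes at all: no majority
        rows.append([filename, *votes, verdict, verdict != ""])
    return ["filename", *labellers, "label", "majority"], rows


def to_csv(header, rows):
    """Serialize the comparison to CSV text (missing votes become empty cells)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()
```

Calling `to_csv(*compare_labels(tags))` with the dicts of three labellers gives one line per image, with a final column that says whether a majority was reached.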

The actually interesting numbers are easier to query or generate in the shell:

# count how many images got blurred/not_blurred by majority vote
grep ",True" comparision.csv | cut -d"," -f8 | sort | uniq -c

# create train/test folders
mkdir -p {train,test}/{blurred,not_blurred}

# generate a script that copies the majority-voted files to their train/test folder
grep ",True" comparision.csv | cut -d"," -f1,2,8 | sed -e 's/,False,/,train,/' -e 's/,True,/,test,/' | awk -F"," '{ print "cp " $1 " " $2 "/" $3 }' > run.sh

# run the generated script
sh run.sh
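The same copy plan can also be built in Python instead of generating a shell script. The sketch below is hypothetical and rests on assumptions read off the pipeline above: column 1 is the file path, column 2 is True for test files and False for train files, and column 8 is the label. It additionally assumes the majority flag sits in the last column, which is stricter than the grep above, since that matches ",True" anywhere in the line. `plan_copies` is a name made up for this example.

```python
import csv
import io


def plan_copies(csv_text):
    """Return (source, destination_dir) pairs for rows that reached a majority."""
    plan = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Assumption: the last column is "True" when a majority was reached.
        # A header row or a short row does not match and is skipped.
        if len(row) < 8 or row[-1] != "True":
            continue
        # Assumed columns (1-based, as in cut): 1 = path, 2 = test flag, 8 = label.
        source, is_test, label = row[0], row[1], row[7]
        split = "test" if is_test == "True" else "train"
        plan.append((source, f"{split}/{label}"))
    return plan
```

Each pair can then be handed to `shutil.copy(source, destination_dir)` once the folders exist.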

The code of the comparison script is in the image-tagger repository: https://github.com/mfa/image-tagger/blob/main/compare_tags.py.