Compare Image Labellers' Votes

After other people besides me had used the image labeller tool, the resulting YAML files needed to be compared.

The image labeller is a pygame-based tool that shows images and attaches boolean labels to them. It is described in this blog post: https://madflex.de/image-tagging/.

To compare the files, I wrote a script that reads the tags.yml of every labeller and exports a CSV that looks like this:

screenshot

The screenshot shows a CSV file uploaded to GitHub for easier preview.
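
The core of such a comparison is a per-image majority vote. The following is a minimal sketch of that idea, not the actual compare_tags.py: the function names `majority_vote` and `to_csv`, the input layout (one dict of image name to boolean per labeller, as it might look after loading each tags.yml), and the output column names are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def majority_vote(votes_per_labeller):
    """Return one row per image with the winning label and a majority flag.

    votes_per_labeller: hypothetical layout {labeller: {image: bool}},
    where True means the labeller marked the image as blurred.
    """
    images = sorted({img for votes in votes_per_labeller.values() for img in votes})
    rows = []
    for img in images:
        # Collect the votes of every labeller who saw this image.
        votes = [v[img] for v in votes_per_labeller.values() if img in v]
        counts = Counter("blurred" if vote else "not_blurred" for vote in votes)
        label, n = counts.most_common(1)[0]
        # Strict majority: more than half of the votes cast for this image.
        rows.append({"image": img, "label": label, "majority": n * 2 > len(votes)})
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text (column names are illustrative)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["image", "label", "majority"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up votes from three labellers.
votes = {
    "alice": {"a.JPG": True, "b.JPG": False},
    "bob": {"a.JPG": True, "b.JPG": True},
    "carol": {"a.JPG": False, "b.JPG": False},
}
print(to_csv(majority_vote(votes)))
```

With an even number of votes a tie is possible; in this sketch a tied image gets `majority` set to False, so it can be filtered out later.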

The actually interesting numbers and files are easier to query or generate in the shell:

# count how many images were voted blurred/not_blurred by a majority
# (restrict to cols 8,9 first so that ",True" in other columns does not match)
cut -d"," -f8,9 comparison.csv | grep ",True" | cut -d"," -f1 | sort | uniq -c

# create train/test folders
mkdir -p {train,test}/{blurred,not_blurred}

# generate a script that copies majority-voted files to their train folder
#                     keep only the cols    only non-test imgs  only majority=True   only col 1,3
cat comparison.csv | cut -d"," -f1,2,8,9 | grep "JPG,False" | grep "blurred,True" | cut -d"," -f1,3 | awk -F "," '{ print "cp " $1 " train/"$2 }' > run.sh

# run the generated script
sh run.sh

# copy the test files based only on the majority decision (for all test images)
#                     keep only the cols  only test imgs
cat comparison.csv | cut -d"," -f1,2,8 | grep "JPG,True" | awk -F "," '{ print "cp " $1 " test/"$3 }' > run.sh

# run the generated script
sh run.sh
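
The generated run.sh breaks on file names that contain spaces. As an alternative, the same copy step could be done directly in Python. This is only a sketch under assumptions read off the shell commands above: column 1 holds the image path, column 2 the test flag, column 8 the majority label, and column 9 whether a majority was reached. The function name `copy_by_vote` and its `out_dir` parameter are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def copy_by_vote(csv_path, out_dir="."):
    """Copy images into train/<label> or test/<label> below out_dir.

    Assumed column layout (0-based): 0 image path, 1 test flag,
    7 majority label, 8 majority reached.
    """
    copied = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 9 or row[1] not in ("True", "False"):
                continue  # skip the header and malformed lines
            image, label = row[0], row[7]
            is_test = row[1] == "True"
            has_majority = row[8] == "True"
            if not is_test and not has_majority:
                continue  # training images need a majority
            target = Path(out_dir) / ("test" if is_test else "train") / label
            target.mkdir(parents=True, exist_ok=True)
            copied.append(shutil.copy(image, target))
    return copied
```

Calling `copy_by_vote("comparison.csv")` from the folder that contains the images would mirror the two shell pipelines: training images are copied only when a majority was reached, test images are always copied to the folder of their majority label.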

The code of the comparison script is in the image-tagger repository: https://github.com/mfa/image-tagger/blob/main/compare_tags.py.