Compare Image Labellers' Votes
After other people besides me had used the image labeller tool, I needed to compare the resulting YAML files.
The image labeller is a pygame-based tool that shows images and lets you add boolean labels to them. It is described in this blog post: https://madflex.de/image-tagging/.
To compare the files, I wrote a script that reads the tags.yml from every labeller and exports a CSV file that looks like this:
The screenshot is from a CSV file uploaded to GitHub for easier preview.
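To give an idea of what such a comparison involves, here is a minimal sketch of a majority vote over the labellers' tags. It is not the code from the repository: the function name `majority_vote`, the labeller names, the in-memory dict layout, and the rule that a tie counts as "no majority" are all assumptions made for illustration. In practice the per-labeller dicts would be loaded from each tags.yml.

```python
from collections import Counter


def majority_vote(votes_by_labeller):
    """Combine per-labeller boolean votes into one row per image.

    votes_by_labeller maps a (hypothetical) labeller name to
    {image filename: bool}, where True means "blurred".
    """
    images = sorted({img for votes in votes_by_labeller.values() for img in votes})
    rows = []
    for img in images:
        # Collect the votes of every labeller who tagged this image.
        votes = [v[img] for v in votes_by_labeller.values() if img in v]
        counts = Counter(votes)
        blurred, not_blurred = counts[True], counts[False]
        label = "blurred" if blurred > not_blurred else "not_blurred"
        # Assumption: a tie is not a majority, so it is flagged as False.
        has_majority = blurred != not_blurred
        rows.append((img, label, has_majority))
    return rows


# Hypothetical example data; real votes would come from the tags.yml files.
votes = {
    "alice": {"a.JPG": True, "b.JPG": False, "c.JPG": True},
    "bob": {"a.JPG": True, "b.JPG": False, "c.JPG": False},
    "carol": {"a.JPG": False, "b.JPG": False},
}
for row in majority_vote(votes):
    print(row)
```

Each resulting row could then be written out with Python's `csv` module, together with the individual votes.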
But the actually interesting things are easier to query or generate in the shell:
    # sum of how many blurred/not_blurred we agreed on with a majority
    cat comparison.csv | grep -e ".*,True" | cut -d"," -f 8 | sort | uniq -c

    # create train/test folders
    mkdir -p {train,test}/{blurred,not_blurred}

    # generate script to copy majority voted files to their train folder
    # get only this cols | only non-test imgs | only majority=True | only col 1,3
    cat comparison.csv | cut -d"," -f1,2,8,9 | grep "JPG,False" | grep "blurred,True" | cut -d"," -f1,3 | awk -F "," '{ print "cp " $1 " train/"$2 }' > run.sh
    # run the generated script
    sh run.sh

    # and test files based only on majority decision (for all test images)
    # get only this cols | only test imgs
    cat comparison.csv | cut -d"," -f1,2,8 | grep "JPG,True" | awk -F "," '{ print "cp " $1 " test/"$3 }' > run.sh
    # run the generated script
    sh run.sh
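The same selection could also be done in Python instead of a generated shell script. The sketch below only plans the copies and mirrors the column positions used in the `cut` calls (1 = filename, 2 = test flag, 8 = label, 9 = majority). The function name `plan_copies`, the header row, and the values in columns 3 to 7 of the sample are made up for illustration, since the real CSV layout is not shown here.

```python
import csv
import io


def plan_copies(csv_text):
    """Return (source file, destination folder) pairs for train and test images."""
    copies = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip the header and anything that is not a full JPG row.
        if len(row) < 9 or not row[0].endswith("JPG"):
            continue
        filename, is_test, label, majority = row[0], row[1], row[7], row[8]
        if label not in ("blurred", "not_blurred"):
            continue
        if is_test == "True":
            # Test images are sorted by label regardless of the majority flag.
            copies.append((filename, f"test/{label}"))
        elif majority == "True":
            # Training images are only used when the labellers had a majority.
            copies.append((filename, f"train/{label}"))
    return copies


# Hypothetical sample; columns 3-7 stand in for whatever the real file holds.
sample = "\n".join([
    "filename,test,v1,v2,v3,v4,v5,label,majority",
    "img1.JPG,False,x,x,x,x,x,blurred,True",
    "img2.JPG,False,x,x,x,x,x,not_blurred,False",
    "img3.JPG,True,x,x,x,x,x,not_blurred,False",
])
for source, folder in plan_copies(sample):
    print(source, "->", folder)
```

Passing each pair to `shutil.copy(source, folder)` would perform the actual copy, and unlike the generated `cp` lines this would also cope with spaces in filenames.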
The code of the comparison script is in the image-tagger repository: https://github.com/mfa/image-tagger/blob/main/compare_tags.py.