How do you compare two text classifiers?
Tony Russell-Rose writes:
I need to compare two text classifiers – one human, one machine. They are assigning multiple tags from an ontology. We have an initial corpus of ~700 records tagged by both classifiers. The goal is to measure the ‘value added’ by the human. However, we don’t yet have any ground truth data (i.e. agreed annotations).
Any ideas on how best to approach this problem in a commercial environment (i.e. quickly, simply, with minimum fuss), or indeed what’s possible?
I thought of measuring the absolute delta between the two profiles (regardless of polarity) to give a ceiling on the value added, and/or comparing the profile of tags added by each human coder against the centroid to give a crude measure of inter-coder agreement (and hence difficulty of the task). But neither really measures the ‘value added’ that I’m looking for, so I’m sure there must be better solutions.
Suggestions, anyone? Or is this as far as we can go without ground truth data?
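For concreteness, here is a minimal sketch of the two measures Tony mentions (the absolute delta between the machine and human tag profiles, and each coder’s similarity to a centroid profile). It assumes each record’s tags are available as Python sets and each coder’s tag counts as a dict; the field names and toy data are purely illustrative, not Tony’s actual pipeline:

```python
from math import sqrt

# Sketch of the two measures described above, on hypothetical data structures:
# each record carries a set of ontology tags per tagger, and each human coder
# has a tag-frequency profile (dict of tag -> count) over the corpus.

def abs_delta(machine_tags: set, human_tags: set) -> int:
    """Size of the symmetric difference: tags on which the two taggers disagree."""
    return len(machine_tags ^ human_tags)

def jaccard(machine_tags: set, human_tags: set) -> float:
    """Overlap between the two tag sets (1.0 = identical, 0.0 = disjoint)."""
    union = machine_tags | human_tags
    return len(machine_tags & human_tags) / len(union) if union else 1.0

def cosine(profile_a: dict, profile_b: dict) -> float:
    """Cosine similarity between two tag-frequency profiles."""
    tags = set(profile_a) | set(profile_b)
    dot = sum(profile_a.get(t, 0) * profile_b.get(t, 0) for t in tags)
    norm_a = sqrt(sum(v * v for v in profile_a.values()))
    norm_b = sqrt(sum(v * v for v in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def coder_vs_centroid(coder_profiles: list) -> list:
    """Crude inter-coder agreement: each coder's similarity to the centroid profile."""
    tags = set().union(*coder_profiles)
    n = len(coder_profiles)
    centroid = {t: sum(p.get(t, 0) for p in coder_profiles) / n for t in tags}
    return [cosine(p, centroid) for p in coder_profiles]

if __name__ == "__main__":
    # Toy record tagged by both classifiers.
    machine = {"finance", "risk"}
    human = {"finance", "risk", "fraud"}
    print(abs_delta(machine, human), jaccard(machine, human))  # 1 0.666...

    # Toy per-coder tag-frequency profiles over a small corpus.
    coders = [
        {"finance": 10, "risk": 4, "fraud": 2},
        {"finance": 9,  "risk": 6},
        {"finance": 11, "risk": 3, "fraud": 5},
    ]
    print(coder_vs_centroid(coders))
```

As Tony notes, neither number is “value added” in itself: the delta only bounds how much the human changed, and the centroid comparison only hints at how hard the task is.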
Some useful comments have been made. Do you have others?
PS: I wrote at Tony’s blog in a comment:
Tony,
The concept of ‘value added’ by human taggers is unclear to me. The tagging in both cases is the result of humans adding semantics: once through the rules for the machine tagger and once via the “human” taggers.
Can you say a bit more about what you see as a separate ‘value added’ by the human taggers?
What do you think? Is Tony’s question clear enough?