Tang and Lease (2011) Semi-Supervised Consensus Labeling for Crowdsourcing
From the post:
I came across this paper, which, among other things, describes the data collection being used for the 2011 TREC Crowdsourcing Track:
- Tang, Wei and Matthew Lease. 2011. Semi-supervised consensus labeling for crowdsourcing. SIGIR Workshop on Crowdsourcing for Information Retrieval.
But that’s not why we’re here today. I want to talk about their modeling decisions.
Tang and Lease apply a Dawid-and-Skene-style model to crowdsourced binary relevance judgments for highly-ranked system responses from a previous TREC information retrieval evaluation. The workers judge document/query pairs as highly relevant, relevant, or irrelevant (though highly relevant and relevant are collapsed in the paper).
The Dawid and Skene model was relatively unsupervised: it jointly imputes the true category of each item being classified along with each annotator's response distribution for each true category of input (thus characterizing both the bias and the accuracy of each annotator).
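To make that concrete, here is a minimal sketch of Dawid-and-Skene-style EM for binary labels. This is my own illustration, not code from the paper; the array layout (items by annotators, with -1 marking missing judgments) and the simple two-category setup are assumptions for the example.

```python
import numpy as np

def dawid_skene(labels, n_iters=50):
    """Minimal Dawid-and-Skene-style EM for binary labels (illustrative sketch).

    labels[i, j] is the label in {0, 1} that annotator j gave item i,
    or -1 if annotator j did not judge item i.
    """
    n_items, n_annotators = labels.shape

    # Initialize per-item class probabilities from a simple vote.
    z = np.zeros((n_items, 2))
    for i in range(n_items):
        obs = labels[i][labels[i] >= 0]
        z[i, 1] = obs.mean() if obs.size else 0.5
        z[i, 0] = 1.0 - z[i, 1]

    for _ in range(n_iters):
        # M-step: class prevalence plus each annotator's response distribution,
        # a 2x2 confusion matrix (rows = true class, columns = response),
        # which is what captures annotator bias and accuracy.
        prevalence = z.mean(axis=0)
        confusion = np.full((n_annotators, 2, 2), 1e-6)  # smoothing
        for j in range(n_annotators):
            for i in range(n_items):
                if labels[i, j] >= 0:
                    confusion[j, :, labels[i, j]] += z[i]
        confusion /= confusion.sum(axis=2, keepdims=True)

        # E-step: posterior over each item's true class given current params.
        for i in range(n_items):
            log_post = np.log(prevalence)
            for j in range(n_annotators):
                if labels[i, j] >= 0:
                    log_post += np.log(confusion[j, :, labels[i, j]])
            log_post -= log_post.max()
            z[i] = np.exp(log_post)
            z[i] /= z[i].sum()

    return z, confusion, prevalence
```

The returned `z` gives the imputed category distribution for each item and `confusion` the imputed response distributions per annotator; nothing here uses gold labels, which is the sense in which the model is unsupervised.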
I post this partly for the review of the model in question and partly as a warning that competent people really do read research papers in their areas. Yes, on the WWW you can publish anything you want, of whatever quality. But others in your field will notice. Is that what you want?