Cascade: Crowdsourcing Taxonomy Creation by Lydia B. Chilton, Greg Little, Darren Edge, Daniel S. Weld, James A. Landay.
Taxonomies are a useful and ubiquitous way of organizing information. However, creating organizational hierarchies is difficult because the process requires a global understanding of the objects to be categorized. Usually one is created by an individual or a small group of people working together for hours or even days. Unfortunately, this centralized approach does not work well for the large, quickly-changing datasets found on the web. Cascade is an automated workflow that creates a taxonomy from the collective efforts of crowd workers who spend as little as 20 seconds each. We evaluate Cascade and show that on three datasets its quality is 80-90% of that of experts. The cost of Cascade is competitive with expert information architects, despite taking six times more human labor. Fortunately, this labor can be parallelized such that Cascade will run in as fast as five minutes instead of hours or days.
In the introduction the authors say:
Crowdsourcing has become a popular way to solve problems that are too hard for today’s AI techniques, such as translation, linguistic tagging, and visual interpretation. Most successful crowdsourcing systems operate on problems that naturally break into small units of labor, e.g., labeling millions of independent photographs. However, taxonomy creation is much harder to decompose, because it requires a global perspective. Cascade is a unique, iterative workflow that emergently generates this global view from the distributed actions of hundreds of people working on small, local problems.
The authors demonstrate the potential for time and cost savings in the creation of taxonomies, but I take the significance of their paper to be something different.
As the paper demonstrates, taxonomy creation does not require a global perspective.
Each individual who participated contributed localized knowledge that, when combined with other localized knowledge, can be formed into what an observer would call a taxonomy.
That is a critical point, since every user reflects slightly different experiences and viewpoints, while even the most learned expert represents only one.
Does “your” taxonomy reflect your views or some expert’s?
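To make the idea of combining localized knowledge concrete, here is a toy sketch in Python. It is not the Cascade workflow from the paper; it is a simplified illustration under my own assumptions: each worker only answers small yes/no questions of the form "does this item fit this category?", the answers are combined by majority vote, and one category is nested under another when its accepted items are a strict subset of the other's. The function names, the vote format, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict


def aggregate_votes(votes, threshold=0.5):
    """Combine local yes/no judgments into category memberships.

    votes: iterable of (worker, item, category, fits) tuples (assumed format).
    An item joins a category when more than `threshold` of its votes are yes.
    """
    tally = defaultdict(lambda: [0, 0])  # (item, category) -> [yes, total]
    for _worker, item, category, fits in votes:
        counts = tally[(item, category)]
        counts[0] += int(fits)
        counts[1] += 1

    members = defaultdict(set)
    for (item, category), (yes, total) in tally.items():
        if yes / total > threshold:
            members[category].add(item)
    return dict(members)


def nest_categories(members):
    """Map each category to its smallest strict superset, or None if top-level."""
    parents = {}
    for child, items in members.items():
        supersets = [c for c, other in members.items() if items < other]
        # Smallest superset wins; ties are broken alphabetically.
        parents[child] = min(
            supersets, key=lambda c: (len(members[c]), c), default=None
        )
    return parents


# No worker sees the whole picture; each answers a few local questions.
votes = [
    ("w1", "apple", "fruit", True),
    ("w2", "apple", "fruit", True),
    ("w3", "apple", "citrus", False),
    ("w1", "lemon", "fruit", True),
    ("w2", "lemon", "citrus", True),
    ("w3", "lemon", "citrus", True),
    ("w4", "lime", "citrus", True),
    ("w4", "lime", "fruit", True),
    ("w5", "lime", "fruit", False),
    ("w6", "lime", "fruit", True),
]

members = aggregate_votes(votes)
parents = nest_categories(members)
print(parents)
```

In this toy data, "citrus" ends up nested under "fruit" even though no single worker was ever asked how the two categories relate. Note also that worker w5's dissenting vote on "lime" is simply outvoted, which is one place where the varying viewpoints mentioned above get flattened by the aggregation rule.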