Technology-Assisted Review Boosted in TREC 2011 Results by Evan Koblentz.
From the post:
TREC Legal Track, an annual government-sponsored project for evaluating document review methods, on Friday released its 2011 results containing a virtual vote of confidence for technology-assisted review.
“[T]he results show that the technology-assisted review efforts of several participants achieve recall scores that are about as high as might reasonably be measured using current evaluation methodologies. These efforts require human review of only a fraction of the entire collection, with the consequence that they are far more cost-effective than manual review,” the report states.
The term “technology-assisted review” refers to “any semi-automated process in which a human codes documents as relevant or not, and the system uses that information to code or prioritize further documents,” said TREC co-leader Gordon Cormack, of the University of Waterloo. Its meaning is far wider than just the software method known as predictive coding, he noted.
As such, “There is still plenty of room for improvement in the efficiency and effectiveness of technology-assisted review efforts, and, in particular, the accuracy of intra-review recall estimation tools, so as to support a reasonable decision that ‘enough is enough’ and to declare the review complete. Commensurate with improvements in review efficiency and effectiveness is the need for improved external evaluation methodologies,” the report states.
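To make the "enough is enough" decision concrete, here is a minimal sketch of one common way to estimate recall during a review: draw a random sample from the documents the review set aside and see how many relevant ones it missed. This is not the method used by TREC or by any participant. The function name, parameters, and example numbers are all assumptions for illustration.

```python
def estimate_recall(found_relevant, unreviewed_count, sample_size, sample_relevant):
    """Point estimate of recall from a random sample of the unreviewed set.

    found_relevant:   relevant documents the review has already identified
    unreviewed_count: documents set aside without human review
    sample_size:      documents drawn uniformly at random from the unreviewed set
    sample_relevant:  how many of the sampled documents turned out to be relevant

    All names here are hypothetical; this is a sketch, not a vetted protocol.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= sample_relevant <= sample_size:
        raise ValueError("sample_relevant must be between 0 and sample_size")

    # Fraction of the sample that was relevant but missed by the review.
    missed_rate = sample_relevant / sample_size
    # Scale that rate up to the whole unreviewed set.
    estimated_missed = missed_rate * unreviewed_count

    total_relevant = found_relevant + estimated_missed
    if total_relevant == 0:
        raise ValueError("no relevant documents found or estimated")
    return found_relevant / total_relevant


if __name__ == "__main__":
    # Made-up numbers: 8,000 relevant documents found, 90,000 unreviewed,
    # and 10 relevant documents in a random sample of 1,000 unreviewed ones.
    print(f"{estimate_recall(8000, 90000, 1000, 10):.3f}")
```

A point estimate like this says nothing about its own uncertainty. A small sample gives a wide margin of error, which is part of why the report calls for better estimation tools.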
Good snapshot of current results, plus fertile data sets for testing alternative methodologies.
The report mentions that the 100 GB data set size was a problem for some participants. (Overview of the TREC 2011 Legal Track, page 2)
Suggestion: Post the 2013 data set to AWS as a public data set. It would be available to everyone, and participants without local clusters could fire up capacity on demand. That is a more realistic scenario than local data processing.
Perhaps an informal survey of the amortized cost of processing by different methods (cloud, local cluster) would be of interest to the legal community.
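As a starting point for such a survey, the comparison could be as simple as the sketch below. Every function name, parameter, and figure is an assumption for illustration. None of the numbers are real AWS prices or real hardware costs, so plug in your own.

```python
def cloud_cost(node_hours, hourly_rate, storage_gb, storage_rate_per_gb_month, months):
    """Pay-as-you-go cost: compute time plus storage for the duration of the job."""
    compute = node_hours * hourly_rate
    storage = storage_gb * storage_rate_per_gb_month * months
    return compute + storage


def amortized_local_cost(node_hours, purchase_price, lifetime_hours, operating_cost_per_hour):
    """Local cluster cost for one job, spreading the purchase price over the
    hardware's useful life and adding per-hour operating costs (power, admin)."""
    if lifetime_hours <= 0:
        raise ValueError("lifetime_hours must be positive")
    amortized_hourly = purchase_price / lifetime_hours
    return node_hours * (amortized_hourly + operating_cost_per_hour)


if __name__ == "__main__":
    # Hypothetical inputs only: a 100 node-hour job over a 100 GB collection.
    cloud = cloud_cost(node_hours=100, hourly_rate=0.50,
                       storage_gb=100, storage_rate_per_gb_month=0.10, months=1)
    local = amortized_local_cost(node_hours=100, purchase_price=20000,
                                 lifetime_hours=20000, operating_cost_per_hour=0.25)
    print(f"cloud: ${cloud:.2f}  local: ${local:.2f}")
```

The local figure depends heavily on utilization: a cluster that sits idle most of its life has far fewer productive hours to spread the purchase price over, which is what a survey would need to capture.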
I can hear the claims of “security, security” from here. The question to ask is: What disclosed premium is your client willing to pay for security on data you are going to give to the other side anyway if it is responsive and non-privileged? 25%? 50%? 125% or more?
BTW, looking forward to the 2013 competition, particularly if the data set gets posted to AWS or a similar cloud.
Let me know if you are interested in forming an ad hoc team or investigating the potential for an ad hoc team.