Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 11, 2014

…Technology-Assisted Review in Electronic Discovery…

Filed under: Machine Learning,Spark — Patrick Durusau @ 7:16 pm

Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery by Gordon V. Cormack & Maura R. Grossman.

Abstract:

Using a novel evaluation toolkit that simulates a human reviewer in the loop, we compare the effectiveness of three machine-learning protocols for technology-assisted review as used in document review for discovery in legal proceedings. Our comparison addresses a central question in the deployment of technology-assisted review: Should training documents be selected at random, or should they be selected using one or more non-random methods, such as keyword search or active learning? On eight review tasks — four derived from the TREC 2009 Legal Track and four derived from actual legal matters — recall was measured as a function of human review effort. The results show that entirely non-random training methods, in which the initial training documents are selected using a simple keyword search, and subsequent training documents are selected by active learning, require substantially and significantly less human review effort (P<0.01) to achieve any given level of recall, than passive learning, in which the machine-learning algorithm plays no role in the selection of training documents. Among passive-learning methods, significantly less human review effort (P<0.01) is required when keywords are used instead of random sampling to select the initial training documents. Among active-learning methods, continuous active learning with relevance feedback yields generally superior results to simple active learning with uncertainty sampling, while avoiding the vexing issue of "stabilization" -- determining when training is adequate, and therefore may stop.
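To make the abstract's terms concrete, here is a minimal sketch of continuous active learning with relevance feedback, seeded by a keyword search and scored as recall against review effort. It is not the authors' toolkit. The synthetic corpus, the function names (`make_corpus`, `train`, `score`, `continuous_active_learning`), the batch size, and the smoothed log-odds scorer are all assumptions chosen for illustration, and the "reviewer" is simulated by reading the hidden ground-truth label.

```python
import math
import random
from collections import Counter


def make_corpus(n_docs=400, prevalence=0.1, seed=0):
    """Build a toy corpus: each document is a set of tokens plus a hidden label."""
    rng = random.Random(seed)
    background = [f"w{i}" for i in range(50)]
    topical = ["merger", "contract", "breach", "damages"]  # hypothetical terms
    docs, labels = [], []
    for _ in range(n_docs):
        relevant = rng.random() < prevalence
        tokens = set(rng.sample(background, 8))
        # Relevant documents usually carry topical terms; others rarely do.
        for term in topical:
            if rng.random() < (0.6 if relevant else 0.03):
                tokens.add(term)
        docs.append(tokens)
        labels.append(relevant)
    return docs, labels


def train(docs, reviewed):
    """Fit smoothed log-odds term weights from the documents reviewed so far."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for i, is_relevant in reviewed.items():
        if is_relevant:
            pos.update(docs[i])
            n_pos += 1
        else:
            neg.update(docs[i])
            n_neg += 1
    vocab = set(pos) | set(neg)
    return {
        t: math.log((pos[t] + 1) / (n_pos + 2))
        - math.log((neg[t] + 1) / (n_neg + 2))
        for t in vocab
    }


def score(tokens, weights):
    """Sum the weights of a document's terms; unseen terms contribute zero."""
    return sum(weights.get(t, 0.0) for t in tokens)


def continuous_active_learning(docs, labels, keyword, batch_size=20):
    """Return (documents_reviewed, recall) pairs, one per review round."""
    total_relevant = sum(labels)
    # Non-random seed set: every document hit by the keyword search.
    seed_hits = [i for i, d in enumerate(docs) if keyword in d]
    # Simulated reviewer: the label is read from the ground truth.
    reviewed = {i: labels[i] for i in seed_hits}
    curve = [(len(reviewed), sum(reviewed.values()) / total_relevant)]
    while len(reviewed) < len(docs):
        weights = train(docs, reviewed)
        unreviewed = [i for i in range(len(docs)) if i not in reviewed]
        # Relevance feedback: review the highest-scoring documents next.
        unreviewed.sort(key=lambda i: score(docs[i], weights), reverse=True)
        for i in unreviewed[:batch_size]:
            reviewed[i] = labels[i]
        curve.append((len(reviewed), sum(reviewed.values()) / total_relevant))
    return curve


if __name__ == "__main__":
    docs, labels = make_corpus()
    for effort, recall in continuous_active_learning(docs, labels, "merger"):
        print(f"reviewed={effort:4d}  recall={recall:.2f}")
```

The loop retrains after every batch and never declares training finished, which is how continuous active learning avoids the "stabilization" question the abstract mentions. Replacing the sort with a random shuffle of the unreviewed documents would give a rough passive-learning baseline to compare against.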

New acronym for me: TAR (technology-assisted review).

If you are interested in legal discovery, take special note that the authors have released a TAR evaluation toolkit.

This article and its references will repay a close reading several times over.
