High Accuracy Metadata and Machine Learning: A librarian’s success by Ashleigh Faith.
Description:
Taxonomy is a field that spans both LIS and SIS studies. Ashleigh Faith will be presenting how a librarian can use information science to set up a taxonomy and machine learning process for complex content. Faith created a taxonomy, based in engineering mobility and science terminology, from scratch. Developing a cohesive taxonomy that would also facilitate automatic indexing of content for an engineering database that reaches more than 218,000 documents (and growing) across eight different content types was a challenge. The nature of scientific content makes automatic indexing difficult because it is considered complex, or outside the established standards of taxonomy. Faith discusses the process used to establish a taxonomy to capture content and create the bedrock on which the indexing software could be trained. Using traditionally linguistic techniques, Faith improved the database taxonomy metadata assignment accuracy to 89%, well above the typically accepted 75% accuracy rate of automatic indexing, and established a repeatable process that was also implemented successfully on NASA Tech Brief content, NATO Terminology Directives, and DOD content. Learn from concrete examples, lessons learned from a librarian’s perspective, and how to duplicate the process.
Slides: http://prezi.com/jr61rhjoqotb/?utm_campaign=share&utm_medium=copy
Judging from the slides alone, this was a presentation worth seeing!
In a nutshell: Ashleigh’s approach netted 89% accuracy, compared with human indexer accuracy of 91% and typical automated indexing accuracy of 75%.
A good illustration of the content-finding rules:
Rule 1: If you don’t want to find content, don’t hire a librarian.
Rule 2: If you do want to find content, hire a librarian.
Clear enough?