From the post:
Most university programming courses ignore entity extraction. Some professors talk about the challenges of identifying people, places, things, events, Social Security Numbers and leave the rest to the students. Other professors may have an assignment related to parsing text and detecting anomalies or bound phrases. But most of those emerging with a degree in computer science consign the challenge of entity extraction to the Miscellaneous file.
Entity extraction means processing text to identify, tag, and properly account for those elements that are the names of persons, numbers, organizations, locations, and expressions such as a telephone number, among other items. An entity can consist of a single word like Cher or a bound sequence of words like White House. The challenge of figuring out names is a tough one for several reasons. Many names exist in richly varied forms. You can find interesting naming conventions in street addresses in Madrid, Spain, and for the owner of a falafel shop in Tripoli.
Entities, as information retrieval experts have learned since the first DARPA conference on the subject in 1987, are quite important to certain types of content analysis. Digital Reasoning has been working for more than 11 years on entity extraction and related content processing problems. Entity oriented analytics have become a very important issue these days as companies deal with too much data, the need to understand the meaning and not just the statistics of the data, and finally the need to understand entities in context – critical to understanding code terms, etc.
I want to highlight the six weaknesses of traditional entity extraction and highlight Digital Reasoning’s patented, fully automated method. Let’s look at the weaknesses.
For my library class: No, I am not endorsing this product, and yes, it is a promotional piece. You are going to encounter those as librarians for your entire careers. And you are going to need to be able to ask questions that focus on the information needs of your library and its patrons, not on what the software is said to do well.
Read the full piece and visit the product’s website. What would you ask? Why? What more information do you think you would need?
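If it helps to see what "identify and tag" means in the simplest possible case, here is a minimal Python sketch of pattern-based extraction. It is my own illustration, not Digital Reasoning's method or anything described in the post. The function name `extract_entities`, the labels, and the three patterns are all assumptions chosen for the example, and the phone and Social Security Number patterns assume US-style formatting.

```python
import re

# Hypothetical patterns for illustration only; real extractors are far richer.
PATTERNS = {
    # US-style Social Security Number, e.g. 123-45-6789
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # US-style phone number, e.g. 202-555-0100
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    # Naive "name": one or more capitalized words in a row
    "NAME": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b"),
}


def extract_entities(text):
    """Return (label, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    found.sort()  # order by position in the text
    return [(label, value) for _, label, value in found]


if __name__ == "__main__":
    sample = "Cher called the White House at 202-555-0100."
    for label, value in extract_entities(sample):
        print(label, value)
```

Notice how quickly this breaks down: any capitalized word at the start of a sentence gets tagged as a name, and nothing here would cope with a street address in Madrid or a name transliterated from Arabic. That gap between a simple pattern and real text is a good place to start forming your questions.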