Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 19, 2014

Tokenizing and Named Entity Recognition with Stanford CoreNLP

Filed under: Named Entity Mining,Natural Language Processing,Stanford NLP — Patrick Durusau @ 2:58 pm

Tokenizing and Named Entity Recognition with Stanford CoreNLP by Sujit Pal.

From the post:

I got into NLP using Java, but I was already using Python at the time, and soon came across the Natural Language Tool Kit (NLTK), and just fell in love with the elegance of its API. So much so that when I started working with Scala, I figured it would be a good idea to build an NLP toolkit with an API similar to NLTK’s, primarily as a way to learn NLP and Scala but also to build something that would be as enjoyable to work with as NLTK and have the benefit of Java’s rich ecosystem.

The project is perennially under construction, and serves as a test bed for my NLP experiments. In the past, I have used OpenNLP and LingPipe to build Tokenizer implementations that expose an API similar to NLTK’s. More recently, I have built a Named Entity Recognizer (NER) with OpenNLP’s NameFinder. At the recommendation of one of my readers, I decided to take a look at Stanford CoreNLP, with which I ended up building a Tokenizer and a NER implementation. This post describes that work.

Truly a hardcore way to learn NLP and Scala!

Excellent!

Looking forward to hearing more about this project.
