Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 16, 2014

NLTK-like Wordnet Interface in Scala

Filed under: NLTK,Scala — Patrick Durusau @ 2:49 pm

NLTK-like Wordnet Interface in Scala by Sujit Pal.

From the post:

I recently figured out how to setup the Java WordNet Library (JWNL) for something I needed to do at work. Prior to this, I have been largely unsuccessful at figuring out how to access Wordnet from Java, unless you count my one attempt to use the Java Wordnet Interface (JWI) described here. I think there are two main reason for this. First, I just didn’t try hard enough, since I could get by before this without having to hook up Wordnet from Java. The second reason was the over-supply of libraries (JWNL, JWI, RiTa, JAWS, WS4j, etc), each of which annoyingly stops short of being full-featured in one or more significant ways.

The one Wordnet interface that I know that doesn’t suffer from missing features comes with the Natural Language ToolKit (NLTK) library (written in Python). I have used it in the past to access Wordnet for data pre-processing tasks. In this particular case, I needed to call it at runtime from within a Java application, so I finally bit the bullet and chose a library to integrate into my application – I chose JWNL based on seeing it being mentioned in the Taming Text book (and used in the code samples). I also used code snippets from Daniel Shiffman’s Wordnet page to learn about the JWNL API.

After I had successfully integrated JWNL, I figured it would be cool (and useful) if I could build an interface (in Scala) that looked like the NLTK Wordnet interface. Plus, this would also teach me how to use JWNL beyond the basic stuff I needed for my webapp. My list of functions were driven by the examples from the Wordnet section (2.5) from the NLTK book and the examples from the NLTK Wordnet Howto. My Scala class implements most of the functions mentioned on these two pages. The following session will give you an idea of the coverage – even though it looks a Python interactive session, it was generated by my JUnit test. I do render the Synset and Word (Lemma) objects using custom format() methods to preserve the illusion (and to make the output readable), but if you look carefully, you will notice the rendering of List() is Scala’s and not Python’s.

NLTK is amazing in its own right and creating a Scala interface will give you an excuse to learn Scala. That’s a win-win situation!

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress