Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 18, 2014

Saving Output of nltk Text.Concordance()

Filed under: Concordance,NLTK,Text Mining — Patrick Durusau @ 4:15 pm

Saving Output of NLTK Text.Concordance() by Kok Hua.

From the post:

In NLP, sometimes users would like to search for series of phrases that contain particular keyword in a passage or web page.

NLTK provides the function concordance() to locate and print series of phrases that contain the keyword. However, the function only print the output. The user is not able to save the results for further processing unless redirect the stdout.

Below function will emulate the concordance function and return the list of phrases for further processing. It uses the NLTK concordance Index which keeps track of the keyword index in the passage/text and retrieve the surrounding words.

Text mining is a very common part of topic map construction so tools that help with that task are always welcome.

To be honest, I am citing this because it may become part of several small tools for processing standards drafts. Concordance software is not rare but a full concordance of a document seems to frighten some proof readers.

The current thinking being if only the “important” terms are highlighted in context, that some proof readers will be more likely to use the work product.

The same principal applies to the authoring of topic maps as well.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress