Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

August 26, 2012

Index your blog using tags and lucene.net

Filed under: .Net,Lucene — Patrick Durusau @ 4:56 am

Index your blog using tags and lucene.net by Ricci Gian Maria.

From the post:

In the last part of my series on Lucene I show how simple is adding tags to document to do a simple tag based categorization, now it is time to explain how you can automate this process and how to use some advanced characteristic of lucene. First of all I write a specialized analyzer called TagSnowballAnalyzer, based on standard SnowballAnalyzer plus a series of keywords associated to various tags, here is how I construct it.

There are various code around the net on how to add synonyms with weight, like described in this stackoverflow question, standard java lucene code has a SynonymTokenFilter in the codebase, but this example shows how simple is to write a Filter to add tags as synonym of related words.   First of all the filter was initialized with a dictionary of keyword and Tags, where Tag is a simple helper class that stores Tag string and relative weight, it also have a ConvertToToken() method that returns the tag enclosed by | (pipe) character. The use of pipe character is done to explicitly mark tags in the token stream, any word that is enclosed by pipe is by convention a tag.

Not the answer for every situation involving synonymy (as in “same subject,” i.e., topic maps) but certainly a useful one.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress