Archive for the ‘AutoSuggestion’ Category

Autocompletion and Heavy Metal

Wednesday, June 13th, 2012

Building an Autocompletion on GWT with RPC, ContextListener and a Suggest Tree: Part 0

René Pickhardt has started a series of posts that should interest anyone with search applications (or an interest in metal bands).

From the post:

Over the last weeks there was quite some quality programming time for me. First of all I built some indices on the typology database, which let me increase the retrieval speed of typology by a factor of over 1000, something that rarely happens in computer science. I will blog about this soon. But having those techniques at hand, I also used them to build a better autocompletion for the search function of my online social network metalcon.de.

The search functionality is not deployed to the real site yet. But on the demo page you can find a demo showing how the completion helps you type. Right now the network requests are faster than Google search (which I admit is quite easy if you only have to handle a request a second and also have a much smaller concept space). Still I was amazed by the ease and beauty of the program and the fact that the suggestions for autocompletion are actually more accurate than our current database search. So feel free to have a look at the demo:

http://gwt.metalcon.de/GWT-Modelling/#AutoCompletionTest

Right now it consists of about 150 thousand concepts which come from 4 different data sources (metal bands, metal records, tracks and German venues for heavy metal). I am pretty sure that increasing the size of the concept space by 2 orders of magnitude should not be a problem. And if everything works out fine I will be able to test this hypothesis on my joint project related work, which will have a database with at least 1 million concepts that need to be autocompleted.

Well, I must admit that 150,000 concepts sounds a bit “lite” for heavy metal but then being an admirer of the same, that comes as no real surprise. ;-)

Still, it also sounds like a very good starting place.
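To make the idea of prefix completion over a weighted concept space concrete, here is a minimal sketch in Python. It is not René's suggest tree: it swaps in a plain sorted list with binary search, and the class name, the weights, and the sample bands are assumptions made up for illustration.

```python
import bisect
import heapq


class PrefixSuggester:
    """Top-k prefix completion over a sorted list (hypothetical helper, not a suggest tree)."""

    def __init__(self, weighted_terms):
        # Store (lower-cased key, weight, original term) and sort by key so that
        # all entries sharing a prefix sit next to each other.
        self._entries = sorted((term.lower(), weight, term) for term, weight in weighted_terms)
        self._keys = [key for key, _, _ in self._entries]

    def suggest(self, prefix, k=5):
        prefix = prefix.lower()
        # Binary search for the first key that is >= the prefix.
        start = bisect.bisect_left(self._keys, prefix)
        # Walk forward while keys still start with the prefix.
        end = start
        while end < len(self._keys) and self._keys[end].startswith(prefix):
            end += 1
        # Keep the k heaviest candidates, heaviest first.
        best = heapq.nlargest(k, self._entries[start:end], key=lambda entry: entry[1])
        return [term for _, _, term in best]


# Made-up weights; a real system might use popularity or click counts.
bands = [("Metallica", 100), ("Megadeth", 80), ("Mercyful Fate", 40),
         ("Motörhead", 90), ("Meshuggah", 60)]
suggester = PrefixSuggester(bands)
print(suggester.suggest("me", k=2))  # ['Metallica', 'Megadeth']
```

A sorted list is easy to reason about but scans every match for a prefix; a tree that stores the best completions at each node avoids that scan, which is presumably part of the appeal of a dedicated structure at larger scale.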

Enjoy!

Different ways to make auto suggestions with Solr

Monday, June 4th, 2012

Different ways to make auto suggestions with Solr

From the post:

Nowadays almost every website has a full text search box as well as the auto suggestion feature in order to help users to find what they are looking for, by typing the smallest possible number of characters. The example below shows what this feature looks like in Google. It progressively suggests how to complete the current word and/or phrase, and corrects typos. That’s a meaningful example which contains multi-term suggestions depending on the most popular queries, combined with spelling correction.

Starts with seven (7) questions you should ask yourself about auto-suggestions and then covers four methods for implementing them in Solr.

You can have the typical word completion seen in most search engines or you can be more imaginative, using custom dictionaries.

Designing Search (part 2): As-you-type suggestions

Wednesday, February 29th, 2012

Designing Search (part 2): As-you-type suggestions by Tony Russell-Rose.

From the post:

Have you ever tried the “I’m Feeling Lucky” button on Google? The idea is, of course, that Google will take you directly to the result you want, rather than return a list of results. It’s a simple idea, and when it works, it seems like magic.

(graphic omitted)

But most of the time we are not so lucky. Instead, we submit a query and review the results; only to find that they’re not quite what we were looking for. Occasionally, we review a further page or two of results, but in most cases it’s quicker just to enter a new query and try again. In fact, this pattern of behaviour is so common that techniques have been developed specifically to help us along this part of our information journey. In particular, three versions of as-you-type suggestions—auto-complete, auto-suggest, and instant results—subtly guide us in creating and reformulating queries.

Tony guides the reader through auto-complete, auto-suggest, and instant results in his usual delightful manner. He illustrates the principles under discussion with well-known examples from the WWW.

A collection of his posts should certainly be supplemental (if not primary) reading for any course on information interfaces.

How to Store Google n-gram in Neo4j

Saturday, February 18th, 2012

How to Store Google n-gram in Neo4j by r.schiessler.

From the post:

At the end of September I discovered an amazing data set which is provided by Google! It is called the Google n-gram data set. Even though the English Wikipedia article about n-grams needs some clean up, it explains nicely what an n-gram is. http://en.wikipedia.org/wiki/N-gram The data set is available in several languages and I am sure it is very useful for many tasks in web retrieval, data mining, information retrieval and natural language processing.

This data set is very well described on the official google n gram page which I also include as an iframe directly here on my blog.

Schiessler describes the project as follows:

The idea is that once a user has started to type a sentence, statistics about the n-grams can be used to semantically and syntactically correctly predict what the next word will be and in this way increase the speed of typing by making suggestions to the user. This will be particularly useful with all these mobile devices where typing is really annoying.

Another suggestion project!

Worth your time both for its substance and use of Neo4j.
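As a rough illustration of next-word prediction from n-gram statistics, the sketch below counts bigrams in a tiny made-up corpus and suggests the most frequent successors of a word. It does not use Neo4j or the Google data set, and the function names and sample sentences are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def build_bigram_counts(sentences):
    """Count how often each word is followed by each other word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with its immediate successor.
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def predict_next(following, word, k=3):
    """Return up to k of the most frequent successors of `word`."""
    counts = following.get(word.lower())
    if not counts:
        return []
    return [successor for successor, _ in counts.most_common(k)]


# Toy corpus standing in for real n-gram counts.
corpus = [
    "heavy metal is loud",
    "heavy metal is fast",
    "heavy metal bands tour",
    "heavy rain",
]
counts = build_bigram_counts(corpus)
print(predict_next(counts, "metal", k=2))  # ['is', 'bands']
```

Longer n-grams would condition on more than one preceding word, at the cost of far more counts to store, which is where a dedicated store such as a graph database comes in.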

Different ways to make auto suggestions with Solr

Saturday, February 18th, 2012

Different ways to make auto suggestions with Solr

From the post:

Nowadays almost every website has a full text search box as well as the auto suggestion feature in order to help users to find what they are looking for, by typing the smallest possible number of characters. The example below shows what this feature looks like in Google. It progressively suggests how to complete the current word and/or phrase, and corrects typos. That’s a meaningful example which contains multi-term suggestions depending on the most popular queries, combined with spelling correction.

(graphic omitted)

There are different ways to make auto complete suggestions with Solr. You can find many articles and examples on the internet, but making the right choice is not always easy. The goal of this post is to compare the available options in order to identify the best solution tailored to your needs, rather than describe any one specific approach in depth.

It’s common practice to make auto-suggestions based on the indexed data. In fact a user is usually looking for something that can be found within the index; that’s why we’d like to show the words that are similar to the current query and at the same time relevant within the index. On the other hand, it is recommended to provide query suggestions; we can for example capture and index on a specific Solr core all the user queries which return more than zero results, so we can use that information to make auto-suggestions as well. What actually matters is that we are going to make suggestions based on what’s inside the index; for this purpose it’s not relevant if the index contains user queries or “normal data”; the solutions we are going to consider can be applied in both cases.
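The query-capture idea can be sketched without Solr at all. The Python below keeps an in-memory count of queries that returned at least one result and suggests the most popular ones for a prefix. The class, its methods, and the sample queries are hypothetical; a real deployment would index the queries in a separate Solr core, as the post describes.

```python
from collections import Counter


class QueryLog:
    """In-memory stand-in for a core that indexes successful user queries."""

    def __init__(self):
        self._counts = Counter()

    def record(self, query, num_results):
        # Collapse case and repeated whitespace so variants count as one query.
        normalized = " ".join(query.lower().split())
        # Only queries that found something are worth suggesting later.
        if normalized and num_results > 0:
            self._counts[normalized] += 1

    def suggest(self, prefix, k=5):
        prefix = prefix.lower().lstrip()
        matches = [(q, n) for q, n in self._counts.items() if q.startswith(prefix)]
        # Most popular first; ties broken alphabetically for stable output.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [q for q, _ in matches[:k]]


log = QueryLog()
log.record("Iron Maiden", 12)
log.record("iron  maiden", 3)
log.record("iron man", 5)
log.record("iron maidne", 0)  # misspelling with no hits, never suggested
print(log.suggest("iron"))  # ['iron maiden', 'iron man']
```

Filtering on the result count is what keeps misspelled or dead-end queries out of the suggestion list.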

The Suggester module is the method that looks the most promising:

This solution has its own separate index which you can automatically build on every commit. Using collation you can have multi-term suggestions. Furthermore, it is possible to use a custom dictionary instead of the index content, which makes the current solution even more flexible.

I like to think of multi-term suggestions as tuneable query expansions that return materials on a subject more precisely than the original query.

The custom dictionary has even more potential:

When a file-based dictionary is used (non-empty sourceLocation parameter above) then it’s expected to be a plain text file in UTF-8 encoding. Blank lines and lines that start with a ‘#’ are ignored. The remaining lines must consist of either a string without literal TAB (\u0009) character, or a string and a TAB separated floating-point weight. (http://wiki.apache.org/solr/Suggester)

The custom dictionary can contain single terms or phrases.
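To show how little there is to the file format, here is a small Python sketch that parses dictionary text following the rules quoted above. It is not Solr's own parser; the function name is made up, and the default weight of 1.0 for lines without a weight is an assumption.

```python
def parse_suggester_dictionary(text, default_weight=1.0):
    """Parse dictionary text into (phrase, weight) pairs.

    default_weight is an assumed value for lines that carry no weight.
    """
    entries = []
    for line in text.splitlines():
        # Blank lines and comment lines starting with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        # A literal TAB, if present, separates the phrase from its weight.
        phrase, tab, weight = line.partition("\t")
        # float() raises ValueError on a malformed weight, which is left uncaught here.
        entries.append((phrase, float(weight) if tab else default_weight))
    return entries


sample = "# bands\nmetallica\t10.5\n\niron maiden\t7\nmegadeth\n"
print(parse_suggester_dictionary(sample))
# [('metallica', 10.5), ('iron maiden', 7.0), ('megadeth', 1.0)]
```

For a file on disk, the text could be read with `open(path, encoding="utf-8").read()` before parsing.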

Hmmm, a custom dictionary:

  1. Is easy to author
  2. Contains words and phrases
  3. Is an editorial artifact
  4. Is not limited to a single Solr installation
  5. Could be domain specific
  6. Assists in returning more, not less, precise results

The handling of the more precise results is up to your imagination.