Archive for the ‘AutoComplete’ Category

Time-varying social networks in a graph database…

Thursday, September 26th, 2013

Time-varying social networks in a graph database: a Neo4j use case by Ciro Cattuto, Marco Quaggiotto, André Panisson, and Alex Averbuch.

Abstract:

Representing and efficiently querying time-varying social network data is a central challenge that needs to be addressed in order to support a variety of emerging applications that leverage high-resolution records of human activities and interactions from mobile devices and wearable sensors. In order to support the needs of specific applications, as well as general tasks related to data curation, cleaning, linking, post-processing, and data analysis, data models and data stores are needed that afford efficient and scalable querying of the data. In particular, it is important to design solutions that allow rich queries that simultaneously involve the topology of the social network, temporal information on the presence and interactions of individual nodes, and node metadata. Here we introduce a data model for time-varying social network data that can be represented as a property graph in the Neo4j graph database. We use time-varying social network data collected by using wearable sensors and study the performance of real-world queries, pointing to strengths, weaknesses and challenges of the proposed approach.

A good start on modeling networks that vary based on time.

If the overhead sounds daunting, remember the graph data used here measured the proximity of actors every 20 seconds for three days.

Imagine if you added social connections between those actors, attended the same schools/conferences, co-authored papers, etc.

We are slowly loosing our reliance on simplification of data and models to make them computationally tractable.

Advanced autocomplete with Solr Ngrams

Monday, July 8th, 2013

Advanced autocomplete with Solr Ngrams by Peter Tyrrell.

From the post:

The following approach is a good one if you require:

  • phrase suggestions, not just words
  • the ability to match user input against multiple fields
  • multiple fields returned
  • multiple field values to make up a unique suggestion
  • suggestion results collapsed (grouped) on a field or fields
  • the ability to filter the query
  • images with suggestions

I needed a typeahead suggestion (autocomplete) solution for a textbox that searches titles. In my case, I have a lot of magazines that are broken down so that each page is a document in the Solr index, and has metadata that describes its parentage. For example, page 1 of Dungeon Magazine 100 has a title: “Dungeon 100”; a collection; “Dungeon Magazine”; and a universe: “Dungeons and Dragons”. (Yes, all the material in my index is related to RPG in some way.) A magazine like this might consist of 70 pages or so, whereas a sourcebook like the Core Rulebook for Pathfinder, a D&D variant, boasts 578, so title suggestions have to group on title and ignore counts. Further, the Warhammer 40k game Dark Heresy also has a Core Rulebook, so title suggestions have to differentiate between them.

(…)

Topic map interfaces with autosuggest/complete could ease users into searching and authoring topic maps.

Autocomplete Search with Redis

Sunday, December 9th, 2012

Autocomplete Search with Redis

From the post:

When we launched GetGlue HD, we built a faster and more powerful search to help users find the titles they were looking for when they want to check-in to their favorite shows and movies as they typed into the search box. To accomplish that, we used the in-memory data structures of the Redis data store to build an autocomplete search index.

Search Goals

The results we wanted to autocomplete for are a little different than the usual result types. The Auto complete with Redis writeup by antirez explores using the lexicographical ordering behavior of sorted sets to autocomplete for names. This is a great approach for things like usernames, where the prefix typed by the user is also the prefix of the returned results: typing mar could return Mara, Marabel, and Marceline. The deal-breaking limitation is that it will not return Teenagers From Mars, which is what we want our autocomplete to be able to do when searching for things like show and movie titles. To do that, we decided to roll our own autocomplete engine to fit our requirements. (Updated the link to the “Auto complete with Redis” post.)

Rather like the idea of autocomplete being more than just string completion.

What if while typing a name, “autocompletion” returns one or more choices for what it thinks you may be talking about? With additional properties/characteristics, you can disambiguate your usage by allowing your editor to tag the term.

Perhaps another way to ease the burden of authoring a topic map.

Autocompletion and Heavy Metal

Wednesday, June 13th, 2012

Building an Autocompletion on GWT with RPC, ContextListener and a Suggest Tree: Part 0

René Pickhardt has started a series of posts that should interest anyone with search applications (or an interest metal bands).

From the post:

Over the last weeks there was quite some quality programming time for me. First of all I built some indices on the typology data base in which way I was able to increase the retrieval speed of typology by a factor of over 1000 which is something that rarely happens in computer science. I will blog about this soon. But heaving those techniques at hand I also used them to built a better auto completion for the search function of my online social network metalcon.de.

The search functionality is not deployed to the real site yet. But on the demo page you can find a demo showing how the completion is helping you typing. Right now the network requests are faster than google search (which I admit it is quite easy if you only have to handle a request a second and also have a much smaller concept space). Still I was amazed by the ease and beauty of the program and the fact that the suggestions for autocompletion are actually more accurate than our current data base search. So feel free to have a look at the demo:

http://gwt.metalcon.de/GWT-Modelling/#AutoCompletionTest

Right now it consists of about 150 thousand concepts which come from 4 different data sources (Metal Bands, Metal records, Tracks and Germen venues for Heavy metal) I am pretty sure that increasing the size of the concept space by 2 orders of magnitude should not be a problem. And if everything works out fine I will be able to test this hypothesis on my joint project related work which will have a data base with at least 1 mio. concepts that need to be autocompleted.

Well, I must admit that 150,000 concepts sounds a bit “lite” for heavy metal but then being an admirer of the same, that comes as no real surprise. 😉

Still, it also sounds like a very good starting place.

Enjoy!

Designing Search (part 2): As-you-type suggestions

Wednesday, February 29th, 2012

Designing Search (part 2): As-you-type suggestions by Tony Russell-Rose.

From the post:

Have you ever tried the “I’m Feeling Lucky” button on Google? The idea is, of course, that Google will take you directly to the result you want, rather than return a list of results. It’s a simple idea, and when it works, it seems like magic.

(graphic omitted)

But most of the time we are not so lucky. Instead, we submit a query and review the results; only to find that they’re not quite what we were looking for. Occasionally, we review a further page or two of results, but in most cases it’s quicker just to enter a new query and try again. In fact, this pattern of behaviour is so common that techniques have been developed specifically to help us along this part of our information journey. In particular, three versions of as-you-type suggestions—auto-complete, auto-suggest, and instant results—subtly guide us in creating and reformulating queries.

Tony guides the reader through auto-complete, auto-suggest, and instant results in his usual delightful manner. He illustrates the principles under discussion with well known examples from the WWW.

A collection of his posts should certainly be supplemental (if not primary) reading for any course on information interfaces.

Different ways to make auto suggestions with Solr

Saturday, February 18th, 2012

Different ways to make auto suggestions with Solr

From the post:

Nowadays almost every website has a full text search box as well as the auto suggestion feature in order to help users to find what they are looking for, by typing the least possible number of characters possible. The example below shows what this feature looks like in Google. It progressively suggests how to complete the current word and/or phrase, and corrects typo errors. That’s a meaningful example which contains multi-term suggestions depending on the most popular queries, combined with spelling correction.

(graphic omitted)

There are different ways to make auto complete suggestions with Solr. You can find many articles and examples on the internet, but making the right choice is not always easy. The goal of this post is compare the available options in order to identify the best solution tailored to your needs, rather than describe any one specific approach in depth.

It’s common practice to make auto-suggestions based on the indexed data. In fact a user is usually looking for something that can be found within the index, that’s why we’d like to show the words that are similar to the current query and at the same time relevant within the index. On the other hand, it is recommended to provide query suggestions; we can for example capture and index on a specific solr core all the user queries which return more than zero results, so we can use those information to make auto-suggestions as well. What actually matters is that we are going to make suggestions based on what’s inside the index; for this purpose it’s not relevant if the index contains user queries or “normal data”, the solutions we are going to consider can be applied in both cases.

The Suggester module is the method that looks the most promising:

This solution has its own separate index which you can automatically build on every commit. Using collation you can have multi-term suggestions. Furthermore, it is possible to use a custom dictionary instead of the index content, which makes the current solution even more flexible.

I like to think of multi-term suggestions as tuneable query expansions that return materials on a subject more precisely than the original query.

The custom dictionary has even more potential:

When a file-based dictionary is used (non-empty sourceLocation parameter above) then it’s expected to be a plain text file in UTF-8 encoding. Blank lines and lines that start with a ‘#’ are ignored. The remaining lines must consist of either a string without literal TAB (\u0007) character, or a string and a TAB separated floating-point weight. (http://wiki.apache.org/solr/Suggester)

The custom dictionary can contain single terms or phrases.

Hmmm, a custom dictionary:

  1. Is easy to author
  2. Contains words and phrases
  3. Is an editorial artifact
  4. Not limited to a single Solr installation
  5. Could be domain specific
  6. Assists in returning more, not less precise results

The handling of the more precise results is up to your imagination.

AutoComplete with Suggestion Groups

Thursday, October 20th, 2011

AutoComplete with Suggestion Groups from Sematext.

From the post:

While Otis is talking about our new Search Analytics (it’s open and free now!) and Scalable Performance Monitoring (it’s also open and free now!) services at Lucene Eurocon in Barcelona Pascal, one of the new members of our multinational team at Sematext, is working on making improvements to our search-lucene.com and search-hadoop.com sites. One of the recent improvements is on the AutoComplete functionality we have there. If you’ve used these sites before, you may have noticed that the AutoComplete there now groups suggestions. In the screen capture below you can see several suggestion groups divided with pink lines. Suggestions can be grouped by any criterion, and here we have them grouped by the source of suggestions. The very first suggestion is from “mail # general”, which is our name for the “general” mailing list that some of the projects we index have. The next two suggestions are from “mail # user”, followed by two suggestions from “mail # dev”, and so on. On the left side of each suggestion you can see icons that signify the type of suggestion and help people more quickly focus on types of suggestions they are after.

Very nice!

Curious how you would distinguish “grouping suggestions” from faceted navigation? (I have a distinction in mind but curious about yours.)