Archive for the ‘Search Behavior’ Category


Monday, May 18th, 2015


From the “about” page:

The FreeSearch project is a search system on top of DBLP data provided by Michael Ley. FreeSearch is a joint project of the L3S Research Center and iSearch IT Solutions GmbH.

In this project we develop new methods for simple literature search that works on any catalogs, without requiring in-depth knowledge of the metadata schema. The system helps users proactively and unobtrusively by guessing at each step what the user’s real information need is and providing precise suggestions.

A more detailed description of the system can be found in this publication: FreeSearch – Literature Search in a Natural Way.

You can choose to search across:

DBLP (4,552,889 documents)

TIBKat (2,079,012 documents)

CiteSeer (1,910,493 documents)

BibSonomy (448,166 documents)


One Word Twitter Search Advice

Tuesday, April 28th, 2015

The one word journalists should add to Twitter searches that you probably haven’t considered by Daniel Victor.

Daniel takes you through five results without revealing how he obtained them. A bit long but you will be impressed when he reveals the answer.

He also has some great tips for other Twitter searching. Tips that you won’t see from any SEO.

Definitely something to file with your Twitter search tips.

Introducing Splainer…

Monday, August 25th, 2014

Introducing Splainer — The Open Source Search Sandbox That Tells You Why by Doug Turnbull.

From the post:

Splainer is a step towards addressing two problems:

  • Collaboration: At OpenSource Connections, we believe that collaboration with non-techies is the secret ingredient of search relevancy. We need to arm business analysts and content experts with a human readable version of the explain information so they can inform the search tuning process.
  • Usability: I want to paste a Solr URL, full of query parameters and all, and go! Then, once I see more helpful explain information, I want to tweak (and tweak and tweak) until I get the search results I want. Much like some of my favorite regex tools. Get out of the way and let me tune!
  • ….

    We hope you’ll give it a spin and let us know how it can be improved. We welcome your bugs, feedback, and pull requests. And if you want to try the Splainer experience over multiple queries, with diffing, results grading, a development history, and more — give Quepid a spin for free!
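As a minimal sketch of what "paste a Solr URL and go" involves, the Python below takes a Solr query URL and switches on Solr's `debugQuery` parameter, which asks Solr to include explain information in the response. The host, collection name, and query are made up for illustration, and this is not a description of how Splainer itself works.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_explain(solr_url: str) -> str:
    """Return the same Solr URL with debugQuery=true set exactly once."""
    parts = urlsplit(solr_url)
    # Keep every existing parameter except a previous debugQuery setting.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debugQuery"
    ]
    params.append(("debugQuery", "true"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical local Solr instance and collection.
example = "http://localhost:8983/solr/articles/select?q=title:search&rows=10"
print(with_explain(example))
```

The rewritten URL can then be fetched and tweaked by hand, which is the tune-and-retry loop the excerpt describes.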

Improving the information content of the tokens you are searching is another way to improve search results.


Sunday, February 16th, 2014

SearchReSearch by Daniel M. Russell.

WARNING: SearchReSearch looks very addictive!

Truly, it really looks addictive.

The description reads:

A blog about search, search skills, teaching search, learning how to search, learning how to use Google effectively, learning how to do research. It also covers a good deal of sensemaking and information foraging.

If you like searching, knowing why searches work (or don’t), sensemaking and information foraging, this is the blog for you.

Among other features, Daniel posts search challenges that are solved by commenters and himself. Interesting search challenges.

Spread the news about Daniel’s blog to every librarian (or researcher) you know.

I first saw this at Pete Warden’s Five Short Links February 13, 2014.

Google Transparency Report

Saturday, December 21st, 2013

Google Transparency Report

The Google Transparency Report consists of five parts:

  1. Government requests to remove content

    A list of the number of requests we receive from governments to review or remove content from Google products.

  2. Requests for information about our users

    A list of the number of requests we received from governments to hand over user data and account information.

  3. Requests by copyright owners to remove search results

    Detailed information on requests by copyright owners or their representatives to remove web pages from Google search results.

  4. Google product traffic

    The real-time availability of Google products around the world, historic traffic patterns since 2008, and a historic archive of disruptions to Google products.

  5. Safe Browsing

    Statistics on how many malware and phishing websites we detect per week, how many users we warn, and which networks around the world host malware sites.

I pointed out the visualizations of the copyright holder data earlier today.

There are a number of visualizations of the Google Transparency Report and I may assemble some of the more interesting ones for your viewing pleasure.

You certainly should download the data sets and/or view them as Google Docs Spreadsheets.

I say that because while Google is more “transparent” than the current White House, it’s not all that transparent.

Take the government takedown requests, for example.

According to the raw data file, the United States has made five (5) requests on the basis of national security, four (4) of which were for YouTube videos and one (1) of which was for a web search result.


And for no government request is there sufficient information to identify what that government sought to conceal.

Google may have qualms about information governments want to conceal but that sounds like a marketing opportunity to me. (Being mindful of your availability to such governments.)

A language for search and discovery

Monday, December 2nd, 2013

A language for search and discovery by Tony Russell-Rose.


In order to design better search experiences, we need to understand the complexities of human information-seeking behaviour. In this paper, we propose a model of information behaviour based on the needs of users across a range of search and discovery scenarios. The model consists of a set of modes that users employ to satisfy their information goals.

We discuss how these modes relate to existing models of human information seeking behaviour, and identify areas where they differ. We then examine how they can be applied in the design of interactive systems, and present examples where individual modes have been implemented in interesting or novel ways. Finally, we consider the ways in which modes combine to form distinct chains or patterns of behaviour, and explore the use of such patterns both as an analytical tool for understanding information behaviour and as a generative tool for designing search and discovery experiences.

Tony’s post is also available as a pdf file.

A deeply interesting paper but consider the evidence that underlies it:

The scenarios were collected as part of a series of requirements workshops involving stakeholders and customer-facing staff from various client organisations. A proportion of these engagements focused on consumer-oriented site search applications (resulting in 277 scenarios) and the remainder on enterprise search applications (104 scenarios).

The scenarios were generated by participants in breakout sessions and subsequently moderated by the workshop facilitator in a group session to maximise consistency and minimise redundancy or ambiguity. They were also prioritised by the group to identify those that represented the highest value both to the end user and to the client organisation.

This data possesses a number of unique properties. In previous studies of information seeking behaviour (e.g. [5], [10]), the primary source of data has traditionally been interview transcripts that provide an indirect, verbal account of end user information behaviours. By contrast, the current data source represents a self-reported account of information needs, generated directly by end users (although a proportion were captured via proxy, e.g. through customer facing staff speaking on behalf of the end users). This change of perspective means that instead of using information behaviours to infer information needs and design insights, we can adopt the converse approach and use the stated needs to infer information behaviours and the interactions required to support them.

Moreover, the scope and focus of these scenarios represents a further point of differentiation. In previous studies, (e.g. [8]), measures have been taken to address the limitations of using interview data by combining it with direct observation of information seeking behaviour in naturalistic settings. However, the behaviours that this approach reveals are still bounded by the functionality currently offered by existing systems and working practices, and as such do not reflect the full range of aspirational or unmet user needs encompassed by the data in this study.

Finally, the data is unique in that it constitutes a genuine practitioner-oriented deliverable, generated expressly for the purpose of designing and delivering commercial search applications. As such, it reflects a degree of realism and authenticity that interview data or other research-based interventions might struggle to replicate.

It’s not a bad thing to use data from commercial engagements for research and is certainly better than usability studies based on 10 to 12 undergraduates, two of whom did not complete the study. 😉

However, I would be very careful about trying to generalize from a self-selected group even for commercial search, much less the fuller diversity of other search scenarios.

On the other hand, the care with which the data was analyzed makes it an excellent data point against which to compare other data points, hopefully with more diverse populations.

The Gap Between Documents and Answers

Thursday, October 24th, 2013

I mentioned the webinar: Driving Knowledge-Worker Performance with Precision Search Results a few days ago in Findability As Value Proposition.

There was one nugget (among many) in the webinar that I want to capture before I lose sight of how important it is to topic maps and semantic technologies in general.

Dan Taylor (Earley and Associates) was presenting a maturation diagram for knowledge technologies.

See the presentation for the details but what struck me was that on the left side (starting point) there were documents. On the right side (the goal) were answers.

Think about that for a moment.

When you search in Google or any other search engine, what do you get back? Pointers to documents, presentations, videos, etc.

What task remains? Digging out answers from those documents, presentations, videos.

A mature knowledge technology goes beyond what an average user is searching for (the Google model) and returns information based on a specific user for a particular domain, that is, an answer.

For the average user there may be no better option than to drop them off in the neighborhood of a correct answer. Or what may be a correct answer to the average user. No guarantees that you will find it.

The examples in the webinar are in specific domains where user queries can be modeled accurately enough to formulate answers (not documents) to answer queries.

Reminds me of TaxMap. You?

If you want to do a side by side comparison, try USC: Title 26 – Internal Revenue Code, from the Legal Information Institute (Cornell).

Don’t get me wrong, the Cornell materials are great but they reflect the U.S. Code, nothing more or less. That is to say the text you find there isn’t engineered to provide answers. 😉

I will update this post with the webinar address as soon as it appears.

Building better search tools: problems and solutions

Monday, September 16th, 2013

Building better search tools: problems and solutions by Vincent Granville

From the post:

Have you ever done a Google search for mining data? It returns the same results as for data mining. Yet these are two very different keywords: mining data usually means data about mining. And if you search for data about mining you still get the same results anyway.

(graphic omitted)

Yet Google has one of the best search algorithms. Imagine an e-store selling products, allowing users to search for products via a catalog powered with search capabilities, but returning irrelevant results 20% of the time. What a loss of money! Indeed, if you were an investor looking on Amazon to purchase a report on mining data, all you will find are books on data mining and you won’t buy anything: possibly a $500 loss for Amazon. Repeat this million times a year, and the opportunity cost is in billions of dollars.

There are a few issues that make this problem difficult to fix. While the problem is straightforward for decision makers, CTO’s or CEO’s to notice, understand and assess the opportunity cost (just run 200 high value random search queries, see how many return irrelevant results), the communication between the analytic teams and business people is faulty: there is a short somewhere.

There might be multiple analytics teams working as silos – computer scientists, statisticians, engineers – sometimes aggressively defending their own turfs and having conflicting opinions. What the decision makers eventually hear is a lot of noise and lots of technicalities, and they don’t know how to start, how much it will cost to fix it, and how complex the issue is, and who should fix it.

Here I discuss the solution and explain it in very simple terms, to help any business having a search engine and an analytic team, easily fix the issue.
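The audit suggested in the excerpt (run about 200 queries and count the irrelevant ones) can be sketched in a few lines of Python. This is only an illustration: the judgments below are invented, the relevance calls are assumed to come from a human reviewer, and the interval is a plain normal approximation rather than anything the post prescribes.

```python
import math


def irrelevant_rate(judgments: list[bool], z: float = 1.96) -> tuple[float, float, float]:
    """Estimate the share of queries returning irrelevant results.

    judgments[i] is True when a reviewer marked query i as irrelevant.
    Returns (rate, low, high) using a normal-approximation interval.
    """
    n = len(judgments)
    if n == 0:
        raise ValueError("need at least one judged query")
    rate = sum(judgments) / n
    half_width = z * math.sqrt(rate * (1 - rate) / n)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


# Invented sample: 40 of 200 judged queries were irrelevant.
sample = [True] * 40 + [False] * 160
rate, low, high = irrelevant_rate(sample)
print(f"{rate:.0%} irrelevant (roughly {low:.0%} to {high:.0%})")
```

With only 200 queries the interval is wide, which is worth keeping in mind before attaching a dollar figure to the result.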

Vincent has some clever insights into this particular type of search problem but I think it falls short of being “easily” fixed.

Read his original post and see if you think the solution is an “easy” one.

Google Search Operators [Improving over Google]

Tuesday, August 13th, 2013

How To Make Good Use Of Google’s Search Operators by Craig Snyder.

From the post:

Some of you might not have the slightest clue what an operator is, in terms of using a search engine. Luckily enough, both Google and MakeUseOf offer some pretty good examples of how to use them with the world’s most popular search engine. In plain English, an operator is a tag that you can include within your Google search to make it more precise and specific.

With operators, you’re able to display results that pertain only to certain websites, search through a range of numbers, or even completely exclude a word from your results. When you master the use of Google’s search engine, finding the answer to nearly anything you can think of is a power that you have right at your fingertips. In this article, let’s make that happen.
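The three operators mentioned in the excerpt (restricting to a website, searching a range of numbers, excluding a word) can be combined into one query string. The sketch below is an illustration only: the helper name `build_query`, the site, and the search terms are all made up.

```python
from urllib.parse import quote_plus


def build_query(terms, site=None, exclude=(), number_range=None):
    """Assemble a Google query string from plain terms and operators."""
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")          # only results from this site
    parts.extend(f"-{word}" for word in exclude)  # drop results containing word
    if number_range:
        low, high = number_range
        parts.append(f"{low}..{high}")        # numbers between low and high
    return " ".join(parts)


query = build_query(
    ["digital", "camera"],
    site="example.com",
    exclude=["refurbished"],
    number_range=(200, 400),
)
print(query)
print("https://www.google.com/search?q=" + quote_plus(query))
```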

8 Google Search Tips To Keep Handy At All Times by Dave Parrack.

From the post:

Google isn’t the only game in town when it comes to search. Alternatives such as Bing, DuckDuckGo, and Wolfram Alpha also provide the tools necessary to search the Web. However, the figures don’t lie, and the figures suggest that the majority of Internet users choose Google over the rest of the competition.

With that in mind it’s important to make sure all of those Google users are utilizing all that Google has to offer when it comes to its search engine. Everyone knows how to conduct a normal search by typing some words and/or a phrase into the box provided and following the links that emerge from the overcrowded fog. But Google Search offers a lot more than just the basics.

If friends or colleagues are using Google, I thought these posts might come in handy.

Speaking of the numbers, as of June 13, 2013, Google’s share of the search market was 66.7 percent. Bing was at 17.9 percent and AOL, Inc., the smallest one listed, was at 1.3 percent. (What does that say to you about DuckDuckGo and Wolfram Alpha?)

Google’s majority share of the search market should be encouraging to anyone working on alternatives.


Google has left so much room for better search results.

For example, let’s say you find an article and you want to find other articles that rely on it. So you enter the title as a quoted phrase. What do you get back?

If it is a popular article, you may get hundreds of results. You and I both know you are not going to look at every article.

But a number of those articles are just citing the article of interest in a block of citations. Doesn’t have much to do with the results of the article at all.

But Google returns all of those, ranked for sure, but you don’t know enough about the ranking to decide if two pages of search results is enough or not. Gold may be waiting on the third page. No way to tell.

Document level search results are just that. Document level search results. You can refine them for yourself but that’s not going to be captured by Google.

What is your example of improvement over the search results we get from Google now?

Keyword Search, Plus a Little Magic

Wednesday, May 15th, 2013

Keyword Search, Plus a Little Magic by Geoffrey Pullum.

From the post:

I promised last week that I would discuss three developments that turned almost-useless language-connected technological capabilities into something seriously useful. The one I want to introduce first was introduced by Google toward the end of the 1990s, and it changed our whole lives, largely eliminating the need for having full sentences parsed and translated into database query language.

The hunch that the founders of Google bet on was that simple keyword search could be made vastly more useful by taking the entire set of pages containing all of the list of search words and not just returning it as the result but rather ranking its members by influentiality and showing the most influential first. What a page contains is not the only relevant thing about it: As with any academic publication, who values it and refers to it is also important. And that is (at least to some extent) revealed in the link structure of the Web.
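The idea in the excerpt (keep only pages containing every search word, then order them by how much other pages refer to them) can be illustrated with a toy example. The sketch below uses a raw count of inbound links as a crude stand-in for influence; it is not Google's algorithm, and the four-page corpus is invented.

```python
# Invented corpus: page name -> text and outbound links.
pages = {
    "a": {"text": "keyword search tutorial", "links": ["b"]},
    "b": {"text": "keyword search with ranking", "links": []},
    "c": {"text": "search ranking and keyword tips", "links": ["b", "a"]},
    "d": {"text": "cooking recipes", "links": ["b"]},
}


def search(query, pages):
    """Return pages containing all query words, most linked-to first."""
    words = query.lower().split()

    # Count how many links point at each page.
    inbound = {name: 0 for name in pages}
    for page in pages.values():
        for target in page["links"]:
            if target in inbound:
                inbound[target] += 1

    # Plain keyword match: every query word must appear in the page text.
    matches = [
        name
        for name, page in pages.items()
        if all(word in page["text"].lower().split() for word in words)
    ]
    # Rank by inbound links (descending), breaking ties by name.
    return sorted(matches, key=lambda name: (-inbound[name], name))


print(search("keyword search", pages))
```

Page "d" never matches the query, yet its link to "b" still helps "b" rank first, which is the point about link structure carrying information that page content alone does not.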

In his first post, which wasn’t sympathetic to natural language processing, Geoffrey baited his critics into fits of frenzied refutation.

Fits of refutation that failed to note Geoffrey hadn’t completed his posts on natural language processing.

Take the keyword search posting for instance.

I won’t spoil the surprise for you but the fourth fact that Geoffrey says Google relies upon could have serious legs for topic map authoring and interface design.

And not a little insight into what we call natural language processing.

More posts are to follow in this series.

I suggest we savor each one as it appears and after reflection on the whole, sally forth onto the field of verbal combat.

Enabling action: Digging deeper into strategies for learning

Sunday, April 21st, 2013

Enabling action: Digging deeper into strategies for learning by Thom Haller. (Haller, T. (2013), Enabling action: Digging deeper into strategies for learning. Bul. Am. Soc. Info. Sci. Tech., 39: 42–43. doi: 10.1002/bult.2013.1720390413)


A central goal for information architects is to understand how people use information, make choices as they navigate a website and accomplish their objectives. If the goal is learning, we often assume it relates to an end point, a question to answer, a problem to which one applies new understanding. Benjamin Bloom’s 1956 taxonomy of learning breaks down the cognitive process, starting from understanding needs and progressing to action and final evaluation. Carol Kuhlthau’s 1991 outline of the information search process similarly starts with awareness of a need, progresses through exploring options, refining requirements and collecting solutions, and ends with decision making and action. Recognizing the stages of information browsing, learning and action can help information architects build sites that better meet searchers’ needs.

Thom starts with Bloom, cruises by Kuhlthau and ends up with Jared Pomranky restating Kuhlthau in: Seeking Knowledge: Denver, Web Design, And The Stages of Learning:

According to Kuhlthau, the six stages of learning are:

  • Initiation — the person becomes aware that they need information. Generally, it’s assumed that visitors to your website have this awareness already, but there are circumstances in which you can generate this kind of awareness as well.
  • Exploration — the person sees the options that are available to choose between. Quite often, especially online, ‘analysis paralysis’ can set in and make a learner quit at this stage because they can’t decide which of the options are worth further pursuit.
  • Formulation — the person sees that they’re going to have to create further requirements before they’re able to make a final selection, and they make decisions to narrow the field. Confidence returns.
  • Collection — the person has clearly articulated their precise needs and is able to evaluate potential solutions. They gather all available solutions and begin to weigh them based on relevant criteria.
  • Action — the person makes their final decision and acts on it based on their understanding.

Many web designers assume that their surfers are at the Collection stage, and craft their entire webpage toward moving their reader from Collection to Action — but statistically, most people are going to be at Exploration or Formulation when they arrive at your site.

Does that mean that you should build a website that encourages people to go read other options and learn more, hoping they’ll return to your site for their Action? Not at all — but it does mean that by understanding what people are looking for at each stage of their learning process, we can design websites that guide them through the whole thing. This, by no coincidence whatsoever, also results in websites and web content that is useful, user-friendly, and entirely Google-appropriate.

We all use models of online behavior, learning if you like, but I would caution against using models disconnected from your users.

Particularly models disconnected from your users and re-interpreted by you as reflecting your users.

A better course would be to study the behavior of your users and to model your content on their behavior.

Otherwise you will be the seekers who: “… came looking for [your users], only to find Zarathustra.” Thus Spake Zarathustra

Google search:… [GDM]

Sunday, April 21st, 2013

Google search: three bugs to fix with better data science by Vincent Granville.

Vincent outlines three issues with Google search results:

  1. Outdated search results
  2. Wrongly attributed articles
  3. Favoring irrelevant pages

See Vincent’s post for advice on how Google can address these issues. (Might help with a Google interview to tell them how to fix such long standing problems.)

More practically, how does your TM application rate on the outdated search results?

Do you just dump content on the user to sort out (the Google dump model (GDM)) or are your results a bit more user friendly?

Leading People to Longer Queries

Thursday, March 14th, 2013

Leading People to Longer Queries by Elena Agapie, Gene Golovchinsky, Pernilla Qvarfordt.


Although longer queries can produce better results for information seeking tasks, people tend to type short queries. We created an interface designed to encourage people to type longer queries, and evaluated it in two Mechanical Turk experiments. Results suggest that our interface manipulation may be effective for eliciting longer queries.

The researchers encouraged longer queries by varying a halo around the search box.

Not conclusive but enough evidence to ask the questions:

What does your search interface encourage?

What other ways could you encourage query construction?

How would you encourage graph queries?

I first saw this in a tweet by Gene Golovchinsky.

Designing for Consumer Search Behaviour [Descriptive vs. Prescriptive]

Sunday, November 25th, 2012

Designing for Consumer Search Behaviour by Tony Russell-Rose.

From the post:

A short while ago I posted the slides to my talk at HCIR 2012 on Designing for Consumer Search Behaviour. Finally, as promised, here is the associated paper, which is co-authored with Stephann Makri (and is available as a pdf in the proceedings). This paper takes the ideas and concepts introduced in A Model of Consumer Search Behaviour and explores their practical design implications. As always, comments and feedback welcome :)


In order to design better search experiences, we need to understand the complexities of human information-seeking behaviour. In this paper, we propose a model of information behavior based on the needs of users of consumer-oriented websites and search applications. The model consists of a set of search modes users employ to satisfy their information search and discovery goals. We present design suggestions for how each of these modes can be supported in existing interactive systems, focusing in particular on those that have been supported in interesting or novel ways.

Tony uses nine (9) categories to classify consumer search behavior:

1. Locate….

2. Verify….

3. Monitor….

4. Compare….

5. Comprehend….

6. Explore….

7. Analyze….

8. Evaluate….

9. Synthesize….

The details will help you be a better search interface designer so see Tony’s post for the details on each category.

My point is that his nine categories are based on observation of, and research on, consumer behaviour. A descriptive approach to consumer search behaviour. Not a prescriptive approach to consumer search behaviour.

In some ideal world, perhaps consumers would understand why X is a better approach to Y, but attracting users is done in the present world, not an ideal one.

Think of it this way:

Every time an interface requires training of or explanation to a consumer, you have lost a percentage of the potential audience share. Some you may recover but a certain percentage is lost forever.

Ready to go through your latest interface, pencil and paper in hand to add up the training/explanation points?

Matches are the New Hotness

Friday, October 19th, 2012

Matches are the New Hotness by Max De Marzi.

From the post:

How do you help a person without a job find one online? A search screen. How do you help a person find love online? A search screen. How do you find which camera to buy online? A search screen. How do you help a sick person self diagnose online? I have no idea, I go to the doctor. Doesn’t matter, what I want to tell you is that there is another way.

Max continues with:

Now, search is great. It usually helps people find what they’re looking for… but sometimes they have to dig through tons of stuff they don’t really want. Why? Because people can usually think of what they want, but not of what they don’t want to come back. So you end up with tons of results that are not very relevant to your user…. and unless you are one of the major search engines, your search is not very smart. (emphasis added)

I like that, not thinking about what they want to exclude.

And why should they? How do they know how much material is available, at least until they are overwhelmed with search results?

Max walks through using Neo4j to solve this type of problem. By delivering matches, not pages of search results.

He even remarks:

Both the job candidate and job post are thinking about the same things, but if you look at a resume and a job description, you will realize they aren’t speaking the same language. Why not? It’s so obvious it has been driving me crazy for years and was one of the reasons I built Vouched and got into this Graph Database stuff in the first place. So let’s solve this problem with a graph.

I do have a quibble with his solution of “solving” the different language problem, say for job skills, with sub-string matching.

What happens if the job seeker lists their skills as including “mapreduce” and “yarn,” but the ad says “Hadoop?” You or I would recognize the need for a match.

I don’t see that in Max’s solution.

Do you?

I posted the gist of this in a comment at Max’s blog.

Visit Max’s post to see his response in full but in short Max favors normalization of data.

Normalization is a choice you can make, but it should not be a default or unconscious one.
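To make the quibble concrete, here is a small Python sketch contrasting sub-string matching with an explicit mapping between skill names. It is not Max's Neo4j solution: the function names and the `BELONGS_TO` table are hypothetical, and the table is hand-built for this one example.

```python
def substring_match(candidate_skills, ad_skills):
    """Pairs where one skill string contains the other, ignoring case."""
    return {
        (c, a)
        for c in candidate_skills
        for a in ad_skills
        if c.lower() in a.lower() or a.lower() in c.lower()
    }


# Hypothetical, hand-built mapping from a skill to the platform it belongs to.
# Someone has to decide what goes in this table; that is the conscious choice.
BELONGS_TO = {
    "mapreduce": "hadoop",
    "yarn": "hadoop",
    "hdfs": "hadoop",
    "hadoop": "hadoop",
}


def mapped_match(candidate_skills, ad_skills):
    """Platforms shared by both sides after applying the mapping."""
    def platform(skill):
        return BELONGS_TO.get(skill.lower(), skill.lower())

    return {platform(s) for s in candidate_skills} & {platform(s) for s in ad_skills}


candidate = ["mapreduce", "yarn"]
ad = ["Hadoop"]
print(substring_match(candidate, ad))  # set(): no string contains another
print(mapped_match(candidate, ad))     # {'hadoop'}
```

The mapping finds the match a human reader would see, but only because someone recorded that relationship. Whether that is done by normalizing the data or by keeping the original terms and mapping between them is exactly the choice that should be made deliberately.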

Designing for Consumer Search Behaviour (slideshow)

Thursday, October 18th, 2012

Designing for Consumer Search Behaviour (slideshow) by Tony Russell-Rose.

From the post:

Here are the slides from the talk I gave recently at HCIR 2012 on Designing for Consumer Search Behaviour. This presentation is the counterpart to the previous one: while A Model of Consumer Search Behaviour introduced the model and described the analytic work that led to it, this talk looks at the practical design implications. In particular, it addresses the observation that although the information retrieval community is blessed with an abundance of analytic models, only a tiny fraction of these make any impression at all on mainstream UX design practice.

Why is this? In part, this may be simply a reflection of imperfect channels of communication between the respective communities. However, I suspect it may also be a by-product of the way researchers are incentivized: with career progression based almost exclusively on citations in peer-reviewed academic journals, it is hard to see what motivation may be left to encourage adoption by other communities such as design practitioners. Yet from a wider perspective, it is precisely this cross-fertilisation that can make the difference between an idea gathering the dust of citations within a closed community and actually having an impact on the mainstream search experiences that we as consumers all encounter.

I have encountered the “cross-community” question before. A major academic organization where I was employed and a non-profit in the field shared members. For more than a century.

They had no projects in common all that time. Knew about each other, but kept waiting for the “other” one to call first. Eventually did have a project or two together but members of communities tend to stay in those communities.

It is a question of a member’s “comfort” zone. How will members of the other community react? Will they be accepting? Judgemental? Once you know, hard to go back to ignorance. Best just to stay at home. Imagine what it would be like “over there.” Less risky.

You might find members of other communities have the same hopes, fears, dreams that you do. Then what? Hard to diss others when it means dissing yourself.

A cross-over UX design practitioner/researcher poster day, with lots of finger food, tables for ad hoc conversations/demos, would be a nice way to break the ice between the two communities?

Local Search – How Hard Can It Be? [Unfolding Searches?]

Friday, September 21st, 2012

Local Search – How Hard Can It Be? by Matthew Hurst.

From the post:

This week, Apple got a rude awakening with its initial foray into the world of local search and mapping. The media and user backlash to their iOS upgrade which removes Google as the maps and local search partner and replaces it with their own application (built on licensed data) demonstrates just how important the local scenario is to the mobile space.

While the pundits are reporting various (and sometimes amusing) issues with the data and the search service, it is important to remind ourselves how hard local search can be.

For example, if you search on Google for Key Arena – a major venue in Seattle located in the famous Seattle Center, you will find some severe data quality problems.

See Matthew’s post for the detail but I am mostly interested in his final observation:

One of the ironies of local data conflation is that landmark entities (like stadia, large complex hotels, hospitals, etc.) tend to have lots of data (everyone knows about them) and lots of complexity (the Seattle Center has lots of things within it that can be confused). These factors conspire to make the most visible entities in some ways the entities more prone to problems.

Every library student is (or should be) familiar with the “reference interview.” A patron asks a question (consider this to be the search request, “Key Arena”) and a librarian uses the reference interview to further identify the information being requested.

Contrast that unfolding of the search request, which at any juncture offers different paths to different goals, with the “if you can identify it, you can find it,” approach of most search engines.

Computers have difficulty searching successfully for complex entities such as “Key Arena,” whereas a librarian starting with the same query does not.

Doesn’t that suggest to you that “unfolding” searches may be a better model for computer searching than simple identification?

More than static facets, but a presentation of the details most likely to distinguish subjects searched for by users under similar circumstances. Dynamically.

Sounds like the sort of heuristic knowledge that topic maps could capture quite handily.

A Model of Consumer Search Behaviour

Tuesday, September 18th, 2012

A Model of Consumer Search Behaviour by Tony Russell-Rose.

From the post:

A couple of weeks ago I posted the slides to my talk at EuroHCIR on A Model of Consumer Search Behaviour. Finally, as promised, here is the associated paper, which is co-authored with Stephann Makri (and also available as a pdf in the proceedings). I hope it addresses the questions that the slide deck provoked, and provides further food for thought 🙂


In order to design better search experiences, we need to understand the complexities of human information-seeking behaviour. In previous work [13], we proposed a model of information behavior based on an analysis of the information needs of knowledge workers within an enterprise search context. In this paper, we extend this work to the site search context, examining the needs and behaviours of users of consumer-oriented websites and search applications.

We found that site search users presented significantly different information needs to those of enterprise search, implying some key differences in the information behaviours required to satisfy those needs. In particular, the site search users focused more on simple “lookup” activities, contrasting with the more complex, problem-solving behaviours associated with enterprise search. We also found repeating patterns or ‘chains’ of search behaviour in the site search context, but in contrast to the previous study these were shorter and less complex. These patterns can be used as a framework for understanding information seeking behaviour that can be adopted by other researchers who want to take a ‘needs first’ approach to understanding information behaviour.

Take the time to read the paper.

How would you test the results?

Placeholder: Probably beyond the bounds of the topic maps course but a guest lecture on designing UI tests could be very useful for library students. They will be selecting interfaces to be used by patrons and knowing how to test candidate interfaces could be valuable.

Blame Google? Different Strategy: Let’s Blame Users! (Not!)

Saturday, September 15th, 2012

Let me quote from A Simple Guide To Understanding The Searcher Experience by Shari Thurow to start this post:

Web searchers have a responsibility to communicate what they want to find. As a website usability professional, I have the opportunity to observe Web searchers in their natural environments. What I find quite interesting is the “Blame Google” mentality.

I remember a question posed to me during World IA Day this past year. An attendee said that Google constantly gets search results wrong. He used a celebrity’s name as an example.

“I wanted to go to this person’s official website,” he said, “but I never got it in the first page of search results. According to you, it was an informational query. I wanted information about this celebrity.”

I paused. “Well,” I said, “why are you blaming Google when it is clear that you did not communicate what you really wanted?”

“What do you mean?” he said, surprised.

“You just said that you wanted information about this celebrity,” I explained. “You can get that information from a variety of websites. But you also said that you wanted to go to X’s official website. Your intent was clearly navigational. Why didn’t you type in [celebrity name] official website? Then you might have seen your desired website at the top of search results.”

The stunned silence at my response was almost deafening. I broke that silence.

“Don’t blame Google or Yahoo or Bing for your insufficient query formulation,” I said to the audience. “Look in the mirror. Maybe the reason for the poor searcher experience is the person in the mirror…not the search engine.”

People need to learn how to search. Search experts need to teach people how to search. Enough said.

What a novel concept! If the search engine/software doesn’t work, must be the user’s fault!

I can save you a trip down the hall to the marketing department. They are going to tell you that is an insane sales strategy. Satisfying to the geeks in your life but otherwise untenable, from a business perspective.

Remember the stats on using Library of Congress subject headings I posted under Subject Headings and the Semantic Web:

Overall percentages of correct meanings for subject headings in the original order of subdivisions were as follows: children, 32%, adults, 40%, reference 53%, and technical services librarians, 56%.


That is with decades of teaching people to search both manual and automated systems using Library of Congress classification.

Test Question: I have a product to sell. 60% of all my buyers can’t find it with a search engine. Do I:

  • Teach all users everywhere better search techniques?
  • Develop better search engines/interfaces to compensate for potential buyers’ poor searching?

I suspect the “stunned silence” was an audience with greater marketing skills than the speaker.

QRU-1: A Public Dataset…

Saturday, September 8th, 2012

QRU-1: A Public Dataset for Promoting Query Representation and Understanding Research by Hang Li, Gu Xu, W. Bruce Croft, Michael Bendersky, Ziqi Wang and Evelyne Viegas.


A new public dataset for promoting query representation and understanding research, referred to as QRU-1, was recently released by Microsoft Research. The QRU-1 dataset contains reformulations of Web TREC topics that are automatically generated using a large-scale proprietary web search log, without compromising user privacy. In this paper, we describe the content of this dataset and the process of its creation. We also discuss the potential uses of the dataset, including a detailed description of a query reformulation experiment.

And the data set:

Query Representation and Understanding Set

The Query Representation and Understanding (QRU) data set contains a set of similar queries that can be used in web research such as query transformation and relevance ranking. QRU contains similar queries that are related to existing benchmark data sets, such as TREC query sets. The QRU data set was created by extracting 100 TREC queries, training a query-generation model and a commercial search engine, generating similar queries from TREC queries with the model, and removal of mistakenly generated queries.

Are query reformulations in essence different identifications of the subject of a search?

But the issue isn’t “more” search results but rather higher quality search results.

Why search engines bother (other than bragging rights) to report “hits” beyond the ones displayed isn’t clear. Just have a “next N hits” button.

You could consider the number of “hits” you don’t look at as a measure of your search engine’s quality. The higher the number…, well, you know. Could be gold in those “hits” but you will never know. And your current search engine will never say.

A Model of Consumer Search Behaviour (slideshow) [Meta-analysis Anyone?]

Thursday, August 30th, 2012

A Model of Consumer Search Behaviour (slideshow) by Tony Russell-Rose.

From the post:

Here are the slides from the talk I gave at EuroHCIR last week on A Model of Consumer Search Behaviour. This talk extends and validates the taxonomy of information search strategies (aka ‘search modes’) presented at last year’s event, but applies it in this instance to the domain of site search, i.e. consumer-oriented websites and search applications. We found that site search users presented significantly different information needs to those of enterprise search, implying some key differences in the information behaviours required to satisfy those needs.

Every so often I see “meta-analysis” used in medical research that combines the data from several clinical trials.

Are you aware of anyone who has performed a meta-analysis upon search behavior research?

Same question but with regard to computer interfaces more generally?

Social Annotations in Web Search

Wednesday, June 13th, 2012

Social Annotations in Web Search by Aditi Muralidharan, Zoltan Gyongyi, and Ed H. Chi. (CHI 2012, May 5–10, 2012, Austin, Texas, USA)


We ask how to best present social annotations on search results, and attempt to find an answer through mixed-method eye-tracking and interview experiments. Current practice is anchored on the assumption that faces and names draw attention; the same presentation format is used independently of the social connection strength and the search query topic. The key findings of our experiments indicate room for improvement. First, only certain social contacts are useful sources of information, depending on the search topic. Second, faces lose their well-documented power to draw attention when rendered small as part of a social search result annotation. Third, and perhaps most surprisingly, social annotations go largely unnoticed by users in general due to selective, structured visual parsing behaviors specific to search result pages. We conclude by recommending improvements to the design and content of social annotations to make them more noticeable and useful.

The entire paper is worth your attention but the first paragraph of the conclusion gives much food for thought:

For content, three things are clear: not all friends are equal, not all topics benefit from the inclusion of social annotation, and users prefer different types of information from different people. For presentation, it seems that learned result-reading habits may cause blindness to social annotations. The obvious implication is that we need to adapt the content and presentation of social annotations to the specialized environment of web search.

The complexity and subtlety of semantics on the human side keep bumping into the search/annotate-with-a-hammer approach on the computer side.

Or as the authors say: “…users prefer different types of information from different people.”

Search engineers/designers who use their preferences/intuitions as the designs to push out to the larger user universe are always going to fall short.

Because all users have their own preferences and intuitions about searching and parsing search results. What is so surprising about that?

I have had discussions with programmers who would say: “But it will be better for users to do X (as opposed to Y) in the interface.”

Know what? Users are the only measure of the fitness of an interface or success of a search result.

A “pull” model (user preferences) based search engine will gut all existing (“push” model, engineer/programmer preference) search engines.

PS: You won’t discover the range of user preferences with study groups of 11 participants. Ask one of the national survey companies and have them select several thousand participants. Then refine which preferences get used the most. Won’t happen overnight but every percentage gain will be one the existing search engines won’t regain.

PPS: Speaking of interfaces, I would pay for a web browser that put webpages back under my control (the early WWW model).

Enabling me to defeat those awful “page is loading” ads from major IT vendors who should know better. As well as strip other crap out. It is a data stream that is being parsed. I should be able to clean it up before viewing. That could be a real “hit” and make page load times faster.

I first saw this article in a list of links from Greg Linden.

A Taxonomy of Site Search

Wednesday, June 6th, 2012

A Taxonomy of Site Search by Tony Russell-Rose.

From the post:

Here are the slides from the talk I gave at Enterprise Search Europe last week on A Taxonomy of Site Search. This talk extends and validates the taxonomy of information search strategies (aka ‘search modes’) presented at last year’s event, and reviews some of their implications for design. But this year we looked specifically at site search rather than enterprise search, and explored the key differences in user needs and behaviours between the two domains. [see Tony’s post for the slides]

There is a lot to be learned (and put to use) from investigations of search behavior.

Popular Queries

Friday, May 25th, 2012

Popular Queries by Hugh Williams.

From the post:

I downloaded the (infamous) AOL query logs a few days back, so I could explore caching in search. Here’s a few things I learnt about popular queries along the way.

There isn’t that much user search data around so I thought it would be worth recording the post for that reason if no other.
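To make the caching angle concrete, here is a minimal sketch of one question a query log lets you ask: if you cached only the N most popular queries, what share of traffic would the cache answer? It is not taken from Hugh's post. The function names and the tiny log are invented for illustration, and the hit rate is measured on the same log used to pick the popular queries, which flatters the result.

```python
from collections import Counter


def top_queries(queries, n):
    """Return the n most frequent queries with their counts."""
    return Counter(queries).most_common(n)


def static_cache_hit_rate(queries, cache_size):
    """Fraction of query traffic answered by a cache that holds only
    the cache_size most popular queries (measured on the same log)."""
    if not queries:
        return 0.0
    cached = Counter(queries).most_common(cache_size)
    return sum(count for _, count in cached) / len(queries)


# Hypothetical mini-log, invented for illustration only.
log = ["google", "ebay", "google", "maps", "google", "ebay", "weather", "yahoo"]

print(top_queries(log, 2))
print(static_cache_hit_rate(log, 2))
```

A fairer test would choose the popular queries from one time slice of the log and measure hits on a later slice.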

Designing Search (part 4): Displaying results

Thursday, May 17th, 2012

Designing Search (part 4): Displaying results

Tony Russell-Rose writes:

In an earlier post we reviewed the various ways in which an information need may be articulated, focusing on its expression via some form of query. In this post we consider ways in which the response can be articulated, focusing on its expression as a set of search results. Together, these two elements lie at the heart of the search experience, defining and shaping much of the information seeking dialogue. We begin therefore by examining the most universal of elements within that response: the search result.

As usual, Tony does a great job of illustrating your choices and trade-offs in presentation of search results. Highly recommended.

Since Tony refers to it as an “information seeking dialogue,” I am curious: has anyone mapped reference interview approaches to search interfaces? I suspect the question just shows my ignorance of the literature on that subject, so I would appreciate any pointers you can throw my way.

I would update Tony’s bibliography:

Marti Hearst (2009) Search User Interfaces. Cambridge University Press

Online as full text:

History matters

Tuesday, May 15th, 2012

History matters by Gene Golovchinsky.

Whose history? Your history. Your search history. Visualized.

Interested? Read more:

Exploratory search is an uncertain endeavor. Quite often, people don’t know exactly how to express their information need, and that need may evolve over time as information is discovered and understood. This is not news.

When people search for information, they often run multiple queries to get at different aspects of the information need, to gain a better understanding of the collection, or to incorporate newly-found information into their searches. This too is not news.

The multiple queries that people run may well retrieve some of the same documents. In some cases, there may be little or no overlap between query results; at other times, the overlap may be considerable. Yet most search engines treat each query as an independent event, and leave it to the searcher to make sense of the results. This, to me, is an opportunity.

Design goal: Help people plan future actions by understanding the present in the context of the past.

While web search engines such as Bing make it easy for people to re-visit some recent queries, and early systems such as Dialog allowed Boolean queries to be constructed by combining results of previously-executed queries, these approaches do not help people make sense of the retrieval histories of specific documents with respect to a particular information need. There is nothing new under the sun, however: Mark Sanderson’s NRT system flagged documents as having been previously retrieved for a given search task, VOIR used retrieval histograms for each document, and of course a browser maintains a limited history of activity to indicate which links were followed.

Our recent work in Querium (see here and here) seeks to explore this space further by providing searchers with tools that reflect patterns of retrieval of specific documents within a search mission.
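The ideas Gene mentions (flagging documents already retrieved in a session, and a per-document retrieval histogram) can be sketched in a few lines. This is only my illustration of the general idea, not how Querium, NRT or VOIR implement it. The class name, method names and document ids are all hypothetical.

```python
from collections import defaultdict


class RetrievalHistory:
    """Tracks which queries in one search session retrieved each document."""

    def __init__(self):
        self.queries = []                 # queries in the order they were run
        self.seen_in = defaultdict(list)  # doc id -> indexes of queries that retrieved it

    def record(self, query, doc_ids):
        """Record one query's results; return the doc ids retrieved by an earlier query."""
        index = len(self.queries)
        self.queries.append(query)
        previously_seen = [d for d in doc_ids if d in self.seen_in]
        for d in set(doc_ids):
            self.seen_in[d].append(index)
        return previously_seen

    def histogram(self, doc_id):
        """One 0/1 flag per query in the session: did that query retrieve doc_id?"""
        hits = set(self.seen_in.get(doc_id, []))
        return [1 if i in hits else 0 for i in range(len(self.queries))]


# Hypothetical session with made-up queries and document ids.
history = RetrievalHistory()
history.record("exploratory search", ["d1", "d2", "d3"])
repeats = history.record("information seeking", ["d2", "d4"])

print(repeats)                  # documents already seen earlier in the session
print(history.histogram("d2"))  # retrieval pattern for one document
```

Even this much lets an interface mark a result as "seen before" instead of treating each query as an independent event.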

Even more interested? Read Gene’s post in full.

If not, check your pulse.

Google in the World of Academic Research (Lead by Example?)

Thursday, April 5th, 2012

Google in the World of Academic Research by Whitney Grace.

From the post:

Librarians, teachers, and college professors all press their students not to use Google to research their projects, papers, and homework, but it is a dying battle. All students have to do is type in a few key terms and millions of results are displayed. The average student or person, for that matter, is not going to scour through every single result. If they do not find what they need, they simply rethink their initial key words and hit the search button again.

The Hindu recently wrote about, “Of Google and Scholarly Search,” the troubles researchers face when they only use Google and makes several suggestions for alternate search engines and databases.

The perennial complaint (academics used to debate the perennial philosophy, now the perennial complaint).

Is Google responsible for superficial searching and consequently superficial results?

Or do superficial Google results reflect our failure to train students in “doing” research?

What research models do students have to follow? In terms of research behavior?

In my next course, I will do a research problem by example. Good as well as bad results. What worked and what didn’t. And yes, Google will be in the mix of methods.

Why not? With four and five word queries and domain knowledge, I get pretty good results from Google. You?

Designing Search (part 3): Keeping on track

Tuesday, March 20th, 2012

Designing Search (part 3): Keeping on track by Tony Russell-Rose

From the post:

In the previous post we looked at techniques to help us create and articulate more effective queries. From auto-complete for lookup tasks to auto-suggest for exploratory search, these simple techniques can often make the difference between success and failure.

But occasionally things do go wrong. Sometimes our information journey is more complex than we’d anticipated, and we find ourselves straying off the ideal course. Worse still, in our determination to pursue our original goal, we may overlook other, more productive directions, leaving us endlessly finessing a flawed strategy. Sometimes we are in too deep to turn around and start again.

(graphic omitted)

Conversely, there are times when we may consciously decide to take a detour and explore the path less trodden. As we saw earlier, what we find along the way can change what we seek. Sometimes we find the most valuable discoveries in the most unlikely places.

However, there’s a fine line between these two outcomes: one person’s journey of serendipitous discovery can be another’s descent into confusion and disorientation. And there’s the challenge: how can we support the former, while unobtrusively repairing the latter? In this post, we’ll look at four techniques that help us keep to the right path on our information journey.

Whether you are writing a search interface or simply want to know more about what factors to consider in evaluating a search interface, this series by Tony Russell-Rose is well worth your time.

If you are writing a topic map, you already have as a goal the collection of information for some purpose. It would be sad if the information you collect isn’t findable due to poor interface design.

Designing Search (part 1): Entering the query

Thursday, January 19th, 2012

Designing Search (part 1): Entering the query by Tony Russell-Rose.

From the post:

In an earlier post we reviewed models of information seeking, from an early focus on documents and queries through to a more nuanced understanding of search as an information journey driven by dynamic information needs. While each model emphasizes different aspects of the search process, what they share is the principle that search begins with an information need which is articulated in some form of query. What follows below is the first in a mini-series of articles exploring the process of query formulation, starting with the most ubiquitous of design elements: the search box.

If you are designing or using search interfaces, you will benefit from reading this post.

Suggestion: Don’t jump to the summary and best practices. Tony’s analysis is just as informative as the conclusions he reaches.

Relevance Tuning and Competitive Advantage via Search Analytics

Sunday, January 8th, 2012

Relevance Tuning and Competitive Advantage via Search Analytics

It must be all the “critical” evaluation of infographics I have been reading but I found myself wondering about the following paragraph:

This slide shows how Search Analytics can be used to help with A/B testing. Concretely, in this slide we see two Solr Dismax handlers selected on the right side. If you are not familiar with Solr, think of a Dismax handler as an API that search applications call to execute searches. In this example, each Dismax handler is configured differently and thus each of them ranks search hits slightly differently. On the graph we see the MRR (see Wikipedia page for Mean Reciprocal Rank details) for both Dismax handlers and we can see that the one corresponding to the blue line is performing much better. That is, users are clicking on search hits closer to the top of the search results page, which is one of several signals of this Dismax handler providing better relevance ranking than the other one. Once you have a system like this in place you can add more Dismax handlers and compare 2 or more of them at a time. As the result, with the help of Search Analytics you get actual, real feedback about any changes you make to your search engine. Without a tool like this, you cannot really tune your search engine’s relevance well and will be doing it blindly.

Particularly the line:

That is, users are clicking on search hits closer to the top of the search results page, which is one of several signals of this Dismax handler providing better relevance ranking than the other one.


Here is one way to test that assumption:

Report for any search as the #1 or #2 result, “private cell-phone number for …” and pick one of the top ten movie actresses for 2011. And you can do better than that, make sure the cell-phone number is one that rings at your search analytics desk. Now see how many users are “…clicking on search hits closer to the top of the search results page….”

Are your results more relevant than a movie star?

Don’t get me wrong, search analytics are very important, but let’s not get carried away about what we can infer from largely opaque actions.

Some other questions: Did users find the information they needed? Can they make use of that information? Does that use improve some measurable or important aspect of the company business? Let’s broaden search analytics to make search results less opaque.
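For readers who have not met MRR, here is a minimal sketch of the comparison the quoted paragraph describes. It assumes each search is logged as the 1-based rank of the first clicked hit, or None when nothing was clicked. The two click logs are invented, and this is not the tool from the slides.

```python
def mean_reciprocal_rank(first_click_ranks):
    """MRR over searches; each entry is the 1-based rank of the first
    clicked hit, or None when the user clicked nothing (scored as 0)."""
    if not first_click_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_click_ranks if rank is not None)
    return total / len(first_click_ranks)


# Hypothetical click logs for two differently configured handlers.
handler_a = [1, 2, 1, None, 3]
handler_b = [4, 2, None, 5, 1]

print(round(mean_reciprocal_rank(handler_a), 3))
print(round(mean_reciprocal_rank(handler_b), 3))
```

Note what the number does and does not measure: it rewards clicks near the top of the page, and says nothing about whether the user found what they needed after clicking.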