Archive for the ‘Search Behavior’ Category

Keyword Search, Plus a Little Magic

Wednesday, May 15th, 2013

Keyword Search, Plus a Little Magic by Geoffrey Pullum.

From the post:

I promised last week that I would discuss three developments that turned almost-useless language-connected technological capabilities into something seriously useful. The one I want to introduce first was introduced by Google toward the end of the 1990s, and it changed our whole lives, largely eliminating the need for having full sentences parsed and translated into database query language.

The hunch that the founders of Google bet on was that simple keyword search could be made vastly more useful by taking the entire set of pages containing all of the list of search words and not just returning it as the result but rather ranking its members by influentiality and showing the most influential first. What a page contains is not the only relevant thing about it: As with any academic publication, who values it and refers to it is also important. And that is (at least to some extent) revealed in the link structure of the Web.
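The idea in the passage above (find the pages containing all the search words, then order them by how the rest of the Web links to them) can be sketched in a few lines. This is only an illustration in the spirit of PageRank, not Google's actual algorithm; the function names, the damping value and the toy data in the usage note are all assumptions made for the example.

```python
def rank_by_links(links, damping=0.85, iterations=50):
    """Score pages by link structure (a simplified PageRank-style iteration).

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    score = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of the total score.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its score on to the pages it links to.
                share = damping * score[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * score[p] / n
                for q in pages:
                    new[q] += share
        score = new
    return score


def keyword_search(docs, links, query):
    """Return pages containing every query word, most linked-to first."""
    words = query.lower().split()
    matches = [
        page for page, text in docs.items()
        if all(w in text.lower().split() for w in words)
    ]
    score = rank_by_links(links)
    return sorted(matches, key=lambda p: score.get(p, 0.0), reverse=True)
```

For instance, with docs = {"a": "topic maps search", "b": "topic maps", "c": "search engines"} and links = {"a": ["b"], "c": ["b"], "b": ["a"]}, a search for "topic maps" matches pages a and b and lists b first, because two pages link to it.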

In his first post, which wasn’t sympathetic to natural language processing, Geoffrey baited his critics into fits of frenzied refutation.

Fits of refutation that failed to note Geoffrey hadn’t completed his posts on natural language processing.

Take the keyword search posting for instance.

I won’t spoil the surprise for you but the fourth fact that Geoffrey says Google relies upon could have serious legs for topic map authoring and interface design.

And not a little insight into what we call natural language processing.

More posts are to follow in this series.

I suggest we savor each one as it appears and after reflection on the whole, sally forth onto the field of verbal combat.

Enabling action: Digging deeper into strategies for learning

Sunday, April 21st, 2013

Enabling action: Digging deeper into strategies for learning by Thom Haller. (Haller, T. (2013), Enabling action: Digging deeper into strategies for learning. Bul. Am. Soc. Info. Sci. Tech., 39: 42–43. doi: 10.1002/bult.2013.1720390413)

Abstract:

A central goal for information architects is to understand how people use information, make choices as they navigate a website and accomplish their objectives. If the goal is learning, we often assume it relates to an end point, a question to answer, a problem to which one applies new understanding. Benjamin Bloom’s 1956 taxonomy of learning breaks down the cognitive process, starting from understanding needs and progressing to action and final evaluation. Carol Kuhlthau’s 1991 outline of the information search process similarly starts with awareness of a need, progresses through exploring options, refining requirements and collecting solutions, and ends with decision making and action. Recognizing the stages of information browsing, learning and action can help information architects build sites that better meet searchers’ needs.

Thom starts with Bloom, cruises by Kuhlthau and ends up with Jared Pomranky restating Kuhlthau in: Seeking Knowledge: Denver, Web Design, And The Stages of Learning:

According to Kuhlthau, the six stages of learning are:

  • Initiation — the person becomes aware that they need information. Generally, it’s assumed that visitors to your website have this awareness already, but there are circumstances in which you can generate this kind of awareness as well.
  • Exploration — the person sees the options that are available to choose between. Quite often, especially online, ‘analysis paralysis’ can set in and make a learner quit at this stage because they can’t decide which of the options are worth further pursuit.
  • Formulation — the person sees that they’re going to have to create further requirements before they’re able to make a final selection, and they make decisions to narrow the field. Confidence returns.
  • Collection — the person has clearly articulated their precise needs and is able to evaluate potential solutions. They gather all available solutions and begin to weigh them based on relevant criteria.
  • Action — the person makes their final decision and acts on it based on their understanding.

Many web designers assume that their surfers are at the Collection stage, and craft their entire webpage toward moving their reader from Collection to Action — but statistically, most people are going to be at Exploration or Formulation when they arrive at your site.

Does that mean that you should build a website that encourages people to go read other options and learn more, hoping they’ll return to your site for their Action? Not at all — but it does mean that by understanding what people are looking for at each stage of their learning process, we can design websites that guide them through the whole thing. This, by no coincidence whatsoever, also results in websites and web content that is useful, user-friendly, and entirely Google-appropriate.

We all use models of online behavior, learning if you like, but I would caution against using models disconnected from your users.

Particularly models disconnected from your users and re-interpreted by you as reflecting your users.

A better course would be to study the behavior of your users and to model your content on their behavior.

Otherwise you will be the seekers who: “… came looking for [your users], only to find Zarathustra.” Thus Spake Zarathustra

Google search:… [GDM]

Sunday, April 21st, 2013

Google search: three bugs to fix with better data science by Vincent Granville.

Vincent outlines three issues with Google search results:

  1. Outdated search results
  2. Wrongly attributed articles
  3. Favoring irrelevant pages

See Vincent’s post for advice on how Google can address these issues. (Might help with a Google interview to tell them how to fix such long-standing problems.)

More practically, how does your TM application rate on the outdated search results?

Do you just dump content on the user to sort out (the Google dump model (GDM)) or are your results a bit more user friendly?

Leading People to Longer Queries

Thursday, March 14th, 2013

Leading People to Longer Queries by Elena Agapie, Gene Golovchinsky, Pernilla Qvarfordt.

Abstract:

Although longer queries can produce better results for information seeking tasks, people tend to type short queries. We created an interface designed to encourage people to type longer queries, and evaluated it in two Mechanical Turk experiments. Results suggest that our interface manipulation may be effective for eliciting longer queries.

The researchers encouraged longer queries by varying a halo around the search box.

Not conclusive but enough evidence to ask the questions:

What does your search interface encourage?

What other ways could you encourage query construction?

How would you encourage graph queries?

I first saw this in a tweet by Gene Golovchinsky.

Designing for Consumer Search Behaviour [Descriptive vs. Prescriptive]

Sunday, November 25th, 2012

Designing for Consumer Search Behaviour by Tony Russell-Rose.

From the post:

A short while ago I posted the slides to my talk at HCIR 2012 on Designing for Consumer Search Behaviour. Finally, as promised, here is the associated paper, which is co-authored with Stephann Makri (and is available as a pdf in the proceedings). This paper takes the ideas and concepts introduced in A Model of Consumer Search Behaviour and explores their practical design implications. As always, comments and feedback welcome :)

ABSTRACT

In order to design better search experiences, we need to understand the complexities of human information-seeking behaviour. In this paper, we propose a model of information behavior based on the needs of users of consumer-oriented websites and search applications. The model consists of a set of search modes users employ to satisfy their information search and discovery goals. We present design suggestions for how each of these modes can be supported in existing interactive systems, focusing in particular on those that have been supported in interesting or novel ways.

Tony uses nine (9) categories to classify consumer search behavior:

1. Locate….

2. Verify….

3. Monitor….

4. Compare….

5. Comprehend….

6. Explore….

7. Analyze….

8. Evaluate….

9. Synthesize….

The details will help you be a better search interface designer so see Tony’s post for the details on each category.

My point is that his nine categories are based on observation of, and research on, consumer behaviour. A descriptive approach to consumer search behaviour. Not a prescriptive approach to consumer search behaviour.

In some ideal world, perhaps consumers would understand why X is a better approach to Y, but attracting users is done in the present world, not an ideal one.

Think of it this way:

Every time an interface requires training of or explanation to a consumer, you have lost a percentage of the potential audience share. Some you may recover but a certain percentage is lost forever.

Ready to go through your latest interface, pencil and paper in hand to add up the training/explanation points?

Matches are the New Hotness

Friday, October 19th, 2012

Matches are the New Hotness by Max De Marzi.

From the post:

How do you help a person without a job find one online? A search screen. How do you help a person find love online? A search screen. How do you find which camera to buy online? A search screen. How do you help a sick person self diagnose online? I have no idea, I go to the doctor. Doesn’t matter, what I want to tell you is that there is another way.

Max continues with:

Now, search is great. It usually helps people find what they’re looking for… but sometimes they have to dig through tons of stuff they don’t really want. Why? Because people can usually think of what they want, but not of what they don’t want to come back. So you end up with a tons of results that are not very relevant to your user…. and unless you are one of the major search engines, your search is not very smart. (emphasis added)

I like that, not thinking about what they want to exclude.

And why should they? How do they know how much material is available, at least until they are overwhelmed with search results?

Max walks through using Neo4j to solve this type of problem. By delivering matches, not pages of search results.

He even remarks:

Both the job candidate and job post are thinking about the same things, but if you look at a resume and a job description, you will realize they aren’t speaking the same language. Why not? It’s so obvious it has been driving me crazy for years and was one of the reasons I built Vouched and got into this Graph Database stuff in the first place. So let’s solve this problem with a graph.

I do have a quibble with his “solving” the different language problem, for job skills say, with sub-string matching.

What happens if the job seeker lists their skills as including “mapreduce” and “yarn,” but the ad says “Hadoop”? You or I would recognize the need for a match.

I don’t see that in Max’s solution.

Do you?
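To make the quibble concrete, here is a minimal sketch contrasting plain sub-string matching with matching through a table of related terms. It is not Max's code and not a Neo4j query; the RELATED table and every function name are hypothetical, and someone still has to author and maintain such a table.

```python
def substring_match(skill, ad_text):
    """True if the skill appears literally inside the ad text."""
    return skill.lower() in ad_text.lower()


# Hypothetical table of terms treated as pointing at the same subject.
RELATED = {
    "hadoop": {"hadoop", "mapreduce", "yarn", "hdfs"},
}


def canonical(term):
    """Map a term to the name of its group, or to itself if it has none."""
    t = term.lower()
    for name, aliases in RELATED.items():
        if t in aliases:
            return name
    return t


def alias_match(skills, ad_terms):
    """Return the subjects shared by a resume's skills and an ad's terms."""
    wanted = {canonical(t) for t in ad_terms}
    offered = {canonical(s) for s in skills}
    return sorted(offered & wanted)
```

With these definitions, substring_match("mapreduce", "Hadoop developer wanted") is False, while alias_match(["mapreduce", "yarn"], ["Hadoop"]) returns ["hadoop"]. The match comes from the table, not from the strings.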


I posted the gist of this in a comment at Max’s blog.

Visit Max’s post to see his response in full but in short Max favors normalization of data.

Normalization is a choice you can make, but it should not be a default or unconscious one.

Designing for Consumer Search Behaviour (slideshow)

Thursday, October 18th, 2012

Designing for Consumer Search Behaviour (slideshow) by Tony Russell-Rose.

From the post:

Here are the slides from the talk I gave recently at HCIR 2012 on Designing for Consumer Search Behaviour. This presentation is the counterpart to the previous one: while A Model of Consumer Search Behaviour introduced the model and described the analytic work that led to it, this talk looks at the practical design implications. In particular, it addresses the observation that although the information retrieval community is blessed with an abundance of analytic models, only a tiny fraction of these make any impression at all on mainstream UX design practice.

Why is this? In part, this may be simply a reflection of imperfect channels of communication between the respective communities. However, I suspect it may also be a by-product of the way researchers are incentivized: with career progression based almost exclusively on citations in peer-reviewed academic journals, it is hard to see what motivation may be left to encourage adoption by other communities such as design practitioners. Yet from a wider perspective, it is precisely this cross-fertilisation that can make the difference between an idea gathering the dust of citations within a closed community and actually having an impact on the mainstream search experiences that we as consumers all encounter.

I have encountered the “cross-community” question before. A major academic organization where I was employed and a non-profit in the field shared members. For more than a century.

They had no projects in common all that time. Knew about each other, but kept waiting for the “other” one to call first. Eventually did have a project or two together but members of communities tend to stay in those communities.

It is a question of a member’s “comfort” zone. How will members of the other community react? Will they be accepting? Judgemental? Once you know, hard to go back to ignorance. Best just to stay at home. Imagine what it would be like “over there.” Less risky.

You might find members of other communities have the same hopes, fears, dreams that you do. Then what? Hard to diss others when it means dissing yourself.

A cross-over UX design practitioner/researcher poster day, with lots of finger food, tables for ad hoc conversations/demos, would be a nice way to break the ice between the two communities?

Local Search – How Hard Can It Be? [Unfolding Searches?]

Friday, September 21st, 2012

Local Search – How Hard Can It Be? by Matthew Hurst.

From the post:

This week, Apple got a rude awakening with its initial foray into the world of local search and mapping. The media and user backlash to their iOS upgrade which removes Google as the maps and local search partner and replaces it with their own application (built on licensed data) demonstrates just how important the local scenario is to the mobile space.

While the pundits are reporting various (and sometimes amusing) issues with the data and the search service, it is important to remind ourselves how hard local search can be.

For example, if you search on Google for Key Arena – a major venue in Seattle located in the famous Seattle Center, you will find some severe data quality problems.

See Matthew’s post for the detail but I am mostly interested in his final observation:

One of the ironies of local data conflation is that landmark entities (like stadia, large complex hotels, hospitals, etc.) tend to have lots of data (everyone knows about them) and lots of complexity (the Seattle Center has lots of things within it that can be confused). These factors conspire to make the most visible entities in some ways the entities more prone to problems.

Every library student is (or should be) familiar with the “reference interview.” A patron asks a question (consider this to be the search request, “Key Arena”) and a librarian uses the reference interview to further identify the information being requested.

Contrast that unfolding of the search request, which at any juncture offers different paths to different goals, with the “if you can identify it, you can find it,” approach of most search engines.

Computers have difficulty searching for complex entities such as “Key Arena” successfully, whereas a librarian starting with the same query does not.

Doesn’t that suggest to you that “unfolding” searches may be a better model for computer searching than simple identification?

More than static facets, but a presentation of the details most likely to distinguish subjects searched for by users under similar circumstances. Dynamically.

Sounds like the sort of heuristic knowledge that topic maps could capture quite handily.

A Model of Consumer Search Behaviour

Tuesday, September 18th, 2012

A Model of Consumer Search Behaviour by Tony Russell-Rose.

From the post:

A couple of weeks ago I posted the slides to my talk at EuroHCIR on A Model of Consumer Search Behaviour. Finally, as promised, here is the associated paper, which is co-authored with Stephann Makri (and also available as a pdf in the proceedings). I hope it addresses the questions that the slide deck provoked, and provides further food for thought :)

ABSTRACT

In order to design better search experiences, we need to understand the complexities of human information-seeking behaviour. In previous work [13], we proposed a model of information behavior based on an analysis of the information needs of knowledge workers within an enterprise search context. In this paper, we extend this work to the site search context, examining the needs and behaviours of users of consumer-oriented websites and search applications.

We found that site search users presented significantly different information needs to those of enterprise search, implying some key differences in the information behaviours required to satisfy those needs. In particular, the site search users focused more on simple “lookup” activities, contrasting with the more complex, problem-solving behaviours associated with enterprise search. We also found repeating patterns or ‘chains’ of search behaviour in the site search context, but in contrast to the previous study these were shorter and less complex. These patterns can be used as a framework for understanding information seeking behaviour that can be adopted by other researchers who want to take a ‘needs first’ approach to understanding information behaviour.

Take the time to read the paper.

How would you test the results?

Placeholder: Probably beyond the bounds of the topic maps course but a guest lecture on designing UI tests could be very useful for library students. They will be selecting interfaces to be used by patrons and knowing how to test candidate interfaces could be valuable.

Blame Google? Different Strategy: Let’s Blame Users! (Not!)

Saturday, September 15th, 2012

Let me quote from A Simple Guide To Understanding The Searcher Experience by Shari Thurow to start this post:

Web searchers have a responsibility to communicate what they want to find. As a website usability professional, I have the opportunity to observe Web searchers in their natural environments. What I find quite interesting is the “Blame Google” mentality.

I remember a question posed to me during World IA Day this past year. An attendee said that Google constantly gets search results wrong. He used a celebrity’s name as an example.

“I wanted to go to this person’s official website,” he said, “but I never got it in the first page of search results. According to you, it was an informational query. I wanted information about this celebrity.”

I paused. “Well,” I said, “why are you blaming Google when it is clear that you did not communicate what you really wanted?”

“What do you mean?” he said, surprised.

“You just said that you wanted information about this celebrity,” I explained. “You can get that information from a variety of websites. But you also said that you wanted to go to X’s official website. Your intent was clearly navigational. Why didn’t you type in [celebrity name] official website? Then you might have seen your desired website at the top of search results.”

The stunned silence at my response was almost deafening. I broke that silence.

“Don’t blame Google or Yahoo or Bing for your insufficient query formulation,” I said to the audience. “Look in the mirror. Maybe the reason for the poor searcher experience is the person in the mirror…not the search engine.”

People need to learn how to search. Search experts need to teach people how to search. Enough said.

What a novel concept! If the search engine/software doesn’t work, must be the user’s fault!

I can save you a trip down the hall to the marketing department. They are going to tell you that is an insane sales strategy. Satisfying to the geeks in your life but otherwise untenable, from a business perspective.

Remember the stats on using Library of Congress subject headings I posted under Subject Headings and the Semantic Web:

Overall percentages of correct meanings for subject headings in the original order of subdivisions were as follows: children, 32%, adults, 40%, reference 53%, and technical services librarians, 56%.

?

That is with decades of teaching people to search both manual and automated systems using Library of Congress classification.

Test Question: I have a product to sell. 60% of all my buyers can’t find it with a search engine. Do I:

  • Teach all users everywhere better search techniques?
  • Develop better search engines/interfaces to compensate for potential buyers’ poor searching?

I suspect the “stunned silence” was an audience with greater marketing skills than the speaker.

QRU-1: A Public Dataset…

Saturday, September 8th, 2012

QRU-1: A Public Dataset for Promoting Query Representation and Understanding Research by Hang Li, Gu Xu, W. Bruce Croft, Michael Bendersky, Ziqi Wang and Evelyne Viegas.

ABSTRACT

A new public dataset for promoting query representation and understanding research, referred to as QRU-1, was recently released by Microsoft Research. The QRU-1 dataset contains reformulations of Web TREC topics that are automatically generated using a large-scale proprietary web search log, without compromising user privacy. In this paper, we describe the content of this dataset and the process of its creation. We also discuss the potential uses of the dataset, including a detailed description of a query reformulation experiment.

And the data set:

Query Representation and Understanding Set

The Query Representation and Understanding (QRU) data set contains a set of similar queries that can be used in web research such as query transformation and relevance ranking. QRU contains similar queries that are related to existing benchmark data sets, such as TREC query sets. The QRU data set was created by extracting 100 TREC queries, training a query-generation model and a commercial search engine, generating similar queries from TREC queries with the model, and removal of mistakenly generated queries.

Are query reformulations in essence different identifications of the subject of a search?

But the issue isn’t “more” search results but rather higher quality search results.

Why search engines bother (other than bragging rights) to report “hits” beyond the ones displayed isn’t clear. Just have a “next N hits” button.

You could consider the number of “hits” you don’t look at as a measure of your search engine’s quality. The higher the number…., well, you know. Could be gold in those “hits” but you will never know. And your current search engine will never say.

A Model of Consumer Search Behaviour (slideshow) [Meta-analysis Anyone?]

Thursday, August 30th, 2012

A Model of Consumer Search Behaviour (slideshow) by Tony Russell-Rose.

From the post:

Here are the slides from the talk I gave at EuroHCIR last week on A Model of Consumer Search Behaviour. This talk extends and validates the taxonomy of information search strategies (aka ‘search modes’) presented at last year’s event, but applies it in this instance to the domain of site search, i.e. consumer-oriented websites and search applications. We found that site search users presented significantly different information needs to those of enterprise search, implying some key differences in the information behaviours required to satisfy those needs.

Every so often I see “meta-analysis” used in medical research that combines the data from several clinical trials.

Are you aware of anyone who has performed a meta-analysis upon search behavior research?

Same question but with regard to computer interfaces more generally?

Social Annotations in Web Search

Wednesday, June 13th, 2012

Social Annotations in Web Search by Aditi Muralidharan, Zoltan Gyongyi, and Ed H. Chi. (CHI 2012, May 5–10, 2012, Austin, Texas, USA)

Abstract:

We ask how to best present social annotations on search results, and attempt to find an answer through mixed-method eye-tracking and interview experiments. Current practice is anchored on the assumption that faces and names draw attention; the same presentation format is used independently of the social connection strength and the search query topic. The key findings of our experiments indicate room for improvement. First, only certain social contacts are useful sources of information, depending on the search topic. Second, faces lose their well-documented power to draw attention when rendered small as part of a social search result annotation. Third, and perhaps most surprisingly, social annotations go largely unnoticed by users in general due to selective, structured visual parsing behaviors specific to search result pages. We conclude by recommending improvements to the design and content of social annotations to make them more noticeable and useful.

The entire paper is worth your attention but the first paragraph of the conclusion gives much food for thought:

For content, three things are clear: not all friends are equal, not all topics benefit from the inclusion of social annotation, and users prefer different types of information from different people. For presentation, it seems that learned result-reading habits may cause blindness to social annotations. The obvious implication is that we need to adapt the content and presentation of social annotations to the specialized environment of web search.

The complexity and subtlety of semantics on the human side keep bumping into the search/annotate-with-a-hammer approach on the computer side.

Or as the authors say: “…users prefer different types of information from different people.”

Search engineers/designers who use their preferences/intuitions as the designs to push out to the larger user universe are always going to fall short.

Because all users have their own preferences and intuitions about searching and parsing search results. What is so surprising about that?

I have had discussions with programmers who would say: “But it will be better for users to do X (as opposed to Y) in the interface.”

Know what? Users are the only measure of the fitness of an interface or success of a search result.

A “pull” model (user preferences) based search engine will gut all existing (“push” model, engineer/programmer preference) search engines.


PS: You won’t discover the range of user preferences by study groups with 11 participants. Ask one of the national survey companies and have them select several thousand participants. Then refine which preferences get used the most. Won’t happen overnight but every percentage gain will be one the existing search engines won’t regain.

PPS: Speaking of interfaces, I would pay for a web browser that put webpages back under my control (the early WWW model).

Enabling me to defeat those awful “page is loading” ads from major IT vendors who should know better. As well as strip other crap out. It is a data stream that is being parsed. I should be able to clean it up before viewing. That could be a real “hit” and make page load times faster.

I first saw this article in a list of links from Greg Linden.

A Taxonomy of Site Search

Wednesday, June 6th, 2012

A Taxonomy of Site Search by Tony Russell-Rose.

From the post:

Here are the slides from the talk I gave at Enterprise Search Europe last week on A Taxonomy of Site Search. This talk extends and validates the taxonomy of information search strategies (aka ‘search modes’) presented at last year’s event, and reviews some of their implications for design. But this year we looked specifically at site search rather than enterprise search, and explored the key differences in user needs and behaviours between the two domains. [see Tony's post for the slides]

There is a lot to be learned (and put to use) from investigations of search behavior.

Popular Queries

Friday, May 25th, 2012

Popular Queries by Hugh Williams.

From the post:

I downloaded the (infamous) AOL query logs a few days back, so I could explore caching in search. Here’s a few things I learnt about popular queries along the way.

There isn’t that much user search data around so I thought it would be worth recording the post for that reason if no other.
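If you do get your hands on a query log, the first questions about popular queries and caching take only a few lines to ask. A rough sketch, assuming the log has already been reduced to a plain list of query strings (the real AOL files carry more fields, and none of this is Hugh's code):

```python
from collections import Counter


def normalize(query):
    """Treat queries that differ only in case or outer spaces as the same."""
    return query.strip().lower()


def popular_queries(queries, top_n=3):
    """Return the top_n most frequent queries with their counts."""
    counts = Counter(normalize(q) for q in queries)
    return counts.most_common(top_n)


def cache_hit_rate(queries, cache_size):
    """Fraction of queries answered by a static cache of popular queries.

    The cache is filled with the cache_size most frequent queries from
    the same log, so the figure is an optimistic, after-the-fact estimate.
    """
    normalized = [normalize(q) for q in queries]
    if not normalized:
        return 0.0
    counts = Counter(normalized)
    cached = {q for q, _ in counts.most_common(cache_size)}
    hits = sum(1 for q in normalized if q in cached)
    return hits / len(normalized)
```

On the toy log ["google", "ebay", "Google", "maps", "google ", "ebay"], caching only the single most popular query already answers half of the requests.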

Designing Search (part 4): Displaying results

Thursday, May 17th, 2012

Designing Search (part 4): Displaying results

Tony Russell-Rose writes:

In an earlier post we reviewed the various ways in which an information need may be articulated, focusing on its expression via some form of query. In this post we consider ways in which the response can be articulated, focusing on its expression as a set of search results. Together, these two elements lie at the heart of the search experience, defining and shaping much of the information seeking dialogue. We begin therefore by examining the most universal of elements within that response: the search result.

As usual, Tony does a great job of illustrating your choices and trade-offs in presentation of search results. Highly recommended.

Since Tony refers to it as an “information seeking dialogue,” I am curious: has anyone mapped reference interview approaches to search interfaces? I suspect that is just my ignorance of the literature on that subject so would appreciate any pointers you can throw my way.

I would update Tony’s bibliography:

Marti Hearst (2009) Search User Interfaces. Cambridge University Press

Online as full text: http://searchuserinterfaces.com/

History matters

Tuesday, May 15th, 2012

History matters by Gene Golovchinsky.

Whose history? Your history. Your search history. Visualized.

Interested? Read more:

Exploratory search is an uncertain endeavor. Quite often, people don’t know exactly how to express their information need, and that need may evolve over time as information is discovered and understood. This is not news.

When people search for information, they often run multiple queries to get at different aspects of the information need, to gain a better understanding of the collection, or to incorporate newly-found information into their searches. This too is not news.

The multiple queries that people run may well retrieve some of the same documents. In some cases, there may be little or no overlap between query results; at other times, the overlap may be considerable. Yet most search engines treat each query as an independent event, and leave it to the searcher to make sense of the results. This, to me, is an opportunity.

Design goal: Help people plan future actions by understanding the present in the context of the past.

While web search engines such as Bing make it easy for people to re-visit some recent queries, and early systems such as Dialog allowed Boolean queries to be constructed by combining results of previously-executed queries, these approaches do not help people make sense of the retrieval histories of specific documents with respect to a particular information need. There is nothing new under the sun, however: Mark Sanderson’s NRT system flagged documents as having been previously retrieved for a given search task, VOIR used retrieval histograms for each document, and of course a browser maintains a limited history of activity to indicate which links were followed.

Our recent work in Querium (see here and here) seeks to explore this space further by providing searchers with tools that reflect patterns of retrieval of specific documents within a search mission.
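The core bookkeeping behind that idea, remembering which earlier queries in a session already retrieved a document, is simple to sketch. What follows is not how Querium, NRT or VOIR work internally; the class and method names are made up for illustration.

```python
from collections import defaultdict


class RetrievalHistory:
    """Remember which queries in a search session retrieved which documents."""

    def __init__(self):
        # Maps a document id to the queries that have retrieved it so far.
        self._seen = defaultdict(list)

    def record(self, query, doc_ids):
        """Store the result list of a query that has just been run."""
        for doc_id in dict.fromkeys(doc_ids):  # skip repeats, keep order
            self._seen[doc_id].append(query)

    def annotate(self, doc_ids):
        """Pair each document with the earlier queries that retrieved it."""
        return [(d, list(self._seen.get(d, []))) for d in doc_ids]

    def overlap(self, doc_ids):
        """Fraction of a result list that was retrieved earlier in the session."""
        if not doc_ids:
            return 0.0
        return sum(1 for d in doc_ids if d in self._seen) / len(doc_ids)
```

Record each result list as the query runs; before displaying a new list, annotate() tells you which documents are repeat visitors and overlap() how much of the list is old news.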

Even more interested? Read Gene’s post in full.

If not, check your pulse.

Google in the World of Academic Research (Lead by Example?)

Thursday, April 5th, 2012

Google in the World of Academic Research by Whitney Grace.

From the post:

Librarians, teachers, and college professors all press their students not to use Google to research their projects, papers, and homework, but it is a dying battle. All students have to do is type in a few key terms and millions of results are displayed. The average student or person, for that matter, is not going to scour through every single result. If they do not find what they need, they simply rethink their initial key words and hit the search button again.

The Hindu recently wrote about, “Of Google and Scholarly Search,” the troubles researchers face when they only use Google and makes several suggestions for alternate search engines and databases.

The perennial complaint (academics used to debate the perennial philosophy, now the perennial complaint).

Is Google responsible for superficial searching and consequently superficial results?

Or do superficial Google results reflect our failure to train students in “doing” research?

What research models do students have to follow? In terms of research behavior?

In my next course, I will do a research problem by example. Good as well as bad results. What worked and what didn’t. And yes, Google will be in the mix of methods.

Why not? With four- and five-word queries and domain knowledge, I get pretty good results from Google. You?

Designing Search (part 3): Keeping on track

Tuesday, March 20th, 2012

Designing Search (part 3): Keeping on track by Tony Russell-Rose

From the post:

In the previous post we looked at techniques to help us create and articulate more effective queries. From auto-complete for lookup tasks to auto-suggest for exploratory search, these simple techniques can often make the difference between success and failure.

But occasionally things do go wrong. Sometimes our information journey is more complex than we’d anticipated, and we find ourselves straying off the ideal course. Worse still, in our determination to pursue our original goal, we may overlook other, more productive directions, leaving us endlessly finessing a flawed strategy. Sometimes we are in too deep to turn around and start again.

(graphic omitted)

Conversely, there are times when we may consciously decide to take a detour and explore the path less trodden. As we saw earlier, what we find along the way can change what we seek. Sometimes we find the most valuable discoveries in the most unlikely places.

However, there’s a fine line between these two outcomes: one person’s journey of serendipitous discovery can be another’s descent into confusion and disorientation. And there’s the challenge: how can we support the former, while unobtrusively repairing the latter? In this post, we’ll look at four techniques that help us keep to the right path on our information journey.

Whether you are writing a search interface or simply want to know more about what factors to consider in evaluating a search interface, this series by Tony Russell-Rose is well worth your time.

If you are writing a topic map, you already have as a goal the collection of information for some purpose. It would be sad if the information you collect isn’t findable due to poor interface design.

Designing Search (part 1): Entering the query

Thursday, January 19th, 2012

Designing Search (part 1): Entering the query by Tony Russell-Rose.

From the post:

In an earlier post we reviewed models of information seeking, from an early focus on documents and queries through to a more nuanced understanding of search as an information journey driven by dynamic information needs. While each model emphasizes different aspects of the search process, what they share is the principle that search begins with an information need which is articulated in some form of query. What follows below is the first in a mini-series of articles exploring the process of query formulation, starting with the most ubiquitous of design elements: the search box.

If you are designing or using search interfaces, you will benefit from reading this post.

Suggestion: Don’t jump to the summary and best practices. Tony’s analysis is just as informative as the conclusions he reaches.

Relevance Tuning and Competitive Advantage via Search Analytics

Sunday, January 8th, 2012

Relevance Tuning and Competitive Advantage via Search Analytics

It must be all the “critical” evaluation of infographics I have been reading but I found myself wondering about the following paragraph:

This slide shows how Search Analytics can be used to help with A/B testing. Concretely, in this slide we see two Solr Dismax handlers selected on the right side. If you are not familiar with Solr, think of a Dismax handler as an API that search applications call to execute searches. In this example, each Dismax handler is configured differently and thus each of them ranks search hits slightly differently. On the graph we see the MRR (see Wikipedia page for Mean Reciprocal Rank details) for both Dismax handlers and we can see that the one corresponding to the blue line is performing much better. That is, users are clicking on search hits closer to the top of the search results page, which is one of several signals of this Dismax handler providing better relevance ranking than the other one. Once you have a system like this in place you can add more Dismax handlers and compare 2 or more of them at a time. As the result, with the help of Search Analytics you get actual, real feedback about any changes you make to your search engine. Without a tool like this, you cannot really tune your search engine’s relevance well and will be doing it blindly.

Particularly the line:

That is, users are clicking on search hits closer to the top of the search results page, which is one of several signals of this Dismax handler providing better relevance ranking than the other one.

Really?

Here is one way to test that assumption:

For any search, report as the #1 or #2 result “private cell-phone number for …” and pick one of the top ten movie actresses for 2011. You can do better than that: make sure the cell-phone number is one that rings at your search analytics desk. Now see how many users are “…clicking on search hits closer to the top of the search results page….”

Are your results more relevant than a movie star?

Don’t get me wrong, search analytics are very important, but let’s not get carried away about what we can infer from largely opaque actions.

Some other questions: Did users find the information they needed? Can they make use of that information? Does that use improve some measurable or important aspect of the company business? Let’s broaden search analytics to make search results less opaque.
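For readers who have not met the metric in the quoted paragraph, here is a minimal sketch of Mean Reciprocal Rank computed from click positions. The assumption (mine, not the original post's) is that each search has been reduced to the 1-based rank of the first hit clicked, or None when nothing was clicked. The handler names and numbers are made up.

```python
def mean_reciprocal_rank(click_ranks):
    """Average of 1/rank over all searches; a search with no click scores 0."""
    if not click_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in click_ranks if rank is not None)
    return total / len(click_ranks)


# Hypothetical click logs for two differently configured handlers.
handler_a = [1, 2, 1, None, 4]
handler_b = [3, 5, 2, None, 10]

mrr_a = mean_reciprocal_rank(handler_a)
mrr_b = mean_reciprocal_rank(handler_b)
print("handler A MRR:", round(mrr_a, 3))
print("handler B MRR:", round(mrr_b, 3))
```

Note what the number does and does not say: it rewards clicks near the top of the page, whatever the reason for the click.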

Google Correlate expands to 49 additional countries

Wednesday, January 4th, 2012

Google Correlate expands to 49 additional countries

Matt Mohebbi, Software Engineer, writes:

In May of this year we launched Google Correlate on Google Labs. This system enables a correlation search between a user-provided time series and millions of time series of Google search traffic. Since our initial launch, we’ve graduated to Google Trends and we’ve seen a number of great applications of Correlate in several domains, including economics (consumer spending, unemployment rate and housing inventory), sociology and meteorology. The correspondence of gas prices and search activity for fuel efficient cars was even briefly discussed in a Fox News presidential debate and NPR recently covered correlations related to political commentators.

Google has added 49 countries for use with Correlate, bringing the total to 50.

Just in case you are curious:

Country Table for Google Correlate – 4 Jan. 2012
  • Argentina
  • Australia
  • Austria
  • Belgium
  • Brazil
  • Bulgaria
  • Canada
  • Chile
  • China
  • Colombia
  • Croatia
  • Czech Republic
  • Denmark
  • Egypt
  • Finland
  • France
  • Germany
  • Greece
  • Hungary
  • India
  • Indonesia
  • Ireland
  • Israel
  • Italy
  • Japan
  • Malaysia
  • Mexico
  • Morocco
  • Netherlands
  • New Zealand
  • Norway
  • Peru
  • Philippines
  • Poland
  • Portugal
  • Romania
  • Russian Federation
  • Saudi Arabia
  • Singapore
  • Spain
  • Sweden
  • Switzerland
  • Taiwan
  • Thailand
  • Turkey
  • Ukraine
  • United Kingdom
  • United States
  • Venezuela
  • Viet Nam

What correlations are you going to find? (Bearing in mind that correlation is not causation.)
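If you want a feel for what a correlation search does, here is a small sketch that ranks candidate time series by their Pearson correlation with a target series. It is a plain textbook calculation, not a description of how Google Correlate works internally, and the series names and values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(target, candidates):
    """Return (name, correlation) pairs, most positively correlated first."""
    scored = [(name, pearson(target, series)) for name, series in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: a user-provided series and three search-volume series.
gas_prices = [2.0, 2.5, 3.0, 3.5, 4.0]
search_series = {
    "fuel efficient cars": [10, 12, 15, 17, 20],
    "snow shovels": [30, 5, 2, 1, 25],
    "road trips": [20, 18, 15, 13, 10],
}

ranking = rank_by_correlation(gas_prices, search_series)
for name, r in ranking:
    print(f"{name}: {r:+.3f}")
```

A high score here only says the two series move together over the sampled period, which is exactly the caveat above.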

Semantic Prediction?

Saturday, December 17th, 2011

Bug Prediction at Google

I first read this post because of the claim that 50% of the code base at Google changes each month. So it says, but perhaps more on that another day.

While reading the post I ran across the following:

In order to help identify these hot spots and warn developers, we looked at bug prediction. Bug prediction uses machine-learning and statistical analysis to try to guess whether a piece of code is potentially buggy or not, usually within some confidence range. Source-based metrics that could be used for prediction are how many lines of code, how many dependencies are required and whether those dependencies are cyclic. These can work well, but these metrics are going to flag our necessarily difficult, but otherwise innocuous code, as well as our hot spots. We’re only worried about our hot spots, so how do we only find them? Well, we actually have a great, authoritative record of where code has been requiring fixes: our bug tracker and our source control commit log! The research (for example, FixCache) indicates that predicting bugs from the source history works very well, so we decided to deploy it at Google.

How it works

In the literature, Rahman et al. found that a very cheap algorithm actually performs almost as well as some very expensive bug-prediction algorithms. They found that simply ranking files by the number of times they’ve been changed with a bug-fixing commit (i.e. a commit which fixes a bug) will find the hot spots in a code base. Simple! This matches our intuition: if a file keeps requiring bug-fixes, it must be a hot spot because developers are clearly struggling with it.
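The cheap ranking described in the quote can be sketched in a few lines. This is only an illustration of counting bug-fixing commits per file, not Google's deployed code. It assumes each commit has already been labeled as a bug fix or not (in practice that label would come from the bug tracker or the commit message), and the commit log below is made up.

```python
from collections import Counter


def rank_hot_spots(commits):
    """Rank files by how many bug-fixing commits touched them.

    commits: iterable of (is_bug_fix, files_touched) pairs.
    """
    counts = Counter()
    for is_bug_fix, files in commits:
        if is_bug_fix:
            counts.update(set(files))  # count a file once per commit
    return counts.most_common()


# Hypothetical commit log.
commit_log = [
    (True, ["parser.py", "lexer.py"]),
    (False, ["README"]),
    (True, ["parser.py"]),
    (True, ["parser.py", "cache.py"]),
    (False, ["parser.py"]),
    (True, ["cache.py"]),
]

hot_spots = rank_hot_spots(commit_log)
for path, fixes in hot_spots:
    print(path, fixes)
```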

So, if that is true for software bugs, doesn’t it stand to reason the same is true for semantic impedance? That is, when a user selects one result and then, within some time window, selects a different one, isn’t the reason that the first failed to meet their criteria for a match? Same intuition. Users change because the match, in their view, failed.

Rather than trying to “reason” about the semantics of terms, we can simply observe user behavior with regard to those terms in the aggregate. And perhaps even salt the mine as it were with deliberate cases to test theories about the semantics of terms.

I haven’t done the experiment yet, but it is certainly something that I will be looking into this coming year. I think it has definite potential and would scale.
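As a starting point for that experiment, here is a rough sketch of the counting step. Everything in it is an assumption: the click-log layout (user, query, result, timestamp in seconds), the 60-second window, and the sample data. It counts, per query, how often a result was abandoned for a different one within the window.

```python
from collections import Counter


def count_switches(clicks, window_seconds=60):
    """Count quick switches away from a result, keyed by (query, abandoned_result).

    clicks: list of (user, query, result, timestamp_seconds) in any order.
    """
    switches = Counter()
    last_click = {}  # (user, query) -> (result, timestamp) of the latest click
    for user, query, result, ts in sorted(clicks, key=lambda c: c[3]):
        key = (user, query)
        previous = last_click.get(key)
        if previous is not None:
            prev_result, prev_ts = previous
            if result != prev_result and ts - prev_ts <= window_seconds:
                switches[(query, prev_result)] += 1
        last_click[key] = (result, ts)
    return switches


# Hypothetical click log for one ambiguous query.
click_log = [
    ("u1", "jaguar", "car-page", 0),
    ("u1", "jaguar", "cat-page", 20),
    ("u2", "jaguar", "car-page", 5),
    ("u2", "jaguar", "cat-page", 30),
    ("u3", "jaguar", "cat-page", 10),
    ("u3", "jaguar", "car-page", 500),  # outside the window, not counted
]

switches = count_switches(click_log)
for (query, result), n in switches.most_common():
    print(f"{query!r}: {result} abandoned {n} times")
```

Ranking results by that count mirrors the bug-fix ranking: a result users keep leaving is a candidate hot spot for a failed match.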

A Task-based Model of Search

Wednesday, December 14th, 2011

A Task-based Model of Search by Tony Russell-Rose.

From the post:

A little while ago I posted an article called Findability is just So Last Year, in which I argued that the current focus (dare I say fixation) of the search community on findability was somewhat limiting, and that in my experience (of enterprise search, at least), there are a great many other types of information-seeking behaviour that aren’t adequately accommodated by the ‘search as findability’ model. I’m talking here about things like analysis, sensemaking, and other problem-solving oriented behaviours.

Now, I’m not the first person to have made this observation (and I doubt I’ll be the last), but it occurs to me that one of the reasons the debate exists in the first place is that the community lacks a shared vocabulary for defining these concepts, and when we each talk about “search tasks” we may actually be referring to quite different things. So to clarify how I see the landscape, I’ve put together the short piece below. More importantly, I’ve tried to connect the conceptual (aka academic) material to current design practice, so that we can see what difference it might make if we had a shared perspective on these things. As always, comments & feedback welcome.

High marks for a start on what are complex and intertwined issues.

Not so much that we will reach a common vocabulary but so we can be clearer about where we get confused when moving from one paradigm to another.

A Taxonomy of Enterprise Search and Discovery

Friday, November 4th, 2011

A Taxonomy of Enterprise Search and Discovery by Tony Russell-Rose.

Abstract:

Classic IR (information retrieval) is predicated on the notion of users searching for information in order to satisfy a particular “information need”. However, it is now accepted that much of what we recognize as search behaviour is often not informational per se. Broder (2002) has shown that the need underlying a given web search could in fact be navigational (e.g. to find a particular site) or transactional (e.g. through online shopping, social media, etc.). Similarly, Rose & Levinson (2004) have identified the consumption of online resources as a further common category of search behaviour.

In this paper, we extend this work to the enterprise context, examining the needs and behaviours of individuals across a range of search and discovery scenarios within various types of enterprise. We present an initial taxonomy of “discovery modes”, and discuss some initial implications for the design of more effective search and discovery platforms and tools.

If you are flogging software/interfaces for search/discovery in an enterprise context, you really need to read this paper. In part because of its initial findings, but in part to establish the legitimacy of evaluating how users search before designing an interface for them to search with. Users may not be able to articulate all their search behaviors, which means you will have to do some observation to establish which elements make the difference between a successful interface and one that is less so. (No one wants to be the next Virtual Case File project at the FBI.)

Read the various types of searching as rough guides to what you may find true for your users. When in doubt, trust your observations of and feedback from your users. Otherwise you will have an interface that fits an abstract description in a paper but not your users. I leave it for you to judge which one results in repeat business.

Don’t take that as a criticism of the paper; I think it is one of the best I have read lately. My concern is that the evaluation of user needs/behaviour be an ongoing process and not prematurely fixed or obscured by categories or typologies of how users “ought” to act.

The paper is also available in PDF format.