Archive for the ‘Relevance’ Category

Relevant Search

Tuesday, June 2nd, 2015

Relevant Search – With examples using Elasticsearch and Solr by Doug Turnbull and John Berryman.

From the webpage:

Users expect search to be simple: They enter a few terms and expect perfectly-organized, relevant results instantly. But behind this simple user experience, complex machinery is at work. Whether using Solr, Elasticsearch, or another search technology, the solution is never one size fits all. Returning the right search results requires conveying domain knowledge and business rules in the search engine’s data structures, text analytics, and results ranking capabilities.

Relevant Search demystifies relevance work. Using Elasticsearch, it teaches you how to return engaging search results to your users, helping you understand and leverage the internals of Lucene-based search engines. Relevant Search walks through several real-world problems using a cohesive philosophy that combines text analysis, query building, and score shaping to express business ranking rules to the search engine. It outlines how to guide the engineering process by monitoring search user behavior and shifting the enterprise to a search-first culture focused on humans, not computers. You’ll see how the search engine provides a deeply pluggable platform for integrating search ranking with machine learning, ontologies, personalization, domain-specific expertise, and other enriching sources.

  • Creating a foundation for Lucene-based search (Solr, Elasticsearch) relevance internals
  • Bridging the field of Information Retrieval and real-world search problems
  • Building your toolbelt for relevance work
  • Solving search ranking problems by combining text analysis, query building, and score shaping
  • Providing users relevance feedback so that they can better interact with search
  • Integrating test-driven relevance techniques based on A/B testing and content expertise
  • Exploring advanced relevance solutions through custom plug-ins and machine learning

Now imagine relevancy searching where a topic map contains multiple subject identifications for a single subject, from different perspectives.

Relevant Search is in early release but the sooner you participate, the fewer errata there will be in the final version.

Measuring Search Relevance

Monday, October 13th, 2014

Measuring Search Relevance by Hugh E. Williams.

From the post:

The process of asking many judges to assess search performance is known as relevance judgment: collecting human judgments on the relevance of search results. The basic task goes like this: you present a judge with a search result, and a search engine query, and you ask the judge to assess how relevant the item is to the query on (say) a four-point scale.

Suppose the query you want to assess is ipod nano 16Gb. Imagine that one of the results is a link to Apple’s page that describes the latest Apple iPod nano 16Gb. A judge might decide that this is a “great result” (which might be, say, our top rating on the four-point scale). They’d then click on a radio button to record their vote and move on to the next task. If the result we showed them was a story about a giraffe, the judge might decide this result is “irrelevant” (say the lowest rating on the four point scale). If it were information about an iPhone, it might be “partially relevant” (say the second-to-lowest), and if it were a review of the latest iPod nano, the judge might say “relevant” (it’s not perfect, but it sure is useful information about an Apple iPod).

The human judgment process itself is subjective, and different people will make different choices. You could argue that a review of the latest iPod nano is a “great result” — maybe you think it’s even better than Apple’s page on the topic. You could also argue that the definitive Apple page isn’t terribly useful in making a buying decision, and you might only rate it as relevant. A judge who knows everything about Apple’s products might make a different decision than someone who’s never owned a digital music player. You get the idea. In practice, judging decisions depend on training, experience, context, knowledge, and quality — it’s an art at best.

There are a few different ways to address subjectivity and get meaningful results. First, you can ask multiple judges to assess the same results to get an average score. Second, you can judge thousands of queries, so that you can compute metrics and be confident statistically that the numbers you see represent true differences in performance between algorithms. Last, you can train your judges carefully, and give them information about what you think relevance means.
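The collection-and-averaging step described above can be sketched in a few lines. This is purely illustrative — all names and the 0–3 scale mapping are hypothetical, not from Williams’ post:

```python
# Sketch of aggregating multi-judge relevance ratings on a four-point
# scale (0 = irrelevant .. 3 = great result). All names are illustrative.
from collections import defaultdict
from statistics import mean

# (query, result_url) -> list of ratings from different judges
judgments = defaultdict(list)

def record_judgment(query, url, rating):
    """Record one judge's rating (0-3) for a query/result pair."""
    if rating not in (0, 1, 2, 3):
        raise ValueError("rating must be on the four-point scale 0-3")
    judgments[(query, url)].append(rating)

def average_rating(query, url):
    """Average the ratings of all judges for this pair."""
    return mean(judgments[(query, url)])

# Three judges disagree about the Apple page for "ipod nano 16Gb":
record_judgment("ipod nano 16Gb", "apple.com/ipod-nano", 3)
record_judgment("ipod nano 16Gb", "apple.com/ipod-nano", 2)
record_judgment("ipod nano 16Gb", "apple.com/ipod-nano", 3)
print(average_rating("ipod nano 16Gb", "apple.com/ipod-nano"))  # ~2.67
```

Averaging is the simplest aggregation; with thousands of judged queries you can then compute the statistical comparisons the post mentions.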

An illustrated walk through measuring search relevance. Useful for a basic understanding of the measurement process and its parameters.

Bookmark this post so that when you tell your judges what “…relevance means,” you can return here and post what you told them.

I ask because I deeply suspect that our ideas of “relevance” vary widely from subject to subject.

Thanks!

Practical Relevance Ranking for 11 Million Books, Part 1

Wednesday, May 21st, 2014

Practical Relevance Ranking for 11 Million Books, Part 1 by Tom Burton-West.

From the post:

This is the first in a series of posts about our work towards practical relevance ranking for the 11 million books in the HathiTrust full-text search application.

Relevance is a complex concept which reflects aspects of a query, a document, and the user as well as contextual factors. Relevance involves many factors such as the user’s preferences, the user’s task, the user’s stage in their information-seeking, the user’s domain knowledge, the user’s intent, and the context of a particular search.

While many different kinds of relevance have been discussed in the literature, topical relevance is the one most often used in testing relevance ranking algorithms. Topical relevance is a measure of “aboutness”, and attempts to measure how much a document is about the topic of a user’s query.

At its core, relevance ranking depends on an algorithm that uses term statistics, such as the number of times a query term appears in a document, to provide a topical relevance score. Other ranking features that try to take into account more complex aspects of relevance are built on top of this basic ranking algorithm.

In many types of search, such as e-commerce or searching for news, factors other than the topical relevance (based on the words in the document) are important. For example, a search engine for e-commerce might have facets such as price, color, size, availability, and other attributes, that are of equal importance to how well the user’s query terms match the text of a document describing a product. In news retrieval, recency[iii] and the location of the user might be factored into the relevance ranking algorithm. (footnotes omitted)
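The basic term-statistics ranking the post describes can be sketched as a toy TF-IDF scorer. This is a minimal illustration of the idea, not the Lucene/Solr implementation (which adds length normalization and other factors — the very issue this series takes up):

```python
# Toy TF-IDF: score a document for a query by summing, per query term,
# term frequency in the document weighted by inverse document frequency.
# Illustrative only; real engines normalize for document length etc.
import math

docs = {
    "d1": "full text search of books and more books",
    "d2": "relevance ranking for search engines",
    "d3": "cooking with books about bread",
}

def tf(term, text):
    return text.split().count(term)

def idf(term, corpus):
    n = len(corpus)
    df = sum(1 for text in corpus.values() if term in text.split())
    return math.log((n + 1) / (df + 1)) + 1  # smoothed variant

def tfidf_score(query, doc_id, corpus):
    text = corpus[doc_id]
    return sum(tf(t, text) * idf(t, corpus) for t in query.split())

ranked = sorted(docs, key=lambda d: tfidf_score("books search", d, docs),
                reverse=True)
print(ranked[0])  # "d1": it matches both query terms, "books" twice
```

Note how raw term frequency favors "d1" partly because "books" repeats — exactly the kind of statistic that behaves differently for 300-page books than for short web pages.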

Great post that discusses the impact of the length of a document on its relevancy ranking by Lucene/Solr. That impact is well known, but how to move from relevancy studies with short documents to long documents (books) isn’t.

I am looking forward to Part 2, which will cover the relationship between relevancy and document length.

“Credibility” As “Google Killer”?

Sunday, May 4th, 2014

Nancy Baym tweets: “Nice article on flaws of ‘it’s not our fault, it’s the algorithm’ logic from Facebook with quotes from @TarletonG” pointing to: Facebook draws fire on ‘related articles’ push.

From the post:

A surprise awaited Facebook users who recently clicked on a link to read a story about Michelle Obama’s encounter with a 10-year-old girl whose father was jobless.

Facebook responded to the click by offering what it called “related articles.” These included one that alleged a Secret Service officer had found the president and his wife having “S*X in Oval Office,” and another that said “Barack has lost all control of Michelle” and was considering divorce.

A Facebook spokeswoman did not try to defend the content, much of which was clearly false, but instead said there was a simple explanation for why such stories are pushed on readers. In a word: algorithms.

The stories, in other words, apparently are selected by Facebook based on mathematical calculations that rely on word association and the popularity of an article. No effort is made to vet or verify the content.

Facebook’s explanation, however, is drawing sharp criticism from experts who said the company should immediately suspend its practice of pushing so-called related articles to unsuspecting users unless it can come up with a system to ensure that they are credible. (emphasis added)

Just imagine the hue and outcry had that last line read:

Imaginary Quote Google’s explanation of search results, however, is drawing sharp criticism from experts who said the company should immediately suspend its practice of pushing so-called related articles to unsuspecting users unless it can come up with a system to ensure that they are credible. End Imaginary Quote

Is demanding “credibility” of search results the long sought after “Google Killer?”

“Credibility” is closely related to the “search” problem but I think it should be treated separately from search.

In part because the “credibility” question can require multiple searches: searches on the author of the search result content, searches for reviews and comments on that content, searches of other sources of data on it, and then a collation of that additional content to make a credibility judgment. The procedure isn’t always that elaborate, but the main point is that even beginning to answer a credibility question requires additional searching and evaluation of content.

Not to mention that why the information is being sought has a bearing on credibility. If I want to find examples of nutty things said about President Obama to cite, then finding the cases mentioned above is not only relevant (the search question) but also “credible” in the sense that Facebook did not make them up. They are published nutty statements about the current President.

What if a user wanted to search for “coffee and bagels?” The top hit on one popular search engine today is: Coffee Meets Bagel: Free Online Dating Sites, along with numerous other links to information on the first link. Was this relevant to my search? No, but search results aren’t always predictable. They are relevant to someone’s search using “coffee and bagels.”

It is the responsibility of every reader to decide for themselves what is relevant, credible, useful, etc. in terms of content, whether it is hard copy or digital.

Any other solution takes us to Plato’s Republic, which is great to read about, but I would not want to live there.

Why the Feds (U.S.) Need Topic Maps

Monday, January 6th, 2014

Earlier today I saw this offer to “license” technology for commercial development:

ORNL’s Piranha & Raptor Text Mining Technology

From the post:

UT-Battelle, LLC, acting under its Prime Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy (DOE) for the management and operation of the Oak Ridge National Laboratory (ORNL), is seeking a commercialization partner for the Piranha/Raptor text mining technologies. The ORNL Technology Transfer Office will accept licensing applications through January 31, 2014.

ORNL’s Piranha and Raptor text mining technology solves the challenge most users face: finding a way to sift through large amounts of data that provide accurate and relevant information. This requires software that can quickly filter, relate, and show documents and relationships. Piranha is JavaScript search, analysis, storage, and retrieval software for uncertain, vague, or complex information retrieval from multiple sources such as the Internet. With the Piranha suite, researchers have pioneered an agent approach to text analysis that uses a large number of agents distributed over very large computer clusters. Piranha is faster than conventional software and provides the capability to cluster massive amounts of textual information relatively quickly due to the scalability of the agent architecture.

While computers can analyze massive amounts of data, the sheer volume of data makes the most promising approaches impractical. Piranha works on hundreds of raw data formats, and can process data extremely fast, on typical computers. The technology enables advanced textual analysis to be accomplished with unprecedented accuracy on very large and dynamic data. For data already acquired, this design allows discovery of new opportunities or new areas of concern. Piranha has been vetted in the scientific community as well as in a number of real-world applications.

The Raptor technology enables Piranha to run on SharePoint and MS SQL servers and can also operate as a filter for Piranha to make processing more efficient for larger volumes of text. The Raptor technology uses a set of documents as seed documents to recommend documents of interest from a large, target set of documents. The computer code provides results that show the recommended documents with the highest similarity to the seed documents.

Gee, that sounds so very hard. Using seed documents to recommend documents “…from a large, target set of documents”?

There are many ways to do that, but just searching for “Latent Dirichlet Allocation” in “.gov” domains, my total is 14,000 “hits.”
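One of those many ways — purely an illustration, not the ORNL code — is to rank the target documents by cosine similarity of their term vectors against the combined seed documents:

```python
# Seed-document recommendation sketch: rank a target set by cosine
# similarity of term-count vectors to the merged seed documents.
# Illustrative only; the documents and names are made up.
import math
from collections import Counter

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(seeds, targets, top_n=2):
    """Rank target docs by similarity to the combined seed documents."""
    seed_vec = vectorize(" ".join(seeds))
    scored = [(cosine(seed_vec, vectorize(t)), t) for t in targets]
    scored.sort(reverse=True)
    return [t for _, t in scored[:top_n]]

seeds = ["uranium enrichment centrifuge report",
         "centrifuge maintenance and enrichment levels"]
targets = ["quarterly centrifuge enrichment summary",
           "cafeteria menu for the week",
           "enrichment of uranium ore"]
print(recommend(seeds, targets, top_n=1))
```

A couple dozen lines of standard-library Python captures the basic mechanism, which is rather the point about paying repeatedly for the same technology.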

If you were paying for search technology to be developed, how many times would you pay to develop the same technology?

Just curious.

In order to have a sensible technology development process, the government needs a topic map to track its development efforts. Not only to track duplicate development, but to prevent it.

Imagine if every web project had to develop its own httpd server, instead of the vast majority of them using Apache HTTPD.

With a common server base, a community has developed to maintain and extend that base product. That can’t happen where the same technology is contracted for over and over again.

Suggestions on what might be an incentive for the Feds to change their acquisition processes?

Relevancy 301 – The Graduate Level Course

Wednesday, November 20th, 2013

Relevancy 301 – The Graduate Level Course by Paul Nelson.

From the post:

So, I was going to write an article entitled “Relevancy 101”, but that seemed too shallow for what has become a major area of academic research. And so here we are with a Graduate-Level Course. Grab your book-bag, some Cheetos and a Mountain Dew, and let’s kick back and talk search engine relevancy.

I have blogged about relevancy before (see “What does ‘relevant’ mean?)”, but that was a more philosophical discussion of relevancy. The purpose of this blog is to go in-depth into the different types of relevancy, how they’re computed, and what they’re good for. I’ll do my best to avoid math, but no guarantees.

A very good introduction to measures of “relevancy,” most of which are no longer used.

Pay particular attention to Paul’s remarks about the weaknesses of inverse document frequency (IDF).

Before Paul posts part 2, how do you determine the relevance of documents?

Exercise:

Pick a subject covered by a journal or magazine, one with twelve issues each year and review a year’s worth of issues for “relevant” articles.

Assuming the journal is available electronically, does the search engine suggest your other “relevant” articles?

If it doesn’t, can you determine why it recommended different articles?

Webinar: Turbo-Charging Solr

Monday, October 7th, 2013

Turbo-charge your Solr instance with Entity Recognition, Business Rules and a Relevancy Workbench by Yann Yu.

Date: Thursday, October 17, 2013
Time: 10:00am Pacific Time

From the post:

LucidWorks has three new modules available in the Solr Marketplace that run on top of your existing Solr or LucidWorks Search instance. Join us for an overview of each module and learn how implementing one, two or all three will turbo-charge your Solr instance.

  • Business Rules Engine: Out of the box integration with Drools, the popular open-source business rules engine is now available for Solr and LucidWorks Search. With the LucidWorks Business Rules module, developers can write complex rules using declarative syntax with very little programming. Data can be modified, cleaned and enriched through multiple permutations and combinations.
  • Relevancy Workbench: Experiment with different search parameters to understand the impact of these changes to search results. With intuitive, color-coded, side-by-side comparisons of results for different sets of parameters, users can quickly tune their application to produce the results they need. The Relevancy Workbench encourages experimentation with a visual “before and after” view of the results of parameter changes.
  • Entity Recognition: Enhance Search applications beyond simple keyword search by adding intelligence through metadata. Help classify common patterns from unstructured data/content into predefined categories. Examples include names of persons, organizations, locations, expressions of time, quantities, monetary values, percentages etc.

All of these modules will be of interest to topic mappers who are processing bulk data.

Improve search relevancy…

Wednesday, July 24th, 2013

Improve search relevancy by telling Solr exactly what you want by Doug Turnbull.

From the post:

To be successful, (e)dismax relies on avoiding a tricky problem with its scoring strategy. As we’ve discussed, dismax scores documents by taking the maximum score of all the fields that match a query. This is problematic as one field’s scores can’t easily be related to another’s. A good “text” match might have a score of 2, while a bad “title” score might be 10. Dismax doesn’t have a notion that “10” is bad for title, it only knows 10 > 2, so title matches dominate the final search results.

The best case for dismax is that there’s only one field that matches a query, so the resulting scoring reflects the consistency within that field. In short, dismax thrives with needle-in-a-haystack problems and does poorly with hay-in-a-haystack problems.

We need a different strategy for documents that have fields with a large amount of overlap. We’re trying to tell the difference between very similar pieces of hay. The task is similar to needing to find a good candidate for a job. If we wanted to query a search index of job candidates for “Solr Java Developer”, we’ll clearly match many different sections of our candidates’ resumes. Because of problems with dismax, we may end up with search results heavily sorted on the “objective” field.

(…)
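A toy sketch of the scoring behavior Doug describes — dismax taking the raw maximum across fields — might look like the following. This is an illustration of the problem, not Solr’s actual code:

```python
# Toy illustration of the dismax problem: dismax takes the max of the
# per-field scores, so a field whose scores run "hot" dominates even
# when its match is mediocre *for that field*. Not Solr code.
def dismax(field_scores, tie=0.0):
    """score = max + tie * sum(other field scores), as in (e)dismax."""
    best = max(field_scores.values())
    return best + tie * (sum(field_scores.values()) - best)

# A *good* match in "text" (2.0 is high for that field) still loses to
# a *bad* match in "title" (10.0 is low for that field), because dismax
# only compares raw numbers across fields.
doc_a = {"text": 2.0, "title": 0.0}    # strong body match
doc_b = {"text": 0.1, "title": 10.0}   # weak match in a hot field
print(dismax(doc_a), dismax(doc_b))    # doc_b outscores doc_a
```

Since 10 > 2 regardless of what “10” means for the title field, the weak title match ranks first — the hay-in-a-haystack failure mode.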

Not unlike my comments yesterday about the similarity of searching and playing the lottery. The more you invest in the search, the more likely you are to get good results.

Doug analyzes what criteria data should meet in order to be a “good” result.

For a topic map, I would analyze what data does a subject need in order to be found by a typical request.

Both address the same problem, search, but from very different perspectives.

Real-Time Twitter Search by @larsonite

Saturday, September 22nd, 2012

Real-Time Twitter Search by @larsonite by Marti Hearst.

From the post:

Brian Larson gives a brilliant technical talk about how real-time search works at Twitter. He really knows what he’s talking about, given that he’s the tech lead for search and relevance at Twitter!

The coverage of the real-time indexing, the Java memory model, and safe publication was particularly good.

As a bonus, the talk also discusses relevance near the end of the presentation.

You may want to watch this more than once!

Brian recommends Java Concurrency in Practice by Brian Goetz as having good coverage of the Java memory model.

LAILAPS

Wednesday, April 25th, 2012

LAILAPS

From the website:

LAILAPS combines a keyword driven search engine for an integrative access to life science databases, machine learning for a content driven relevance ranking, recommender systems for suggestion of related data records and query refinements with a user feedback tracking system for a self-learning relevance training.

Features:

  • ultra fast keyword based search
  • non-static relevance ranking
  • user specific relevance profiles
  • suggestion of related entries
  • suggestion of related query terms
  • self learning by user tracking
  • deployable at standard desktop PC
  • 100% JAVA
  • installer for in-house deployment

I like the idea of a recommender system that “suggests” related data records and query refinements. It could be wrong.

I am as guilty as anyone of thinking in terms of “correct” recommendations that always lead to relevant data.

That is applying “crisp” set thinking to what is obviously a “rough” set situation. We as readers have to sort out the items in the “rough” set and construct for ourselves, a temporary and fleeting “crisp” set for some particular purpose.

If you are using LAILAPS, I would appreciate a note about your experiences and impressions.

Relevance Tuning and Competitive Advantage via Search Analytics

Sunday, January 8th, 2012

Relevance Tuning and Competitive Advantage via Search Analytics

It must be all the “critical” evaluation of infographics I have been reading but I found myself wondering about the following paragraph:

This slide shows how Search Analytics can be used to help with A/B testing. Concretely, in this slide we see two Solr Dismax handlers selected on the right side. If you are not familiar with Solr, think of a Dismax handler as an API that search applications call to execute searches. In this example, each Dismax handler is configured differently and thus each of them ranks search hits slightly differently. On the graph we see the MRR (see Wikipedia page for Mean Reciprocal Rank details) for both Dismax handlers and we can see that the one corresponding to the blue line is performing much better. That is, users are clicking on search hits closer to the top of the search results page, which is one of several signals of this Dismax handler providing better relevance ranking than the other one. Once you have a system like this in place you can add more Dismax handlers and compare 2 or more of them at a time. As a result, with the help of Search Analytics you get actual, real feedback about any changes you make to your search engine. Without a tool like this, you cannot really tune your search engine’s relevance well and will be doing it blindly.

Particularly the line:

That is, users are clicking on search hits closer to the top of the search results page, which is one of several signals of this Dismax handler providing better relevance ranking than the other one.

Really?

Here is one way to test that assumption:

Report for any search, as the #1 or #2 result, “private cell-phone number for …” and pick one of the top ten movie actresses for 2011. And you can do better than that: make sure the cell-phone number is one that rings at your search analytics desk. Now see how many users are “…clicking on search hits closer to the top of the search results page….”

Are your results more relevant than a movie star?

Don’t get me wrong, search analytics are very important, but let’s not get carried away about what we can infer from largely opaque actions.

Some other questions: Did users find the information they needed? Can they make use of that information? Does that use improve some measurable or important aspect of the company business? Let’s broaden search analytics to make search results less opaque.
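For concreteness, the MRR metric the quoted slide compares across Dismax handlers is simple to compute: for each query, take one over the rank of the first clicked (or first relevant) result, then average over queries. A minimal sketch with made-up click data:

```python
# Mean Reciprocal Rank over a set of queries. For each query we record
# the 1-based rank of the first clicked result; None = no click at all.
# The click data below is invented for illustration.
def mrr(first_click_ranks):
    """Average of 1/rank of the first click per query (0 for no click)."""
    scores = [1.0 / r if r else 0.0 for r in first_click_ranks]
    return sum(scores) / len(scores)

# Handler A: users mostly click the top hit; handler B: lower down.
handler_a = [1, 1, 2, 1, None]   # (1 + 1 + 0.5 + 1 + 0) / 5 = 0.7
handler_b = [3, 2, None, 4, 2]
print(mrr(handler_a), mrr(handler_a) > mrr(handler_b))
```

The sketch also shows what the metric does not capture: a click is just a click, whether it led to a celebrity’s home page or the information the user actually needed.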

Orev: The Apache OpenRelevance Viewer

Tuesday, December 13th, 2011

Orev: The Apache OpenRelevance Viewer

From the webpage:

The OpenRelevance project is an Apache project, aimed at making materials for doing relevance testing for information retrieval (IR), Machine Learning and Natural Language Processing (NLP). Think TREC, but open-source.

These materials require a lot of managing work and many human hours to be put into collecting corpora and topics, and then judging them. Without going into too many details here about the actual process, it essentially means crowd-sourcing a lot of work, and that is assuming the OpenRelevance project had the proper tools to offer the people recruited for the work.

Having no such tool, the Viewer – Orev – is meant for being exactly that, and so to minimize the overhead required from both the project managers and the people who will be doing the actual work. By providing nice and easy facilities to add new Topics and Corpora, and to feed documents into a corpus, it will make it very easy to manage the surrounding infrastructure. And with a nice web UI to be judging documents with, the work of the recruits is going to be very easy to grok.

Orev focuses on judging documents, but that is a common level of granularity for relevance work these days.

I don’t know of anything more granular but if you find such a tool, please sing out!

Yandex – Relevance Prediction Challenge

Wednesday, November 16th, 2011

Yandex – Relevance Prediction Challenge

Important Dates:

Oct 15, 2011 – Challenge opens

Dec 22, 2011 – End of challenge (extended from Dec 15)

Dec 25, 2011 – Winners candidacy notification

Jan 20, 2012 – Reports deadline

Feb 12, 2012 – WSCD workshop at WSDM 2012, Winners announcement

Sorry, you are already late starting. Here are some of the details; see the website for more:

From the webpage:

The Relevance Prediction Challenge provides a unique opportunity to consolidate and scrutinize the work from industrial labs on predicting the relevance of URLs using user search behavior. It provides a fully anonymized dataset shared by Yandex which has clicks and relevance judgements. Predicting relevance based on clicks is difficult, and is not a solved problem. This Challenge and the shared dataset will enable a whole new set of researchers to conduct such experiments.

The Relevance Prediction Challenge is a part of series of contests organized by Yandex called Internet Mathematics. This year’s event is the sixth since 2004. Participants will again compete in finding solutions to a real-life problem based on real-life data. In previous years, participants tried to learn to rank documents, predict traffic jams and find similar images.

I can’t think of very many “better” days to find out you won such a contest!

Open Relevance Project

Sunday, October 9th, 2011

Open Relevance Project

From the website:

What Is the Open Relevance Project?

The Open Relevance Project (ORP) is a new Apache Lucene sub-project aimed at making materials for doing relevance testing for Information Retrieval (IR), Machine Learning and Natural Language Processing (NLP) into open source.

Our initial focus is on creating collections, judgments, queries and tools for the Lucene ecosystem of projects (Lucene Java, Solr, Nutch, Mahout, etc.) that can be used to judge relevance in a free, repeatable manner.

One dataset that needs attention from this project is: Apache Software Foundation Public Mail Archives, which is accessible on the Amazon cloud.

Project work products would benefit Apache software users, vendors with Apache software bases, historians, sociologists and others interested in the dynamics, technical and otherwise, of software development.

I am willing to try to learn cloud computing and the skills necessary to turn this dataset into a test collection. Are you?

…Link Analysis on EU Case Law

Thursday, September 22nd, 2011

Malmgren: Towards a Theory of Jurisprudential Relevance Ranking – Using Link Analysis on EU Case Law

From the post:

Staffan Malmgren of Stockholm University and the free access to law service of Sweden, lagen.nu, has posted his Master’s thesis, Towards a Theory of Jurisprudential Relevance Ranking – Using Link Analysis on EU Case Law (2011). Here is the abstract:

Staffan is going to be posting his thesis a chapter at a time to solicit feedback on it.

Any takers?