Archive for the ‘Recall’ Category

11 Billion Clues in 800 Million Documents:…

Saturday, July 20th, 2013

11 Billion Clues in 800 Million Documents: A Web Research Corpus Annotated with Freebase Concepts by Dave Orr, Amar Subramanya, Evgeniy Gabrilovich, and Michael Ringgaard.

From the post:

When you type in a search query — perhaps Plato — are you interested in the string of letters you typed? Or the concept or entity represented by that string? But knowing that the string represents something real and meaningful only gets you so far in computational linguistics or information retrieval — you have to know what the string actually refers to. The Knowledge Graph and Freebase are databases of things, not strings, and references to them let you operate in the realm of concepts and entities rather than strings and n-grams.

We’ve previously released data to help with disambiguation and recently awarded $1.2M in research grants to work on related problems. Today we’re taking another step: releasing data consisting of nearly 800 million documents automatically annotated with over 11 billion references to Freebase entities.

These Freebase Annotations of the ClueWeb Corpora (FACC) consist of ClueWeb09 FACC and ClueWeb12 FACC. 11 billion phrases that refer to concepts and entities in Freebase were automatically labeled with their unique identifiers (Freebase MID’s). …


Based on review of a sample of documents, we believe the precision is about 80-85%, and recall, which is inherently difficult to measure in situations like this, is in the range of 70-85%….


Evaluate precision and recall by asking:

Your GPS gives you relevant directions on average eight (8) times out of ten (10), and it finds relevant locations on average seven (7) times out of ten (10). (Wikipedia on Precision and Recall)

Is that a good GPS?

A useful data set but still a continuation of the approach of guessing what authors meant when they authored documents.

What if, by some as-yet-unknown technique, precision goes to nine (9) out of ten (10) and recall goes to nine (9) out of ten (10) as well?

The GPS question becomes:

Your GPS gives you relevant directions on average nine (9) times out of ten (10), and it finds relevant locations on average nine (9) times out of ten (10).

Is that a good GPS?

Not that any automated technique has shown that level of performance.
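The GPS scores above are just precision and recall read as fractions. A minimal sketch of the two measures as set overlap, using made-up document identifiers for illustration:

```python
# Minimal sketch: precision and recall as set overlap.
# The document identifiers and relevance judgments are made-up
# illustrative values, not from the FACC data set.

def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"}
relevant  = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d11", "d12"}

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.8 0.8 -- "nine times out of ten" would be 0.9 on both counts
```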

Rather than focusing on data post-authoring, why not enable authors to declare their semantics?

Author declared semantics would reduce the cost and uncertainty of post-authoring semantic solutions.

I first saw this in a tweet by Nicolas Torzec.

How accurate can manual review be?

Friday, December 23rd, 2011

How accurate can manual review be?

From the post:

One of the chief pleasures for me of this year’s SIGIR in Beijing was attending the SIGIR 2011 Information Retrieval for E-Discovery Workshop (SIRE 2011). The smaller and more selective the workshop, it often seems, the more focused and interesting the discussion.

My own contribution was “Re-examining the Effectiveness of Manual Review”. The paper was inspired by an article from Maura Grossman and Gord Cormack, whose message is neatly summed up in its title: “Technology-assisted review in e-discovery can be more effective and more efficient than exhaustive manual review”.

Fascinating work!

Does this give you pause about automated topic map authoring? Why/why not?

Recall vs. Precision

Tuesday, November 15th, 2011

Recall vs. Precision by Gene Golovchinsky.

From the post:

Stephen Robertson’s talk at the CIKM 2011 Industry event caused me to think about recall and precision again. Over the last decade precision-oriented searches have become synonymous with web searches, while recall has been relegated to narrow verticals. But is precision@5 or NDCG@1 really the right way to measure the effectiveness of interactive search? If you’re doing a known-item search, looking up a common factoid, etc., then perhaps it is. But for most searches, even ones that might be classified as precision-oriented ones, the searcher might wind up with several attempts to get at the answer. Dan Russell’s A Google a Day lists exactly those kinds of challenges: find a fact that’s hard to find.

So how should we think about evaluating the kinds of searches that take more than one query, ones we might term session-based searches?

Read the post and the comments more than once!

Then think about how you would answer the questions raised, in or out of a topic map context.

Much food for thought here.

Stephen Robertson on Why Recall Matters

Monday, November 14th, 2011

Stephen Robertson on Why Recall Matters November 14th, 2011 by Daniel Tunkelang.

Daniel has the slides and an extensive summary of the presentation. Just to give you a taste of what awaits at Daniel’s post:

Stephen started by reminding us of ancient times (i.e., before the web), when at least some IR researchers thought in terms of set retrieval rather than ranked retrieval. He reminded us of the precision and recall “devices” that he’d described in his Salton Award Lecture — an idea he attributed to the late Cranfield pioneer Cyril Cleverdon. He noted that, while set retrieval uses distinct precision and recall devices, ranking conflates both into a decision of where to truncate a ranked result list. He also pointed out an interesting asymmetry in the conventional notion of precision-recall tradeoff: while returning more results can only increase recall, there is no certainty that the additional results will decrease precision. Rather, this decrease is a hypothesis that we associate with systems designed to implement the probability ranking principle, returning results in decreasing order of probability of relevance.

Interested? There’s more where that came from; follow the link to Daniel’s post above.
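The asymmetry Stephen describes can be sketched by computing precision and recall at each cutoff of a ranked list. The relevance labels below are made-up illustrative values (1 = relevant):

```python
# As the cutoff k grows, recall can only go up (or stay flat),
# while precision may rise or fall at each step.
# The ranked relevance labels are hypothetical.

ranking = [1, 1, 0, 1, 0, 0, 1, 0]   # 1 = relevant, 0 = not relevant
total_relevant = sum(ranking)

for k in range(1, len(ranking) + 1):
    hits = sum(ranking[:k])
    precision = hits / k                 # can fluctuate with k
    recall = hits / total_relevant       # non-decreasing in k
    print(f"k={k}  P@{k}={precision:.2f}  R@{k}={recall:.2f}")
```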

Evaluating Recommender Systems…

Friday, May 6th, 2011

Evaluating Recommender Systems – Explaining F-Score, Recall and Precision using Real Data Set from Apontador

Marcel Caraciolo says:

In this post I will introduce three metrics widely used for evaluating the utility of recommendations produced by a recommender system: Precision, Recall and F-1 Score. The F-1 Score is slightly different from the other ones, since it is a measure of a test’s accuracy and considers both the precision and the recall of the test to compute the final score.
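The three metrics can be sketched in a few lines; the recommended and actually-liked item sets below are made-up illustrative values, not from the Apontador data set:

```python
# Sketch of precision, recall, and F-1 for a recommender.
# Item names are hypothetical.

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

recommended = {"a", "b", "c", "d"}    # items the recommender suggested
liked       = {"b", "c", "e"}         # items the user actually wanted

hits = len(recommended & liked)       # b, c -> 2
precision = hits / len(recommended)   # 2/4 = 0.5
recall = hits / len(liked)            # 2/3
print(precision, recall, f1_score(precision, recall))
```

Because F-1 is a harmonic mean, it is pulled toward the lower of the two scores, which is why it punishes lopsided systems.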

Recommender systems are quite common and you are likely to encounter them while deploying topic maps. (Or you may wish to build one as part of a topic map system.)

Precision versus Recall

Monday, May 17th, 2010

High precision means resources are missed.

High recall means sifting garbage.

Q: Based on what assumption?

A: No assumption, observed behavior of texts and search engines.

Q: Based on what texts?

A: All texts, yes, all texts.

Q: Texts where the same subjects have different words/phrases and the same words/phrases mean different subjects?

A: Yes, those texts!

Q: If the subjects were identified in those texts, we could have high precision and high recall?

A: Yes, but not possible, too many texts!

Q: If the authors of new texts were to identify….

A: Sorry, no time, have to search now. Good-bye!

What Is Your TFM (To Find Me) Score?

Thursday, April 15th, 2010

I have talked about TFM (To Find Me) scores before. Take a look at How Can I Find Thee? Let me count the ways… for example.

So, you have looked at your OPAC, database, RDF datastore, topic map. What is your average TFM Score?

What do you think it needs to be for 60 to 80% retrieval?

The Furnas article from 1983 is the key to this series of posts. See the full citation in Are You Designing a 10% Solution?.

Would you believe 15 ways to identify a subject? Or aliases, to use the common terminology.

Say it slowly: 15 ways to identify a subject yields, on average, 60 to 80% retrieval. If you are in the range of 3 – 5 ways to identify a subject on your ecommerce site, you are leaving money on the table. Lots of money on the table.
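A rough back-of-envelope sketch of why more aliases help, using Furnas’s 10–20% single-term agreement figure: if a searcher’s term matches any one alias with probability p, and (a simplifying assumption) the matches are independent, then at least one of k aliases matches with probability 1 − (1 − p)^k:

```python
# Back-of-envelope: probability a searcher's term matches at least one
# of k aliases, assuming each alias matches independently with
# probability p. The independence assumption is a simplification;
# p = 0.1 is the low end of Furnas's 10-20% figure.

def retrieval_rate(p, k):
    return 1 - (1 - p) ** k

for k in (1, 3, 5, 15):
    print(k, round(retrieval_rate(0.1, k), 2))
# 1 -> 0.1, 3 -> 0.27, 5 -> 0.41, 15 -> 0.79
```

At p = 0.1, fifteen aliases land right at the bottom of the 60 to 80% range, while 3 – 5 aliases leave you under 50%.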

Want to leave less money on the table? Use topic maps and try for 15 aliases for a subject or more.

Are You Designing a 10% Solution?

Monday, April 5th, 2010

The most common feature on webpages is the search box. It is supposed to help readers find information, products, services; in other words, help the reader or your cash flow.

How effective is text searching? How often will your reader use the same word as your content authors for some object, product, service? Survey says: 10 to 20%!*

So the next time you insert a search box on a webpage, you or your client may be missing 80 to 90% of the potential readers or customers. Ouch!

Unlike the imaginary world of universal and unique identifiers, the odds of users choosing the same words have been established by actual research.

The data sets were:

  • verbs used to describe text-editing operations
  • descriptions of common objects, similar to the PASSWORD™ game
  • superordinate category names for swap-and-sale listings
  • main-course cooking recipes

There are a number of interesting aspects to the study that I will cover in future posts but the article offers the following assessment of text searching:

We found that random pairs of people use the same word for an object only 10 to 20 percent of the time.

This research is relevant to all information retrieval systems: online stores, library catalogs, whether you are searching simple text, RDF, or even topic maps. Ask yourself or your users: is a 10% success rate really enough?

(There are ways to improve that 10% score. More on those to follow.)

*Furnas, G. W., Landauer, T. K., Gomez, L. M., Dumais, S. T., (1983) “Statistical semantics: Analysis of the potential performance of keyword information access systems.” Bell System Technical Journal, 62, 1753-1806. Reprinted in: Thomas, J.C., and Schneider, M.L, eds. (1984) Human Factors in Computer Systems. Norwood, New Jersey: Ablex Publishing Corp., 187-242.

Size Really Does Matter…

Tuesday, March 16th, 2010

…when you are evaluating the effectiveness of full-text searching. Twenty-five years ago, Blair and Maron, in An evaluation of retrieval effectiveness for a full-text document-retrieval system, established that size affects the predicted usefulness of full-text searching.

Blair and Maron used a then state-of-the-art litigation support database containing 40,000 documents, for a total of approximately 350,000 pages. Their results differ significantly from earlier, optimistic reports concerning full-text search retrieval. The earlier reports were based on sets of fewer than 750 documents.

The lawyers using the system thought they were retrieving, at a minimum, 75% of the relevant documents. The participants were astonished to learn they were recovering only 20% of the relevant documents.
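The gap is easy to state as arithmetic. A minimal sketch of recall from counts; the document counts below are hypothetical, scaled to the 20% figure Blair and Maron reported:

```python
# Recall = relevant documents retrieved / relevant documents that exist.
# The counts are made-up illustrative values, scaled to the paper's
# reported 20% recall figure.

def recall(relevant_retrieved, relevant_total):
    return relevant_retrieved / relevant_total

# Suppose sampling suggests 1,000 relevant documents exist in the
# collection, but the queries retrieved only 200 of them:
print(recall(200, 1000))  # 0.2 -- far from the 75% the lawyers believed
```

The hard part, as Blair and Maron emphasized, is estimating the denominator: you cannot count the relevant documents your queries never surfaced without sampling the unretrieved set.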

One of the reasons cited by Blair and Maron merits quoting:

The belief in the predictability of words and phrases that may be used to discuss a particular subject is a difficult prejudice to overcome….Stated succinctly, it is impossibly difficult for users to predict the exact word, word combinations, and phrases that are used by all (or most) relevant documents and only (or primarily) by those documents….(emphasis in original, page 295)

That sounds to me like users using different ways to talk about the same subjects.

Topic maps won’t help users predict the “exact word, word combinations, and phrases.” However, they can be used to record mappings into document collections that collect up the “exact word, word combinations, and phrases” used in relevant documents.

Topic maps can be used like the maps of early explorers, which became more precise with each new expedition.