Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 1, 2010

Is 00.7% of Relevant Documents Enough?

Filed under: Information Retrieval, Searching. Posted by Patrick Durusau @ 9:04 am

Searching for different ways of writing implication, that is, p implies q, I got:

  • “q whenever p” – 44,200 “hits” (00.7%)
  • “p is sufficient for q” – 385,000 “hits” (6%)
  • “p implies q” – 506,000 “hits” (8%)
  • “if p, then q” – 2,189,000 “hits” (36%)
  • “q if p” – 2,920,000 “hits” (48%)
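The percentages appear to be each phrasing's share of the combined hit count. A minimal sketch of that arithmetic, assuming the five counts can simply be added (that is, that no document matches more than one phrasing, which the hit counts alone cannot confirm):

```python
# Hit counts as reported in the list above.
hits = {
    "q whenever p": 44_200,
    "p is sufficient for q": 385_000,
    "p implies q": 506_000,
    "if p, then q": 2_189_000,
    "q if p": 2_920_000,
}

# Assumption: the result sets do not overlap, so the counts can be summed.
total = sum(hits.values())

# Share of the combined hits that each phrasing accounts for, in percent.
shares = {phrase: 100 * count / total for phrase, count in hits.items()}

for phrase, share in shares.items():
    print(f"{phrase!r}: {share:.1f}%")
```

If the result sets do overlap, the true number of distinct documents is smaller than the sum and every share shifts accordingly.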

What if the search were for a “smoking gun” document during legal discovery? Or for the latest treatment for a patient dying in the ER? Or for engineering literature that could reveal a fatal flaw in a part that will go into hundreds of airplanes? Hmmm, a 00.7% result doesn’t look all that attractive.

For a document set of any significant size, it isn’t possible to know what percentage of the relevant documents your query returned. Your query might be the 48% query, but it could also be the 00.7% query.

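To make matters worse, the 00.7% figure could be even lower. That score assumes that those five queries, taken together, return *all* the relevant documents. Any relevant document that uses some sixth phrasing adds to the total without adding to any query's hits, so every percentage shrinks.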
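To see how quickly the figure erodes, here is a small sketch. The function name and the "unseen" counts are hypothetical, chosen only for illustration; only the two constants come from the hit counts in this post.

```python
def share_of_relevant(query_hits: int, known_total: int, unseen: int = 0) -> float:
    """Percent of relevant documents a query returns, where `unseen` is the
    number of relevant documents that none of the known phrasings matched."""
    return 100 * query_hits / (known_total + unseen)


KNOWN_TOTAL = 6_044_200  # sum of the five hit counts in this post
WEAKEST = 44_200         # hits for "q whenever p"

# Hypothetical numbers of relevant documents missed by all five queries.
for unseen in (0, 1_000_000, 6_000_000):
    pct = share_of_relevant(WEAKEST, KNOWN_TOTAL, unseen)
    print(f"unseen={unseen:>9,}: {pct:.2f}%")
```

The share can only fall as the unseen count grows, and the searcher has no way to measure that count from the results alone.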

The problem is that different users identify the same subjects in different ways, or use the same identifications for different subjects. Matters get worse as more users produce documents that need to be searched.

Available options include:

  1. Create new identifiers and ignore previous ones
  2. Create new identifiers and map previous ones
  3. Map identifiers people already use

This blog will explore all three and why I prefer the last one.
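As a rough illustration of the third option, the sketch below maps the phrasings people already use onto a single subject and searches for all of them at once. It is not a topic map implementation; the dictionary, function names, and sample documents are all hypothetical, and the matching is naive substring search.

```python
# Hypothetical subject key mapped to the identifiers people already use for it.
SUBJECT_IDENTIFIERS = {
    "implication": [
        "q whenever p",
        "p is sufficient for q",
        "p implies q",
        "if p, then q",
        "q if p",
    ],
}


def expand(query: str) -> list[str]:
    """Return every known identifier for the subject that `query` names."""
    for identifiers in SUBJECT_IDENTIFIERS.values():
        if query in identifiers:
            return list(identifiers)
    return [query]  # unmapped query: search for it exactly as typed


def search(query: str, documents: list[str]) -> list[str]:
    """Naive substring search using all identifiers mapped to the same subject."""
    terms = expand(query)
    return [doc for doc in documents if any(term in doc for term in terms)]


# Made-up sample documents, for illustration only.
docs = [
    "We know that q whenever p holds.",
    "Here p is sufficient for q.",
    "An unrelated sentence.",
]
print(search("p implies q", docs))
```

A user who types only “p implies q” still retrieves the documents that say “q whenever p” or “p is sufficient for q”, because the mapping, not the user, carries the knowledge of the alternative identifiers.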
