Another Word For It

Patrick Durusau on Topic Maps and Semantic Diversity

March 2, 2010

Skillful Semantic Users?

Filed under: Usability | Patrick Durusau @ 9:05 am

I recently discovered one reason for my unease with semantic this-and-that technologies, including topic map interfaces. A friend mentioned to me that he wanted users to do more than enter subject names in his topic map interface. “Users need to also enter….”

The idea of users busily populating a semantic space is an attractive one, but it hasn’t been borne out in practice. So I doubt my friend’s interface will prove useful. But why?

Then I got to thinking: how many indexers or librarians do I know? The sort of people whose combined talents brought us the Readers’ Guide to Periodical Literature and useful back-of-the-book indexes. Through my work on computer standards I know a lot of smart people, but very few of them strike me as also having indexing or cataloging skills.

Any semantic solution (RDFa, RDF/OWL, SUMO, Topic Maps, etc.) will fail from an authoring standpoint for lack of skill. No technology can magically make users competent at the indexing or cataloging needed to make information accessible to others.

Semantic interface designers need to recognize that most users are simply consumers of information created by others. I would not be surprised if the ratio of producers to consumers is close to the ratio of contributors to users in open source projects.

March 1, 2010

Is 0.7% of Relevant Documents Enough?

Filed under: Information Retrieval, Searching | Patrick Durusau @ 9:04 am

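Searching for ways of expressing implication, that is, “p implies q”, I got the following hit counts (each percentage is that phrasing’s share of the combined total):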

  • “q whenever p” – 44,200 “hits” (0.7%)
  • “p is sufficient for q” – 385,000 “hits” (6%)
  • “p implies q” – 506,000 “hits” (8%)
  • “if p, then q” – 2,189,000 “hits” (36%)
  • “q if p” – 2,920,000 “hits” (48%)
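The percentages in the list work out as each phrasing’s share of the combined hit count, roughly 6.04 million. A minimal sketch that reproduces them under that assumption; the dictionary and the `shares` helper are illustrative names, not part of the original post:

```python
# Hit counts for each phrasing of "p implies q", as reported in the post.
hits = {
    "q whenever p": 44_200,
    "p is sufficient for q": 385_000,
    "p implies q": 506_000,
    "if p, then q": 2_189_000,
    "q if p": 2_920_000,
}

def shares(counts):
    """Return each phrasing's share (in percent) of the combined hit count."""
    total = sum(counts.values())  # 6,044,200 for the counts above
    return {phrase: 100 * n / total for phrase, n in counts.items()}

for phrase, pct in shares(hits).items():
    print(f"{phrase!r}: {pct:.1f}%")
```

With these counts it prints 0.7%, 6.4%, 8.4%, 36.2% and 48.3%, which round to the figures in the list.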

What if the search were for a “smoking gun” document during legal discovery? Or for the latest treatment for a patient dying in the ER? Or for engineering literature that could reveal a fatal flaw in a part that will go into hundreds of airplanes? Hmmm, retrieving 0.7% of the relevant documents doesn’t look all that attractive.

For a document set of any size, it isn’t possible to know what percentage of the relevant documents your query returned. Yours might be the 48% query, but it could just as easily be the 0.7% query.

To make matters worse, the 0.7% query could be doing even worse than that: the figure assumes that those five queries together return *all* the relevant documents.
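In information retrieval terms, the share of relevant documents a query returns is its recall: relevant documents retrieved divided by all relevant documents. A minimal sketch, assuming (as the figures above implicitly do) that every hit is a distinct relevant document; the doubled pool in the second call is purely hypothetical, standing in for relevant documents that none of the five phrasings found:

```python
def recall(relevant_retrieved, total_relevant):
    """Fraction of all relevant documents that a query actually returned."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    return relevant_retrieved / total_relevant

# Hit counts from the post, each hit treated as a relevant document.
q_whenever_p = 44_200
five_phrasings = 44_200 + 385_000 + 506_000 + 2_189_000 + 2_920_000  # 6,044,200

# If the five phrasings found every relevant document, recall is about 0.7%.
print(f"{recall(q_whenever_p, five_phrasings):.2%}")

# Hypothetical: unseen phrasings hold as many relevant documents again,
# so the true pool is twice as large and the same query's recall halves.
print(f"{recall(q_whenever_p, 2 * five_phrasings):.2%}")
```

This prints 0.73% and then 0.37%. Because the true denominator is never known, any recall computed from the documents you happened to find is only an upper bound.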

The problem is that different users identify the same subject in different ways, or use the same identification for different subjects. The more users producing documents that need to be searched, the worse matters get.

Available options include:

  1. Create new identifiers and ignore previous ones
  2. Create new identifiers and map previous ones
  3. Map identifiers people already use

This blog will explore all three options and why I prefer the last one.
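As a rough illustration of the third option, here is a minimal query-expansion sketch that maps identifiers people already use to a single subject. The subject name `implication`, the `SUBJECT_IDENTIFIERS` table, the helper functions and the sample documents are all hypothetical, and plain substring matching stands in for a real search engine; this is not a topic map implementation:

```python
# Hypothetical mapping: one subject and the identifiers people already use for it.
SUBJECT_IDENTIFIERS = {
    "implication": [
        "q whenever p",
        "p is sufficient for q",
        "p implies q",
        "if p, then q",
        "q if p",
    ],
}

def expand_query(phrase, mapping):
    """Return every known identifier for the subject that `phrase` names.

    Uses the first subject whose identifiers include `phrase`; an unmapped
    phrase is returned on its own.
    """
    for identifiers in mapping.values():
        if phrase in identifiers:
            return list(identifiers)
    return [phrase]

def search(documents, phrases):
    """Return the documents containing any of the phrases (substring match)."""
    return [doc for doc in documents if any(p in doc for p in phrases)]

docs = [
    "Note that q whenever p.",
    "Clearly p implies q here.",
    "We see that q if p holds.",
    "Unrelated text.",
]

print(search(docs, ["q whenever p"]))                                   # literal query
print(search(docs, expand_query("q whenever p", SUBJECT_IDENTIFIERS)))  # mapped query
```

The literal query finds only the first document; the mapped query finds all three documents that express implication, without asking anyone to adopt a new identifier. The sketch handles only the first half of the problem: an identifier shared by several subjects would need disambiguation, which this lookup does not attempt.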
