Archive for the ‘Precision’ Category

How accurate can manual review be?

Friday, December 23rd, 2011

How accurate can manual review be?

From the post:

One of the chief pleasures for me of this year’s SIGIR in Beijing was attending the SIGIR 2011 Information Retrieval for E-Discovery Workshop (SIRE 2011). The smaller and more selective the workshop, it often seems, the more focused and interesting the discussion.

My own contribution was “Re-examining the Effectiveness of Manual Review”. The paper was inspired by an article from Maura Grossman and Gord Cormack, whose message is neatly summed up in its title: “Technology-assisted review in e-discovery can be more effective and more efficient than exhaustive manual review”.

Fascinating work!

Does this give you pause about automated topic map authoring? Why/why not?

Recall vs. Precision

Tuesday, November 15th, 2011

Recall vs. Precision by Gene Golovchinsky.

From the post:

Stephen Robertson’s talk at the CIKM 2011 Industry event caused me to think about recall and precision again. Over the last decade precision-oriented searches have become synonymous with web searches, while recall has been relegated to narrow verticals. But is precision@5 or NDCG@1 really the right way to measure the effectiveness of interactive search? If you’re doing a known-item search, looking up a common factoid, etc., then perhaps it is. But for most searches, even ones that might be classified as precision-oriented ones, the searcher might wind up with several attempts to get at the answer. Dan Russell’s A Google a Day lists exactly those kinds of challenges: find a fact that’s hard to find.

So how should we think about evaluating the kinds of searches that take more than one query, ones we might term session-based searches?
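For readers who have not met the two single-query measures named in the quote, here is a minimal sketch of precision@k and NDCG@k. It is only an illustration: the function names and the sample gains are made up, the gain is used linearly (some formulations use 2^gain - 1), and the ideal ranking is computed from the judged results in the list itself rather than from a full set of judgments.

```python
import math

def precision_at_k(gains, k):
    """Fraction of the top-k results that are relevant (gain > 0)."""
    return sum(1 for g in gains[:k] if g > 0) / k

def dcg_at_k(gains, k):
    """Discounted cumulative gain: gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the best possible ordering of the same gains."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded judgments for one ranked result list (0 = not relevant).
ranked_gains = [0, 2, 1, 0, 1]
p5 = precision_at_k(ranked_gains, 5)
ndcg1 = ndcg_at_k(ranked_gains, 1)
```

Both numbers describe a single result list, which is why neither says much about a search that takes several queries to finish.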

Read the post and the comments more than once!

Then think about how you would answer the questions raised, in or out of a topic map context.

Much food for thought here.

Stephen Robertson on Why Recall Matters

Monday, November 14th, 2011

Stephen Robertson on Why Recall Matters, November 14th, 2011, by Daniel Tunkelang.

Daniel has the slides and an extensive summary of the presentation. Just to give you a taste of what awaits at Daniel’s post:

Stephen started by reminding us of ancient times (i.e., before the web), when at least some IR researchers thought in terms of set retrieval rather than ranked retrieval. He reminded us of the precision and recall “devices” that he’d described in his Salton Award Lecture, an idea he attributed to the late Cranfield pioneer Cyril Cleverdon. He noted that, while set retrieval uses distinct precision and recall devices, ranking conflates both into a decision of where to truncate a ranked result list. He also pointed out an interesting asymmetry in the conventional notion of precision-recall tradeoff: while returning more results can only increase recall, there is no certainty that the additional results will decrease precision. Rather, this decrease is a hypothesis that we associate with systems designed to implement the probability ranking principle, returning results in decreasing order of probability of relevance.
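The asymmetry described in the quote is easy to see with a toy ranking. The sketch below is my own illustration, not anything from the talk: the function name and the four-item ranking are invented, and it assumes the total number of relevant documents is known.

```python
def precision_recall_at_cutoffs(ranked_relevant, total_relevant):
    """Return (k, precision, recall) for every truncation point k of a ranked list."""
    rows = []
    hits = 0
    for k, is_relevant in enumerate(ranked_relevant, start=1):
        hits += int(is_relevant)
        rows.append((k, hits / k, hits / total_relevant))
    return rows

# Hypothetical ranking that is NOT in decreasing order of relevance:
# the top result is a miss, the next two are hits, the last is a miss.
ranking = [False, True, True, False]
rows = precision_recall_at_cutoffs(ranking, total_relevant=2)

for k, precision, recall in rows:
    print(f"k={k}  precision={precision:.2f}  recall={recall:.2f}")
```

Recall never goes down as the cutoff grows, while precision here climbs for the first three cutoffs and only then falls. A ranking that did follow the probability ranking principle would put the hits first, and precision would fall as expected.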

Interested? There’s more where that came from; see the link to Daniel’s post above.

Is Precision the Enemy of Serendipity?

Wednesday, September 28th, 2011

I was reading claims of increased precision by software X the other day. I have probably mentioned this before (and it wasn’t original, then or now), but precision seems to me to be the enemy of serendipity.

For example, when I was an undergraduate, the library would display all the recent issues of journals on long angled shelves. So it was possible to walk along looking at the new issues in a variety of areas with ease. As a political science major I could have gone directly to journals on political science. But I would have missed the Review of Metaphysics and/or the Journal of the History of Ideas, both of which are rich sources of ideas relevant to topic maps (and information systems more generally).

But precision about the information available, such as a departmental page that links only to electronic versions of journals relevant to the “discipline,” reduces the opportunity to recognize relevant literature outside the confines of a discipline.

True, I still browse a lot; otherwise I would not notice titles like: k-means Approach to the Karhunen-Loève Transform (aka PCA – Principal Component Analysis). I knew that k-means was a form of clustering that could help with gathering members of collective topics together, but quite honestly I did not recognize the Karhunen-Loève Transform. I know it as PCA (Principal Component Analysis), which I inserted in my blog title to help others recognize the technique.

Of course the problem is that sometimes I really want precision: perhaps I am rushed to finish a job or need to find a reference for a standard, etc. In those cases I don’t have time to wade through a lot of search results and appreciate whatever (little) precision I can wring out of a search engine.

Whether I want more precision or more serendipity varies from day to day for me. How about you?

Evaluating Recommender Systems…

Friday, May 6th, 2011

Evaluating Recommender Systems – Explaining F-Score, Recall and Precision using Real Data Set from Apontador

Marcel Caraciolo says:

In this post I will introduce three metrics widely used for evaluating the utility of recommendations produced by a recommender system: Precision, Recall and F-1 Score. The F-1 Score is slightly different from the other ones, since it is a measure of a test’s accuracy and considers both the precision and the recall of the test to compute the final score.

Recommender systems are quite common and you are likely to encounter them while deploying topic maps. (Or you may wish to build one as part of a topic map system.)
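If the three metrics are new to you, a rough sketch of how they are usually computed for a single user may help. This is not Marcel's code and does not use the Apontador data; the function name and the item identifiers are placeholders, and it treats recommendations as an unordered set.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall and F1 for one user's recommendations."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # recommended items the user actually liked
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Placeholder item ids: four recommendations, three items known to be relevant.
precision, recall, f1 = precision_recall_f1(
    recommended=["a", "b", "c", "d"],
    relevant=["b", "d", "e"],
)
```

Here two of the four recommendations are relevant (precision 1/2) and two of the three relevant items were recommended (recall 2/3), which gives an F1 of 4/7.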

Precision versus Recall

Monday, May 17th, 2010

High precision means resources are missed.

High recall means sifting garbage.

Q: Based on what assumption?

A: No assumption, observed behavior of texts and search engines.

Q: Based on what texts?

A: All texts, yes, all texts.

Q: Texts where the same subjects have different words/phrases and the same words/phrases mean different subjects?

A: Yes, those texts!

Q: If the subjects were identified in those texts, we could have high precision and high recall?

A: Yes, but not possible, too many texts!

Q: If the authors of new texts were to identify….

A: Sorry, no time, have to search now. Good-bye!