Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 24, 2013

Improve search relevancy…

Filed under: Relevance,Searching,Solr — Patrick Durusau @ 4:05 pm

Improve search relevancy by telling Solr exactly what you want by Doug Turnbull.

From the post:

To be successful, (e)dismax relies on avoiding a tricky problem with its scoring strategy. As we’ve discussed, dismax scores documents by taking the maximum score of all the fields that match a query. This is problematic as one field’s scores can’t easily be related to another’s. A good “text” match might have a score of 2, while a bad “title” score might be 10. Dismax doesn’t have a notion that “10” is bad for title, it only knows 10 > 2, so title matches dominate the final search results.

The best case for dismax is that there’s only one field that matches a query, so the resulting scoring reflects the consistency within that field. In short, dismax thrives with needle-in-a-haystack problems and does poorly with hay-in-a-haystack problems.

We need a different strategy for documents that have fields with a large amount of overlap. We’re trying to tell the difference between very similar pieces of hay. The task is similar to needing to find a good candidate for a job. If we wanted to query a search index of job candidates for “Solr Java Developer”, we’ll clearly match many different sections of our candidates’ resumes. Because of problems with dismax, we may end up with search results heavily sorted on the “objective” field.

(…)
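
To make the quoted scoring problem concrete, here is a minimal Python sketch of how a disjunction-max combination behaves. It is not Solr's code. The function name, document names, and per-field scores are all hypothetical, with the scores chosen to match the "2" and "10" in the excerpt. The `tie` argument plays the role of the dismax tie-breaker: the best field score is kept, and the remaining field scores are added after scaling by `tie`.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores dismax-style.

    Returns the best field score plus `tie` times the sum of the
    other field scores. With tie=0.0 only the best field counts.
    """
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    others = sum(field_scores.values()) - best
    return best + tie * others


# Hypothetical scores: a good "text" match scores 2, a bad "title" match scores 10.
docs = {
    "good_text_match": {"text": 2.0},
    "bad_title_match": {"title": 10.0},
}

# Rank documents by their combined score, highest first.
ranked = sorted(docs, key=lambda name: dismax_score(docs[name]), reverse=True)
print(ranked)  # ['bad_title_match', 'good_text_match']
```

The combination only sees that 10 > 2. Nothing in it says that 10 is a poor score for a title field, so the weak title match ranks first.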

Doug's point is not unlike my comments yesterday about the similarity of searching and playing the lottery. The more you invest in the search, the more likely you are to get good results.

Doug analyzes what criteria data should meet in order to be a “good” result.

For a topic map, I would analyze what data a subject needs in order to be found by a typical request.

Both address the same problem, search, but from very different perspectives.
