Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 25, 2014

The Anatomy of a Large-Scale Hypertextual Web Search Engine (Ambiguity)

Filed under: Search Engines,WWW — Patrick Durusau @ 12:26 pm

If you search for “The Anatomy of a Large-Scale Hypertextual Web Search Engine” by Sergey Brin and Lawrence Page, will you get the “long” version or the “short” version?

The version found at: http://infolab.stanford.edu/pub/papers/google.pdf reports in its introduction:

(Note: There are two versions of this paper — a longer full version and a shorter printed version. The full version is available on the web and the conference CD-ROM.)

However, it doesn’t say whether it is the “longer full version” or the “shorter printed version.” Length, twenty (20) pages.

The version found at: http://snap.stanford.edu/class/cs224w-readings/Brin98Anatomy.pdf claims the following citation: “Computer Networks and ISDN Systems 30 (1998) 107-117.” Length, eleven (11) pages. It “looks” like a journal printed article.

Ironic that the search engine fails to distinguish between these two versions of such an important paper.

Perhaps the search confusion is justified to some degree because Lawrence Page’s publications at: http://research.google.com/pubs/LawrencePage.html reports:

Lawrence Page pub info

But if you access the PDF, you get the twenty (20) page version, not the eleven page version published at: Computer Networks and ISDN Systems 30 (1998) 107-117.

BTW, if you want to automatically distinguish the files, the file sizes on the two versions referenced above are: 123603 (the twenty (20) page version) and 1492735 (the eleven (11) page version). (The published version has the publisher logo, etc. that boosts the file size.)

If Google had a mechanism to accept explicit crowd input, that confusion and the typical confusion between slides and papers with the same name could be easily solved.

The first reader who finds either the paper or slides, types it as paper or slides. The characteristics of that file become the basis for distinguishing those files into paper or slides. When the next searcher is returned results including those files, they get a pointer to paper or slides?

If they don’t submit a change for paper or slides, that distinction becomes more certain.

I don’t know what the carrot would be for typing resources returned in search results, perhaps five (5) minutes of freedom from ads! 😉

Thoughts?

I first saw this in a tweet by onepaperperday.

No Comments

No comments yet.

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress