I just finished reading a delightful paper by W. J. Hutchins, ‘The concept of “aboutness” in subject indexing,’ which was presented at a Colloquium on Aboutness held by the Co-ordinate Indexing Group on 18 April 1977, and reprinted in Readings in Information Retrieval, edited by Karen Sparck Jones and Peter Willett, Morgan Kaufmann Publishers, Inc., San Francisco, California, 1997.
I discovered the paper in a hard copy of Readings in Information Retrieval, but it is also available online as ‘The concept of “aboutness” in subject indexing.’
Hutchins writes in his abstract:
The common view of the ‘aboutness’ of documents is that the index entries (or classifications) assigned to documents represent or indicate in some way the total contents of documents; indexing and classifying are seen as processes involving the ‘summarization’ of the texts of documents. In this paper an alternative concept of ‘aboutness’ is proposed based on an analysis of the linguistic organization of texts, which is felt to be more appropriate in many indexing environments (particularly in non-specialized libraries and information services) and which has implications for the evaluation of the effectiveness of indexing systems.
You can read the details of how he suggests discovering the “aboutness” of documents, but I was struck by his observation that the ‘summarization’ practice furthers the end of exhaustive search. Under ‘Objectives of indexing,’ Hutchins says:
In the context of the special library and similarly specialized information services, the ‘summarization’ approach to subject indexing is most appropriate. Indexers are generally able to define clearly the interests and levels of knowledge of the readers they are serving; they are thus able to produce ‘summaries’ biased in the most helpful directions for their readers. More importantly, indexers can normally assume that most users are already very knowledgeable on most of the topics they look for in the indexes provided. They can assume that the usual search is for references to all documents treating a particular topic, since any one may have something ‘new’ to say about it that the reader did not know before. The fact that some references will lead users to texts which tell them nothing they did not previously know should not normally worry them unduly—it is the penalty they expect to pay for the assurance that the search has been as exhaustive as feasible.
Exhaustive search is one type of search that drives tests for the success of indexing:
The now traditional parameters of ‘recall’, ‘precision’ and ‘fallout’ are clearly valid for systems in which success is measured in terms of the ability to retrieve all documents which have something to say on a particular topic—that is to say, in systems based on the ‘summarization’ approach.*
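For readers who have not met these parameters before, here is a minimal sketch of how recall, precision, and fallout are computed; the document IDs and relevance judgments are invented for illustration:

```python
# Toy illustration of recall, precision, and fallout.
# The collection, relevance judgments, and retrieved set are hypothetical.
collection = set(range(1, 101))            # 100 documents in the collection
relevant   = {3, 7, 19, 42, 56, 77, 90}    # documents that treat the topic
retrieved  = {3, 7, 19, 42, 61, 88}        # documents the index returned

hits = retrieved & relevant

# recall: fraction of all relevant documents that were retrieved
recall = len(hits) / len(relevant)
# precision: fraction of retrieved documents that are relevant
precision = len(hits) / len(retrieved)
# fallout: fraction of non-relevant documents that were retrieved
fallout = len(retrieved - relevant) / len(collection - relevant)

print(f"recall={recall:.2f} precision={precision:.2f} fallout={fallout:.3f}")
```

An exhaustive search, in Hutchins' sense, is one judged almost entirely on recall: missing a relevant document is the failure the user will not tolerate, while the low precision it implies is "the penalty they expect to pay."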
You could say that full-text indexing/searching is different from ‘summarization’ by a professional indexer, but is it? Or have we simply substituted non-professional indexers into the process?
With ‘summarization,’ a professional indexer chooses terms that represent the content of a document. With full-text searching, the terms chosen on an ad hoc basis by a user come to represent a ‘summary’ of entire documents. In both cases, all the documents so summarized are returned to the user; in other words, the search is exhaustive.
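The parallel can be made concrete with a toy inverted index, the core data structure behind full-text search (the documents and texts below are invented):

```python
from collections import defaultdict

# Minimal inverted index over a tiny invented collection.
docs = {
    1: "indexing and aboutness in subject indexing",
    2: "aboutness of documents in special libraries",
    3: "precision and recall in retrieval evaluation",
}

# Map each term to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# An ad hoc query term acts as a one-word 'summary': the search returns
# every document in which the term occurs at all, i.e. it is exhaustive.
print(sorted(index["aboutness"]))  # → [1, 2]
```

Whether the term occurs centrally or incidentally in a document makes no difference to the result, which is exactly the property Hutchins attributes to 'summarization'-based indexing.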
Google/Bing/Yahoo! searches are examples of exhaustive searches of little value. I can find two or three thousand (2000-3000) new pages of material relevant to topic map issues every day. Can you say information overload?
Or is that information volume overload? Of the two or three thousand (2000-3000) pages per day, probably only fifty to one hundred (50-100) are worth my attention. That is what “old-style” indexing brought to the professional researcher.