It isn’t difficult to find indexing terms to represent documents.
But whatever indexing terms are used, a large portion of the relevant documents will go unfound. As much as 80% of them. See Size Really Does Matter… (a study of full-text searching, but the underlying problem is the same: “What term was used?”)
You read a document and become familiar with its author, its concepts, the literature it cites, the relationships of that literature to the document, and the relationships among the ideas in the document itself. Now you have to choose one or more terms to represent all of those semantics and semantic relationships. The exercise you are engaged in is compressing the semantics of a document into one or more terms.
Unlike data compression, a la Shannon, the semantic compression algorithm used by any given user is unknown. We do know it is lossy: it isn’t possible to decompress an indexing term and recover all the semantics of the document it purports to represent. Since a term is usually used to represent several documents, the problem is even worse: we would have to decompress the term to recover the semantics of every document it represents.
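To make the lossy, many-to-one character of indexing concrete, here is a minimal Python sketch. The documents, semantic features, and the rule that picks terms are all invented for illustration; no claim is made that any real indexer works this way.

```python
# Each document carries a rich set of semantic features: author, concepts,
# cited literature, relationships. All values here are hypothetical.
documents = {
    "doc1": {"author:Shannon", "concept:entropy", "cites:Hartley-1928",
             "relation:entropy-bounds-compression"},
    "doc2": {"author:Huffman", "concept:entropy", "concept:prefix-codes",
             "relation:codes-approach-entropy"},
}

# "Semantic compression": reduce each feature set to one or more index terms.
# This toy rule keeps only the concepts; the real rule varies by user and
# is unknown, which is the point of the paragraph above.
index = {}
for doc_id, features in documents.items():
    for feature in features:
        if feature.startswith("concept:"):
            term = feature.split(":", 1)[1]
            index.setdefault(term, set()).add(doc_id)

# "Decompressing" a term recovers only a set of document ids, not the
# authors, citations, or relationships that justified assigning the term.
print(index["entropy"])  # {'doc1', 'doc2'} (in some order)
```

One term now stands in for the semantics of two different documents, and nothing in the index lets you run the compression backwards.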
Even without knowing the algorithm used to assign indexing (or tagging) terms, investigation of semantic compression could be useful. For example: encode the semantics of a set of documents (to a set depth), then ask groups of users to assign indexing or tagging terms to those documents. By varying the semantics in the documents, it may, emphasis on may, be possible to experimentally derive partial semantic decompressions for some terms and some classes of users.
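As a sketch of what a partial semantic decompression might look like as data, suppose each experimental trial records the user’s group, the encoded features of the document shown, and the terms the user assigned. Counting which features co-occur with which terms, per group, gives a rough, frequency-based picture of what a term “decompresses” to for that class of users. Every record and name below is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical trials: (user_group, document_features, assigned_terms).
trials = [
    ("librarians", {"concept:entropy", "cites:Hartley-1928"},
     {"information-theory"}),
    ("librarians", {"concept:entropy", "concept:prefix-codes"},
     {"information-theory", "coding"}),
    ("students", {"concept:entropy", "cites:Hartley-1928"},
     {"math"}),
]

# For each (group, term) pair, count the features that were present when
# the term was assigned. Frequent features approximate a partial
# decompression of that term for that group of users.
decompression = defaultdict(Counter)
for group, features, terms in trials:
    for term in terms:
        decompression[(group, term)].update(features)

for (group, term), feature_counts in sorted(decompression.items()):
    print(group, term, feature_counts.most_common(2))
```

Varying the encoded semantics across trials would then show which features a term’s assignment actually tracks, and where the decompression differs between classes of users.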