Mining information across multiple domains: A case study of application to patent laws and regulations in biotechnology by Hang Yu, Siddharth Taduri, Jay Kesan, Gloria Lau and Kincho H. Law.
Abstract:
In this paper, we present a framework that can process a user query for retrieval of information from documents of different properties across multiple domains, with specific application to patent laws and regulations. The framework has three basic components. The first component is ontology mapping and generation. In this step, the keywords entered by users are mapped into a subset of relevant keywords by looking them up in an ontology database. The second component is the joint and cross search in various document domains; in our case, they are patents and scientific publications. The last component is to modify the search results by applying user feedback statistics. The results of feedback will be saved as metadata for future use.
A case example is given to demonstrate how results from multiple domain searches can be combined using ontology and cross referencing. We use an example of well-known biotechnology patents on erythropoietin (EPO) and give a detailed analysis of each document domain with this keyword. Relationships between the domains are demonstrated.
A user feedback mechanism is also discussed in this paper. The ability to take user feedback into the framework is important. There is no doubt that domain knowledge from expert or experienced users could be a very good complement to the proposed system. Both direct and indirect user feedback are discussed.
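To make the first two components in the abstract a little more concrete, here is a minimal sketch in Python of keyword expansion followed by a search across two document domains. Everything in it is an assumption for illustration: the toy `ONTOLOGY` dictionary, the tiny `DOMAINS` collections, and the functions `expand_query` and `cross_search` are hypothetical and are not taken from the paper, which may work quite differently.

```python
# Hypothetical toy ontology: keyword -> related terms (illustrative only).
ONTOLOGY = {
    "epo": {"erythropoietin", "epoetin"},
}

# Hypothetical toy collections, one per document domain.
DOMAINS = {
    "patents": {
        "P1": "method for producing recombinant erythropoietin",
        "P2": "improved valve for a bicycle pump",
    },
    "publications": {
        "A1": "clinical use of epoetin in anemia",
        "A2": "graph algorithms for route planning",
    },
}


def expand_query(keywords):
    """Return the user keywords plus any related terms found in the ontology."""
    expanded = set()
    for word in keywords:
        key = word.lower()
        expanded.add(key)
        expanded |= ONTOLOGY.get(key, set())
    return expanded


def cross_search(keywords):
    """Run the expanded query against every domain and list the hits per domain."""
    terms = expand_query(keywords)
    hits = {}
    for domain, docs in DOMAINS.items():
        hits[domain] = sorted(
            doc_id
            for doc_id, text in docs.items()
            if terms & set(text.lower().split())  # any shared word counts as a hit
        )
    return hits


results = cross_search(["EPO"])
print(results)
```

Neither toy document contains the literal string "EPO", so any hits come only from the ontology lookup, which is the point of the first component.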
The full text of this article is available now so I suggest that you grab a copy. Apparently some content from the journal is freely available but older material is not.
This is a *must read* article.
I particularly liked the use of statistical user feedback to drive the feedback process. It is not as exact as having experts curate every mention, but it is a lot less expensive.
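As a rough illustration of what feedback statistics stored as metadata could look like, here is a small Python sketch that re-ranks results by the share of positive user judgments. The scoring rule (a Laplace-smoothed ratio) is my own assumption, not the statistic used in the paper, and the names `record_feedback`, `feedback_score`, and `rerank` are hypothetical.

```python
from collections import defaultdict

# Feedback metadata per document: [times judged relevant, times judged at all].
feedback = defaultdict(lambda: [0, 0])


def record_feedback(doc_id, relevant):
    """Store one user judgment so that later searches can use it."""
    counts = feedback[doc_id]
    counts[1] += 1
    if relevant:
        counts[0] += 1


def feedback_score(doc_id):
    """Laplace-smoothed share of positive judgments; 0.5 when there is no feedback."""
    positive, judged = feedback.get(doc_id, (0, 0))
    return (positive + 1) / (judged + 2)


def rerank(doc_ids):
    """Order documents by feedback score, highest first; ties keep their input order."""
    return sorted(doc_ids, key=feedback_score, reverse=True)


record_feedback("P1", True)
record_feedback("P1", True)
record_feedback("P2", False)
ranked = rerank(["P2", "P3", "P1"])
print(ranked)
```

The smoothing keeps a document with a single bad judgment from dropping to zero, and a document nobody has judged yet sits in the middle rather than at either extreme.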
So, do all the NLP, statistics, probability, data mining, etc., posts seem a bit more relevant to topic maps now?
No one method or approach is going to produce as good a result as taking the strong parts from a number of approaches and being willing to consider both additions to and deletions from your method matrix.