Archive for the ‘Literature-based Discovery’ Category

Literature-Related Discovery (LRD)

Wednesday, September 14th, 2011

Literature-Related Discovery (LRD) by Kostoff, Ronald N. ; Block, Joel A. ; Solka, Jeffrey L. ; Briggs, Michael B. ; Rushenberg, Robert L. ; Stump, Jesse A. ; Johnson, Dustin ; Lyons, Terence J. ; Wyatt, Jeffrey R.

Short Abstract:

Discovery in science is the generation of novel, interesting, plausible, and intelligible knowledge about the objects of study. Literature-related discovery (LRD) is the linking of two or more literature concepts that have heretofore not been linked (i.e., disjoint), in order to produce novel, interesting, plausible, and intelligible knowledge (i.e., potential discovery).

From the longer abstract in the monograph:

LRD offers the promise of large amounts of potential discovery, for the following reasons:

  • the burgeoning technical literature contains a very large pool of technical concepts in myriad technical areas;
  • researchers spend full time trying to cover the literature in their own research fields and are relatively unfamiliar with research in other, especially disparate, fields of research;
  • the large number of technical concepts (and disparate technical concepts) means that many combinations of especially disparate technical concepts exist;
  • by the laws of probability, some of these combinations will produce novel, interesting, plausible, and intelligible knowledge about the objects of study.

This monograph presents the LRD methodology and voluminous discovery results from five problem areas: four medical (treatments for Parkinson’s Disease (PD), Multiple Sclerosis (MS), Raynaud’s Phenomenon (RP), and Cataracts) and one non-medical (Water Purification (WP)). In particular, the ODS aspect of LRD is addressed, rather than the CDS aspect. In the presentation of potential discovery, a ‘vetting’ process is used that ensures both requirements for ODS LBD are met: concepts are linked that have not been linked previously, and novel, interesting, plausible, and intelligible knowledge is produced.

The potential discoveries for the PD, MS, Cataracts, and WP problems are the first we have seen reported by this ODS LBD approach, and the numbers of potential discoveries for the ODS LBD benchmark RP problem are almost two orders of magnitude greater than those reported in the open literature by any other ODS LBD researcher who has addressed this benchmark RP problem. The WP problem is the first non-medical technical topic to have been addressed successfully by ODS LBD.

(ODS = open discovery system)
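
The "linking of concepts that have not been linked previously" can be illustrated with a very small open-discovery sketch. This is not the monograph's methodology: it is a minimal toy in the spirit of the classic A-B-C pattern, assuming each document has already been reduced to a set of concept labels. The function names (`cooccurrence`, `open_discovery`) and the toy documents are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(docs):
    """Map each concept to the set of concepts it shares a document with."""
    neighbors = defaultdict(set)
    for concepts in docs:
        for a, b in combinations(concepts, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def open_discovery(docs, start):
    """Return concepts never linked to `start` directly, keyed to the
    intermediate concepts that connect them to `start`."""
    neighbors = cooccurrence(docs)
    linked = neighbors[start]          # concepts already linked to start
    candidates = defaultdict(set)
    for bridge in linked:
        for target in neighbors[bridge]:
            # Keep only disjoint concepts: not start, not already linked.
            if target != start and target not in linked:
                candidates[target].add(bridge)
    return dict(candidates)


# Toy documents (illustrative only, not real literature data).
docs = [
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"cataract", "oxidative stress"},
]

result = open_discovery(docs, "raynaud")
for target, bridges in result.items():
    print(target, "via", sorted(bridges))
```

A candidate found this way is only a potential discovery. The vetting step described in the abstract (checking that the link is genuinely new and that the resulting knowledge is plausible) still has to be done by a person or a separate process.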

If you are looking for validation with supporting data for the literature-related discovery method, seek no further. The text plus annexes runs 884 pages.

This is a technique that fits quite well with topic maps.

PS: Yes, I know, this monograph says “literature-related discovery” (5.8 million “hits” in a popular search engine) versus “literature-based discovery” (6.3 million “hits” in the same search engine), another name for the same technique. Sigh, even semantic integration is afflicted with semantic integration woes.

Recent Advances in Literature Based Discovery

Wednesday, August 17th, 2011

Recent Advances in Literature Based Discovery


Literature Based Discovery (LBD) is a process that searches for hidden and important connections among information embedded in published literature. Employing techniques from Information Retrieval and Natural Language Processing, LBD has potential for widespread application yet is currently implemented primarily in the medical domain. This article examines several published LBD systems, comparing their descriptions of domain and input data, techniques to locate important concepts from text, models of discovery, experimental results, visualizations, and evaluation of the results. Since there is no comprehensive “gold standard,” or consistent formal evaluation methodology for LBD systems, the development and usage of effective metrics for such systems is also discussed, providing several options. Also, since LBD is currently often time-intensive, requiring human input at one or more points, a fully-automated system will enhance the efficiency of the process. Therefore, this article considers methods for automated systems based on data mining.

Not “recent” any longer, since the paper dates from 2006, but it is a good overview of Literature Based Discovery (LBD) as of that time.

Reflective Random Indexing and indirect inference…

Tuesday, August 16th, 2011

Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections by Trevor Cohen, Roger Schvaneveldt, Dominic Widdows.


The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously as methods for the discovery of such implicit connections. However, LSA in particular is dependent on a computationally demanding method of dimension reduction as a means to obtain meaningful indirect inference, limiting its ability to scale to large text corpora. In this paper, we evaluate the ability of Random Indexing (RI), a scalable distributional model of word associations, to draw meaningful implicit relationships between terms in general and biomedical language. Proponents of this method have achieved comparable performance to LSA on several cognitive tasks while using a simpler and less computationally demanding method of dimension reduction than LSA employs. In this paper, we demonstrate that the original implementation of RI is ineffective at inferring meaningful indirect connections, and evaluate Reflective Random Indexing (RRI), an iterative variant of the method that is better able to perform indirect inference. RRI is shown to lead to more clearly related indirect connections and to outperform existing RI implementations in the prediction of future direct co-occurrence in the MEDLINE corpus.
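
To make the contrast in the abstract concrete, here is a rough sketch of document-based Random Indexing followed by one reflective cycle. It is a simplified illustration of the general idea, not the authors' implementation: there is no term weighting or stop list, the dimensionality (`DIM`), sparsity (`NONZERO`), helper names, and toy documents are all assumptions chosen for illustration, and only a single reflective cycle is run.

```python
import math
import random

DIM = 1000      # vector dimensionality (illustrative choice)
NONZERO = 10    # number of +1/-1 entries per index vector (illustrative choice)


def index_vector(rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * DIM
    for i, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def unit(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def superpose(vectors):
    """Sum vectors element-wise, then scale the sum to unit length."""
    return unit([sum(column) for column in zip(*vectors)])


def cosine(u, v):
    """Dot product of two unit-length vectors, i.e. their cosine."""
    return sum(a * b for a, b in zip(u, v))


def term_vectors(docs, doc_vecs):
    """A term's vector superposes the vectors of the documents containing it."""
    terms = sorted(set().union(*docs))
    return {
        t: superpose([doc_vecs[i] for i, d in enumerate(docs) if t in d])
        for t in terms
    }


def reflect(docs, term_vecs):
    """One reflective cycle: documents from terms, then terms from documents."""
    doc_vecs = [superpose([term_vecs[t] for t in d]) for d in docs]
    return term_vectors(docs, doc_vecs)


# Toy corpus (illustrative only): "raynaud" and "fish oil" never co-occur.
docs = [
    {"raynaud", "blood viscosity"},
    {"blood viscosity", "fish oil"},
    {"cataract", "lens"},
]
rng = random.Random(0)
basic = term_vectors(docs, [unit(index_vector(rng)) for _ in docs])
reflective = reflect(docs, basic)

print("basic RI:  ", round(cosine(basic["raynaud"], basic["fish oil"]), 2))
print("reflective:", round(cosine(reflective["raynaud"], reflective["fish oil"]), 2))
```

In the basic model the two terms are built from different, nearly orthogonal random vectors, so their similarity stays near zero. After the reflective cycle both terms inherit part of the vector of the term they share a document with, and their similarity rises, while an unrelated term such as "cataract" stays far from both.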

The term “direct inference” is used for establishing a relationship between terms with a shared “bridging” term. That is, the terms don’t co-occur in a text but share a third term that occurs in both texts. “Indirect inference,” that is, finding terms with no shared “bridging” term, is the focus of this paper.

BTW, if you don’t have access to the Journal of Biomedical Informatics version, try the draft: Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections