Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 29, 2014

Applying linked data approaches to pharmacology:…

Applying linked data approaches to pharmacology: Architectural decisions and implementation by Alasdair J.G. Gray, et al.

Abstract:

The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications. The functionality offered by the platform has been drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project, a collaboration of research institutions and major pharmaceutical companies. We describe the architecture of the platform focusing on seven design decisions that drove its development with the aim of informing others developing similar software in this or other domains. The utility of the platform is demonstrated by the variety of drug discovery applications being built to access the integrated data.

An alpha version of the OPS platform is currently available to the Open PHACTS consortium and a first public release will be made in late 2012, see http://www.openphacts.org/ for details.

The paper acknowledges that present database entries lack semantics.

A further challenge is the lack of semantics associated with links in traditional database entries. For example, the entry in UniProt for the protein “kinase C alpha type homo sapien” contains a link to the Enzyme database record, which has complementary data about the same protein and thus the identifiers can be considered as being equivalent. One approach to resolve this, proposed by Identifiers.org, is to provide a URI for the concept which contains links to the database records about the concept [27]. However, the UniProt entry also contains a link to the DrugBank compound “Phosphatidylserine”. Clearly, these concepts are not identical as one is a protein and the other a chemical compound. The link in this case is representative of some interaction between the compound and the protein, but this is left to a human to interpret. Thus, for successful data integration one must devise strategies that address such inconsistencies within the existing data.

I would have said databases lack properties to identify the subjects in question but there is little difference in the outcome of our respective positions, i.e., we need more semantics to make robust use of existing data.

Perhaps even more importantly, the paper treats “equality” as context dependent:

Equality is context dependent

Datasets often provide links to equivalent concepts in other datasets. These result in a profusion of “equivalent” identifiers for a concept. Identifiers.org provide a single identifier that links to all the underlying equivalent dataset records for a concept. However, this constrains the system to a single view of the data, albeit an important one.

A novel approach to instance level links between the datasets is used in the OPS platform. Scientists care about the types of links between entities: different scientists will accept concepts being linked in different ways and for different tasks they are willing to accept different forms of relationships. For example, when trying to find the targets that a particular compound interacts with, some data sources may have created mappings to gene rather than protein identifiers: in such instances it may be acceptable to users to treat gene and protein IDs as being in some sense equivalent. However, in other situations this may not be acceptable and the OPS platform needs to allow for this dynamic equivalence within a scientific context. As a consequence, rather than hard coding the links into the datasets, the OPS platform defers the instance level links to be resolved during query execution by the Identity Mapping Service (IMS). Thus, by changing the set of dataset links used to execute the query, different interpretations over the data can be provided.
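The deferred-link idea in the passage above can be sketched roughly as follows. This is a toy illustration only, not the OPS Identity Mapping Service: the `LINK_SETS` table, the `expand_identifier` function, and all of the identifiers are hypothetical. The point is simply that the set of "equivalent" identifiers depends on which link sets the user activates at query time.

```python
from collections import defaultdict

# Hypothetical link sets: each names one kind of relationship and lists the
# identifier pairs it connects. All identifiers are made up for illustration.
LINK_SETS = {
    "same_protein": [("uniprot:P1", "enzyme:E1")],
    "gene_encodes_protein": [("gene:G1", "uniprot:P1")],
}


def expand_identifier(identifier, active_link_sets):
    """Return every identifier reachable from `identifier` using only the
    link sets the user has chosen to treat as equivalence for this query."""
    # Build an undirected adjacency map from the active link sets only.
    neighbours = defaultdict(set)
    for name in active_link_sets:
        for left, right in LINK_SETS[name]:
            neighbours[left].add(right)
            neighbours[right].add(left)

    # Walk the links transitively, starting from the given identifier.
    seen = {identifier}
    frontier = [identifier]
    while frontier:
        current = frontier.pop()
        for other in neighbours[current] - seen:
            seen.add(other)
            frontier.append(other)
    return seen


# A strict view: only identifiers for the same protein count as equivalent.
strict = expand_identifier("uniprot:P1", ["same_protein"])

# A relaxed view: gene and protein identifiers are also treated as equivalent.
relaxed = expand_identifier(
    "uniprot:P1", ["same_protein", "gene_encodes_protein"]
)
```

Nothing in the underlying data changes between the two calls; only the choice of link sets does, which is the sense in which the links are resolved at query time rather than hard coded.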

Opaque mappings between datasets, i.e., mappings that don’t assign properties to source and target and then say what properties or conditions must be met for the mapping to be valid, are of little use. Rely on opaque mappings at your own risk.
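As a rough sketch of what a non-opaque mapping might look like, the example below attaches properties to both ends of a mapping and states which of them must agree for the mapping to hold. The `Mapping` class, the property names (`kind`, `organism`), and the values are all my own assumptions for illustration; they come from neither the paper nor any Open PHACTS software.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """A mapping that carries its own justification (hypothetical design)."""
    source: dict          # properties identifying the source subject
    target: dict          # properties identifying the target subject
    required_match: list  # property names that must agree on both sides

    def is_valid(self):
        # The mapping holds only if every required property is present on
        # both sides and has the same value on both sides.
        return all(
            key in self.source
            and key in self.target
            and self.source[key] == self.target[key]
            for key in self.required_match
        )


# Illustrative records; the property values are assumptions, not real data.
protein = {"kind": "protein", "organism": "Homo sapiens"}
enzyme = {"kind": "protein", "organism": "Homo sapiens"}
compound = {"kind": "chemical compound"}

protein_to_enzyme = Mapping(protein, enzyme, ["kind", "organism"])
protein_to_compound = Mapping(protein, compound, ["kind"])

print(protein_to_enzyme.is_valid())
print(protein_to_compound.is_valid())
```

The first mapping passes because both stated properties agree; the second fails because a protein and a chemical compound differ in kind. A user who disagrees with the criteria can at least see them and substitute a different `required_match` list.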

On the other hand, I fully agree that equality is context dependent and the choice of the criteria for equivalence should be left up to users. I suppose in that sense if users wanted to rely on opaque mappings, that would be their choice.

While this is an exciting paper, it discusses architectural decisions, so we are not yet at the point of debating these issues in detail. It promises to be an exciting discussion!

April 7, 2013

Open PHACTS

Open PHACTS – Open Pharmacological Space

From the homepage:

Open PHACTS is building an Open Pharmacological Space in a 3-year knowledge management project of the Innovative Medicines Initiative (IMI), a unique partnership between the European Community and the European Federation of Pharmaceutical Industries and Associations (EFPIA).

The project is due to end in March 2014, and aims to deliver a sustainable service to continue after the project funding ends. The project consortium consists of leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements: 28 partners, including 9 pharmaceutical companies and 3 biotechs.

Source code has just appeared on GitHub: OpenPHACTS.

Important to different communities for different reasons. My interest isn’t the same as BigPharma. 😉

A project to watch as they navigate the thickets of vocabularies, ontologies and other semantically diverse information sources.

July 26, 2012

Mining the pharmacogenomics literature—a survey of the state of the art

Filed under: Bioinformatics,Genome,Pharmaceutical Research,Text Mining — Patrick Durusau @ 1:23 pm

Mining the pharmacogenomics literature—a survey of the state of the art by Udo Hahn, K. Bretonnel Cohen, and Yael Garten. (Brief Bioinform (2012) 13 (4): 460-494. doi: 10.1093/bib/bbs018)

Abstract:

This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.

At thirty-six (36) pages and well over 200 references, this is going to take a while to digest.

Some questions to be thinking about while reading:

How are the entity recognition issues the same or different?

What techniques have you seen before? How are they the same or different?

What other techniques would you suggest?

December 27, 2010

Network Science – NetSci

Filed under: Bioinformatics,Cheminformatics,Drug Discovery,Pharmaceutical Research — Patrick Durusau @ 2:20 pm

Warning: NetSci has serious issues with broken links.

Network Science – NetSci: An Extensive Set of Resources for Science in Drug Discovery

From the website:

Welcome to the Network Science website. This site is dedicated to the topics of pharmaceutical research and the use of advanced techniques in the discovery of new therapeutic agents. We endeavor to provide a comprehensive look at the industry and the tools that are in use to speed drug discovery and development.

I stumbled across this website while looking for computational chemistry resources.

Pharmaceutical research is rich in topic map type issues, from mapping across the latest reported findings in journal literature to matching those identifications to results in computational software.

Questions:

  1. Develop a drug discovery account that illustrates how topic maps might or might not help in that process. (5-7 pages, citations)
  2. What benefits would a topic map bring to drug discovery and how would you illustrate those benefits for a grant application either to a pharmaceutical company or granting agency? (3-5 pages, citations)
  3. Where would you submit a grant application based on #2? (3-5 pages, citations) (Requires researching what activities in drug development are funded by particular entities.)
  4. Prepare a grant application based on the answer to #3. (length depends on grantor requirements)
  5. For extra credit, update and/or correct twenty (20) links from this site. (Check with me first, I will maintain a list of those already corrected.)
