Archive for the ‘Drug Discovery’ Category

Computational drug repositioning through heterogeneous network clustering

Tuesday, November 11th, 2014

Computational drug repositioning through heterogeneous network clustering by Wu C, Gudivada RC, Aronow BJ, Jegga AG. (BMC Syst Biol. 2013;7 Suppl 5:S6. doi: 10.1186/1752-0509-7-S5-S6. Epub 2013 Dec 9.)



Given the costly and time consuming process and high attrition rates in drug discovery and development, drug repositioning or drug repurposing is considered as a viable strategy both to replenish the drying out drug pipelines and to surmount the innovation gap. Although there is a growing recognition that mechanistic relationships from molecular to systems level should be integrated into drug discovery paradigms, relatively few studies have integrated information about heterogeneous networks into computational drug-repositioning candidate discovery platforms.


Using known disease-gene and drug-target relationships from the KEGG database, we built a weighted disease and drug heterogeneous network. The nodes represent drugs or diseases while the edges represent shared gene, biological process, pathway, phenotype or a combination of these features. We clustered this weighted network to identify modules and then assembled all possible drug-disease pairs (putative drug repositioning candidates) from these modules. We validated our predictions by testing their robustness and evaluated them by their overlap with drug indications that were either reported in published literature or investigated in clinical trials.


Previous computational approaches for drug repositioning focused either on drug-drug and disease-disease similarity approaches whereas we have taken a more holistic approach by considering drug-disease relationships also. Further, we considered not only gene but also other features to build the disease drug networks. Despite the relative simplicity of our approach, based on the robustness analyses and the overlap of some of our predictions with drug indications that are under investigation, we believe our approach could complement the current computational approaches for drug repositioning candidate discovery.

A reminder that data clustering isn’t just of academic interest but is useful in highly remunerative fields as well. 😉

There is a vast literature on data clustering, but I don’t know whether anyone has assembled a collection of data clustering patterns.

That is, a work that summarizes where data clustering has been used, by domain, and the similarities on which the clustering was performed.

In this article, the clustering was described as:

The nodes represent drugs or diseases while the edges represent shared gene, biological process, pathway, phenotype or a combination of these features.

Has that been used elsewhere in medical research?

Not that clustering should be limited to prior patterns, but prior patterns could suggest new ones to apply.
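To make the quoted description concrete, here is a minimal sketch of the idea: nodes are drugs or diseases, edges are weighted by shared features, the network is split into modules, and every drug-disease pair inside a module becomes a candidate. The node names and features are invented, and the clustering step is a simple stand-in (connected components over a weight threshold), not the algorithm the authors used.

```python
from itertools import combinations

# Hypothetical toy annotations: node -> (kind, set of features).
NODES = {
    "drugA": ("drug", {"GENE1", "GENE2", "PATHWAY_X"}),
    "drugB": ("drug", {"GENE7", "PATHWAY_Z"}),
    "disease1": ("disease", {"GENE1", "GENE2", "PHENO_P"}),
    "disease2": ("disease", {"GENE2", "PATHWAY_X"}),
    "disease3": ("disease", {"GENE7", "PATHWAY_Z", "PHENO_Q"}),
}


def build_edges(nodes):
    """Weight each node pair by the number of features the two nodes share."""
    edges = {}
    for a, b in combinations(sorted(nodes), 2):
        shared = nodes[a][1] & nodes[b][1]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def modules(nodes, edges, min_weight=1):
    """Stand-in clustering: connected components over edges >= min_weight."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), weight in edges.items():
        if weight >= min_weight:
            parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return [g for g in groups.values() if len(g) > 1]


def candidate_pairs(nodes, mods):
    """Assemble every drug-disease pair that falls inside the same module."""
    pairs = set()
    for module in mods:
        drugs = [n for n in module if nodes[n][0] == "drug"]
        diseases = [n for n in module if nodes[n][0] == "disease"]
        pairs.update((d, s) for d in drugs for s in diseases)
    return sorted(pairs)


edges = build_edges(NODES)
mods = modules(NODES, edges)
pairs = candidate_pairs(NODES, mods)
print(pairs)
```

Raising `min_weight` breaks weak links and yields smaller modules, which is one crude way to trade the number of candidates against how much evidence each one shares.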


…[S]emantically enriched open pharmacological space…

Wednesday, July 16th, 2014

Scientific competency questions as the basis for semantically enriched open pharmacological space development by Kamal Azzaoui, et al. (Drug Discovery Today, Volume 18, Issues 17–18, September 2013, Pages 843–852)


Molecular information systems play an important part in modern data-driven drug discovery. They do not only support decision making but also enable new discoveries via association and inference. In this review, we outline the scientific requirements identified by the Innovative Medicines Initiative (IMI) Open PHACTS consortium for the design of an open pharmacological space (OPS) information system. The focus of this work is the integration of compound–target–pathway–disease/phenotype data for public and industrial drug discovery research. Typical scientific competency questions provided by the consortium members will be analyzed based on the underlying data concepts and associations needed to answer the questions. Publicly available data sources used to target these questions as well as the need for and potential of semantic web-based technology will be presented.

Pharmacology may not be your space but this is a good example of what it takes for semantic integration of resources in a complex area.

Despite the “…you too can be a brain surgeon with our new web-based app…” pitches from various sources, semantic integration has been, is, and will remain difficult under the best of circumstances.

I don’t say that to discourage anyone but to avoid the let-down when integration projects don’t provide easy returns.

It is far better to plan for incremental and measurable benefits along the way than to fashion grandiose goals that are ever receding on the horizon.

I first saw this in a tweet by ChemConnector.

Scientific Lenses over Linked Data… [Operational Equivalence]

Sunday, April 28th, 2013

Scientific Lenses over Linked Data: An approach to support task specific views of the data. A vision. by Christian Brenninkmeijer, Chris Evelo, Carole Goble, Alasdair J G Gray, Paul Groth, Steve Pettifer, Robert Stevens, Antony J Williams, and Egon L Willighagen.


Within complex scientific domains such as pharmacology, operational equivalence between two concepts is often context-, user- and task-specific. Existing Linked Data integration procedures and equivalence services do not take the context and task of the user into account. We present a vision for enabling users to control the notion of operational equivalence by applying scientific lenses over Linked Data. The scientific lenses vary the links that are activated between the datasets which affects the data returned to the user.

Two additional quotes from this paper should convince you of the importance of this work:

We aim to support users in controlling and varying their view of the data by applying a scientific lens which govern the notions of equivalence applied to the data. Users will be able to change their lens based on the task and role they are performing rather than having one fixed lens. To support this requirement, we propose an approach that applies context dependent sets of equality links. These links are stored in a stand-off fashion so that they are not intermingled with the datasets. This allows for multiple, context-dependent, linksets that can evolve without impact on the underlying datasets and support differing opinions on the relationships between data instances. This flexibility is in contrast to both Linked Data and traditional data integration approaches. We look at the role personae can play in guiding the nature of relationships between the data resources and the desired affects of applying scientific lenses over Linked Data.


Within scientific datasets it is common to find links to the “equivalent” record in another dataset. However, there is no declaration of the form of the relationship. There is a great deal of variation in the notion of equivalence implied by the links both within a dataset’s usage and particularly across datasets, which degrades the quality of the data. The scientific user personae have very different needs about the notion of equivalence that should be applied between datasets. The users need a simple mechanism by which they can change the operational equivalence applied between datasets. We propose the use of scientific lenses.

Obvious questions:

Does your topic map software support multiple operational equivalences?

Does your topic map interface enable users to choose “lenses” (I like lenses better than roles) to view equivalence?

Does your topic map software support declaring the nature of equivalence?

I first saw this in the slide deck: Scientific Lenses: Supporting Alternative Views of the Data by Alasdair J G Gray at: 4th Open PHACTS Community Workshop.

BTW, the notion of equivalence being represented by “links” reminds me of a comment Peter Neubauer (Neo4j) once made to me, saying that equivalence could be modeled as edges. Imagine typing equivalence edges. Will have to think about that some more.
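As a first pass at thinking about typed equivalence edges, here is a minimal sketch: links are kept stand-off from the datasets, each link carries an equivalence type, and a “lens” is just the set of types it activates. All identifiers, type names and lens names are made up for illustration, and following links transitively is my assumption, not something the paper prescribes.

```python
# Stand-off link set: (source_id, target_id, equivalence_type).
# Identifiers and type names are hypothetical.
LINKS = [
    ("chem:1", "drugdb:A", "exact_structure"),
    ("chem:1", "drugdb:B", "same_parent_compound"),
    ("chem:2", "drugdb:B", "exact_structure"),
    ("drugdb:B", "trial:9", "same_active_ingredient"),
]

# A lens names the equivalence types it treats as "the same thing".
LENSES = {
    "strict": {"exact_structure"},
    "pharmacology": {
        "exact_structure",
        "same_parent_compound",
        "same_active_ingredient",
    },
}


def equivalents(start, lens):
    """Ids reachable from start through links whose type the lens activates."""
    active = LENSES[lens]
    adjacency = {}
    for a, b, kind in LINKS:
        if kind in active:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, stack = {start}, [start]
    while stack:
        for neighbor in adjacency.get(stack.pop(), ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen - {start}


print(sorted(equivalents("chem:1", "strict")))
print(sorted(equivalents("chem:1", "pharmacology")))
```

The datasets never change here. Only the lens does, so two users can disagree about what counts as equivalent without editing each other's data.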

4th Open PHACTS Community Workshop (slides) [Operational Equivalence]

Sunday, April 28th, 2013

4th Open PHACTS Community Workshop : Using the power of Open PHACTS

From the post:

The fourth Open PHACTS Community Workshop was held at Burlington House in London on April 22 and 23, 2013. The Workshop focussed on “Using the Power of Open PHACTS” and featured the public release of the Open PHACTS application programming interface (API) and the first Open PHACTS example app, ChemBioNavigator.

The first day featured talks describing the data accessible via the Open PHACTS Discovery Platform and technical aspects of the API. The use of the API by example applications ChemBioNavigator and PharmaTrek was outlined, and the results of the Accelrys Pipeline Pilot Hackathon discussed.

The second day involved discussion of Open PHACTS sustainability and plans for the successor organisation, the Open PHACTS Foundation. The afternoon was attended by those keen to further discuss the potential of the Open PHACTS API and the future of Open PHACTS.

During talks, especially those detailing the Open PHACTS API, a good number of signup requests to the API via were received. The hashtag #opslaunch was used to follow reactions to the workshop on Twitter (see storify), and showed the response amongst attendees to be overwhelmingly positive.

This summary is followed by slides from the two days of presentations.

Not like being there but still quite useful.

As a matter of fact, I found a lead on “operational equivalence” with this data set. More to follow in a separate post.

EU-ADR Web Platform

Friday, September 7th, 2012

EU-ADR Web Platform

I was disappointed to not find the UMLS concepts and related terms mapping for participants in the EU-ADR project.

I did find these workflows at the EU-ADR Web Platform:


MEDLINE ADR

In the filtering process of well known signals, the aim of the “MEDLINE ADR” workflow is to automate the search of publications related to ADRs corresponding to a given drug/adverse event association. To do so, we defined an approach based on the MeSH thesaurus, using the subheadings «chemically induced» and «adverse effects» with the “Pharmacological Action” knowledge. Using a threshold of ≥3 extracted publications, the automated search method presented a sensitivity of 93% and a specificity of 97% on the true positive and true negative sets (WP 2.2). We then determined a threshold number of extracted publications ≥ 3 to confirm the knowledge of this association in the literature. This approach offers the opportunity to automatically determine if an ADR (association of a drug and an adverse event) has already been described in MEDLINE. However, the causality relationship between the drug and an event may be judged only by an expert reading the full text article and determining if the methodology of this article was correct and if the association is statistically significant.
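The thresholding step in that description is easy to sketch. The code below assumes the publication counts have already been extracted and flags an association as “known” at three or more publications, then computes sensitivity and specificity against a labeled set. The sample counts are invented and are not meant to reproduce the 93% and 97% figures.

```python
def is_known_adr(publication_count, threshold=3):
    """Treat a drug/event association as already described at >= threshold."""
    return publication_count >= threshold


def sensitivity_specificity(labeled_counts, threshold=3):
    """labeled_counts: iterable of (publication_count, truly_known) pairs.

    Assumes at least one truly known and one truly unknown association.
    """
    tp = fn = tn = fp = 0
    for count, truly_known in labeled_counts:
        predicted = is_known_adr(count, threshold)
        if truly_known and predicted:
            tp += 1
        elif truly_known:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Made-up reference set: (publications found, association truly known?).
SAMPLE = [(5, True), (3, True), (1, True), (0, False), (2, False), (4, False)]

sens, spec = sensitivity_specificity(SAMPLE)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```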

MEDLINE Co-occurrence

The “MEDLINE Co-occurrence” workflow performs a comprehensive data processing operation, searching the given Drug-Event combination in the PubMed database. Final workflow results include a final score, measuring found drugs relevance regarding the initial Drug-Event pair, as well as pointers to web pages for the discovered drugs.


DailyMed

The “DailyMed” workflow performs a comprehensive data processing operation, searching the given Drug-Event combination in the DailyMed database. Final workflow results include a final score, measuring found drugs relevance regarding the initial Drug-Event pair, as well as pointers to web pages for the discovered drugs.


DrugBank

The “DrugBank” workflow performs a comprehensive data processing operation, searching the given Drug-Event combination in the DrugBank database. Final workflow results include a final score, measuring found drugs relevance regarding the initial Drug-Event pair, as well as pointers to web pages for the discovered drugs.


Substantiation

The “Substantiation” workflow tries to establish a connection between the clinical event and the drug through a gene or protein, by identifying the proteins that are targets of the drug and are also associated with the event. In addition it also considers information about drug metabolites in this process. In such cases it can be argued that the binding of the drug to the protein would lead to the observed event phenotype. Associations between the event and proteins are found by querying our integrated gene-disease association database (Bauer-Mehren, et al., 2010). As this database provides annotations of the gene-disease associations to the articles reporting the association and in case of text-mining derived associations even the exact sentence, the article or sentence can be studied in more detail in order to inspect the supporting evidence for each gene-disease association. It has to be mentioned that our gene-disease association database also contains information about genetic variants or SNPs and their association to diseases or adverse drug events. The methodology for providing information about the binding of a drug (or metabolite) to protein targets is reported in deliverable 4.2, and includes extraction from different databases (annotated chemical libraries) and application of prediction methods based on chemical similarity.
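The core of the substantiation idea is a set intersection: proteins bound by the drug (or one of its metabolites) that are also associated with the event. A minimal sketch, assuming small in-memory lookup tables in place of the project's databases; every drug, protein and evidence label below is hypothetical.

```python
# Hypothetical lookup tables; a real system would query curated databases.
DRUG_TARGETS = {"drugX": {"P1", "P2"}}
DRUG_METABOLITES = {"drugX": ["metaboliteX1"]}
METABOLITE_TARGETS = {"metaboliteX1": {"P3"}}
# event -> {protein: [labels for the supporting articles or sentences]}
EVENT_PROTEINS = {
    "eventE": {
        "P2": ["article-1"],
        "P3": ["article-2, sentence 4"],
        "P9": ["article-3"],
    },
}


def substantiate(drug, event):
    """Proteins that are bound by the drug or a metabolite AND associated
    with the event, each with the binding route and supporting evidence."""
    binders = {protein: drug for protein in DRUG_TARGETS.get(drug, set())}
    for metabolite in DRUG_METABOLITES.get(drug, []):
        for protein in METABOLITE_TARGETS.get(metabolite, set()):
            binders.setdefault(protein, metabolite)
    associated = EVENT_PROTEINS.get(event, {})
    return {
        protein: {"via": binders[protein], "evidence": associated[protein]}
        for protein in sorted(binders.keys() & associated.keys())
    }


links = substantiate("drugX", "eventE")
for protein, detail in links.items():
    print(protein, detail)
```

Keeping the evidence labels attached to each hit mirrors the point in the description that the supporting article or sentence can be inspected afterwards.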

A glimpse of what is state of the art today and a basis for building better tools for tomorrow.

DATA MINING: Accelerating Drug Discovery by Text Mining of Patents

Sunday, August 12th, 2012

DATA MINING: Accelerating Drug Discovery by Text Mining of Patents

From the contest page:

Patent documents contain important research that is valuable to the industry, business, law, and policy-making communities. Take the patent documents from the United States Patent and Trademark Office (USPTO) as examples. The structured data include: filing date, application date, assignees, UPC (US Patent Classification) codes, IPC codes, and others, while the unstructured segments include: title, abstract, claims, and description of the invention. The description of the invention can be further segmented into field of the invention, background, summary, and detailed description.

Given a set of “Source” patents or documents, we can use text mining to identify patents that are “similar” and “relevant” for the purpose of discovery of drug variants. These relevant patents could further be clustered and visualized appropriately to reveal implicit, previously unknown, and potentially useful patterns.

The eventual goal is to obtain a focused and relevant subset of patents, relationships and patterns to accelerate discovery of variations or evolutions of the drugs represented by the “source” patents.
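One common way to find patents “similar” to a source patent is TF-IDF weighting with cosine similarity. The contest page does not say which method entrants should use, so treat this as a minimal sketch under that assumption; the patent snippets are invented and the tokenizer is deliberately crude.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a {term: tf * smoothed idf} vector."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return {
        name: {t: c * idf[t] for t, c in term_counts.items()}
        for name, term_counts in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_similar(source, docs):
    """Names of the other documents, most similar to source first."""
    vectors = tfidf_vectors(docs)
    scored = [
        (cosine(vectors[source], vectors[name]), name)
        for name in docs
        if name != source
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Invented snippets standing in for patent abstracts.
DOCS = {
    "source": "kinase inhibitor compound for treating tumors",
    "p1": "novel kinase inhibitor compound and salts for tumors",
    "p2": "bicycle brake lever assembly",
}

ranking = rank_similar("source", DOCS)
print(ranking)
```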


  • July 19, 2012 – Start of the Contest Part 1
  • August 23, 2012 – Deadline for Submission of Ontology deliverables
  • August 24 to August 29, 2012 – Crowdsourced And Expert Evaluation for Part 1. NO SUBMISSIONS ACCEPTED for contest during this week.
  • Milestone 1: August 30, 2012 – Winner for Part 1 contest announced and Ontology release to the community for Contest Part 2
  • Aug. 31 to Sept. 21, 2012 – Contest Part 2 Begins – Data Exploration / Text Mining of Patent Data
  • Milestone 2: Sept. 21, 2012 – Deadline for Submission Contest Part 2. FULL CONTEST CLOSING.
  • Sept. 22 to Oct. 5, 2012 – Crowdsourced and Expert Evaluation for contest Part 2
  • Milestone 3: Oct. 5, 2012 – Conditional Winners Announcement 

Possibly fertile ground for demonstrating the value of topic maps.

Particularly if you think of topic maps as curating search strategies and results.

Think about that for a moment: curating search strategies and results.

We have all asked reference librarians or other power searchers for assistance and watched while they discovered resources we didn’t imagine existed.

What if for medical expert searchers, we curate the “search request” along with the “search strategy” and the “result” of that search?

Such that we can match future search requests up with likely search strategies?

What we are capturing is the expert’s understanding and recognition of subjects not apparent to the average user, in such a way that we can make use of it again in the future.
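A minimal sketch of what curating the request, strategy and result together might look like, with a new request matched to prior ones by simple word overlap (Jaccard similarity). The record fields, the sample entries and the matching method are all my assumptions; a topic map would match on identified subjects rather than raw words.

```python
from dataclasses import dataclass


@dataclass
class CuratedSearch:
    request: str    # what the user asked for, in their own words
    strategy: str   # what the expert searcher actually did
    results: list   # what the search turned up


def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Shared words divided by all words; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(request, library, top=1):
    """Prior searches whose requests look most like the new request."""
    wanted = tokens(request)
    ranked = sorted(
        library,
        key=lambda record: jaccard(wanted, tokens(record.request)),
        reverse=True,
    )
    return ranked[:top]


# Hypothetical curated records.
LIBRARY = [
    CuratedSearch(
        request="adverse liver events linked to statins",
        strategy="drug class heading plus adverse effects subheading",
        results=["result-1", "result-2"],
    ),
    CuratedSearch(
        request="patents describing kinase inhibitor variants",
        strategy="classification codes first, then claims text",
        results=["result-3"],
    ),
]

best = suggest("liver events reported for statins", LIBRARY)[0]
print(best.strategy)
```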

If you aren’t interested in medical research, how about: Accelerating Discovery of Trolls by Text Mining of Patents? 😉

I first saw this at KDNuggets.

Update: 13 August 2012

Tweet by Lars Marius Garshol points to: Patent troll Intellectual Ventures is more like a HYDRA.

Even a low-end estimate – the patents actually recorded in the USPTO as being assigned to one of those shells – identifies around 10,000 patents held by the firm.

At the upper end of the researchers’ estimates, Intellectual Ventures would rank as the fifth-largest patent holder in the United States and among the top fifteen patent holders worldwide.

As sad as that sounds, remember this is one (1) troll. There are others.

Network Science – NetSci

Monday, December 27th, 2010

Warning: NetSci has serious issues with broken links.

Network Science – NetSci: An Extensive Set of Resources for Science in Drug Discovery

From the website:

Welcome to the Network Science website. This site is dedicated to the topics of pharmaceutical research and the use of advanced techniques in the discovery of new therapeutic agents. We endeavor to provide a comprehensive look at the industry and the tools that are in use to speed drug discovery and development.

I stumbled across this website while looking for computational chemistry resources.

Pharmaceutical research is rich in topic map type issues, from mapping across the latest reported findings in journal literature to matching those identifications to results in computational software.


  1. Develop a drug discovery account that illustrates how topic maps might or might not help in that process. (5-7 pages, citations)
  2. What benefits would a topic map bring to drug discovery and how would you illustrate those benefits for a grant application either to a pharmaceutical company or granting agency? (3-5 pages, citations)
  3. Where would you submit a grant application based on #2? (3-5 pages, citations) (Requires researching what activities in drug development are funded by particular entities.)
  4. Prepare a grant application based on the answer to #3. (length depends on grantor requirements)
  5. For extra credit, update and/or correct twenty (20) links from this site. (Check with me first, I will maintain a list of those already corrected.)