DATA MINING: Accelerating Drug Discovery by Text Mining of Patents
From the contest page:
Patent documents contain important research that is valuable to the industry, business, law, and policy-making communities. Take the patent documents from the United States Patent and Trademark Office (USPTO) as examples. The structured data include: filing date, application date, assignees, UPC (US Patent Classification) codes, IPC codes, and others, while the unstructured segments include: title, abstract, claims, and description of the invention. The description of the invention can be further segmented into field of the invention, background, summary, and detailed description.
Given a set of “Source” patents or documents, we can use text mining to identify patents that are “similar” and “relevant” for the purpose of discovery of drug variants. These relevant patents could further be clustered and visualized appropriately to reveal implicit, previously unknown, and potentially useful patterns.
The eventual goal is to obtain a focused and relevant subset of patents, relationships and patterns to accelerate discovery of variations or evolutions of the drugs represented by the “source” patents.
Timeline:
- July 19, 2012 – Start of the Contest Part 1
- August 23, 2012 – Deadline for Submission of Ontology deliverables
- August 24 to August 29, 2012 – Crowdsourced And Expert Evaluation for Part 1. NO SUBMISSIONS ACCEPTED for contest during this week.
- Milestone 1: August 30, 2012 – Winner for Part 1 contest announced and Ontology release to the community for Contest Part 2
- Aug. 31 to Sept. 21, 2012 – Contest Part 2 Begins – Data Exploration / Text Mining of Patent Data
- Milestone 2: Sept. 21, 2012 – Deadline for Submission Contest Part 2. FULL CONTEST CLOSING.
- Sept. 22 to Oct. 5, 2012 – Crowdsourced and Expert Evaluation for contest Part 2
- Milestone 3: Oct. 5, 2012 – Conditional Winners Announcement
Possibly fertile ground for demonstrating the value of topic maps.
Particularly if you think of topic maps as curating search strategies and results.
Think about that for a moment: curating search strategies and results.
We have all asked reference librarians or other power searchers for assistance and watched while they discovered resources we didn’t imagine existed.
What if, for expert medical searchers, we curated the “search request” along with the “search strategy” and the “result” of that search, so that we could match future search requests with likely search strategies?
What we are capturing is the expert’s understanding and recognition of subjects not apparent to the average user, and capturing it in such a way that it can be used again in the future.
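To make the idea a little more concrete, here is a minimal sketch of what a curated search record and a matching step might look like. Everything in it is an assumption for illustration: the `CuratedSearch` record, the `likely_strategies` function, and the crude word-overlap (Jaccard) score are hypothetical stand-ins. A real topic map would identify subjects far more carefully than by shared words.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CuratedSearch:
    """One expert search, kept as request + strategy + result (hypothetical record)."""
    request: str                 # what the user asked for, in their own words
    strategy: str                # how the expert actually went about searching
    results: List[str] = field(default_factory=list)  # what the search turned up


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a deliberately crude stand-in for subject identification."""
    words = {w.strip(".,;:?!\"'()").lower() for w in text.split()}
    return words - {""}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of words two requests have in common (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_strategies(new_request: str,
                      curated: List[CuratedSearch],
                      top_n: int = 3) -> List[CuratedSearch]:
    """Return the curated searches whose requests best overlap the new request."""
    wanted = tokens(new_request)
    scored = [(jaccard(wanted, tokens(c.request)), c) for c in curated]
    scored = [pair for pair in scored if pair[0] > 0]      # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)    # best overlap first
    return [c for _, c in scored[:top_n]]


if __name__ == "__main__":
    # Illustrative, made-up records; not real searches or real patents.
    curated = [
        CuratedSearch("patents on statin variants for cholesterol",
                      "IPC code search plus claim keywords", ["patent A"]),
        CuratedSearch("patents held by shell companies",
                      "assignee name search", ["patent B"]),
    ]
    for match in likely_strategies("statin variants", curated):
        print(match.strategy)
```

The point of the sketch is only the shape of the record: the request, the strategy, and the result travel together, so a later request can surface the strategy an expert used before.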
If you aren’t interested in medical research, how about: Accelerating Discovery of Trolls by Text Mining of Patents? :-)
I first saw this at KDNuggets.
Update: 13 August 2012
Tweet by Lars Marius Garshol points to: Patent troll Intellectual Ventures is more like a HYDRA.
Even a low-end estimate (the patents actually recorded in the USPTO as being assigned to one of those shells) identifies around 10,000 patents held by the firm.
At the upper end of the researchers’ estimates, Intellectual Ventures would rank as the fifth-largest patent holder in the United States and among the top fifteen patent holders worldwide.
As sad as that sounds, remember this is one (1) troll. There are others.