Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 25, 2012

Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE

Filed under: Bioinformatics,Biomedical,Text Mining — Patrick Durusau @ 7:15 pm

Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE by Neveol, A., Wilbur, W. J., Lu, Z.

Abstract:

High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature mainly relies on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that the link curation is heterogeneous depending on the sources and databases involved, and that overlap between sources is low, <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases while maintaining high-quality curation.

Database URLs: MEDLINE http://www.ncbi.nlm.nih.gov/PubMed, GEO http://www.ncbi.nlm.nih.gov/geo/, PDB http://www.rcsb.org/pdb/.

A good illustration of the use of automated means to augment the capacity of curators of data links.

Or topic map authors performing the same task.
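
To make the idea concrete, here is a minimal sketch of how automated evidence for a curator might be gathered from an abstract or full text. It is not the method used by the paper's authors. It only assumes that GEO accessions look like GSE, GSM, GPL or GDS followed by digits, and that a classic PDB ID is four characters starting with a digit. The function name `candidate_links` and the requirement that a PDB ID follow a "PDB" cue are my own assumptions, the latter chosen to keep years and other four-character tokens from matching.

```python
import re

# GEO accessions: series (GSE), samples (GSM), platforms (GPL), datasets (GDS).
GEO_PATTERN = re.compile(r"\b(?:GSE|GSM|GPL|GDS)\d+\b")

# Classic PDB IDs: a digit followed by three alphanumerics.
# Assumption: only accept an ID that directly follows a "PDB" cue,
# which trades recall for fewer false positives.
PDB_PATTERN = re.compile(
    r"\bPDB(?:\s+(?:ID|code|entry))?[:\s]+([1-9][A-Za-z0-9]{3})\b",
    re.IGNORECASE,
)


def candidate_links(text):
    """Return candidate database accessions mentioned in free text."""
    geo = {m.group(0) for m in GEO_PATTERN.finditer(text)}
    pdb = {m.group(1).upper() for m in PDB_PATTERN.finditer(text)}
    return {"GEO": sorted(geo), "PDB": sorted(pdb)}


# Hypothetical sentence, not taken from any real article.
sample = (
    "Expression data were deposited in GEO under accession GSE12345. "
    "The crystal structure is available as PDB ID: 1ABC."
)
print(candidate_links(sample))
```

Matches like these are only candidates. A curator (or a topic map author) would still need to confirm that the article actually describes the data set rather than merely citing it.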

May 29, 2012

ProseVis

Filed under: Data Mining,Graphics,Text Analytics,Text Mining,Visualization — Patrick Durusau @ 10:12 am

ProseVis

A tool for exploring texts on a non-word basis.

Or in the words of the project:

ProseVis is a visualization tool developed as part of a use case supported by the Andrew W. Mellon Foundation through a grant titled “SEASR Services,” in which we seek to identify other features than the “word” to analyze texts. These features comprise sound including parts-of-speech, accent, phoneme, stress, tone, break index.

ProseVis allows a reader to map the features extracted from OpenMary (http://mary.dfki.de/) Text-to-speech System and predictive classification data to the “original” text. We developed this project with the ultimate goal of facilitating a reader’s ability to analyze and disseminate the results in human readable form. Research has shown that mapping the data to the text in its original form allows for the kind of human reading that literary scholars engage: words in the context of phrases, sentences, lines, stanzas, and paragraphs (Clement 2008). Recreating the context of the page not only allows for the simultaneous consideration of multiple representations of knowledge or readings (since every reader’s perspective on the context will be different) but it also allows for a more transparent view of the underlying data. If a human can see the data (the syllables, the sounds, the parts-of-speech) within the context in which they are used to reading, with the data mapped back onto the full text, then the reader is empowered within this familiar context to read what might otherwise be an unfamiliar representation tabular representation of the text. For these reasons, we developed ProseVis as a reader interface to allow scholars to work with the data in a language or context in which we are used to saying things about the world.

Textual analysis tools are “smoking gun” detectors.

A CEO is unlikely to make inappropriate comments in a spreadsheet or data feed. Emails, on the other hand… 😉

Big or little data, the goal is to have the “right” data.

May 21, 2012

Call for Papers: PLoS Text Mining Collection

Filed under: Data Mining,Text Mining — Patrick Durusau @ 7:15 pm

Call for Papers: PLoS Text Mining Collection by Camron Assadi.

From the post:

The Public Library of Science (PLoS) seeks submissions in the broad field of text-mining research for a collection to be launched across all of its journals in 2013. All submissions submitted before October 30th, 2012 will be considered for the launch of the collection. Please read the following post for further information on how to submit your article.

The scientific literature is exponentially increasing in size, with thousands of new papers published every day. Few researchers are able to keep track of all new publications, even in their own field, reducing the quality of scholarship and leading to undesirable outcomes like redundant publication. While social media and expert recommendation systems provide partial solutions to the problem of keeping up with the literature, systematically identifying relevant articles and extracting key information from them can only come through automated text-mining technologies.

Research in text mining has made incredible advances over the last decade, driven through community challenges and increasingly sophisticated computational technologies. However, the promise of text mining to accelerate and enhance research largely has not yet been fulfilled, primarily since the vast majority of the published scientific literature is not published under an Open Access model. As Open Access publishing yields an ever-growing archive of unrestricted full-text articles, text mining will play an increasingly important role in drilling down to essential research and data in scientific literature in the 21st century scholarly landscape.

As part of its commitment to realizing the maximal utility of Open Access literature, PLoS is launching a collection of articles dedicated to highlighting the importance of research in the area of text mining. The launch of this Text Mining Collection complements related PLoS Collections on Open Access and Altmetrics (forthcoming), as well as the recent release of the PLoS Application Programming Interface, which provides an open API to PLoS journal content.

I highly recommend that you follow up on this publication opportunity.

I am less certain that: “…the promise of text mining to accelerate and enhance research largely has not yet been fulfilled, primarily since the vast majority of the published scientific literature is not published under an Open Access model.”

I don’t recall seeing any research on a connection between a lack of Open Access and the failure of text mining to accelerate research.

CiteSeer and arXiv have long been freely available in full text. If research were going to leap forward from open access, the opportunity has been present.

Open access does advance research and discovery but it isn’t a magic bullet. Accelerating and enhancing research is going to require more than simply indexing literature. A lot more.
