Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 9, 2015

The National Centre for Biotechnology Information (NCBI) is part…

Filed under: Bioinformatics,DOI,R — Patrick Durusau @ 4:05 pm

The National Centre for Biotechnology Information (NCBI) is part of the National Institutes of Health’s National Library of Medicine, and is best known for hosting PubMed, the go-to search engine for biomedical literature – every (Medline-indexed) publication goes up there.

On a separate but related note, one thing I’m constantly looking to do is get DOIs for papers on demand. Most recently I found a package for R, knitcitations, that generates bibliographies automatically from DOIs in the text, which worked quite well for a 10-page literature review chock full of references (I’m a little allergic to Mendeley and other clunky reference managers).

The “Digital Object Identifier”, as the name suggests, uniquely identifies a research paper (and recently it’s being co-opted to reference associated datasets). There’re lots of interesting and troublesome exceptions which I’ve mentioned previously, but in the vast majority of cases any paper published in at least the last 10 years or so will have one.

Although NCBI PubMed does a great job of cataloguing biomedical literature, another site, doi.org, provides a consistent gateway to the original source of the paper. You only need to append the DOI to “dx.doi.org/” to generate a working redirection link.
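
As a minimal sketch of that rule (not taken from the original post), the Python snippet below builds the redirection link by appending a DOI to the resolver address. The helper name doi_to_url, the https:// scheme, the handling of a leading "doi:" label, and the percent-encoding step are all assumptions added for illustration, and the DOI in the last line is only a sample value.

```python
from urllib.parse import quote

# Assumption: the post only gives "dx.doi.org/"; the https:// scheme is added here.
RESOLVER = "https://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Return a resolver link for a DOI (hypothetical helper, for illustration)."""
    doi = doi.strip()
    # Tolerate a leading "doi:" label, which often appears in reference lists.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi:
        raise ValueError("empty DOI")
    # Percent-encode characters that are unsafe in a URL path; "/" is kept as-is.
    return RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.1000/182"))
```

Encoding the DOI before appending it is a defensive choice, since some older DOIs contain characters such as parentheses or angle brackets that do not survive being pasted into a URL unescaped.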

Last week the NCBI posted a webinar detailing the inner workings of Entrez Direct, the command line interface for Unix computers (GNU/Linux and Macs; Windows users can fake it with Cygwin). It revolves around a custom XML parser written in Perl (typical for bioinformaticians), with subtle ‘switches’ to tailor the output just as you would from the web service (albeit with a fair bit more of the inner workings on show).

I’ve pieced together a basic pipeline, which has a function to generate citations for knitcitations from files listing basic bibliographic information. The final piece of the puzzle is a custom function (or several) that does its best to find a single unique article matching the author, publication year, and title of a paper, so that DOIs can be found systematically for entries in such a table.
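
The matching step described above can be sketched roughly as follows. This is not the author's code (the original pipeline is written for R and Entrez Direct); it is a small Python illustration under stated assumptions. The function names, the record layout (dicts with "authors", "year", "title" and "doi" keys) and the sample records, including their placeholder DOIs, are all hypothetical. The only external facts relied on are PubMed's search field tags [au], [dp] and [ti] for author, publication date and title.

```python
import re


def build_pubmed_term(author: str, year: int, title: str) -> str:
    """Combine author, year and title into a single PubMed search term."""
    # Drop double quotes from the title so the quoted phrase stays well-formed.
    clean_title = title.replace('"', "")
    # [au], [dp] and [ti] are PubMed's author, date and title field tags.
    return f'{author}[au] AND {year}[dp] AND "{clean_title}"[ti]'


def normalise(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace for a tolerant comparison."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def unique_match(candidates, author, year, title):
    """Return the one record matching all three fields, or None if 0 or >1 match."""
    hits = [
        rec for rec in candidates
        if rec["year"] == year
        and normalise(rec["title"]) == normalise(title)
        and any(normalise(author) in normalise(a) for a in rec["authors"])
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical records standing in for parsed search results; DOIs are placeholders.
records = [
    {"authors": ["Smith J", "Lee K"], "year": 2014,
     "title": "A study of things.", "doi": "10.0000/placeholder.1"},
    {"authors": ["Smith J"], "year": 2013,
     "title": "A study of things", "doi": "10.0000/placeholder.2"},
]

print(build_pubmed_term("Smith J", 2014, "A study of things"))
match = unique_match(records, "Smith", 2014, "A study of things")
print(match["doi"] if match else "no unique match")
```

Returning nothing when zero or several records match keeps ambiguous entries out of the table, so they can be checked by hand instead of being given the wrong DOI.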

BTW, the correct GitHub Gist link is: https://gist.github.com/lmmx/3c9406c4ec2c42b82158

The link in:

The scripts below are available here, I’ll update them on the GitHub Gist if I make amendments.

is broken.

A clever utility, although I am more in need of one for published CS literature. 😉

The copy to clipboard feature would be perfect for pasting into blog posts.
