Archive for the ‘DOI’ Category

The National Centre for Biotechnology Information (NCBI) is part…

Monday, February 9th, 2015

The National Centre for Biotechnology Information (NCBI) is part…

The National Centre for Biotechnology Information (NCBI) is part of the National Institutes of Health’s National Library of Medicine, and most well-known for hosting Pubmed, the go-to search engine for biomedical literature – every (Medline-indexed) publication goes up there.

On a separate but related note, one thing I’m constantly looking to do is get DOIs for papers on demand. Most recently I found a package for R, knitcitations that generates bibliographies automatically from DOIs in the text, which worked quite well for a 10 page literature review chock full of references (I’m a little allergic to Mendeley and other clunky reference managers).

The “Digital Object Identifier”, as the name suggests, uniquely identifies a research paper (and recently it’s being co-opted to reference associated datasets). There’re lots of interesting and troublesome exceptions which I’ve mentioned previously, but in the vast majority of cases any paper published in at least the last 10 years or so will have one.

Although NCBI Pubmed does a great job of cataloguing biomedical literature, another site, provides a consistent gateway to the original source of the paper. You only need to append the DOI to “” to generate a working redirection link.

Last week the NCBI posted a webinar detailing the inner workings of Entrez Direct, the command line interface for Unix computers (GNU/Linux, and Macs; Windows users can fake it with Cygwin). It revolves around a custom XML parser written in Perl (typical for bioinformaticians) encoding subtle ‘switches’ to tailor the output just as you would from the web service (albeit with a fair portion more of the inner workings on show).

I’ve pieced together a basic pipeline, which has a function to generate citations for knitcitations from files listing basic bibliographic information, and in the final piece of the puzzle now have a custom function (or several) that does its best to find a single unique article matching the author, publication year, and title of a paper systematically, to find DOIs for entries in such a table.

BTW, the correct Github Gist link is:

The link in:

The scripts below are available here, I’ll update them on the GitHub Gist if I make amendments.

is broken.

A clever utility, although I am more in need of one for published CS literature. 😉

The copy to clipboard feature would be perfect for pasting into blogs posts.

Improving GitHub for science

Thursday, June 19th, 2014

Improving GitHub for science

From the post:

GitHub is being used today to build scientific software that’s helping find Earth-like planets in other solar systems, analyze DNA, and build open source rockets.

Seeing these projects and all this momentum within academia has pushed us to think about how we can make GitHub a better tool for research. As scientific experiments become more complex and their datasets grow, researchers are spending more of their time writing tools and software to analyze the data they collect. Right now though, these efforts often happen in isolation.

Citable code for academic software

Sharing your work is good, but collaborating while also getting required academic credit is even better. Over the past couple of months we’ve been working with the Mozilla Science Lab and data archivers, Figshare and Zenodo, to make it possible to get a Digital Object Identifier (DOI) for any GitHub repository archive.

DOIs form the backbone of the academic reference and metrics system. With a DOI for your GitHub repository archive, your code becomes citable. Our newest Guide explains how to create a DOI for your repository.

A move in the right direction to be sure but how much of a move is open to question.

Think of a DOI as the equivalent to a International Standard Book Number (ISBN). Using that as an identifier, you are sure to find a book that I cite.

But if the book is several hundred pages long, you may find my “citing it” by an ISBN identifier alone isn’t quite good enough.

The same will be true for some citations using DOIs for Github repositories. Better than nothing at all, but falls short of a robust identifier for material within a Github archive.

I first saw this in a tweet by Peter Kraker.