Improving GitHub for science « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 15, 2014

Improving GitHub for science

Filed under: Github,Identifiers,Identity — Patrick Durusau @ 1:53 pm

From the post:

GitHub is being used today to build scientific software that’s helping find Earth-like planets in other solar systems, analyze DNA, and build open source rockets.

Seeing these projects and all this momentum within academia has pushed us to think about how we can make GitHub a better tool for research. As scientific experiments become more complex and their datasets grow, researchers are spending more of their time writing tools and software to analyze the data they collect. Right now though, these efforts often happen in isolation.

Citable code for academic software

Sharing your work is good, but collaborating while also getting required academic credit is even better. Over the past couple of months we’ve been working with the Mozilla Science Lab and data archivers, Figshare and Zenodo, to make it possible to get a Digital Object Identifier (DOI) for any GitHub repository archive.

DOIs form the backbone of the academic reference and metrics system. With a DOI for your GitHub repository archive, your code becomes citable. Our newest Guide explains how to create a DOI for your repository.

A great step forward, but like http: pointing to entire resources, it is of limited utility.

Assume that I am using a DOI for a software archive and I want to point to and identify a code snippet in the archive that implements Fast Fourier Transform (FFT). My first task is to point to that snippet. A second task would be to create an association between the snippet and my annotation that it implements the Fast Fourier Transform. Yet a third task would be to gather up all the pointers that point to implementations of the Fast Fourier Transform (FFT).

For all of those tasks, I need to identify and point to a particular part of the underlying source code.

Unfortunately, a DOI is limited to identifying a single entity.

Each DOI® name is a unique “number”, assigned to identify only one entity. Although the DOI system will assure that the same DOI name is not issued twice, it is a primary responsibility of the Registrant (the company or individual assigning the DOI name) and its Registration Agency to identify uniquely each object within a DOI name prefix. (DOI Handbook

How would you extend the DOIs being used by GitHub to identify code fragments within source code repositories?

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 15, 2014

Improving GitHub for science

No Comments