Stop Hosting Data and Code on your Lab Website by Stephen Turner.
From the post:
It’s happened to all of us. You read about a new tool, database, webservice, software, or some interesting and useful data, but when you browse to http://instititution.edu/~home/professorX/lab/data, there’s no trace of what you were looking for.
THE PROBLEM
This isn’t an uncommon problem. See the following two articles:
Schultheiss, Sebastian J., et al. “Persistence and availability of web services in computational biology.” PLoS ONE 6.9 (2011): e24914.
Wren, Jonathan D. “404 not found: the stability and persistence of URLs published in MEDLINE.” Bioinformatics 20.5 (2004): 668-672.
The first gives us some alarming statistics. In a survey of nearly 1000 web services published in the Nucleic Acids Research Web Server Issue between 2003 and 2009:
- Only 72% were still available at the published address.
- The authors could not test the functionality for 33% because there was no example data, and 13% no longer worked as expected.
- The authors could only confirm positive functionality for 45%.
- Only 274 of the 872 corresponding authors answered an email.
- Of these, 78% said a service was developed by a student or temporary researcher, and many had no plan for maintenance after the researcher had moved on to a permanent position.
The Wren paper found that of 1630 URLs identified in PubMed abstracts, only 63% were consistently available. That rate was far worse for anonymous login FTP sites (33%).
Is this a problem for published data in the topic map community?
What data should we be archiving? Discussion lists? Blogs? Public topic maps?
What do you think of Stephen’s solution?