## Archive for the ‘Open Science’ Category

### 5 Million Fungi

Friday, February 17th, 2017

5 Million Fungi – Every living thing is crawling with microorganisms — and you need them to survive by Dan Fost.

Fungus is growing in Brian Perry’s refrigerator — and not the kind blooming in someone’s forgotten lunch bag.

No, the Cal State East Bay assistant professor has intentionally packed his shelves with 1,500 Petri dishes, each containing a tiny sample of fungus from native and endemic Hawaiian plant leaves. The 45-year-old mycologist (a person who studies the genetic and biochemical properties of fungi, among many other things) figures hundreds of those containers hold heretofore-unknown species.

The professor’s work identifying and cataloguing fungal endophytes — microscopic fungi that live inside plants — carries several important implications. Scientists know little about the workings of these fungi, making them a particularly exciting frontier for examination: Learning about endophytes’ relationships to their host plants could save many endangered species; farmers have begun tapping into their power to help crops build resistance to pathogens; and researchers are interested in using them to unlock new compounds to make crucial medicines for people.

The only problem — finding, naming, and preserving them before it’s too late.
… (emphasis in original)

According to Naveed Davoodian in A Long Way to Go: Protecting and Conserving Endangered Fungi, you don’t need to travel to exotic locales to contribute to our knowledge of fungi in the United States.

Willow Nero, editor of McIlvainea: Journal of American Amateur Mycology writes in Commit to Mycology:

I hope you’ll do your part as a NAMA member by renewing your commitment to mycology—the science, that is. When we convene at the North American foray later this year, our leadership will present (and later publish in this journal) clear guidelines so mycologists everywhere can collect reliable data about fungi as part of the North American Mycoflora Project. We will let you know where to start and how to carry your momentum. All we ask is that you join us. Catalogue them all! Or at least set an ambitious goal for yourself or your local NAMA-affiliated club.

I did peek at the North American Mycoflora Project, which has this challenging slogan:

Without a sequenced specimen, it’s a rumor

Sounds like your kind of folks. 😉

Mycology as a hobby has three distinct positives: One, you are not in front of your computer monitor. Two, you are gaining knowledge. Three, (hopefully) you will decide to defend fellow residents who cannot defend themselves.

### Open Science: Too Much Talk, Too Little Action [Lessons For Political Opposition]

Monday, February 6th, 2017

From the post:

Starting this year, I will stop traveling to any speaking engagements on open science (or, more generally, infrastructure reform), as long as these events do not entail a clear goal for action. I have several reasons for this decision, most of them boil down to a cost/benefit estimate. The time spent traveling does not seem worth the hardly noticeable benefits any more.

I got involved in Open Science more than 10 years ago. Trying to document the point when it all started for me, I found posts about funding all over my blog, but the first blog posts on publishing were from 2005/2006, the announcement of me joining the editorial board of newly founded PLoS ONE late 2006 and my first post on the impact factor in 2007. That year also saw my first post on how our funding and publishing system may contribute to scientific misconduct.

In an interview on the occasion of PLoS ONE’s ten-year anniversary, PLoS mentioned that they thought the publishing landscape had changed a lot in these ten years. I replied that, looking back ten years, not a whole lot had actually changed:

• Publishing is still dominated by the main publishers which keep increasing their profit margins, sucking the public teat dry
• Most of our work is still behind paywalls
• You won’t get a job unless you publish in high-ranking journals.
• Higher ranking journals still publish less reliable science, contributing to potential replication issues
• The increase in number of journals is still exponential
• Libraries are still told by their faculty that subscriptions are important
• The digital functionality of our literature is still laughable
• There are no institutional solutions to sustainably archive and make accessible our narratives other than text, or our code or our data

The only difference in the last few years really lies in the fraction of available articles, but that remains a small minority, less than 30% total.

So the work that still needs to be done is exactly the same as it was at the time Stevan Harnad published his “Subversive Proposal”, 23 years ago: getting rid of paywalls. This goal won’t be reached until all institutions have stopped renewing their subscriptions. As I don’t know of a single institution without any subscriptions, that task remains just as big now as it was 23 years ago. Noticeable progress has only been on the margins and potentially in people’s heads. Indeed, now only few scholars haven’t heard of “Open Access”, yet, but apparently without grasping the issues, as my librarian colleagues keep reminding me that their faculty believe open access has already been achieved because they can access everything from the computer in their institute.

What needs to be said about our infrastructure has been said, both in person, and online, and in print, and on audio, and on video. Those competent individuals at our institutions who make infrastructure decisions hence know enough to be able to make their rational choices. Obviously, if after 23 years of talking about infrastructure reform, this is the state we’re in, our approach wasn’t very effective and my contribution is clearly completely negligible, if at all existent. There is absolutely no loss if I stop trying to tell people what they already should know. After all, the main content of my talks has barely changed in the last eight or so years. Only more recent evidence has been added and my conclusions have become more radical, i.e., trying to tackle the radix (Latin: root) of the problem, rather than palliatively care for some tangential symptoms.

The line:

What needs to be said about our infrastructure has been said, both in person, and online, and in print, and on audio, and on video.

is especially relevant in light of the 2016 presidential election and the fund raising efforts of organizations that form the “political opposition.”

I agree the current US President should be opposed.

But the organizations seeking funding failed to stop his rise to power.

Whether their failure was due to organizational defects or poor strategies is really beside the point. They failed.

Why should I enable them to fail again?

One data point: the Women’s March on Washington was NOT organized by organizations with permanent staff and offices in Washington or elsewhere.

Is your contribution supporting staffs and offices of the self-righteous (the primary function of old line organizations) or investigation, research, reporting and support of boots on the ground?

Government excesses are not stopped by bewailing our losses but by making government agents bewail theirs.

Tuesday, June 21st, 2016

Abstract:

Academic publishers claim that they add value to scholarly communications by coordinating reviews and contributing and enhancing text during publication. These contributions come at a considerable cost: U.S. academic libraries paid $1.7 billion for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price inflation. We have investigated the publishers’ value proposition by conducting a comparative study of pre-print papers and their final published counterparts. This comparison had two working assumptions: 1) if the publishers’ argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and 2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added value of commercial publishers and therefore should influence libraries’ economic decisions regarding access to scholarly publications.

The authors have performed a very detailed analysis of pre-prints, 90% – 95% of which are published as open pre-prints first, to conclude there is no appreciable difference between the pre-prints and the final published versions.

I take “…no appreciable difference…” to mean academic publishers and the peer review process, despite claims to the contrary, contribute little or no value to academic publications.

How’s that for a bargaining chip in negotiating subscription prices?

### Where Has Sci-Hub Gone?

Saturday, June 18th, 2016

While I was writing about the latest EC idiocy (link tax), I was reminded of Sci-Hub.

Just checking to see if it was still alive, I tried http://sci-hub.io/.

404 by standard DNS service.

If you are having the same problem, Mike Masnick reports in Sci-Hub, The Repository Of ‘Infringing’ Academic Papers Now Available Via Telegram, you can access Sci-Hub via:

I’m not on Telegram, yet, but that may be changing soon. 😉

BTW, while writing this update, I stumbled across: The New Napster: How Sci-Hub is Blowing Up the Academic Publishing Industry by Jason Shen.

From the post:

This is obviously piracy. And Elsevier, one of the largest academic journal publishers, is furious. In 2015, the company earned $1.1 billion in profits on $2.9 billion in revenue [2] and Sci-hub directly attacks their primary business model: subscription service it sells to academic organizations who pay to get access to its journal articles. Elsevier filed a lawsuit against Sci-Hub in 2015, claiming Sci-hub is causing irreparable injury to the organization and its publishing partners.

But while Elsevier sees Sci-Hub as a major threat, for many scientists and researchers, the site is a gift from the heavens, because they feel unfairly gouged by the pricing of academic publishing. Elsevier is able to boast a lucrative 37% profit margin because of the unusual (and many might call exploitative) business model of academic publishing:

• Scientists and academics submit their research findings to the most prestigious journal they can hope to land in, without getting any pay.
• The journal asks leading experts in that field to review papers for quality (this is called peer-review and these experts usually aren’t paid)
• Finally, the journal turns around and sells access to these articles back to scientists/academics via the organization-wide subscriptions at the academic institution where they work or study

There’s piracy afoot, of that I have no doubt.

Elsevier:

• Relies on research it does not sponsor
• Publishes journals that have value only because of the free contributions to them
• Makes a 37% profit margin off of that free content

There is piracy but Jason fails to point to Elsevier as the pirate.

Sci-Hub/Alexandra Elbakyan is re-distributing intellectual property that was stolen by Elsevier from the academic community, for its own gain.

It’s time to bring Elsevier’s reign of terror against the academic community to an end. Support Sci-Hub in any way possible.

### Reproducible Research Resources for Research(ing) Parasites

Friday, June 3rd, 2016

From the post:

Two new research papers on scabies and tapeworms published today showcase a new collaboration with protocols.io. This demonstrates a new way to share scientific methods that allows scientists to better repeat and build upon these complicated studies on difficult-to-study parasites. It also highlights a new means of writing all research papers with citable methods that can be updated over time.

While there has been recent controversy (and hashtags in response) from some of the more conservative sections of the medical community calling those who use or build on previous data “research parasites”, as data publishers we strongly disagree with this. And also feel it is unfair to drag parasites into this when they can teach us a thing or two about good research practice. Parasitology remains a complex field given the often extreme differences between parasites, which all fall under the umbrella definition of an organism that lives in or on another organism (host) and derives nutrients at the host’s expense. Published today in GigaScience are articles on two parasitic organisms, scabies and on the tapeworm Schistocephalus solidus. Not only are both papers in parasitology, but the way in which these studies are presented showcase a new collaboration with protocols.io that provides a unique means for reporting the Methods that serves to improve reproducibility. Here the authors take advantage of their open access repository of scientific methods and a collaborative protocol-centered platform, and we for the first time have integrated this into our submission, review and publication process. We now also have a groups page on the portal where our methods can be stored.

A great example of how sharing data advances research.

Such self-centered as opposed to research-centered individuals do exist, but I would not malign true parasites by describing them as such, even colloquially.

The days of science data hoarders are numbered and one can only hope that the same is true for the “gatekeepers” of humanities data, manuscripts and artifacts.

The only known contribution of hoarders or “gatekeepers” has been to the retarding of their respective disciplines.

Given the choice of advancing your field along with yourself, or only yourself, which one will you choose?

### First Pirate – Sci-Hub?

Wednesday, February 10th, 2016

Sci-Hub romanticizes itself as:

Sci-Hub the first pirate website in the world to provide mass and public access to tens of millions of research papers. (from the about page)

I agree with:

But Sci-Hub is hardly:

…the first pirate website in the world

I don’t remember the first gate-keeping publisher that went from stealing from the public in print to stealing from the public online.

With careful enough research I’m sure we could track that down but I’m not sure it matters at this point.

What we do know is that academic research is funded by the public, edited and reviewed by volunteers (to the extent it is reviewed at all), and then kept from the vast bulk of humanity for profit and status (gate-keeping).

It’s heady stuff to think of yourself as a bold and swashbuckling pirate, going to stick it “…to the man.”

However, gate-keeping publishers have developed stealing from the public to an art form. If you don’t believe me, take a brief look at the provisions in the Trans-Pacific Partnership that protect traditional publisher interests.

Recovering what has been stolen from the public isn’t theft at all, it’s restoration!

Allow gate-keeping publishers to slowly, hopefully painfully, wither as opportunities for exploiting the public grow fewer and farther between.

PS: You need to read: Meet the Robin Hood of Science by Simon Oxenham to get the full background on Sci-Hub and an extraordinary person, Alexandra Elbakyan.

### rOpenSci (updated tutorials) [Learn Something, Write Something]

Monday, January 4th, 2016

rOpenSci has updated 16 of its tutorials!

More are on the way!

Need a detailed walk through of what our packages allow you to do? Click on a package below, quickly install it and follow along. We’re in the process of updating existing package tutorials and adding several more in the coming weeks. If you find any bugs or have comments, drop a note in the comments section or send us an email. If a tutorial is available in multiple languages we indicate that with badges, e.g., (English) (Português).

• alm    Article-level metrics
• antweb    AntWeb data
• bold    Barcode data
• ecoengine    Biodiversity data
• ecoretriever    Retrieve ecological datasets
• elastic    Elasticsearch R client
• fulltext    Text mining client
• geojsonio    GeoJSON/TopoJSON I/O
• gistr    Work w/ GitHub Gists
• internetarchive    Internet Archive client
• lawn    Geospatial Analysis
• rAltmetric    Altmetric.com client
• rbison    Biodiversity data from USGS
• rcrossref    Crossref client
• rebird    eBird client
• rentrez    Entrez client
• rerddap    ERDDAP client
• rfisheries    OpenFisheries.org client
• rgbif    GBIF biodiversity data
• rinat    Inaturalist data
• RNeXML    Create/consume NeXML
• rnoaa    Client for many NOAA datasets
• rplos    PLOS text mining
• rsnps    SNP data access
• rvertnet    VertNet.org biodiversity data
• rWBclimate    World Bank Climate data
• solr    SOLR database client
• spocc    Biodiversity data one stop shop
• taxize    Taxonomic toolbelt
• traits    Trait data
• treebase    Treebase data
• wellknown    Well-known text <-> GeoJSON
• More tutorials on the way.

Good documentation is hard to come by and good tutorials even more so.

Yet, here at rOpenSci you will find thirty-four (34) tutorials and more on the way.

Let’s answer that moronic security saying: See Something, Say Something, with:

Learn Something, Write Something.

### How journals could “add value”

Thursday, May 28th, 2015

How journals could “add value” by Mark Watson.

From the post:

I wrote a piece for Genome Biology, you may have read it, about open science. I said a lot of things in there, but one thing I want to focus on is how journals could “add value”. As brief background: I think if you’re going to make money from academic publishing (and I have no problem if that’s what you want to do), then I think you should “add value”. Open science and open access is coming: open access journals are increasingly popular (and cheap!), preprint servers are more popular, green and gold open access policies are being implemented etc etc. Essentially, people are going to stop paying to access research articles pretty soon – think 5-10 year time frame.

So what can journals do to “add value”? What can they do that will make us want to pay to access them? Here are a few ideas, most of which focus on going beyond the PDF:

Humanities journals and their authors should take heed of these suggestions.

Not applicable in every case but certainly better than “journal editorial board as resume padding.”

### NIH-led effort launches Big Data portal for Alzheimer’s drug discovery

Tuesday, March 10th, 2015

NIH-led effort launches Big Data portal for Alzheimer’s drug discovery

From the post:

A National Institutes of Health-led public-private partnership to transform and accelerate drug development achieved a significant milestone today with the launch of a new Alzheimer’s Big Data portal — including delivery of the first wave of data — for use by the research community. The new data sharing and analysis resource is part of the Accelerating Medicines Partnership (AMP), an unprecedented venture bringing together NIH, the U.S. Food and Drug Administration, industry and academic scientists from a variety of disciplines to translate knowledge faster and more successfully into new therapies.

The opening of the AMP-AD Knowledge Portal and release of the first wave of data will enable sharing and analyses of large and complex biomedical datasets. Researchers believe this approach will ramp up the development of predictive models of Alzheimer’s disease and enable the selection of novel targets that drive the changes in molecular networks leading to the clinical signs and symptoms of the disease.

“We are determined to reduce the cost and time it takes to discover viable therapeutic targets and bring new diagnostics and effective therapies to people with Alzheimer’s. That demands a new way of doing business,” said NIH Director Francis S. Collins, M.D., Ph.D. “The AD initiative of AMP is one way we can revolutionize Alzheimer’s research and drug development by applying the principles of open science to the use and analysis of large and complex human data sets.”

Developed by Sage Bionetworks, a Seattle-based non-profit organization promoting open science, the portal will house several waves of Big Data to be generated over the five years of the AMP-AD Target Discovery and Preclinical Validation Project by multidisciplinary academic groups. The academic teams, in collaboration with Sage Bionetworks data scientists and industry bioinformatics and drug discovery experts, will work collectively to apply cutting-edge analytical approaches to integrate molecular and clinical data from over 2,000 postmortem brain samples.

Big data and open science, now that sounds like a winning combination:

Because no publication embargo is imposed on the use of the data once they are posted to the AMP-AD Knowledge Portal, it increases the transparency, reproducibility and translatability of basic research discoveries, according to Suzana Petanceska, Ph.D., NIA’s program director leading the AMP-AD Target Discovery Project.

“The era of Big Data and open science can be a game-changer in our ability to choose therapeutic targets for Alzheimer’s that may lead to effective therapies tailored to diverse patients,” Petanceska said. “Simply stated, we can work more effectively together than separately.”

Imagine that, academics who aren’t hoarding data for recruitment purposes.

Works for me!

Does it work for you?

### Open science in machine learning

Wednesday, February 26th, 2014

Open science in machine learning by Joaquin Vanschoren, Mikio L. Braun, and Cheng Soon Ong.

Abstract:

We present OpenML and mldata, open science platforms that provides easy access to machine learning data, software and results to encourage further study and application. They go beyond the more traditional repositories for data sets and software packages in that they allow researchers to also easily share the results they obtained in experiments and to compare their solutions with those of others.

From section 2, OpenML:

OpenML (http://openml.org) is a website where researchers can share their data sets, implementations and experiments in such a way that they can easily be found and reused by others. It offers a web API through which new resources and results can be submitted automatically, and is being integrated in a number of popular machine learning and data mining platforms, such as Weka, RapidMiner, KNIME, and data mining packages in R, so that new results can be submitted automatically. Vice versa, it enables researchers to easily search for certain results (e.g. evaluations of algorithms on a certain data set), to directly compare certain techniques against each other, and to combine all submitted data in advanced queries.

From section 3, mldata:

mldata (http://mldata.org) is a community-based website for the exchange of machine learning data sets. Data sets can either be raw data files or collections of files, or use one of the supported file formats like HDF5 or ARFF in which case mldata looks at meta data contained in the files to display more information. Similar to OpenML, mldata can define learning tasks based on data sets, where mldata currently focuses on supervised learning data. Learning tasks identify which features are used for input and output and also which score is used to evaluate the functions. mldata also allows to create learning challenges by grouping learning tasks together, and lets users submit results in the form of predicted labels which are then automatically evaluated.
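To make the "learning task" idea concrete, here is a minimal Python sketch of what such a task might look like: it names the input features, the output feature, and the score used to judge submitted labels. This is only an illustration under my own assumptions. The `LearningTask` class, the `accuracy` function, and the sample labels are all hypothetical and are not mldata's actual schema or API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


def accuracy(truth: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of predicted labels that match the true labels."""
    if len(truth) != len(predicted):
        raise ValueError("submission must contain one label per test example")
    if not truth:
        raise ValueError("task has no test examples")
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


@dataclass(frozen=True)
class LearningTask:
    """Hypothetical task definition; not mldata's real data model."""
    name: str
    input_features: Tuple[str, ...]  # columns a model may read
    output_feature: str              # column a model must predict
    score: Callable[[Sequence[str], Sequence[str]], float] = accuracy

    def evaluate(self, truth: Sequence[str], submission: Sequence[str]) -> float:
        # Apply the task's chosen score to a submitted list of predicted labels.
        return self.score(truth, submission)


# Made-up example values, for illustration only.
task = LearningTask(
    name="example-species-task",
    input_features=("sepal_length", "sepal_width"),
    output_feature="species",
)
true_labels = ["setosa", "setosa", "virginica", "versicolor"]
submitted = ["setosa", "virginica", "virginica", "versicolor"]
print(task.evaluate(true_labels, submitted))  # prints 0.75
```

Keeping the score inside the task definition is the point: every submission to the same task is judged by the same measure, which is what makes results comparable.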

Interesting sites.

Does raise the question of who will index the indexers of datasets?

I first saw this in a tweet by Stefano Bertolo.

### Scientific Data

Friday, January 31st, 2014

Scientific Data

From the homepage:

Scientific Data is a new open-access, online-only publication for descriptions of scientifically valuable datasets. It introduces a new type of content called the Data Descriptor designed to make your data more discoverable, interpretable and reusable. Scientific Data is currently calling for submissions, and will launch in May 2014.

The Data Descriptors are described in more detail in Metadata associated with Data Descriptor articles to be released under CC0 waiver with this overview:

Box 1. Overview of information in Data Descriptor metadata

Metadata files will be released in the ISA-Tab format, and potentially in other formats in the future, such as Linked Data. An example metadata file is available here, associated with one of our sample Data Descriptors. The information in these files is designed to be a machine-readable supplement to the main Data Descriptor article.

• Article citation information: Manuscript title, Author list, DOI, publication date, etc
• Subject terms: according to NPG’s own subject categorization system
• Annotation of the experimental design and main technologies used: Annotation terms will be derived from community-based ontologies wherever possible. Fields are derived from the ISA framework and include: Design Type, Measurement Type, Factors, Technology Type, and Technology Platform.
• Information about external data records: Names of the data repositories, data record accession or DOIs, and links to the externally-stored data records
• Structured tables that provide a detailed accounting of the experimental samples and data-producing assays, including characteristics of samples or subjects of the study, such as species name and tissue type, described using standard terminologies.
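As a rough picture of what the five kinds of information in Box 1 might look like as a machine-readable record, here is a small Python sketch. It is not ISA-Tab (the format the metadata files actually use) and not NPG's schema. The class names, field names, and every sample value are hypothetical placeholders chosen to mirror the list above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ExternalRecord:
    repository: str  # name of the data repository
    accession: str   # accession number or DOI of the data record
    url: str         # link to the externally stored record


@dataclass
class DataDescriptorMetadata:
    # Article citation information
    title: str
    authors: List[str]
    doi: str
    # Subject terms
    subject_terms: List[str] = field(default_factory=list)
    # Annotation of experimental design and main technologies
    design_type: str = ""
    measurement_type: str = ""
    factors: List[str] = field(default_factory=list)
    technology_type: str = ""
    technology_platform: str = ""
    # Information about external data records
    external_records: List[ExternalRecord] = field(default_factory=list)
    # Structured sample table: one dict of characteristics per sample
    samples: List[Dict[str, str]] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested ExternalRecord entries to dicts.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; none of this describes a real article.
record = DataDescriptorMetadata(
    title="Example data descriptor",
    authors=["A. Author", "B. Author"],
    doi="10.0000/example",
    subject_terms=["example-subject"],
    design_type="example design",
    external_records=[
        ExternalRecord("ExampleRepo", "EX000001", "https://example.org/EX000001")
    ],
    samples=[{"species": "example species", "tissue": "example tissue"}],
)
print(record.to_json())
```

Note that the annotation fields are plain strings here, which is exactly where the maintenance question comes in: nothing in a record like this says which version of an ontology a term came from.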

For more information on the value of this structured content and how it relates to the narrative article-like content see this earlier blog post by our Honorary Academic Editor, Susanna-Assunta Sansone.

Nature is taking the lead in this effort, which should bring a sense of security to generations of researchers. Security in knowing Nature takes the rights of authors seriously but also knowing the results will be professional grade.

I am slightly concerned that there is no obvious mechanism for maintenance of “annotation terms” from community-based ontologies or other terms, as terminology changes over time. Change in the vocabulary of any discipline is too familiar to require citation. As those terms change, so will access to valuable historical resources.

Looking at the Advisory Panel, it is heavily weighted in favor of medical and biological sciences. Is there an existing publication that performs a similar function for data sets from physics, astronomy, botany, etc.?

I first saw this in a tweet by ChemConnector.

### Open Science Leaps Forward! (Johnson & Johnson)

Friday, January 31st, 2014

From the post:

Drug companies tend to be secretive, to say the least, about studies of their medicines. For years, negative trials would not even be published. Except for the U.S. Food and Drug Administration, nobody got to look at the raw information behind those studies. The medical data behind important drugs, devices, and other products was kept shrouded.

Today, Johnson & Johnson is taking a major step toward changing that, not only for drugs like the blood thinner Xarelto or prostate cancer pill Zytiga but also for the artificial hips and knees made for its orthopedics division or even consumer products. “You want to know about Listerine trials? They’ll have it,” says Harlan Krumholz of Yale University, who is overseeing the group that will release the data to researchers.

….

Here’s how the process will work: J&J has enlisted The Yale School of Medicine’s Open Data Access Project (YODA) to review requests from physicians to obtain data from J&J products. Initially, this will only include products from the drug division, but it will expand to include devices and consumer products. If YODA approves a request, raw, anonymized data will be provided to the physician. That includes not just the results of a study, but the results collected for each patient who volunteered for it with identifying information removed. That will allow researchers to re-analyze or combine that data in ways that would not have been previously possible.

….

Scientists can make a request for data on J&J drugs by going to www.clinicaltrialstudytransparency.com.

The ability to “…re-analyze or combine that data in ways that would not have been previously possible…” is the public benefit of Johnson & Johnson’s sharing of data.

With any luck, this will be the start of a general trend among drug companies.

Mappings of the semantics of such data sets should be contributed back to the Yale School of Medicine’s Open Data Access Project (YODA), to further enhance re-use of these data sets.