Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 21, 2016

Parasitic Re-use of Data? Institutionalizing Toadyism.

Filed under: Open Access,Open Data — Patrick Durusau @ 9:11 pm

Data Sharing by Dan L. Longo, M.D., and Jeffrey M. Drazen, M.D., N Engl J Med 2016; 374:276-277, January 21, 2016. DOI: 10.1056/NEJMe1516564.

This editorial in the New England Journal of Medicine advocates the following for re-use of medical data:


How would data sharing work best? We think it should happen symbiotically, not parasitically. Start with a novel idea, one that is not an obvious extension of the reported work. Second, identify potential collaborators whose collected data may be useful in assessing the hypothesis and propose a collaboration. Third, work together to test the new hypothesis. Fourth, report the new findings with relevant coauthorship to acknowledge both the group that proposed the new idea and the investigative group that accrued the data that allowed it to be tested. What is learned may be beautiful even when seen from close up.

I had to check my calendar to make sure April the 1st hadn’t slipped up on me.

This is one of the most bizarre and malignant proposals on data re-use that I have seen.

If you have an original idea, you have to approach other researchers as a supplicant, offering them the benefit of your idea, just for the chance to use their data in new and innovative ways?

Does that smack of a “good old boys/girls” club to you?

If anyone uses the term parasitic or parasite with regard to data re-use, be sure to respond with the question:

How much do dogs in the manger contribute to science?

That phenomenon is not unknown in the humanities or in biblical studies. There was a wave of very disgusting dissertations that began with “…X entrusted me with this fragment of the Dead Sea Scrolls….”

I suppose those professors knew better than I did whether their ability to attract students rested on merit or on their hoarding of original text fragments. You should judge them by their choices.

January 18, 2016

Illusory Truth (Illusory Publication)

Filed under: Crowd Sourcing,Open Access — Patrick Durusau @ 8:20 pm

On Known Unknowns: Fluency and the Neural Mechanisms of Illusory Truth by Wei-Chun Wang, et al. Journal of Cognitive Neuroscience, Posted Online January 14, 2016. (doi:10.1162/jocn_a_00923)

Abstract:

The “illusory truth” effect refers to the phenomenon whereby repetition of a statement increases its likelihood of being judged true. This phenomenon has important implications for how we come to believe oft-repeated information that may be misleading or unknown. Behavioral evidence indicates that fluency or the subjective ease experienced while processing a statement underlies this effect. This suggests that illusory truth should be mediated by brain regions previously linked to fluency, such as the perirhinal cortex (PRC). To investigate this possibility, we scanned participants with fMRI while they rated the truth of unknown statements, half of which were presented earlier (i.e., repeated). The only brain region that showed an interaction between repetition and ratings of perceived truth was PRC, where activity increased with truth ratings for repeated, but not for new, statements. This finding supports the hypothesis that illusory truth is mediated by a fluency mechanism and further strengthens the link between PRC and fluency.

Whether you are crowdsourcing the authoring of a topic map, measuring sentiment, or having content authored by known authors, you are unlikely to want it populated by illusory truths. That is, truths your sources would swear to but that are in fact false (from a certain point of view).

I would like to say more about what this article reports but it is an “illusory publication” that resides behind a paywall, so I don’t know what it says in fact.

Isn’t that ironic? An article on illusory truth that cannot substantiate its own claims. It can only repeat them.

I first saw this in a tweet by Stefano Bertolo.

December 30, 2015

Bloggers! Help Defend The Public Domain – Prepare To Host/Repost “Baby Blue”

Filed under: Intellectual Property (IP),Open Access,Public Data — Patrick Durusau @ 11:36 am

Harvard Law Review Freaks Out, Sends Christmas Eve Threat Over Public Domain Citation Guide by Mike Masnick.

From the post:

In the fall of 2014, we wrote about a plan by public documents guru Carl Malamud and law professor Chris Sprigman, to create a public domain book for legal citations (stay with me, this isn’t as boring as it sounds!). For decades, the “standard” for legal citations has been “the Bluebook” put out by Harvard Law Review, and technically owned by four top law schools. Harvard Law Review insists that this standard of how people can cite stuff in legal documents is covered by copyright. This seems nuts for a variety of reasons. A citation standard is just a method for how to cite stuff. That shouldn’t be copyrightable. But the issue has created ridiculous flare-ups over the years, with the fight between the Bluebook and the open source citation tool Zotero representing just one ridiculous example.

In looking over all of this, Sprigman and Malamud realized that the folks behind the Bluebook had failed to renew the copyright properly on the 10th edition of the book, which was published in 1958, meaning that that version of the book was in the public domain. The current version is the 19th edition, but there is plenty of overlap from that earlier version. Given that, Malamud and Sprigman announced plans to make an alternative to the Bluebook called Baby Blue, which would make use of the public domain material from 1958 (and, I’d assume, some of their own updates — including, perhaps, citations that it appears the Bluebook copied from others).

As soon as “Baby Blue” drops, one expects the Harvard Law Review with its hired thugs Ropes & Gray to swing into action against Carl Malamud and Chris Sprigman.

What if the world of bloggers evened those odds just a bit?

What if as soon as Baby Blue hits the streets, law bloggers, law librarian bloggers, free speech bloggers, open access bloggers, and any other bloggers all post Baby Blue to their sites and post it to file repositories?

I’m game.

Are you?

PS: If you think this sounds risky, ask yourself how much racial change would have happened in the South in the 1960s if Martin Luther King had marched alone.

July 28, 2015

Moving FASTR in the US Senate

Filed under: Open Access — Patrick Durusau @ 6:35 pm

Moving FASTR in the US Senate by Peter Suber.

From the post:

FASTR will go to markup tomorrow in the Senate Homeland Security and Governmental Affairs Committee (HSGAC).

Here’s a recap of my recent call-to-action post on FASTR, with some new details and background.
https://plus.google.com/+PeterSuber/posts/G2uebVhVtBv

FASTR is the strongest bill ever introduced in Congress requiring open access to federally-funded research.

We already have the 2008 NIH policy, but it only covers one agency. We already have the 2013 Obama directive requiring about two dozen federal agencies to adopt OA mandates, but the next President could rescind it.

FASTR would subsume and extend the NIH policy. FASTR would solidify the Obama directive by grounding these agency policies in legislation. Moreover, FASTR would strengthen the NIH policy and Obama directive by requiring reuse rights or open licensing. It has bipartisan support in both the House and the Senate.

FASTR has been introduced in two sessions of Congress (February 2013 and March 2015), and its predecessor, FRPAA (Federal Research Public Access Act), was introduced in three (May 2006, April 2010, February 2012). Neither FASTR nor FRPAA has gotten to the stage of markup and a committee vote. That’s why tomorrow’s markup is so big.

For the reasons why FASTR is stronger than the Obama directive, see my 2013 article comparing the two.
http://dash.harvard.edu/handle/1/10528299.


For steps you can take to support FASTR, see the action pages from the Electronic Frontier Foundation (EFF) and Scholarly Publishing and Academic Resources Coalition (SPARC).
https://action.eff.org/o/9042/p/dia/action/public/?action_KEY=9061
http://www.sparc.arl.org/advocacy/national/fastr

Even though I will be in a day long teleconference tomorrow, I will be contacting my Senators to support FASTR.

How about you?

June 17, 2015

Put Your Open Data Where Your Mouth Is (Deadline for Submission: 28 June 2015)

Filed under: Education,Open Access,Open Data — Patrick Durusau @ 4:03 pm

Open Data as Open Educational Resources – Case Studies: Call for Participation

From the call:

The Context:

Open Data is invaluable to support researchers, but we contend that open datasets used as Open Educational Resources (OER) can also be an invaluable asset for teaching and learning. The use of real datasets can enable a series of opportunities for students to collaborate across disciplines, to apply quantitative and qualitative methods, to understand good practices in data retrieval, collection and analysis, to participate in research-based learning activities which develop independent research, teamwork, critical and citizenship skills. (For more detail please see: http://education.okfn.org/the-21st-centurys-raw-material-using-open-data-as-open-educational-resources)

The Call:

We are inviting individuals and teams to submit case studies describing experiences in the use of open data as open educational resources. Proposals are open to everyone who would like to promote good practices in pedagogical uses of open data in an educational context. The selected case studies will be published in an open e-book (CC_BY_NC_SA) hosted by Open Knowledge Foundation Open Education Group http://education.okfn.org by mid September 2015.

Participation in the call requires the submission of a short proposal describing the case study (of around 500 words). All proposals must be written in English; however, the selected authors will have the opportunity to submit the case both in English and another language, as our aim is to support the adoption of good practices in the use of open data in different countries.

Key dates:

  • Deadline for submission of proposals (approx. 500 words): 28th June
  • Notification to accepted proposals: 5th of July
  • Draft case study submitted for review (1500 – 2000 words): 26th of July
  • Publication-ready deadline: 16th of August
  • Publication date: September 2015

If you have any questions or comments please contact us by filling the “contact the editors” box at the end of this form

Javiera Atenas https://twitter.com/jatenas
Leo Havemann https://twitter.com/leohavemann

http://www.idea-space.eu/idea/72/info

Use of open data implies a readiness to further the use of open data. One way to honor that implied obligation is to share with others your successes and, just as importantly, any failures in the use of open data in an educational context.

All too often we hear only a steady stream of success stories and wonder where others found such perfect students, assistants, and clean data to underlie their success, never realizing that their students, assistants, and data are no better and no worse than ours. The regular mis-steps, false starts, and outright wrong paths are omitted in the storytelling. For time’s sake, no doubt.

If you can, do participate in this effort, even if you only have a success story to relate. 😉

June 11, 2015

Don’t Think Open Access Is Important?…

Filed under: Open Access,Open Data — Patrick Durusau @ 2:39 pm

Don’t Think Open Access Is Important? It Might Have Prevented Much Of The Ebola Outbreak by Mike Masnick

From the post:

For years now, we’ve been talking up the importance of open access to scientific research. Big journals like Elsevier have generally fought against this at every point, arguing that its profits are more important than some hippy dippy idea around sharing knowledge. Except, as we’ve been trying to explain, it’s that sharing of knowledge that leads to innovation and big health breakthroughs. Unfortunately, it’s often pretty difficult to come up with a concrete example of what didn’t happen because of locked up knowledge. And yet, it appears we have one new example that’s rather stunning: it looks like the worst of the Ebola outbreak from the past few months might have been avoided if key research had been open access, rather than locked up.

That, at least, appears to be the main takeaway of a recent NY Times article by the team in charge of drafting Liberia’s Ebola recovery plan. What they found was that the original detection of Ebola in Liberia was held up by incorrect “conventional wisdom” that Ebola was not present in that part of Africa:

Mike goes on to point out that knowledge about Ebola in Liberia was published in pay-per-view medical journals, which would have been prohibitively expensive for Liberian doctors.

He has a valid point but how often do primary care physicians consult research literature? And would they have the search chops to find research from 1982?

I am very much in favor of open access, but open access on its own doesn’t guarantee that information will be found, or meaningfully used once it is accessed.

June 4, 2015

Open Review: Grammatical theory:…

Filed under: Grammar,Linguistics,Open Access,Peer Review — Patrick Durusau @ 2:22 pm

Open Review: Grammatical theory: From transformational grammar to constraint-based approaches by Stefan Müller (Author).

From the webpage:

This book is currently at the Open Review stage. You can help the author by making comments on the preliminary version: Part 1, Part 2. Read our user guide to get acquainted with the software.

This book introduces formal grammar theories that play a role in current linguistics or contributed tools that are relevant for current linguistic theorizing (Phrase Structure Grammar, Transformational Grammar/Government & Binding, Minimalism, Generalized Phrase Structure Grammar, Lexical Functional Grammar, Categorial Grammar, Head-Driven Phrase Structure Grammar, Construction Grammar, Tree Adjoining Grammar, Dependency Grammar). The key assumptions are explained and it is shown how each theory treats arguments and adjuncts, the active/passive alternation, local reorderings, verb placement, and fronting of constituents over long distances. The analyses are explained with German as the object language.

In a final part of the book the approaches are compared with respect to their predictions regarding language acquisition and psycholinguistic plausibility. The nativism hypothesis that claims that humans possess genetically determined innate language-specific knowledge is examined critically and alternative models of language acquisition are discussed. In addition this more general part addresses issues that are discussed controversially in current theory building such as the question whether flat or binary branching structures are more appropriate, the question whether constructions should be treated on the phrasal or the lexical level, and the question whether abstract, non-visible entities should play a role in syntactic analyses. It is shown that the analyses that are suggested in the various frameworks are often translatable into each other. The book closes with a section that shows how properties that are common to all languages or to certain language classes can be captured.

(emphasis in the original)

Part of walking the walk of open access means participating in open reviews as your time and expertise permits.

Even if grammar theory isn’t your field, professionally speaking, it will be good mental exercise to see another view of the world of language.

I am intrigued by the suggestion “It is shown that the analyses that are suggested in the various frameworks are often translatable into each other.” Shades of the application of category theory to linguistics? Mappings of identifications?

Reputation instead of obligation:…

Filed under: Open Access,Open Data,Transparency — Patrick Durusau @ 10:16 am

Reputation instead of obligation: forging new policies to motivate academic data sharing by Sascha Friesike, Benedikt Fecher, Marcel Hebing, and Stephanie Linek.

From the post:

Despite strong support from funding agencies and policy makers academic data sharing sees hardly any adoption among researchers. Current policies that try to foster academic data sharing fail, as they try to either motivate researchers to share for the common good or force researchers to publish their data. Instead, Sascha Friesike, Benedikt Fecher, Marcel Hebing, and Stephanie Linek argue that in order to tap into the vast potential that is attributed to academic data sharing we need to forge new policies that follow the guiding principle reputation instead of obligation.

In 1996, leaders of the scientific community met in Bermuda and agreed on a set of rules and standards for the publication of human genome data. What became known as the Bermuda Principles can be considered a milestone for the decoding of our DNA. These principles have been widely acknowledged for their contribution towards an understanding of disease causation and the interplay between the sequence of the human genome. The principles shaped the practice of an entire research field as it established a culture of data sharing. Ever since, the Bermuda Principles are used to showcase how the publication of data can enable scientific progress.

Considering this vast potential, it comes as no surprise that open research data finds prominent support from policy makers, funding agencies, and researchers themselves. However, recent studies show that it is hardly ever practised. We argue that the academic system is a reputation economy in which researchers are best motivated to perform activities if those pay in the form of reputation. Therefore, the hesitant adoption of data sharing practices can mainly be explained by the absence of formal recognition. And we should change this.

(emphasis in the original)

Understanding what motivates researchers to share data is an important step towards encouraging data sharing.

But at the same time, would we say that every researcher is as good as every other researcher at preparing data for sharing? At documenting data for sharing? At doing any number of tasks that aren’t really research, but just as important in order to share data?

Rather than focusing exclusively on researchers, funders should fund projects to include data sharing specialists who have the skills and interests necessary to share data effectively as part of a project’s output. Their reputations would be tied closely to the successful sharing of data, and researchers would gain reputation for the high-quality data that is shared. That would be a much better fit for the authors’ recommendation.

Or to put it differently, lecturing researchers on how they should spend their limited time and resources to satisfy your goals isn’t going to motivate anyone. “Pay the man!” (Richard Pryor in Silver Streak)

May 28, 2015

How journals could “add value”

Filed under: Open Access,Open Data,Open Science,Publishing — Patrick Durusau @ 1:57 pm

How journals could “add value” by Mark Watson.

From the post:

I wrote a piece for Genome Biology, you may have read it, about open science. I said a lot of things in there, but one thing I want to focus on is how journals could “add value”. As brief background: I think if you’re going to make money from academic publishing (and I have no problem if that’s what you want to do), then I think you should “add value”. Open science and open access is coming: open access journals are increasingly popular (and cheap!), preprint servers are more popular, green and gold open access policies are being implemented etc etc. Essentially, people are going to stop paying to access research articles pretty soon – think 5-10 year time frame.

So what can journals do to “add value”? What can they do that will make us want to pay to access them? Here are a few ideas, most of which focus on going beyond the PDF:

Humanities journals and their authors should take heed of these suggestions.

Not applicable in every case but certainly better than “journal editorial board as resume padding.”

January 27, 2015

Nature: A recap of a successful year in open access, and introducing CC BY as default

Filed under: Open Access,Open Data,Publishing — Patrick Durusau @ 1:57 pm

A recap of a successful year in open access, and introducing CC BY as default by Carrie Calder, the Director of Strategy for Open Research, Nature Publishing Group/Palgrave Macmillan.

From the post:

We’re pleased to start 2015 with an announcement that we’re now using Creative Commons Attribution license CC BY 4.0 as default. This will apply to all of the 18 fully open access journals Nature Publishing Group owns, and will also apply to any future titles we launch. Two society-owned titles have introduced CC BY as default today and we expect to expand this in the coming months.

This follows a transformative 2014 for open access and open research at Nature Publishing Group. We’ve always been supporters of new technologies and open research (for example, we’ve had a liberal self-archiving policy in place for ten years now. In 2013 we had 65 journals with an open access option) but in 2014 we:

  • Built a dedicated team of over 100 people working on Open Research across journals, books, data and author services
  • Conducted research on whether there is an open access citation benefit, and researched authors’ views on OA
  • Introduced the Nature Partner Journal series of high-quality open access journals and announced our first ten NPJs
  • Launched Scientific Data, our first open access publication for Data Descriptors
  • And last but not least switched Nature Communications to open access, creating the first Nature-branded fully open access journal

We did this not because it was easy (trust us, it wasn’t always) but because we thought it was the right thing to do. And because we don’t just believe in open access; we believe in driving open research forward, and in working with academics, funders and other publishers to do so. It’s obviously making a difference already. In 2013, 38% of our authors chose to publish open access immediately upon publication – in 2014, this percentage rose to 44%. Both Scientific Reports and Nature Communications had record years in terms of submissions for publication.

Open access is on its way to becoming the expected model for publishing. That isn’t to say that there aren’t economies and kinks to be worked out, but the fundamental principles of open access have been widely accepted.

Not everywhere of course. There are areas of scholarship that think self-isolation makes them important. They shun open access as an attack on their traditions of “Doctor Fathers” and treat access to original materials as a privilege, strategies that make them all the more irrelevant in the modern world. A pity, because there is so much they could contribute to the public conversation. But a public conversation means you are not insulated from questions that don’t accept “because I say so” as an adequate answer.

If you are working in such an area or know of one, press for emulation of Nature and the many other efforts to provide open access to both primary and secondary materials. There are many areas of the humanities that already follow that model, but not all. Let’s keep pressing until open access is the default for all disciplines.

Kudos to Nature for their ongoing efforts on open access.

I first saw the news about Nature’s post in a tweet by Ethan White.

December 6, 2014

Why my book can be downloaded for free

Filed under: Open Access,Perl,Publishing — Patrick Durusau @ 6:49 am

Why my book can be downloaded for free by Mark Dominus.

From the post:

People are frequently surprised that my book, Higher-Order Perl, is available as a free download from my web site. They ask if it spoiled my sales, or if it was hard to convince the publisher. No and no.

I sent the HOP proposal to five publishers, expecting that two or three would turn it down, and that I would pick from the remaining two or three, but somewhat to my dismay, all five offered to publish it, and I had to decide who.

One of the five publishers was Morgan Kaufmann. I had never heard of Morgan Kaufmann, but one day around 2002 I was reading the web site of Philip Greenspun. Greenspun was incredibly grouchy. He found fault with everything. But he had nothing but praise for Morgan Kaufmann. I thought that if Morgan Kaufmann had pleased Greenspun, who was nearly impossible to please, then they must be really good, so I sent them the proposal. (They eventually published the book, and did a superb job; I have never regretted choosing them.)

But not only Morgan Kaufmann but four other publishers had offered to publish the book. So I asked a number of people for advice. I happened to be in London one week and Greenspun was giving a talk there, which I went to see. After the talk I introduced myself and asked for his advice about picking the publisher.

Access to “free” electronic versions is on its way to becoming a norm, at least with some computer science publishers. Cambridge University Press (CUP), with Data Mining and Analysis: Fundamental Concepts and Algorithms and Basic Category Theory, comes to mind.

Other publishers with similar policies? Yes, I know there are CS publishers who want to make free with content of others, not so much with their own. Not the same thing.

I first saw this in a tweet by Julia Evans.

December 2, 2014

GiveDirectly (Transparency)

Filed under: Open Access,Open Data,Transparency — Patrick Durusau @ 3:53 pm

GiveDirectly

From the post:

Today we’re launching a new website for GiveDirectly—the first major update since www.givedirectly.org went live in 2011.

Our main goal in reimagining the site was to create radical transparency into what we do and how well we do it. We’ve invested a lot to integrate cutting-edge technology into our field model so that we have real-time data to guide internal management. Why not open up that same data to the public? All we needed were APIs to connect the website and our internal field database (which is powered by our technology partner, Segovia).

Transparency is of course a non-profit buzzword, but I usually see it used in reference to publishing quarterly or annual reports, packaged for marketing purposes—not the kind of unfiltered data and facts I want as a donor. We wanted to use our technology to take transparency to an entirely new level.

Two features of the new site that I’m most excited about:

First, you can track how we’re doing on our most important performance metrics, at the same time we do. For example, the performance chart on the home page mirrors the dashboard we use internally to track performance in the field. If recipients aren’t understanding our program, you’ll learn about it when we do. If the follow-up team falls behind or outperforms, metrics will update accordingly. We want to be honest about our successes and failures alike.

Second, you can verify our claims about performance. We don’t think you should have to trust that we’re giving you accurate information. Each “Verify this” tag downloads a csv file with the underlying raw data (anonymized). Every piece of data is generated by a GiveDirectly staff member’s work in the field and is stored using proprietary software; it’s our end-to-end model in action. Explore the data for yourself and absolutely question us on what you find.
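If you take GiveDirectly up on that “Verify this” offer, exploring one of the downloaded CSVs takes only a few lines. A minimal sketch in Python follows; the file name and the status column are hypothetical, since they depend on which metric you download, so inspect the header row first.

import csv
from collections import Counter

# Hypothetical file and column names; check the header row of the CSV you download.
with open("givedirectly_follow_up.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(f"{len(rows)} records; columns: {list(rows[0]) if rows else 'none'}")

# Tally a (hypothetical) status column to see how follow-ups are going.
status_counts = Counter(row.get("status", "unknown") for row in rows)
for status, count in status_counts.most_common():
    print(f"{status:<20} {count}")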

Tis the season for soliciting donations, by every known form of media.

Suggestion: Copy and print out this response:

___________________________, I would love to donate to your worthy cause but before I do, please send a weblink to the equivalent of: http://www.givedirectly.org. Wishing you every happiness this holiday season.

___________________________

Where no response or no equivalent website = no donation.

I first saw this in a tweet by Stefano Bertolo.

Nature makes all articles free to view [pay-to-say]

Filed under: Open Access,Publishing — Patrick Durusau @ 11:59 am

Nature makes all articles free to view by Richard Van Noorden.

From the post:

All research papers from Nature will be made free to read in a proprietary screen-view format that can be annotated but not copied, printed or downloaded, the journal’s publisher Macmillan announced on 2 December.

The content-sharing policy, which also applies to 48 other journals in Macmillan’s Nature Publishing Group (NPG) division, including Nature Genetics, Nature Medicine and Nature Physics, marks an attempt to let scientists freely read and share articles while preserving NPG’s primary source of income — the subscription fees libraries and individuals pay to gain access to articles.

ReadCube, a software platform similar to Apple’s iTunes, will be used to host and display read-only versions of the articles’ PDFs. If the initiative becomes popular, it may also boost the prospects of the ReadCube platform, in which Macmillan has a majority investment.

Annette Thomas, chief executive of Macmillan Science and Education, says that under the policy, subscribers can share any paper they have access to through a link to a read-only version of the paper’s PDF that can be viewed through a web browser. For institutional subscribers, that means every paper dating back to the journal’s foundation in 1869, while personal subscribers get access from 1997 on.

Anyone can subsequently repost and share this link. Around 100 media outlets and blogs will also be able to share links to read-only PDFs. Although the screen-view PDF cannot be printed, it can be annotated — which the publisher says will provide a way for scientists to collaborate by sharing their comments on manuscripts. PDF articles can also be saved to a free desktop version of ReadCube, similarly to how music files can be saved in iTunes.

I am hopeful that Macmillan will discover that allowing copying and printing is no threat to its income stream. Both are means of advertising for its journals at the expense of the user who copies a portion of the text for a citation or shares a printed copy with a colleague. Advertising paid for by users should be considered a plus.

The annotation step is a good one, although I would modify it in some respects. First, I would make all articles accessible by default with annotation capabilities. Then I would grant anyone who registers, say, twelve comments per year for free and offer a lower-than-subscription-cost option for more than twelve comments on articles.

If there is one thing I suspect users would be willing to pay for, it is the right to respond to others in their fields, whether to articles and/or to other comments. Think of it as a pay-to-say market strategy.

It could be an “additional” option to current institutional and personal subscriptions and thus an entirely new revenue stream for Macmillan.

To head off expected objections by “free speech” advocates, I note that no journal publishes every letter to the editor. The right to free speech has never included the right to be heard on someone else’s dime. Annotation of Nature is on Macmillan’s dime.

November 30, 2014

Gates Foundation champions open access

Filed under: Funding,Open Access — Patrick Durusau @ 11:26 am

Gates Foundation champions open access by Rebecca Trager.

From the post:

The Bill & Melinda Gates Foundation, based in Washington, US, has adopted a new policy that requires free, unrestricted access and reuse of all peer-reviewed published research that the foundation funds, including any underlying data sets.

The policy, announced last week, applies to all of the research that the Gates Foundation funds entirely or partly, and will come into effect on 1 January, 2015. Specifically, the new rule dictates that published research be made available under a ‘Creative Commons’ generic license, which means that it can be copied, redistributed, amended and commercialised. During a two-year transition period, the foundation will allow publishers a 12 month embargo period on access to their research papers and data sets.

If other science and humanities sponsors follow Gates, nearly universal open access will be an accomplished fact by the end of the decade.

There will be wailing and gnashing of teeth by those who expected protectionism to further their careers at the expense of the public. I can bear their discomfort with a great deal of equanimity. Can’t you?

November 28, 2014

Open Access and the Humanities…

Filed under: Funding,Open Access — Patrick Durusau @ 7:11 pm

Open Access and the Humanities: Contexts, Controversies and the Future by Martin Paul Eve.

From the description:

If you work in a university, you are almost certain to have heard the term ‘open access’ in the past couple of years. You may also have heard either that it is the utopian answer to all the problems of research dissemination or perhaps that it marks the beginning of an apocalyptic new era of ‘pay-to-say’ publishing. In this book, Martin Paul Eve sets out the histories, contexts and controversies for open access, specifically in the humanities. Broaching practical elements alongside economic histories, open licensing, monographs and funder policies, this book is a must-read for both those new to ideas about open-access scholarly communications and those with an already keen interest in the latest developments for the humanities.

Open access to a book on open access!

I was very amused by Gary F. Daught’s comment on the title:

“Open access for scholarly communication in the Humanities faces some longstanding cultural/social and economic challenges. Deep traditions of scholarly authority, reputation and vetting, relationships with publishers, etc. coupled with relatively shallow pockets in terms of funding (at least compared to the Sciences) and perceptions that the costs associated with traditional modes of scholarly communication are reasonable (at least compared to the Sciences) can make open access a hard sell. Still, there are new opportunities and definite signs of change. Among those at the forefront confronting these challenges while exploring open access opportunities for the Humanities is Martin Paul Eve.”

In part because Gary worded his description of the humanities as: “Deep traditions of scholarly authority, reputation and vetting, relationships with publishers,…” which is true, but is a nice way of saying:

Controlling access to the Dead Sea Scrolls was a great way to attract graduate students to professors and certain universities.

Controlling access to the Dead Sea Scrolls was a great way to avoid criticism of work by denying others access to the primary materials.

Substitute current access issues to data, both in the humanities and sciences, for “Dead Sea Scrolls” and you have a similar situation.

I mention the Dead Sea Scrolls case because, after retarding scholarship for decades, the materials are more or less accessible now. The sky hasn’t fallen, newspapers aren’t filled with bad translations, salvation hasn’t been denied (so far as we know) to anyone holding incorrect theological positions due to bad work on the Dead Sea Scrolls.

A good read but I have to differ with Martin on his proposed solution to the objection that open access has no peer review.

Unfortunately Martin treats concerns about peer review as though they were rooted in empirical experience such that contrary experimental results will lead to a different conclusion.

I fear that Martin overlooks that peer review is a religious belief and can no more be diminished by contrary evidence than transubstantiation. Consider all the peer review scandals you have read or heard about in the past year. Has that diminished anyone’s faith in peer review? What about the fact that in the humanities, up to 98% of all monographs remain uncited after a decade?

Assuming peer review is supposed to assure the quality of publishing, a reasonable person would conclude that the 98% of published work that goes uncited either wasn’t worth writing about and/or that peer review was no guarantor of quality.

The key to open access is for publishing and funding organizations to mandate open access to data used in research and/or publication. No exceptions, no “available on request,” only deposits in open access archives.

Scholars who have assessed themselves as needing the advantages of non-open access data will be unhappy, but I can’t say that matters all that much to me.

You?

I first saw this in a tweet by Martin Haspelmath.

November 12, 2014

Preventing Future Rosetta “Tensions”

Filed under: Astroinformatics,Open Access,Open Data — Patrick Durusau @ 2:34 pm

Tensions surround release of new Rosetta comet data by Eric Hand.

From the post:


For the Rosetta mission, there is an explicit tension between satisfying the public with new discoveries and allowing scientists first crack at publishing papers based on their own hard-won data. “There is a tightrope there,” says Taylor, who’s based at ESA’s European Space Research and Technology Centre (ESTEC) in Noordwijk, the Netherlands. But some ESA officials are worried that the principal investigators for the spacecraft’s 11 instruments are not releasing enough information. In particular, the camera team, led by principal investigator Holger Sierks, has come under special criticism for what some say is a stingy release policy. “It’s a family that’s fighting, and Holger is in the middle of it, because he holds the crown jewels,” says Mark McCaughrean, an ESA senior science adviser at ESTEC.

Allowing scientists to withhold data for some period is not uncommon in planetary science. At NASA, a 6-month period is typical for principal investigator–led spacecraft, such as the MESSENGER mission to Mercury, says James Green, the director of NASA’s planetary science division in Washington, D.C. However, Green says, NASA headquarters can insist that the principal investigator release data for key media events. For larger strategic, or “flagship,” missions, NASA has tried to release data even faster. The Mars rovers, such as Curiosity, have put out images almost as immediately as they are gathered.

Sierks, of the Max Planck Institute for Solar System Research in Göttingen, Germany, feels that the OSIRIS team has already been providing a fair amount of data to the public—about one image every week. Each image his team puts out is better than anything that has ever been seen before in comet research, he says. Furthermore, he says other researchers, unaffiliated with the Rosetta team, have submitted papers based on these released images, while his team has been consumed with the daily task of planning the mission. After working on OSIRIS since 1997, Sierks feels that his team should get the first shot at using the data.

“Let’s give us a chance of a half a year or so,” he says. He also feels that his team has been pressured to release more data than other instruments. “Of course there is more of a focus on our instrument,” which he calls “the eyes of the mission.”

What if there were another solution to the Rosetta “tensions” besides 1) privileging researchers with six (6) months of exclusive access to data or 2) releasing data as soon as it is gathered?

I am sure everyone can gather arguments for one or the other of those sides but either gathering or repeating them isn’t going to move the discussion forward.

What if there were an agreed-upon registry for data sets (not a repository but a registry) where researchers could register anticipated data and, when the data is acquired, record the date it was deposited to a public repository and a list of researchers entitled to publish using that data?

The set of publications in most subject areas is rather small, and if they agreed not to accept or review papers based upon registered data for six (6) months or some other agreed-upon period, that would enable researchers to release data as acquired and yet protect their opportunity for first use of the data for publication purposes.

This simple sketch leaves a host of details to explore and answer but registering data for publication delay could answer the concerns that surround publicly funded data in general.
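To make the sketch concrete, here is a minimal illustration in Python of what a registry entry and its publication-delay check might look like. Everything here is hypothetical: the field names, the six-month window, and the example values are purely for illustration.

from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class DataRegistration:
    """One entry in a hypothetical data set registry (a registry, not a repository)."""
    dataset_id: str          # e.g., a DOI or mission/instrument identifier
    repository_url: str      # where the public copy of the data actually lives
    deposited_on: date       # date the data reached the public repository
    entitled_authors: list = field(default_factory=list)  # researchers with first-use rights
    embargo_days: int = 180  # agreed publication delay (six months in this sketch)

    def open_for_publication(self, today: date, author: str) -> bool:
        """Journals could decline submissions from non-entitled authors until the embargo lapses."""
        if author in self.entitled_authors:
            return True
        return today >= self.deposited_on + timedelta(days=self.embargo_days)

# Hypothetical example: camera images deposited publicly on acquisition,
# but reserved for the instrument team's publications for six months.
entry = DataRegistration(
    dataset_id="rosetta/osiris/2014-11-12",
    repository_url="https://example.org/rosetta",  # placeholder URL
    deposited_on=date(2014, 11, 12),
    entitled_authors=["Sierks, H."],
)
print(entry.open_for_publication(date(2015, 1, 15), "Unaffiliated Researcher"))  # False
print(entry.open_for_publication(date(2015, 6, 1), "Unaffiliated Researcher"))   # True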

Thoughts?

October 28, 2014

On Excess: Susan Sontag’s Born-Digital Archive

Filed under: Archives,Library,Open Access,Preservation — Patrick Durusau @ 6:23 pm

On Excess: Susan Sontag’s Born-Digital Archive by Jeremy Schmidt & Jacquelyn Ardam.

From the post:


In the case of the Sontag materials, the end result of Deep Freeze and a series of other processing procedures is a single IBM laptop, which researchers can request at the Special Collections desk at UCLA’s Research Library. That laptop has some funky features. You can’t read its content from home, even with a VPN, because the files aren’t online. You can’t live-Tweet your research progress from the laptop — or access the internet at all — because the machine’s connectivity features have been disabled. You can’t copy Annie Leibovitz’s first-ever email — “Mat and I just wanted to let you know we really are working at this. See you at dinner. xxxxxannie” (subject line: “My first Email”) — onto your thumb drive because the USB port is locked. And, clearly, you can’t save a new document, even if your desire to type yourself into recent intellectual history is formidable. Every time it logs out or reboots, the laptop goes back to ground zero. The folders you’ve opened slam shut. The files you’ve explored don’t change their “Last Accessed” dates. The notes you’ve typed disappear. It’s like you were never there.

Despite these measures, real limitations to our ability to harness digital archives remain. The born-digital portion of the Sontag collection was donated as a pair of external hard drives, and that portion is composed of documents that began their lives electronically and in most cases exist only in digital form. While preparing those digital files for use, UCLA archivists accidentally allowed certain dates to refresh while the materials were in “thaw” mode; the metadata then had to be painstakingly un-revised. More problematically, a significant number of files open as unreadable strings of symbols because the software with which they were created is long out of date. Even the fully accessible materials, meanwhile, exist in so many versions that the hapless researcher not trained in computer forensics is quickly overwhelmed.

No one would dispute the need for an authoritative copy of Sontag‘s archive, or at least as close to authoritative as humanly possible. The heavily protected laptop makes sense to me, assuming that the archive considers that to be the authoritative copy.

What has me puzzled, particularly since there are binary formats not recognized in the archive, is why a non-authoritative copy of the archive isn’t online. Any number of people may still possess the software necessary to read the files and/or be able to decrypt the file formats. That would be a net gain to the archive if recovery could be practiced on a non-authoritative copy, since the archive may well encounter such files in the future.

After searching the Online Archive of California, I did encounter the Finding Aid for the Susan Sontag papers, ca. 1939-2004, which reports:

Restrictions Property rights to the physical object belong to the UCLA Library, Department of Special Collections. Literary rights, including copyright, are retained by the creators and their heirs. It is the responsibility of the researcher to determine who holds the copyright and pursue the copyright owner or his or her heir for permission to publish where The UC Regents do not hold the copyright.

Availability Open for research, with following exceptions: Boxes 136 and 137 of journals are restricted until 25 years after Susan Sontag’s death (December 28, 2029), though the journals may become available once they are published.

Unfortunately, this finding aid does not mention Sontag’s computer or the transfer of the files to a laptop. A search of Melvyl (library catalog) finds only one archival collection and that is the one mentioned above.

I have written to the special collections library for clarification and will update this post when an answer arrives.

I mention this collection because of Sontag’s importance for a generation and because digital archives will soon be the majority of cases. One hopes the standard practice will be to donate all rights to an archival repository to ensure its availability to future generations of scholars.

October 4, 2014

SIGGRAPH 2014 Open Access Conference Content

Filed under: Open Access — Patrick Durusau @ 10:23 am

SIGGRAPH 2014 Open Access Conference Content

From the webpage:

Starting with the SIGGRAPH 2014 conference, SIGGRAPH will not produce any printed or DVD-based documentation. Conference content (technical papers, course notes, etc.) will be available for free in the ACM Digital Library starting two weeks prior to the start of the conference, and will remain available for free until one week after the end of the conference. After the one-month “free access” period, and until the start of the next SIGGRAPH conference, this content will be available for free exclusively through the open access links below. ACM SIGGRAPH members always have free access to all SIGGRAPH-sponsored materials in the ACM Digital Library.

Bizarre wording but I can attest that SIGGRAPH Proceedings from 2013 are open access and SIGGRAPH 2014 commenced on 10 August 2014.

Ask the next slate of ACM candidates whether the leading CS organization in the world will become open access without qualifiers and exceptions.

September 9, 2014

PLOS Resources on Ebola

Filed under: Bioinformatics,Open Access,Open Data — Patrick Durusau @ 7:09 pm

PLOS Resources on Ebola by Virginia Barbour and PLOS Collections.

From the post:

The current Ebola outbreak in West Africa probably began in Guinea in 2013, but it was only recognized properly in early 2014 and shows, at the time of writing, no sign of subsiding. The continuous human-to-human transmission of this new outbreak virus has become increasingly worrisome.

Analyses thus far of this outbreak mark it as the most serious in recent years and the effects are already being felt far beyond those who are infected and dying; whole communities in West Africa are suffering because of its negative effects on health care and other infrastructures. Globally, countries far removed from the outbreak are considering their local responses, were Ebola to be imported; and the ripple effects on the normal movement of trade and people are just becoming apparent.

A great collection of PLOS resources on Ebola.

Even usually closed sources are making Ebola information available for free:

Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak (Science DOI: 10.1126/science.1259657) This is the gene sequencing report that establishes that one (1) person ate infected bush meat and is the source of all the following Ebola infections.

So much for needing highly specialized labs to “weaponize” biological agents. One infection is likely to result in > 20,000 deaths. You do the math.

I first saw this in a tweet by Alex Vespignani.

July 16, 2014

FDA Recall Data

Filed under: Government Data,Open Access — Patrick Durusau @ 6:53 pm

OpenFDA Provides Ready Access to Recall Data by Taha A. Kass-Hout.

From the post:

Every year, hundreds of foods, drugs, and medical devices are recalled from the market by manufacturers. These products may be labeled incorrectly or might pose health or safety issues. Most recalls are voluntary; in some cases they may be ordered by the U.S. Food and Drug Administration. Recalls are reported to the FDA, and compiled into its Recall Enterprise System, or RES. Every week, the FDA releases an enforcement report that catalogues these recalls. And now, for the first time, there is an Application Programming Interface (API) that offers developers and researchers direct access to all of the drug, device, and food enforcement reports, dating back to 2004.

The recalls in this dataset provide an illuminating window into both the safety of individual products and the safety of the marketplace at large. Recent reports have included such recalls as certain food products (for not containing the vitamins listed on the label), a soba noodle salad (for containing unlisted soy ingredients), and a pain reliever (for not following laboratory testing requirements).

You will get warnings that this data is “not for clinical use.”

Sounds like a treasure trove of data if you are looking for products still being sold despite being recalled.

Or if you want to advertise for “victims” of faulty products that have been recalled.

I think both of those are non-clinical uses. 😉
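If you want to poke at the recall data yourself, here is a minimal sketch of querying openFDA’s food enforcement endpoint from Python. The endpoint and field names follow openFDA’s public documentation as best I recall, so verify them against the current docs before relying on them.

import json
import urllib.parse
import urllib.request

# Hypothetical query against openFDA's food enforcement (recall) endpoint;
# see https://open.fda.gov/ for the authoritative syntax and field names.
BASE = "https://api.fda.gov/food/enforcement.json"
params = {"search": 'status:"Ongoing"', "limit": 5}
url = BASE + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url) as response:
    data = json.load(response)

for recall in data.get("results", []):
    print(recall.get("recall_initiation_date"),
          recall.get("classification"),
          recall.get("product_description", "")[:60])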

June 18, 2014

Elsevier open access mathematics

Filed under: Mathematics,Open Access — Patrick Durusau @ 10:20 am

Elsevier open access mathematics

From the webpage:

Elsevier has opened up the back archives of their mathematics journals. All articles older than 4 years are available under a license [1] [2]. This license is compatible with non-commercial redistribution, and so we have collected the PDFs and made them available here.

Each of the links below is for a torrent file; opening this in a suitable client (e.g. Transmission) will download that file. Unzipping that file creates a directory with all the PDFs, along with a copy of the relevant license file.

Although Elsevier opens their archives on a rolling basis, the collections below only contain articles up to 2009. We anticipate adding yearly updates.

You can download a zip file containing all of the torrents below, if you’d like the entire collection. You’ll need about 40GB of free space.

Excellent!

It occurs to me that this corpus is suitable for testing indexing and navigation of mathematical literature.
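As a starting point, here is a minimal sketch of walking an unpacked torrent directory and building a toy term index over the PDFs. It assumes the third-party pypdf library; the directory path is hypothetical.

from collections import defaultdict
from pathlib import Path

from pypdf import PdfReader  # pip install pypdf

corpus_dir = Path("elsevier-oa-math")  # hypothetical: wherever a torrent was unpacked
index = defaultdict(set)               # term -> set of file names

for pdf_path in corpus_dir.glob("**/*.pdf"):
    try:
        reader = PdfReader(pdf_path)
        text = " ".join(page.extract_text() or "" for page in reader.pages)
    except Exception as exc:            # some older or scanned PDFs will not parse
        print(f"skipping {pdf_path.name}: {exc}")
        continue
    for term in set(text.lower().split()):
        index[term].add(pdf_path.name)

# Which papers mention "cohomology"?
print(sorted(index["cohomology"])[:10])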

Is your favorite mathematics publisher following Elsevier’s lead?

I first saw this in a tweet by Stephen A. Goss.

June 2, 2014

openFDA

Filed under: Government,Government Data,Medical Informatics,Open Access,Open Data — Patrick Durusau @ 4:30 pm

openFDA

Not all the news out of government is bad.

Consider openFDA, which is putting

More than 3 million adverse drug event reports at your fingertips.

From the “about” page:

OpenFDA is an exciting new initiative in the Food and Drug Administration’s Office of Informatics and Technology Innovation spearheaded by FDA’s Chief Health Informatics Officer. OpenFDA offers easy access to FDA public data and highlights projects using these data in both the public and private sector to further regulatory or scientific missions, educate the public, and save lives.

What does it do?

OpenFDA provides API and raw download access to a number of high-value structured datasets. The platform is currently in public beta with one featured dataset, FDA’s publically available drug adverse event reports.

In the future, openFDA will provide a platform for public challenges issued by the FDA and a place for the community to interact with each other and FDA domain experts with the goal of spurring innovation around FDA data.

We’re currently focused on working on datasets in the following areas:

  • Adverse Events: FDA’s publically available drug adverse event reports, a database that contains millions of adverse event and medication error reports submitted to FDA covering all regulated drugs.
  • Recalls (coming soon): Enforcement Report and Product Recalls Data, containing information gathered from public notices about certain recalls of FDA-regulated products
  • Documentation (coming soon): Structured Product Labeling Data, containing detailed product label information on many FDA-regulated product

We’ll be releasing a number of updates and additional datasets throughout the upcoming months.

OK, I’m Twitter follower #522 @openFDA.

What’s your @openFDA number?

A good experience, i.e., people making good use of released data, asking for more data, etc., is what will drive more open data. Make every useful government data project count.
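As a taste of what is on offer, here is a minimal sketch using the API’s count feature to tally adverse event reports by reaction term. The endpoint and field path follow the openFDA documentation as I understand it; double-check them there before relying on them.

import json
import urllib.request

# Hypothetical count query against openFDA's drug adverse event endpoint.
url = ("https://api.fda.gov/drug/event.json"
       "?count=patient.reaction.reactionmeddrapt.exact")

with urllib.request.urlopen(url) as response:
    data = json.load(response)

for row in data.get("results", [])[:10]:
    print(f"{row['term']:<40} {row['count']}")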

May 20, 2014

Chinese Open Access

Filed under: Open Access — Patrick Durusau @ 10:19 am

Chinese agencies announce open-access policies by Richard Van Noorden. (Nature | doi:10.1038/nature.2014.15255)

From the post:

China has officially joined the international push to make research papers free to read. On 15 May, the National Natural Science Foundation of China (NSFC), one of the country’s major basic-science funding agencies, and the Chinese Academy of Sciences (CAS), which funds and conducts research at more than 100 institutions, announced that researchers they support should deposit their papers into online repositories and make them publicly accessible within 12 months of publication.

That’s certainly good news for data aggregation providers, using topic maps or no, but Richard’s post closes on an odd note:

Another unresolved issue is whether the 12-month grace period should also apply to articles in the social sciences and humanities — which are within the CAS’s purview but face “special challenges”, Zhang says.

Richard identifies Zhang as:

Xiaolin Zhang, director of the National Science Library at the CAS in Beijing, says that another major research-funding agency, the national ministry of science and technology, is also researching open-access policies.

Perhaps Richard will follow up with Zhang on what he means by “special challenges” for articles in the social sciences and the humanities.

I can’t imagine anyone thinking they are likely to obtain a patent based on social science or humanities research.

Open access would increase the opportunity for people outside of major academic institutions to read the latest social science and humanities research. I don’t see the downside to greater access to such articles.

Do you?

April 9, 2014

IRS Data?

Filed under: Government,Government Data,Open Access,Open Data — Patrick Durusau @ 7:45 pm

New, Improved IRS Data Available on OpenSecrets.org by Robert Maguire.

From the post:

Among the more than 160,000 comments the IRS received recently on its proposed rule dealing with candidate-related political activity by 501(c)(4) organizations, the Center for Responsive Politics was the only organization to point to deficiencies in a critical data set the IRS makes available to the public.

This month, the IRS released the newest version of that data, known as 990 extracts, which have been improved considerably. Now, the data is searchable and browseable on OpenSecrets.org.

“Abysmal” IRS data

Back in February, CRP had some tough words for the IRS concerning the information. In the closing pages of our comment on the agency’s proposed guidelines for candidate-related political activity, we wrote that “the data the IRS provides to the public — and the manner in which it provides it — is abysmal.”

While I am glad to see better access to 501(c) 990 data, in a very real sense, this isn’t “IRS data,” is it?

This is data that the government collected under penalty of law from tax entities in the United States.

Granted, it was sent in “voluntarily,” but there is a lot of data that entities and individuals send to local, state, and federal government “voluntarily.” Not all of it is data that most of us would want handed out because other people are curious.

As I said, I like better access to 990 data but we need to distinguish between:

  1. Government sharing data it collected from citizens or other entities, and
  2. Government sharing data about government meetings, discussions, contacts with citizens/contractors, policy making, processes and the like.

If I’m not seriously mistaken, most of the open data from government involves a great deal of #1 and very little of #2.

Is that your impression as well?

One quick example. The United States Congress, with some reluctance, seems poised to deliver near real-time information on legislative proposals before Congress. Which is a good thing.

But there has been no discussion of tracking the final editing of bills to trace the insertion or deletion of language, by whom and with whose agreement. Which is a bad thing.

It makes no difference how public the process is up to final edits, if the final version is voted upon before changes can be found and charged to those responsible.

April 2, 2014

Open Access Maps at NYPL

Filed under: Maps,Open Access — Patrick Durusau @ 3:47 pm

Open Access Maps at NYPL by Matt Knutzen, Stephen A. Schwarzman Building, Map Division.

From the post:

The Lionel Pincus & Princess Firyal Map Division is very proud to announce the release of more than 20,000 cartographic works as high resolution downloads. We believe these maps have no known US copyright restrictions.* To the extent that some jurisdictions grant NYPL an additional copyright in the digital reproductions of these maps, NYPL is distributing these images under a Creative Commons CC0 1.0 Universal Public Domain Dedication. The maps can be viewed through the New York Public Library’s Digital Collections page, and downloaded (!), through the Map Warper. First, create an account, then click a map title and go. Here’s a primer and more extended blog post on the warper.

…image omitted…

What’s this all mean?

It means you can have the maps, all of them if you want, for free, in high resolution. We’ve scanned them to enable their use in the broadest possible ways by the largest number of people.

Though not required, if you’d like to credit the New York Public Library, please use the following text “From The Lionel Pincus & Princess Firyal Map Division, The New York Public Library.” Doing so helps us track what happens when we release collections like this to the public for free under really relaxed and open terms. We believe our collections inspire all kinds of creativity, innovation and discovery, things the NYPL holds very dear.

In case you were unaware of it, librarians as a class have a very subversive agenda.

They want to provide as many people as much access to information as is possible.

People + information is a revolutionary mixture.

March 31, 2014

OpenAccessReader

Filed under: Open Access — Patrick Durusau @ 9:08 pm

OpenAccessReader

From the webpage:

Open Access Reader is a project to systematically ensure that all significant open access research is cited in Wikipedia.

There’s lots of great research being published in good quality open access journals that isn’t cited in Wikipedia. It’s peer reviewed, so it should count as a reliable source. It’s available for anyone to read and probably comes with pretty decent metadata too. Can we set up a process to make it super convenient for editors to find and cite these papers?

If you are looking for a project with the potential to make a real difference this year, check this one out.

They are looking for volunteers.
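As a rough sketch of the convenience the project is after, and only a sketch: the DOIs below are hypothetical stand-ins for whatever open access source is used, and the Wikipedia search API parameters are used as I understand them, not as part of the project’s actual tooling.

```python
import requests

# Stand-in list of DOIs from some open access source; the real project
# would pull these from journal or aggregator metadata.
open_access_dois = [
    "10.1371/journal.pone.0000000",   # hypothetical DOI
    "10.1186/s12915-000-0000-0",      # hypothetical DOI
]

WIKI_API = "https://en.wikipedia.org/w/api.php"

def cited_on_wikipedia(doi: str) -> bool:
    """Return True if a full-text search for the DOI finds any article."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f'"{doi}"',
        "format": "json",
        "srlimit": 1,
    }
    resp = requests.get(WIKI_API, params=params, timeout=30)
    resp.raise_for_status()
    hits = resp.json()["query"]["search"]
    return len(hits) > 0

for doi in open_access_dois:
    status = "already cited" if cited_on_wikipedia(doi) else "candidate for citation"
    print(doi, "->", status)
```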

March 30, 2014

Accessible Government vs. Open Government

Filed under: Government,Open Access,Open Government — Patrick Durusau @ 6:40 pm

Congressional Officials Grant Access Due To Campaign Contributions: A Randomized Field Experiment

Abstract:

Concern that lawmakers grant preferential treatment to individuals because they have contributed to political campaigns has long occupied jurists, scholars, and the public. However, the effects of campaign contributions on legislators’ behavior have proven notoriously difficult to assess. We report the first randomized field experiment on the topic. In the experiment, a political organization attempted to schedule meetings between 191 Members of Congress and their constituents who had contributed to political campaigns. However, the organization randomly assigned whether it informed legislators’ offices that individuals who would attend the meetings were contributors. Congressional offices made considerably more senior officials available for meetings when offices were informed the attendees were donors, with senior officials attending such meetings more than three times as often (p < 0.01). Influential policymakers thus appear to make themselves much more accessible to individuals because they have contributed to campaigns, even in the absence of quid pro quo arrangements. These findings have significant implications for ongoing legal and legislative debates. The hypothesis that individuals can command greater attention from influential policymakers by contributing to campaigns has been among the most contested explanations for how financial resources translate into political power. The simple but revealing experiment presented here elevates this hypothesis from extensively contested to scientifically supported.

Donors really are different from the rest of us: they have access.

One hopes the next randomized experiment distinguishes where the break points are in donations.

I suspect < $500 is one group, $500 - $1,000 is the second group, $1,000 - $2,500 is the third group, and so on. Just guesses on my part, but it would help the political process if potential donors had a bidding sheet for candidates.

You don’t want to appear foolish and pay too much for access to a junior member of Congress, but on the other hand, you don’t want to insult a senior member with too small a donation.

Think of it as transparency of access.

I first saw this at Full Text Reports.

March 22, 2014

Opening data: Have you checked your pipes?

Filed under: Data Mining,ETL,Open Access,Open Data — Patrick Durusau @ 7:44 pm

Opening data: Have you checked your pipes? by Bob Lannon.

From the post:

Code for America alum Dave Guarino had a post recently entitled “ETL for America”. In it, he highlights something that open data practitioners face with every new project: the problem of Extracting data from old databases, Transforming it to suit a new application or analysis and Loading it into the new datastore that will support that new application or analysis. Almost every technical project (and every idea for one) has this process as an initial cost. This cost is so pervasive that it’s rarely discussed by anyone except for the wretched “data plumber” (Dave’s term) who has no choice but to figure out how to move the important resources from one place to another.

Why aren’t we talking about it?

The up-front costs of ETL don’t come up very often in the open data and civic hacking community. At hackathons, in funding pitches, and in our definitions of success, we tend to focus on outputs (apps, APIs, visualizations) and treat the data preparation as a collateral task, unavoidable and necessary but not worth “getting into the weeds” about. Quoting Dave:

The fact that I can go months hearing about “open data” without a single mention of ETL is a problem. ETL is the pipes of your house: it’s how you open data.

It’s difficult to point to evidence that this is really the case, but I personally share Dave’s experience. To me, it’s still the elephant in the room during the proceedings of any given hackathon or open data challenge. I worry that the open data community is somehow under the false impression that, eventually in the sunny future, data will be released in a more clean way and that this cost will decrease over time.

It won’t. Open data might get cleaner, but no data source can evolve to the point where it serves all possible needs. Regardless of how easy it is to read, the data published by government probably wasn’t prepared with your new app idea in mind.

Data transformation will always be necessary, and it’s worth considering apart from the development of the next cool interface. It’s a permanent cost of developing new things in our space, so why aren’t we putting more resources toward addressing it as a problem in its own right? Why not spend some quality time (and money) focused on data preparation itself, and then let a thousand apps bloom?

If you only take away this line:

Open data might get cleaner, but no data source can evolve to the point where it serves all possible needs. (emphasis added)

from Bob’s entire post, reading it has been time well spent.

Your “clean data” will at times be my “dirty data” and vice versa.

Documenting the semantics we “see” in data, the semantics that drive our transformations into “clean” data, stands a chance of helping the next person in line to use that data.

Think of it as an accumulation of experience with a data set and the results obtained from it.
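A minimal sketch of what that documentation could look like in practice, with invented field names and assumptions: each transformation records what it assumed about the source value alongside the cleaned result, so the next person inherits the reasoning as well as the data.

```python
import json

def clean_amount(raw_value: str) -> dict:
    """Transform a raw currency string and record the semantics applied."""
    assumptions = [
        "values are US dollars",
        "parentheses denote negative amounts",
        "thousands separators are commas",
    ]
    text = raw_value.strip().replace(",", "")
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    amount = float(text.lstrip("$"))
    return {
        "raw": raw_value,
        "clean": -amount if negative else amount,
        "assumptions": assumptions,
    }

# Invented sample input; a real pipeline would read these from the source file.
records = ["$1,250.00", "(300.00)"]
cleaned = [clean_amount(r) for r in records]

# The provenance record travels with the cleaned data set.
print(json.dumps(cleaned, indent=2))
```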

Or you can just “wing it” with every data set you encounter, and so shall we all.

Your call.

I first saw this in a tweet by Dave Guarino.

March 17, 2014

Peyote and the International Plant Names Index

Filed under: Agriculture,Data,Names,Open Access,Open Data,Science — Patrick Durusau @ 1:30 pm

International Plant Names Index

What a great resource to find as we near Spring!

From the webpage:

The International Plant Names Index (IPNI) is a database of the names and associated basic bibliographical details of seed plants, ferns and lycophytes. Its goal is to eliminate the need for repeated reference to primary sources for basic bibliographic information about plant names. The data are freely available and are gradually being standardized and checked. IPNI will be a dynamic resource, depending on direct contributions by all members of the botanical community.

I entered the first plant name that came to mind: Peyote.

No “hits.”

Wikipedia gives Peyote’s binomial name as: Lophophora williamsii (think synonym).*

Searching on Lophophora williamsii, I got three (3) “hits.”

Had I bothered to read the FAQ before searching:

10. Can I use IPNI to search by common (vernacular) name?

No. IPNI does not include vernacular names of plants as these are rarely formally published. If you are looking for information about a plant for which you only have a common name you may find the following resources useful. (Please note that these links are to external sites which are not maintained by IPNI)

I understand the need to specialize in one form of names, but “formally published” means that, without a useful synonym list, the general public bears an additional burden in accessing publicly funded research results.

Even with a synonym list there is an additional burden: you have to look up terms in the list, read the text with that understanding, and then go back to the synonym list again.

What would dramatically increase public access to publicly funded research would be a specialized synonym list for publications, one that transposes the jargon in articles into selected sets of synonyms. It would not be as precise or grammatical as the original, but it would allow the reading public to get a sense of even very technical research.
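A hand-built synonym table, like the illustrative (not IPNI-derived) one below, is all it would take to put a vernacular front end on a formal-name search:

```python
# Illustrative vernacular-to-binomial synonym table; a real one would be
# curated and far larger.
VERNACULAR_TO_BINOMIAL = {
    "peyote": "Lophophora williamsii",
    "white oak": "Quercus alba",
    "ginkgo": "Ginkgo biloba",
}

def formal_name(query: str) -> str:
    """Map a common name to the binomial name a formal index expects."""
    return VERNACULAR_TO_BINOMIAL.get(query.strip().lower(), query)

# "Peyote" alone finds nothing in IPNI; the mapped name does.
print(formal_name("Peyote"))   # -> Lophophora williamsii
```

Run in the other direction over article text, the same table becomes the specialized synonym list for publications suggested above.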

That could be a way to hitch topic maps to the access-to-publicly-funded-data bandwagon.

Thoughts?

I first saw this in a tweet by Bill Baker.

* A couple of other fun facts from Wikipedia on Peyote: 1. Its conservation status is listed as “apparently secure,” and 2. Wikipedia has photos of Peyote “in the wild.” I suppose saying “Peyote growing in a pot” would raise too many questions.

March 15, 2014

Publishing biodiversity data directly from GitHub to GBIF

Filed under: Biodiversity,Data Repositories,Open Access,Open Data — Patrick Durusau @ 9:01 pm

Publishing biodiversity data directly from GitHub to GBIF by Roderic D. M. Page.

From the post:

Today I managed to publish some data from a GitHub repository directly to GBIF. Within a few minutes (and with Tim Robertson on hand via Skype to debug a few glitches) the data was automatically indexed by GBIF and its maps updated. You can see the data I uploaded here.

In case you don’t know about GBIF (I didn’t):

The Global Biodiversity Information Facility (GBIF) is an international open data infrastructure, funded by governments.

It allows anyone, anywhere to access data about all types of life on Earth, shared across national boundaries via the Internet.

By encouraging and helping institutions to publish data according to common standards, GBIF enables research not possible before, and informs better decisions to conserve and sustainably use the biological resources of the planet.

GBIF operates through a network of nodes, coordinating the biodiversity information facilities of Participant countries and organizations, collaborating with each other and the Secretariat to share skills, experiences and technical capacity.

GBIF’s vision: “A world in which biodiversity information is freely and universally available for science, society and a sustainable future.”

Roderic summarizes his post saying:

what I’m doing here is putting data on GitHub and having GBIF harvest that data directly from GitHub. This means I can edit the data, rebuild the Darwin Core Archive file, push it to GitHub, and GBIF will reindex it and update the data on the GBIF portal.
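In rough outline, and only as a sketch (the file names follow the Darwin Core Archive convention as I understand it, not Roderic’s exact repository layout), rebuilding the archive before pushing to GitHub might look like:

```python
import zipfile

# A Darwin Core Archive is a zip file bundling the data table(s) with
# metadata describing them. This assumes the three files already exist
# in the working directory.
ARCHIVE_MEMBERS = [
    "meta.xml",         # maps columns in the data file to Darwin Core terms
    "eml.xml",          # dataset-level metadata (title, contacts, license)
    "occurrence.txt",   # the occurrence records themselves
]

with zipfile.ZipFile("dwca.zip", "w", zipfile.ZIP_DEFLATED) as archive:
    for member in ARCHIVE_MEMBERS:
        archive.write(member)

# After committing and pushing dwca.zip to GitHub, GBIF can be pointed at
# the raw file URL and will reindex the data set on its next harvest.
```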

The process isn’t perfect, but unlike disciplines where data sharing is the exception rather than the rule, the biodiversity community is trying to improve its sharing of data.

Every attempt at improvement will not succeed but lessons are learned from every attempt.

Kudos to the biodiversity community for a model that other communities should follow!

