Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 4, 2014

PLOS’ Bold Data Policy

Filed under: Data,Open Access,Open Data,Public Data — Patrick Durusau @ 11:32 am

PLOS’ Bold Data Policy by David Crotty.

From the post:

If you pay any attention at all to scholarly publishing, you’re likely aware of the current uproar over PLOS’ recent announcement requiring all article authors to make their data publicly available. This is a bold move, and a forward-looking policy from PLOS. It may, for many reasons, have come too early to be effective, but ultimately, that may not be the point.

Perhaps the biggest practical problem with PLOS’ policy is that it puts an additional time and effort burden on already time-short, over-burdened researchers. I think I say this in nearly every post I write for the Scholarly Kitchen, but will repeat it again here: Time is a researcher’s most precious commodity. Researchers will almost always follow the path of least resistance, and not do anything that takes them away from their research if it can be avoided.

When depositing NIH-funded papers in PubMed Central was voluntary, only 3.8% of eligible papers were deposited, not because people didn’t want to improve access to their results, but because it wasn’t required and took time and effort away from experiments. Even now, with PubMed Central deposit mandatory, only 20% of what’s deposited comes from authors. The majority of papers come from journals depositing on behalf of authors (something else for which no one seems to give publishers any credit, Kent, one more for your list). Without publishers automating the process on the author’s behalf, compliance would likely be vastly lower. Lightening the burden of the researcher in this manner has become a competitive advantage for the journals that offer this service.

While recognizing the goal of researchers to do more experiments, isn’t this reminiscent of the lack of documentation for networks and software?

Creators of networks and software want to get on with the work they enjoy, and documentation is not part of that work.

The problem with the semantics of research data, much as with network and software semantics, is that there is no one else to ask about those semantics. If researchers don’t document those semantics as they perform experiments, then they will have to spend the time at publication to gather that information together.

I sense an opportunity here for software to assist researchers in capturing semantics as they perform experiments, so that production of semantically annotated data at the end of an experiment can be largely a clerical task, subject to review by the actual researchers.

The minimal semantics that need to be captured for different types of research will vary. That is all the more reason to research and document those semantics before anyone writes a complex monolith of semantics into which existing semantics must be shoehorned.

The reasoning: if we don’t know the semantics of data, it is more cost effective to pipe it to /dev/null.
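To make the capture-as-you-go idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the class name FieldLog, the required keys "description" and "unit", and the field names are all hypothetical, and a real tool would pick its minimum semantics per research area. The only point is that semantics get recorded when a field is created, so that checking for gaps at publication time becomes a clerical step.

```python
from dataclasses import dataclass, field

# Hypothetical minimum: the facts that must be recorded for every field.
# Different kinds of research would choose different keys.
REQUIRED_KEYS = frozenset({"description", "unit"})


@dataclass
class FieldLog:
    """Collects field semantics during an experiment (illustrative only)."""

    required: frozenset = REQUIRED_KEYS
    fields: dict = field(default_factory=dict)

    def describe(self, name, **semantics):
        """Record, or add to, what a field name means when it is created."""
        self.fields.setdefault(name, {}).update(semantics)

    def missing(self, names):
        """Map each under-documented field name to its sorted missing keys."""
        gaps = {}
        for name in names:
            absent = self.required - set(self.fields.get(name, {}))
            if absent:
                gaps[name] = sorted(absent)
        return gaps


# Hypothetical field names, documented (or not) as the data is collected.
log = FieldLog()
log.describe("temp_c", description="bath temperature", unit="degrees Celsius")
log.describe("run_id", description="sequential run number")
print(log.missing(["temp_c", "run_id", "reading_2"]))
```

Anything that missing() reports is a field whose semantics only the researcher can supply, which is exactly the information that is cheapest to capture during the experiment and most expensive to reconstruct afterwards.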

I first saw this in a tweet by ChemConnector.

February 26, 2014

Data Access for the Open Access Literature: PLOS’s Data Policy

Filed under: Data,Open Access,Open Data,Public Data — Patrick Durusau @ 5:44 pm

Data Access for the Open Access Literature: PLOS’s Data Policy by Theo Bloom.

From the post:

Data are any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances. In line with Open Access to research articles themselves, PLOS strongly believes that to best foster scientific progress, the underlying data should be made freely available for researchers to use, wherever this is legal and ethical. Data availability allows replication, reanalysis, new analysis, interpretation, or inclusion into meta-analyses, and facilitates reproducibility of research, all providing a better ‘bang for the buck’ out of scientific research, much of which is funded from public or nonprofit sources. Ultimately, all of these considerations aside, our viewpoint is quite simple: ensuring access to the underlying data should be an intrinsic part of the scientific publishing process.

PLOS journals have requested data be available since their inception, but we believe that providing more specific instructions for authors regarding appropriate data deposition options, and providing more information in the published article as to how to access data, is important for readers and users of the research we publish. As a result, PLOS is now releasing a revised Data Policy that will come into effect on March 1, 2014, in which authors will be required to include a data availability statement in all research articles published by PLOS journals; the policy can be found below. This policy was developed after extensive consultation with PLOS in-house professional and external Academic Editors and Editors in Chief, who are practicing scientists from a variety of disciplines.

We now welcome input from the larger community of authors, researchers, patients, and others, and invite you to comment before March. We encourage you to contact us collectively at data@plos.org; feedback via Twitter and other sources will also be monitored. You may also contact individual PLOS journals directly.

That is a large step towards verifiable research and was taken by PLOS in December of 2013.

That has been supplemented with details that do not change the December announcement in: PLOS’ New Data Policy: Public Access to Data by Liz Silva, which reads in part:

A flurry of interest has arisen around the revised PLOS data policy that we announced in December and which will come into effect for research papers submitted next month. We are gratified to see a huge swell of support for the ideas behind the policy, but we note some concerns about how it will be implemented and how it will affect those preparing articles for publication in PLOS journals. We’d therefore like to clarify a few points that have arisen and once again encourage those with concerns to check the details of the policy or our FAQs, and to contact us with concerns if we have not covered them.

I think the bottom line is: Don’t Panic, Ask.

There are always going to be unanticipated details or concerns but as time goes by and customs develop for how to solve those issues, the questions will become fewer and fewer.

Over time, and not that much time, our history of arrangements other than open access is going to puzzle present and future generations of researchers.

February 9, 2014

Medical research—still a scandal

Filed under: Medical Informatics,Open Access,Open Data,Research Methods — Patrick Durusau @ 5:45 pm

Medical research—still a scandal by Richard Smith.

From the post:

Twenty years ago this week the statistician Doug Altman published an editorial in the BMJ arguing that much medical research was of poor quality and misleading. In his editorial entitled, “The Scandal of Poor Medical Research,” Altman wrote that much research was “seriously flawed through the use of inappropriate designs, unrepresentative samples, small samples, incorrect methods of analysis, and faulty interpretation.” Twenty years later I fear that things are not better but worse.

Most editorials like most of everything, including people, disappear into obscurity very fast, but Altman’s editorial is one that has lasted. I was the editor of the BMJ when we published the editorial, and I have cited Altman’s editorial many times, including recently. The editorial was published in the dawn of evidence based medicine as an increasing number of people realised how much of medical practice lacked evidence of effectiveness and how much research was poor. Altman’s editorial with its concise argument and blunt, provocative title crystallised the scandal.

Why, asked Altman, is so much research poor? Because “researchers feel compelled for career reasons to carry out research that they are ill equipped to perform, and nobody stops them.” In other words, too much medical research was conducted by amateurs who were required to do some research in order to progress in their medical careers.

Ethics committees, who had to approve research, were ill equipped to detect scientific flaws, and the flaws were eventually detected by statisticians, like Altman, working as firefighters. Quality assurance should be built in at the beginning of research not the end, particularly as many journals lacked statistical skills and simply went ahead and published misleading research.

If you are thinking things are better today, consider a further comment from Richard:

The Lancet has this month published an important collection of articles on waste in medical research. The collection has grown from an article by Iain Chalmers and Paul Glasziou in which they argued that 85% of expenditure on medical research ($240 billion in 2010) is wasted. In a very powerful talk at last year’s peer review congress John Ioannidis showed that almost none of thousands of research reports linking foods to conditions are correct and how around only 1% of thousands of studies linking genes with diseases are reporting linkages that are real. His famous paper “Why most published research findings are false” continues to be the most cited paper of PLoS Medicine.

Not that I think open access would be a panacea for poor research quality but at least it would provide the opportunity for discovery.

All this talk about medical research reminds me of DARPA’s Big Mechanism program. Assuming the research data on pathways is no better and no worse than the data mapping genes to diseases, DARPA will be spending $42 million to mine data with 1% accuracy.

A better use of those “Big Mechanism” dollars would be to test solutions to produce better medical research for mining.

1% sounds like low-grade ore to me.

OTexts.org Update!

Filed under: Books,Open Access — Patrick Durusau @ 3:59 pm

OTexts.org has added three new books since my post on the launch of OTexts.

New titles:

Applied biostatistical analysis using R by Stephen B. Cox.

Introduction to Computing : Explorations in Language, Logic, and Machines by David Evans.

Modal logic of strict necessity and possibility by Evgeni Latinov.

The STEM fields have put the humanities to shame in terms of open access to high quality materials.

Don’t you think it was about time the humanities started using open access technologies?

February 1, 2014

Academic Torrents!

Filed under: Data,Open Access,Open Data — Patrick Durusau @ 4:02 pm

Academic Torrents!

From the homepage:

Currently making 1.67TB of research data available.

Sharing data is hard. Emails have size limits, and setting up servers is too much work. We’ve designed a distributed system for sharing enormous datasets – for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds. Contact us at joecohen@cs.umb.edu.

Some data sets you have probably already seen but perhaps several you have not! Like the crater data set for Mars!

Enjoy!

I first saw this in a tweet by Tony Ojeda.

January 22, 2014

Composable languages for bioinformatics: the NYoSh experiment

Filed under: Open Access,Publishing — Patrick Durusau @ 4:35 pm

Composable languages for bioinformatics: the NYoSh experiment by Manuele Simi, Fabien Campagne. (Simi M, Campagne F. (2014) Composable languages for bioinformatics: the NYoSh experiment. PeerJ 2:e241 http://dx.doi.org/10.7717/peerj.241)

Abstract:

Language WorkBenches (LWBs) are software engineering tools that help domain experts develop solutions to various classes of problems. Some of these tools focus on non-technical users and provide languages to help organize knowledge while other workbenches provide means to create new programming languages. A key advantage of language workbenches is that they support the seamless composition of independently developed languages. This capability is useful when developing programs that can benefit from different levels of abstraction. We reasoned that language workbenches could be useful to develop bioinformatics software solutions. In order to evaluate the potential of language workbenches in bioinformatics, we tested a prominent workbench by developing an alternative to shell scripting. To illustrate what LWBs and Language Composition can bring to bioinformatics, we report on our design and development of NYoSh (Not Your ordinary Shell). NYoSh was implemented as a collection of languages that can be composed to write programs as expressive and concise as shell scripts. This manuscript offers a concrete illustration of the advantages and current minor drawbacks of using the MPS LWB. For instance, we found that we could implement an environment-aware editor for NYoSh that can assist the programmers when developing scripts for specific execution environments. This editor further provides semantic error detection and can be compiled interactively with an automatic build and deployment system. In contrast to shell scripts, NYoSh scripts can be written in a modern development environment, supporting context dependent intentions and can be extended seamlessly by end-users with new abstractions and language constructs. We further illustrate language extension and composition with LWBs by presenting a tight integration of NYoSh scripts with the GobyWeb system. 
The NYoSh Workbench prototype, which implements a fully featured integrated development environment for NYoSh is distributed at http://nyosh.campagnelab.org.

In the discussion section of the paper the authors concede:

We expect that widespread use of LWB will result in a multiplication of small languages, but in a manner that will increase language reuse and interoperability, rather than in the historical language fragmentation that has been observed with traditional language technology.

Whenever I hear projections about the development of languages I am reminded that the inventors of “SCSI” thought it should be pronounced “sexy,” whereas others preferred “scuzzi.” Doesn’t have the same ring to it, does it?

I am all in favor of domain specific languages (DSLs), but at the same time, am mindful that undocumented languages are in danger of becoming “dead” languages.

January 20, 2014

Microsoft Research adopts Open Access… [Write to MS]

Filed under: Microsoft,Open Access — Patrick Durusau @ 3:43 pm

Microsoft Research adopts Open Access policy for publications

From the post:

In a recent interview with Scientific American, Peter Lee, head of Microsoft Research, discussed three main motivations for basic research at Microsoft. The first relates to an aspiration to advance human knowledge, the second derives from a culture that relies deeply on the ambitions of individual researchers, and the last concerns “promoting open publication of all research results and encouraging deep collaborations with academic researchers.”

It is in keeping with this third motivation that Microsoft Research recently committed to an Open Access policy for our researchers’ publications.

As evidenced by a long-running series of blog posts by Tony Hey, vice president of Microsoft Research Connections, Microsoft Research has carefully deliberated our role in the growing movement toward open publications and open data.

This is great news. When Microsoft steps, it’s a big step. Heard near and far.

Take the time to write to anyone you know at Microsoft just to say you appreciate the decision.

We all write to them to complain about MS products, so why not write a nice note about open access?

It won’t take five (5) minutes if you open up your email client right now. (I wrote one before I posted this entry.)

OpenAIRE Legal Study has been published

Filed under: Law,Licensing,Open Access,Open Data,Open Source — Patrick Durusau @ 2:14 pm

OpenAIRE Legal Study has been published

From the post:

Guibault, Lucie; Wiebe, Andreas (Eds) (2013) Safe to be Open: Study on the protection of research data and recommendation for access and usage. The full-text of the book is available (PDF, ca. 2 MB ) under the CC BY 4.0 license. Published by University of Göttingen Press (Copies can be ordered from the publisher’s website)

Any e-infrastructure which primarily relies on harvesting external data sources (e.g. repositories) needs to be fully aware of any legal implications for re-use of this knowledge, and further application by 3rd parties. OpenAIRE’s legal study will put forward recommendations as to applicable licenses that appropriately address scientific data in the context of OpenAIRE.

CAUTION: Safe to be Open is an EU-centric publication and, while very useful in copyright discussions elsewhere, should not be relied upon as legal advice. (That’s not an opinion about relying on it in the EU. Ask local counsel for that advice.)

I say that having witnessed too many licensing discussions that were uninformed by legal counsel. Entertaining to be sure but if I have a copyright question, I will be posing it to counsel who is being paid to be correct.

At least until ignorance of the law becomes an affirmative shield against liability for copyright infringement. 😉

To be sure, I recommend reading Safe to be Open as a means to become informed about the contours of access and usage of research data in the EU. It is possibly also a model for solutions in legal systems that lag behind the EU in that regard.

Personally I favor Attribution CC BY because the other CC licenses presume the licensed material was created without unacknowledged/uncompensated contributions from others.

Think of all the people who taught you to read, write, program and all the people whose work you have read, been influenced by, etc. Hopefully you can add to the sum of communal knowledge but it is unfair to claim ownership of the whole of communal knowledge simply because you contributed a small part. (That’s not legal advice either, just my personal opinion.)

Without all the instrument makers, composers, singers, organists, etc. that came before him, Mozart would not be the same Mozart that we remember. Just as gifted but without a context to display his gifts.

Patent and copyright need to be recognized as “thumbs on the scale” against development of services and knowledge. That’s where I would start a discussion of copyright and patents.

August 28, 2013

CORE

Filed under: Data,Open Access — Patrick Durusau @ 2:28 pm

CORE

From the about:

CORE (COnnecting REpositories) aims to facilitate free access to scholarly publications distributed across many systems. As of today, CORE gives you access to millions of scholarly articles aggregated from many Open Access repositories.

We believe in free access to information. The mission of CORE is to:

  • Support the right of citizens and general public to access the results of research towards which they contributed by paying taxes.
  • Facilitate access to Open Access content for all by targeting general public, software developers, researchers, etc., by improving search and navigation using state-of-the-art technologies in the field of natural language processing and the Semantic Web.
  • Provide support to both content consumers and content providers by working with digital libraries and institutional repositories.
  • Contribute to a cultural change by promoting Open Access.

BTW, CORE also allows you to harvest their data.

As of today, August 28, 2013, 13,639,485 articles.

Excellent resource for scholarly publications!

Not to mention a useful yardstick for other publication indexing projects.

What does your indexing project offer that CORE does not?

That is, rather than duplicating indexing we already possess, where is the value-add of your indexing?

August 24, 2013

SCOAP3

Filed under: Open Access,Publishing — Patrick Durusau @ 3:52 pm

SCOAP3

I didn’t recognize the acronym either. 😉

From the “about” page:

The Open Access (OA) tenets of granting unrestricted access to the results of publicly-funded research are in contrast with current models of scientific publishing, where access is restricted to journal customers. At the same time, subscription costs increase and put considerable strain on libraries, forcing them to cancel an increasing number of journals subscriptions. This situation is particularly acute in fields like High-Energy Physics (HEP), where pre-prints describing scientific results are timely available online. There is a growing concern within the academic community that the future of high-quality journals, and the peer-review system they administer, is at risk.

To address this situation for HEP and, as an experiment, Science at large, a new model for OA publishing has emerged: SCOAP3 (Sponsoring Consortium for Open Access Publishing in Particle Physics). In this model, HEP funding agencies and libraries, which today purchase journal subscriptions to implicitly support the peer-review service, federate to explicitly cover its cost, while publishers make the electronic versions of their journals free to read. Authors are not directly charged to publish their articles OA.

SCOAP3 will, for the first time, link quality and price, stimulating competition and enabling considerable medium- and long-term savings. Today, most publishers quote a price in the range of 1’000–2’000 Euros per published article. On this basis, we estimate that the annual budget for the transition of HEP publishing to OA would amount to a maximum of 10 Million Euros/year, sensibly lower than the estimated global expenditure in subscription to HEP journals.

Each SCOAP3 partner will finance its contribution by canceling journal subscriptions. Each country will contribute according to its share of HEP publishing. The transition to OA will be facilitated by the fact that the large majority of HEP articles are published in just six peer-reviewed journals. Of course, the SCOAP3 model is open to any, present or future, high-quality HEP journal aiming at a dynamic market with healthy competition and broader choice.

HEP funding agencies and libraries are currently signing Expressions of Interest for the financial backing of the consortium. A tendering procedure will then take place. Provided that SCOAP3 funding partners are prepared to engage in long-term commitments, many publishers are expected to be ready to enter into negotiations.

The example of SCOAP3 could be rapidly followed by other fields, directly related to HEP, such as nuclear physics or astro-particle physics, also similarly compact and organized with a reasonable number of journals.

Models like this one may result in increasing the amount of information available for topic mapping and the amount of semantic diversity in traditional search results.

Delivery models are changing but search interfaces leave us to our own devices at the document level.

If we are going to have better access in the physical sense, shouldn’t we be working on better access in the content sense?

PS: To show this movement has legs, consider the recent agreement of Elsevier, IOP Publishing and Springer to participate.

July 10, 2013

Data Sharing and Management Snafu in 3 Short Acts

Filed under: Archives,Astroinformatics,Open Access,Open Data — Patrick Durusau @ 1:43 pm

As you may suspect, my concerns are with preserving the semantics of the field names Sam1, Sam2, Sam3, and also the semantics of the field names that will be generated by the requesting researcher.

I found this video embedded in: A call for open access to all data used in AJ and ApJ articles by Kelle Cruz.

From the post:

I don’t fully understand it, but I know the Astronomical Journal (AJ) and Astrophysical Journal (ApJ) are different than many other journals: They are run by the American Astronomical Society (AAS) and not by a for-profit publisher. That means that the AAS Council and the members (the people actually producing and reading the science) have a lot of control over how the journals are run. In a recent President’s Column, the AAS President, David Helfand proposed a radical, yet obvious, idea for propelling our field into the realm of data sharing and open access: require all journal articles to be accompanied by the data on which the conclusions are based.

We are a data-rich—and data-driven—field [and] I am advocating [that authors provide] a link in articles to the data that underlies a paper’s conclusions…In my view, the time has come—and the technological resources are available—to make the conclusion of every ApJ or AJ article fully reproducible by publishing the data that underlie that conclusion. It would be an important step toward enhancing and sharing our scientific understanding of the universe.

Kelle points out several reasons why existing efforts are insufficient to meet the sharing and archiving needs of the astronomical community.

Suggested reading if you are concerned with astronomical data or archives more generally.

June 18, 2013

Open Access is open access

Filed under: Open Access — Patrick Durusau @ 7:30 am

Open Access is open access by Peter Suber.

From the post:

I’m happy to announce that my book on OA (Open Access, MIT Press, 2012) is now OA. The book came out in mid-June last year, and the OA editions came out one year later, right on schedule. My thanks to MIT Press.
http://mitpress.mit.edu/books/open-access

Today MIT Press released four OA editions:

PDF

HTML

ePub

and Mobi.

Update page

A must forward to all your friends in academia.

Suber narrows the term open access to mean access to research publications, between researchers, without price and permission barriers.

By laying aside numerous other barriers to access and the profit making side of publishing, Suber makes the strongest possible case for open access to research.

A must read!

May 7, 2013

Cassava database becomes open access

Filed under: Agriculture,Open Access,Open Data — Patrick Durusau @ 3:50 pm

Cassava database becomes open access

From the post:

Cassavabase is a database of phenotypic and genotypic data generated by cassava breeding programs within the Next Generation Cassava Breeding (NEXTGEN Cassava) project*.

The database makes available breeding data immediately available, thereby providing cassava researchers and breeders a key reference data source. The Cassava plant (Manihot esculenta) feeds more than 500 million people mainly in Africa.

Besides phenotypic and genotypic data, Cassavabase contains cassava geographical maps, genome and sequences and other datasets produced within the NEXTGEN Cassava project. Data can be accessed through the web interface and also various tools are available to view the datasets. Cassavabase, and the advantages of open access data were presented at the recent G8 International Conference on Open Data for Agriculture held in Washington, D.C.

Cassava is a plant that isn’t subject to a Monsanto patent (I don’t think) and doesn’t require Monsanto chemicals to grow properly.

That alone means you are unlikely to encounter references to it in globalization of agriculture discussions.

Why grow something you can’t sell internationally? While paying homage to Monsanto?

Answers suggest themselves to me but for now I simply wanted to make you aware of this dataset.

February 11, 2013
