Archive for the ‘Publishing’ Category

Overlay Journal – Discrete Analysis

Saturday, March 5th, 2016

From the post:

Discrete Analysis, a new open-access journal for articles which are “analytical in flavour but that also have an impact on the study of discrete structures”, launched this week. What’s interesting about it is that it’s an arXiv overlay journal founded by, among others, Timothy Gowers.

What that means is that you don’t get articles from Discrete Analysis – it just arranges peer review of papers held on the arXiv, cutting out almost all of the expensive parts of traditional journal publishing. I wasn’t really prepared for how shallow that makes the journal’s website – there’s a front page, and when you click on an article you’re shown a brief editorial comment with a link to the corresponding arXiv page, and that’s it.

But that’s all it needs to do – the opinion of Gowers and co. is that the only real value that journals add to the papers they publish is the seal of approval gained by peer review, so that’s the only thing they’re doing. Maths papers tend not to benefit from the typesetting services traditional publishers provide (or, more often than you’d like, are actively hampered by it).

One way the journal is adding value beyond a “yes, this is worth adding to the list of papers we approve of” is by providing an “editorial introduction” to accompany each article. These are brief notes, written by members of the editorial board, which introduce the topics discussed in the paper and provide some context, to help you decide if you want to read the paper. That’s a good idea, and it makes browsing through the articles – and this is something unheard of on the internet – quite pleasurable.

It’s not difficult to imagine “editorial introductions” backed by mini-topic maps that could be explored on their own, or that “unfold” to reveal more associations/topics as you reach the “edge” of a particular topic map.

Not unlike a traditional street map of New York, which you can unfold to find general areas and then fold up to focus more tightly on a particular area.

I hesitate to say “zoom” because in the applications I have seen (important qualification), “zoom” uniformly reduces your field of view.

A more nuanced notion of “zoom,” for a topic map and perhaps for other maps as well, would be to hold portions of the current view stationary, say a starting point on an interstate highway, and to “zoom” only a portion of the current view to show a detailed street map. That would enable the user to see a particular location while maintaining its larger context.

Pointers to applications that “zoom” but also maintain different levels of “zoom” in the same view? Given the fascination with “hairy” presentations of graphs, that would have to be a real winner.

Overlay Journals – Community-Based Peer Review?

Friday, February 12th, 2016

New Journals Piggyback on arXiv by Emily Conover.

From the post:

A non-traditional style of scientific publishing is gaining ground, with new journals popping up in recent months. The journals piggyback on the arXiv or other scientific repositories and apply peer review. A link to the accepted paper on the journal’s website sends readers to the paper on the repository.

Proponents hope to provide inexpensive open access publication and streamline the peer review process. To save money, such “overlay” journals typically do away with some of the services traditional publishers provide, for example typesetting and copyediting.

Not everyone is convinced. Questions remain about the scalability of overlay journals, and whether they will catch on — or whether scientists will demand the stamp of approval (and accompanying prestige) that the established, traditional journals provide.

The idea is by no means new — proposals for journals interfacing with online archives appeared as far back as the 1990s, and a few such journals are established in mathematics and computer science. But now, say proponents, it’s an idea whose time has come.

The newest such journal is the Open Journal of Astrophysics, which began accepting submissions on December 22. Editor in Chief Peter Coles of the University of Sussex says the idea came to him several years ago in a meeting about the cost of open access journals. “They were talking about charging thousands of pounds for making articles open access,” Coles says, and he thought, “I never consult journals now; I get all my papers from the arXiv.” By adding a front end onto arXiv to provide peer review, Coles says, “We can dispense with the whole paraphernalia with traditional journals.”

Authors first submit their papers to arXiv, and then input the appropriate arXiv ID on the journal’s website to indicate that they would like their paper reviewed. The journal follows a standard peer review process, with anonymous referees whose comments remain private.

When an article is accepted, a link appears on the journal’s website and the article is issued a digital object identifier (DOI). The entire process is free for authors and readers. As APS News went to press, Coles hoped to publish the first batch of half-dozen papers at the end of January.
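The workflow described above (submit to arXiv, hand the journal an arXiv ID, get a link and a DOI on acceptance) is small enough to sketch. This is only an illustration, not the code of any actual overlay journal: the names Submission, submit and accept are hypothetical, and the DOI in the usage lines is a made-up placeholder. The one thing taken from the real world is the arXiv abstract URL pattern.

```python
import re
from dataclasses import dataclass
from typing import Optional

# New-style arXiv identifiers look like 1602.01234 or 1602.01234v2.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


@dataclass
class Submission:
    """One paper under consideration (hypothetical record layout)."""
    arxiv_id: str
    accepted: bool = False
    doi: Optional[str] = None  # assigned by the journal on acceptance

    def repository_url(self) -> str:
        # The journal never hosts the paper; it only links to arXiv.
        return f"https://arxiv.org/abs/{self.arxiv_id}"


def submit(arxiv_id: str) -> Submission:
    """Register a paper for review by its arXiv ID."""
    if not ARXIV_ID.match(arxiv_id):
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    return Submission(arxiv_id=arxiv_id)


def accept(sub: Submission, doi: str) -> Submission:
    """Mark a submission accepted and attach the DOI the journal issued."""
    sub.accepted = True
    sub.doi = doi
    return sub


# Usage with placeholder values.
paper = accept(submit("1602.01234"), doi="10.0000/placeholder")
print(paper.repository_url(), paper.doi)
```

Everything a reader needs from the journal's side is the accepted flag, the DOI and the link; the paper itself stays in the repository.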

My Archive for the ‘Peer Review’ Category has only a few of the high profile failures of peer review over the last five years.

You are probably familiar with at least twice as many reports on the brokenness of peer review as I have covered in this blog.

If traditional peer review is a known failure, why replicate it even for overlay journals?

Why not ask the full set of peers in a discipline? That is, the readers of articles posted in public repositories?

If a book/journal article goes uncited, isn’t that evidence that it:

Did NOT advance the discipline in a way meaningful to the author’s peers?

What other evidence would you have that it did advance the discipline? The opinions of friends of the editor? That seems too weak to even suggest.

Citation analysis isn’t free from issues (see Are 90% of academic papers really never cited? Searching citations about academic citations reveals the good, the bad and the ugly), but it has the advantage of drawing on the entire pool of talent that comprises a discipline.

Moreover, peer review would not be limited to a one-time judgment by traditional peer reviewers but would rest on how a monograph or article fits into the intellectual development of the discipline as a whole.

Which is more persuasive: That editors and reviewers at Science or Nature accept a paper or that in the ten years following publication, an article is cited by every other major study in the field?

Citation analysis obviates the overhead costs that are raised as objections to organizing peer review on a massive scale. Why organize peer review at all?

Peers are going to read and cite good literature and more likely than not, skip the bad. Unless you need to create positions for gate keepers and other barnacles on the profession, opt for citation based peer review based on open repositories.

I’m betting on the communities that silently vet papers and books in spite of the formalized and highly suspect mechanisms for peer review.

Overlay journals could publish preliminary lists of articles that are of interest in particular disciplines and as community-based peer review progresses, they can publish “best of…” series as the community further filters the publications.
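As a rough sketch of what such a “best of…” filter might look like, assuming all you have is a mapping from paper identifiers to citation counts: the function name best_of, the min_citations threshold and the counts below are all made up for illustration, and a real overlay journal would need a real citation source.

```python
from typing import Dict, List, Tuple


def best_of(citations: Dict[str, int], top_n: int = 3,
            min_citations: int = 1) -> List[Tuple[str, int]]:
    """Return up to top_n (paper, citation count) pairs, most cited first.

    Papers below min_citations are dropped: the community has not
    (yet) vetted them by citing them.
    """
    vetted = [(paper, count) for paper, count in citations.items()
              if count >= min_citations]
    # Sort by count descending, then by identifier for a stable order.
    vetted.sort(key=lambda pair: (-pair[1], pair[0]))
    return vetted[:top_n]


# Made-up counts for illustration only.
counts = {"paper-a": 12, "paper-b": 0, "paper-c": 40, "paper-d": 3}
print(best_of(counts, top_n=2))  # [('paper-c', 40), ('paper-a', 12)]
```

Re-running the same filter as counts accumulate is what lets the list tighten over time, from a preliminary list to a “best of…” series.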

Community-based peer review is already operating in your discipline. Why not call it out and benefit from it?

Sci-Hub Tip: Converting Paywall DOIs to Public Access

Thursday, February 11th, 2016

In a tweet Jon Tenn@nt points out that:

Reminder: add “.sci-hub.io” after the .com in the URL of pretty much any paywalled paper to gain instant free access.

BTW, I tested Jon’s advice with:

http://dx.doi.org/10.****/*******

re-cast as:

http://dx.doi.org.sci-hub.io/10.****/*******

And it works!

With a little scripting, you can convert your paywall DOIs into public access with sci-hub.io.

This “worked for me” so if you encounter issues, please ping me so I can update this post.

First Pirate – Sci-Hub?

Wednesday, February 10th, 2016

Sci-Hub romanticizes itself as:

Sci-Hub the first pirate website in the world to provide mass and public access to tens of millions of research papers. (from the about page)

I agree with:

…mass and public access to tens of millions of research papers

But Sci-Hub is hardly:

…the first pirate website in the world

I don’t remember the first gate-keeping publisher that went from stealing from the public in print to stealing from the public online.

With careful enough research I’m sure we could track that down but I’m not sure it matters at this point.

What we do know is that academic research is funded by the public, edited and reviewed by volunteers (to the extent it is reviewed at all), and then kept from the vast bulk of humanity for profit and status (gate-keeping).

It’s heady stuff to think of yourself as a bold and swashbuckling pirate, going to stick it “…to the man.”

However, gate-keeping publishers have developed stealing from the public to an art form. If you don’t believe me, take a brief look at the provisions in the Trans-Pacific Partnership that protect traditional publisher interests.

Recovering what has been stolen from the public isn’t theft at all, it’s restoration!

Use Sci-Hub, support Sci-Hub, spread the word about Sci-Hub.

Allow gate-keeping publishers to slowly, hopefully painfully, wither as opportunities for exploiting the public grow fewer and farther between.

PS: You need to read: Meet the Robin Hood of Science by Simon Oxenham to get the full background on Sci-Hub and an extraordinary person, Alexandra Elbakyan.

JATS: Journal Article Tag Suite, Navigation Update!

Monday, January 11th, 2016

I posted about the appearance of JATS: Journal Article Tag Suite, version 1.1 and then began to lazily browse the pdf.

I forget what I was looking for now but I noticed the table of contents jumped from page 42 to page 235, and again from 272 to 405. I’m thinking by this point “this is going to be a bear to find elements/attributes in.” I looked for an index only to find none.

But, there’s hope!

If you look at Chapter 7, “TAG Suite Components” (elements start on page 7 and attributes on page 28), you will find:

Each ✔ is a navigation link to that element (or attribute if you are in the attribute section) under each of those divisions, Archiving, Publishing, Authoring.

Very cool but falls under “non-obvious” for me.

Pass it on so others can safely and quickly navigate JATS 1.1!

PS: It was Tommie Usdin of Balisage fame who pointed out the table in chapter 7 to me. Thanks Tommie!

JATS: Journal Article Tag Suite, version 1.1

Friday, January 8th, 2016

JATS: Journal Article Tag Suite, version 1.1

Abstract:

The Journal Article Tag Suite provides a common XML format in which publishers and archives can exchange journal content. The JATS provides a set of XML elements and attributes for describing the textual and graphical content of journal articles as well as some non-article material such as letters, editorials, and book and product reviews.

Documentation and help files: Journal Article Tag Suite.

Tommie Usdin (of Balisage fame) posted to Facebook:

JATS has added capabilities to encode:
– NISO Access License and Indicators
– additional support for multiple language documents and for Japanese documents (including Ruby)
– citation of datasets
and some other things users of version 1.0 have requested.

Another XML vocabulary that provides grist for your XQuery adventures!
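If XQuery isn’t at hand, even a standard library XML parser gets you started. A minimal sketch in Python, assuming a hand-written fragment that uses a few JATS element names (front, article-meta, title-group, article-title, contrib, name); the fragment is not a complete or validated JATS article, and the journal, title and author in it are placeholders.

```python
import xml.etree.ElementTree as ET

# A hand-written, minimal JATS-style fragment (not a complete, valid article).
SAMPLE = """\
<article>
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Example Journal</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>
"""


def summarize(xml_text: str) -> dict:
    """Pull the journal title, article title and author names from the fragment."""
    root = ET.fromstring(xml_text)
    title = root.findtext("front/article-meta/title-group/article-title")
    journal = root.findtext("front/journal-meta/journal-title-group/journal-title")
    authors = []
    for contrib in root.iter("contrib"):
        # Keep only contributors explicitly marked as authors.
        if contrib.get("contrib-type") == "author":
            name = contrib.find("name")
            if name is not None:
                authors.append(f"{name.findtext('given-names')} {name.findtext('surname')}")
    return {"journal": journal, "title": title, "authors": authors}


print(summarize(SAMPLE))
```

The same paths translate directly into XPath steps in XQuery once you have a real JATS file to point them at.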

What is Scholarly HTML?

Saturday, October 31st, 2015

What is Scholarly HTML? by Robin Berjon and Sébastien Ballesteros.

Abstract:

Scholarly HTML is a domain-specific data format built entirely on open standards that enables the interoperable exchange of scholarly articles in a manner that is compatible with off-the-shelf browsers. This document describes how Scholarly HTML works and how it is encoded as a document. It is, itself, written in Scholarly HTML.

The abstract is accurate enough but the “Motivation” section provides a better sense of this project:

Scholarly articles are still primarily encoded as unstructured graphics formats in which most of the information initially created by research, or even just in the text, is lost. This was an acceptable, if deplorable, condition when viable alternatives did not seem possible, but document technology has today reached a level of maturity and universality that makes this situation no longer tenable. Information cannot be disseminated if it is destroyed before even having left its creator’s laptop.

According to the New York Times, adding structured information to their recipes (instead of exposing simply as plain text) improved their discoverability to the point of producing an immediate rise of 52 percent in traffic (NYT, 2014). At this point in time, cupcake recipes are reaping greater benefits from modern data format practices than the whole scientific endeavour.

This is not solely a loss for the high principles of knowledge sharing in science, it also has very immediate pragmatic consequences. Any tool, any service that tries to integrate with scholarly publishing has to spend the brunt of its complexity (or budget) extracting data the author would have willingly shared out of antiquated formats. This places stringent limits on the improvement of the scholarly toolbox, on the discoverability of scientific knowledge, and particularly on processes of meta-analysis.

To address these issues, we have followed an approach rooted in established best practices for the reuse of open, standard formats. The «HTML Vernacular» body of practice provides guidelines for the creation of domain-specific data formats that make use of HTML’s inherent extensibility (Science.AI, 2015b). Using the vernacular foundation overlaid with «schema.org» metadata we have produced a format for the interchange of scholarly articles built on open standards, ready for all to use.

Our high-level goals were:

• Uncompromisingly enabling structured metadata, accessibility, and internationalisation.
• Pragmatically working in Web browsers, even if it occasionally incurs some markup overhead.
• Powerfully customisable for inclusion in arbitrary Web sites, while remaining easy to process and interoperable.
• Entirely built on top of open, royalty-free standards.
• Long-term viability as a data format.

Additionally, in view of the specific problem we addressed, in the creation of this vernacular we have favoured the reliability of interchange over ease of authoring; but have nevertheless attempted to cater to the latter as much as possible. A decent boilerplate template file can certainly make authoring relatively simple, but not as radically simple as it can be. For such use cases, Scholarly HTML provides a great output target and overview of the data model required to support scholarly publishing at the document level.

An example of an authoring format that was designed to target Scholarly HTML as an output is the DOCX Standard Scientific Style which enables authors who are comfortable with Microsoft Word to author documents that have a direct upgrade path to semantic, standard content.

Where semantic modelling is concerned, our approach is to stick as much as possible to schema.org. Beyond the obvious advantages there are in reusing a vocabulary that is supported by all the major search engines and is actively being developed towards enabling a shared understanding of many useful concepts, it also provides a protection against «ontological drift» whereby a new vocabulary is defined by a small group with insufficient input from a broader community of practice. A language that solely a single participant understands is of limited value.

In a small, circumscribed number of cases we have had to depart from schema.org, using the https://ns.science.ai/ (prefixed with sa:) vocabulary instead (Science.AI, 2015a). Our goal is to work with schema.org in order to extend their vocabulary, and we will align our usage with the outcome of these discussions.

I especially enjoyed the observation:

According to the New York Times, adding structured information to their recipes (instead of exposing simply as plain text) improved their discoverability to the point of producing an immediate rise of 52 percent in traffic (NYT, 2014). At this point in time, cupcake recipes are reaping greater benefits from modern data format practices than the whole scientific endeavour.

I don’t doubt the truth of that story but after all, a large number of people are interested in baking cupcakes. In many cases, not more than three are interested in reading any particular academic paper.

The use of schema.org will provide advantages for common concepts but to be truly useful for scholarly writing, it will require serious extension.
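To make “common concepts” concrete, here is a sketch of the sort of record schema.org supports out of the box, built as JSON-LD in Python. ScholarlyArticle, headline, author, datePublished and keywords are schema.org terms; the helper name and all of the values are placeholders of mine, and JSON-LD is just one schema.org serialization, not necessarily the embedding Scholarly HTML itself uses.

```python
import json


def scholarly_article_jsonld(headline, authors, date_published, keywords):
    """Build a schema.org ScholarlyArticle description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        "headline": headline,
        "author": [{"@type": "Person", "name": name} for name in authors],
        "datePublished": date_published,
        "keywords": keywords,
    }


# Placeholder values, not a real paper.
record = scholarly_article_jsonld(
    headline="An Example Article",
    authors=["Jane Doe"],
    date_published="2015-10-30",
    keywords=["propositionalisation", "aggregates"],
)
print(json.dumps(record, indent=2))
```

Title, author, date and keywords are all there. What is missing is any slot for a statement like “describes substantially the same technique as”, which is the kind of extension scholarly use would need.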

Take for example my post yesterday, Deep Feature Synthesis:… [Replacing Human Intuition?, Calling Bull Shit]. What microdata from schema.org would help readers find Propositionalisation and Aggregates, 2001, which describes substantially the same technique, without claims of surpassing human intuition? (Uncited by the authors of the paper on deep feature synthesis.)

Or the 161 papers on propositionalisation that you can find at CiteSeer?

A crude classification that can be used by search engines is very useful but falls far short of the mark in terms of finding and retrieving scholarly writing.

Semantic uniformity for classifying scholarly content hasn’t been reached by scholars or librarians despite centuries of effort. Rather than taking up that Sisyphean task, let’s map across the ever increasing universe of semantic diversity.

The Future Of News Is Not An Article

Wednesday, October 21st, 2015

The Future Of News Is Not An Article by Alexis Lloyd.

Alexis challenges readers to reconsider their assumptions about the nature of “articles,” beginning with the model for articles that was taken over from traditional print media. Whatever appeared in an article yesterday must be re-created today if there is a new article on the same subject. Not surprising, since print media lacks the means to transclude content from a prior article into a new one.

She saves her best argument for last:

A news organization publishes hundreds of articles a day, then starts all over the next day, recreating any redundant content each time. This approach is deeply shaped by the constraints of print media and seems unnecessary and strange when looked at from a natively digital perspective. Can you imagine if, every time something new happened in Syria, Wikipedia published a new Syria page, and in order to understand the bigger picture, you had to manually sift through hundreds of pages with overlapping information? The idea seems absurd in that context and yet, it is essentially what news publishers do every day.

While I agree fully with the advantages Alexis summarizes as Enhanced tools for journalists, Summarization and synthesis, and Adaptive Content (see her post), there are technical and non-technical roadblocks to such changes.

First and foremost, people are being paid to re-create redundant content every day, and their comfort levels, to say nothing of their remuneration for repetitive reporting of the same content, will loom large in the adoption of the technology Alexis imagines.

I recall a disturbing story from a major paper where reporters didn’t share leads or research because of fear that other reporters would “scoop” them. That sort of protectionism isn’t limited to journalists. Rumor has it that Oracle sales reps refused to enter potential sales leads in a company-wide database.

I don’t understand why that sort of pettiness is tolerated but be aware that it is, both in government and corporate environments.

Second and almost as importantly, Alexis needs to raise the question of semantic ROI for any semantic technology. Take her point about adoption of the Semantic Web:

but have not seen universal adoption because of the labor costs involved in doing so.

To adopt a single level of semantic encoding for all content, without regard to its value, either historical or current use, is a sure budget buster. Perhaps the business community was paying closer attention to the Semantic Web than many of us thought, hence its adoption failure.

Some content may need machine driven encoding, more valuable content may require human supervision and/or encoding and some content may not be worth encoding at all. Depends on your ROI model.

I should mention that the Semantic Web manages statements about statements (in its own or other semantic systems) poorly. (AKA, “facts about facts.”) Although I hate to use the term “facts.” The very notion of “facts” is misleading and tricky under the best of circumstances.

However universal (universal = among people you know) knowledge of a “fact” may seem, the better argument is that it is only a “fact” from a particular point of view. Semantic Web based systems have difficulty with such concepts.

Third, and not mentioned by Alexis, is that semantic systems should capture and preserve trails created by information explorers. Reporters at the New York Times use databases every day, but each search starts from scratch.

If re-making redundant information over and over again is absurd, repeating the same searches (more or less successfully) over and over again is insane.

Capturing search trails as data would enrich existing databases, especially if searchers could annotate their trails and data they encounter along the way. The more intensively searched a resource becomes, the richer its semantics. As it is today, all the effort of searchers is lost at the end of each search.
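What “search trails as data” might look like is easy to sketch. Everything here is hypothetical: the Step and Trail records, their field names, and the prior_trails lookup are my own illustration, not a description of any existing newsroom system.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Step:
    """One search within a trail, plus whatever the searcher chose to note."""
    query: str
    results_viewed: List[str] = field(default_factory=list)
    note: str = ""


@dataclass
class Trail:
    """The ordered searches one person ran against one resource."""
    searcher: str
    resource: str
    steps: List[Step] = field(default_factory=list)

    def record(self, query: str, results_viewed: Iterable[str] = (),
               note: str = "") -> "Trail":
        self.steps.append(Step(query, list(results_viewed), note))
        return self


def prior_trails(trails: List[Trail], term: str) -> List[Trail]:
    """Return trails with a step whose query or note mentions term (case-insensitive)."""
    needle = term.lower()
    return [
        t for t in trails
        if any(needle in s.query.lower() or needle in s.note.lower()
               for s in t.steps)
    ]


# Placeholder data: one saved trail, found again by a later searcher.
saved = [Trail("reporter-1", "archive-db").record(
    "water board contracts", ["doc-17"], note="doc-17 names the vendor")]
print([t.searcher for t in prior_trails(saved, "vendor")])  # ['reporter-1']
```

The annotation is what pays off: the second searcher lands on the first searcher’s note instead of starting from scratch.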

Alexis is right, let’s stop entombing knowledge in articles, papers, posts and books. It won’t be quick or easy, but worthwhile journeys rarely are.

I first saw this in a tweet by Tim Strehle.

unglue.it

Monday, August 31st, 2015

From the webpage:

unglue (v. t.) 2. To make a digital book free to read and use, worldwide.

New to me, possibly old to you.

I “discovered” this site while looking at Intermediate Python.

From the general FAQ:

Basics

How It Works

What is Unglue.it?

Unglue.it is a place for individuals and institutions to join together to make ebooks free to the world. We work together with authors, publishers, or other rights holders who want their ebooks to be free but also want to be able to earn a living doing so. We use Creative Commons licensing as an enabling tool to “unglue” the ebooks.

What are Ungluing Campaigns?

We have three types of Ungluing Campaigns: Pledge Campaigns, Buy-to-Unglue Campaigns and Thanks-for-Ungluing campaigns.

• In a Pledge Campaign, book lovers pledge their support for ungluing a book. If enough support is found to reach the goal (and only then), the supporters’ credit cards are charged, and an unglued ebook is released.
• In a Buy-to-Unglue Campaign, every ebook copy sold moves the book’s ungluing date closer to the present. And you can donate ebooks to your local library- that’s something you can’t do in the Kindle or Apple Stores!
• In a Thanks-for-Ungluing Campaign, the ebook is already released with a Creative Commons license. Supporters can express their thanks by paying what they wish for the license and the ebook.

What is Crowdfunding?

Crowdfunding is collectively pooling contributions (or pledges) to support some cause. Using the internet for coordination means that complete strangers can work together, drawn by a common cause. This also means the number of supporters can be vast, so individual contributions can be as large or as small as people are comfortable with, and still add up to enough to do something amazing.

Want to see some examples? Kickstarter lets artists and inventors solicit funds to make their projects a reality. For instance, webcomic artist Rich Burlew sought $57,750 to reprint his comics in paper form — and raised close to a million. In other words, crowdfunding is working together to support something you love. By pooling resources, big and small, from all over the world, we can make huge things happen. What will supplement and then replace contemporary publishing models remains to be seen. In terms of experiments, this one looks quite promising. If you use unglue.it, please ping me with your experience. Thanks! The Nation has a new publishing model Wednesday, July 8th, 2015 From the post: …on July 6, 2015—exactly 150 years after the publication of our first issue—we’re relaunching TheNation.com. The new site, created in partnership with our friends at Blue State Digital and Diaspark, represents our commitment to being at the forefront of independent journalism for the next generation. The article page is designed with the Nation ambassador in mind: Beautiful, clear fonts (Mercury and Knockout) and a variety of image fields make the articles a joy to read—on desktop, tablet, and mobile. Prominent share tools, Twitter quotes, and a “highlight to e-mail/tweet” function make it easy to share them with others. A robust new taxonomy and a continuous scroll seamlessly connect readers to related content. You’ll also see color-coded touts that let readers take action on a particular issue, or donate and subscribe to The Nation. I’m not overly fond of paywalls as you know but one part of the relaunch merits closer study. Comments on articles are going to be open to subscribers only. It will be interesting to learn what the experience of The Nation is with its comments only by subscribers. Hopefully their tracking will be granular enough to determine what portion of subscribers subscribed, simply so they could make comments. 
There are any number of fields where opinions run hot enough that even open content but paying for comments to be displayed could be a viable model for publication. Imagine a publicly accessible topic map on the candidates for the US presidential election next year. If it had sufficient visibility, the publication of any report would spawn automatic responses from others. Responses that would not appear without paying for access to publish the comment. Viable economic model? Suggestions? Digital Data Repositories in Chemistry… Wednesday, July 1st, 2015 Abtract: We discuss the concept of recasting the data-rich scientific journal article into two components, a narrative and separate data components, each of which is assigned a persistent digital object identifier. Doing so allows each of these components to exist in an environment optimized for purpose. We make use of a poorly-known feature of the handle system for assigning persistent identifiers that allows an individual data file from a larger file set to be retrieved according to its file name or its MIME type. The data objects allow facile visualization and retrieval for reuse of the data and facilitates other operations such as data mining. Examples from five recently published articles illustrate these concepts. A very promising effort to integrate published content and electronic notebooks in chemistry. Encouraging that in addition to the technical and identity issues the authors also point out the lack of incentives for the extra work required to achieve useful integration. Everyone agrees that deeper integration of resources in the sciences will be a game-changer but renewing the realization that there is no such thing as a free lunch, is an important step towards that goal. This article easily repays a close read with interesting subject identity issues and the potential that topic maps would offer to such an effort. 
The peer review drugs don’t work [Faith Based Science] Sunday, May 31st, 2015 The peer review drugs don’t work by Richard Smith. From the post: It is paradoxical and ironic that peer review, a process at the heart of science, is based on faith not evidence. There is evidence on peer review, but few scientists and scientific editors seem to know of it – and what it shows is that the process has little if any benefit and lots of flaws. Peer review is supposed to be the quality assurance system for science, weeding out the scientifically unreliable and reassuring readers of journals that they can trust what they are reading. In reality, however, it is ineffective, largely a lottery, anti-innovatory, slow, expensive, wasteful of scientific time, inefficient, easily abused, prone to bias, unable to detect fraud and irrelevant. As Drummond Rennie, the founder of the annual International Congress on Peer Review and Biomedical Publication, says, “If peer review was a drug it would never be allowed onto the market.” Cochrane reviews, which gather systematically all available evidence, are the highest form of scientific evidence. A 2007 Cochrane review of peer review for journals concludes: “At present, little empirical evidence is available to support the use of editorial peer review as a mechanism to ensure quality of biomedical research.” We can see before our eyes that peer review doesn’t work because most of what is published in scientific journals is plain wrong. The most cited paper in Plos Medicine, which was written by Stanford University’s John Ioannidis, shows that most published research findings are false. Studies by Ioannidis and others find that studies published in “top journals” are the most likely to be inaccurate. This is initially surprising, but it is to be expected as the “top journals” select studies that are new and sexy rather than reliable. 
A series published in The Lancet in 2014 has shown that 85 per cent of medical research is wasted because of poor methods, bias and poor quality control. A study in Nature showed that more than 85 per cent of preclinical studies could not be replicated, the acid test in science. I used to be the editor of the BMJ, and we conducted our own research into peer review. In one study we inserted eight errors into a 600 word paper and sent it 300 reviewers. None of them spotted more than five errors, and a fifth didn’t detect any. The median number spotted was two. These studies have been repeated many times with the same result. Other studies have shown that if reviewers are asked whether a study should be published there is little more agreement than would be expected by chance. As you might expect, the humanities are lagging far behind the sciences in acknowledging that peer review is an exercise in social status rather than quality: One of the changes I want to highlight is the way that “peer review” has evolved fairly quietly during the expansion of digital scholarship and pedagogy. Even though some scholars, such as Kathleen Fitzpatrick, are addressing the need for new models of peer review, recognition of the ways that this process has already been transformed in the digital realm remains limited. The 2010 Center for Studies in Higher Education (hereafter cited as Berkeley Report) comments astutely on the conventional role of peer review in the academy: Among the reasons peer review persists to such a degree in the academy is that, when tied to the venue of a publication, it is an efficient indicator of the quality, relevance, and likely impact of a piece of scholarship. Peer review strongly influences reputation and opportunities. (Harley, et al 21) These observations, like many of those presented in this document, contain considerable wisdom. 
Nevertheless, our understanding of peer review could use some reconsideration in light of the distinctive qualities and conditions associated with digital humanities. …(Living in a Digital World: Rethinking Peer Review, Collaboration, and Open Access by Sheila Cavanagh.) Can you think of another area where something akin to peer review is being touted? What about internal guidelines of the CIA, NSA, FBI and secret courts reviewing actions by those agencies? How do those differ from peer review, which is an acknowledged failure in science and should be acknowledged in the humanities? They are quite similar in the sense that some secret group is empowered to make decisions that impact others and members of those groups, don’t want to relinquish those powers. Surprise, surprise. Peer review should be scrapped across the board and replaced by tracked replication and use by others, both in the sciences and the humanities. Government decisions should be open to review by all its citizens and not just a privileged few. How journals could “add value” Thursday, May 28th, 2015 How journals could “add value” by Mark Watson. From the post: I wrote a piece for Genome Biology, you may have read it, about open science. I said a lot of things in there, but one thing I want to focus on is how journals could “add value”. As brief background: I think if you’re going to make money from academic publishing (and I have no problem if that’s what you want to do), then I think you should “add value”. Open science and open access is coming: open access journals are increasingly popular (and cheap!), preprint servers are more popular, green and gold open access policies are being implemented etc etc. Essentially, people are going to stop paying to access research articles pretty soon – think 5-10 year time frame. So what can journals do to “add value”? What can they do that will make us want to pay to access them? 
Here are a few ideas, most of which focus on going beyond the PDF:

Humanities journals and their authors should take heed of these suggestions.

Not applicable in every case but certainly better than “journal editorial board as resume padding.”

Companion to “Functional Programming in Scala”

Sunday, February 22nd, 2015

A companion booklet to “Functional Programming in Scala” by Rúnar Óli Bjarnason.

From the webpage:

This full colour syntax-highlighted booklet comprises all the chapter notes, hints, solutions to exercises, addenda, and errata for the book “Functional Programming in Scala” by Paul Chiusano and Runar Bjarnason. This material is freely available online, but is compiled here as a convenient companion to the book itself.

If you talk about supporting alternative forms of publishing, here is your chance to support an alternative form of publishing, financially.

Authors are going to gravitate to models that sustain their ability to write. It is up to you what model that will be.

The Many Faces of Science (the journal)

Saturday, February 21st, 2015

Andy Dalby tells a chilling tale in Why I will never trust Science again.

You need to read the full account but as a quick summary, Andy submits a paper to Science that is rejected and within weeks finds that Science accepted another paper, a deeply flawed one, reaching the same conclusion and when he notified Science, it was suggested he post an online comment.

Andy’s account has quotes, links to references, etc.

That is one face of Science, secretive, arbitrary and restricted peer review of submissions. I say “restricted peer” because Science has a tiny number of reviewers, compared to your peers, who review submissions.

If you want “peer review,” you should publish with an open source journal that enlists all of your peers as reviewers, not just a few.

There is another face of Science, which appeared last December without any trace of irony at all:

Does journal peer review miss best and brightest?
by David Shultz, which reads in part:

Sometimes greatness is hard to spot. Before going on to lead the Chicago Bulls to six NBA championships, Michael Jordan was famously cut from his high school basketball team. Scientists often face rejection of their own – in their case, the gatekeepers aren’t high school coaches, but journal editors and peers they select to review submitted papers. A study published today indicates that this system does a reasonable job of predicting the eventual interest in most papers, but it may shoot an air ball when it comes to identifying really game-changing research.

There is a serious chink in the armor, though: All 14 of the most highly cited papers in the study were rejected by the three elite journals, and 12 of those were bounced before they could reach peer review. The finding suggests that unconventional research that falls outside the established lines of thought may be more prone to rejection from top journals, Siler says.

Science publishes research showing its methods are flawed and yet it takes no notice.

Perhaps its rejection of Andy’s paper isn’t so strange. It must have not traveled far enough down the stairs.

I first saw Andy’s paper in a tweet by Mick Watson.

Harry Potter eBooks

Sunday, February 1st, 2015

All the Harry Potter ebooks are now on subscription site Oyster by Laura Hazard Owen.

Laura reports the Harry Potter books are available on Oyster and Amazon. She says that Oyster has the spin-off titles from the original series where Amazon does not. Both offer $9.95 per month subscription rates, where Oyster claims “over a million” books and Amazon over 700,000.

After reading David Mason’s How many books will you read in your lifetime?, I am not sure the difference in raw numbers will make much difference.

Access to electronic texts will certainly make creating topic maps for popular literature a good deal easier.

Enjoy!

Nature: A recap of a successful year in open access, and introducing CC BY as default

Tuesday, January 27th, 2015

A recap of a successful year in open access, and introducing CC BY as default by Carrie Calder, the Director of Strategy for Open Research, Nature Publishing Group/Palgrave Macmillan.

From the post:

We’re pleased to start 2015 with an announcement that we’re now using Creative Commons Attribution license CC BY 4.0 as default. This will apply to all of the 18 fully open access journals Nature Publishing Group owns, and will also apply to any future titles we launch. Two society-owned titles have introduced CC BY as default today and we expect to expand this in the coming months.

This follows a transformative 2014 for open access and open research at Nature Publishing Group. We’ve always been supporters of new technologies and open research (for example, we’ve had a liberal self-archiving policy in place for ten years now. In 2013 we had 65 journals with an open access option) but in 2014 we:

• Built a dedicated team of over 100 people working on Open Research across journals, books, data and author services
• Conducted research on whether there is an open access citation benefit, and researched authors’ views on OA
• Introduced the Nature Partner Journal series of high-quality open access journals and announced our first ten NPJs
• Launched Scientific Data, our first open access publication for Data Descriptors
• And last but not least switched Nature Communications to open access, creating the first Nature-branded fully open access journal

We did this not because it was easy (trust us, it wasn’t always) but because we thought it was the right thing to do. And because we don’t just believe in open access; we believe in driving open research forward, and in working with academics, funders and other publishers to do so. It’s obviously making a difference already. In 2013, 38% of our authors chose to publish open access immediately upon publication – in 2014, this percentage rose to 44%. Both Scientific Reports and Nature Communications had record years in terms of submissions for publication.

Open access is on its way to becoming the expected model for publishing. That isn’t to say that there aren’t economies and kinks to be worked out, but the fundamental principles of open access have been widely accepted.

Not everywhere of course. There are areas of scholarship that think self-isolation makes them important. They shun open access as an attack on their traditions of “Doctor Fathers” and access to original materials as a privilege. Strategies that make them all the more irrelevant in the modern world. Pity because there is so much they could contribute to the public conversation. But a public conversation means you are not insulated from questions that don’t accept “because I say so” as an adequate answer.

If you are working in such an area or know of one, press for emulation of Nature and the many other efforts to provide open access to both primary and secondary materials. There are many areas of the humanities that already follow that model, but not all. Let’s keep pressing until open access is the default for all disciplines.

Kudos to Nature for their ongoing efforts on open access.

I first saw the news about Nature’s post in a tweet by Ethan White.

The Past, Present and Future of Scholarly Publishing

Saturday, January 3rd, 2015

The Past, Present and Future of Scholarly Publishing By Michael Eisen.

Michael made this presentation to the Commonwealth Club of California on March 12, 2013. This post is from the written text for the presentation and you can catch the audio here.

Michael does a great job tracing the history of academic publishing, the rise of open access and what is holding us back from a more productive publishing environment for everyone.

I disagree with his assessment of classification:

And as for classification, does anyone really think that assigning every paper to one of 10,000 journals, organized in a loose and chaotic hierarchy of topics and importance, is really the best way to help people browse the literature? This is a pure relic of a bygone era – an artifact of the historical accident that Gutenberg invented the printing press before Al Gore invented the Internet.

but will pass over that to address the more serious issue of open access publishing in the humanities.

Michael notes:

But the battle is by no means won. Open access collectively represents only around 10% of biomedical publishing, has less penetration in other sciences, and is almost non-existent in the humanities. And most scientists still send their best papers to “high impact” subscription-based journals.

There are open access journals in the humanities but it is fair to say they are few and far between. If prestige is one of the drivers in scientific publishing, where large grant programs abound for some types of research, prestige is about the only driver for humanities publishing.

There are grant programs for the humanities but nothing on the scale of funding in the sciences. Salaries in the humanities are for the most part nothing to write home about. Humanities publishing really comes down to prestige.

Prestige from publication may be a dry, hard bone but it is the only bone that most humanities scholars will ever have. Try to take that away and you are likely to get bitten.

For instance, have you ever wondered about the proliferation of new translations of the Bible? Have we discovered new texts? New discoveries about biblical languages? Discovery of major mistakes in a prior edition? What if I said none of the above? To what would you assign the publication of new translations of the Bible?

If you compare the various translations you will find different “editors,” unless you are looking at a common source for bibles. Some sources do that as well. They create different “versions” for different target audiences.

With the exception of new versions like the New Revised Standard Version, which was undertaken to account for new information from the Dead Sea Scrolls, new editions of the Bible are primarily scholarly churn.

The humanities aren’t going to move any closer to open access publishing until their employers (universities) and funders insist on open access publishing as a condition for tenure and funding.

I will address Michael’s mis-impressions about the value of classification another time.

Early English Books Online – Good News and Bad News

Friday, January 2nd, 2015

Early English Books Online

The very good news is that 25,000 volumes from the Early English Books Online collection have been made available to the public!

From the webpage:

The EEBO corpus consists of the works represented in the English Short Title Catalogue I and II (based on the Pollard & Redgrave and Wing short title catalogs), as well as the Thomason Tracts and the Early English Books Tract Supplement. Together these trace the history of English thought from the first book printed in English in 1475 through to 1700. The content covers literature, philosophy, politics, religion, geography, science and all other areas of human endeavor. The assembled collection of more than 125,000 volumes is a mainstay for understanding the development of Western culture in general and the Anglo-American world in particular. The STC collections have perhaps been most widely used by scholars of English, linguistics, and history, but these resources also include core texts in religious studies, art, women’s studies, history of science, law, and music.

Even better news from Sebastian Rahtz (Chief Data Architect, IT Services, University of Oxford):

The University of Oxford is now making this collection, together with Gale Cengage’s Eighteenth Century Collections Online (ECCO), and Readex’s Evans Early American Imprints, available in various formats (TEI P5 XML, HTML and ePub) initially via the University of Oxford Text Archive at http://www.ota.ox.ac.uk/tcp/, and offering the source XML for community collaborative editing via Github. For the convenience of UK universities who subscribe to JISC Historic Books, a link to page images is also provided. We hope that the XML will serve as the base for enhancements and corrections.

This catalogue also lists EEBO Phase 2 texts, but the HTML and ePub versions of these can only be accessed by members of the University of Oxford.

[Technical note]
Those interested in working on the TEI P5 XML versions of the texts can check them out of Github, via https://github.com/textcreationpartnership/, where each of the texts is in its own repository (eg https://github.com/textcreationpartnership/A00021). There is a CSV file listing all the texts at https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv, and a simple Linux/OSX shell script to clone all 32853 unrestricted repositories at https://raw.githubusercontent.com/textcreationpartnership/Texts/master/cloneall.sh
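
If you would rather start from the CSV listing than from the shell script, a minimal sketch might look like the following. It is not the project’s cloneall.sh. The column name `TCP` and the sample rows are my assumptions for illustration (only the ID A00021 and the repository URL pattern come from the note above), so check the real TCP.csv header before relying on it.

```python
import csv
import io

# URL pattern taken from the technical note: one repository per text ID.
BASE_URL = "https://github.com/textcreationpartnership/"


def clone_commands(csv_text, id_column="TCP"):
    """Return one `git clone` command per row that has a text ID.

    `id_column` is an assumed column name, not confirmed against TCP.csv.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        f"git clone {BASE_URL}{row[id_column]}"
        for row in reader
        if row.get(id_column)
    ]


# Hypothetical sample standing in for the real CSV file.
SAMPLE_CSV = "TCP,Title\nA00021,Example title\nA00022,Another hypothetical title\n"

if __name__ == "__main__":
    for command in clone_commands(SAMPLE_CSV):
        print(command)
```

Printing the commands rather than running them lets you review the list, or pipe only a slice of it to a shell, before pulling down tens of thousands of repositories.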

Now for the BAD NEWS:

An additional 45,000 books:

Currently, EEBO-TCP Phase II texts are available to authorized users at partner libraries. Once the project is done, the corpus will be available for sale exclusively through ProQuest for five years. Then, the texts will be released freely to the public.

Can you guess why the public is barred from what are obviously public domain texts?

Because our funding is limited, we aim to key as many different works as possible, in the language in which our staff has the most expertise.

Academic projects are supposed to fund themselves and be self-sustaining. When anyone asks about the sustainability of an academic project, ask them when their country’s military was last “self-sustaining.” The U.S. has spent $2.6 trillion on a “war on terrorism” and has nothing to show for it other than dead and injured military personnel, perversion of budgetary policies, and loss of privacy on a worldwide scale. It is hard to imagine what sort of life-time access for everyone on Earth could be secured for less than $1 trillion. No more special pricing and contracts if you are in countries A to Zed. Eliminate all that paperwork for publishers, and all you need for access is a connection to the Internet. The publishers would have a guaranteed income stream, less overhead from sales personnel, administrative staff, etc. And people would have access (whether used or not) to educate themselves, to make new discoveries, etc.

My proposal does not involve payments to large military contractors or subversion of legitimate governments or imposition of American values on other cultures. Leaving those drawbacks to one side, what do you think about it otherwise?

The Data Scientist

Thursday, January 1st, 2015

The Data Scientist

Kurt Cagle has set up a newspaper on Data Science and Computational Linguistics with the following editor’s note:

I have been covering the electronic information space for more than thirty years, as writer, editor, programmer and information architect. This paper represents an experiment, a venue to explore Data Science and Computational Linguistics, as well as the world of IT in general.

I’m still working out bugs and getting a feel for the platform, so look and feel (and content) will almost certainly change. If you are interested in featuring articles here, please contact me.

It is based on paper.li, which automatically loads content into your newspaper. Not to mention you being able to load content as well.

I have known Kurt for a number of years in the markup world and look forward to seeing how this newspaper develops.

Saturday, December 6th, 2014

From the post:

People are frequently surprised that my book, Higher-Order Perl, is available as a free download from my web site. They ask if it spoiled my sales, or if it was hard to convince the publisher. No and no.

I sent the HOP proposal to five publishers, expecting that two or three would turn it down, and that I would pick from the remaining two or three, but somewhat to my dismay, all five offered to publish it, and I had to decide who.

One of the five publishers was Morgan Kaufmann. I had never heard of Morgan Kaufmann, but one day around 2002 I was reading the web site of Philip Greenspun. Greenspun was incredibly grouchy. He found fault with everything. But he had nothing but praise for Morgan Kaufmann. I thought that if Morgan Kaufmann had pleased Greenspun, who was nearly impossible to please, then they must be really good, so I sent them the proposal. (They eventually published the book, and did a superb job; I have never regretted choosing them.)

But not only Morgan Kaufmann but four other publishers had offered to publish the book. So I asked a number of people for advice. I happened to be in London one week and Greenspun was giving a talk there, which I went to see. After the talk I introduced myself and asked for his advice about picking the publisher.

Access to “free” electronic versions is on its way to becoming a norm, at least with some computer science publishers. Cambridge University Press, CUP, with Data Mining and Analysis: Fundamental Concepts and Algorithms and Basic Category Theory comes to mind.

Other publishers with similar policies? Yes, I know there are CS publishers who want to make free with content of others, not so much with their own. Not the same thing.

I first saw this in a tweet by Julia Evans.

Nature makes all articles free to view [pay-to-say]

Tuesday, December 2nd, 2014

Nature makes all articles free to view by Richard Van Noorden.

From the post:

All research papers from Nature will be made free to read in a proprietary screen-view format that can be annotated but not copied, printed or downloaded, the journal’s publisher Macmillan announced on 2 December.

The content-sharing policy, which also applies to 48 other journals in Macmillan’s Nature Publishing Group (NPG) division, including Nature Genetics, Nature Medicine and Nature Physics, marks an attempt to let scientists freely read and share articles while preserving NPG’s primary source of income — the subscription fees libraries and individuals pay to gain access to articles.

ReadCube, a software platform similar to Apple’s iTunes, will be used to host and display read-only versions of the articles’ PDFs. If the initiative becomes popular, it may also boost the prospects of the ReadCube platform, in which Macmillan has a majority investment.

Annette Thomas, chief executive of Macmillan Science and Education, says that under the policy, subscribers can share any paper they have access to through a link to a read-only version of the paper’s PDF that can be viewed through a web browser. For institutional subscribers, that means every paper dating back to the journal’s foundation in 1869, while personal subscribers get access from 1997 on.

Anyone can subsequently repost and share this link. Around 100 media outlets and blogs will also be able to share links to read-only PDFs. Although the screen-view PDF cannot be printed, it can be annotated — which the publisher says will provide a way for scientists to collaborate by sharing their comments on manuscripts. PDF articles can also be saved to a free desktop version of ReadCube, similarly to how music files can be saved in iTunes.

I am hopeful that Macmillan will discover that allowing copying and printing are no threat to its income stream. Both are means of advertising for its journal at the expense of the user who copies a portion of the text for a citation or shares a printed copy with a colleague. Advertising paid for by users should be considered as a plus.

The annotation step is a good one, although I would modify it in some respects. First I would make all articles accessible by default with annotation capabilities. Then I would grant anyone who registers say 12 comments per year for free and offer a lower-than-subscription-cost option for more than twelve comments on articles.

If there is one thing I suspect users would be willing to pay for, it is the right to respond to others in their fields, whether to articles or to other comments. Think of it as a pay-to-say market strategy.

It could be an “additional” option to current institutional and personal subscriptions and thus an entirely new revenue stream for Macmillan.

To head off expected objections by “free speech” advocates, I note that no journal publishes every letter to the editor. The right to free speech has never included the right to be heard on someone else’s dime. Annotation of Nature is on Macmillan’s dime.

Basic Category Theory (Publish With CUP)

Monday, July 28th, 2014

Basic Category Theory by Tom Leinster.

From the webpage:

Basic Category Theory is an introductory category theory textbook. Features:

• It doesn’t assume much, either in terms of background or mathematical maturity.
• It sticks to the basics.
• It’s short.

Advanced topics are omitted, leaving more space for careful explanations of the core concepts. I used earlier versions of the text to teach master’s-level courses at the University of Glasgow.

The book is published by Cambridge University Press. You can find all the publication data, and buy it, at the book’s CUP web page.

It was published on 24 July 2014 in hardback and e-book formats. The physical book should be in stock throughout Europe now, and worldwide by mid-September. Wherever you are, you can (pre)order it now from CUP or the usual online stores.

By arrangement with CUP, a free online version will be released in January 2016. This will be not only freely downloadable but also freely editable, under a Creative Commons licence. So, for instance, if parts of the book are unsuitable for the course you’re teaching, or if you don’t like the notation, you can change it. More details will appear here when the time comes.

Freely available as etext (18 months after hard copy release) and freely editable?

Show of hands. How many publishers have you seen with those policies?

I keep coming up with one, Cambridge University Press, CUP.

As readers and authors we need to vote with our feet. Purchase from and publish with Cambridge University Press.

It may take a while but other publishers may finally notice.

TeX Live 2014 released…

Thursday, June 19th, 2014

TeX Live 2014 released – what’s new by Stefan Kottwitz.

Just enough to get you interested:

• TeX and MetaFont updates
• pdfTeX with “fake spaces”
• LuaTeX, engine that can reside in CPU cache
• numerous other changes and improvements

Stefan covers these and more, while pointing you to the documentation for more details.

Has anyone calculated how many decades TeX/LaTeX are ahead of the average word processor?

Just curious.

GitBook:…

Tuesday, June 3rd, 2014

GitBook: Write Books using Markdown on OpenShift by Marek Jelen.

From the post:

GitBook is a tool for using Markdown to write books, which are converted to dynamic websites or exported to static formats like PDF. GitBook also integrates with Git and GitHub, adding a social element to the book creation process.

If you are exporting your book into an HTML page, interactive aspects are also embeddable. At the time of this writing, the system provides support for quizzes and JavaScript exercises. However, the tool is fully open source and written using Node.js, so you are free to extend the functionality to meet your needs.

The GitBook Learn Javascript is used as an example of production with GitBook.

It’s readable, but in terms of the publishing craft it is no Mikraot Gedolot or The Art of Computer Programming (TAOCP).

Still, it may be useful for one-off exports from topic maps and other data sources.

Tuesday, May 20th, 2014

From the webpage:

Madagascar is an open-source software package for multidimensional data analysis and reproducible computational experiments. Its mission is to provide

• a convenient and powerful environment
• a convenient technology transfer tool

for researchers working with digital image and data processing in geophysics and related fields. Technology developed using the Madagascar project management system is transferred in the form of recorded processing histories, which become “computational recipes” to be verified, exchanged, and modified by users of the system.

Interesting tool for “reproducible documents” and data analysis.

The file format, Regularly Sampled Format (RSF), sounds interesting:

For data, Madagascar uses the Regularly Sampled Format (RSF), which is based on the concept of hypercubes (n-D arrays, or regularly sampled functions of several variables), much like the SEPlib (its closest relative), DDS, or the regularly-sampled version of the Javaseis format (SVF). Up to 9 dimensions are supported. For 1D it is conceptually analogous to a time series, for 2D to a raster image, and for 3D to a voxel volume. The format (actually a metaformat) makes use of an ASCII file with metadata (information about the data), including a pointer (in= parameter) to the location of the file with the actual data values. Irregularly sampled data are currently handled as a pair of datasets, one containing data and the second containing the corresponding irregular geometry information. Programs for conversion to and from other formats such as SEG-Y and SU are provided. (From Package Overview)

In case you are interested, SEG-Y and SU (Seismic Unix data format) are both formats for geophysical data.
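
To make the “ASCII header plus pointer to the data” idea concrete, here is a rough sketch of reading such a header. Only the `in=` parameter and the 9-dimension limit come from the overview quoted above. The whitespace-separated key=value layout, the axis-length keys `n1`, `n2`, … and the `esize` key are my understanding of the format and should be checked against the Madagascar documentation. The sample header and its path are made up.

```python
import shlex
from math import prod


def parse_header(text):
    """Collect key=value tokens from header text; later values win."""
    params = {}
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
    return params


def hypercube_shape(params, max_dims=9):
    """Read consecutive axis lengths n1, n2, ... (assumed key names)."""
    shape = []
    for axis in range(1, max_dims + 1):
        key = f"n{axis}"
        if key not in params:
            break
        shape.append(int(params[key]))
    return tuple(shape)


# Hypothetical header: a 4 x 3 raster of 4-byte values stored elsewhere.
SAMPLE_HEADER = 'n1=4 n2=3 esize=4\nin="/tmp/demo.rsf@"\n'

if __name__ == "__main__":
    params = parse_header(SAMPLE_HEADER)
    shape = hypercube_shape(params)
    print("data file:", params["in"])
    print("shape:", shape, "samples:", prod(shape))
```

The point of the split is visible even in this toy: the metadata stays human-readable and editable, while the bulk data can live wherever the `in=` pointer says it does.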

I first saw this in a tweet by Scientific Python.

Thanks for Ungluing

Sunday, May 4th, 2014

Thanks-for-Ungluing launches!

From the post:

Great books deserve to be read by all of us, and we ought to be supporting the people who create these books. “Thanks for Ungluing” gives readers, authors, libraries and publishers a new way to build, sustain, and nourish the books we love.

We have some amazing creators participating in this launch.

An attempt to address the problem of open access to published materials while at the same time compensating authors for their efforts.

There is some recent material and old standbys like The Communist Manifesto by Karl Marx and Friedrich Engels. Which is good but having more recent works such as A Theology of Liberation by Gustavo Gutiérrez would be better.

If you are thinking about writing a book on CS topics, please think about “Thanks for Ungluing” as an option.

I first saw this in a tweet by Tim O’Reilly.

Innovations in peer review:…

Tuesday, April 22nd, 2014

Innovations in peer review: join a discussion with our Editors by Shreeya Nanda.

From the post:

Innovation may not be an adjective often associated with peer review, indeed commentators have claimed that peer review slows innovation and creativity in science. Preconceptions aside, publishers are attempting to shake things up a little, with various innovations in peer review, and these are the focus of a panel discussion at BioMed Central’s Editors’ Conference on Wednesday 23 April in Doha, Qatar. This follows our spirited discussion at the Experimental Biology conference in Boston last year.

The discussion last year focussed on the limitations of the traditional peer review model (you can see a video here). This year we want to talk about innovations in the field and the ways in which the limitations are being addressed. Specifically, we will focus on open peer review, portable peer review – in which we help authors transfer their manuscript, often with reviewers’ reports, to a more appropriate journal – and decoupled peer review, which is undertaken by a company or organisation independent of, or on contract from, a journal.

We will be live tweeting from the session at 11.15am local time (9.15am BST), so if you want to join the discussion or put questions to our panellists, please follow #BMCEds14. If you want to brush up on any or all of the models that we’ll be discussing, have a look at some of the content from around BioMed Central’s journals, blogs and Biome below:

This post includes pointers to a number of useful resources concerning the debate around peer review.

But there are oddities as well. First, consider the claim that peer review “slows innovation and creativity in science” in light of recent reports that peer review is no better than random chance for grants (…lotteries to pick NIH research-grant recipients), the not infrequent reports of false papers and fraud in actual papers, and a general inability to replicate research described in papers (Reproducible Research/(Mapping?)).

A claim doesn’t have to appear on the alt.fringe.peer.review newsgroup (imaginary newsgroup) in order to be questionable on its face.

Secondly, despite the invitation to follow and participate on Twitter, holding the meeting in Qatar means potential attendees from the United States will have to rise at:

Eastern 4:15 AM (last year’s location)

Central 3:15 AM

Mountain 2:15 AM

Pacific 1:15 AM
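
If you want to check the arithmetic, here is a small sketch. It uses fixed UTC offsets rather than a time zone database, which is an assumption on my part: Doha at UTC+3 and the US zones on daylight time in late April (UTC-4 through UTC-7). The session date and the 11.15am start come from the announcement quoted above.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for 23 April 2014 (US daylight time in effect).
DOHA = timezone(timedelta(hours=3))
US_OFFSETS = {"Eastern": -4, "Central": -5, "Mountain": -6, "Pacific": -7}


def us_start_times(local_start):
    """Convert a timezone-aware start time into HH:MM for each US zone."""
    times = {}
    for name, hours in US_OFFSETS.items():
        zone = timezone(timedelta(hours=hours))
        times[name] = local_start.astimezone(zone).strftime("%H:%M")
    return times


SESSION = datetime(2014, 4, 23, 11, 15, tzinfo=DOHA)

if __name__ == "__main__":
    for zone_name, start in us_start_times(SESSION).items():
        print(f"{zone_name}: {start}")
```

Run for the session above, it gives 04:15, 03:15, 02:15 and 01:15, matching the list.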

I wonder what the participation levels will be from Boston last year as compared to Qatar this year?

Nothing against non-United States locations but non-junket locations, such as major educational/research hubs, should be the sites for such meetings.

…Textbooks for $0 [Digital Illiterates?]

Thursday, January 23rd, 2014

OpenStax College Textbooks for $0

From the about page:

OpenStax College is a nonprofit organization committed to improving student access to quality learning materials. Our free textbooks are developed and peer-reviewed by educators to ensure they are readable, accurate, and meet the scope and sequence requirements of your course. Through our partnerships with companies and foundations committed to reducing costs for students, OpenStax College is working to improve access to higher education for all.

OpenStax College is an initiative of Rice University and is made possible through the generous support of several philanthropic foundations. …

Available now:

• Anatomy and Physiology
• Biology
• College Physics
• Concepts of Biology
• Introduction to Sociology
• Introductory Statistics

Coming soon:

• Chemistry
• Precalculus
• Principles of Economics
• Principles of Macroeconomics
• Principles of Microeconomics
• Psychology
• U.S. History

Check to see if I missed any present or forthcoming texts on data science. No, I didn’t see any either.

I looked at the Introduction to Sociology, which has a chapter on research methods, but no opportunity for students to experience data methods. Such as Statwing’s coverage of the General Social Survey (GSS), which I covered in Social Science Dataset Prize!

Data science should not be an aside or an extra course, any more than language literacy is. Both are requirements for an education.

Consider writing or suggesting edits to subject textbooks to incorporate data science. Solely data science books will be necessary as well, just like there are advanced courses in English Literature.

Let’s not graduate digital illiterates. For their sake and ours.

I first saw this in a tweet by Michael Peter Edson.

Composable languages for bioinformatics: the NYoSh experiment

Wednesday, January 22nd, 2014

Composable languages for bioinformatics: the NYoSh experiment by Manuele Simi, Fabien Campagne. (Simi M, Campagne F. (2014) Composable languages for bioinformatics: the NYoSh experiment. PeerJ 2:e241 http://dx.doi.org/10.7717/peerj.241)

Abstract:

Language WorkBenches (LWBs) are software engineering tools that help domain experts develop solutions to various classes of problems. Some of these tools focus on non-technical users and provide languages to help organize knowledge while other workbenches provide means to create new programming languages. A key advantage of language workbenches is that they support the seamless composition of independently developed languages. This capability is useful when developing programs that can benefit from different levels of abstraction. We reasoned that language workbenches could be useful to develop bioinformatics software solutions. In order to evaluate the potential of language workbenches in bioinformatics, we tested a prominent workbench by developing an alternative to shell scripting. To illustrate what LWBs and Language Composition can bring to bioinformatics, we report on our design and development of NYoSh (Not Your ordinary Shell). NYoSh was implemented as a collection of languages that can be composed to write programs as expressive and concise as shell scripts. This manuscript offers a concrete illustration of the advantages and current minor drawbacks of using the MPS LWB. For instance, we found that we could implement an environment-aware editor for NYoSh that can assist the programmers when developing scripts for specific execution environments. This editor further provides semantic error detection and can be compiled interactively with an automatic build and deployment system. In contrast to shell scripts, NYoSh scripts can be written in a modern development environment, supporting context dependent intentions and can be extended seamlessly by end-users with new abstractions and language constructs. We further illustrate language extension and composition with LWBs by presenting a tight integration of NYoSh scripts with the GobyWeb system. 
The NYoSh Workbench prototype, which implements a fully featured integrated development environment for NYoSh is distributed at http://nyosh.campagnelab.org.

In the discussion section of the paper the authors concede:

We expect that widespread use of LWB will result in a multiplication of small languages, but in a manner that will increase language reuse and interoperability, rather than in the historical language fragmentation that has been observed with traditional language technology.

Whenever I hear projections about the development of languages, I am reminded that the inventors of “SCSI” thought it should be pronounced “sexy,” whereas others preferred “scuzzi.” Doesn’t have the same ring to it, does it?

I am all in favor of domain specific languages (DSLs), but at the same time, am mindful that undocumented languages are in danger of becoming “dead” languages.