Archive for the ‘Publishing’ Category

Free & Interactive Online Introduction to LaTeX

Thursday, July 28th, 2016

Free & Interactive Online Introduction to LaTeX by John Lees-Miller.

From the webpage:

Part 1: The Basics

Welcome to the first part of our free online course to help you learn LaTeX. If you have never used LaTeX before, or if it has been a while and you would like a refresher, this is the place to start. This course will get you writing LaTeX right away with interactive exercises that can be completed online, so you don’t have to download and install LaTeX on your own computer.

In this part of the course, we’ll take you through the basics of how LaTeX works, explain how to get started, and go through lots of examples. Core LaTeX concepts, such as commands, environments, and packages, are introduced as they arise. In particular, we’ll cover:

  • Setting up a LaTeX Document
  • Typesetting Text
  • Handling LaTeX Errors
  • Typesetting Equations
  • Using LaTeX Packages

In part two and part three, we’ll build up to writing beautiful structured documents with figures, tables and automatic bibliographies, and then show you how to apply the same skills to make professional presentations with beamer and advanced drawings with TikZ. Let’s get started!

Since I mentioned fonts earlier today (in Learning a Manifold of Fonts), it seems only fair to post about the only typesetting language that can take full advantage of any font you care to use.

TeX was released in 1978 and it has yet to be equaled by any non-TeX/LaTeX system.

It’s almost forty (40) years old, widely used and still sui generis.

Web Design in 4 minutes

Thursday, July 28th, 2016

Web Design in 4 minutes by Jeremy Thomas.

From the post:

Let’s say you have a product, a portfolio, or just an idea you want to share with everyone on your own website. Before you publish it on the internet, you want to make it look attractive, professional, or at least decent to look at.

What is the first thing you need to work on?

This is more for me than you, especially if you consider my much neglected homepage.

Over the years my blog has consumed far more of my attention than my website.

I have some new, longer material that is more appropriate for the website so this post is a reminder to me to get my act together over there!

Other web design resource suggestions welcome!

Everything You Wanted to Know about Book Sales (But Were Afraid to Ask)

Tuesday, July 5th, 2016

Everything You Wanted to Know about Book Sales (But Were Afraid to Ask) by Lincoln Michel.

From the post:

Publishing is the business of creating books and selling them to readers. And yet, for some reason we aren’t supposed to talk about the latter.

Most literary writers consider book sales a half-crass / half-mythological subject that is taboo to discuss.

While authors avoid the topic, every now and then the media brings up book sales — normally to either proclaim, yet again, the death of the novel, or to make sweeping generalizations about the attention spans of different generations. But even then, the data we are given is almost completely useless for anyone interested in fiction and literature. Earlier this year, there was a round of excited editorials about how print is back, baby after industry reports showed print sales increasing for the second consecutive year. However, the growth was driven almost entirely by non-fiction sales… more specifically adult coloring books and YouTube celebrity memoirs. As great as adult coloring books may be, their sales figures tell us nothing about the sales of, say, literary fiction.

Lincoln’s account mirrors my experience (twice) with a small press decades ago.

While you (rightfully) think that every sane person on the planet will forego the rent in order to purchase your book, sadly your publisher is very unlikely to share that view.

One of the comments to this post reads:

…Writing is a calling but publishing is a business.

Quite so.

Don’t be discouraged by this account but do allow it to influence your expectations, at least about the economic rewards of publishing.

Just in case I get hit with the publishing bug again, good luck to us all!

Developing Expert p-Hacking Skills

Saturday, July 2nd, 2016

Introducing the p-hacker app: Train your expert p-hacking skills by Ned Bicare.

Ned’s p-hacker app will be welcomed by everyone who publishes where p-values are accepted.

Publishers should require authors and reviewers to submit six p-hacker app results along with any draft that contains, or is a review of, p-values.

The p-hacker app results won’t improve a draft or review, but comparing them with the draft will improve the publication in which the draft might otherwise have appeared.

From the post:

My dear fellow scientists!

“If you torture the data long enough, it will confess.”

This aphorism, attributed to Ronald Coase, sometimes has been used in a disrespective manner, as if it was wrong to do creative data analysis.

In fact, the art of creative data analysis has experienced despicable attacks over the last years. A small but annoyingly persistent group of second-stringers tries to denigrate our scientific achievements. They drag psychological science through the mire.

These people propagate stupid method repetitions; and what was once one of the supreme disciplines of scientific investigation – a creative data analysis of a data set – has been crippled to conducting an empty-headed step-by-step pre-registered analysis plan. (Come on: If I lay out the full analysis plan in a pre-registration, even an undergrad student can do the final analysis, right? Is that really the high-level scientific work we were trained for so hard?).

They broadcast in an annoying frequency that p-hacking leads to more significant results, and that researcher who use p-hacking have higher chances of getting things published.

What are the consequence of these findings? The answer is clear. Everybody should be equipped with these powerful tools of research enhancement!

The art of creative data analysis

Some researchers describe a performance-oriented data analysis as “data-dependent analysis”. We go one step further, and call this technique data-optimal analysis (DOA), as our goal is to produce the optimal, most significant outcome from a data set.

I developed an online app that allows to practice creative data analysis and how to polish your p-values. It’s primarily aimed at young researchers who do not have our level of expertise yet, but I guess even old hands might learn one or two new tricks! It’s called “The p-hacker” (please note that ‘hacker’ is meant in a very positive way here. You should think of the cool hackers who fight for world peace). You can use the app in teaching, or to practice p-hacking yourself.

Please test the app, and give me feedback! You can also send it to colleagues: http://shinyapps.org/apps/p-hacker.

Enjoy!

TUGBoat – The Complete Set

Thursday, June 30th, 2016

Norm Walsh tweeted an offer of circa 1990 issues of TUGBoat for free to a good home today (30 June 2016).

On the off chance that you, like me, have only a partial set, consider the full set, TUGBoat Contents, 1980 1:1 to date.

From the TUGBoat homepage:

The TUGboat journal is a unique benefit of joining TUG. It is currently published three times a year and distributed to all TUG members (for that year). Anyone can also buy copies from the TUG store.

We post articles online after about one year for the benefit of the entire TeX community, but TUGboat is funded by member support. So please consider joining TUG if you find TUGboat useful.

TUGboat publishes the proceedings of the TUG Annual Meetings, and sometimes other conferences. A list of other publications by TUG, and by other user groups is available.

This is an opportunity to support the TeX Users Group (TUG) without looking for a future home for your printed copies of TUGBoat. Donate to TUG and read online!

Enjoy!

The No-Value-Add Of Academic Publishers And Peer Review

Tuesday, June 21st, 2016

Comparing Published Scientific Journal Articles to Their Pre-print Versions by Martin Klein, Peter Broadwell, Sharon E. Farb, Todd Grappone.

Abstract:

Academic publishers claim that they add value to scholarly communications by coordinating reviews and contributing and enhancing text during publication. These contributions come at a considerable cost: U.S. academic libraries paid $1.7 billion for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price inflation. We have investigated the publishers’ value proposition by conducting a comparative study of pre-print papers and their final published counterparts. This comparison had two working assumptions: 1) if the publishers’ argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and 2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added value of commercial publishers and therefore should influence libraries’ economic decisions regarding access to scholarly publications.
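
To make the second working assumption concrete, here is a toy stand-in for a “standard similarity measure,” using Python’s difflib. The study’s actual metrics are more elaborate, but the idea is the same: near-identical texts score close to 1.0.

from difflib import SequenceMatcher

def text_similarity(preprint: str, published: str) -> float:
    """Return a 0..1 similarity ratio between two versions of a paper's text.

    Word-level comparison via difflib; near-identical texts score close to 1.0.
    """
    return SequenceMatcher(None, preprint.split(), published.split()).ratio()

# Hypothetical file names, for illustration only:
# print(text_similarity(open("preprint.txt").read(), open("published.txt").read()))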

The authors have performed a very detailed analysis of pre-prints, 90%–95% of which appeared openly as pre-prints before formal publication, and conclude there is no appreciable difference between the pre-prints and the final published versions.

I take “…no appreciable difference…” to mean academic publishers and the peer review process, despite claims to the contrary, contribute little or no value to academic publications.

How’s that for a bargaining chip in negotiating subscription prices?

Where Has Sci-Hub Gone?

Saturday, June 18th, 2016

While I was writing about the latest EC idiocy (link tax), I was reminded of Sci-Hub.

Just checking to see if it was still alive, I tried http://sci-hub.io/.

No luck: sci-hub.io no longer resolves via standard DNS.

If you are having the same problem, Mike Masnick reports in Sci-Hub, The Repository Of ‘Infringing’ Academic Papers Now Available Via Telegram that you can now reach Sci-Hub via Telegram (details in his post).

I’m not on Telegram, yet, but that may be changing soon. ;-)

BTW, while writing this update, I stumbled across: The New Napster: How Sci-Hub is Blowing Up the Academic Publishing Industry by Jason Shen.

From the post:


This is obviously piracy. And Elsevier, one of the largest academic journal publishers, is furious. In 2015, the company earned $1.1 billion in profits on $2.9 billion in revenue [2] and Sci-hub directly attacks their primary business model: subscription service it sells to academic organizations who pay to get access to its journal articles. Elsevier filed a lawsuit against Sci-Hub in 2015, claiming Sci-hub is causing irreparable injury to the organization and its publishing partners.

But while Elsevier sees Sci-Hub as a major threat, for many scientists and researchers, the site is a gift from the heavens, because they feel unfairly gouged by the pricing of academic publishing. Elsevier is able to boast a lucrative 37% profit margin because of the unusual (and many might call exploitative) business model of academic publishing:

  • Scientists and academics submit their research findings to the most prestigious journal they can hope to land in, without getting any pay.
  • The journal asks leading experts in that field to review papers for quality (this is called peer-review and these experts usually aren’t paid)
  • Finally, the journal turns around and sells access to these articles back to scientists/academics via the organization-wide subscriptions at the academic institution where they work or study

There’s piracy afoot, of that I have no doubt.

Elsevier:

  • Relies on research it does not sponsor
  • Receives research results submitted to it for free
  • Has that research reviewed for free
  • Publishes the research in journals whose value comes entirely from those free contributions
  • Makes a 37% profit off of that free content

There is piracy but Jason fails to point to Elsevier as the pirate.

Sci-Hub/Alexandra Elbakyan is re-distributing intellectual property that Elsevier, for its own gain, took from the academic community.

It’s time to bring Elsevier’s reign of terror against the academic community to an end. Support Sci-Hub in any way possible.

The Symptom of Many Formats

Monday, June 13th, 2016

Distro.Mic: An Open Source Service for Creating Instant Articles, Google AMP and Apple News Articles

From the post:

Mic is always on the lookout for new ways to reach our audience. When Facebook, Google and Apple announced their own native news experiences, we jumped at the opportunity to publish there.

While setting Mic up on these services, David Björklund realized we needed a common article format that we could use for generating content on any platform. We call this format article-json, and we open-sourced parsers for it.

Article-json got a lot of support from Google and Apple, so we decided to take it a step further. Enter DistroMic. Distro lets anyone transform an HTML article into the format mandated by one of the various platforms.

Sigh.

While I applaud the DistroMic work, I am saddened that it was necessary.

From the DistroMic page, here is the same article in three formats:

Apple:

{
  "article": [
    {
      "text": "Astronomers just announced the universe might be expanding up to 9% faster than we thought.\n",
      "additions": [
        {
          "type": "link",
          "rangeStart": 59,
          "rangeLength": 8,
          "URL": "http://hubblesite.org/newscenter/archive/releases/2016/17/text/"
        }
      ],
      "inlineTextStyles": [
        {
          "rangeStart": 59,
          "rangeLength": 8,
          "textStyle": "bodyLinkTextStyle"
        }
      ],
      "role": "body",
      "layout": "bodyLayout"
    },
    {
      "text": "It’s a surprising insight that could put us one step closer to finally figuring out what the hell dark energy and dark matter are. Or it could mean that we’ve gotten something fundamentally wrong in our understanding of physics, perhaps even poking a hole in Einstein’s theory of gravity.\n",
      "additions": [
        {
          "type": "link",
          "rangeStart": 98,
          "rangeLength": 28,
          "URL": "http://science.nasa.gov/astrophysics/focus-areas/what-is-dark-energy/"
        }
      ],
      "inlineTextStyles": [
        {
          "rangeStart": 98,
          "rangeLength": 28,
          "textStyle": "bodyLinkTextStyle"
        }
      ],
      "role": "body",
      "layout": "bodyLayout"
    },
    {
      "role": "container",
      "components": [
        {
          "role": "photo",
          "URL": "bundle://image-0.jpg",
          "style": "embedMediaStyle",
          "layout": "embedMediaLayout",
          "caption": {
            "text": "Source: \n NASA\n \n",
            "additions": [
              {
                "type": "link",
                "rangeStart": 13,
                "rangeLength": 4,
                "URL": "http://www.nasa.gov/mission_pages/hubble/hst_young_galaxies_200604.html"
              }
            ],
            "inlineTextStyles": [
              {
                "rangeStart": 13,
                "rangeLength": 4,
                "textStyle": "embedCaptionTextStyle"
              }
            ],
            "textStyle": "embedCaptionTextStyle"
          }
        }
      ],
      "layout": "embedLayout",
      "style": "embedStyle"
    }
  ],
  "bundlesToUrls": {
    "image-0.jpg": "http://bit.ly/1UFHdpf"
  }
}

Facebook:

<article>
  <p>Astronomers just announced the universe might be expanding
  <a href="http://hubblesite.org/newscenter/archive/releases/2016/17/text/">up to 9%</a> faster than we thought.</p>
  <p>It’s a surprising insight that could put us one step closer to finally figuring out what the hell
  <a href="http://science.nasa.gov/astrophysics/focus-areas/what-is-dark-energy/">dark energy and dark matter</a> are. Or it could mean that we’ve gotten something fundamentally wrong in our understanding of physics, perhaps even poking a hole in Einstein’s theory of gravity.</p>
  <figure data-feedback="fb:likes,fb:comments">
    <img src="http://bit.ly/1UFHdpf"></img>
    <figcaption><cite>
      Source: <a href="http://www.nasa.gov/mission_pages/hubble/hst_young_galaxies_200604.html">NASA</a>
    </cite></figcaption>
  </figure>
</article>

Google:

<article>
  <p>Astronomers just announced the universe might be expanding
  <a href="http://hubblesite.org/newscenter/archive/releases/2016/17/text/">up to 9%</a> faster than we thought.</p>
  <p>It’s a surprising insight that could put us one step closer to finally figuring out what the hell
  <a href="http://science.nasa.gov/astrophysics/focus-areas/what-is-dark-energy/">dark energy and dark matter</a> are. Or it could mean that we’ve gotten something fundamentally wrong in our understanding of physics, perhaps even poking a hole in Einstein’s theory of gravity.</p>
  <figure>
    <amp-img width="900" height="445" layout="responsive" src="http://bit.ly/1UFHdpf"></amp-img>
    <figcaption>Source:
      <a href="http://www.nasa.gov/mission_pages/hubble/hst_young_galaxies_200604.html">NASA</a>
    </figcaption>
  </figure>
</article>

All starting from the same HTML source:

<p>Astronomers just announced the universe might be expanding
<a href="http://hubblesite.org/newscenter/archive/releases/2016/17/text/">up to 9%</a> faster than we thought.</p>
<p>It’s a surprising insight that could put us one step closer to finally figuring out what the hell
<a href="http://science.nasa.gov/astrophysics/focus-areas/what-is-dark-energy/">dark energy and dark matter</a> are. Or it could mean that we’ve gotten something fundamentally wrong in our understanding of physics, perhaps even poking a hole in Einstein’s theory of gravity.</p>
<figure>
  <img width="900" height="445" src="http://bit.ly/1UFHdpf">
  <figcaption>Source:
    <a href="http://www.nasa.gov/mission_pages/hubble/hst_young_galaxies_200604.html">NASA</a>
  </figcaption>
</figure>

Three workflows based on what started life in one common format.

Three workflows that have their own bugs and vulnerabilities.

Three workflows that duplicate the capabilities of each other.

Three formats that require different indexing/searching.

This is not the cause of why we can’t have nice things in software, but it certainly is a symptom.

The next time someone proposes a new format for a project, challenge them to demonstrate a value-add over existing formats.
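
To see the duplication in miniature, here is a hedged sketch (not DistroMic’s actual API) of the kind of per-platform rewriting each pipeline has to repeat; it rewrites the shared HTML’s plain <img> into AMP’s <amp-img>:

import re

def to_amp(html: str) -> str:
    """Rewrite plain <img> tags into AMP's <amp-img> form.

    Illustrative only: a real converter parses the document properly
    instead of pattern-matching tags with a regex.
    """
    def repl(match: re.Match) -> str:
        attrs = match.group(1).rstrip(" /")
        # AMP requires an explicit layout and a closing tag.
        return f'<amp-img{attrs} layout="responsive"></amp-img>'
    return re.sub(r"<img([^>]*?)/?>", repl, html)

print(to_amp('<img width="900" height="445" src="http://bit.ly/1UFHdpf">'))
# -> <amp-img width="900" height="445" src="http://bit.ly/1UFHdpf" layout="responsive"></amp-img>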

Newspaper Publishers Protecting Consumers (What?)

Friday, June 3rd, 2016

Newspaper industry asks FTC to investigate “deceptive” adblockers by John Zorabedian.

From the post:

Fearing that online publishers may be on the losing side of their battle with commercial adblockers, the newspaper publishing industry is now seeking relief from the US government.

The Newspaper Association of America (NAA), an industry group representing 2000 newspapers, filed a complaint with the US Federal Trade Commission (FTC) asking the consumer watchdog to investigate adblocker companies’ “deceptive” and “unlawful” practices.

The NAA is not alleging that adblockers themselves are illegal – rather, it says that adblocker companies make misleading claims about their products, a violation of the Federal Trade Commission Act.

Do you feel safer knowing the Newspaper Association of America (NAA) is protecting you from deceptive ads by adblocker companies?

A better service would be to protect consumers from deceptive ads in their publications but I suppose that would be a conflict of interest.

The best result would be for the FTC to declare you can display (or not) content received on your computer any way you like.

You cannot, of course, re-transmit that content, but if a user chooses to combine your content with that of another site, that is entirely on their watch.

Ad-blocking and transformation of lawfully delivered content, including merging it with content from other sources, are rights that every user should enjoy.

Help Defend MuckRock And Your Right To Know!

Wednesday, May 25th, 2016

A multinational demands to know who reads MuckRock and is suing to stop us from posting records about them by Michael Morisy.

Michael captures everything you need to know in his first paragraph:

A multinational owned by Toshiba is demanding MuckRock remove documents about them received under a public records act request, destroy any copies we have, and help identify MuckRock readers who saw them.

After skimming the petition and the two posted documents (Landis+Gyr Managed Services Report 2015 Final and Req 9_Security Overview), I feel like the man who remarked to George Bailey in It’s A Wonderful Life, “…you must mean two other trees,” taking George to be drunk. ;-)

As far as I can tell, the posted documents contain no pricing information, no contact details, etc.

Do you disagree?

There are judges who insist that pleadings have some relationship to facts. Let’s hope that MuckRock draws one of those.

Do you wonder what other local governments are involved with Landis+Gyr?

There is a simple starting point: Landis+Gyr.

Overlay Journal – Discrete Analysis

Saturday, March 5th, 2016

The arXiv overlay journal Discrete Analysis has launched by Christian Lawson-Perfect.

From the post:

Discrete Analysis, a new open-access journal for articles which are “analytical in flavour but that also have an impact on the study of discrete structures”, launched this week. What’s interesting about it is that it’s an arXiv overlay journal founded by, among others, Timothy Gowers.

What that means is that you don’t get articles from Discrete Analysis – it just arranges peer review of papers held on the arXiv, cutting out almost all of the expensive parts of traditional journal publishing. I wasn’t really prepared for how shallow that makes the journal’s website – there’s a front page, and when you click on an article you’re shown a brief editorial comment with a link to the corresponding arXiv page, and that’s it.

But that’s all it needs to do – the opinion of Gowers and co. is that the only real value that journals add to the papers they publish is the seal of approval gained by peer review, so that’s the only thing they’re doing. Maths papers tend not to benefit from the typesetting services traditional publishers provide (or, more often than you’d like, are actively hampered by it).

One way the journal is adding value beyond a “yes, this is worth adding to the list of papers we approve of” is by providing an “editorial introduction” to accompany each article. These are brief notes, written by members of the editorial board, which introduce the topics discussed in the paper and provide some context, to help you decide if you want to read the paper. That’s a good idea, and it makes browsing through the articles – and this is something unheard of on the internet – quite pleasurable.

It’s not difficult to imagine “editorial introductions” backed by mini-topic maps that could be explored on their own, or that “unfold” to reveal more associations/topics as you reach the “edge” of a particular map.

Not unlike a traditional street map of New York, which you can unfold to see general areas and then fold back up to focus more tightly on a particular area.

I hesitate to say “zoom” because in the applications I have seen (an important qualification), “zoom” uniformly reduces your field of view.

A more nuanced notion of “zoom,” for a topic map and perhaps for other maps as well, would be to hold portions of the current view stationary, say a starting point on an interstate highway and to “zoom” only a portion of the current view to show a detailed street map. That would enable the user to see a particular location while maintaining its larger context.

Pointers to applications that “zoom” but also maintain different levels of “zoom” in the same view? Given the fascination with “hairy” presentations of graphs, that would have to be a real winner.

Overlay Journals – Community-Based Peer Review?

Friday, February 12th, 2016

New Journals Piggyback on arXiv by Emily Conover.

From the post:

A non-traditional style of scientific publishing is gaining ground, with new journals popping up in recent months. The journals piggyback on the arXiv or other scientific repositories and apply peer review. A link to the accepted paper on the journal’s website sends readers to the paper on the repository.

Proponents hope to provide inexpensive open access publication and streamline the peer review process. To save money, such “overlay” journals typically do away with some of the services traditional publishers provide, for example typesetting and copyediting.

Not everyone is convinced. Questions remain about the scalability of overlay journals, and whether they will catch on — or whether scientists will demand the stamp of approval (and accompanying prestige) that the established, traditional journals provide.

The idea is by no means new — proposals for journals interfacing with online archives appeared as far back as the 1990s, and a few such journals are established in mathematics and computer science. But now, say proponents, it’s an idea whose time has come.

The newest such journal is the Open Journal of Astrophysics, which began accepting submissions on December 22. Editor in Chief Peter Coles of the University of Sussex says the idea came to him several years ago in a meeting about the cost of open access journals. “They were talking about charging thousands of pounds for making articles open access,” Coles says, and he thought, “I never consult journals now; I get all my papers from the arXiv.” By adding a front end onto arXiv to provide peer review, Coles says, “We can dispense with the whole paraphernalia with traditional journals.”

Authors first submit their papers to arXiv, and then input the appropriate arXiv ID on the journal’s website to indicate that they would like their paper reviewed. The journal follows a standard peer review process, with anonymous referees whose comments remain private.

When an article is accepted, a link appears on the journal’s website and the article is issued a digital object identifier (DOI). The entire process is free for authors and readers. As APS News went to press, Coles hoped to publish the first batch of half-dozen papers at the end of January.

My Archive for the ‘Peer Review’ Category has only a few of the high profile failures of peer review over the last five years.

You are probably familiar with at least twice as many accounts of the brokenness of peer review as I have covered on this blog.

If traditional peer review is a known failure, why replicate it even for overlay journals?

Why not ask the full set of peers in a discipline? That is, the readers of articles posted in public repositories.

If a book or journal article goes uncited, isn’t that evidence that it did NOT advance the discipline in a way meaningful to its peers?

What other evidence would you have that it did advance the discipline? The opinions of friends of the editor? That seems too weak to even suggest.

Citation analysis isn’t free from issues (see Are 90% of academic papers really never cited? Searching citations about academic citations reveals the good, the bad and the ugly), but it has the advantage of drawing on the entire pool of talent that comprises a discipline.

Moreover, peer review would not be limited to a one-time judgment by traditional peer reviewers but would rest on how a monograph or article fits into the intellectual development of the discipline as a whole.

Which is more persuasive: That editors and reviewers at Science or Nature accept a paper or that in the ten years following publication, an article is cited by every other major study in the field?

Citation analysis avoids the overhead costs usually raised as an objection to organizing peer review on a massive scale. Why organize peer review at all?

Peers are going to read and cite good literature and, more likely than not, skip the bad. Unless you need to create positions for gatekeepers and other barnacles on the profession, opt for citation-based peer review built on open repositories.
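
As a rough illustration of what a citation-based signal costs to obtain, here is a small sketch against the public Crossref REST API, whose records carry an is-referenced-by-count field (Crossref’s own tally, which undercounts relative to commercial indexes but is free and open):

import json
import urllib.request

def citation_count(doi: str) -> int:
    """Fetch how often Crossref has seen a work cited (a crude community signal)."""
    url = f"https://api.crossref.org/works/{doi}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["message"]["is-referenced-by-count"]

# Hypothetical DOI, for illustration only:
# print(citation_count("10.1000/xyz123"))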

I’m betting on the communities that silently vet papers and books in spite of the formalized and highly suspect mechanisms for peer review.

Overlay journals could publish preliminary lists of articles of interest in particular disciplines and then, as community-based peer review progresses, publish “best of…” series as the community further filters the publications.

Community-based peer review is already operating in your discipline. Why not call it out and benefit from it?

Sci-Hub Tip: Converting Paywall DOIs to Public Access

Thursday, February 11th, 2016

In a tweet, Jon Tennant points out that:

Reminder: add “.sci-hub.io” after the .com in the URL of pretty much any paywalled paper to gain instant free access.

BTW, I tested Jon’s advice with:

http://dx.doi.org/10.****/*******

re-cast as:

http://dx.doi.org.sci-hub.io/10.****/*******

And it works!

With a little scripting, you can convert your paywall DOIs into public access with sci-hub.io.
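
Here is one way such a script might look, a minimal sketch assuming Python and the sci-hub.io mirror (mirrors change, so treat the hostname as a placeholder):

from urllib.parse import urlparse, urlunparse

def scihubify(doi_url: str, mirror: str = "sci-hub.io") -> str:
    """Append a Sci-Hub mirror to the host of a DOI (or publisher) URL.

    http://dx.doi.org/10.1000/xyz123 -> http://dx.doi.org.sci-hub.io/10.1000/xyz123
    (hypothetical DOI; swap in whichever mirror is currently reachable)
    """
    parts = urlparse(doi_url)
    return urlunparse(parts._replace(netloc=f"{parts.netloc}.{mirror}"))

print(scihubify("http://dx.doi.org/10.1000/xyz123"))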

This “worked for me” so if you encounter issues, please ping me so I can update this post.

Happy reading!

First Pirate – Sci-Hub?

Wednesday, February 10th, 2016

Sci-Hub romanticizes itself as:

Sci-Hub the first pirate website in the world to provide mass and public access to tens of millions of research papers. (from the about page)

I agree with:

…mass and public access to tens of millions of research papers

But Sci-Hub is hardly:

…the first pirate website in the world

I don’t remember the first gate-keeping publisher that went from stealing from the public in print to stealing from the public online.

With careful enough research I’m sure we could track that down but I’m not sure it matters at this point.

What we do know is that academic research is funded by the public, edited and reviewed by volunteers (to the extent it is reviewed at all), and then kept from the vast bulk of humanity for profit and status (gate-keeping).

It’s heady stuff to think of yourself as a bold and swashbuckling pirate, going to stick it “…to the man.”

However, gate-keeping publishers have developed stealing from the public to an art form. If you don’t believe me, take a brief look at the provisions in the Trans-Pacific Partnership that protect traditional publisher interests.

Recovering what has been stolen from the public isn’t theft at all, it’s restoration!

Use Sci-Hub, support Sci-Hub, spread the word about Sci-Hub.

Allow gate-keeping publishers to slowly, hopefully painfully, wither as opportunities for exploiting the public grow fewer and farther between.

PS: You need to read: Meet the Robin Hood of Science by Simon Oxenham to get the full background on Sci-Hub and an extraordinary person, Alexandra Elbakyan.

JATS: Journal Article Tag Suite, Navigation Update!

Monday, January 11th, 2016

I posted about the appearance of JATS: Journal Article Tag Suite, version 1.1 and then began to lazily browse the pdf.

I forget what I was looking for now but I noticed the table of contents jumped from page 42 to page 235, and again from 272 to 405. I’m thinking by this point “this is going to be a bear to find elements/attributes in.” I looked for an index only to find none. :-(

But, there’s hope!

If you look at Chapter 7, “TAG Suite Components” (elements start on page 7, attributes on page 28), you will find:

[Image: JATS-nav table, with ✔ marks for each element or attribute under the Archiving, Publishing, and Authoring tag sets]

Each ✔ is a navigation link to that element (or attribute if you are in the attribute section) under each of those divisions, Archiving, Publishing, Authoring.

Very cool but falls under “non-obvious” for me.

Pass it on so others can safely and quickly navigate JATS 1.1!

PS: It was Tommie Usdin of Balisage fame who pointed out the table in chapter 7 to me. Thanks Tommie!

JATS: Journal Article Tag Suite, version 1.1

Friday, January 8th, 2016

JATS: Journal Article Tag Suite, version 1.1

Abstract:

The Journal Article Tag Suite provides a common XML format in which publishers and archives can exchange journal content. The JATS provides a set of XML elements and attributes for describing the textual and graphical content of journal articles as well as some non-article material such as letters, editorials, and book and product reviews.

Documentation and help files: Journal Article Tag Suite.

Tommie Usdin (of Balisage fame) posted to Facebook:

JATS has added capabilities to encode:
– NISO Access License and Indicators
– additional support for multiple language documents and for Japanese documents (including Ruby)
– citation of datasets
and some other things users of version 1.0 have requested.

Another XML vocabulary that provides grist for your XQuery adventures!
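
If you want to poke at JATS content programmatically before reaching for XQuery, even the Python standard library will do for a first pass. A minimal sketch (the file name is hypothetical; the element path is the standard JATS one):

import xml.etree.ElementTree as ET

def article_title(jats_path: str) -> str:
    """Pull the article title out of a JATS-tagged XML file.

    JATS puts it at article/front/article-meta/title-group/article-title;
    itertext() flattens any inline markup (italic, sub, sup) in the title.
    """
    root = ET.parse(jats_path).getroot()
    node = root.find("./front/article-meta/title-group/article-title")
    return "".join(node.itertext()).strip() if node is not None else ""

# Hypothetical file name, for illustration only:
# print(article_title("example-article.xml"))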

What is Scholarly HTML?

Saturday, October 31st, 2015

What is Scholarly HTML? by Robin Berjon and Sébastien Ballesteros.

Abstract:

Scholarly HTML is a domain-specific data format built entirely on open standards that enables the interoperable exchange of scholarly articles in a manner that is compatible with off-the-shelf browsers. This document describes how Scholarly HTML works and how it is encoded as a document. It is, itself, written in Scholarly HTML.

The abstract is accurate enough but the “Motivation” section provides a better sense of this project:

Scholarly articles are still primarily encoded as unstructured graphics formats in which most of the information initially created by research, or even just in the text, is lost. This was an acceptable, if deplorable, condition when viable alternatives did not seem possible, but document technology has today reached a level of maturity and universality that makes this situation no longer tenable. Information cannot be disseminated if it is destroyed before even having left its creator’s laptop.

According to the New York Times, adding structured information to their recipes (instead of exposing simply as plain text) improved their discoverability to the point of producing an immediate rise of 52 percent in traffic (NYT, 2014). At this point in time, cupcake recipes are reaping greater benefits from modern data format practices than the whole scientific endeavour.

This is not solely a loss for the high principles of knowledge sharing in science, it also has very immediate pragmatic consequences. Any tool, any service that tries to integrate with scholarly publishing has to spend the brunt of its complexity (or budget) extracting data the author would have willingly shared out of antiquated formats. This places stringent limits on the improvement of the scholarly toolbox, on the discoverability of scientific knowledge, and particularly on processes of meta-analysis.

To address these issues, we have followed an approach rooted in established best practices for the reuse of open, standard formats. The «HTML Vernacular» body of practice provides guidelines for the creation of domain-specific data formats that make use of HTML’s inherent extensibility (Science.AI, 2015b). Using the vernacular foundation overlaid with «schema.org» metadata we have produced a format for the interchange of scholarly articles built on open standards, ready for all to use.

Our high-level goals were:

  • Uncompromisingly enabling structured metadata, accessibility, and internationalisation.
  • Pragmatically working in Web browsers, even if it occasionally incurs some markup overhead.
  • Powerfully customisable for inclusion in arbitrary Web sites, while remaining easy to process and interoperable.
  • Entirely built on top of open, royalty-free standards.
  • Long-term viability as a data format.

Additionally, in view of the specific problem we addressed, in the creation of this vernacular we have favoured the reliability of interchange over ease of authoring; but have nevertheless attempted to cater to the latter as much as possible. A decent boilerplate template file can certainly make authoring relatively simple, but not as radically simple as it can be. For such use cases, Scholarly HTML provides a great output target and overview of the data model required to support scholarly publishing at the document level.

An example of an authoring format that was designed to target Scholarly HTML as an output is the DOCX Standard Scientific Style which enables authors who are comfortable with Microsoft Word to author documents that have a direct upgrade path to semantic, standard content.

Where semantic modelling is concerned, our approach is to stick as much as possible to schema.org. Beyond the obvious advantages there are in reusing a vocabulary that is supported by all the major search engines and is actively being developed towards enabling a shared understanding of many useful concepts, it also provides a protection against «ontological drift» whereby a new vocabulary is defined by a small group with insufficient input from a broader community of practice. A language that solely a single participant understands is of limited value.

In a small, circumscribed number of cases we have had to depart from schema.org, using the https://ns.science.ai/ (prefixed with sa:) vocabulary instead (Science.AI, 2015a). Our goal is to work with schema.org in order to extend their vocabulary, and we will align our usage with the outcome of these discussions.

I especially enjoyed the observation:

According to the New York Times, adding structured information to their recipes (instead of exposing simply as plain text) improved their discoverability to the point of producing an immediate rise of 52 percent in traffic (NYT, 2014). At this point in time, cupcake recipes are reaping greater benefits from modern data format practices than the whole scientific endeavour.

I don’t doubt the truth of that story, but after all, a large number of people are interested in baking cupcakes. In many cases, no more than three people are interested in reading any particular academic paper.

The use of schema.org will provide advantages for common concepts but to be truly useful for scholarly writing, it will require serious extension.
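
The baseline itself is modest. Here is a minimal sketch of the kind of schema.org description involved, generated with Python and emitted as a JSON-LD script block; the property names are schema.org’s, the values are placeholders. Extending this to capture, say, the lineage of a technique is where the serious work lies.

import json

# A minimal ScholarlyArticle description: property names are schema.org's,
# values are placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "Example Title",
    "author": [{"@type": "Person", "name": "A. Author"}],
    "datePublished": "2015-10-31",
    "keywords": ["scholarly HTML", "structured data"],
}

# Emit it the way structured pages usually carry it, as a JSON-LD script block.
print('<script type="application/ld+json">')
print(json.dumps(article, indent=2))
print("</script>")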

Take for example my post yesterday Deep Feature Synthesis:… [Replacing Human Intuition?, Calling Bull Shit]. What microdata from schema.org would help readers find Propositionalisation and Aggregates, 2001, which describes substantially the same technique without claims of surpassing human intuition? (It goes uncited by the authors of the deep feature synthesis paper.)

Or the 161 papers on propositionalisation that you can find at CiteSeer?

A crude classification that can be used by search engines is very useful but falls far short of the mark in terms of finding and retrieving scholarly writing.

Semantic uniformity for classifying scholarly content hasn’t been reached by scholars or librarians despite centuries of effort. Rather than taking up that Sisyphean task, let’s map across the ever increasing universe of semantic diversity.

The Future Of News Is Not An Article

Wednesday, October 21st, 2015

The Future Of News Is Not An Article by Alexis Lloyd.

Alexis challenges readers to reconsider their assumptions about the nature of “articles,” beginning with the model that was taken over from traditional print media: whatever appeared in an article yesterday must be re-created today if there is a new article on the same subject. Not surprising, since print media lacks the means to transclude content from a prior article into a new one.

She saves her best argument for last:


A news organization publishes hundreds of articles a day, then starts all over the next day, recreating any redundant content each time. This approach is deeply shaped by the constraints of print media and seems unnecessary and strange when looked at from a natively digital perspective. Can you imagine if, every time something new happened in Syria, Wikipedia published a new Syria page, and in order to understand the bigger picture, you had to manually sift through hundreds of pages with overlapping information? The idea seems absurd in that context and yet, it is essentially what news publishers do every day.

While I agree fully with the advantages Alexis summarizes as Enhanced tools for journalists, Summarization and synthesis, and Adaptive Content (see her post), there are technical and non-technical roadblocks to such changes.

First and foremost, people are being paid to re-create redundant content every day, and their comfort levels, to say nothing of their remuneration for repetitive reporting of the same content, will loom large in the adoption of the technology Alexis imagines.

I recall a disturbing story from a major paper where reporters didn’t share leads or research for fear that other reporters would “scoop” them. That sort of protectionism isn’t limited to journalists. Rumor has it that Oracle sales reps refused to enter potential sales leads in a company-wide database.

I don’t understand why that sort of pettiness is tolerated but be aware that it is, both in government and corporate environments.

Second, and almost as importantly, Alexis needs to raise the question of semantic ROI for any semantic technology. Take her point about adoption of the Semantic Web:

but have not seen universal adoption because of the labor costs involved in doing so.

To adopt a single level of semantic encoding for all content, without regard to its value, either historical or current, is a sure budget buster. Perhaps the business community was paying closer attention to the Semantic Web than many of us thought, hence its adoption failure.

Some content may need machine-driven encoding, more valuable content may require human supervision and/or encoding, and some content may not be worth encoding at all. It depends on your ROI model.

I should mention that the Semantic Web manages statements about statements (in its own or other semantic systems) poorly, a.k.a. “facts about facts,” although I hate to use the term “facts.” The very notion of “facts” is misleading and tricky under the best of circumstances.

However universal (universal = among people you know) knowledge of a “fact” may seem, the better argument is that it is only a “fact” from a particular point of view. Semantic Web based systems have difficulty with such concepts.

Third, and not mentioned by Alexis, is that semantic systems should capture and preserve trails created by information explorers. Reporters at the New York Times use databases everyday, but each search starts from scratch.

If re-making redundant information over and over again is absurd, repeating the same searches (more or less successfully) over and over again is insane.

Capturing search trails as data would enrich existing databases, especially if searchers could annotate their trails and data they encounter along the way. The more intensively searched a resource becomes, the richer its semantics. As it is today, all the effort of searchers is lost at the end of each search.
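
A sketch of what the simplest possible trail capture might look like, file-based purely for illustration; in a newsroom system the trail would be keyed to the reporter and the database searched, and annotations would feed back into that database’s index:

import json
import time

def log_search(trail_file: str, query: str, note: str = "") -> None:
    """Append one search step, with an optional annotation, to a trail log."""
    record = {"ts": time.time(), "query": query, "note": note}
    with open(trail_file, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")

# Hypothetical usage:
# log_search("trails.jsonl", "Syria ceasefire timeline", "overlaps with last week's search")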

Alexis is right, let’s stop entombing knowledge in articles, papers, posts and books. It won’t be quick or easy, but worthwhile journeys rarely are.

I first saw this in a tweet by Tim Strehle.

unglue.it

Monday, August 31st, 2015

unglue.it

From the webpage:

unglue (v. t.) 2. To make a digital book free to read and use, worldwide.

New to me, possibly old to you.

I “discovered” this site while looking at Intermediate Python.

From the general FAQ:

Basics

How It Works

What is Unglue.it?

Unglue.it is a place for individuals and institutions to join together to make ebooks free to the world. We work together with authors, publishers, or other rights holders who want their ebooks to be free but also want to be able to earn a living doing so. We use Creative Commons licensing as an enabling tool to “unglue” the ebooks.

What are Ungluing Campaigns?

We have three types of Ungluing Campaigns: Pledge Campaigns, Buy-to-Unglue Campaigns and Thanks-for-Ungluing campaigns.

  • In a Pledge Campaign, book lovers pledge their support for ungluing a book. If enough support is found to reach the goal (and only then), the supporter’s credit cards are charged, and an unglued ebook is released.
  • In a Buy-to-Unglue Campaign, every ebook copy sold moves the book’s ungluing date closer to the present. And you can donate ebooks to your local library- that’s something you can’t do in the Kindle or Apple Stores!
  • In a Thanks-for-Ungluing Campaign, the ebook is already released with a Creative Commons license. Supporters can express their thanks by paying what they wish for the license and the ebook.

What is Crowdfunding?

Crowdfunding is collectively pooling contributions (or pledges) to support some cause. Using the internet for coordination means that complete strangers can work together, drawn by a common cause. This also means the number of supporters can be vast, so individual contributions can be as large or as small as people are comfortable with, and still add up to enough to do something amazing.

Want to see some examples? Kickstarter lets artists and inventors solicit funds to make their projects a reality. For instance, webcomic artist Rich Burlew sought $57,750 to reprint his comics in paper form — and raised close to a million.

In other words, crowdfunding is working together to support something you love. By pooling resources, big and small, from all over the world, we can make huge things happen.

What will supplement and then replace contemporary publishing models remains to be seen.

In terms of experiments, this one looks quite promising.

If you use unglue.it, please ping me with your experience. Thanks!

The Nation has a new publishing model

Wednesday, July 8th, 2015

Introducing the New TheNation.com by Richard Kim.

From the post:

…on July 6, 2015—exactly 150 years after the publication of our first issue—we’re relaunching TheNation.com. The new site, created in partnership with our friends at Blue State Digital and Diaspark, represents our commitment to being at the forefront of independent journalism for the next generation. The article page is designed with the Nation ambassador in mind: Beautiful, clear fonts (Mercury and Knockout) and a variety of image fields make the articles a joy to read—on desktop, tablet, and mobile. Prominent share tools, Twitter quotes, and a “highlight to e-mail/tweet” function make it easy to share them with others. A robust new taxonomy and a continuous scroll seamlessly connect readers to related content. You’ll also see color-coded touts that let readers take action on a particular issue, or donate and subscribe to The Nation.

I’m not overly fond of paywalls as you know but one part of the relaunch merits closer study. Comments on articles are going to be open to subscribers only.

It will be interesting to learn what the experience of The Nation is with its subscriber-only comments. Hopefully their tracking will be granular enough to determine what portion of subscribers subscribed simply so they could comment.

There are any number of fields where opinions run hot enough that open content combined with paying to have comments displayed could be a viable publication model.

Imagine a publicly accessible topic map on the candidates for the US presidential election next year. If it had sufficient visibility, the publication of any report would spawn automatic responses from others, responses that would not appear without paying for the ability to publish a comment.

Viable economic model?

Suggestions?

Digital Data Repositories in Chemistry…

Wednesday, July 1st, 2015

Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic Notebooks by Matthew J. Harvey, Nicholas J. Mason, Henry S. Rzepa.

Abtract:

We discuss the concept of recasting the data-rich scientific journal article into two components, a narrative and separate data components, each of which is assigned a persistent digital object identifier. Doing so allows each of these components to exist in an environment optimized for purpose. We make use of a poorly-known feature of the handle system for assigning persistent identifiers that allows an individual data file from a larger file set to be retrieved according to its file name or its MIME type. The data objects allow facile visualization and retrieval for reuse of the data and facilitates other operations such as data mining. Examples from five recently published articles illustrate these concepts.

This is a very promising effort to integrate published content and electronic notebooks in chemistry. It is encouraging that, in addition to the technical and identity issues, the authors also point out the lack of incentives for the extra work required to achieve useful integration.

Everyone agrees that deeper integration of resources in the sciences will be a game-changer, but renewing the realization that there is no such thing as a free lunch is an important step towards that goal.

This article easily repays a close read with interesting subject identity issues and the potential that topic maps would offer to such an effort.
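
The handle-system feature the authors exploit (retrieving an individual file from a file set by name or MIME type) is specific to their repository setup, but DOI content negotiation is a related, widely available mechanism: the same persistent identifier yields different representations depending on the Accept header. A rough sketch against the standard doi.org resolver:

import urllib.request

def doi_representation(doi: str, mime: str = "application/vnd.citationstyles.csl+json") -> bytes:
    """Resolve a DOI and request a specific representation via content negotiation.

    The doi.org resolver hands the request to the registration agency
    (Crossref, DataCite, ...), which returns metadata in the requested format.
    """
    req = urllib.request.Request(f"https://doi.org/{doi}", headers={"Accept": mime})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Hypothetical DOI, for illustration only:
# print(doi_representation("10.1000/xyz123"))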

The peer review drugs don’t work [Faith Based Science]

Sunday, May 31st, 2015

The peer review drugs don’t work by Richard Smith.

From the post:

It is paradoxical and ironic that peer review, a process at the heart of science, is based on faith not evidence.

There is evidence on peer review, but few scientists and scientific editors seem to know of it – and what it shows is that the process has little if any benefit and lots of flaws.

Peer review is supposed to be the quality assurance system for science, weeding out the scientifically unreliable and reassuring readers of journals that they can trust what they are reading. In reality, however, it is ineffective, largely a lottery, anti-innovatory, slow, expensive, wasteful of scientific time, inefficient, easily abused, prone to bias, unable to detect fraud and irrelevant.

As Drummond Rennie, the founder of the annual International Congress on Peer Review and Biomedical Publication, says, “If peer review was a drug it would never be allowed onto the market.”

Cochrane reviews, which gather systematically all available evidence, are the highest form of scientific evidence. A 2007 Cochrane review of peer review for journals concludes: “At present, little empirical evidence is available to support the use of editorial peer review as a mechanism to ensure quality of biomedical research.”

We can see before our eyes that peer review doesn’t work because most of what is published in scientific journals is plain wrong. The most cited paper in Plos Medicine, which was written by Stanford University’s John Ioannidis, shows that most published research findings are false. Studies by Ioannidis and others find that studies published in “top journals” are the most likely to be inaccurate. This is initially surprising, but it is to be expected as the “top journals” select studies that are new and sexy rather than reliable. A series published in The Lancet in 2014 has shown that 85 per cent of medical research is wasted because of poor methods, bias and poor quality control. A study in Nature showed that more than 85 per cent of preclinical studies could not be replicated, the acid test in science.

I used to be the editor of the BMJ, and we conducted our own research into peer review. In one study we inserted eight errors into a 600 word paper and sent it 300 reviewers. None of them spotted more than five errors, and a fifth didn’t detect any. The median number spotted was two. These studies have been repeated many times with the same result. Other studies have shown that if reviewers are asked whether a study should be published there is little more agreement than would be expected by chance.

As you might expect, the humanities are lagging far behind the sciences in acknowledging that peer review is an exercise in social status rather than quality:


One of the changes I want to highlight is the way that “peer review” has evolved fairly quietly during the expansion of digital scholarship and pedagogy. Even though some scholars, such as Kathleen Fitzpatrick, are addressing the need for new models of peer review, recognition of the ways that this process has already been transformed in the digital realm remains limited. The 2010 Center for Studies in Higher Education (hereafter cited as Berkeley Report) comments astutely on the conventional role of peer review in the academy:

Among the reasons peer review persists to such a degree in the academy is that, when tied to the venue of a publication, it is an efficient indicator of the quality, relevance, and likely impact of a piece of scholarship. Peer review strongly influences reputation and opportunities. (Harley, et al 21)

These observations, like many of those presented in this document, contain considerable wisdom. Nevertheless, our understanding of peer review could use some reconsideration in light of the distinctive qualities and conditions associated with digital humanities.
…(Living in a Digital World: Rethinking Peer Review, Collaboration, and Open Access by Sheila Cavanagh.)

Can you think of another area where something akin to peer review is being touted?

What about internal guidelines of the CIA, NSA, FBI and secret courts reviewing actions by those agencies?

How do those differ from peer review, which is an acknowledged failure in science and should be acknowledged in the humanities?

They are quite similar in the sense that some secret group is empowered to make decisions that impact others and members of those groups, don’t want to relinquish those powers. Surprise, surprise.

Peer review should be scrapped across the board and replaced by tracked replication and use by others, both in the sciences and the humanities.

Government decisions should be open to review by all its citizens and not just a privileged few.

How journals could “add value”

Thursday, May 28th, 2015

How journals could “add value” by Mark Watson.

From the post:

I wrote a piece for Genome Biology, you may have read it, about open science. I said a lot of things in there, but one thing I want to focus on is how journals could “add value”. As brief background: I think if you’re going to make money from academic publishing (and I have no problem if that’s what you want to do), then I think you should “add value”. Open science and open access is coming: open access journals are increasingly popular (and cheap!), preprint servers are more popular, green and gold open access policies are being implemented etc etc. Essentially, people are going to stop paying to access research articles pretty soon – think 5-10 year time frame.

So what can journals do to “add value”? What can they do that will make us want to pay to access them? Here are a few ideas, most of which focus on going beyond the PDF:

Humanities journals and their authors should take heed of these suggestions.

Not applicable in every case but certainly better than “journal editorial board as resume padding.”

Companion to “Functional Programming in Scala”

Sunday, February 22nd, 2015

A companion booklet to “Functional Programming in Scala” by Rúnar Óli Bjarnason.

From the webpage:

This full colour syntax-highlighted booklet comprises all the chapter notes, hints, solutions to exercises, addenda, and errata for the book “Functional Programming in Scala” by Paul Chiusano and Runar Bjarnason. This material is freely available online, but is compiled here as a convenient companion to the book itself.

If you talk about supporting alternative forms of publishing, here is your chance to do so financially.

Authors are going to gravitate to models that sustain their ability to write.

It is up to you what model that will be.

The Many Faces of Science (the journal)

Saturday, February 21st, 2015

Andy Dalby tells a chilling tale in Why I will never trust Science again.

You need to read the full account, but as a quick summary: Andy submits a paper to Science that is rejected; within weeks he finds that Science has accepted another, deeply flawed paper reaching the same conclusion; and when he notifies Science, it is suggested he post an online comment. Andy’s account has quotes, links to references, etc.

That is one face of Science: secretive, arbitrary, and restricted peer review of submissions. I say “restricted peer” because Science relies on a tiny number of reviewers compared to the full set of your peers. If you want genuine “peer review,” you should publish with an open access journal that enlists all of your peers as reviewers, not just a few.

There is another face of Science, which appeared last December without any trace of irony at all:

Does journal peer review miss best and brightest? by David Shultz, which reads in part:

Sometimes greatness is hard to spot. Before going on to lead the Chicago Bulls to six NBA championships, Michael Jordan was famously cut from his high school basketball team. Scientists often face rejection of their own—in their case, the gatekeepers aren’t high school coaches, but journal editors and peers they select to review submitted papers. A study published today indicates that this system does a reasonable job of predicting the eventual interest in most papers, but it may shoot an air ball when it comes to identifying really game-changing research.

There is a serious chink in the armor, though: All 14 of the most highly cited papers in the study were rejected by the three elite journals, and 12 of those were bounced before they could reach peer review. The finding suggests that unconventional research that falls outside the established lines of thought may be more prone to rejection from top journals, Siler says.

Science publishes research showing its methods are flawed and yet takes no notice. Perhaps its rejection of Andy’s paper isn’t so strange: it must not have traveled far enough down the stairs.

I first saw Andy’s paper in a tweet by Mick Watson.

Harry Potter eBooks

Sunday, February 1st, 2015

All the Harry Potter ebooks are now on subscription site Oyster by Laura Hazard Owen.

Laura reports that the Harry Potter books are available on Oyster and Amazon. She says that Oyster has the spin-off titles from the original series, whereas Amazon does not.

Both offer $9.95-per-month subscriptions, with Oyster claiming “over a million” books and Amazon over 700,000. After reading David Mason’s How many books will you read in your lifetime?, I am not sure the gap in raw numbers matters much.

Access to electronic texts will certainly make creating topic maps for popular literature a good deal easier.

Enjoy!

Nature: A recap of a successful year in open access, and introducing CC BY as default

Tuesday, January 27th, 2015

A recap of a successful year in open access, and introducing CC BY as default by Carrie Calder, the Director of Strategy for Open Research, Nature Publishing Group/Palgrave Macmillan.

From the post:

We’re pleased to start 2015 with an announcement that we’re now using Creative Commons Attribution license CC BY 4.0 as default. This will apply to all of the 18 fully open access journals Nature Publishing Group owns, and will also apply to any future titles we launch. Two society-owned titles have introduced CC BY as default today and we expect to expand this in the coming months.

This follows a transformative 2014 for open access and open research at Nature Publishing Group. We’ve always been supporters of new technologies and open research (for example, we’ve had a liberal self-archiving policy in place for ten years now. In 2013 we had 65 journals with an open access option) but in 2014 we:

  • Built a dedicated team of over 100 people working on Open Research across journals, books, data and author services
  • Conducted research on whether there is an open access citation benefit, and researched authors’ views on OA
  • Introduced the Nature Partner Journal series of high-quality open access journals and announced our first ten NPJs
  • Launched Scientific Data, our first open access publication for Data Descriptors
  • And last but not least switched Nature Communications to open access, creating the first Nature-branded fully open access journal

We did this not because it was easy (trust us, it wasn’t always) but because we thought it was the right thing to do. And because we don’t just believe in open access; we believe in driving open research forward, and in working with academics, funders and other publishers to do so. It’s obviously making a difference already. In 2013, 38% of our authors chose to publish open access immediately upon publication – in 2014, this percentage rose to 44%. Both Scientific Reports and Nature Communications had record years in terms of submissions for publication.

Open access is on its way to becoming the expected model for publishing. That isn’t to say there aren’t economics and kinks still to be worked out, but the fundamental principles of open access have been widely accepted.

Not everywhere of course. There are areas of scholarship that think self-isolation makes them important. They shun open access as an attack on their traditions of “Doctor Fathers” and treat access to original materials as a privilege. Those strategies only make them more irrelevant in the modern world, which is a pity because there is so much they could contribute to the public conversation. But a public conversation means you are not insulated from questions that don’t accept “because I say so” as an adequate answer.

If you are working in such an area or know of one, press for emulation of Nature’s efforts and the many others that provide open access to both primary and secondary materials. There are many areas of the humanities that already follow that model, but not all. Let’s keep pressing until open access is the default for all disciplines.

Kudos to Nature for their ongoing efforts on open access.

I first saw the news about the Nature post in a tweet by Ethan White.

The Past, Present and Future of Scholarly Publishing

Saturday, January 3rd, 2015

The Past, Present and Future of Scholarly Publishing by Michael Eisen.

Michael made this presentation to the Commonwealth Club of California on March 12, 2013. This post is from the written text for the presentation and you can catch the audio here.

Michael does a great job tracing the history of academic publishing, the rise of open access and what is holding us back from a more productive publishing environment for everyone.

I disagree with his assessment of classification:

And as for classification, does anyone really think that assigning every paper to one of 10,000 journals, organized in a loose and chaotic hierarchy of topics and importance, is really the best way to help people browse the literature? This is a pure relic of a bygone era – an artifact of the historical accident that Gutenberg invented the printing press before Al Gore invented the Internet.

but will pass over that to address the more serious issue of open access publishing in the humanities.

Michael notes:

But the battle is by no means won. Open access collectively represents only around 10% of biomedical publishing, has less penetration in other sciences, and is almost non-existent in the humanities. And most scientists still send their best papers to “high impact” subscription-based journals.

There are open access journals in the humanities, but it is fair to say they are few and far between. If prestige is one of the drivers in scientific publishing, where large grant programs abound for some types of research, it is about the only driver in humanities publishing.

There are grant programs for the humanities but nothing on the scale of funding in the sciences. Salaries in the humanities are for the most part nothing to write home about. Humanities publishing really comes down to prestige.

Prestige from publication may be a dry, hard bone but it is the only bone that most humanities scholars will ever have. Try to take that away and you are likely to get bitten.

For instance, have you ever wondered about the proliferation of new translations of the Bible? Have we discovered new texts? New discoveries about biblical languages? Discovery of major mistakes in a prior edition? What if I said none of the above? What, then, would explain the publication of new translations of the Bible?

If you compare the various translations you will find different “editors,” unless you are looking at Bibles from a common source. Some sources do that as well, creating different “versions” for different target audiences.

With the exception of new versions like the New Revised Standard Version, which was undertaken to account for new information from the Dead Sea Scrolls, new editions of the Bible are primarily scholarly churn.

The humanities aren’t going to move any closer to open access publishing until their employers (universities) and funders insist on open access publishing as a condition of tenure and funding.

I will address Michael’s mis-impressions about the value of classification another time. ;-)

Early English Books Online – Good News and Bad News

Friday, January 2nd, 2015

Early English Books Online

The very good news is that 25,000 volumes from the Early English Books Online collection have been made available to the public!

From the webpage:

The EEBO corpus consists of the works represented in the English Short Title Catalogue I and II (based on the Pollard & Redgrave and Wing short title catalogs), as well as the Thomason Tracts and the Early English Books Tract Supplement. Together these trace the history of English thought from the first book printed in English in 1475 through to 1700. The content covers literature, philosophy, politics, religion, geography, science and all other areas of human endeavor. The assembled collection of more than 125,000 volumes is a mainstay for understanding the development of Western culture in general and the Anglo-American world in particular. The STC collections have perhaps been most widely used by scholars of English, linguistics, and history, but these resources also include core texts in religious studies, art, women’s studies, history of science, law, and music.

Even better news from Sebastian Rahtz (Chief Data Architect, IT Services, University of Oxford):

The University of Oxford is now making this collection, together with Gale Cengage’s Eighteenth Century Collections Online (ECCO), and Readex’s Evans Early American Imprints, available in various formats (TEI P5 XML, HTML and ePub) initially via the University of Oxford Text Archive at http://www.ota.ox.ac.uk/tcp/, and offering the source XML for community collaborative editing via Github. For the convenience of UK universities who subscribe to JISC Historic Books, a link to page images is also provided. We hope that the XML will serve as the base for enhancements and corrections.

This catalogue also lists EEBO Phase 2 texts, but the HTML and ePub versions of these can only be accessed by members of the University of Oxford.

[Technical note]
Those interested in working on the TEI P5 XML versions of the texts can check them out from GitHub via https://github.com/textcreationpartnership/, where each of the texts is in its own repository (e.g. https://github.com/textcreationpartnership/A00021). There is a CSV file listing all the texts at https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv, and a simple Linux/OSX shell script to clone all 32853 unrestricted repositories at https://raw.githubusercontent.com/textcreationpartnership/Texts/master/cloneall.sh
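
For anyone who would rather work from the CSV listing than run the shell script, here is a minimal Python sketch of the same idea. It assumes, without having checked the live file, that TCP.csv begins with a header row and that its first column holds the TCP identifier (e.g. A00021); it also makes no attempt to separate restricted Phase II texts from unrestricted ones, so consult the listing before cloning in bulk.

#!/usr/bin/env python3
# Sketch only: clone a handful of EEBO-TCP text repositories listed in TCP.csv.
# Assumptions (not verified against the actual file): the CSV begins with a
# header row and its first column holds the TCP identifier (e.g. "A00021").
import csv
import io
import subprocess
import urllib.request

CSV_URL = "https://raw.githubusercontent.com/textcreationpartnership/Texts/master/TCP.csv"
REPO_URL = "https://github.com/textcreationpartnership/{tcp_id}.git"

def tcp_ids(url=CSV_URL):
    """Yield TCP identifiers from the published CSV listing."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    reader = csv.reader(io.StringIO(text))
    next(reader, None)                      # assume and skip a header row
    for row in reader:
        if row and row[0].strip():
            yield row[0].strip()

def clone(tcp_id):
    """Shallow-clone one text's repository into the current directory."""
    subprocess.run(["git", "clone", "--depth", "1",
                    REPO_URL.format(tcp_id=tcp_id)], check=False)

if __name__ == "__main__":
    for i, tcp_id in enumerate(tcp_ids()):
        if i >= 5:                          # a small sample, not all 32,853
            break
        clone(tcp_id)

Cloning with --depth 1 keeps each checkout small, which matters if you do decide to pull thousands of repositories.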

Now for the BAD NEWS:

An additional 45,000 books:

Currently, EEBO-TCP Phase II texts are available to authorized users at partner libraries. Once the project is done, the corpus will be available for sale exclusively through ProQuest for five years. Then, the texts will be released freely to the public.

Can you guess why the public is barred from what are obviously public domain texts?

Because our funding is limited, we aim to key as many different works as possible, in the language in which our staff has the most expertise.

Academic projects are supposed to fund themselves and be self-sustaining. When anyone asks about the sustainability of an academic project, ask them when your country’s military was last “self-sustaining.” The U.S. has spent $2.6 trillion on a “war on terrorism” and has nothing to show for it other than dead and injured military personnel, perversion of budgetary policies, and loss of privacy on a worldwide scale.

It is hard to imagine the sort of lifetime access for everyone on Earth that could be secured for less than $1 trillion. No more special pricing and contracts depending on whether you are in country A or country Zed. Publishers would be spared all that paperwork, and to get access all you would need is a connection to the Internet. Publishers would have a guaranteed income stream and less overhead from sales personnel, administrative staff, etc., and people would have access (whether they used it or not) to educate themselves, to make new discoveries, and more.

My proposal does not involve payments to large military contractors or subversion of legitimate governments or imposition of American values on other cultures. Leaving those drawbacks to one side, what do you think about it otherwise?

The Data Scientist

Thursday, January 1st, 2015

The Data Scientist

Kurt Kagel has set up a newspaper on Data Science and Computational Linguistics with the following editor’s note:

I have been covering the electronic information space for more than thirty years, as writer, editor, programmer and information architect. This paper represents an experiment, a venue to explore Data Science and Computational Linguistics, as well as the world of IT in general.

I’m still working out bugs and getting a feel for the platform, so look and feel (and content) will almost certainly change. If you are interested in featuring articles here, please contact me.

It is based on paper.li, which automatically loads content into your newspaper; you can also add content yourself.

I have known Kurt for a number of years in the markup world and look forward to seeing how this newspaper develops.