Archive for the ‘Publishing’ Category

Companion to “Functional Programming in Scala”

Sunday, February 22nd, 2015

A companion booklet to “Functional Programming in Scala” by Rúnar Óli Bjarnason.

From the webpage:

This full colour syntax-highlighted booklet comprises all the chapter notes, hints, solutions to exercises, addenda, and errata for the book “Functional Programming in Scala” by Paul Chiusano and Runar Bjarnason. This material is freely available online, but is compiled here as a convenient companion to the book itself.

If you talk about supporting alternative forms of publishing, here is your chance to support an alternative form of publishing, financially.

Authors are going to gravitate to models that sustain their ability to write.

It is up to you what model that will be.

The Many Faces of Science (the journal)

Saturday, February 21st, 2015

Andy Dalby tells a chilling tale in Why I will never trust Science again.

You need to read the full account, but as a quick summary: Andy submitted a paper to Science that was rejected, and within weeks he found that Science had accepted another paper, a deeply flawed one, reaching the same conclusion. When he notified Science, it was suggested he post an online comment. Andy’s account has quotes, links to references, etc.

That is one face of Science, secretive, arbitrary and restricted peer review of submissions. I say “restricted peer” because Science has a tiny number of reviewers, compared to your peers, who review submissions. If you want “peer review,” you should publish with an open source journal that enlists all of your peers as reviewers, not just a few.

There is another face of Science, which appeared last December without any trace of irony at all:

Does journal peer review miss best and brightest? by David Shultz, which reads in part:

Sometimes greatness is hard to spot. Before going on to lead the Chicago Bulls to six NBA championships, Michael Jordan was famously cut from his high school basketball team. Scientists often face rejection of their own—in their case, the gatekeepers aren’t high school coaches, but journal editors and peers they select to review submitted papers. A study published today indicates that this system does a reasonable job of predicting the eventual interest in most papers, but it may shoot an air ball when it comes to identifying really game-changing research.

There is a serious chink in the armor, though: All 14 of the most highly cited papers in the study were rejected by the three elite journals, and 12 of those were bounced before they could reach peer review. The finding suggests that unconventional research that falls outside the established lines of thought may be more prone to rejection from top journals, Siler says.

Science publishes research showing its methods are flawed and yet it takes no notice. Perhaps its rejection of Andy’s paper isn’t so strange. It must have not traveled far enough down the stairs.

I first saw Andy’s paper in a tweet by Mick Watson.

Harry Potter eBooks

Sunday, February 1st, 2015

All the Harry Potter ebooks are now on subscription site Oyster by Laura Hazard Owen.

Laura reports the Harry Potter books are available on Oyster and Amazon. She says that Oyster has the spin-off titles from the original series where Amazon does not.

Both offer $9.95 per month subscription rates, where Oyster claims “over a million” books and Amazon over 700,000. After reading David Mason’s How many books will you read in your lifetime?, I am not sure the difference in raw numbers will make much difference.

Access to electronic texts will certainly make creating topic maps for popular literature a good deal easier.


Nature: A recap of a successful year in open access, and introducing CC BY as default

Tuesday, January 27th, 2015

A recap of a successful year in open access, and introducing CC BY as default by Carrie Calder, the Director of Strategy for Open Research, Nature Publishing Group/Palgrave Macmillan.

From the post:

We’re pleased to start 2015 with an announcement that we’re now using Creative Commons Attribution license CC BY 4.0 as default. This will apply to all of the 18 fully open access journals Nature Publishing Group owns, and will also apply to any future titles we launch. Two society-owned titles have introduced CC BY as default today and we expect to expand this in the coming months.

This follows a transformative 2014 for open access and open research at Nature Publishing Group. We’ve always been supporters of new technologies and open research (for example, we’ve had a liberal self-archiving policy in place for ten years now. In 2013 we had 65 journals with an open access option) but in 2014 we:

  • Built a dedicated team of over 100 people working on Open Research across journals, books, data and author services
  • Conducted research on whether there is an open access citation benefit, and researched authors’ views on OA
  • Introduced the Nature Partner Journal series of high-quality open access journals and announced our first ten NPJs
  • Launched Scientific Data, our first open access publication for Data Descriptors
  • And last but not least switched Nature Communications to open access, creating the first Nature-branded fully open access journal

We did this not because it was easy (trust us, it wasn’t always) but because we thought it was the right thing to do. And because we don’t just believe in open access; we believe in driving open research forward, and in working with academics, funders and other publishers to do so. It’s obviously making a difference already. In 2013, 38% of our authors chose to publish open access immediately upon publication – in 2014, this percentage rose to 44%. Both Scientific Reports and Nature Communications had record years in terms of submissions for publication.

Open access is on its way to becoming the expected model for publishing. That isn’t to say that there aren’t economies and kinks to be worked out, but the fundamental principles of open access have been widely accepted.

Not everywhere of course. There are areas of scholarship that think self-isolation makes them important. They shun open access as an attack on their traditions of “Doctor Fathers” and access to original materials as a privilege. Strategies that make them all the more irrelevant in the modern world. Pity because there is so much they could contribute to the public conversation. But a public conversation means you are not insulated from questions that don’t accept “because I say so” as an adequate answer.

If you are working in such an area or know of one, press for emulation of Nature and the many other efforts to provide open access to both primary and secondary materials. There are many areas of the humanities that already follow that model, but not all. Let’s keep pressing until open access is the default for all disciplines.

Kudos to Nature for their ongoing efforts on open access.

I first saw the news about the Nature post in a tweet by Ethan White.

The Past, Present and Future of Scholarly Publishing

Saturday, January 3rd, 2015

The Past, Present and Future of Scholarly Publishing By Michael Eisen.

Michael made this presentation to the Commonwealth Club of California on March 12, 2013. This post is from the written text for the presentation and you can catch the audio here.

Michael does a great job tracing the history of academic publishing, the rise of open access and what is holding us back from a more productive publishing environment for everyone.

I disagree with his assessment of classification:

And as for classification, does anyone really think that assigning every paper to one of 10,000 journals, organized in a loose and chaotic hierarchy of topics and importance, is really the best way to help people browse the literature? This is a pure relic of a bygone era – an artifact of the historical accident that Gutenberg invented the printing press before Al Gore invented the Internet.

but will pass over that to address the more serious issue of open access publishing in the humanities.

Michael notes:

But the battle is by no means won. Open access collectively represents only around 10% of biomedical publishing, has less penetration in other sciences, and is almost non-existent in the humanities. And most scientists still send their best papers to “high impact” subscription-based journals.

There are open access journals in the humanities but it is fair to say they are few and far between. If prestige is one of the drivers in scientific publishing, where large grant programs abound for some types of research, prestige is about the only driver for humanities publishing.

There are grant programs for the humanities but nothing on the scale of funding in the sciences. Salaries in the humanities are for the most part nothing to write home about. Humanities publishing really comes down to prestige.

Prestige from publication may be a dry, hard bone but it is the only bone that most humanities scholars will ever have. Try to take that away and you are likely to get bitten.

For instance, have you ever wondered about the proliferation of new translations of the Bible? Have we discovered new texts? New discoveries about biblical languages? Discovery of major mistakes in a prior edition? What if I said none of the above? To what would you assign the publication of new translations of the Bible?

If you compare the various translations you will find different “editors,” unless you are looking at a common source for bibles. Some sources do that as well. They create different “versions” for different target audiences.

With the exception of new versions like the New Revised Standard Version, which was undertaken to account for new information from the Dead Sea Scrolls, new editions of the Bible are primarily scholarly churn.

The humanities aren’t going to move any closer to open access publishing until their employers (universities) and funders insist on open access publishing as a condition for tenure and funding.

I will address Michael’s mis-impressions about the value of classification another time. ;-)

Early English Books Online – Good News and Bad News

Friday, January 2nd, 2015

Early English Books Online

The very good news is that 25,000 volumes from the Early English Books Online collection have been made available to the public!

From the webpage:

The EEBO corpus consists of the works represented in the English Short Title Catalogue I and II (based on the Pollard & Redgrave and Wing short title catalogs), as well as the Thomason Tracts and the Early English Books Tract Supplement. Together these trace the history of English thought from the first book printed in English in 1475 through to 1700. The content covers literature, philosophy, politics, religion, geography, science and all other areas of human endeavor. The assembled collection of more than 125,000 volumes is a mainstay for understanding the development of Western culture in general and the Anglo-American world in particular. The STC collections have perhaps been most widely used by scholars of English, linguistics, and history, but these resources also include core texts in religious studies, art, women’s studies, history of science, law, and music.

Even better news from Sebastian Rahtz (Chief Data Architect, IT Services, University of Oxford):

The University of Oxford is now making this collection, together with Gale Cengage’s Eighteenth Century Collections Online (ECCO), and Readex’s Evans Early American Imprints, available in various formats (TEI P5 XML, HTML and ePub) initially via the University of Oxford Text Archive at, and offering the source XML for community collaborative editing via Github. For the convenience of UK universities who subscribe to JISC Historic Books, a link to page images is also provided. We hope that the XML will serve as the base for enhancements and corrections.

This catalogue also lists EEBO Phase 2 texts, but the HTML and ePub versions of these can only be accessed by members of the University of Oxford.

[Technical note]
Those interested in working on the TEI P5 XML versions of the texts can check them out of Github, via, where each of the texts is in its own repository (eg There is a CSV file listing all the texts at, and a simple Linux/OSX shell script to clone all 32853 unrestricted repositories at
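The technical note mentions a CSV listing of the texts and a shell script for cloning the repositories, but the links did not survive in this copy. Purely as a rough sketch, and assuming a hypothetical CSV with an `id` column and a placeholder base URL (neither is taken from the original announcement), the clone commands could be generated along these lines:

```python
import csv
import io

# ASSUMPTION: placeholder base URL, not the real repository location.
BASE_URL = "https://example.org/texts"


def clone_commands(csv_text, id_column="id"):
    """Return one `git clone` command per row of a CSV listing.

    The column name `id` is hypothetical; adjust it to the real header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    commands = []
    for row in reader:
        text_id = (row.get(id_column) or "").strip()
        if text_id:  # skip rows that carry no identifier
            commands.append(f"git clone {BASE_URL}/{text_id}.git")
    return commands


if __name__ == "__main__":
    # Made-up sample rows, only to show the shape of the output.
    sample = "id,title\nT0001,First sample text\nT0002,Second sample text\n"
    for command in clone_commands(sample):
        print(command)
```

Generating the commands as text first, rather than running them, makes it easy to review the list before cloning tens of thousands of repositories.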

Now for the BAD NEWS:

An additional 45,000 books:

Currently, EEBO-TCP Phase II texts are available to authorized users at partner libraries. Once the project is done, the corpus will be available for sale exclusively through ProQuest for five years. Then, the texts will be released freely to the public.

Can you guess why the public is barred from what are obviously public domain texts?

Because our funding is limited, we aim to key as many different works as possible, in the language in which our staff has the most expertise.

Academic projects are supposed to fund themselves and be self-sustaining. When anyone asks about the sustainability of an academic project, ask them when your country’s military was last “self-sustaining.” The U.S. has spent $2.6 trillion on a “war on terrorism” and has nothing to show for it other than dead and injured military personnel, perversion of budgetary policies, and loss of privacy on a worldwide scale.

It is hard to imagine what sort of lifetime access for everyone on Earth could be secured for less than $1 trillion. No more special pricing and contracts if you are in countries A to Zed. Eliminate all that paperwork for publishers; for access, all you need is a connection to the Internet. The publishers would have a guaranteed income stream, less overhead from sales personnel, administrative staff, etc. And people would have access (whether used or not) to educate themselves, to make new discoveries, etc.

My proposal does not involve payments to large military contractors or subversion of legitimate governments or imposition of American values on other cultures. Leaving those drawbacks to one side, what do you think about it otherwise?

The Data Scientist

Thursday, January 1st, 2015

The Data Scientist

Kurt Kagel has set up a newspaper on Data Science and Computational Linguistics with the following editor’s note:

I have been covering the electronic information space for more than thirty years, as writer, editor, programmer and information architect. This paper represents an experiment, a venue to explore Data Science and Computational Linguistics, as well as the world of IT in general.

I’m still working out bugs and getting a feel for the platform, so look and feel (and content) will almost certainly change. If you are interested in featuring articles here, please contact me.

It is based on, which automatically loads content into your newspaper. Not to mention you being able to load content as well.

I have known Kurt for a number of years in the markup world and look forward to seeing how this newspaper develops.

Why my book can be downloaded for free

Saturday, December 6th, 2014

Why my book can be downloaded for free by Mark Dominus.

From the post:

People are frequently surprised that my book, Higher-Order Perl, is available as a free download from my web site. They ask if it spoiled my sales, or if it was hard to convince the publisher. No and no.

I sent the HOP proposal to five publishers, expecting that two or three would turn it down, and that I would pick from the remaining two or three, but somewhat to my dismay, all five offered to publish it, and I had to decide who.

One of the five publishers was Morgan Kaufmann. I had never heard of Morgan Kaufmann, but one day around 2002 I was reading the web site of Philip Greenspun. Greenspun was incredibly grouchy. He found fault with everything. But he had nothing but praise for Morgan Kaufmann. I thought that if Morgan Kaufmann had pleased Greenspun, who was nearly impossible to please, then they must be really good, so I sent them the proposal. (They eventually published the book, and did a superb job; I have never regretted choosing them.)

But not only Morgan Kaufmann but four other publishers had offered to publish the book. So I asked a number of people for advice. I happened to be in London one week and Greenspun was giving a talk there, which I went to see. After the talk I introduced myself and asked for his advice about picking the publisher.

Access to “free” electronic versions is on its way to becoming a norm, at least with some computer science publishers. Cambridge University Press, CUP, with Data Mining and Analysis: Fundamental Concepts and Algorithms and Basic Category Theory comes to mind.

Other publishers with similar policies? Yes, I know there are CS publishers who want to make free with content of others, not so much with their own. Not the same thing.

I first saw this in a tweet by Julia Evans.

Nature makes all articles free to view [pay-to-say]

Tuesday, December 2nd, 2014

Nature makes all articles free to view by Richard Van Noorden.

From the post:

All research papers from Nature will be made free to read in a proprietary screen-view format that can be annotated but not copied, printed or downloaded, the journal’s publisher Macmillan announced on 2 December.

The content-sharing policy, which also applies to 48 other journals in Macmillan’s Nature Publishing Group (NPG) division, including Nature Genetics, Nature Medicine and Nature Physics, marks an attempt to let scientists freely read and share articles while preserving NPG’s primary source of income — the subscription fees libraries and individuals pay to gain access to articles.

ReadCube, a software platform similar to Apple’s iTunes, will be used to host and display read-only versions of the articles’ PDFs. If the initiative becomes popular, it may also boost the prospects of the ReadCube platform, in which Macmillan has a majority investment.

Annette Thomas, chief executive of Macmillan Science and Education, says that under the policy, subscribers can share any paper they have access to through a link to a read-only version of the paper’s PDF that can be viewed through a web browser. For institutional subscribers, that means every paper dating back to the journal’s foundation in 1869, while personal subscribers get access from 1997 on.

Anyone can subsequently repost and share this link. Around 100 media outlets and blogs will also be able to share links to read-only PDFs. Although the screen-view PDF cannot be printed, it can be annotated — which the publisher says will provide a way for scientists to collaborate by sharing their comments on manuscripts. PDF articles can also be saved to a free desktop version of ReadCube, similarly to how music files can be saved in iTunes.

I am hopeful that Macmillan will discover that allowing copying and printing are no threat to its income stream. Both are means of advertising for its journal at the expense of the user who copies a portion of the text for a citation or shares a printed copy with a colleague. Advertising paid for by users should be considered as a plus.

The annotation step is a good one, although I would modify it in some respects. First I would make all articles accessible by default with annotation capabilities. Then I would grant anyone who registers say 12 comments per year for free and offer a lower-than-subscription-cost option for more than twelve comments on articles.

If there is one thing I suspect users would be willing to pay for, it is the right to respond to others in their fields, whether to articles or to other comments. Think of it as a pay-to-say market strategy.

It could be an “additional” option to current institutional and personal subscriptions and thus an entirely new revenue stream for Macmillan.

To head off expected objections by “free speech” advocates, I note that no journal publishes every letter to the editor. The right to free speech has never included the right to be heard on someone else’s dime. Annotation of Nature is on Macmillan’s dime.

Basic Category Theory (Publish With CUP)

Monday, July 28th, 2014

Basic Category Theory by Tom Leinster.

From the webpage:

Basic Category Theory is an introductory category theory textbook. Features:

  • It doesn’t assume much, either in terms of background or mathematical maturity.
  • It sticks to the basics.
  • It’s short.

Advanced topics are omitted, leaving more space for careful explanations of the core concepts. I used earlier versions of the text to teach master’s-level courses at the University of Glasgow.

The book is published by Cambridge University Press. You can find all the publication data, and buy it, at the book’s CUP web page.

It was published on 24 July 2014 in hardback and e-book formats. The physical book should be in stock throughout Europe now, and worldwide by mid-September. Wherever you are, you can (pre)order it now from CUP or the usual online stores.

By arrangement with CUP, a free online version will be released in January 2016. This will be not only freely downloadable but also freely editable, under a Creative Commons licence. So, for instance, if parts of the book are unsuitable for the course you’re teaching, or if you don’t like the notation, you can change it. More details will appear here when the time comes.

Freely available as etext (18 months after hard copy release) and freely editable?

Show of hands. How many publishers have you seen with those policies?

I keep coming up with one, Cambridge University Press, CUP.

As readers and authors we need to vote with our feet. Purchase from and publish with Cambridge University Press.

It may take a while but other publishers may finally notice.

TeX Live 2014 released…

Thursday, June 19th, 2014

TeX Live 2014 released – what’s new by Stefan Kottwitz.

Just enough to get you interested:

  • TeX and MetaFont updates
  • pdfTeX with “fake spaces”
  • LuaTeX, engine that can reside in CPU cache
  • numerous other changes and improvements

Stefan covers these and more, while pointing you to the documentation for more details.

Has anyone calculated how many decades TeX/LaTeX are ahead of the average word processor?

Just curious.


Tuesday, June 3rd, 2014

GitBook: Write Books using Markdown on OpenShift by Marek Jelen.

From the post:

GitBook is a tool for using Markdown to write books, which are converted to dynamic websites or exported to static formats like PDF. GitBook also integrates with Git and GitHub, adding a social element to the book creation process.

If you are exporting your book into an HTML page, interactive aspects are also embedable. At the time of this writing, the system provides support for quizzes and JavaScript exercises. However, the tool is fully open source and written using Node.js, so you are free to extend the functionality to meet your needs.

The Gitbook Learn Javascript is used as an example of production with GitBook.

It’s readable but in terms of the publishing craft, the Mikraot Gedolot or The Art of Computer Programming (TAOCP), it’s not.

Still, it may be useful for one-off exports from topic maps and other data sources.


Tuesday, May 20th, 2014


From the webpage:

Madagascar is an open-source software package for multidimensional data analysis and reproducible computational experiments. Its mission is to provide

  • a convenient and powerful environment
  • a convenient technology transfer tool

for researchers working with digital image and data processing in geophysics and related fields. Technology developed using the Madagascar project management system is transferred in the form of recorded processing histories, which become “computational recipes” to be verified, exchanged, and modified by users of the system.

Interesting tool for “reproducible documents” and data analysis.

The file format, Regularly Sampled Format (RSF) sounds interesting:

For data, Madagascar uses the Regularly Sampled Format (RSF), which is based on the concept of hypercubes (n-D arrays, or regularly sampled functions of several variables), much like the SEPlib (its closest relative), DDS, or the regularly-sampled version of the Javaseis format (SVF). Up to 9 dimensions are supported. For 1D it is conceptually analogous to a time series, for 2D to a raster image, and for 3D to a voxel volume. The format (actually a metaformat) makes use of a ASCII file with metadata (information about the data), including a pointer (in= parameter) to the location of the file with the actual data values. Irregularly sampled data are currently handled as a pair of datasets, one containing data and the second containing the corresponding irregular geometry information. Programs for conversion to and from other formats such as SEG-Y and SU are provided. (From Package Overview)
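To make the "metaformat" idea concrete, here is a minimal sketch of reading such a text header. It assumes the header is plain text made of whitespace-separated `key=value` pairs, with optional double quotes around values. The `in=` parameter comes from the description above; the `n1`, `n2`, ... axis-length keys, the sample header, and the function names are assumptions for illustration, not taken from the Madagascar documentation.

```python
import shlex


def parse_header(text):
    """Collect key=value pairs from header text; later values override earlier ones."""
    params = {}
    for line in text.splitlines():
        # shlex handles quoted values such as in="/path/with spaces"
        for token in shlex.split(line, comments=True):
            if "=" in token:
                key, value = token.split("=", 1)
                params[key] = value
    return params


def shape(params):
    """Return axis lengths n1, n2, ... (at most 9), stopping at the first missing axis."""
    dims = []
    for axis in range(1, 10):
        key = f"n{axis}"
        if key not in params:
            break
        dims.append(int(params[key]))
    return dims


if __name__ == "__main__":
    # Made-up header for a 2-D dataset; the path is a placeholder.
    header = 'n1=100 d1=0.004 o1=0\nn2=20\nin="/tmp/example.rsf@"\n'
    params = parse_header(header)
    print(shape(params))   # axis lengths
    print(params["in"])    # location of the file holding the data values
```

Keeping the metadata in a small text file and the values in a separate file means the header can be inspected or edited with ordinary text tools. A real header may contain lines this sketch does not handle, such as free text with unbalanced quotes.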

In case you are interested, SEG-Y and SU (Seismic Unix data format) are both formats for geophysical data.

I first saw this in a tweet by Scientific Python.

Thanks for Ungluing

Sunday, May 4th, 2014

Thanks-for-Ungluing launches!

From the post:

Great books deserve to be read by all of us, and we ought to be supporting the people who create these books. “Thanks for Ungluing” gives readers, authors, libraries and publishers a new way to build, sustain, and nourish the books we love.

“Thanks for Ungluing” books are Creative Commons licensed and free to download. You don’t need to register or anything. But when you download, the creators can ask for your support. You can pay what you want. You can just scroll down and download the book. But when that book has become your friend, your advisor, your confidante, you’ll probably want to show your support and tell all your friends.

We have some amazing creators participating in this launch.

An attempt to address the problem of open access to published materials while at the same time compensating authors for their efforts.

There is some recent material and old standbys like The Communist Manifesto by Karl Marx and Friedrich Engels. Which is good but having more recent works such as A Theology of Liberation by Gustavo Gutiérrez would be better.

If you are thinking about writing a book on CS topics, please think about “Thanks for Ungluing” as an option.

I first saw this in a tweet by Tim O’Reilly.

Innovations in peer review:…

Tuesday, April 22nd, 2014

Innovations in peer review: join a discussion with our Editors by Shreeya Nanda.

From the post:

Innovation may not be an adjective often associated with peer review, indeed commentators have claimed that peer review slows innovation and creativity in science. Preconceptions aside, publishers are attempting to shake things up a little, with various innovations in peer review, and these are the focus of a panel discussion at BioMed Central’s Editors’ Conference on Wednesday 23 April in Doha, Qatar. This follows our spirited discussion at the Experimental Biology conference in Boston last year.

The discussion last year focussed on the limitations of the traditional peer review model (you can see a video here). This year we want to talk about innovations in the field and the ways in which the limitations are being addressed. Specifically, we will focus on open peer review, portable peer review – in which we help authors transfer their manuscript, often with reviewers’ reports, to a more appropriate journal – and decoupled peer review, which is undertaken by a company or organisation independent of, or on contract from, a journal.

We will be live tweeting from the session at 11.15am local time (9.15am BST), so if you want to join the discussion or put questions to our panellists, please follow #BMCEds14. If you want to brush up on any or all of the models that we’ll be discussing, have a look at some of the content from around BioMed Central’s journals, blogs and Biome below:

This post includes pointers to a number of useful resources concerning the debate around peer review.

But there are oddities as well. First, there is the claim that peer review “slows innovation and creativity in science,” which sits oddly beside recent reports that peer review is no better than random chance for grants (…lotteries to pick NIH research-grant recipients) and the not infrequent reports of false papers, fraud in actual papers, and a general inability to replicate research described in papers (Reproducible Research/(Mapping?)).

A claim doesn’t have to appear on the newsgroup (imaginary newsgroup) in order to be questionable on its face.

Secondly, despite the invitation to follow and participate on Twitter, holding the meeting in Qatar means potential attendees from the United States will have to rise at:

Eastern 4:15 AM (last year’s location)

Central 3:15 AM

Mountain 2:15 AM

Western 1:15 AM
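The times above can be checked with a short sketch. It assumes fixed UTC offsets for 23 April 2014: Doha at UTC+3, and daylight time in effect in both the UK and the US. The zone labels simply mirror the list above.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: fixed offsets valid for 23 April 2014 (daylight time in the UK and US).
DOHA = timezone(timedelta(hours=3))
ZONES = {
    "BST": timezone(timedelta(hours=1)),
    "Eastern": timezone(timedelta(hours=-4)),
    "Central": timezone(timedelta(hours=-5)),
    "Mountain": timezone(timedelta(hours=-6)),
    "Western": timezone(timedelta(hours=-7)),
}


def local_times(start):
    """Map each zone label to the session start as HH:MM in that zone."""
    return {
        name: start.astimezone(tz).strftime("%H:%M")
        for name, tz in ZONES.items()
    }


if __name__ == "__main__":
    session = datetime(2014, 4, 23, 11, 15, tzinfo=DOHA)  # 11.15am in Doha
    for name, clock in local_times(session).items():
        print(f"{name}: {clock}")
```

Fixed offsets keep the sketch free of any time zone database; for other dates, a proper time zone library would be the safer choice.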

I wonder what the participation levels will be from Boston last year as compared to Qatar this year?

Nothing against non-United States locations but non-junket locations, such as major educational/research hubs, should be the sites for such meetings.

…Textbooks for $0 [Digital Illiterates?]

Thursday, January 23rd, 2014

OpenStax College Textbooks for $0

From the about page:

OpenStax College is a nonprofit organization committed to improving student access to quality learning materials. Our free textbooks are developed and peer-reviewed by educators to ensure they are readable, accurate, and meet the scope and sequence requirements of your course. Through our partnerships with companies and foundations committed to reducing costs for students, OpenStax College is working to improve access to higher education for all.

OpenStax College is an initiative of Rice University and is made possible through the generous support of several philanthropic foundations. …

Available now:

  • Anatomy and Physiology
  • Biology
  • College Physics
  • Concepts of Biology
  • Introduction to Sociology
  • Introductory Statistics

Coming soon:

  • Chemistry
  • Precalculus
  • Principles of Economics
  • Principles of Macroeconomics
  • Principles of Microeconomics
  • Psychology
  • U.S. History

Check to see if I missed any present or forthcoming texts on data science. No, I didn’t see any either.

I looked at the Introduction to Sociology, which has a chapter on research methods, but no opportunity for students to experience data methods, such as Statwing’s coverage of the General Social Survey (GSS), which I covered in Social Science Dataset Prize!

Data science should not be an aside or extra course, any more than language literacy is; both are requirements for an education.

Consider writing or suggesting edits to subject textbooks to incorporate data science. Solely data science books will be necessary as well, just like there are advanced courses in English Literature.

Let’s not graduate digital illiterates. For their sake and ours.

I first saw this in a tweet by Michael Peter Edson.

Composable languages for bioinformatics: the NYoSh experiment

Wednesday, January 22nd, 2014

Composable languages for bioinformatics: the NYoSh experiment by Manuele Simi, Fabien Campagne. (Simi M, Campagne F. (2014) Composable languages for bioinformatics: the NYoSh experiment. PeerJ 2:e241)


Language WorkBenches (LWBs) are software engineering tools that help domain experts develop solutions to various classes of problems. Some of these tools focus on non-technical users and provide languages to help organize knowledge while other workbenches provide means to create new programming languages. A key advantage of language workbenches is that they support the seamless composition of independently developed languages. This capability is useful when developing programs that can benefit from different levels of abstraction. We reasoned that language workbenches could be useful to develop bioinformatics software solutions. In order to evaluate the potential of language workbenches in bioinformatics, we tested a prominent workbench by developing an alternative to shell scripting. To illustrate what LWBs and Language Composition can bring to bioinformatics, we report on our design and development of NYoSh (Not Your ordinary Shell). NYoSh was implemented as a collection of languages that can be composed to write programs as expressive and concise as shell scripts. This manuscript offers a concrete illustration of the advantages and current minor drawbacks of using the MPS LWB. For instance, we found that we could implement an environment-aware editor for NYoSh that can assist the programmers when developing scripts for specific execution environments. This editor further provides semantic error detection and can be compiled interactively with an automatic build and deployment system. In contrast to shell scripts, NYoSh scripts can be written in a modern development environment, supporting context dependent intentions and can be extended seamlessly by end-users with new abstractions and language constructs. We further illustrate language extension and composition with LWBs by presenting a tight integration of NYoSh scripts with the GobyWeb system. 
The NYoSh Workbench prototype, which implements a fully featured integrated development environment for NYoSh is distributed at

In the discussion section of the paper the authors concede:

We expect that widespread use of LWB will result in a multiplication of small languages, but in a manner that will increase language reuse and interoperability, rather than in the historical language fragmentation that has been observed with traditional language technology.

Whenever I hear projections about the development of languages, I am reminded that the inventors of “SCSI” thought it should be pronounced “sexy,” whereas others preferred “scuzzi.” Doesn’t have the same ring to it, does it?

I am all in favor of domain specific languages (DSLs), but at the same time, am mindful that undocumented languages are in danger of becoming “dead” languages.

Pay the Man!

Saturday, January 18th, 2014

Books go online for free in Norway by Martin Chilton.

From the post:

More than 135,000 books still in copyright are going online for free in Norway after an innovative scheme by the National Library ensured that publishers and authors are paid for the project.

The copyright-protected books (including translations of foreign books) have to be published before 2000 and the digitising has to be done with the consent of the copyright holders.

National Library of Norway chief Vigdis Moe Skarstein said the project is the first of its kind to offer free online access to books still under copyright, which in Norway expires 70 years after the author’s death. Books by Stephen King, Ken Follett, John Steinbeck, Jo Nesbø, Karin Fossum and Nobel Laureate Knut Hamsun are among those in the scheme.

The National Library has signed an agreement with Kopinor, an umbrella group representing major authors and publishers through 22 member organisations, and for every digitised page that goes online, the library pays a predetermined sum to Kopinor, which will be responsible for distributing the royalties among its members. The per-page amount was 0.36 Norwegian kroner (four pence), which will decrease to three pence when the online collection reaches its estimated target of 250,000 books.

Norway has discovered a way out of the copyright conundrum: pay the man!

Can you imagine the impact if the United States were to bulk license all of the Springer publications in digital format?

Some immediate consequences:

  1. All citizen-innovators would have access to a vast library of high quality content, without restriction by place of employment or academic status.
  2. Taking over the cost of Springer materials would act as additional funding for libraries with existing subscriptions.
  3. It would even out access to Springer materials across the educational system in the U.S.
  4. It would reduce the administrative burden on both libraries and Springer by consolidating all existing accounts into one account.
  5. Springer could offer “advanced” services in addition to basic search and content for additional fees, leveraged on top of the standard content.
  6. Other vendors could offer “advanced” services for fees leveraged on top of standard content.

I have nothing against the many “open access” journals but bear in mind the vast legacy of science and technology that remains the property of Springer and others.

The principal advantage that I would pitch to Springer is that making its content available under bulk licensing would lead other vendors to build services on top of that content.

What advantage is there for Springer? Imagine that you can be either a road (content) or a convenience store (app. built on content) next to the road. Which one gets maintained longer?

Everybody has an interest in maintaining and even expanding the road. By becoming part of the intellectual infrastructure of education, industry and government, even more than it is now, Springer would secure a very stable and lucrative future.

Put that way, I would much rather be the road than the convenience store.



Saturday, August 24th, 2013


I didn’t recognize the acronym either. ;-)

From the “about” page:

The Open Access (OA) tenets of granting unrestricted access to the results of publicly-funded research are in contrast with current models of scientific publishing, where access is restricted to journal customers. At the same time, subscription costs increase and put considerable strain on libraries, forcing them to cancel an increasing number of journals subscriptions. This situation is particularly acute in fields like High-Energy Physics (HEP), where pre-prints describing scientific results are timely available online. There is a growing concern within the academic community that the future of high-quality journals, and the peer-review system they administer, is at risk.

To address this situation for HEP and, as an experiment, Science at large, a new model for OA publishing has emerged: SCOAP3 (Sponsoring Consortium for Open Access Publishing in Particle Physics). In this model, HEP funding agencies and libraries, which today purchase journal subscriptions to implicitly support the peer-review service, federate to explicitly cover its cost, while publishers make the electronic versions of their journals free to read. Authors are not directly charged to publish their articles OA.

SCOAP3 will, for the first time, link quality and price, stimulating competition and enabling considerable medium- and long-term savings. Today, most publishers quote a price in the range of 1’000–2’000 Euros per published article. On this basis, we estimate that the annual budget for the transition of HEP publishing to OA would amount to a maximum of 10 Million Euros/year, sensibly lower than the estimated global expenditure in subscription to HEP journals.

Each SCOAP3 partner will finance its contribution by canceling journal subscriptions. Each country will contribute according to its share of HEP publishing. The transition to OA will be facilitated by the fact that the large majority of HEP articles are published in just six peer-reviewed journals. Of course, the SCOAP3 model is open to any, present or future, high-quality HEP journal aiming at a dynamic market with healthy competition and broader choice.

HEP funding agencies and libraries are currently signing Expressions of Interest for the financial backing of the consortium. A tendering procedure will then take place. Provided that SCOAP3 funding partners are prepared to engage in long-term commitments, many publishers are expected to be ready to enter into negotiations.

The example of SCOAP3 could be rapidly followed by other fields, directly related to HEP, such as nuclear physics or astro-particle physics, also similarly compact and organized with a reasonable number of journals.
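As a rough check on the figures quoted above, the budget ceiling and the per-article price range together imply how many articles a year the consortium expects to cover. The sketch below assumes a flat price per article, which the quoted text does not promise; the function name is mine, not SCOAP3's.

```python
# Figures taken from the SCOAP3 "about" text quoted above.
MAX_ANNUAL_BUDGET_EUR = 10_000_000       # "a maximum of 10 Million Euros/year"
PRICE_PER_ARTICLE_EUR = (1_000, 2_000)   # "1'000-2'000 Euros per published article"


def articles_covered(budget_eur, price_eur):
    """Whole number of articles a budget pays for, assuming a flat price."""
    return budget_eur // price_eur


low_price, high_price = PRICE_PER_ARTICLE_EUR
most_articles = articles_covered(MAX_ANNUAL_BUDGET_EUR, low_price)
fewest_articles = articles_covered(MAX_ANNUAL_BUDGET_EUR, high_price)

print(f"{fewest_articles:,} to {most_articles:,} articles per year")
```

At the ceiling, that is somewhere between five and ten thousand articles a year, depending on where in the quoted price range publishers land.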

Models like this one may result in increasing the amount of information available for topic mapping and the amount of semantic diversity in traditional search results.

Delivery models are changing but search interfaces leave us to our own devices at the document level.

If we are going to have better access in the physical sense, shouldn’t we be working on better access in the content sense?

PS: To show this movement has legs, consider the recent agreement of Elsevier, IOPp and Springer to participate.

Information Extraction from the Internet

Saturday, August 24th, 2013

Information Extraction from the Internet by Nan Tang.

From the description at Amazon ($116.22):

As the Internet continues to become part of our lives, there now exists an overabundance of reliable information sources on this medium. The temporal and cognitive resources of human beings, however, do not change. “Information Extraction from the Internet” provides methods and tools for Web information extraction and retrieval. Success in this area will greatly enhance business processes and provide information seekers new tools that allow them to reduce their searching time and cost involvement. This book focuses on the latest approaches for Web content extraction, and analyzes the limitations of existing technology and solutions. “Information Extraction from the Internet” includes several interesting and popular topics that are being widely discussed in the area of information extraction: data spasity and field-associated knowledge (Chapters 1–2), Web agent design and mining components (Chapters 3–4), extraction skills on various documents (Chapters 5–7), duplicate detection for music documents (Chapter 8), name disambiguation in digital libraries using Web information (Chapter 9), Web personalization and user-behavior issues (Chapters 10–11), and information retrieval case studies (Chapters 12–14). “Information Extraction from the Internet” is suitable for advanced undergraduate students and postgraduate students. It takes a practical approach rather than a conceptual approach. Moreover, it offers a truly reader-friendly way to get to the subject related to information extraction, making it the ideal resource for any student new to this subject, and providing a definitive guide to anyone in this vibrant and evolving discipline. This book is an invaluable companion for students, from their first encounter with the subject to more advanced studies, while the full-color artworks are designed to present the key concepts with simplicity, clarity, and consistency.

I discovered this volume while searching for the publisher of: On-demand Synonym Extraction Using Suffix Arrays.

As you can see from the description, the book offers wide-ranging coverage of information extraction interests.

All of the chapters are free for downloading at the publisher’s site.

iConcepts Press has a number of books and periodicals you may find interesting.

Semantic Search… [Call for Papers]

Saturday, August 3rd, 2013

Semantic Search – Call for Papers for special issue of Aslib Journal of Information Management by Fran Alexander.

From the post:

I am currently drafting the Call for Papers for a special issue of the Aslib Journal of Information Management (formerly Aslib Proceedings) which I am guest editing alongside Dr Ulrike Spree from the University of Hamburg.

Ulrike is the academic expert, while I am providing the practitioner perspective. I am very keen to include practical case studies, so if you have an interesting project or comments on a project but have never written an academic paper before, don’t be put off. I will be happy to advise on style, referencing, etc.

Suggested Topics

Themes Ulrike is interested in include:

  • current trends in semantic search
  • best practice – how far along the road from ‘early adopters’ to ‘mainstream users’ has semantic search gone so far
  • usability of semantic search
  • visualisation and semantic search
  • the relationship between new trends in knowledge organisation and semantic search, such as vocabulary norms (like ISO 25964 “Thesauri for information retrieval”) and the potential of semantic search from a more critical perspective – what, for example, are the criteria for judging quality?

Themes I am interested in include:

  • the history of semantic search – how the latest techniques and technologies have come out of developments over the last 5, 10, 20, 100, 2000… years
  • how semantic search techniques and technologies are being used in practice
  • how semantic technologies are fostering a need for cross-industry collaboration and standardization
  • practical problems in brokering consensus and agreement – defining terms and classes, etc.
  • differences between web-scale, enterprise scale, and collection-specific scale techniques
  • curation and management of ontologies.

However, we are open to suggestions, especially as it is such a broad topic, there are so many aspects that could be covered.

Fran doesn’t mention a deadline but I will ask and update here when I get it.

Sounds like a venue that would welcome papers on topic maps.


Proceedings of the 3rd Workshop on Semantic Publishing

Sunday, July 7th, 2013

Proceedings of the 3rd Workshop on Semantic Publishing edited by: Alexander García Castro, Christoph Lange, Phillip Lord, and Robert Stevens.

Table of Contents

Research Papers

  1. Twenty-Five Shades of Greycite: Semantics for Referencing and Preservation by Phillip Lord.
  2. Systematic Reviews as an Interface to the Web of (Trial) Data: using PICO as an Ontology for Knowledge Synthesis in Evidence-based Healthcare Research by Chris Mavergames.
  3. Towards Linked Research Data: an Institutional Approach by Najko Jahn, Florian Lier, Thilo Paul-Stueve, Christian Pietsch, Philipp Cimiano.
  4. Repurposing Benchmark Corpora for Reconstructing Provenance by Sara Magliacane.
  5. Connections across Scientific Publications based on Semantic Annotations by Leyla Jael García Castro, Rafael Berlanga, Dietrich Rebholz-Schuhmann, Alexander Garcia.
  6. Towards the Automatic Identification of the Nature of Citations by Angelo Di Iorio, Andrea Giovanni Nuzzolese, Silvio Peroni.
  7. How Reliable is Your Workflow: Monitoring Decay in Scholarly Publications by José Manuel Gómez-Pérez, Esteban García-Cuesta, Jun Zhao, Aleix Garrido, José Enrique Ruiz.

Polemics (published externally)

  1. Flash Mob Science, Open Innovation and Semantic Publishing by Hal Warren, Bryan Dennis, Eva Winer.
  2. Science, Semantic Web and Excuses by Idafen Santana Pérez, Daniel Garijo, Oscar Corcho.
  3. Polemic on Future of Scholarly Publishing/Semantic Publishing by Chris Mavergames.
  4. Linked Research by Sarven Capadisli.

The whole proceedings can also be downloaded as a single file (PDF, including title pages, preface, and table of contents).

Some reading to start your week!

Annual update released for TeX Live (2013)

Monday, June 24th, 2013

Annual update released for TeX Live

From the post:

The developers of the TeX Live distribution of LaTeX have released their annual update. However, after 17 years of development, the changes in TeX Live 2013 mostly amount to technical details.

The texmf/ directory, for example, has been merged into texmf-dist/, while the TEXMFMAIN and TEXMFDIST Kpathsea variables now point to texmf-dist. The developers have also merged several language collections for easier installation. Users will find native support for PNG output and floating-point numbers in MetaPost. LuaTeX now uses version 5.2 of Lua and includes a new library (pdfscanner) for processing external PDF data, and xdvi now uses freetype instead of t1lib for rendering.

Several updates have been made to XeTeX: HarfBuzz is now used instead of ICU for font layout and has been combined with Graphite2 to replace SilGraphite for Graphite layout; support has also been improved for OpenType.

TeX Live 2013 is open source software, licensed under a combination of the LaTeX Project Public License (LPPL) and a number of other licences. The software works on all of the major operating systems, although the program no longer runs on AIX systems using PowerPCs. Mac OS X users may want to take a look at MacTeX, which is based on – and has been updated in line with – TeX Live.

No major changes but we should be grateful for the effort that resulted in this release.
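If you want to see where the merged tree landed on your own machine, the Kpathsea lookup tool kpsewhich can report the value of a variable. The sketch below is one way to ask it from Python. It assumes kpsewhich is on your PATH and returns None when it is not; the helper name kpathsea_var is mine.

```python
import shutil
import subprocess


def kpathsea_var(name):
    """Return the value kpsewhich reports for a Kpathsea variable, or None."""
    exe = shutil.which("kpsewhich")
    if exe is None:
        # No TeX distribution found on PATH.
        return None
    result = subprocess.run(
        [exe, f"-var-value={name}"],
        capture_output=True,
        text=True,
        check=False,
    )
    value = result.stdout.strip()
    return value or None


for var in ("TEXMFMAIN", "TEXMFDIST"):
    print(var, "=", kpathsea_var(var))
```

On a TeX Live 2013 or later installation, the post quoted above suggests both variables should point at the same texmf-dist directory.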

Journal of Data Mining & Digital Humanities

Monday, May 27th, 2013

Journal of Data Mining & Digital Humanities

From the webpage:

Data mining, an interdisciplinary subfield of computer science, involving the methods at the intersection of artificial intelligence, machine learning and database systems. The Journal of Data Mining & Digital Humanities concerned with the intersection of computing and the disciplines of the humanities, with tools provided by computing such as data visualisation, information retrieval, statistics, text mining by publishing scholarly work beyond the traditional humanities.

The journal includes a wide range of fields in its discipline to create a platform for the authors to make their contribution towards the journal and the editorial office promises a peer review process for the submitted manuscripts for the quality of publishing.

Journal of Data Mining & Digital Humanities is an Open Access journal and aims to publish most complete and reliable source of information on the discoveries and current developments in the mode of original articles, review articles, case reports, short communications, etc. in all areas of the field and making them freely available through online without any restrictions or any other subscriptions to researchers worldwide.

The journal is using Editorial Tracking System for quality in review process. Editorial Tracking is an online manuscript submission, review and tracking systems. Review processing is performed by the editorial board members of Journal of Data Mining & Digital Humanities or outside experts; at least two independent reviewers approval followed by editor approval is required for acceptance of any citable manuscript. Authors may submit manuscripts and track their progress through the system, hopefully to publication. Reviewers can download manuscripts and submit their opinions to the editor. Editors can manage the whole submission/review/revise/publish process.

KDNuggets reports the first issue of JDMDH will appear in August, 2013. Deadline for submissions for the first issue: 25 June 2013.

A great venue for topic map focused papers. (When you are not writing for the Economist.)

New York Times – Article Search API v. 2

Sunday, May 5th, 2013

New York Times – Article Search API v. 2

From the documentation page:

With the Article Search API, you can search New York Times articles from Sept. 18, 1851 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata.

The prior Article Search API described itself as:

With the Article Search API, you can search New York Times articles from 1981 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata.

An addition of one hundred and thirty years of content for searching. Not bad for a v. 2 release.
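To exercise the wider date range, a request has to carry a query plus begin and end dates. The sketch below only builds the URL and makes no network call. The endpoint path and the q, begin_date, end_date and api-key parameter names are my reading of the version 2 documentation and should be checked against it; the key shown is a placeholder.

```python
from urllib.parse import urlencode

# Assumed endpoint for version 2 of the Article Search API.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, begin_date, end_date, api_key):
    """Build an Article Search request URL; dates are YYYYMMDD strings."""
    params = {
        "q": query,
        "begin_date": begin_date,
        "end_date": end_date,
        "api-key": api_key,
    }
    return BASE_URL + "?" + urlencode(params)


# Search the earliest months of the archive (placeholder key).
print(build_search_url("telegraph", "18510918", "18511231", "YOUR_API_KEY"))
```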

On cursory review, the API does appear to have changed significantly.

For example, the default fields for each request in version 1.0 were body, byline, date, title, url.

In version 2.0, the default fields returned are: web_url, snippet, lead_paragraph, abstract, print_page, blog, source, multimedia, headline, keywords, pub_date, document_type, news_desk, byline, type_of_material, _id, and word_count.

Five default fields for version 1.0 versus seventeen for version 2.0.

There are changes in terminology that will make discovering all the changes from version 1.0 to version 2.0 non-trivial.
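A first pass at untangling the terminology is to compare the two default field lists as sets. The field names below come from the two lists above; the rename table is guesswork on my part, based only on similar-looking names, and is not confirmed by the New York Times documentation.

```python
V1_DEFAULT_FIELDS = {"body", "byline", "date", "title", "url"}
V2_DEFAULT_FIELDS = {
    "web_url", "snippet", "lead_paragraph", "abstract", "print_page",
    "blog", "source", "multimedia", "headline", "keywords", "pub_date",
    "document_type", "news_desk", "byline", "type_of_material", "_id",
    "word_count",
}

unchanged = V1_DEFAULT_FIELDS & V2_DEFAULT_FIELDS
missing_in_v2 = V1_DEFAULT_FIELDS - V2_DEFAULT_FIELDS
new_in_v2 = V2_DEFAULT_FIELDS - V1_DEFAULT_FIELDS

# Guesses only: pairs chosen because the names look related.
GUESSED_RENAMES = {"title": "headline", "url": "web_url", "date": "pub_date"}
unmapped = sorted(missing_in_v2 - set(GUESSED_RENAMES))

print("unchanged:", sorted(unchanged))
print("missing in v2:", sorted(missing_in_v2))
print("no obvious v2 counterpart:", unmapped)
```

Only byline survives under the same name, which is why a published mapping between the versions would save everyone some detective work.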

Two fields that were present in version 1.0 that I don’t see (under another name?) in version 2.0 are:


DBpedia person names mapped to Times per_facet terms. This field is case sensitive: values must be Mixed Case.

The Times per_facet is often more comprehensive than dbpedia_resource, but the DBpedia name is easier to use with other data sources. For more information about linked open data, see


URLs to DBpedia person names that have been mapped to Times per_facet terms. This field is case sensitive: values must be Mixed Case.

For more information about linked open data, see

More documentation is promised, which I hope includes a mapping from version 1.0 to version 2.0.

Certainly looks like the basis for annotating content in the New York Times archives as part of a topic map.

Where users input their authentication details for the New York Times and/or other pay-per-view sites.

I can’t imagine anyone objecting to you helping them sell their content. ;-)

Mathbabe, the book

Saturday, May 4th, 2013

Mathbabe, the book by Cathy O’Neil.

From the post:

Thanks to a certain friendly neighborhood mathbabe reader, I’ve created this mathbabe book, which is essentially all of my posts that I ever wrote (I think. Note sure about that.) bundled together mostly by date and stuck in a huge pdf. It comes to 1,243 pages.

I did it using, which charges $0.99 per person who downloads the pdf. I’m not charging anything over that, because the way I look at it, it’s already free.

Speaking of that, I can see why I’d want a copy of this stuff, since it’s the best way I can think of to have a local version of a bunch of writing I’ve done over the past couple of years, but I don’t actually see why anyone else would. So please don’t think I’m expecting you to go buy this book! Even so, more than one reader has requested this, so here it is.

And one strange thing: I don’t think it required my password on to do it, I just needed the url for the RSS feed. So if you want to avoid paying 99 cents, I’m pretty sure you can go to leanpub or one of its competitors and create another, identical book using that same feed.

And for that matter you can also go build your own book about anything using these tools, which is pretty cool when you think about it. Readers, please tell me if there’s a way to do this that’s open source and free.

The Mathbabe “book” would be one that I would be interested in reading. I can think of several other blogs that fall into that category.

I hesitate to use the term “book” for such a collection.

Maybe I am confusing “monograph,” which is focused on a topic, with “book,” which applies to works beyond a certain length.

I think of my postings, once you remove the dated notice materials, as potential essays or chapters in a book.

But they would need fleshing out and polishing to qualify for more formal publication.


Thursday, March 21st, 2013


Short description:

Force11 (the Future of Research Communications and e-Scholarship) is a virtual community working to transform scholarly communications toward improved knowledge creation and sharing. Currently, we have 315 active members.

A longer description from the “about” page:

Research and scholarship lead to the generation of new knowledge. The dissemination of this knowledge has a fundamental impact on the ways in which society develops and progresses; and at the same time, it feeds back to improve subsequent research and scholarship. Here, as in so many other areas of human activity, the Internet is changing the way things work: it opens up opportunities for new processes that can accelerate the growth of knowledge, including the creation of new means of communicating that knowledge among researchers and within the wider community. Two decades of emergent and increasingly pervasive information technology have demonstrated the potential for far more effective scholarly communication. However, the use of this technology remains limited; research processes and the dissemination of research results have yet to fully assimilate the capabilities of the Web and other digital media. Producers and consumers remain wedded to formats developed in the era of print publication, and the reward systems for researchers remain tied to those delivery mechanisms.

Force11 is a community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help facilitate the change toward improved knowledge creation and sharing. Individually and collectively, we aim to bring about a change in modern scholarly communications through the effective use of information technology. Force11 has grown from a small group of like-minded individuals into an open movement with clearly identified stakeholders associated with emerging technologies, policies, funding mechanisms and business models. While not disputing the expressive power of the written word to communicate complex ideas, our foundational assumption is that scholarly communication by means of semantically enhanced media-rich digital publishing is likely to have a greater impact than communication in traditional print media or electronic facsimiles of printed works. However, to date, online versions of ‘scholarly outputs’ have tended to replicate print forms, rather than exploit the additional functionalities afforded by the digital terrain. We believe that digital publishing of enhanced papers will enable more effective scholarly communication, which will also broaden to include, for example, the publication of software tools, and research communication by means of social media channels. We see Force11 as a starting point for a community that we hope will grow and be augmented by individual and collective efforts by the participants and others. We invite you to join and contribute to this enterprise.

Force11 grew out of the FORC Workshop held in Dagstuhl, Germany in August 2011.

FORCE11 is a movement of people interested in furthering the goals stated in the FORCE11 manifesto. An important part of our work is information gathering and dissemination. We invite anyone with relevant information to provide us links which we may include on our websites. We ask anyone with similar and/or related efforts to include links to FORCE11. We are a neutral information market, and do not endorse or seek to block any relevant work.

The Tools and Resources page is particularly interesting.

Current divisions are:

  • Alternative metrics
  • Author Identification
  • Annotation
  • Authoring tools
  • Citation analysis
  • Computational Linguistics/Text Mining Efforts
  • Data citation
  • Ereaders
  • Hypothesis/claim-based representation of the rhetorical structure of a scientific paper
  • Mapping initiatives between ontologies
  • Metadata standards and ontologies
  • Modular formats for science publishing
  • Open Citations
  • Peer Review: New Models
  • Provenance
  • Publications and reports relevant to scholarly digital publication and data
  • Semantic publishing initiatives and other enriched forms of publication
  • Structured Digital Abstracts – modeling science (especially biology) as triples
  • Structured experimental methods and workflows
  • Text Extraction

Topic maps fit into communication agendas quite easily.

The first step in communication is capturing something to say.

The second step in communication is expressing what has been captured so it can be understood by others (or yourself next week).

Topic maps do both quite nicely.

I first saw this in a tweet by Anita de Waard.

What tools do you use for information gathering and publishing?

Thursday, January 24th, 2013

What tools do you use for information gathering and publishing? by Mac Slocum.

From the post:

Many apps claim to be the pinnacle of content consumption and distribution. Most are a tangle of silly names and bad interfaces, but some of these tools are useful. A few are downright empowering.

Finding those good ones is the tricky part. I queried O’Reilly colleagues to find out what they use and why, and that process offered a decent starting point. We put all our notes together into this public Hackpad — feel free to add to it. I also went through and plucked out some of the top choices. Those are posted below.

Information gathering, however humble it may be, is the start of any topic map authoring project.

Mac asks for the tools you use every week.

Let’s not disappoint him!

Intelligent Content:…

Monday, January 14th, 2013

Intelligent Content: How APIs Can Supply the Right Content to the Right Reader by Adam DuVander.

From the post:

When you buy a car, it comes with a thick manual that probably sits in your glove box for the life of the car. The experience with a new luxury car may be much different. That printed, bound manual may only contain the information relevant to your car. No leather seats, no two page spread on caring for the hide. That’s intelligent content. And it’s an opportunity for APIs to help publishers go way beyond the cookie cutter printed book. It also happens to be an exciting conference coming to San Francisco in February.

It takes effort to segment content, especially when it was originally written as one piece. There are many benefits to those that put in the effort to think of their content as a platform. Publisher Pearson did this with a number of its titles, most notably with its Pearson Eyewitness Guides API. Using the API, developers can take what was a standalone travel book–say, the Eyewitness Guide to London–and query individual locations. One can imagine travel apps using the content to display great restaurants or landmarks that are nearby, for example.

Traditional publishing is a market that is ripe for disruption, characterized by Berkeley professor Robert Glushko co-creating a new approach to academic textbooks with his students in the Future of E-books. Glushko is one of the speakers at the Intelligent Content Conference, which will bring together content creators, technologists and publishers to discuss the many opportunities. Also speaking is Netflix’s Daniel Jacobson, who architected a large redesign of the Netflix API in order to support hundreds of devices. And yes, I will discuss the opportunities for content-as-a-service via APIs.

ProgrammableWeb readers can still get in on the early bird discount to attend Intelligent Content, which takes place February 7-8 in San Francisco.

San Francisco in February sounds like a good idea. Particularly if the future of publishing is on the agenda.

I would observe that “intelligent content” implies that someone, that is a person, has both authored the content and designed the API. It doesn’t happen auto-magically.

And with people involved, our old friend semantic diversity is going to be in the midst of the discussions, proposals and projects.

Reliable collation of data from different publishers (universities with multiple subscriptions should be pushing for this now) could make access seamless to end users.