## Archive for the ‘Publishing’ Category

### Digital Data Repositories in Chemistry…

Wednesday, July 1st, 2015

Abtract:

We discuss the concept of recasting the data-rich scientific journal article into two components, a narrative and separate data components, each of which is assigned a persistent digital object identifier. Doing so allows each of these components to exist in an environment optimized for purpose. We make use of a poorly-known feature of the handle system for assigning persistent identifiers that allows an individual data file from a larger file set to be retrieved according to its file name or its MIME type. The data objects allow facile visualization and retrieval for reuse of the data and facilitates other operations such as data mining. Examples from five recently published articles illustrate these concepts.

A very promising effort to integrate published content and electronic notebooks in chemistry. Encouraging that in addition to the technical and identity issues the authors also point out the lack of incentives for the extra work required to achieve useful integration.

Everyone agrees that deeper integration of resources in the sciences will be a game-changer but renewing the realization that there is no such thing as a free lunch, is an important step towards that goal.

This article easily repays a close read with interesting subject identity issues and the potential that topic maps would offer to such an effort.

### The peer review drugs don’t work [Faith Based Science]

Sunday, May 31st, 2015

The peer review drugs don’t work by Richard Smith.

From the post:

It is paradoxical and ironic that peer review, a process at the heart of science, is based on faith not evidence.

There is evidence on peer review, but few scientists and scientific editors seem to know of it – and what it shows is that the process has little if any benefit and lots of flaws.

Peer review is supposed to be the quality assurance system for science, weeding out the scientifically unreliable and reassuring readers of journals that they can trust what they are reading. In reality, however, it is ineffective, largely a lottery, anti-innovatory, slow, expensive, wasteful of scientific time, inefficient, easily abused, prone to bias, unable to detect fraud and irrelevant.

As Drummond Rennie, the founder of the annual International Congress on Peer Review and Biomedical Publication, says, “If peer review was a drug it would never be allowed onto the market.”

Cochrane reviews, which gather systematically all available evidence, are the highest form of scientific evidence. A 2007 Cochrane review of peer review for journals concludes: “At present, little empirical evidence is available to support the use of editorial peer review as a mechanism to ensure quality of biomedical research.”

We can see before our eyes that peer review doesn’t work because most of what is published in scientific journals is plain wrong. The most cited paper in Plos Medicine, which was written by Stanford University’s John Ioannidis, shows that most published research findings are false. Studies by Ioannidis and others find that studies published in “top journals” are the most likely to be inaccurate. This is initially surprising, but it is to be expected as the “top journals” select studies that are new and sexy rather than reliable. A series published in The Lancet in 2014 has shown that 85 per cent of medical research is wasted because of poor methods, bias and poor quality control. A study in Nature showed that more than 85 per cent of preclinical studies could not be replicated, the acid test in science.

I used to be the editor of the BMJ, and we conducted our own research into peer review. In one study we inserted eight errors into a 600 word paper and sent it 300 reviewers. None of them spotted more than five errors, and a fifth didn’t detect any. The median number spotted was two. These studies have been repeated many times with the same result. Other studies have shown that if reviewers are asked whether a study should be published there is little more agreement than would be expected by chance.

As you might expect, the humanities are lagging far behind the sciences in acknowledging that peer review is an exercise in social status rather than quality:

One of the changes I want to highlight is the way that “peer review” has evolved fairly quietly during the expansion of digital scholarship and pedagogy. Even though some scholars, such as Kathleen Fitzpatrick, are addressing the need for new models of peer review, recognition of the ways that this process has already been transformed in the digital realm remains limited. The 2010 Center for Studies in Higher Education (hereafter cited as Berkeley Report) comments astutely on the conventional role of peer review in the academy:

Among the reasons peer review persists to such a degree in the academy is that, when tied to the venue of a publication, it is an efficient indicator of the quality, relevance, and likely impact of a piece of scholarship. Peer review strongly influences reputation and opportunities. (Harley, et al 21)

These observations, like many of those presented in this document, contain considerable wisdom. Nevertheless, our understanding of peer review could use some reconsideration in light of the distinctive qualities and conditions associated with digital humanities.
…(Living in a Digital World: Rethinking Peer Review, Collaboration, and Open Access by Sheila Cavanagh.)

Can you think of another area where something akin to peer review is being touted?

What about internal guidelines of the CIA, NSA, FBI and secret courts reviewing actions by those agencies?

How do those differ from peer review, which is an acknowledged failure in science and should be acknowledged in the humanities?

They are quite similar in the sense that some secret group is empowered to make decisions that impact others and members of those groups, don’t want to relinquish those powers. Surprise, surprise.

Peer review should be scrapped across the board and replaced by tracked replication and use by others, both in the sciences and the humanities.

Government decisions should be open to review by all its citizens and not just a privileged few.

### How journals could “add value”

Thursday, May 28th, 2015

How journals could “add value” by Mark Watson.

From the post:

I wrote a piece for Genome Biology, you may have read it, about open science. I said a lot of things in there, but one thing I want to focus on is how journals could “add value”. As brief background: I think if you’re going to make money from academic publishing (and I have no problem if that’s what you want to do), then I think you should “add value”. Open science and open access is coming: open access journals are increasingly popular (and cheap!), preprint servers are more popular, green and gold open access policies are being implemented etc etc. Essentially, people are going to stop paying to access research articles pretty soon – think 5-10 year time frame.

So what can journals do to “add value”? What can they do that will make us want to pay to access them? Here are a few ideas, most of which focus on going beyond the PDF:

Humanities journals and their authors should take heed of these suggestions.

Not applicable in every case but certainly better than “journal editorial board as resume padding.”

### Companion to “Functional Programming in Scala”

Sunday, February 22nd, 2015

A companion booklet to “Functional Programming in Scala” by Rúnar Óli Bjarnason.

From the webpage:

This full colour syntax-highlighted booklet comprises all the chapter notes, hints, solutions to exercises, addenda, and errata for the book “Functional Programming in Scala” by Paul Chiusano and Runar Bjarnason. This material is freely available online, but is compiled here as a convenient companion to the book itself.

If you talk about supporting alternative forms of publishing, here is your chance to support an alternative form of publishing, financially.

Authors are going to gravitate to models that sustain their ability to write.

It is up to you what model that will be.

### The Many Faces of Science (the journal)

Saturday, February 21st, 2015

Andy Dalby tells a chilling tale in Why I will never trust Science again.

You need to read the full account but as a quick summary, Andy submits a paper to Science that is rejected and within weeks finds that Science accepted another paper, a deeply flawed one, reaching the same conclusion and when he notified Science, it was suggested he post an online comment. Andy’s account has quotes, links to references, etc.

That is one face of Science, secretive, arbitrary and restricted peer review of submissions. I say “restricted peer” because Science has a tiny number of reviewers, compared to your peers, who review submissions. If you want “peer review,” you should publish with an open source journal that enlists all of your peers as reviewers, not just a few.

There is another face of Science, which appeared last December without any trace of irony at all:

Does journal peer review miss best and brightest? by David Shultz, which reads in part:

Sometimes greatness is hard to spot. Before going on to lead the Chicago Bulls to six NBA championships, Michael Jordan was famously cut from his high school basketball team. Scientists often face rejection of their own—in their case, the gatekeepers aren’t high school coaches, but journal editors and peers they select to review submitted papers. A study published today indicates that this system does a reasonable job of predicting the eventual interest in most papers, but it may shoot an air ball when it comes to identifying really game-changing research.

There is a serious chink in the armor, though: All 14 of the most highly cited papers in the study were rejected by the three elite journals, and 12 of those were bounced before they could reach peer review. The finding suggests that unconventional research that falls outside the established lines of thought may be more prone to rejection from top journals, Siler says.

Science publishes research showing its methods are flawed and yet it takes no notice. Perhaps its rejection of Andy’s paper isn’t so strange. It must have not traveled far enough down the stairs.

I first saw Andy’s paper in a tweet by Mick Watson.

### Harry Potter eBooks

Sunday, February 1st, 2015

All the Harry Potter ebooks are now on subscription site Oyster by Laura Hazard Owen.

Laura reports the Harry Potter books are available on Oyster and Amazon. She says that Oyster has the spin-off titles from the original series where Amazon does not.

It is hard to imagine what sort of life-time access for everyone on Earth could be secured for less than $1 trillion. No more special pricing and contracts if you are in countries A to Zed. Eliminate all that paperwork for publishers and to access all you need is a connection to the Internet. The publishers would have a guaranteed income stream, less overhead from sales personnel, administrative staff, etc. And people would have access (whether used or not) to educate themselves, to make new discoveries, etc. My proposal does not involve payments to large military contractors or subversion of legitimate governments or imposition of American values on other cultures. Leaving those drawbacks to one side, what do you think about it otherwise? ### The Data Scientist Thursday, January 1st, 2015 The Data Scientist Kurt Kagel has setup a newspaper on Data Science and Computational Linguistics with the following editor’s note: I have been covering the electronic information space for more than thirty years, as writer, editor, programmer and information architect. This paper represents an experiment, a venue to explore Data Science and Computational Linguistics, as well as the world of IT in general. I’m still working out bugs and getting a feel for the platform, so look and feel (and content) will almost certainly change. If you are interested in featuring articles here, please contact me. It is based on paper.li, which automatically loads content into your newspaper. Not to mention you being able to load content as well. I have known Kurt for a number of years in the markup world and look forward to seeing how this newspaper develops. ### Why my book can be downloaded for free Saturday, December 6th, 2014 Why my book can be downloaded for free by Mark Dominus. From the post: People are frequently surprised that my book, Higher-Order Perl, is available as a free download from my web site. They ask if it spoiled my sales, or if it was hard to convince the publisher. No and no. I sent the HOP proposal to five publishers, expecting that two or three would turn it down, and that I would pick from the remaining two or three, but somewhat to my dismay, all five offered to publish it, and I had to decide who. One of the five publishers was Morgan Kaufmann. I had never heard of Morgan Kaufmann, but one day around 2002 I was reading the web site of Philip Greenspun. Greenspun was incredibly grouchy. He found fault with everything. But he had nothing but praise for Morgan Kaufmann. I thought that if Morgan Kaufmann had pleased Greenspun, who was nearly impossible to please, then they must be really good, so I sent them the proposal. (They eventually published the book, and did a superb job; I have never regretted choosing them.) But not only Morgan Kaufmann but four other publishers had offered to publish the book. So I asked a number of people for advice. I happened to be in London one week and Greenspun was giving a talk there, which I went to see. After the talk I introduced myself and asked for his advice about picking the publisher. Access to “free” electronic versions is on its way to becoming a norm, at least with some computer science publishers. Cambridge University Press, CUP, with Data Mining and Analysis: Fundamental Concepts and Algorithms and Basic Category Theory comes to mind. Other publishers with similar policies? Yes, I know there are CS publishers who want to make free with content of others, not so much with their own. Not the same thing. I first saw this in a tweet by Julia Evans. ### Nature makes all articles free to view [pay-to-say] Tuesday, December 2nd, 2014 Nature makes all articles free to view by Richard Van Noorden. From the post: All research papers from Nature will be made free to read in a proprietary screen-view format that can be annotated but not copied, printed or downloaded, the journal’s publisher Macmillan announced on 2 December. The content-sharing policy, which also applies to 48 other journals in Macmillan’s Nature Publishing Group (NPG) division, including Nature Genetics, Nature Medicine and Nature Physics, marks an attempt to let scientists freely read and share articles while preserving NPG’s primary source of income — the subscription fees libraries and individuals pay to gain access to articles. ReadCube, a software platform similar to Apple’s iTunes, will be used to host and display read-only versions of the articles’ PDFs. If the initiative becomes popular, it may also boost the prospects of the ReadCube platform, in which Macmillan has a majority investment. Annette Thomas, chief executive of Macmillan Science and Education, says that under the policy, subscribers can share any paper they have access to through a link to a read-only version of the paper’s PDF that can be viewed through a web browser. For institutional subscribers, that means every paper dating back to the journal’s foundation in 1869, while personal subscribers get access from 1997 on. Anyone can subsequently repost and share this link. Around 100 media outlets and blogs will also be able to share links to read-only PDFs. Although the screen-view PDF cannot be printed, it can be annotated — which the publisher says will provide a way for scientists to collaborate by sharing their comments on manuscripts. PDF articles can also be saved to a free desktop version of ReadCube, similarly to how music files can be saved in iTunes. I am hopeful that Macmillan will discover that allowing copying and printing are no threat to its income stream. Both are means of advertising for its journal at the expense of the user who copies a portion of the text for a citation or shares a printed copy with a colleague. Advertising paid for by users should be considered as a plus. The annotation step is a good one, although I would modify it in some respects. First I would make all articles accessible by default with annotation capabilities. Then I would grant anyone who registers say 12 comments per year for free and offer a lower-than-subscription-cost option for more than twelve comments on articles. If there is one thing I suspect users would be willing to pay for is the right to response to others in their fields. Either to response to articles and/or to other comments. Think of it as a pay-to-say market strategy. It could be an “additional” option to current institutional and personal subscriptions and thus an entirely new revenue stream for Macmillan. To head off expected objections by “free speech” advocates, I note that no journal publishes every letter to the editor. The right to free speech has never included the right to be heard on someone else’s dime. Annotation of Nature is on Macmillan’s dime. ### Basic Category Theory (Publish With CUP) Monday, July 28th, 2014 Basic Category Theory by Tom Leinster. From the webpage: Basic Category Theory is an introductory category theory textbook. Features: • It doesn’t assume much, either in terms of background or mathematical maturity. • It sticks to the basics. • It’s short. Advanced topics are omitted, leaving more space for careful explanations of the core concepts. I used earlier versions of the text to teach master’s-level courses at the University of Glasgow. The book is published by Cambridge University Press. You can find all the publication data, and buy it, at the book’s CUP web page. It was published on 24 July 2014 in hardback and e-book formats. The physical book should be in stock throughout Europe now, and worldwide by mid-September. Wherever you are, you can (pre)order it now from CUP or the usual online stores. By arrangement with CUP, a free online version will be released in January 2016. This will be not only freely downloadable but also freely editable, under a Creative Commons licence. So, for instance, if parts of the book are unsuitable for the course you’re teaching, or if you don’t like the notation, you can change it. More details will appear here when the time comes. Freely available as etext (6 months after hard copy release) and freely editable? Show of hands. How many publishers have you seen with those policies? I keep coming up with one, Cambridge University Press, CUP. As readers and authors we need to vote with our feet. Purchase from and publish with Cambridge University Press. It may take a while but other publishers may finally notice. ### TeX Live 2014 released… Thursday, June 19th, 2014 TeX Live 2014 released – what’s new by Stefan Kottwitz. Just enough to get you interested: • TeX and MetaFont updates • pdfTeX with “fake spaces” • LuaTeX, engine that can reside in CPU cache • numerous other changes and improvements Stefan covers these and more, while pointing you to the documentation for more details. Has anyone calculated how many decades TeX/LaTeX are ahead of the average word processor? Just curious. ### GitBook:… Tuesday, June 3rd, 2014 GitBook: Write Books using Markdown on OpenShift by Marek Jelen. From the post: GitBook is a tool for using Markdown to write books, which are converted to dynamic websites or exported to static formats like PDF. GitBook also integrates with Git and GitHub, adding a social element to the book creation process. If you are exporting your book into an HTML page, interactive aspects are also embedable. At the time of this writing, the system provides support for quizzes and JavaScript exercises. However, the tool is fully open source and written using Node.js, so you are free to extend the functionality to meet your needs. The Gitbook Learn Javascript is used as an example of production with GitBook. It’s readable but in terms of the publishing craft, the Mikraot Gedolot or The Art of Computer Programming (TAOCP), it’s not. Still, it may be useful for one-off exports from topic maps and other data sources. ### Madagascar Tuesday, May 20th, 2014 Madagascar From the webpage: Madagascar is an open-source software package for multidimensional data analysis and reproducible computational experiments. Its mission is to provide • a convenient and powerful environment • a convenient technology transfer tool for researchers working with digital image and data processing in geophysics and related fields. Technology developed using the Madagascar project management system is transferred in the form of recorded processing histories, which become “computational recipes” to be verified, exchanged, and modified by users of the system. Interesting tool for “reproducible documents” and data analysis. The file format, Regularly Sampled Format (RSF) sounds interesting: For data, Madagascar uses the Regularly Sampled Format (RSF), which is based on the concept of hypercubes (n-D arrays, or regularly sampled functions of several variables), much like the SEPlib (its closest relative), DDS, or the regularly-sampled version of the Javaseis format (SVF). Up to 9 dimensions are supported. For 1D it is conceptually analogous to a time series, for 2D to a raster image, and for 3D to a voxel volume. The format (actually a metaformat) makes use of a ASCII file with metadata (information about the data), including a pointer (in= parameter) to the location of the file with the actual data values. Irregularly sampled data are currently handled as a pair of datasets, one containing data and the second containing the corresponding irregular geometry information. Programs for conversion to and from other formats such as SEG-Y and SU are provided. (From Package Overview) In case you are interested SEG-Y and SU (Seismic Unix data format) are both formats for geophysical data. I first saw this in a tweet by Scientific Python. ### Thanks for Unguling Sunday, May 4th, 2014 Thanks-for-Ungluing launches! From the post: Great books deserve to be read by all of us, and we ought to be supporting the people who create these books. “Thanks for Ungluing” gives readers, authors, libraries and publishers a new way to build, sustain, and nourish the books we love. “Thanks for Ungluing” books are Creative Commons licensed and free to download. You don’t need to register or anything. But when you download, the creators can ask for your support. You can pay what you want. You can just scroll down and download the book. But when that book has become your friend, your advisor, your confidante, you’ll probably want to show your support and tell all your friends. We have some amazing creators participating in this launch. An attempt to address the problem of open access to published materials while at the same time compensating authors for their efforts. There is some recent material and old standbys like The Communist Manifesto by Karl Marx and Friedrich Engels. Which is good but having more recent works such as A Theology of Liberation by Gustavo Gutiérrez would be better. If you are thinking about writing a book on CS topics, please think about “Thanks for Ungluing” as an option. I first saw this in a tweet by Tim O’Reilly. ### Innovations in peer review:… Tuesday, April 22nd, 2014 Innovations in peer review: join a discussion with our Editors by Shreeya Nanda. From the post: Innovation may not be an adjective often associated with peer review, indeed commentators have claimed that peer review slows innovation and creativity in science. Preconceptions aside, publishers are attempting to shake things up a little, with various innovations in peer review, and these are the focus of a panel discussion at BioMed Central’s Editors’ Conference on Wednesday 23 April in Doha, Qatar. This follows our spirited discussion at the Experimental Biology conference in Boston last year. The discussion last year focussed on the limitations of the traditional peer review model (you can see a video here). This year we want to talk about innovations in the field and the ways in which the limitations are being addressed. Specifically, we will focus on open peer review, portable peer review – in which we help authors transfer their manuscript, often with reviewers’ reports, to a more appropriate journal – and decoupled peer review, which is undertaken by a company or organisation independent of, or on contract from, a journal. We will be live tweeting from the session at 11.15am local time (9.15am BST), so if you want to join the discussion or put questions to our panellists, please follow #BMCEds14. If you want to brush up on any or all of the models that we’ll be discussing, have a look at some of the content from around BioMed Central’s journals, blogs and Biome below: This post includes pointers to a number of useful resources concerning the debate around peer review. But there are oddities as well. First, the claim that peer review “slows innovation and creativity in science,” considering recent reports that peer review is no better than random chance for grants (…lotteries to pick NIH research-grant recipients and the not infrequent reports of false papers, fraud in actual papers, and a general inability to replicate research described in papers (Reproducible Research/(Mapping?)). A claim doesn’t have to appear on the alt.fringe.peer.review newsgroup (imaginary newsgroup) in order to be questionable on its face. Secondly, despite the invitation to follow and participate on Twitter, holding the meeting in Qartar means potential attendees from the United States will have to rise at: Eastern 4:15 AM (last year’s location) Central 3:15 AM Mountain 2:15 AM Western 1:15 AM I wonder what the participation levels will be from Boston last year as compared to Qatar this year? Nothing against non-United States locations but non-junket locations, such as major educational/research hubs, should be the sites for such meetings. ### …Textbooks for$0 [Digital Illiterates?]

Thursday, January 23rd, 2014

OpenStax College Textbooks for $0 From the about page: OpenStax College is a nonprofit organization committed to improving student access to quality learning materials. Our free textbooks are developed and peer-reviewed by educators to ensure they are readable, accurate, and meet the scope and sequence requirements of your course. Through our partnerships with companies and foundations committed to reducing costs for students, OpenStax College is working to improve access to higher education for all. OpenStax College is an initiative of Rice University and is made possible through the generous support of several philanthropic foundations. … Available now: • Anatomy and Physiology • Biology • College Physics • Concepts of Biology • Introduction to Sociology • Introductory Statistics Coming soon: • Chemistry • Precalculus • Principles of Economics • Principles of Macroeconomics • Principles of Microeconomics • Psychology • U.S. History Check to see if I missed any present or forthcoming texts on data science. No, I didn’t see any either. I looked at the Introduction to Sociology, which has a chapter on research methods, but no opportunity for students to experience data methods. Such as Statwing’s coverage of the General Social Survey (GSS), which I covered in Social Science Dataset Prize! Data science should not be an aside or extra course any more than language literacy is a requirement for an education. Consider writing or suggesting edits to subject textbooks to incorporate data science. Solely data science books will be necessary as well, just like there are advanced courses in English Literature. Let’s not graduate digital illiterates. For their sake and ours. I first saw this in a tweet by Michael Peter Edson. ### Composable languages for bioinformatics: the NYoSh experiment Wednesday, January 22nd, 2014 Composable languages for bioinformatics: the NYoSh experiment by Manuele Simi, Fabien Campagne​. (Simi M, Campagne F. (2014) Composable languages for bioinformatics: the NYoSh experiment. PeerJ 2:e241 http://dx.doi.org/10.7717/peerj.241) Abstract: Language WorkBenches (LWBs) are software engineering tools that help domain experts develop solutions to various classes of problems. Some of these tools focus on non-technical users and provide languages to help organize knowledge while other workbenches provide means to create new programming languages. A key advantage of language workbenches is that they support the seamless composition of independently developed languages. This capability is useful when developing programs that can benefit from different levels of abstraction. We reasoned that language workbenches could be useful to develop bioinformatics software solutions. In order to evaluate the potential of language workbenches in bioinformatics, we tested a prominent workbench by developing an alternative to shell scripting. To illustrate what LWBs and Language Composition can bring to bioinformatics, we report on our design and development of NYoSh (Not Your ordinary Shell). NYoSh was implemented as a collection of languages that can be composed to write programs as expressive and concise as shell scripts. This manuscript offers a concrete illustration of the advantages and current minor drawbacks of using the MPS LWB. For instance, we found that we could implement an environment-aware editor for NYoSh that can assist the programmers when developing scripts for specific execution environments. This editor further provides semantic error detection and can be compiled interactively with an automatic build and deployment system. In contrast to shell scripts, NYoSh scripts can be written in a modern development environment, supporting context dependent intentions and can be extended seamlessly by end-users with new abstractions and language constructs. We further illustrate language extension and composition with LWBs by presenting a tight integration of NYoSh scripts with the GobyWeb system. The NYoSh Workbench prototype, which implements a fully featured integrated development environment for NYoSh is distributed at http://nyosh.campagnelab.org. In the discussion section of the paper the authors concede: We expect that widespread use of LWB will result in a multiplication of small languages, but in a manner that will increase language reuse and interoperability, rather than in the historical language fragmentation that has been observed with traditional language technology. Whenever I hear projections about the development of languages I am reminded the inventors of “SCSI” thought it should be pronounced “sexy,” whereas others preferred “scuzzi.” Doesn’t have the same ring to it does it? I am all in favor of domain specific languages (DSLs), but at the same time, am mindful that undocumented languages are in danger of becoming “dead” languages. ### Pay the Man! Saturday, January 18th, 2014 Books go online for free in Norway by Martin Chilton. From the post: More than 135,000 books still in copyright are going online for free in Norway after an innovative scheme by the National Library ensured that publishers and authors are paid for the project. The copyright-protected books (including translations of foreign books) have to be published before 2000 and the digitising has to be done with the consent of the copyright holders. National Library of Norway chief Vigdis Moe Skarstein said the project is the first of its kind to offer free online access to books still under copyright, which in Norway expires 70 years after the author’s death. Books by Stephen King, Ken Follett, John Steinbeck, Jo Nesbø, Karin Fossum and Nobel Laureate Knut Hamsun are among those in the scheme. The National Library has signed an agreement with Kopinor, an umbrella group representing major authors and publishers through 22 member organisations, and for every digitised page that goes online, the library pays a predetermined sum to Kopinor, which will be responsible for distributing the royalties among its members. The per-page amount was 0.36 Norwegian kroner (four pence), which will decrease to three pence when the online collection reaches its estimated target of 250,000 books. Norway has discovered a way out of the copyright conundrum, pay the man! Can you imagine the impact if the United States were to bulk license all of the Springer publications in digital format? Some immediate consequences: 1. All citizen-innovators would have access to a vast library of high quality content, without restriction by place of employment or academic status. 2. Taking over the cost of Springer materials would act as a additional funding for libraries with existing subscriptions. 3. It would even out access to Springer materials across the educational system in the U.S. 4. It would reduce the administrative burden on both libraries and Springer by consolidating all existing accounts into one account. 5. Springer could offer “advanced” services in addition to basic search and content for additional fees, leveraged on top of the standard content. 6. Other vendors could offer “advanced” services for fees leveraged on top of standard content. I have nothing against the many “open access” journals but bear in mind the vast legacy of science and technology that remains the property of Springer and others. The principal advantage that I would pitch to Springer would be the availability of its content under bulk licensing would result in other vendors building services on top of that content. What advantage is there for Springer? Imagine that you can be either a road (content) or a convenience store (app. built on content) next to the road. Which one gets maintained longer? Everybody has an interest in maintaining and even expanding the road. By becoming part of the intellectual infrastructure of education, industry and government, even more than it is now, Springer would secure a very stable and lucrative future. Put that way, I would much rather be the road than the convenience store. You? ### SCOAP3 Saturday, August 24th, 2013 SCOAP3 I didn’t recognize the acronym either. From the “about” page: The Open Access (OA) tenets of granting unrestricted access to the results of publicly-funded research are in contrast with current models of scientific publishing, where access is restricted to journal customers. At the same time, subscription costs increase and put considerable strain on libraries, forcing them to cancel an increasing number of journals subscriptions. This situation is particularly acute in fields like High-Energy Physics (HEP), where pre-prints describing scientific results are timely available online. There is a growing concern within the academic community that the future of high-quality journals, and the peer-review system they administer, is at risk. To address this situation for HEP and, as an experiment, Science at large, a new model for OA publishing has emerged: SCOAP3 (Sponsoring Consortium for Open Access Publishing in Particle Physics). In this model, HEP funding agencies and libraries, which today purchase journal subscriptions to implicitly support the peer-review service, federate to explicitly cover its cost, while publishers make the electronic versions of their journals free to read. Authors are not directly charged to publish their articles OA. SCOAP3 will, for the first time, link quality and price, stimulating competition and enabling considerable medium- and long-term savings. Today, most publishers quote a price in the range of 1’000–2’000 Euros per published article. On this basis, we estimate that the annual budget for the transition of HEP publishing to OA would amount to a maximum of 10 Million Euros/year, sensibly lower than the estimated global expenditure in subscription to HEP journals. Each SCOAP3 partner will finance its contribution by canceling journal subscriptions. Each country will contribute according to its share of HEP publishing. The transition to OA will be facilitated by the fact that the large majority of HEP articles are published in just six peer-reviewed journals. Of course, the SCOAP3 model is open to any, present or future, high-quality HEP journal aiming at a dynamic market with healthy competition and broader choice. HEP funding agencies and libraries are currently signing Expressions of Interest for the financial backing of the consortium. A tendering procedure will then take place. Provided that SCOAP3 funding partners are prepared to engage in long-term commitments, many publishers are expected to be ready to enter into negotiations. The example of SCOAP3 could be rapidly followed by other fields, directly related to HEP, such as nuclear physics or astro-particle physics, also similarly compact and organized with a reasonable number of journals. Models like this one may result in increasing the amount of information available for topic mapping and the amount of semantic diversity in traditional search results. Delivery models are changing but search interfaces leave us to our own devices at the document level. If we are going to have better access in the physical sense, shouldn’t we be working on better access in the content sense? PS: To show this movement has legs, consider the recent agreement of Elsevier, IOPp and Springer to participate. ### Information Extraction from the Internet Saturday, August 24th, 2013 Information Extraction from the Internet by Nan Tang. From the description at Amazon ($116.22):

As the Internet continues to become part of our lives, there now exists an overabundance of reliable information sources on this medium. The temporal and cognitive resources of human beings, however, do not change. “Information Extraction from the Internet” provides methods and tools for Web information extraction and retrieval. Success in this area will greatly enhance business processes and provide information seekers new tools that allow them to reduce their searching time and cost involvement. This book focuses on the latest approaches for Web content extraction, and analyzes the limitations of existing technology and solutions. “Information Extraction from the Internet” includes several interesting and popular topics that are being widely discussed in the area of information extraction: data spasity and field-associated knowledge (Chapters 1–2), Web agent design and mining components (Chapters 3–4), extraction skills on various documents (Chapters 5–7), duplicate detection for music documents (Chapter 8), name disambiguation in digital libraries using Web information (Chapter 9), Web personalization and user-behavior issues (Chapters 10–11), and information retrieval case studies (Chapters 12–14). “Information Extraction from the Internet” is suitable for advanced undergraduate students and postgraduate students. It takes a practical approach rather than a conceptual approach. Moreover, it offers a truly reader-friendly way to get to the subject related to information extraction, making it the ideal resource for any student new to this subject, and providing a definitive guide to anyone in this vibrant and evolving discipline. This book is an invaluable companion for students, from their first encounter with the subject to more advanced studies, while the full-color artworks are designed to present the key concepts with simplicity, clarity, and consistency.

I discovered this volume while searching for the publisher of: On-demand Synonym Extraction Using Suffix Arrays.

As you can see from the description, a wide ranging coverage of information extraction interests.

iConcepts Press has a number of books and periodicals you may find interesting.

### Semantic Search… [Call for Papers]

Saturday, August 3rd, 2013

From the post:

I am currently drafting the Call for Papers for a special issue of the Aslib Journal of Information Management (formerly Aslib Proceedings) which I am guest editing alongside Dr Ulrike Spree from the University of Hamburg.

Ulrike is the academic expert, while I am providing the practitioner perspective. I am very keen to include practical case studies, so if you have an interesting project or comments on a project but have never written an academic paper before, don’t be put off. I will be happy to advise on style, referencing, etc.

## Suggested Topics

Themes Ulrike is interested in include:

• current trends in semantic search
• best practice – how far along the road from ‘early adopters’ to ‘mainstream users’ has semantic search gone so far
• usability of semantic search
• visualisation and semantic search
• the relationship between new trends in knowledge organisation and semantic search, such as vocabulary norms (like ISO 25964 “Thesauri for information retrieval“) and the potential of semantic search from a more critical perspective – what, for example, are the criteria for judging quality?

Themes I am interested in include:

• the history of semantic search – how the latest techniques and technologies have come out of developments over the last 5, 10, 20, 100, 2000… years
• how semantic search techniques and technologies are being used in practice
• how semantic technologies are fostering a need for cross-industry collaboration and standardization
• practical problems in brokering consensus and agreement – defining terms and classes, etc.
• differences between web-scale, enterprise scale, and collection-specific scale techniques
• curation and management of ontologies.

However, we are open to suggestions, especially as it is such a broad topic, there are so many aspects that could be covered.

Fran doesn’t mention a deadline but I will ask and update here when I get it.

Sounds like a venue that would welcome papers on topic maps.

Yes?

### Proceedings of the 3rd Workshop on Semantic Publishing

Sunday, July 7th, 2013

Proceedings of the 3rd Workshop on Semantic Publishing edited by: Alexander García Castro, Christoph Lange, Phillip Lord, and Robert Stevens.

Research Papers

1. Twenty-Five Shades of Greycite: Semantics for Referencing and Preservation Phillip Lord
2. Systematic Reviews as an Interface to the Web of (Trial) Data: using PICO as an Ontology for Knowledge Synthesis in Evidence-based Healthcare Research Chris Mavergames
3. Towards Linked Research Data: an Institutional Approach Najko JahnFlorian Lier, Thilo Paul-Stueve, Christian Pietsch, Philipp Cimiano
4. Repurposing Benchmark Corpora for Reconstructing Provenance Sara Magliacane.
5. Connections across Scientific Publications based on Semantic Annotations Leyla Jael García Castro, Rafael Berlanga, Dietrich Rebholz-Schuhmann, Alexander Garcia.
6. Towards the Automatic Identification of the Nature of Citations Angelo Di Iorio, Andrea Giovanni Nuzzolese, Silvio Peroni.
7. How Reliable is Your Workflow: Monitoring Decay in Scholarly Publications José Manuel Gómez-Pérez, Esteban García-Cuesta, Jun Zhao, Aleix Garrido, José Enrique Ruiz.

Polemics (published externally)

1. Flash Mob Science, Open Innovation and Semantic Publishing Hal Warren, Bryan Dennis, Eva Winer.
2. Science, Semantic Web and Excuses Idafen Santana Pérez, Daniel Garijo, Oscar Corcho.
3. Polemic on Future of Scholarly Publishing/Semantic Publishing Chris Mavergames.

### Annual update released for TeX Live (2013)

Monday, June 24th, 2013

Annual update released for TeX Live

From the post:

The developers of the TeX Live distribution of LaTeX have released their annual update. However, after 17 years of development, the changes in TeX Live 2013 mostly amount to technical details.

The texmf/ directory, for example, has been merged into texmf-dist/, while the TEXMFMAIN and TEXMFDIST Kpathsea variables now point to texmf-dist. The developers have also merged several language collections for easier installation. Users will find native support for PNG output and floating-point numbers in MetaPost. LuaTeX now uses version 5.2 of Lua and includes a new library (pdfscanner) for processing external PDF data, and xdvi now uses freetype instead of t1lib for rendering.

Several updates have been made to XeTeX: HarfBuzz is now used instead of ICU for font layout and has been combined with Graphite2 to replace SilGraphite for Graphite layout; support has also been improved for OpenType.

TeX Live 2013 is open source software, licensed under a combination of the LaTeX Project Public License (LPPL) and a number of other licences. The software works on all of the major operating systems, although the program no longer runs on AIX systems using PowerPCs. Mac OS X users may want to take a look at MacTeX, which is based on – and has been updated in line with – TeX Live.

No major changes but we should be grateful for the effort that resulted in this release.

### Journal of Data Mining & Digital Humanities

Monday, May 27th, 2013

Journal of Data Mining & Digital Humanities

From the webpage:

Data mining, an interdisciplinary subfield of computer science, involving the methods at the intersection of artificial intelligence, machine learning and database systems. The Journal of Data Mining & Digital Humanities concerned with the intersection of computing and the disciplines of the humanities, with tools provided by computing such as data visualisation, information retrieval, statistics, text mining by publishing scholarly work beyond the traditional humanities.

The journal includes a wide range of fields in its discipline to create a platform for the authors to make their contribution towards the journal and the editorial office promises a peer review process for the submitted manuscripts for the quality of publishing.

Journal of Data Mining & Digital Humanities is an Open Access journal and aims to publish most complete and reliable source of information on the discoveries and current developments in the mode of original articles, review articles, case reports, short communications, etc. in all areas of the field and making them freely available through online without any restrictions or any other subscriptions to researchers worldwide.

The journal is using Editorial Tracking System for quality in review process. Editorial Tracking is an online manuscript submission, review and tracking systems. Review processing is performed by the editorial board members of Journal of Data Mining & Digital Humanities or outside experts; at least two independent reviewers approval followed by editor approval is required for acceptance of any citable manuscript. Authors may submit manuscripts and track their progress through the system, hopefully to publication. Reviewers can download manuscripts and submit their opinions to the editor. Editors can manage the whole submission/review/revise/publish process.

KDNuggets reports the first issue of JDMDH will appear in August, 2013. Deadline for submissions for the first issue: 25 June 2013.

A great venue for topic map focused papers. (When you are not writing for the Economist.)

### New York Times – Article Search API v. 2

Sunday, May 5th, 2013

New York Times – Article Search API v. 2

From the documentation page:

With the Article Search API, you can search New York Times articles from Sept. 18, 1851 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata.

The prior Article Search API described itself as:

With the Article Search API, you can search New York Times articles from 1981 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata.

An addition of one hundred and eighty years of content for searching. No bad for a v. 2 release.

On cursory review, the API does appear to have changed significantly.

For example, the default fields for each request in version 1.0 were body, byline, date, title, url.

In version 2.0, the default fields returned are: web_url, snippet, lead_paragraph, abstract, print_page, blog, source, multimedia, headline, keywords, pub_date, document_type, news_desk, byline, type_of_material, _id, and word_count.

Five default fields for version 1.0 versus seventeen for version 2.0.

There are changes in terminology that will make discovering all the changes from version 1.0 to version 2.0 non-trivial.

Two fields that were present in version 1.0 that I don’t see (under another name?) in version 2.0 are:

dbpedia_resource:

DBpedia person names mapped to Times per_facet terms. This field is case sensitive: values must be Mixed Case.

The Times per_facet is often more comprehensive than dbpedia_resource, but the DBpedia name is easier to use with other data sources. For more information about linked open data, see data.nytimes.com.

dbpedia_resource_url:

URLs to DBpedia person names that have been mapped to Times per_facet terms. This field is case sensitive: values must be Mixed Case.

More documentation is promised, which I hope includes a mapping from version 1.0 to version 2.0.

Certainly looks like the basis for annotating content in the New York Times archives as part of a topic map.

Where users input their authentication details for the New York Times and/or other pay-per-view sites.

I can’t imagine anyone objecting to you helping them sell their content.

### Mathbabe, the book

Saturday, May 4th, 2013

Mathbabe, the book by Cathy O’Neil.

From the post:

Thanks to a certain friendly neighborhood mathbabe reader, I’ve created this mathbabe book, which is essentially all of my posts that I ever wrote (I think. Note sure about that.) bundled together mostly by date and stuck in a huge pdf. It comes to 1,243 pages.

I did it using leanpub.com, which charges \$0.99 per person who downloads the pdf. I’m not charging anything over that, because the way I look at it, it’s already free.

Speaking of that, I can see why I’d want a copy of this stuff, since it’s the best way I can think of to have a local version of a bunch of writing I’ve done over the past couple of years, but I don’t actually see why anyone else would. So please don’t think I’m expecting you to go buy this book! Even so, more than one reader has requested this, so here it is.

And one strange thing: I don’t think it required my password on WordPress.com to do it, I just needed the url for the RSS feed. So if you want to avoid paying 99 cents, I’m pretty sure you can go to leanpub or one of its competitors and create another, identical book using that same feed.

And for that matter you can also go build your own book about anything using these tools, which is pretty cool when you think about it. Readers, please tell me if there’s a way to do this that’s open source and free.

The Mathbabe “book” would be one that I would be interested in reading. I can think of several other blogs that fall into that category.

I hesitate to use the term “book” for such a collection.

Maybe I am confusing “monograph,” which is focused on a topic, with “book,” which applies to works beyond a certain length.

I think of my postings, once you remove the dated notice materials, as potential essays or chapters in a book.

But they would need fleshing out and polishing to qualify for more formal publication.