Archive for the ‘Publishing’ Category

Basic Category Theory (Publish With CUP)

Monday, July 28th, 2014

Basic Category Theory by Tom Leinster.

From the webpage:

Basic Category Theory is an introductory category theory textbook. Features:

  • It doesn’t assume much, either in terms of background or mathematical maturity.
  • It sticks to the basics.
  • It’s short.

Advanced topics are omitted, leaving more space for careful explanations of the core concepts. I used earlier versions of the text to teach master’s-level courses at the University of Glasgow.

The book is published by Cambridge University Press. You can find all the publication data, and buy it, at the book’s CUP web page.

It was published on 24 July 2014 in hardback and e-book formats. The physical book should be in stock throughout Europe now, and worldwide by mid-September. Wherever you are, you can (pre)order it now from CUP or the usual online stores.

By arrangement with CUP, a free online version will be released in January 2016. This will be not only freely downloadable but also freely editable, under a Creative Commons licence. So, for instance, if parts of the book are unsuitable for the course you’re teaching, or if you don’t like the notation, you can change it. More details will appear here when the time comes.

Freely available as an etext (about eighteen months after the hard copy release) and freely editable?

Show of hands. How many publishers have you seen with those policies?

I keep coming up with only one: Cambridge University Press (CUP).

As readers and authors we need to vote with our feet. Purchase from and publish with Cambridge University Press.

It may take a while, but other publishers may eventually notice.

TeX Live 2014 released…

Thursday, June 19th, 2014

TeX Live 2014 released – what’s new by Stefan Kottwitz.

Just enough to get you interested:

  • TeX and MetaFont updates
  • pdfTeX with “fake spaces”
  • LuaTeX, an engine that can reside in CPU cache
  • numerous other changes and improvements

Stefan covers these and more, while pointing you to the documentation for more details.

Has anyone calculated how many decades TeX/LaTeX are ahead of the average word processor?

Just curious.

GitBook:…

Tuesday, June 3rd, 2014

GitBook: Write Books using Markdown on OpenShift by Marek Jelen.

From the post:

GitBook is a tool for using Markdown to write books, which are converted to dynamic websites or exported to static formats like PDF. GitBook also integrates with Git and GitHub, adding a social element to the book creation process.

If you are exporting your book into an HTML page, interactive aspects are also embedable. At the time of this writing, the system provides support for quizzes and JavaScript exercises. However, the tool is fully open source and written using Node.js, so you are free to extend the functionality to meet your needs.

The Learn JavaScript book is used as an example of what can be produced with GitBook.

It’s readable, but in terms of the publishing craft it is no Mikraot Gedolot or The Art of Computer Programming (TAOCP).

Still, it may be useful for one-off exports from topic maps and other data sources.

Madagascar

Tuesday, May 20th, 2014

Madagascar

From the webpage:

Madagascar is an open-source software package for multidimensional data analysis and reproducible computational experiments. Its mission is to provide

  • a convenient and powerful environment
  • a convenient technology transfer tool

for researchers working with digital image and data processing in geophysics and related fields. Technology developed using the Madagascar project management system is transferred in the form of recorded processing histories, which become “computational recipes” to be verified, exchanged, and modified by users of the system.

Interesting tool for “reproducible documents” and data analysis.

The file format, Regularly Sampled Format (RSF), sounds interesting:

For data, Madagascar uses the Regularly Sampled Format (RSF), which is based on the concept of hypercubes (n-D arrays, or regularly sampled functions of several variables), much like the SEPlib (its closest relative), DDS, or the regularly-sampled version of the Javaseis format (SVF). Up to 9 dimensions are supported. For 1D it is conceptually analogous to a time series, for 2D to a raster image, and for 3D to a voxel volume. The format (actually a metaformat) makes use of a ASCII file with metadata (information about the data), including a pointer (in= parameter) to the location of the file with the actual data values. Irregularly sampled data are currently handled as a pair of datasets, one containing data and the second containing the corresponding irregular geometry information. Programs for conversion to and from other formats such as SEG-Y and SU are provided. (From Package Overview)

In case you are interested, SEG-Y and SU (Seismic Unix data format) are both formats for geophysical data.
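If you want a feel for how light-weight the metaformat is, here is a minimal sketch in Python for pulling the metadata out of an RSF header. The key=value layout and the in= pointer come from the description above; the specific keys (n1=, n2=, …) and the file name are my assumptions based on common RSF conventions, so check them against real headers.

    import re

    def parse_rsf_header(path):
        # RSF headers are plain ASCII key=value pairs; in= points to the
        # file holding the actual data values (see the description above).
        pairs = {}
        with open(path) as f:
            for match in re.finditer(r'(\w+)=("[^"]*"|\S+)', f.read()):
                pairs[match.group(1)] = match.group(2).strip('"')
        return pairs

    header = parse_rsf_header("velocity.rsf")   # hypothetical header file
    dims = [int(header[k]) for k in sorted(header) if re.fullmatch(r"n\d", k)]
    print("data file:", header.get("in"))
    print("hypercube dimensions:", dims)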

I first saw this in a tweet by Scientific Python.

Thanks for Unguling

Sunday, May 4th, 2014

Thanks-for-Ungluing launches!

From the post:

Great books deserve to be read by all of us, and we ought to be supporting the people who create these books. “Thanks for Ungluing” gives readers, authors, libraries and publishers a new way to build, sustain, and nourish the books we love.

“Thanks for Ungluing” books are Creative Commons licensed and free to download. You don’t need to register or anything. But when you download, the creators can ask for your support. You can pay what you want. You can just scroll down and download the book. But when that book has become your friend, your advisor, your confidante, you’ll probably want to show your support and tell all your friends.

We have some amazing creators participating in this launch.

An attempt to address the problem of open access to published materials while at the same time compensating authors for their efforts.

There is some recent material and there are old standbys like The Communist Manifesto by Karl Marx and Friedrich Engels. That is good, but having more recent works such as A Theology of Liberation by Gustavo Gutiérrez would be better.

If you are thinking about writing a book on CS topics, please think about “Thanks for Ungluing” as an option.

I first saw this in a tweet by Tim O’Reilly.

Innovations in peer review:…

Tuesday, April 22nd, 2014

Innovations in peer review: join a discussion with our Editors by Shreeya Nanda.

From the post:

Innovation may not be an adjective often associated with peer review, indeed commentators have claimed that peer review slows innovation and creativity in science. Preconceptions aside, publishers are attempting to shake things up a little, with various innovations in peer review, and these are the focus of a panel discussion at BioMed Central’s Editors’ Conference on Wednesday 23 April in Doha, Qatar. This follows our spirited discussion at the Experimental Biology conference in Boston last year.

The discussion last year focussed on the limitations of the traditional peer review model (you can see a video here). This year we want to talk about innovations in the field and the ways in which the limitations are being addressed. Specifically, we will focus on open peer review, portable peer review – in which we help authors transfer their manuscript, often with reviewers’ reports, to a more appropriate journal – and decoupled peer review, which is undertaken by a company or organisation independent of, or on contract from, a journal.

We will be live tweeting from the session at 11.15am local time (9.15am BST), so if you want to join the discussion or put questions to our panellists, please follow #BMCEds14. If you want to brush up on any or all of the models that we’ll be discussing, have a look at some of the content from around BioMed Central’s journals, blogs and Biome below:

This post includes pointers to a number of useful resources concerning the debate around peer review.

But there are oddities as well. First, there is the claim that peer review “slows innovation and creativity in science,” which is odd considering recent reports that peer review is no better than random chance for grants (…lotteries to pick NIH research-grant recipients), the not infrequent reports of false papers and fraud in actual papers, and a general inability to replicate research described in papers (Reproducible Research/(Mapping?)).

A claim doesn’t have to appear on the alt.fringe.peer.review newsgroup (imaginary newsgroup) in order to be questionable on its face.

Secondly, despite the invitation to follow and participate on Twitter, holding the meeting in Qatar means potential attendees from the United States will have to rise at:

Eastern 4:15 AM (last year’s location)

Central 3:15 AM

Mountain 2:15 AM

Pacific 1:15 AM
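For what it is worth, the conversions above are easy to check with Python’s zoneinfo module (3.9+); the session date and time are taken from the post:

    from datetime import datetime
    from zoneinfo import ZoneInfo

    session = datetime(2014, 4, 23, 11, 15, tzinfo=ZoneInfo("Asia/Qatar"))
    for name, tz in [("Eastern", "America/New_York"),
                     ("Central", "America/Chicago"),
                     ("Mountain", "America/Denver"),
                     ("Pacific", "America/Los_Angeles")]:
        print(name, session.astimezone(ZoneInfo(tz)).strftime("%I:%M %p"))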

I wonder how participation levels will compare between Boston last year and Qatar this year?

Nothing against non-United States locations but non-junket locations, such as major educational/research hubs, should be the sites for such meetings.

…Textbooks for $0 [Digital Illiterates?]

Thursday, January 23rd, 2014

OpenStax College Textbooks for $0

From the about page:

OpenStax College is a nonprofit organization committed to improving student access to quality learning materials. Our free textbooks are developed and peer-reviewed by educators to ensure they are readable, accurate, and meet the scope and sequence requirements of your course. Through our partnerships with companies and foundations committed to reducing costs for students, OpenStax College is working to improve access to higher education for all.

OpenStax College is an initiative of Rice University and is made possible through the generous support of several philanthropic foundations. …

Available now:

  • Anatomy and Physiology
  • Biology
  • College Physics
  • Concepts of Biology
  • Introduction to Sociology
  • Introductory Statistics

Coming soon:

  • Chemistry
  • Precalculus
  • Principles of Economics
  • Principles of Macroeconomics
  • Principles of Microeconomics
  • Psychology
  • U.S. History

Check to see if I missed any present or forthcoming texts on data science. No, I didn’t see any either.

I looked at the Introduction to Sociology, which has a chapter on research methods but offers students no opportunity to work with data themselves, such as Statwing’s coverage of the General Social Survey (GSS), which I covered in Social Science Dataset Prize!

Data science should be no more of an aside or extra course than language literacy is; both are requirements for an education.

Consider writing or suggesting edits to subject textbooks to incorporate data science. Dedicated data science books will be necessary as well, just as there are advanced courses in English literature.

Let’s not graduate digital illiterates. For their sake and ours.

I first saw this in a tweet by Michael Peter Edson.

Composable languages for bioinformatics: the NYoSh experiment

Wednesday, January 22nd, 2014

Composable languages for bioinformatics: the NYoSh experiment by Manuele Simi and Fabien Campagne. (Simi M, Campagne F. (2014) Composable languages for bioinformatics: the NYoSh experiment. PeerJ 2:e241 http://dx.doi.org/10.7717/peerj.241)

Abstract:

Language WorkBenches (LWBs) are software engineering tools that help domain experts develop solutions to various classes of problems. Some of these tools focus on non-technical users and provide languages to help organize knowledge while other workbenches provide means to create new programming languages. A key advantage of language workbenches is that they support the seamless composition of independently developed languages. This capability is useful when developing programs that can benefit from different levels of abstraction. We reasoned that language workbenches could be useful to develop bioinformatics software solutions. In order to evaluate the potential of language workbenches in bioinformatics, we tested a prominent workbench by developing an alternative to shell scripting. To illustrate what LWBs and Language Composition can bring to bioinformatics, we report on our design and development of NYoSh (Not Your ordinary Shell). NYoSh was implemented as a collection of languages that can be composed to write programs as expressive and concise as shell scripts. This manuscript offers a concrete illustration of the advantages and current minor drawbacks of using the MPS LWB. For instance, we found that we could implement an environment-aware editor for NYoSh that can assist the programmers when developing scripts for specific execution environments. This editor further provides semantic error detection and can be compiled interactively with an automatic build and deployment system. In contrast to shell scripts, NYoSh scripts can be written in a modern development environment, supporting context dependent intentions and can be extended seamlessly by end-users with new abstractions and language constructs. We further illustrate language extension and composition with LWBs by presenting a tight integration of NYoSh scripts with the GobyWeb system. The NYoSh Workbench prototype, which implements a fully featured integrated development environment for NYoSh is distributed at http://nyosh.campagnelab.org.

In the discussion section of the paper the authors concede:

We expect that widespread use of LWB will result in a multiplication of small languages, but in a manner that will increase language reuse and interoperability, rather than in the historical language fragmentation that has been observed with traditional language technology.

Whenever I hear projections about the development of languages, I am reminded that the inventors of “SCSI” thought it should be pronounced “sexy,” whereas others preferred “scuzzi.” Doesn’t have the same ring to it, does it?

I am all in favor of domain specific languages (DSLs), but at the same time, am mindful that undocumented languages are in danger of becoming “dead” languages.

Pay the Man!

Saturday, January 18th, 2014

Books go online for free in Norway by Martin Chilton.

From the post:

More than 135,000 books still in copyright are going online for free in Norway after an innovative scheme by the National Library ensured that publishers and authors are paid for the project.

The copyright-protected books (including translations of foreign books) have to be published before 2000 and the digitising has to be done with the consent of the copyright holders.

National Library of Norway chief Vigdis Moe Skarstein said the project is the first of its kind to offer free online access to books still under copyright, which in Norway expires 70 years after the author’s death. Books by Stephen King, Ken Follett, John Steinbeck, Jo Nesbø, Karin Fossum and Nobel Laureate Knut Hamsun are among those in the scheme.

The National Library has signed an agreement with Kopinor, an umbrella group representing major authors and publishers through 22 member organisations, and for every digitised page that goes online, the library pays a predetermined sum to Kopinor, which will be responsible for distributing the royalties among its members. The per-page amount was 0.36 Norwegian kroner (four pence), which will decrease to three pence when the online collection reaches its estimated target of 250,000 books.

Norway has discovered a way out of the copyright conundrum: pay the man!
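To get a rough sense of the scale involved, here is a back-of-the-envelope calculation. The per-page rate and the 135,000 book count come from the article; the average page count is my own made-up assumption, and the article does not say whether the per-page sum is paid once or annually, so treat the result as illustrative only.

    RATE_NOK_PER_PAGE = 0.36   # from the article
    BOOKS = 135_000            # from the article
    AVG_PAGES = 250            # hypothetical average book length

    pages = BOOKS * AVG_PAGES
    print(f"Pages online: {pages:,}")
    print(f"Payment at 0.36 NOK/page: {pages * RATE_NOK_PER_PAGE:,.0f} NOK")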

Can you imagine the impact if the United States were to bulk license all of the Springer publications in digital format?

Some immediate consequences:

  1. All citizen-innovators would have access to a vast library of high quality content, without restriction by place of employment or academic status.
  2. Taking over the cost of Springer materials would act as additional funding for libraries with existing subscriptions.
  3. It would even out access to Springer materials across the educational system in the U.S.
  4. It would reduce the administrative burden on both libraries and Springer by consolidating all existing accounts into one account.
  5. Springer could offer “advanced” services in addition to basic search and content for additional fees, leveraged on top of the standard content.
  6. Other vendors could offer “advanced” services for fees leveraged on top of standard content.

I have nothing against the many “open access” journals but bear in mind the vast legacy of science and technology that remains the property of Springer and others.

The principal advantage I would pitch to Springer is that availability of its content under bulk licensing would result in other vendors building services on top of that content.

What advantage is there for Springer? Imagine that you can be either a road (content) or a convenience store (an app built on content) next to the road. Which one gets maintained longer?

Everybody has an interest in maintaining and even expanding the road. By becoming part of the intellectual infrastructure of education, industry and government, even more than it is now, Springer would secure a very stable and lucrative future.

Put that way, I would much rather be the road than the convenience store.

You?

SCOAP3

Saturday, August 24th, 2013

SCOAP3

I didn’t recognize the acronym either. ;-)

From the “about” page:

The Open Access (OA) tenets of granting unrestricted access to the results of publicly-funded research are in contrast with current models of scientific publishing, where access is restricted to journal customers. At the same time, subscription costs increase and put considerable strain on libraries, forcing them to cancel an increasing number of journals subscriptions. This situation is particularly acute in fields like High-Energy Physics (HEP), where pre-prints describing scientific results are timely available online. There is a growing concern within the academic community that the future of high-quality journals, and the peer-review system they administer, is at risk.

To address this situation for HEP and, as an experiment, Science at large, a new model for OA publishing has emerged: SCOAP3 (Sponsoring Consortium for Open Access Publishing in Particle Physics). In this model, HEP funding agencies and libraries, which today purchase journal subscriptions to implicitly support the peer-review service, federate to explicitly cover its cost, while publishers make the electronic versions of their journals free to read. Authors are not directly charged to publish their articles OA.

SCOAP3 will, for the first time, link quality and price, stimulating competition and enabling considerable medium- and long-term savings. Today, most publishers quote a price in the range of 1’000–2’000 Euros per published article. On this basis, we estimate that the annual budget for the transition of HEP publishing to OA would amount to a maximum of 10 Million Euros/year, sensibly lower than the estimated global expenditure in subscription to HEP journals.

Each SCOAP3 partner will finance its contribution by canceling journal subscriptions. Each country will contribute according to its share of HEP publishing. The transition to OA will be facilitated by the fact that the large majority of HEP articles are published in just six peer-reviewed journals. Of course, the SCOAP3 model is open to any, present or future, high-quality HEP journal aiming at a dynamic market with healthy competition and broader choice.

HEP funding agencies and libraries are currently signing Expressions of Interest for the financial backing of the consortium. A tendering procedure will then take place. Provided that SCOAP3 funding partners are prepared to engage in long-term commitments, many publishers are expected to be ready to enter into negotiations.

The example of SCOAP3 could be rapidly followed by other fields, directly related to HEP, such as nuclear physics or astro-particle physics, also similarly compact and organized with a reasonable number of journals.

Models like this one may result in increasing the amount of information available for topic mapping and the amount of semantic diversity in traditional search results.

Delivery models are changing but search interfaces leave us to our own devices at the document level.

If we are going to have better access in the physical sense, shouldn’t we be working on better access in the content sense?

PS: To show this movement has legs, consider the recent agreement of Elsevier, IOP Publishing and Springer to participate.

Information Extraction from the Internet

Saturday, August 24th, 2013

Information Extraction from the Internet by Nan Tang.

From the description at Amazon ($116.22):

As the Internet continues to become part of our lives, there now exists an overabundance of reliable information sources on this medium. The temporal and cognitive resources of human beings, however, do not change. “Information Extraction from the Internet” provides methods and tools for Web information extraction and retrieval. Success in this area will greatly enhance business processes and provide information seekers new tools that allow them to reduce their searching time and cost involvement. This book focuses on the latest approaches for Web content extraction, and analyzes the limitations of existing technology and solutions. “Information Extraction from the Internet” includes several interesting and popular topics that are being widely discussed in the area of information extraction: data spasity and field-associated knowledge (Chapters 1–2), Web agent design and mining components (Chapters 3–4), extraction skills on various documents (Chapters 5–7), duplicate detection for music documents (Chapter 8), name disambiguation in digital libraries using Web information (Chapter 9), Web personalization and user-behavior issues (Chapters 10–11), and information retrieval case studies (Chapters 12–14). “Information Extraction from the Internet” is suitable for advanced undergraduate students and postgraduate students. It takes a practical approach rather than a conceptual approach. Moreover, it offers a truly reader-friendly way to get to the subject related to information extraction, making it the ideal resource for any student new to this subject, and providing a definitive guide to anyone in this vibrant and evolving discipline. This book is an invaluable companion for students, from their first encounter with the subject to more advanced studies, while the full-color artworks are designed to present the key concepts with simplicity, clarity, and consistency.

I discovered this volume while searching for the publisher of: On-demand Synonym Extraction Using Suffix Arrays.

As you can see from the description, a wide ranging coverage of information extraction interests.

All of the chapters are free for downloading at the publisher’s site.

iConcepts Press has a number of books and periodicals you may find interesting.

Semantic Search… [Call for Papers]

Saturday, August 3rd, 2013

Semantic Search – Call for Papers for special issue of Aslib Journal of Information Management by Fran Alexander.

From the post:

I am currently drafting the Call for Papers for a special issue of the Aslib Journal of Information Management (formerly Aslib Proceedings) which I am guest editing alongside Dr Ulrike Spree from the University of Hamburg.

Ulrike is the academic expert, while I am providing the practitioner perspective. I am very keen to include practical case studies, so if you have an interesting project or comments on a project but have never written an academic paper before, don’t be put off. I will be happy to advise on style, referencing, etc.

Suggested Topics

Themes Ulrike is interested in include:

  • current trends in semantic search
  • best practice – how far along the road from ‘early adopters’ to ‘mainstream users’ has semantic search gone so far
  • usability of semantic search
  • visualisation and semantic search
  • the relationship between new trends in knowledge organisation and semantic search, such as vocabulary norms (like ISO 25964 “Thesauri for information retrieval“) and the potential of semantic search from a more critical perspective – what, for example, are the criteria for judging quality?

Themes I am interested in include:

  • the history of semantic search – how the latest techniques and technologies have come out of developments over the last 5, 10, 20, 100, 2000… years
  • how semantic search techniques and technologies are being used in practice
  • how semantic technologies are fostering a need for cross-industry collaboration and standardization
  • practical problems in brokering consensus and agreement – defining terms and classes, etc.
  • differences between web-scale, enterprise scale, and collection-specific scale techniques
  • curation and management of ontologies.

However, we are open to suggestions, especially as it is such a broad topic, there are so many aspects that could be covered.

Fran doesn’t mention a deadline but I will ask and update here when I get it.

Sounds like a venue that would welcome papers on topic maps.

Yes?

Proceedings of the 3rd Workshop on Semantic Publishing

Sunday, July 7th, 2013

Proceedings of the 3rd Workshop on Semantic Publishing edited by: Alexander García Castro, Christoph Lange, Phillip Lord, and Robert Stevens.

Table of Contents

Research Papers

  1. Twenty-Five Shades of Greycite: Semantics for Referencing and Preservation – Phillip Lord
  2. Systematic Reviews as an Interface to the Web of (Trial) Data: using PICO as an Ontology for Knowledge Synthesis in Evidence-based Healthcare Research – Chris Mavergames
  3. Towards Linked Research Data: an Institutional Approach – Najko Jahn, Florian Lier, Thilo Paul-Stueve, Christian Pietsch, Philipp Cimiano
  4. Repurposing Benchmark Corpora for Reconstructing Provenance – Sara Magliacane
  5. Connections across Scientific Publications based on Semantic Annotations – Leyla Jael García Castro, Rafael Berlanga, Dietrich Rebholz-Schuhmann, Alexander Garcia
  6. Towards the Automatic Identification of the Nature of Citations – Angelo Di Iorio, Andrea Giovanni Nuzzolese, Silvio Peroni
  7. How Reliable is Your Workflow: Monitoring Decay in Scholarly Publications – José Manuel Gómez-Pérez, Esteban García-Cuesta, Jun Zhao, Aleix Garrido, José Enrique Ruiz

Polemics (published externally)

  1. Flash Mob Science, Open Innovation and Semantic Publishing – Hal Warren, Bryan Dennis, Eva Winer
  2. Science, Semantic Web and Excuses – Idafen Santana Pérez, Daniel Garijo, Oscar Corcho
  3. Polemic on Future of Scholarly Publishing/Semantic Publishing – Chris Mavergames
  4. Linked Research – Sarven Capadisli

The whole proceedings can also be downloaded as a single file (PDF, including title pages, preface, and table of contents).

Some reading to start your week!

Annual update released for TeX Live (2013)

Monday, June 24th, 2013

Annual update released for TeX Live

From the post:

The developers of the TeX Live distribution of LaTeX have released their annual update. However, after 17 years of development, the changes in TeX Live 2013 mostly amount to technical details.

The texmf/ directory, for example, has been merged into texmf-dist/, while the TEXMFMAIN and TEXMFDIST Kpathsea variables now point to texmf-dist. The developers have also merged several language collections for easier installation. Users will find native support for PNG output and floating-point numbers in MetaPost. LuaTeX now uses version 5.2 of Lua and includes a new library (pdfscanner) for processing external PDF data, and xdvi now uses freetype instead of t1lib for rendering.

Several updates have been made to XeTeX: HarfBuzz is now used instead of ICU for font layout and has been combined with Graphite2 to replace SilGraphite for Graphite layout; support has also been improved for OpenType.

TeX Live 2013 is open source software, licensed under a combination of the LaTeX Project Public License (LPPL) and a number of other licences. The software works on all of the major operating systems, although the program no longer runs on AIX systems using PowerPCs. Mac OS X users may want to take a look at MacTeX, which is based on – and has been updated in line with – TeX Live.

No major changes but we should be grateful for the effort that resulted in this release.

Journal of Data Mining & Digital Humanities

Monday, May 27th, 2013

Journal of Data Mining & Digital Humanities

From the webpage:

Data mining, an interdisciplinary subfield of computer science, involving the methods at the intersection of artificial intelligence, machine learning and database systems. The Journal of Data Mining & Digital Humanities concerned with the intersection of computing and the disciplines of the humanities, with tools provided by computing such as data visualisation, information retrieval, statistics, text mining by publishing scholarly work beyond the traditional humanities.

The journal includes a wide range of fields in its discipline to create a platform for the authors to make their contribution towards the journal and the editorial office promises a peer review process for the submitted manuscripts for the quality of publishing.

Journal of Data Mining & Digital Humanities is an Open Access journal and aims to publish most complete and reliable source of information on the discoveries and current developments in the mode of original articles, review articles, case reports, short communications, etc. in all areas of the field and making them freely available through online without any restrictions or any other subscriptions to researchers worldwide.

The journal is using Editorial Tracking System for quality in review process. Editorial Tracking is an online manuscript submission, review and tracking systems. Review processing is performed by the editorial board members of Journal of Data Mining & Digital Humanities or outside experts; at least two independent reviewers approval followed by editor approval is required for acceptance of any citable manuscript. Authors may submit manuscripts and track their progress through the system, hopefully to publication. Reviewers can download manuscripts and submit their opinions to the editor. Editors can manage the whole submission/review/revise/publish process.

KDNuggets reports the first issue of JDMDH will appear in August, 2013. Deadline for submissions for the first issue: 25 June 2013.

A great venue for topic map focused papers. (When you are not writing for the Economist.)

New York Times – Article Search API v. 2

Sunday, May 5th, 2013

New York Times – Article Search API v. 2

From the documentation page:

With the Article Search API, you can search New York Times articles from Sept. 18, 1851 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata.

The prior Article Search API described itself as:

With the Article Search API, you can search New York Times articles from 1981 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata.

An addition of one hundred and thirty years of content for searching. Not bad for a v. 2 release.

On cursory review, the API does appear to have changed significantly.

For example, the default fields for each request in version 1.0 were body, byline, date, title, url.

In version 2.0, the default fields returned are: web_url, snippet, lead_paragraph, abstract, print_page, blog, source, multimedia, headline, keywords, pub_date, document_type, news_desk, byline, type_of_material, _id, and word_count.

Five default fields for version 1.0 versus seventeen for version 2.0.
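A minimal sketch of a v. 2 request in Python. The endpoint, the fl (field list) parameter and the response shape are as I recall them from the v. 2 documentation, so treat them as assumptions and verify against the current docs; the API key is a placeholder.

    import requests

    API_KEY = "your-api-key"   # placeholder; register at developer.nytimes.com
    URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

    params = {
        "q": "topic maps",
        "fl": "web_url,headline,pub_date,byline",   # restrict returned fields
        "api-key": API_KEY,
    }
    response = requests.get(URL, params=params)
    response.raise_for_status()
    for doc in response.json()["response"]["docs"]:
        print(doc["pub_date"], doc["headline"]["main"], doc["web_url"])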

There are changes in terminology that will make discovering all the changes from version 1.0 to version 2.0 non-trivial.

Two fields that were present in version 1.0 that I don’t see (under another name?) in version 2.0 are:

dbpedia_resource:

DBpedia person names mapped to Times per_facet terms. This field is case sensitive: values must be Mixed Case.

The Times per_facet is often more comprehensive than dbpedia_resource, but the DBpedia name is easier to use with other data sources. For more information about linked open data, see data.nytimes.com.

dbpedia_resource_url:

URLs to DBpedia person names that have been mapped to Times per_facet terms. This field is case sensitive: values must be Mixed Case.

For more information about linked open data, see data.nytimes.com.

More documentation is promised, which I hope includes a mapping from version 1.0 to version 2.0.

Certainly looks like the basis for annotating content in the New York Times archives as part of a topic map.

Users would input their own authentication details for the New York Times and/or other pay-per-view sites.

I can’t imagine anyone objecting to you helping them sell their content. ;-)

Mathbabe, the book

Saturday, May 4th, 2013

Mathbabe, the book by Cathy O’Neil.

From the post:

Thanks to a certain friendly neighborhood mathbabe reader, I’ve created this mathbabe book, which is essentially all of my posts that I ever wrote (I think. Not sure about that.) bundled together mostly by date and stuck in a huge pdf. It comes to 1,243 pages.

I did it using leanpub.com, which charges $0.99 per person who downloads the pdf. I’m not charging anything over that, because the way I look at it, it’s already free.

Speaking of that, I can see why I’d want a copy of this stuff, since it’s the best way I can think of to have a local version of a bunch of writing I’ve done over the past couple of years, but I don’t actually see why anyone else would. So please don’t think I’m expecting you to go buy this book! Even so, more than one reader has requested this, so here it is.

And one strange thing: I don’t think it required my password on WordPress.com to do it, I just needed the url for the RSS feed. So if you want to avoid paying 99 cents, I’m pretty sure you can go to leanpub or one of its competitors and create another, identical book using that same feed.

And for that matter you can also go build your own book about anything using these tools, which is pretty cool when you think about it. Readers, please tell me if there’s a way to do this that’s open source and free.

The Mathbabe “book” would be one that I would be interested in reading. I can think of several other blogs that fall into that category.

I hesitate to use the term “book” for such a collection.

Maybe I am confusing “monograph,” which is focused on a topic, with “book,” which applies to works beyond a certain length.

I think of my postings, once you remove the dated notice materials, as potential essays or chapters in a book.

But they would need fleshing out and polishing to qualify for more formal publication.

FORCE 11

Thursday, March 21st, 2013

FORCE 11

Short description:

Force11 (the Future of Research Communications and e-Scholarship) is a virtual community working to transform scholarly communications toward improved knowledge creation and sharing. Currently, we have 315 active members.

A longer description from the “about” page:

Research and scholarship lead to the generation of new knowledge. The dissemination of this knowledge has a fundamental impact on the ways in which society develops and progresses; and at the same time, it feeds back to improve subsequent research and scholarship. Here, as in so many other areas of human activity, the Internet is changing the way things work: it opens up opportunities for new processes that can accelerate the growth of knowledge, including the creation of new means of communicating that knowledge among researchers and within the wider community. Two decades of emergent and increasingly pervasive information technology have demonstrated the potential for far more effective scholarly communication. However, the use of this technology remains limited; research processes and the dissemination of research results have yet to fully assimilate the capabilities of the Web and other digital media. Producers and consumers remain wedded to formats developed in the era of print publication, and the reward systems for researchers remain tied to those delivery mechanisms.

Force11 is a community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help facilitate the change toward improved knowledge creation and sharing. Individually and collectively, we aim to bring about a change in modern scholarly communications through the effective use of information technology. Force11 has grown from a small group of like-minded individuals into an open movement with clearly identified stakeholders associated with emerging technologies, policies, funding mechanisms and business models. While not disputing the expressive power of the written word to communicate complex ideas, our foundational assumption is that scholarly communication by means of semantically enhanced media-rich digital publishing is likely to have a greater impact than communication in traditional print media or electronic facsimiles of printed works. However, to date, online versions of ‘scholarly outputs’ have tended to replicate print forms, rather than exploit the additional functionalities afforded by the digital terrain. We believe that digital publishing of enhanced papers will enable more effective scholarly communication, which will also broaden to include, for example, the publication of software tools, and research communication by means of social media channels. We see Force11 as a starting point for a community that we hope will grow and be augmented by individual and collective efforts by the participants and others. We invite you to join and contribute to this enterprise.

Force11 grew out of the FORC Workshop held in Dagstuhl, Germany in August 2011.

FORCE11 is a movement of people interested in furthering the goals stated in the FORCE11 manifesto. An important part of our work is information gathering and dissemination. We invite anyone with relevant information to provide us links which we may include on our websites. We ask anyone with similar and/or related efforts to include links to FORCE11. We are a neutral information market, and do not endorse or seek to block any relevant work.

The Tools and Resources page is particularly interesting.

Current divisions are:

  • Alternative metrics
  • Author Identification
  • Annotation
  • Authoring tools
  • Citation analysis
  • Computational Linguistics/Text Mining Efforts
  • Data citation
  • Ereaders
  • Hypothesis/claim-based representation of the rhetorical structure of a scientific paper
  • Mapping initiatives between ontologies
  • Metadata standards and ontologies
  • Modular formats for science publishing
  • Open Citations
  • Peer Review: New Models
  • Provenance
  • Publications and reports relevant to scholarly digital publication and data
  • Semantic publishing initiatives and other enriched forms of publication
  • Structured Digital Abstracts – modeling science (especially biology) as triples
  • Structured experimental methods and workflows
  • Text Extraction

Topic maps fit into communication agendas quite easily.

The first step in communication is capturing something to say.

The second step in communication is expressing what has been captured so it can be understood by others (or yourself next week).

Topic maps do both quite nicely.

I first saw this in a tweet by Anita de Waard.

What tools do you use for information gathering and publishing?

Thursday, January 24th, 2013

What tools do you use for information gathering and publishing? by Mac Slocum.

From the post:

Many apps claim to be the pinnacle of content consumption and distribution. Most are a tangle of silly names and bad interfaces, but some of these tools are useful. A few are downright empowering.

Finding those good ones is the tricky part. I queried O’Reilly colleagues to find out what they use and why, and that process offered a decent starting point. We put all our notes together into this public Hackpad — feel free to add to it. I also went through and plucked out some of the top choices. Those are posted below.

Information gathering, however humble it may be, is the start of any topic map authoring project.

Mac asks for the tools you use every week.

Let’s not disappoint him!

Intelligent Content:…

Monday, January 14th, 2013

Intelligent Content: How APIs Can Supply the Right Content to the Right Reader by Adam DuVander.

From the post:

When you buy a car, it comes with a thick manual that probably sits in your glove box for the life of the car. The experience with a new luxury car may be much different. That printed, bound manual may only contain the information relevant to your car. No leather seats, no two page spread on caring for the hide. That’s intelligent content. And it’s an opportunity for APIs to help publishers go way beyond the cookie cutter printed book. It also happens to be an exciting conference coming to San Francisco in February.

It takes effort to segment content, especially when it was originally written as one piece. There are many benefits to those that put in the effort to think of their content as a platform. Publisher Pearson did this with a number of its titles, most notably with its Pearson Eyewitness Guides API. Using the API, developers can take what was a standalone travel book–say, the Eyewitness Guide to London–and query individual locations. One can imagine travel apps using the content to display great restaurants or landmarks that are nearby, for example.

Traditional publishing is a market that is ripe for disruption, characterized by Berkeley professor Robert Glushko co-creating a new approach to academic textbooks with his students in the Future of E-books. Glushko is one of the speakers at the Intelligent Content Conference, which will bring together content creators, technologists and publishers to discuss the many opportunities. Also speaking is Netflix’s Daniel Jacobson, who architected a large redesign of the Netflix API in order to support hundreds of devices. And yes, I will discuss the opportunities for content-as-a-service via APIs.

ProgrammableWeb readers can still get in on the early bird discount to attend Intelligent Content, which takes place February 7-8 in San Francisco.

San Francisco in February sounds like a good idea. Particularly if the future of publishing is on the agenda.

I would observe that “intelligent content” implies that someone, that is, a person, has both authored the content and designed the API. It doesn’t happen auto-magically.

And with people involved, our old friend semantic diversity is going to be in the midst of the discussions, proposals and projects.

Reliable collation of data from different publishers (universities with multiple subscriptions should be pushing for this now) could make access seamless to end users.

A Paywall In Your Future? [Curated Data As Revenue Stream]

Tuesday, December 25th, 2012

The New York Times Paywall Is Working Better Than Anyone Had Guessed by Edmund Lee.

From the post:

Ever since the New York Times rolled out its so-called paywall in March 2011, a perennial dispute has waged. Anxious publishers say they can’t afford to give away their content for free, while the blogger set claim paywalls tend to turn off readers accustomed to a free and open Web.

More than a year and a half later, it’s clear the New York Times’ paywall is not only valuable, it’s helped turn the paper’s subscription dollars, which once might have been considered the equivalent of a generous tithing, into a significant revenue-generating business. As of this year, the company is expected to make more money from subscriptions than from advertising — the first time that’s happened.

Digital subscriptions will generate $91 million this year, according to Douglas Arthur, an analyst with Evercore Partners. The paywall, by his estimate, will account for 12 percent of total subscription sales, which will top $768.3 million this year. That’s $52.8 million more than advertising. Those figures are for the Times newspaper and the International Herald Tribune, largely considered the European edition of the Times.

It’s a milestone that upends the traditional 80-20 ratio between ads and circulation that publishers once considered a healthy mix and that is now no longer tenable given the industrywide decline in newsprint advertising. Annual ad dollars at the Times, for example, has fallen for five straight years.

More importantly, subscription sales are rising faster than ad dollars are falling. During the 12 months after the paywall was implemented, the Times and the International Herald Tribune increased circulation dollars 7.1 percent compared with the previous 12-month period, while advertising fell 3.7 percent. Subscription sales more than compensated for the ad losses, surpassing them by $19.2 million in the first year they started charging readers online.

I don’t think gate-keeper and camera-ready copy publishers should take much comfort from this report.

Unlike those outlets, the New York Times has a “value-add” with regard to the news it reports.

Much like UI/UX design, the open question is: What do users see as a value-add? (Hopefully a significant number of users.)

A life or death question for a new content stream, fighting for attention.

Paying for What Was Free: Lessons from the New York Times Paywall

Sunday, November 4th, 2012

Paying for What Was Free: Lessons from the New York Times Paywall

From the post:

In a national online longitudinal survey, participants reported their attitudes and behaviors in response to the recently implemented metered paywall by the New York Times. Previously free online content now requires a digital subscription to access beyond a small free monthly allotment. Participants were surveyed shortly after the paywall was announced and again 11 weeks after it was implemented to understand how they would react and adapt to this change. Most readers planned not to pay and ultimately did not. Instead, they devalued the newspaper, visited its Web site less frequently, and used loopholes, particularly those who thought the paywall would lead to inequality. Results of an experimental justification manipulation revealed that framing the paywall in terms of financial necessity moderately increased support and willingness to pay. Framing the paywall in terms of a profit motive proved to be a noncompelling justification, sharply decreasing both support and willingness to pay. Results suggest that people react negatively to paying for previously free content, but change can be facilitated with compelling justifications that emphasize fairness.

The original article: Jonathan E. Cook and Shahzeen Z. Attari. Cyberpsychology, Behavior, and Social Networking. -Not available-, ahead of print. doi:10.1089/cyber.2012.0251

Another data point in the struggle to find a viable model for delivery of online content.

The difficulty with “free” content, followed by the discovery that someone still has to pay the expenses of producing that content, is that consumers, once charged, gain nothing over what they had when the content was free. They are losers in that proposition.

I mention this because topic maps that provide content over the web face the same economic challenges as other online content providers.

A model that I haven’t seen (you may have, so sing out) is one that offers the content for free, but the links to other materials, the research that adds value to the content, are dead links without a subscription. True, someone could track down each and every reference, but if you are using the content as part of your job, do you really want to do that?

The full and complete content is simply made available, to anyone who wants a copy. After all, the wider the circulation of the content, the more free advertising you are getting for your publication.

Delivery of PDF files with citations, sans links, for non-subscribers is perhaps one line of XSL-FO code. It satisfies the question of “access” and yet leaves publishers a new area to fill with features and value-added content.

Take, for example, linking below the full article level. If I have to read another thirty pages only to find that a citation was just boiler-plate, I hardly need a citation network, do I? Of course value-added content isn’t found directly under the lamp post, but requires some imagination.
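A toy sketch of the “full text for everyone, live links for subscribers” model, just to make the idea concrete. The data structure and function are invented for illustration; a real implementation would sit in the publisher’s rendering pipeline (XSL-FO or otherwise).

    citations = [
        {"text": "Cook & Attari (2012), Cyberpsychology, Behavior, and Social Networking",
         "url": "https://doi.org/10.1089/cyber.2012.0251"},
        {"text": "Some other cited work (hypothetical entry)",
         "url": "https://example.org/cited-work"},
    ]

    def render_citations(citations, subscriber):
        # Everyone gets the full citation text; only subscribers get the links.
        lines = []
        for c in citations:
            line = c["text"]
            if subscriber:
                line += f" <{c['url']}>"
            lines.append(line)
        return "\n".join(lines)

    print(render_citations(citations, subscriber=False))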

JournalTOCs

Wednesday, October 24th, 2012

JournalTOCs

Most publishers have TOC services for new issues of their journals.

JournalTOCs aggregates TOCs from publishers and maintains a searchable database of their TOC postings.

A database that is accessible via a free API, I should add.

The API should be a useful way to add journal articles to a topic map, particularly when you want to add selected articles and not entire issues.
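Something along these lines is how I would expect to pull a journal’s latest TOC into a pipeline. The endpoint pattern and parameters below are from memory rather than the documentation, so treat them as assumptions; feedparser is used because, as I recall, the API returns an RSS feed.

    import feedparser

    ISSN = "1758-2946"          # Journal of Cheminformatics, as an example
    EMAIL = "you@example.com"   # JournalTOCs expects a registered email address

    # Endpoint pattern is an assumption from memory; confirm against the API docs.
    url = (f"http://www.journaltocs.ac.uk/api/journals/{ISSN}"
           f"?output=articles&user={EMAIL}")

    feed = feedparser.parse(url)
    for entry in feed.entries:
        # Each entry is one article from the journal's latest TOC posting.
        print(entry.get("title"), entry.get("link"))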

I am looking forward to using and exploring JournalTOCs.

Suggest you do the same.

Books as Islands/Silos – e-book formats

Sunday, September 9th, 2012

After posting about the panel discussion on the future of the book, I looked up the listing of e-book formats at Wikipedia and found:

  1. Archos Diffusion
  2. Broadband eBooks (BBeB)
  3. Comic Book Archive file
  4. Compiled HTML
  5. DAISY – ANSI/NISO Z39.86
  6. Desktop Author
  7. DjVu
  8. EPUB
  9. eReader
  10. FictionBook (Fb2)
  11. Founder Electronics
  12. Hypertext Markup Language
  13. iBook (Apple)
  14. IEC 62448
  15. KF8 (Amazon Kindle)
  16. Microsoft LIT
  17. Mobipocket
  18. Multimedia eBooks
  19. Newton eBook
  20. Open Electronic Package
  21. Portable Document Format
  22. Plain text files
  23. Plucker
  24. PostScript
  25. SSReader
  26. TealDoc
  27. TEBR
  28. Text Encoding Initiative
  29. TomeRaider

Beyond the different formats, the additional issue is that each book stands on its own.

Imagine hovering over a section of interest in a book and having relevant “sections” from other books displayed as well.

Is anyone working on a mapping across these various formats? (Not conversion, “mapping across” language chosen deliberately. Conversion might violate a EULA. Navigation with due regard to the EULA would be difficult to prohibit.)

I realize some of them are too seldom used for commercially viable material to be of interest. Or may be of interest only in certain markets (SSReader for instance).

Not the classic topic map case of identifying duplicate content in different guises but producing navigation across different formats to distinct material.
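To make “mapping across” concrete, here is a minimal sketch of the kind of data structure I have in mind: one subject identifier, with occurrences recorded against whatever locator each e-book format offers. All identifiers and locators are invented for illustration.

    section_map = {
        "http://example.org/subject/troff-macros": {     # subject identifier
            "epub":   {"file": "guide.epub", "locator": "epubcfi(/6/14!/4/2/10)"},
            "kindle": {"file": "guide.azw3", "locator": "loc=1234"},
            "pdf":    {"file": "guide.pdf",  "locator": "page=87"},
        }
    }

    def occurrences(subject_id, fmt=None):
        # Return every known location for a subject, or just one format's.
        locations = section_map.get(subject_id, {})
        return locations if fmt is None else {fmt: locations.get(fmt)}

    print(occurrences("http://example.org/subject/troff-macros", "pdf"))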

Books, Bookstores, Catalogs [30% Digital by end of 2012, Books as Islands/Silos]

Sunday, September 9th, 2012

Books, Bookstores, Catalogs by Kevin Hillstrom.

From the post:

The parallels between books, bookstores, and catalogs are significant.

So take fifty minutes this weekend, and watch this session that was recently broadcast on BookTV, titled “The Future of the Book and Bookstore“.

This is fifty minutes of absolutely riveting television, seriously! Boring setting, riveting topic.

Jim Milliot (Publishers Weekly) tossed out an early tidbit: 30% of book sales will be digital by the end of 2012.

Lissa Muscatine, Politics & Prose bookstore owner: When books are a smaller part of the revenue stream, you have to diversify the revenue stream, including print on demand from a catalog of 7 million books.

Sam Dorrance, Potomac Books (publisher): Hard copy sales will likely decrease by ten percent (10%) per year for the next several years.

Recurrent theme: Independent booksellers can provide guidance to readers. Not the same thing as “recommendation” because it is more nuanced.

Rafe Sagalyn, Sagalyn Literary Agency: Now a buyer’s market. Almost parity between hard copy and ebook sales.

Great panel but misses the point that books, hard copy or digital, remain isolated islands/silos.

Want to have a value-add that is revolutionary?

Create links across Kindle and other electronic formats, so that licensed users are not isolated within single works.

Did I hear someone say topic maps?

Applied and implied semantics in crystallographic publishing

Thursday, August 30th, 2012

Applied and implied semantics in crystallographic publishing by Brian McMahon. Journal of Cheminformatics 2012, 4:19 doi:10.1186/1758-2946-4-19.

Abstract:

Background

Crystallography is a data-rich, software-intensive scientific discipline with a community that has undertaken direct responsibility for publishing its own scientific journals. That community has worked actively to develop information exchange standards allowing readers of structure reports to access directly, and interact with, the scientific content of the articles.

Results

Structure reports submitted to some journals of the International Union of Crystallography (IUCr) can be automatically validated and published through an efficient and cost-effective workflow. Readers can view and interact with the structures in three-dimensional visualization applications, and can access the experimental data should they wish to perform their own independent structure solution and refinement. The journals also layer on top of this facility a number of automated annotations and interpretations to add further scientific value.

Conclusions

The benefits of semantically rich information exchange standards have revolutionised the scholarly publishing process for crystallography, and establish a model relevant to many other physical science disciplines.

A strong reminder to authors and publishers of the costs and benefits of making semantics explicit. (And the trade-offs involved.)

Topic Map Based Publishing

Monday, August 20th, 2012

After asking for ideas on publishing cheat sheets this morning, I have one to offer as well.

One problem with traditional cheat sheets is knowing what any particular user wants in a cheat sheet.

Another problem is how to expand the content of a cheat sheet.

And what if you want to sell the content? How does that work?

I don’t have a working version (yet) but here is my thinking on how topic maps could power a “cheat sheet” that meets all those requirements.

Solving the problem of what content to include seems critical to me. It is the make or break point in terms of attracting paying customers for a cheat sheet.

Content of no interest is as deadly as poor quality content. Either way, paying customers will vote with their feet.

The first step is to allow customers to “build” their own cheat sheet from some list of content. In topic map terminology, they specify an association between themselves and a set of topics to appear in “their” cheat sheet.

Most of the cheat sheets that I have seen (and printed out more than a few) are static artifacts. WYSIWYG artifacts. What there is and there ain’t no more.

Works for some things, but what if what you need to know lies just beyond the edge of the cheat sheet? That’s the bad thing about static artifacts: they have edges.

Beyond letting customers build their own cheat sheet, the only limits to a topic map based cheat sheet are those imposed by lack of payment or interest. ;-)

You may not need troff syntax examples on a daily basis but there are times when they could come in quite handy. (Don’t laugh. Liam Quin got hired on the basis of the troff typesetting of his resume.)

The second step is to have a cheat sheet that can expand or contract based on the immediate needs of the user. Sometimes more or less content, depending on their need. Think of an expandable “nutshell” reference.

A WYWIWYG (What You Want Is What You Get) approach as opposed to WWWTSYIWYG (What We Want To Sell You Is What You Get) (any publishers come to mind?).

What’s more important? Your needs or the needs of your publisher?

Finally, how to “sell” the content? The value-add?

Here’s one model: The user buys a version of the cheat sheet, which has embedded links to additional content. Links that, when the user authenticates to a server, are treated as subject identifiers. Subject identifiers that cause merging to occur with topics on the server and deliver additional content. Each user’s subject identifier can be auto-generated on purchase and so is uniquely tied to a particular login.

The user can freely distribute the version of the cheat sheet they purchased, free advertising for you. But the additional content requires a separate purchase by the new user.
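Here is a rough sketch, in Python, of the purchase/identifier/merge flow described above. Every name and URL in it is invented for illustration; the point is only that the identifier is minted per purchase and the server merges in extra content only for the login it was sold to.

    import secrets

    PREMIUM_CONTENT = {   # hypothetical expanded entries held on the server
        "troff-macros": "Extended macro package notes ...",
    }
    purchases = {}        # subject identifier -> login it was issued to

    def issue_identifier(login):
        # Minted on purchase, so each identifier is tied to one login.
        sid = f"https://example.org/sid/{secrets.token_urlsafe(8)}"
        purchases[sid] = login
        return sid

    def resolve(sid, login, topic):
        # Merging happens only for the authenticated purchaser.
        if purchases.get(sid) == login:
            return PREMIUM_CONTENT.get(topic, "No extra content for this topic.")
        return "Purchase required for the expanded entry."

    sid = issue_identifier("alice")
    print(resolve(sid, "alice", "troff-macros"))   # purchaser sees merged content
    print(resolve(sid, "bob", "troff-macros"))     # shared copy, no merge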

What blind alleys, pot holes and other hazards/dangers am I failing to account for in this scenario?

Three Steps to Heaven: Semantic Publishing in a Real World Workflow

Tuesday, July 3rd, 2012

Three Steps to Heaven: Semantic Publishing in a Real World Workflow by Phillip Lord, Simon Cockell, and Robert Stevens.

Abstract:

Semantic publishing offers the promise of computable papers, enriched visualisation and a realisation of the linked data ideal. In reality, however, the publication process contrives to prevent richer semantics while culminating in a ‘lumpen’ PDF. In this paper, we discuss a web-first approach to publication, and describe a three-tiered approach which integrates with the existing authoring tooling. Critically, although it adds limited semantics, it does provide value to all the participants in the process: the author, the reader and the machine.

With a touch of irony and gloom the authors write:

… There are significant barriers to the acceptance of semantic publishing as a standard mechanism for academic publishing. The web was invented around 1990 as a light-weight mechanism for publication of documents. It has subsequently had a massive impact on society in general. It has, however, barely touched most scientific publishing; while most journals have a website, the publication process still revolves around the generation of papers, moving from Microsoft Word or LaTeX [5], through to a final PDF which looks, feels and is something designed to be printed onto paper⁴. Adding semantics into this environment is difficult or impossible; the content of the PDF has to be exposed and semantic content retrofitted or, in all likelihood, a complex process of author and publisher interaction has to be devised and followed. If semantic data publishing and semantic publishing of academic narratives are to work together, then academic publishing needs to change.

4. This includes conferences dedicated to the web and the use of web technologies.

One could add “…includes papers about changing the publishing process” but I digress.

I don’t disagree that adding semantics to the current system has proved problematic.

I do disagree that changing the current system, which is deeply embedded in research, publishing and social practices is likely to succeed.

At least if success is defined as a general solution to adding semantics to scientific research and publishing in general. Such projects may be successful in creating new methods of publishing scientific research but that just expands the variety of methods we must account for.

That doesn’t have a “solution like” feel to me. You?

Readersourcing—a manifesto

Monday, July 2nd, 2012

Readersourcing—a manifesto by Stefano Mizzaro. (Mizzaro, S. (2012), Readersourcing—a manifesto. J. Am. Soc. Inf. Sci.. doi: 10.1002/asi.22668)

Abstract:

This position paper analyzes the current situation in scholarly publishing and peer review practices and presents three theses: (a) we are going to run out of peer reviewers; (b) it is possible to replace referees with readers, an approach that I have named “Readersourcing”; and (c) it is possible to avoid potential weaknesses in the Readersourcing model by adopting an appropriate quality control mechanism. The readersourcing.org system is then presented as an independent, third-party, nonprofit, and academic/scientific endeavor aimed at quality rating of scholarly literature and scholars, and some possible criticisms are discussed.

Mizzaro touches on a number of issues that have speculative answers in his call for “readersourcing” of research. There is a website in progress, www.readersourcing.org.

I am interested in the approach as an aspect of crowdsourcing the creation of topic maps.

FYI, his statement that:

Readersourcing is a solution to a problem, but it immediately raises another problem, for which we need a solution: how to distinguish good readers from bad readers. If 200 undergraduate students say that a paper is good, but five experts (by reputation) in the field say that it is not, then it seems obvious that the latter should be given more importance when calculating the paper’s quality.
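Purely as an illustration of the weighting question, not of the scheme readersourcing.org actually proposes, here is the kind of calculation that gives five experts more pull than two hundred undergraduates. The weights and ratings are invented.

    ratings = [("undergraduate", 4.5)] * 200 + [("expert", 2.0)] * 5

    weights = {"undergraduate": 1.0, "expert": 100.0}   # invented weights

    score = sum(weights[kind] * value for kind, value in ratings)
    total = sum(weights[kind] for kind, _ in ratings)
    print(f"Weighted paper quality: {score / total:.2f}")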

Seems problematic to me. Particularly for graduate students. If professors at their school rate research high or low, that should be calculated into a rating for that particular reader.

If that seems pessimistic, read Fish, Stanley, “Transmuting the Lump: Paradise Lost, 1942-1979,” in Doing What Comes Naturally (Duke University Press, 1989), which treats changing “expert” opinions on the closing chapters of Paradise Lost.

I offer that as a caution that all of our judgements are a matter of social consensus that changes over time. On some issues more quickly than others. Our information systems should reflect the ebb and flow of that semantic renegotiation.