Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

March 21, 2013

FORCE 11

Filed under: Communication,Publishing — Patrick Durusau @ 5:19 am

FORCE 11

Short description:

Force11 (the Future of Research Communications and e-Scholarship) is a virtual community working to transform scholarly communications toward improved knowledge creation and sharing. Currently, we have 315 active members.

A longer description from the “about” page:

Research and scholarship lead to the generation of new knowledge. The dissemination of this knowledge has a fundamental impact on the ways in which society develops and progresses; and at the same time, it feeds back to improve subsequent research and scholarship. Here, as in so many other areas of human activity, the Internet is changing the way things work: it opens up opportunities for new processes that can accelerate the growth of knowledge, including the creation of new means of communicating that knowledge among researchers and within the wider community. Two decades of emergent and increasingly pervasive information technology have demonstrated the potential for far more effective scholarly communication. However, the use of this technology remains limited; research processes and the dissemination of research results have yet to fully assimilate the capabilities of the Web and other digital media. Producers and consumers remain wedded to formats developed in the era of print publication, and the reward systems for researchers remain tied to those delivery mechanisms.

Force11 is a community of scholars, librarians, archivists, publishers and research funders that has arisen organically to help facilitate the change toward improved knowledge creation and sharing. Individually and collectively, we aim to bring about a change in modern scholarly communications through the effective use of information technology. Force11 has grown from a small group of like-minded individuals into an open movement with clearly identified stakeholders associated with emerging technologies, policies, funding mechanisms and business models. While not disputing the expressive power of the written word to communicate complex ideas, our foundational assumption is that scholarly communication by means of semantically enhanced media-rich digital publishing is likely to have a greater impact than communication in traditional print media or electronic facsimiles of printed works. However, to date, online versions of ‘scholarly outputs’ have tended to replicate print forms, rather than exploit the additional functionalities afforded by the digital terrain. We believe that digital publishing of enhanced papers will enable more effective scholarly communication, which will also broaden to include, for example, the publication of software tools, and research communication by means of social media channels. We see Force11 as a starting point for a community that we hope will grow and be augmented by individual and collective efforts by the participants and others. We invite you to join and contribute to this enterprise.

Force11 grew out of the FORC Workshop held in Dagstuhl, Germany in August 2011.

FORCE11 is a movement of people interested in furthering the goals stated in the FORCE11 manifesto. An important part of our work is information gathering and dissemination. We invite anyone with relevant information to provide us links which we may include on our websites. We ask anyone with similar and/or related efforts to include links to FORCE11. We are a neutral information market, and do not endorse or seek to block any relevant work.

The Tools and Resources page is particularly interesting.

Current divisions are:

  • Alternative metrics
  • Author Identification
  • Annotation
  • Authoring tools
  • Citation analysis
  • Computational Linguistics/Text Mining Efforts
  • Data citation
  • Ereaders
  • Hypothesis/claim-based representation of the rhetorical structure of a scientific paper
  • Mapping initiatives between ontologies
  • Metadata standards and ontologies
  • Modular formats for science publishing
  • Open Citations
  • Peer Review: New Models
  • Provenance
  • Publications and reports relevant to scholarly digital publication and data
  • Semantic publishing initiatives and other enriched forms of publication
  • Structured Digital Abstracts – modeling science (especially biology) as triples
  • Structured experimental methods and workflows
  • Text Extraction

Topic maps fit into communication agendas quite easily.

The first step in communication is capturing something to say.

The second step in communication is expressing what has been captured so it can be understood by others (or yourself next week).

Topic maps do both quite nicely.
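A minimal sketch of those two steps in code, using plain Python dicts rather than any particular topic map engine (the identifiers and fields below are illustrative):

```python
# "Capture" a subject as a topic keyed by a subject identifier,
# then "express" it as something a reader can understand.

topic = {
    "subject_identifier": "http://example.org/subjects/semantic-diversity",  # hypothetical IRI
    "names": ["semantic diversity"],
    "occurrences": [
        {"type": "note", "value": "Different communities name the same subject differently."}
    ],
}

def express(t):
    """Render a captured topic as plain text for a reader (or yourself next week)."""
    lines = [f"Subject: {t['names'][0]} <{t['subject_identifier']}>"]
    lines += [f"  {o['type']}: {o['value']}" for o in t["occurrences"]]
    return "\n".join(lines)

print(express(topic))
```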

I first saw this in a tweet by Anita de Waard.

January 24, 2013

What tools do you use for information gathering and publishing?

Filed under: Data Mining,Publishing,Text Mining — Patrick Durusau @ 8:07 pm

What tools do you use for information gathering and publishing? by Mac Slocum.

From the post:

Many apps claim to be the pinnacle of content consumption and distribution. Most are a tangle of silly names and bad interfaces, but some of these tools are useful. A few are downright empowering.

Finding those good ones is the tricky part. I queried O’Reilly colleagues to find out what they use and why, and that process offered a decent starting point. We put all our notes together into this public Hackpad — feel free to add to it. I also went through and plucked out some of the top choices. Those are posted below.

Information gathering, however humble it may be, is the start of any topic map authoring project.

Mac asks for the tools you use every week.

Let’s not disappoint him!

January 14, 2013

Intelligent Content:…

Filed under: eBooks,Information Reuse,Publishing — Patrick Durusau @ 8:39 pm

Intelligent Content: How APIs Can Supply the Right Content to the Right Reader by Adam DuVander.

From the post:

When you buy a car, it comes with a thick manual that probably sits in your glove box for the life of the car. The experience with a new luxury car may be much different. That printed, bound manual may only contain the information relevant to your car. No leather seats, no two page spread on caring for the hide. That’s intelligent content. And it’s an opportunity for APIs to help publishers go way beyond the cookie cutter printed book. It also happens to be an exciting conference coming to San Francisco in February.

It takes effort to segment content, especially when it was originally written as one piece. There are many benefits to those that put in the effort to think of their content as a platform. Publisher Pearson did this with a number of its titles, most notably with its Pearson Eyewitness Guides API. Using the API, developers can take what was a standalone travel book–say, the Eyewitness Guide to London–and query individual locations. One can imagine travel apps using the content to display great restaurants or landmarks that are nearby, for example.

Traditional publishing is a market that is ripe for disruption, characterized by Berkeley professor Robert Glushko co-creating a new approach to academic textbooks with his students in the Future of E-books. Glushko is one of the speakers at the Intelligent Content Conference, which will bring together content creators, technologists and publishers to discuss the many opportunities. Also speaking is Netflix’s Daniel Jacobson, who architected a large redesign of the Netflix API in order to support hundreds of devices. And yes, I will discuss the opportunities for content-as-a-service via APIs.

ProgrammableWeb readers can still get in on the early bird discount to attend Intelligent Content, which takes place February 7-8 in San Francisco.

San Francisco in February sounds like a good idea. Particularly if the future of publishing is on the agenda.

I would observe that “intelligent content” implies that someone, that is, a person, has both authored the content and designed the API. It doesn’t happen auto-magically.

And with people involved, our old friend semantic diversity is going to be in the midst of the discussions, proposals and projects.

Reliable collation of data from different publishers (universities with multiple subscriptions should be pushing for this now) could make access seamless for end users.
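As a sketch of what that collation might look like, here is a toy example keyed on a shared ISBN; the feeds, field names, and the ISBN itself are hypothetical:

```python
# Collate records from different publishers on a shared key (ISBN here).
from collections import defaultdict

feed_a = [{"isbn": "978-0-0000-0000-0", "title": "Eyewitness Guide to London", "format": "print"}]
feed_b = [{"isbn": "978-0-0000-0000-0", "title": "Eyewitness: London", "format": "api"}]

collated = defaultdict(list)
for record in feed_a + feed_b:
    collated[record["isbn"]].append(record)  # same work, different publishers/guises

for isbn, records in collated.items():
    print(isbn, "->", [r["format"] for r in records])
```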

December 25, 2012

A Paywall In Your Future? [Curated Data As Revenue Stream]

Filed under: News,Publishing — Patrick Durusau @ 8:23 pm

The New York Times Paywall Is Working Better Than Anyone Had Guessed by Edmund Lee.

From the post:

Ever since the New York Times rolled out its so-called paywall in March 2011, a perennial dispute has waged. Anxious publishers say they can’t afford to give away their content for free, while the blogger set claim paywalls tend to turn off readers accustomed to a free and open Web.

More than a year and a half later, it’s clear the New York Times’ paywall is not only valuable, it’s helped turn the paper’s subscription dollars, which once might have been considered the equivalent of a generous tithing, into a significant revenue-generating business. As of this year, the company is expected to make more money from subscriptions than from advertising — the first time that’s happened.

Digital subscriptions will generate $91 million this year, according to Douglas Arthur, an analyst with Evercore Partners. The paywall, by his estimate, will account for 12 percent of total subscription sales, which will top $768.3 million this year. That’s $52.8 million more than advertising. Those figures are for the Times newspaper and the International Herald Tribune, largely considered the European edition of the Times.

It’s a milestone that upends the traditional 80-20 ratio between ads and circulation that publishers once considered a healthy mix and that is now no longer tenable given the industrywide decline in newsprint advertising. Annual ad dollars at the Times, for example, has fallen for five straight years.

More importantly, subscription sales are rising faster than ad dollars are falling. During the 12 months after the paywall was implemented, the Times and the International Herald Tribune increased circulation dollars 7.1 percent compared with the previous 12-month period, while advertising fell 3.7 percent. Subscription sales more than compensated for the ad losses, surpassing them by $19.2 million in the first year they started charging readers online.

I don’t think gate-keeper and camera-ready copy publishers should take much comfort from this report.

Unlike those outlets, the New York Times has a “value-add” with regard to the news it reports.

Much like UI/UX design, the open question is: What do users see as a value-add? (Hopefully a significant number of users.)

A life or death question for a new content stream, fighting for attention.

November 4, 2012

Paying for What Was Free: Lessons from the New York Times Paywall

Filed under: Marketing,News,Publishing — Patrick Durusau @ 3:26 pm

Paying for What Was Free: Lessons from the New York Times Paywall

From the post:

In a national online longitudinal survey, participants reported their attitudes and behaviors in response to the recently implemented metered paywall by the New York Times. Previously free online content now requires a digital subscription to access beyond a small free monthly allotment. Participants were surveyed shortly after the paywall was announced and again 11 weeks after it was implemented to understand how they would react and adapt to this change. Most readers planned not to pay and ultimately did not. Instead, they devalued the newspaper, visited its Web site less frequently, and used loopholes, particularly those who thought the paywall would lead to inequality. Results of an experimental justification manipulation revealed that framing the paywall in terms of financial necessity moderately increased support and willingness to pay. Framing the paywall in terms of a profit motive proved to be a noncompelling justification, sharply decreasing both support and willingness to pay. Results suggest that people react negatively to paying for previously free content, but change can be facilitated with compelling justifications that emphasize fairness.

The original article: Jonathan E. Cook and Shahzeen Z. Attari. Cyberpsychology, Behavior, and Social Networking, ahead of print. doi:10.1089/cyber.2012.0251

Another data point in the struggle to find a viable model for delivery of online content.

The difficulty with “free” content, followed by the discovery that someone still has to pay the expenses for that content, is that consumers, once charged, gain nothing over what they had when the content was free. They are losers in that proposition.

I mention this because topic maps that provide content over the web face the same economic challenges as other online content providers.

A model that I haven’t seen (you may have, so sing out) is one that offers the content for free, but the links to other materials, the research that adds value to the content, are dead links without a subscription. True, someone could track down each and every reference, but if you are using the content as part of your job, do you really want to do that?

The full and complete content is simply made available, to anyone who wants a copy. After all, the wider the circulation of the content, the more free advertising you are getting for your publication.

Delivery of PDF files with citations, sans links, for non-subscribers is perhaps one line of XSL-FO code. It satisfies the question of “access” and yet leaves publishers a new area to fill with features and value-added content.
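Sketched in Python rather than XSL-FO, and assuming citations are marked up as hypothetical &lt;citation&gt; elements, the idea looks roughly like this:

```python
# Leave citation text intact but drop the live link for non-subscribers.
import xml.etree.ElementTree as ET

article = ET.fromstring(
    '<article><p>See <citation url="http://example.org/ref1">Smith 2012</citation>.</p></article>'
)

def strip_links(root, subscriber=False):
    """Non-subscribers keep the citations, but the links go dead."""
    if not subscriber:
        for cite in root.iter("citation"):
            cite.attrib.pop("url", None)  # citation survives, the link does not
    return root

print(ET.tostring(strip_links(article), encoding="unicode"))
```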

Take, for example, less than full article-level linking. If I have to read another thirty pages to discover that a citation was just boilerplate, I hardly need a citation network, do I? Of course, value-added content isn’t found directly under the lamp post; it requires some imagination.

October 24, 2012

JournalTOCs

Filed under: Data Source,Library,Library software,Publishing — Patrick Durusau @ 4:02 pm

JournalTOCs

Most publishers have TOC services for new issues of their journals.

JournalTOCs aggregates TOCs from publishers and maintains a searchable database of their TOC postings.

A database that is accessible via a free API, I should add.

The API should be a useful way to add journal articles to a topic map, particularly when you want to add selected articles and not entire issues.
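For instance, here is a sketch of pulling selected articles from a journal TOC feed; the endpoint shape below is my assumption based on the JournalTOCs API of the time, so check the current documentation before relying on it:

```python
# Pull a journal's TOC feed and keep only selected articles, not entire issues.
import urllib.request
import xml.etree.ElementTree as ET

ISSN = "1758-2946"  # Journal of Cheminformatics, as an example
url = f"http://www.journaltocs.ac.uk/api/journals/{ISSN}"  # assumed endpoint

with urllib.request.urlopen(url) as response:
    root = ET.parse(response).getroot()

def field(item, name):
    """Fetch a child element's text by local name, ignoring XML namespaces."""
    for child in item:
        if child.tag.rsplit("}", 1)[-1] == name:
            return (child.text or "").strip()
    return ""

for item in (el for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "item"):
    title = field(item, "title")
    if "semantic" in title.lower():  # select articles of interest
        print(title, "->", field(item, "link"))
```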

I am looking forward to using and exploring JournalTOCs.

Suggest you do the same.

September 9, 2012

Books as Islands/Silos – e-book formats

Filed under: eBooks,Publishing — Patrick Durusau @ 3:03 pm

After posting about the panel discussion on the future of the book, I looked up the listing of e-book formats at Wikipedia and found:

  1. Archos Diffusion
  2. Broadband eBooks (BBeB)
  3. Comic Book Archive file
  4. Compiled HTML
  5. DAISY – ANSI/NISO Z39.86
  6. Desktop Author
  7. DjVu
  8. EPUB
  9. eReader
  10. FictionBook (Fb2)
  11. Founder Electronics
  12. Hypertext Markup Language
  13. iBook (Apple)
  14. IEC 62448
  15. KF8 (Amazon Kindle)
  16. Microsoft LIT
  17. Mobipocket
  18. Multimedia eBooks
  19. Newton eBook
  20. Open Electronic Package
  21. Portable Document Format
  22. Plain text files
  23. Plucker
  24. PostScript
  25. SSReader
  26. TealDoc
  27. TEBR
  28. Text Encoding Initiative
  29. TomeRaider

Beyond the diversity of formats, the additional issue is that each book stands on its own.

Imagine hovering over a section of interest in a book and having relevant sections from other books displayed as well.

Is anyone working on a mapping across these various formats? (Not conversion, “mapping across” language chosen deliberately. Conversion might violate a EULA. Navigation with due regard to the EULA would be difficult to prohibit.)

I realize some of them are too seldom used for commercially viable material to be of interest. Or may be of interest only in certain markets (SSReader for instance).

Not the classic topic map case of identifying duplicate content in different guises but producing navigation across different formats to distinct material.
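As a sketch of what such a mapping across formats might look like: one topic per passage, with locators into each format the reader is licensed to open. All identifiers and locator syntaxes below are illustrative assumptions, not actual addressing schemes.

```python
# Mapping across, not converting: the works stay in their formats,
# the map only supplies navigation between them.

passage = {
    "subject": "recursion, introductory example",
    "locators": {
        "EPUB": "book.epub#epubcfi(/6/4!/4/10/2)",  # EPUB fragment (illustrative)
        "KF8": "kindle:loc=1234",                   # Kindle location (assumed form)
        "PDF": "pdf:page=57",
        "HTML": "https://example.org/book/ch3#recursion",
    },
}

def navigate(p, owned_formats):
    """Return locators only for formats the user is licensed to open (EULA intact)."""
    return {f: loc for f, loc in p["locators"].items() if f in owned_formats}

print(navigate(passage, owned_formats={"EPUB", "PDF"}))
```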

Books, Bookstores, Catalogs [30% Digital by end of 2012, Books as Islands/Silos]

Filed under: Books,Publishing — Patrick Durusau @ 2:12 pm

Books, Bookstores, Catalogs by Kevin Hillstrom.

From the post:

The parallels between books, bookstores, and catalogs are significant.

So take fifty minutes this weekend, and watch this session that was recently broadcast on BookTV, titled “The Future of the Book and Bookstore”.

This is fifty minutes of absolutely riveting television, seriously! Boring setting, riveting topic.

Jim Milliot (Publishers Weekly) tossed out an early tidbit: 30% of book sales will be digital by the end of 2012.

Lissa Muscatine, Politics & Prose bookstore owner: When books are a smaller part of the revenue stream, you have to diversify the revenue stream. Including print on demand from a catalog of 7 million books.

Sam Dorrance, Potomac Books (publisher): Hard copy sales will likely decrease by ten percent (10%) per year for the next several years.

Recurrent theme: Independent booksellers can provide guidance to readers. Not the same thing as “recommendation” because it is more nuanced.

Rafe Sagalyn, Sagalyn Literary Agency: Now a buyer’s market. Almost parity between hard copy and ebook sales.

Great panel but misses the point that books, hard copy or digital, remain isolated islands/silos.

Want to have a value-add that is revolutionary?

Create links across Kindle and other electronic formats, so that licensed users are not isolated within single works.

Did I hear someone say topic maps?

August 30, 2012

Applied and implied semantics in crystallographic publishing

Filed under: Publishing,Semantics — Patrick Durusau @ 10:54 am

Applied and implied semantics in crystallographic publishing by Brian McMahon. Journal of Cheminformatics 2012, 4:19 doi:10.1186/1758-2946-4-19.

Abstract:

Background

Crystallography is a data-rich, software-intensive scientific discipline with a community that has undertaken direct responsibility for publishing its own scientific journals. That community has worked actively to develop information exchange standards allowing readers of structure reports to access directly, and interact with, the scientific content of the articles.

Results

Structure reports submitted to some journals of the International Union of Crystallography (IUCr) can be automatically validated and published through an efficient and cost-effective workflow. Readers can view and interact with the structures in three-dimensional visualization applications, and can access the experimental data should they wish to perform their own independent structure solution and refinement. The journals also layer on top of this facility a number of automated annotations and interpretations to add further scientific value.

Conclusions

The benefits of semantically rich information exchange standards have revolutionised the scholarly publishing process for crystallography, and establish a model relevant to many other physical science disciplines.

A strong reminder to authors and publishers of the costs and benefits of making semantics explicit. (And the trade-offs involved.)
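As a toy illustration of the point: CIF data names such as _cell_length_a are machine-readable without guesswork. The sketch below parses only the simple tag-value form, ignoring loops and the rest of the CIF grammar.

```python
# Explicit semantics in practice: a few lines recover structured data
# from a CIF fragment because the data names are standardized.

cif_fragment = """\
_cell_length_a   5.4307
_cell_length_b   5.4307
_cell_length_c   5.4307
_symmetry_space_group_name_H-M   'F d -3 m'
"""

def parse_simple_cif(text):
    data = {}
    for line in text.splitlines():
        if line.startswith("_"):
            tag, _, value = line.partition(" ")
            data[tag] = value.strip().strip("'")
    return data

cell = parse_simple_cif(cif_fragment)
print(cell["_cell_length_a"], cell["_symmetry_space_group_name_H-M"])
```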

August 20, 2012

Topic Map Based Publishing

Filed under: Marketing,Publishing,Topic Map Software,Topic Maps — Patrick Durusau @ 10:21 am

After asking for ideas on publishing cheat sheets this morning, I have one to offer as well.

One problem with traditional cheat sheets is knowing what any particular user wants in a cheat sheet.

Another problem is how to expand the content of a cheat sheet.

And what if you want to sell the content? How does that work?

I don’t have a working version (yet) but here is my thinking on how topic maps could power a “cheat sheet” that meets all those requirements.

Solving the problem of what content to include seems critical to me. It is the make or break point in terms of attracting paying customers for a cheat sheet.

Content of no interest is as deadly as poor quality content. Either way, paying customers will vote with their feet.

The first step is to allow customers to “build” their own cheat sheet from some list of content. In topic map terminology, they specify an association between themselves and a set of topics to appear in “their” cheat sheet.
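A minimal sketch of that association, using plain dicts and hypothetical identifiers rather than any particular topic map engine:

```python
# The customer plays one role, the selected topics the other.
association = {
    "type": "selects-for-cheat-sheet",
    "roles": {
        "customer": "http://example.org/users/alice",        # hypothetical identifier
        "topics": [
            "http://example.org/topics/troff-macros",
            "http://example.org/topics/xslt-keys",
        ],
    },
}

def cheat_sheet_topics(assoc):
    """The topics that make up 'their' cheat sheet."""
    return assoc["roles"]["topics"]

print(cheat_sheet_topics(association))
```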

Most of the cheat sheets that I have seen (and printed out more than a few) are static artifacts. WYSIWYG artifacts. What there is and there ain’t no more.

That works for some things, but what if what you need to know lies just beyond the edge of the cheat sheet? That’s the bad thing about static artifacts: they have edges.

Beyond letting customers build their own cheat sheets, the only limits on a topic map based cheat sheet are those imposed by lack of payment or interest. 😉

You may not need troff syntax examples on a daily basis but there are times when they could come in quite handy. (Don’t laugh. Liam Quin got hired on the basis of the troff typesetting of his resume.)

The second step is to have a cheat sheet that can expand or contract based on the immediate needs of the user. Sometimes more or less content, depending on their need. Think of an expandable “nutshell” reference.

A WYWIWYG (What You Want Is What You Get) approach as opposed to WWWTSYIWYG (What We Want To Sell You Is What You Get) (any publishers come to mind?).

What’s more important? Your needs or the needs of your publisher?

Finally, how to “sell” the content? The value-add?

Here’s one model: The user buys a version of the cheat sheet, which has embedded links to additional content. Links that, when the user authenticates to a server, are treated as subject identifiers. Subject identifiers that cause merging to occur with topics on the server and deliver additional content. Each user’s subject identifiers can be auto-generated on purchase and so are uniquely tied to a particular login.
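A minimal sketch of that model, with all names and the storage scheme illustrative:

```python
# Every sale mints a unique subject identifier; presenting it while
# authenticated "merges" in the extra content on the server side.
import uuid

extra_content = {"troff-macros": "Extended troff macro examples..."}
issued = {}  # subject identifier -> (login, topic key)

def mint_identifier(login, topic_key):
    """Auto-generate a per-purchase subject identifier tied to one login."""
    sid = f"http://example.org/sid/{uuid.uuid4()}"
    issued[sid] = (login, topic_key)
    return sid

def resolve(sid, authenticated_login):
    """Deliver additional content only to the login the identifier was sold to."""
    owner, topic_key = issued.get(sid, (None, None))
    if owner == authenticated_login:
        return extra_content.get(topic_key)
    return None  # a freely copied sheet keeps the link but yields no content

sid = mint_identifier("alice", "troff-macros")
print(resolve(sid, "alice"))  # the purchaser sees the extra content
print(resolve(sid, "bob"))    # a copied sheet requires a separate purchase
```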

The user can freely distribute the version of the cheat sheet they purchased, free advertising for you. But the additional content requires a separate purchase by the new user.

What blind alleys, pot holes and other hazards/dangers am I failing to account for in this scenario?

July 3, 2012

Three Steps to Heaven: Semantic Publishing in a Real World Workflow

Filed under: Publishing,Semantics — Patrick Durusau @ 2:27 pm

Three Steps to Heaven: Semantic Publishing in a Real World Workflow by Phillip Lord, Simon Cockell, and Robert Stevens.

Abstract:

Semantic publishing offers the promise of computable papers, enriched visualisation and a realisation of the linked data ideal. In reality, however, the publication process contrives to prevent richer semantics while culminating in a `lumpen’ PDF. In this paper, we discuss a web-first approach to publication, and describe a three-tiered approach which integrates with the existing authoring tooling. Critically, although it adds limited semantics, it does provide value to all the participants in the process: the author, the reader and the machine.

With a touch of irony and gloom the authors write:

… There are significant barriers to the acceptance of semantic publishing as a standard mechanism for academic publishing. The web was invented around 1990 as a light-weight mechanism for publication of documents. It has subsequently had a massive impact on society in general. It has, however, barely touched most scientific publishing; while most journals have a website, the publication process still revolves around the generation of papers, moving from Microsoft Word or LaTeX [5], through to a final PDF which looks, feels and is something designed to be printed onto paper⁴. Adding semantics into this environment is difficult or impossible; the content of the PDF has to be exposed and semantic content retrofitted or, in all likelihood, a complex process of author and publisher interaction has to be devised and followed. If semantic data publishing and semantic publishing of academic narratives are to work together, then academic publishing needs to change.

4. This includes conferences dedicated to the web and the use of web technologies.

One could add “…includes papers about changing the publishing process” but I digress.

I don’t disagree that adding semantics to the current system has proved problematic.

I do disagree that changing the current system, which is deeply embedded in research, publishing and social practices is likely to succeed.

At least if success is defined as a general solution to adding semantics to scientific research and publishing in general. Such projects may be successful in creating new methods of publishing scientific research but that just expands the variety of methods we must account for.

That doesn’t have a “solution like” feel to me. You?

July 2, 2012

Readersourcing—a manifesto

Filed under: Crowd Sourcing,Publishing,Reviews — Patrick Durusau @ 5:24 pm

Readersourcing—a manifesto by Stefano Mizzaro. (Mizzaro, S. (2012), Readersourcing—a manifesto. J. Am. Soc. Inf. Sci. doi:10.1002/asi.22668)

Abstract:

This position paper analyzes the current situation in scholarly publishing and peer review practices and presents three theses: (a) we are going to run out of peer reviewers; (b) it is possible to replace referees with readers, an approach that I have named “Readersourcing”; and (c) it is possible to avoid potential weaknesses in the Readersourcing model by adopting an appropriate quality control mechanism. The readersourcing.org system is then presented as an independent, third-party, nonprofit, and academic/scientific endeavor aimed at quality rating of scholarly literature and scholars, and some possible criticisms are discussed.

Mizzaro touches on a number of issues that have only speculative answers in his call for “readersourcing” of research. There is a website in progress, www.readersourcing.org.

I am interested in the approach as an aspect of crowdsourcing the creation of topic maps.

FYI, his statement that:

Readersourcing is a solution to a problem, but it immediately raises another problem, for which we need a solution: how to distinguish good readers from bad readers. If 200 undergraduate students say that a paper is good, but five experts (by reputation) in the field say that it is not, then it seems obvious that the latter should be given more importance when calculating the paper’s quality.

That seems problematic to me, particularly for graduate students. If professors at their school rate research high or low, that should be factored into the rating weight given to that particular reader.
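To make the weighting question concrete, here is a sketch of one possible mechanism, a reputation-weighted average; the weighting rule is my assumption, not Mizzaro’s actual quality control mechanism, which readersourcing.org would define:

```python
# Weight each reader's rating by a reputation score in [0, 1].

def weighted_quality(ratings):
    """ratings: list of (score in [0, 1], reader_reputation in [0, 1])."""
    total_weight = sum(rep for _, rep in ratings)
    if total_weight == 0:
        return None
    return sum(score * rep for score, rep in ratings) / total_weight

students = [(0.9, 0.1)] * 200   # 200 undergraduates, low reputation
experts = [(0.2, 0.9)] * 5      # 5 experts by reputation

print(round(weighted_quality(students + experts), 3))
# 200 * 0.1 = 20 total weight vs 5 * 0.9 = 4.5: the crowd still dominates,
# which is exactly the calibration problem discussed above.
```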

If that seems pessimistic, read: Fish, Stanley, “Transmuting the Lump: Paradise Lost, 1942-1979,” in Doing What Comes Naturally (Duke University Press, 1989), which treats changing “expert” opinions on the closing chapters of Paradise Lost. So far as I know, the text did not change between 1942 and 1979 but “expert” opinion certainly did.

I offer that as a caution that all of our judgements are a matter of social consensus that changes over time. On some issues more quickly than others. Our information systems should reflect the ebb and flow of that semantic renegotiation.

March 25, 2012

How to Get Published – Elsevier

Filed under: Publishing — Patrick Durusau @ 7:15 pm

How to Get Published.

Author training webcasts from Elsevier.

Whether you are thinking about publishing in professional journals or simply want to improve (write?) useful user documentation, this isn’t a bad resource.

November 12, 2011

Real scientists never report fraud

Filed under: Peer Review,Publishing,Research Methods — Patrick Durusau @ 8:41 pm

Real scientists never report fraud

Daniel Lemire writes (in part):

People who want to believe that “peer reviewed work” means “correct work” will object that this is just one case. But what about the recently dismissed Harvard professor Marc Hauser? We find exactly the same story. Marc Hauser published over 200 papers in the best journals, making up data as he went. Again colleagues, journals and collaborators failed to openly challenge him: it took naive students, that is, outsiders, to report the fraud.

While I agree that other “professionals” may not have time to closely check work in the peer review process (see some of the comments), I think that illustrates the valuable role that students can play in the publication process.

Why not have a departmental requirement that papers for publication be circulated among students with an anonymous but public comment mechanism? Students are as pressed for time as anyone but they have the added incentive of wanting to become skilled at criticism of ideas and writing.

Not only would such a review process increase the likelihood of detection of fraud, but it would catch all manner of poor writing or citation practices. I regularly encounter published CS papers that incorrectly cite other published work or that cite work eventually published but under other titles. No fraud, just poor practices.
