Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 14, 2014

Is PDF the Problem?

Filed under: Bibliometrics,PDF — Patrick Durusau @ 1:39 pm

The solutions to all our problems may be buried in PDFs that nobody reads by Christopher Ingraham.

From the post:

What if someone had already figured out the answers to the world’s most pressing policy problems, but those solutions were buried deep in a PDF, somewhere nobody will ever read them?

According to a recent report by the World Bank, that scenario is not so far-fetched. The bank is one of those high-minded organizations — Washington is full of them — that release hundreds, maybe thousands, of reports a year on policy issues big and small. Many of these reports are long and highly technical, and just about all of them get released to the world as a PDF report posted to the organization’s Web site.

The World Bank recently decided to ask an important question: Is anyone actually reading these things? They dug into their Web site traffic data and came to the following conclusions: Nearly one-third of their PDF reports had never been downloaded, not even once. Another 40 percent of their reports had been downloaded fewer than 100 times. Only 13 percent had seen more than 250 downloads in their lifetimes. Since most World Bank reports have a stated objective of informing public debate or government policy, this seems like a pretty lousy track record.

I’m not so sure that the PDF format, annoying as it can be, is the main reason World Bank reports go unread.

Consider Rose Eveleth’s recent (2014) Academics Write Papers Arguing Over How Many People Read (And Cite) Their Papers.

Eveleth writes:

There are a lot of scientific papers out there. One estimate puts the count at 1.8 million articles published each year, in about 28,000 journals. Who actually reads those papers? According to one 2007 study, not many people: half of academic papers are read only by their authors and journal editors, the study’s authors write.

But not all academics accept that they have an audience of three. There’s a heated dispute around academic readership and citation—enough that there have been studies about reading studies going back for more than two decades.

In the 2007 study, the authors introduce their topic by noting that “as many as 50% of papers are never read by anyone other than their authors, referees and journal editors.” They also claim that 90 percent of papers published are never cited. Some academics are unsurprised by these numbers. “I distinctly remember focusing not so much on the hyper-specific nature of these research topics, but how it must feel as an academic to spend so much time on a topic so far on the periphery of human interest,” writes Aaron Gordon at Pacific Standard. “Academia’s incentive structure is such that it’s better to publish something than nothing,” he explains, even if that something is only read by you and your reviewers.

Fifty percent (50%) of papers have an audience of three? Bear in mind that these aren’t papers from the World Bank but papers spread across a range of disciplines.

Before you decide that PDF format is the issue or that academic journal articles aren’t read, you need to consider other evidence from sources such as: Measuring Total Reading of Journal Articles, Donald W. King, Carol Tenopir, and Michael Clarke, D-Lib Magazine, October 2006, Volume 12 Number 10, ISSN 1082-9873.

King, Tenopir, and Clarke write in part:

The Myth of Low Use of Journal Articles

A myth that journal articles are read infrequently persisted over a number of decades (see, for example, Williams 1975, Lancaster 1978, Schauder 1994, Odlyzko 1996). In fact, early on this misconception led to a series of studies funded by the National Science Foundation (NSF) in the 1960s and 1970s to seek alternatives to traditional print journals, which were considered by many to be a huge waste of paper. The basis for this belief was generally twofold. First, many considered citation counts to be the principal indicator of reading articles, and studies showed that articles averaged about 10 to 20 citations to them (a number that has steadily grown over the past 25 years). Counts of citations to articles tend to be highly skewed with a few articles having a large number of citations and many with few or even no citation to them. This led to the perception that articles were read infrequently or simply not at all.

King, Tenopir, and Clarke make a convincing case that “readership” for an article is a more complex question than checking download statistics.

Let’s say that the question of usage/reading of reports/articles is open to debate. Depending on who you ask, some measures are thought to be better than others.

But there is a common factor that all of these studies ignore: usage, however you define it, is measured at the level of the whole article or paper.

What if instead of looking for an appropriate World Bank PDF (or other format) file, I could search for the data used in such a file? Or the analysis of some particular data that is listed in a file? I may or may not be interested in the article as a whole.

An author’s arrangement of data and their commentary on it is only one presentation of that data. Shouldn’t we divorce access to the data from reading it through the lens of the author?

If we want greater re-use of experimental, financial, survey and other data, then let’s stop burying it in an author’s presentation, whether delivered as print, PDF, or some other format.
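To make the idea of data-level access a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Report` class, the `build_dataset_index` and `find_reports_using` functions, and the sample records are invented, and nothing here reflects how the World Bank or any publisher actually stores its reports. It only shows that once the datasets inside a report are listed separately, a reader can start from the data and work back to the presentations that use it.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Report:
    """A published report: one file wrapping several datasets (hypothetical model)."""
    title: str
    datasets: List[str] = field(default_factory=list)


def build_dataset_index(reports: List[Report]) -> Dict[str, List[str]]:
    """Map each dataset name (lowercased) to the titles of the reports that use it."""
    index = defaultdict(list)
    for report in reports:
        for dataset in report.datasets:
            index[dataset.lower()].append(report.title)
    return dict(index)


def find_reports_using(index: Dict[str, List[str]], dataset: str) -> List[str]:
    """Look up reports by the data they contain rather than by their title."""
    return index.get(dataset.lower(), [])


# Invented sample records, for illustration only.
reports = [
    Report("Report A", ["household survey 2010", "rainfall series"]),
    Report("Report B", ["household survey 2010"]),
    Report("Report C", ["trade flows"]),
]

index = build_dataset_index(reports)
print(find_reports_using(index, "Household Survey 2010"))
# prints ['Report A', 'Report B']
```

The design choice worth noticing is that the dataset, not the report, is the key. A report-level index can only tell you that a file exists; an index like this one can tell you which presentations share the same underlying data, whether or not you ever read any of them in full.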

I first saw this in a tweet by Duncan Hull.

June 3, 2012

Discussion of scholarly information in research blogs

Filed under: Bibliometrics,Blogs,Citation Analysis,Citation Indexing — Patrick Durusau @ 3:13 pm

Discussion of scholarly information in research blogs by Hadas Shema.

From the post:

As some of you know, Mike Thelwall, Judit Bar-Ilan (both are my dissertation advisors) and myself published an article called “Research Blogs and the Discussion of Scholarly Information” in PLoS One. Many people showed interest in the article, and I thought I’d write a “director’s commentary” post. Naturally, I’m saving all your tweets and blog posts for later research.

The Sample

We characterized 126 blogs with 135 authors from Researchblogging.Org (RB), an aggregator of blog posts dealing with peer-review research. Two over-achievers had two blogs each, and 11 blogs had two authors.

While our interest in research blogs started before we ever heard of RB, it was reading an article using RB that really kick-started the project. Groth & Gurney (2010) wrote an article titled “Studying scientific discourse on the Web using bibliometrics: A chemistry blogging case study.” The article made for a fascinating read, because it applied bibliometric methods to blogs. Just like it says in the title, Groth & Gurney took the references from 295 blog posts about Chemistry and analyzed them the way one would analyze citations from peer-reviewed articles. They managed that because they used RB, which aggregates only posts by bloggers who take the time to formally cite their sources. Major drooling ensued at that point. People citing in a scholarly manner out of their free will? It’s Christmas!

Questions that stand out for me on blogs:

Will our indexing/searching of blogs have the same all-or-nothing granularity as our indexing/searching of scholarly articles?

If not, why not?
