Scienceography: the study of how science is written by Graham Cormode, S. Muthukrishnan and Jinyun Yun.
Abstract:
Scientific literature has itself been the subject of much scientific study, for a variety of reasons: understanding how results are communicated, how ideas spread, and assessing the influence of areas or individuals. However, most prior work has focused on extracting and analyzing citation and stylistic patterns. In this work, we introduce the notion of ‘scienceography’, which focuses on the writing of science. We provide a first large scale study using data derived from the arXiv e-print repository. Crucially, our data includes the “source code” of scientific papers (the LaTeX source), which enables us to study features not present in the “final product”, such as the tools used and private comments between authors. Our study identifies broad patterns and trends in two example areas, computer science and mathematics, as well as highlighting key differences in the way that science is written in these fields. Finally, we outline future directions to extend the new topic of scienceography.
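To make "features not present in the final product" concrete, here is a minimal sketch of pulling two such features out of a LaTeX file: the author comments (text after an unescaped `%`) and the packages loaded with `\usepackage`. The function names are hypothetical, and this is not the authors' extraction pipeline. It is a regex heuristic that assumes one comment per line and will misread edge cases such as `\\%` or `%` inside `\verb`.

```python
import re

# Hypothetical helpers, a regex heuristic only (not the paper's pipeline).
# A '%' starts a comment unless it is escaped as '\%'.
COMMENT_RE = re.compile(r"(?<!\\)%(.*)$")
# Matches \usepackage{a,b} and \usepackage[options]{a}.
PACKAGE_RE = re.compile(r"\\usepackage(?:\[[^\]]*\])?\{([^}]*)\}")


def latex_comments(source: str) -> list[str]:
    """Return the non-empty comment text found on each source line."""
    comments = []
    for line in source.splitlines():
        match = COMMENT_RE.search(line)
        if match and match.group(1).strip():
            comments.append(match.group(1).strip())
    return comments


def strip_comments(source: str) -> str:
    """Remove everything from the first unescaped '%' to end of line."""
    return "\n".join(COMMENT_RE.sub("", line, count=1)
                     for line in source.splitlines())


def latex_packages(source: str) -> list[str]:
    """List packages actually loaded (commented-out ones are ignored)."""
    packages = []
    for match in PACKAGE_RE.finditer(strip_comments(source)):
        packages.extend(name.strip() for name in match.group(1).split(","))
    return packages


doc = r"""\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,graphicx}
%\usepackage{tikz}
\begin{document}
We save 50\% of the cost. % TODO: double-check this number
\end{document}
"""
print(latex_packages(doc))  # ['inputenc', 'amsmath', 'graphicx']
print(latex_comments(doc))  # ['\\usepackage{tikz}', 'TODO: double-check this number']
```

Note that the commented-out `tikz` line shows up as a comment rather than as a tool in use, which is exactly the kind of trace the PDF never shows.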
What content are you searching/indexing in a scientific context?
The authors discover what many of us have overlooked: the “source” of scientific papers, a source that can reflect a richer history than the final product.
Some questions:
Will searching the source give us finer-grained access to the content? That is, can we separate the portions of text that recite history, related research, and background from the new insights and conclusions, so that we access the other material only if needed? (Every graph paper starts off with nodes and edges, complete with citations. Anyone reading a graph paper is likely to know those terms.)
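One crude way to probe that question is to split the LaTeX source at its `\section` commands and set aside sections whose titles look like recited material. The sketch below assumes the paper uses plain `\section{...}` or `\section*{...}` headings, and the keyword list `BACKGROUND_TITLES` is a hypothetical choice of mine, not anything reported in the paper. Real papers mix background into other sections, so this is a starting point rather than an answer.

```python
import re

# Hypothetical keywords marking sections that mostly recite known material.
BACKGROUND_TITLES = ("introduction", "background", "related work", "preliminaries")

# Plain \section{...} or \section*{...}; \section[short]{long} is not handled.
SECTION_RE = re.compile(r"\\section\*?\{([^}]*)\}")


def split_sections(source):
    """Split LaTeX source into (title, body) pairs, one per section command."""
    matches = list(SECTION_RE.finditer(source))
    stop = source.find(r"\end{document}")
    if stop == -1:
        stop = len(source)
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else stop
        sections.append((m.group(1).strip(), source[m.end():end].strip()))
    return sections


def new_material(source):
    """Keep only sections whose titles match none of the background keywords."""
    return [(title, body) for title, body in split_sections(source)
            if not any(k in title.lower() for k in BACKGROUND_TITLES)]


paper = r"""\begin{document}
\section{Introduction}
Graphs have nodes and edges.
\section{Related Work}
Prior work on citation analysis.
\section{Our Method}
A new insight.
\section*{Conclusions}
It works.
\end{document}
"""
for title, body in new_material(paper):
    print(title, "->", body)
# Our Method -> A new insight.
# Conclusions -> It works.
```

Because it works on section titles only, this filter depends on authors labeling their sections conventionally; it would not catch the "nodes and edges" paragraph when that paragraph opens an otherwise novel section.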
Other disciplines use LaTeX. Do those LaTeX files differ from the ones reported here? If so, in what way?