Graphic whimsy via Bruce Sterling, bruces@well.com.
Are your information requirements met by finding something or by finding the right thing?
Just in case you missed Cory’s tweet on April 21, 2016:
Saying “Librarians are obsolete now that we have the Internet” is like saying “Doctors are obsolete now that we have the plague”
If that doesn’t make sense to you:
Do you get it now?
Modern Pathfinders: Creating Better Research Guides by Jason Puckett.
From the Amazon description:
Whether you call them research guides, subject guides or pathfinders, web-based guides are a great way to create customized support tools for a specific audience: a class, a group, or anyone engaging in research. Studies show that library guides are often difficult, confusing, or overwhelming, causing users to give up and just fall back on search engines such as Google. How can librarians create more effective, less confusing, and simply better research guides?
In Modern Pathfinders: Creating Better Research Guides, author Jason Puckett takes proven ideas from instructional design and user experience web design and combines them into easy-to-understand principles for making your research guides better teaching tools. It doesn’t matter what software your library uses; the advice and techniques in this book will help you create guides that are easier for your users to understand and more effective to use.
This may be a very good book.
I say “may be” because at $42.00 for 157 pages in paperback and/or Kindle, I’m very unlikely to find out.
The American Library Association (publisher of this work) is doing its members, authors and the reading public a disservice by maintaining a pinched audience for its publications.
Works by librarians, on pathfinders in particular, would help (albeit belatedly) the technologists who have tried to recreate the labor of librarians. Poorly.
If and when this work appears at a more reasonable price, I hope to offer a review for your consideration.
How to Read a Paper by S. Keshav.
Abstract:
Researchers spend a great deal of time reading research papers. However, this skill is rarely taught, leading to much wasted effort. This article outlines a practical and efficient three-pass method for reading research papers. I also describe how to use this method to do a literature survey.
Sean Cribbs mentions this paper in The Refreshingly Rewarding Realm of Research Papers, but it is important enough for a separate post.
You should keep a copy of it at hand until the three-pass method becomes habit.
Other resources that Keshav mentions:
T. Roscoe, Writing Reviews for Systems Conferences
H. Schulzrinne, Writing Technical Articles
G.M. Whitesides, Whitesides’ Group: Writing a Paper (updated URL)
All three are fairly short and well worth your time to read and re-read.
That goes for experienced writers as well!
After more than thirty years of professional writing, I still benefit from well-written writing/editing advice.
55 Articles Every Librarian Should Read (Updated) by Christina Magnifico.
The articles cover a wide range of subjects, but remember the line:
“People become librarians because they know too much.”
A good starting place if you are looking for sparks for new ideas.
Enjoy!
Comprehensive Index of Legal Reports (Law Library of Congress)
From the announcement that came via email:
In an effort to highlight the legal reports produced by the Law Library of Congress, we have revamped our display of the reports on our website.
The new Comprehensive Index of Legal Reports will house all reports available on our website. This will also be the exclusive location to find reports written before 2011.
The reports listed on the Comprehensive Index page are divided into specific topics designed to point you to the reports of greatest interest and relevance. Each report listed is under only one topic and several topics are not yet filled (“forthcoming”). We plan to add many reports from our archives to this page over the next few months, filling in all of the topics.
The Current Legal Topics page (http://www.loc.gov/law/help/current-topics.php) will now only contain the most current reports. The list of reports by topic also includes a short description explaining what you will find in each report.
No links will be harmed in this change, so any links you have created to individual reports will continue to work. Just remember to add http://loc.gov/law/help/legal-reports.php as a place to find research, especially of a historical nature, and http://loc.gov/law/help/current-topics.php to find recently written reports.
There are US entities that rival the British Library and the British Museum. The Library of Congress is one of those, as is the Law Library of Congress (the law library is a part of the Library of Congress but merits separate mention).
Ever greedy, I would like to see something similar for the Congressional Research Service.
From the webpage:
The Congressional Research Service (CRS) works exclusively for the United States Congress, providing policy and legal analysis to committees and Members of both the House and Senate, regardless of party affiliation. As a legislative branch agency within the Library of Congress, CRS has been a valued and respected resource on Capitol Hill for more than a century.
CRS is well-known for analysis that is authoritative, confidential, objective and nonpartisan. Its highest priority is to ensure that Congress has 24/7 access to the nation’s best thinking.
Imagine US voters being given “…analysis that is authoritative, …, objective and nonpartisan,” analysis that they are paying for today and have been paying for over the last century.
I leave it to your imagination why Congress would prefer to have “confidential” reports that aren’t available to ordinary citizens. Do you prefer incompetence or malice?
Cybersecurity: Authoritative Reports and Resources, by Topic by Rita Tehan, Information Specialist (Congressional Research Service).
From the summary:
This report provides references to analytical reports on cybersecurity from CRS, other government agencies, trade associations, and interest groups. The reports and related websites are grouped under the following cybersecurity topics:
- Policy overview
- National Strategy for Trusted Identities in Cyberspace (NSTIC)
- Cloud computing and the Federal Risk and Authorization Management Program (FedRAMP)
- Critical infrastructure
- Cybercrime, data breaches, and data security
- National security, cyber espionage, and cyberwar (including Stuxnet)
- International efforts
- Education/training/workforce
- Research and development (R&D)
In addition, the report lists selected cybersecurity-related websites for congressional and government agencies; news; international organizations; and other organizations, associations, and institutions.
Great report on cybersecurity resources!
As well as a killer demo for why we need librarians, now more than ever.
Here’s the demo. Print the cover page of the report and show it to a library doubter.
Let them pick a category from the table of contents, then you count the number of federal government resources in that section. Give them a week to duplicate the contents of the section they have chosen. 😉
Anyone, including your colleagues, can find something relevant on the WWW. The question is whether they can find all the good stuff.
(Sorry, librarians and search experts are not eligible for this challenge.)
I first saw this in Gary Price’s The Research Desk.
D-Lib Magazine January/February 2015
From the table of contents (see the original toc for abstracts):
Editorials
2nd International Workshop on Linking and Contextualizing Publications and Datasets by Laurence Lannom, Corporation for National Research Initiatives
Data as “First-class Citizens” by Łukasz Bolikowski, ICM, University of Warsaw, Poland; Nikos Houssos, National Documentation Centre / National Hellenic Research Foundation, Greece; Paolo Manghi, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Italy and Jochen Schirrwagen, Bielefeld University Library, Germany
Articles
Semantic Enrichment and Search: A Case Study on Environmental Science Literature by Kalina Bontcheva, University of Sheffield, UK; Johanna Kieniewicz and Stephen Andrews, British Library, UK; Michael Wallis, HR Wallingford, UK
A-posteriori Provenance-enabled Linking of Publications and Datasets via Crowdsourcing by Laura Drăgan, Markus Luczak-Rösch, Elena Simperl, Heather Packer and Luc Moreau, University of Southampton, UK; Bettina Berendt, KU Leuven, Belgium
A Framework Supporting the Shift from Traditional Digital Publications to Enhanced Publications by Alessia Bardi and Paolo Manghi, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Italy
Science 2.0 Repositories: Time for a Change in Scholarly Communication by Massimiliano Assante, Leonardo Candela, Donatella Castelli, Paolo Manghi and Pasquale Pagano, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Italy
Data Citation Practices in the CRAWDAD Wireless Network Data Archive by Tristan Henderson, University of St Andrews, UK and David Kotz, Dartmouth College, USA
A Methodology for Citing Linked Open Data Subsets by Gianmaria Silvello, University of Padua, Italy
Challenges in Matching Dataset Citation Strings to Datasets in Social Science by Brigitte Mathiak and Katarina Boland, GESIS Leibniz Institute for the Social Sciences, Germany
Enabling Living Systematic Reviews and Clinical Guidelines through Semantic Technologies by Laura Slaughter, The Interventional Centre, Oslo University Hospital (OUS), Norway; Christopher Friis Berntsen and Linn Brandt, Internal Medicine Department, Innlandet Hospital Trust and MAGICorg, Norway and Chris Mavergames, Informatics and Knowledge Management Department, The Cochrane Collaboration, Germany
Data without Peer: Examples of Data Peer Review in the Earth Sciences by Sarah Callaghan, British Atmospheric Data Centre, UK
The Tenth Anniversary of Assigning DOI Names to Scientific Data and a Five Year History of DataCite by Jan Brase and Irina Sens, German National Library of Science and Technology, Germany and Michael Lautenschlager, German Climate Computing Centre, Germany
News & Events
In Brief: Short Items of Current Awareness
In the News: Recent Press Releases and Announcements
Clips & Pointers: Documents, Deadlines, Calls for Participation
Meetings, Conferences, Workshops: Calendar of Activities Associated with Digital Libraries Research and Technologies
The quality of D-Lib Magazine meets or exceeds the quality claimed by pay-per-view publishers.
Enjoy!
This is your Brain on Big Data: A Review of “The Organized Mind” by Stephen Few.
From the post:
In the past few years, several fine books have been written by neuroscientists. In this blog I’ve reviewed those that are most useful and placed Daniel Kahneman’s Thinking, Fast & Slow at the top of the heap. I’ve now found its worthy companion: The Organized Mind: Thinking Straight in the Age of Information Overload.
This new book by Daniel J. Levitin explains how our brains have evolved to process information and he applies this knowledge to several of the most important realms of life: our homes, our social connections, our time, our businesses, our decisions, and the education of our children. Knowing how our minds manage attention and memory, especially their limitations and the ways that we can offload and organize information to work around these limitations, is essential for anyone who works with data.
…
See Stephen’s review for an excerpt from the introduction and summary comments on the work as a whole.
I am particularly looking forward to reading Levitin’s take on the transfer of information tasks to us and the resulting cognitive overload.
I don’t have the volume yet, but it occurs to me that the shift from indexes (the Readers’ Guide to Periodical Literature and the like) and librarians to full-text search engines is yet another example of the transfer of information tasks to us.
Indexers and librarians do a better job of finding information than we do because discovery of information is a difficult intellectual task. Well, perhaps discovering relevant and useful information is the difficult task. Almost without exception, every search on a major search engine produces a result. Perhaps not a useful result, but a result nonetheless.
Using indexers and librarians will produce a line item in someone’s budget. What is needed is research on the differential between the results from indexers/librarians and our own, and what that difference translates to as a line item in enterprise budgets.
That type of research could influence university, government and corporate budgets as the information age moves into high gear.
The Organized Mind by Daniel J. Levitin is a must have for the holiday wish list!
Realtime personalization and recommendation with stream mining by Mikio L. Braun.
From the post:
Last Tuesday, I gave a talk at this year’s Berlin Buzzwords conference on using stream mining algorithms to efficiently store information extracted from user behavior to perform personalization and recommendation effectively already using a single computer, which is of course key behind streamdrill.
If you’ve been following my talks, you’ll probably recognize a lot of stuff I’ve talked about before, but what is new in this talk is that I tried to take the next step from simply talking about Heavy Hitters and Count-Min Sketches to using these data structures as an approximate storage for all kinds of analytics-related data like counts, profiles, or even sparse matrices, as they occur in recommendation algorithms.
I think reformulating our approach as basically an efficient approximate data structure also helped to steer the discussion away from comparing streamdrill to other big data frameworks (“Can’t you just do that in Storm?” — “define ‘just’”). As I said in the talk, the question is not whether you can do it in Big Data Framework X, because you probably could. I have started to look at it from the other direction: we did not use any Big Data framework and were still able to achieve some serious performance numbers.
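Since the talk leans on the Count-Min Sketch as approximate storage for counts, a minimal sketch of that data structure may help make the idea concrete. This is my own illustration, not streamdrill’s code, and the width/depth values are arbitrary assumptions:

```python
import hashlib

class CountMinSketch:
    """Approximate counts in fixed memory; estimates never under-count."""

    def __init__(self, width=2048, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One hash function per row; md5 keeps the sketch dependency-free.
        for row in range(self.depth):
            digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the row minimum is closest.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for url in ["/home", "/about", "/home", "/home"]:
    cms.add(url)
print(cms.estimate("/home"))  # 3 (may over-count, never under-counts)
```

The appeal is exactly what Braun describes: memory is fixed up front, no matter how many distinct items the stream contains.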
Slides and video are available at this page.
Using Google Search Appliance (GSA) to Search Digital Library Collections: A Case Study of the INIS Collection Search by Dobrica Savic.
From the post:
In February 2014, I gave a presentation at the conference on Faster, Smarter and Richer: Reshaping the library catalogue (FSR 2014), which was organized by the Associazione Italiana Biblioteche (AIB) and Biblioteca Apostolica Vaticana in Rome, Italy. My presentation focused on the experience of the International Nuclear Information System (INIS) in using Google Search Appliance (GSA) to search digital library collections at the International Atomic Energy Agency (IAEA).
Libraries are facing many challenges today. In addition to diminished funding and increased user expectations, the use of classic library catalogues is becoming an additional challenge. Library users require fast and easy access to information resources, regardless of whether the format is paper or electronic. Google Search, with its speed and simplicity, has established a new standard for information retrieval which did not exist with previous generations of library search facilities. Put in a position of David versus Goliath, many small, and even larger libraries, are losing the battle to Google, letting many of its users utilize it rather than library catalogues.
The International Nuclear Information System (INIS)
The International Nuclear Information System (INIS) hosts one of the world's largest collections of published information on the peaceful uses of nuclear science and technology. It offers on-line access to a unique collection of 3.6 million bibliographic records and 483,000 full texts of non-conventional (grey) literature. This large digital library collection suffered from most of the well-known shortcomings of the classic library catalogue. Searching was complex and complicated, it required training in Boolean logic, full-text searching was not an option, and response time was slow. An opportune moment to improve the system came with the retirement of the previous catalogue software and the adoption of Google Search Appliance (GSA) as an organization-wide search engine standard.
….
To be completely honest, my first reaction wasn’t a favorable one.
But even the complete blog post does not do justice to the project in question.
Take a look at the slides, which include screen shots of the new interface before reaching an opinion.
Take this as a lesson on what your search interface should be offering by default.
There are always other screens you can fill with advanced features.
Data Mining the Internet Archive Collection by Caleb McDaniel.
From the “Lesson Goals:”
The collections of the Internet Archive (IA) include many digitized sources of interest to historians, including early JSTOR journal content, John Adams’s personal library, and the Haiti collection at the John Carter Brown Library. In short, to quote Programming Historian Ian Milligan, “The Internet Archive rocks.”
In this lesson, you’ll learn how to download files from such collections using a Python module specifically designed for the Internet Archive. You will also learn how to use another Python module designed for parsing MARC XML records, a widely used standard for formatting bibliographic metadata.
For demonstration purposes, this lesson will focus on working with the digitized version of the Anti-Slavery Collection at the Boston Public Library in Copley Square. We will first download a large collection of MARC records from this collection, and then use Python to retrieve and analyze bibliographic information about items in the collection. For example, by the end of this lesson, you will be able to create a list of every named place from which a letter in the antislavery collection was written, which you could then use for a mapping project or some other kind of analysis.
This rocks!
In particular for librarians and library students who will already be familiar with MARC records.
Some 7,000 items from the Boston Public Library’s anti-slavery collection at Copley Square are the focus of this lesson.
That means historians have access to rich metadata, full images, and partial descriptions for thousands of antislavery letters, manuscripts, and publications.
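For a taste of the lesson’s workflow, here is a condensed sketch using the internetarchive and pymarc packages. The collection identifier and the _marc.xml naming convention follow the lesson’s example as I understand it; treat the details as illustrative and defer to the lesson itself:

```python
from itertools import islice

import pymarc
from internetarchive import get_item, search_items

places = []
# Sample a few items from the lesson's example collection.
for result in islice(search_items('collection:bplscas'), 5):
    item = get_item(result['identifier'])
    for f in item.get_files():
        if f.name.endswith('_marc.xml'):
            f.download(file_path=f.name)  # save the MARC XML locally
            for record in pymarc.parse_xml_to_array(f.name):
                for field in record.get_fields('260'):
                    # 260$a holds the place a letter was written/published.
                    places.extend(field.get_subfields('a'))

print(places)
```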
Would original anti-slavery materials, written by actual participants, have interested you as a student? Do you think such materials would interest students now?
I first saw this in a tweet by Gregory Piatetsky.
Texas Conference on Digital Libraries 2013
Abstracts and in many cases presentations from the Texas Conference on Digital Libraries 2013.
A real treasure trove on digital libraries projects and issues.
Library: A place where IR isn’t limited by software.
“Why don’t libraries get better the more they are used?”
From the post:
On June 19-20, 2013, the 8th Handheld Librarian Online Conference will take place, an online conference about encouraging innovation inside libraries.
Register now, as an individual, group or site, and receive access to all interactive, live online events and recordings of the sessions!
(…)
The keynote presentation is delivered by Michael Edson, Smithsonian Institution’s Director of Web and New Media Strategy, and is entitled “Faking the Internet”. His central question:
“Why don’t libraries get better the more they are used? Not just a little better—exponentially better, like the Internet. They could, and, in a society facing colossal challenges, they must, but we won’t get there without confronting a few taboos about what a library is, who it’s for, and who’s in charge.”
I will register for this conference.
Mostly to hear Michael Edson’s claim that the Internet has gotten “exponentially better.”
In my experience (yours?), the Internet has gotten exponentially noisier.
If you don’t believe me, write down a question (not the query) and give it to ten (10) random people outside your IT department or library.
Have them print out the first page of search results.
Enough proof?
Edson’s point that information resources should improve with use, on the other hand, is a good one.
For example, contrast your local librarian with a digital resource.
The more questions your librarian fields, the better they become with related information and resources on any subject.
A digital resource, no matter how many times it is queried, returns the same result.
A librarian is a dynamic accumulator of information and relationships between information. A digital resource is a static reporter of information.
Unlike librarians, digital resources aren’t designed to accumulate new information or relationships between information from users at the point of interest. (A blog response several screen scrolls away is unseen and unhelpful.)
What we need are UIs for digital resources that enable users to map into those digital resources their insights, relationships and links to other resources.
In their own words.
That type of digital resource could become “exponentially better.”
Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012 Supplement by Charles W. Bailey, Jr.
From the webpage:
In a rapidly changing technological environment, the difficult task of ensuring long-term access to digital information is increasingly important. The Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, 2012 Supplement presents over 130 English-language articles, books, and technical reports published in 2012 that are useful in understanding digital curation and preservation. This selective bibliography covers digital curation and preservation copyright issues, digital formats (e.g., media, e-journals, research data), metadata, models and policies, national and international efforts, projects and institutional implementations, research studies, services, strategies, and digital repository concerns.
It is a supplement to the Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, which covers over 650 works published from 2000 through 2011. All included works are in English. The bibliography does not cover conference papers, digital media works (such as MP3 files), editorials, e-mail messages, letters to the editor, presentation slides or transcripts, unpublished e-prints, or weblog postings.
The bibliography includes links to freely available versions of included works. If such versions are unavailable, italicized links to the publishers' descriptions are provided.
Links, even to publisher versions and versions in disciplinary archives and institutional repositories, are subject to change. URLs may alter without warning (or automatic forwarding) or they may disappear altogether. Inclusion of links to works on authors' personal websites is highly selective. Note that e-prints and published articles may not be identical.
The bibliography is available under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Supplement to “the” starting point for research on digital curation.
Research Data Symposium – Columbia.
Posters from the Research Data Symposium, held at Columbia University, February 27, 2013.
Subject to the limitations of the poster genre but useful as a quick overview of current projects and directions.
Shedding Light on the Dark Data in the Long Tail of Science by P. Bryan Heidorn. (P. Bryan Heidorn. “Shedding Light on the Dark Data in the Long Tail of Science.” Library Trends 57.2 (2008): 280-299. Project MUSE. Web. 28 Feb. 2013.)
Abstract:
One of the primary outputs of the scientific enterprise is data, but many institutions such as libraries that are charged with preserving and disseminating scholarly output have largely ignored this form of documentation of scholarly activity. This paper focuses on a particularly troublesome class of data, termed dark data. “Dark data” is not carefully indexed and stored so it becomes nearly invisible to scientists and other potential users and therefore is more likely to remain underutilized and eventually lost. The article discusses how the concepts from long-tail economics can be used to understand potential solutions for better curation of this data. The paper describes why this data is critical to scientific progress, some of the properties of this data, as well as some social and technical barriers to proper management of this class of data. Many potentially useful institutional, social, and technical solutions are under development and are introduced in the last sections of the paper, but these solutions are largely unproven and require additional research and development.
From the article:
In this paper we will use the term dark data to refer to any data that is not easily found by potential users. Dark data may be positive or negative research findings or from either “large” or “small” science. Like dark matter, this dark data on the basis of volume may be more important than that which can be easily seen. The challenge for science policy is to develop institutions and practices such as institutional repositories, which make this data useful for society.
Dark Data = Any data that is not easily found by potential users.
A number of causes are discussed, not the least of which is our old friend, the Tower of Babel.
A final barrier that cannot be overlooked is the Digital Tower of Babel that we have created with seemingly countless proprietary as well as open data formats. This can include versions of the same software products that are incompatible. Some of these formats are very efficient for the individual applications for which they were designed including word processing, databases, spreadsheets, and others, but they are ineffective to support interoperability and preservation.
As you know already, I don’t think the answer to data curation, long term, lies in uniform formats.
Uniform formats are very useful but are domain, project and time bound.
The questions always are:
“What do we do when we change data formats?”
“Do we dump data in old formats that we spent $$$ developing?”
“Do we migrate data in old formats, assuming anyone remembers the old format?”
“Do we document and map across old and new formats, preparing for the next ‘new’ format?”
None of the answers are automatic or free.
But it is better to make an informed choice than a default one of letting potentially valuable data rot.
Looking out for the little guy: Small data curation by Katherine Goold Akers. (Akers, K. G. (2013), Looking out for the little guy: Small data curation. Bul. Am. Soc. Info. Sci. Tech., 39: 58–59. doi: 10.1002/bult.2013.1720390317)
Abstract:
While big data and its management are in the spotlight, a vast number of important research projects generate relatively small amounts of data that are nonetheless valuable yet rarely preserved. Such studies are often focused precursors to follow-up work and generate less noisy data than grand scale projects. Yet smaller quantity does not equate to simpler management. Data from smaller studies may be captured in a variety of file formats with no standard approach to documentation, metadata or preparation for archiving or reuse, making its curation even more challenging than for big data. As the information managers most likely to encounter small datasets, academic librarians should cooperate to develop workable strategies to document, organize, preserve and disseminate local small datasets so that valuable scholarly information can be discovered and shared.
A reminder that for every “big data” project in need of curation, there are many more smaller, less well known projects that need the same services.
Since topic maps don’t require global or even regional agreement on ontology or methodological issues, it should be easier for academic librarians to create topic maps to curate small datasets.
When it is necessary or desired to merge small datasets that were curated with different topic map assumptions, new topics can be created that merge the data that existed in separate topic maps.
But only when necessary and at the point of merging.
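As a toy illustration (mine, not the article’s) of merging at the point of need, consider two topics curated under different assumptions that merge only because they share a subject identifier; all the identifiers and names below are hypothetical:

```python
def merge_topics(topic_a, topic_b):
    """Merge two topic dicts only if they identify the same subject."""
    if topic_a["identifiers"] & topic_b["identifiers"]:
        return {
            "identifiers": topic_a["identifiers"] | topic_b["identifiers"],
            "names": topic_a["names"] | topic_b["names"],
            "occurrences": topic_a["occurrences"] | topic_b["occurrences"],
        }
    return None  # different subjects: leave both topics as they are

# Two small datasets, curated separately:
t1 = {"identifiers": {"http://example.org/subject/slavery"},
      "names": {"Anti-slavery movement"},
      "occurrences": {"bpl-ms-a.1.2"}}
t2 = {"identifiers": {"http://example.org/subject/slavery"},
      "names": {"Abolitionism"},
      "occurrences": {"loc-rbpe-0012"}}

print(merge_topics(t1, t2))  # one topic, both names, both occurrences
```

Until someone actually needs the merged view, neither curator has to know the other exists.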
To say it another way, topic maps need not anticipate or fear the future. Tomorrow will take care of itself.
Unlike “now I am awake” approaches, which must fear that the next moment of consciousness will bring change.
International Conference on Theory and Practice of Digital Libraries (TPDL)
Valletta, Malta, September 22-26, 2013. I thought that would get your attention. Details follow.
Dates:
Full and Short papers, Posters, Panels, and Demonstrations deadline: March 23, 2013
Workshops and Tutorials proposals deadline: March 4, 2013
Doctoral Consortium papers submission deadline: June 2, 2013
Notification of acceptance for Papers, Posters, and Demonstrations: May 20, 2013
Notification of acceptance for Panels, Workshops and Tutorials: April 22, 2013
Doctoral Consortium acceptance notification: June 24, 2013
Camera ready versions: June 9, 2013
End of early registration: July 31, 2013
Conference dates: September 22-26, 2013
The general theme of the conference is “Sharing meaningful information,” a theme reflected in the topics for conference submissions:
General areas of interests include, but are not limited to, the following topics, organized in four categories, according to a conceptualization that coincides with the four arms of the Maltese Cross:
Foundations
- Information models
- Digital library conceptual models and formal issues
- Digital library 2.0
- Digital library education curricula
- Economic and legal aspects (e.g. rights management) landscape for digital libraries
- Theoretical models of information interaction and organization
- Information policies
- Studies of human factors in networked information
- Scholarly primitives
- Novel research tools and methods with emphasis on digital humanities
- User behavior analysis and modeling
- Social-technical perspectives of digital information
Infrastructures
- Digital library architectures
- Cloud and grid deployments
- Federation of repositories
- Collaborative and participatory information environments
- Data storage and indexing
- Big data management
- e-science, e-government, e-learning, cultural heritage infrastructures
- Semi-structured data
- Semantic web issues in digital libraries
- Ontologies and knowledge organization systems
- Linked Data and its applications
Content
- Metadata schemas with emphasis to metadata for composite content (Multimedia, geographical, statistical data and other special content formats)
- Interoperability and Information integration
- Digital Curation and related workflows
- Preservation, authenticity and provenance
- Web archiving
- Social media and dynamically generated content for particular uses/communities (education, science, public, etc.)
- Crowdsourcing
- 3D models indexing and retrieval
- Authority management issues
Services
- Information Retrieval and browsing
- Multilingual and Multimedia Information Retrieval
- Personalization in digital libraries
- Context awareness in information access
- Semantic aware services
- Technologies for delivering/accessing digital libraries, e.g. mobile devices
- Visualization of large-scale information environments
- Evaluation of online information environments
- Quality metrics
- Interfaces to digital libraries
- Data mining/extraction of structure from networked information
- Social networks analysis and virtual organizations
- Traditional and alternative metrics of scholarly communication
- Mashups of resources
Do you know if there are plans for recording presentations?
Given the location and diminishing travel funding, recordings would be an efficient way to increase the impact of the presentations.
Research Data Curation Bibliography (version 2) by Charles W. Bailey.
From the introduction:
The Research Data Curation Bibliography includes selected English-language articles, books, and technical reports that are useful in understanding the curation of digital research data in academic and other research institutions. For broader coverage of the digital curation literature, see the author's Digital Curation Bibliography: Preservation and Stewardship of Scholarly Works, which presents over 650 English-language articles, books, and technical reports.
The "digital curation" concept is still evolving. In "Digital Curation and Trusted Repositories: Steps toward Success," Christopher A. Lee and Helen R. Tibbo define digital curation as follows:
Digital curation involves selection and appraisal by creators and archivists; evolving provision of intellectual access; redundant storage; data transformations; and, for some materials, a commitment to long-term preservation. Digital curation is stewardship that provides for the reproducibility and re-use of authentic digital data and other digital assets. Development of trustworthy and durable digital repositories; principles of sound metadata creation and capture; use of open standards for file formats and data encoding; and the promotion of information management literacy are all essential to the longevity of digital resources and the success of curation efforts.
This bibliography does not cover conference papers, digital media works (such as MP3 files), editorials, e-mail messages, interviews, letters to the editor, presentation slides or transcripts, unpublished e-prints, or weblog postings. Coverage of technical reports is very selective.
Most sources have been published from 2000 through 2012; however, a limited number of earlier key sources are also included. The bibliography includes links to freely available versions of included works. If such versions are unavailable, italicized links to the publishers' descriptions are provided.
Such links, even to publisher versions and versions in disciplinary archives and institutional repositories, are subject to change. URLs may alter without warning (or automatic forwarding) or they may disappear altogether. Inclusion of links to works on authors' personal websites is highly selective. Note that e-prints and published articles may not be identical.
An archive of prior versions of the bibliography is available.
If you are a beginning library student, take the time to know the work of Charles Bailey. He has consistently made a positive contribution for researchers from very early in the so-called digital revolution.
To the extent that you want to design topic maps for data curation, long or short term, the 200+ items in this bibliography will introduce you to some of the issues you will be facing.
Does Time Fix All? by Daniel Lemire, starts off:
As a graduate, finding useful references was painful. What the librarians had come up with were terrible time-consuming systems. It took an outsider (Berners-Lee) to invent the Web. Even so, the librarians were slow to adopt the Web and you could often see them warn students against using the Web as part of their research. Some of us ignored them and posted our papers online, or searched for papers online. Many, many years later, we are still a crazy minority but a new generation of librarians has finally adopted the Web.
What do you conclude from this story?
Whenever you point to a difficult systemic problem (e.g., it is time consuming to find references), someone will reply that “time fixes everything”. A more sophisticated way to express this belief is to say that systems are self-correcting.
…
Here is my response:
From above: “… What the librarians had come up with were terrible time-consuming systems. It took an outsider (Berners-Lee) to invent the Web….”
Really?
You mean the librarians who had been working on digital retrieval since the late 1940’s and subject retrieval longer than that? Those librarians?
With the web, every user repeats the search effort of others. Why isn’t repeating the effort of others a “terrible time-consuming system?”
BTW, Berners-Lee invented allowing 404s for hyperlinks. Significant because it lowered the overhead of hyperlinking enough to be practical. It was other CS types with high overhead hyperlinking. Not librarians.
Berners-Lee fixed hyperlinking maintenance, failed and continues to fail on IR. Or have you not noticed?
I won’t amplify my answer here but will wait to see what happens to my comment at Daniel’s blog.
Data-Intensive Librarians for Data-Intensive Research by Chelcie Rowell.
From the post:
A packed house heard Tony Hey and Clifford Lynch present on The Fourth Paradigm: Data-Intensive Research, Digital Scholarship and Implications for Libraries at the 2012 ALA Annual Conference.
Jim Gray coined The Fourth Paradigm in 2007 to reflect a movement toward data-intensive science. Adapting to this change would, Gray noted, require an infrastructure to support the dissemination of both published work and underlying research data. But the return on investment for building the infrastructure would be to accelerate the transformation of raw data to recombined data to knowledge.
In outlining the current research landscape, Hey and Lynch underscored how right Gray was.
Hey led the audience on a whirlwind tour of how scientific research is practiced in the Fourth Paradigm. He showcased several projects that manage data from capture to curation to analysis and long-term preservation. One example he mentioned was the Dataverse Network Project that is working to preserve diverse scholarly outputs from published work to data, images and software.
Lynch reflected on the changing nature of the scientific record and the different collaborative structures that will be needed to define, generate and preserve that record. He noted that we tend to think of the scholarly record in terms of published works. In light of data-intensive science, Lynch said the definition must be expanded to include the datasets which underlie results and the software required to render data.
I wasn’t able to find a video of the presentations and/or slides but while you wait for those to appear, you can consult the homepages of Lynch and Hey for related materials.
Librarians already have searching and bibliographic skills, which are appropriate to the Fourth Paradigm.
What if they were to add big data design, if not processing, skills to their resumes?
What if articles in professional journals carried a byline in addition to the authors: Librarian(s): ?
MARC and SolrMARC by Owen Stephens.
From the post:
At the recent Mashcat event I volunteered to do a session called ‘making the most of MARC’. What I wanted to do was demonstrate how some of the current ‘resource discovery’ software are based on technology that can really extract value from bibliographic data held in MARC format, and how this creates opportunities both for creating tools for users and for library staff.
One of the triggers for the session was seeing, over a period of time, a number of complaints about the limitations of ‘resource discovery’ solutions – I wanted to show that many of the perceived limitations were not about the software, but about the implementation. I also wanted to show that while some technical knowledge is needed, some of these solutions can be run on standard PCs and this puts the tools, and the ability to experiment and play with MARC records, in the grasp of any tech-savvy librarian or user.
Many of the current ‘resource discovery’ solutions available are based on a search technology called Solr – part of a project at the Apache software foundation. Solr provides a powerful set of indexing and search facilities, but what makes it especially interesting for libraries is that there has been some significant work already carried out to use Solr to index MARC data – by the SolrMARC project. SolrMARC delivers a set of pre-configured indexes, and the ability to extract data from MARC records (gracefully handling ‘bad’ MARC data – such as badly encoded characters etc. – as well). While Solr is powerful, it is SolrMARC that makes it easy to implement and exploit in a library context.
SolrMARC is used by two open source resource discovery products – VuFind and Blacklight. Although VuFind and Blacklight have differences, and are written in different languages (VuFind is PHP while Blacklight is Ruby), since they both use Solr and specifically SolrMARC to index MARC records the indexing and search capabilities underneath are essentially the same. What makes the difference between implementations is not the underlying technology but the configuration. The configuration allows you to define what data, from which part of the MARC records, goes into which index in Solr.
Owen explains his excitement over these tools as:
These tools excite me for a couple of reasons:
- A shared platform for MARC indexing, with a standard way of programming extensions, gives the opportunity to share techniques and scripts across platforms – if I write a clever set of bean shell scripts to calculate page counts from the 300 field (along the lines demonstrated by Tom Meehan in another Mashcat session), you can use the same scripts with no effort in your SolrMARC installation (see the sketch after this list)
- The ability to run powerful, but easy to configure, search tools on standard computers. I can get Blacklight or VuFind running on a laptop (Windows, Mac or Linux) with very little effort, and I can have a few hundred thousand MARC records indexed using my own custom routines and searchable via an interface I have complete control over
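As a rough illustration of the kind of custom extraction the first point describes, here is a Python sketch using pymarc rather than bean shell (so not Tom Meehan’s actual script); the filename and the parsing heuristic are my assumptions:

```python
import re

import pymarc

def page_count(record):
    """Return the largest page number found in 300$a, or None."""
    for field in record.get_fields('300'):
        for extent in field.get_subfields('a'):
            # Typical extents look like "xii, 356 p. :" --
            # take the largest arabic number followed by 'p'.
            pages = re.findall(r'(\d+)\s*p', extent)
            if pages:
                return max(int(p) for p in pages)
    return None

# 'records.mrc' stands in for any file of MARC21 records.
with open('records.mrc', 'rb') as fh:
    for record in pymarc.MARCReader(fh):
        print(page_count(record))
```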
I like the “geek” appeal of #2, but creating value-add interfaces for the casual user is more likely to attract positive PR for a library.
As far as #1, how uniform are the semantics of MARC fields?
I suspect physical data, page count, etc., are fairly stable/common, but what about more subjective fields? How would you test that proposition?
Are librarians choosing to disappear from the information & knowledge delivery process? by Carl Grant.
Carl Grant writes:
As librarians, we frequently strive to connect users to information as seamlessly as possible. A group of librarians said to me recently: “As librarian intermediation becomes less visible to our users/members, it seems less likely it is that our work will be recognized. How do we keep from becoming victims of our own success?”
This is certainly not an uncommon question or concern. As our library collections have become virtual and as we increasingly stop housing the collections we offer, there is a tendency to see us as intermediaries serving as little more than pipelines to our members. We have to think about where we’re adding value to that information so that when delivered to the user/member that value is recognized. Then we need to make that value part of our brand. Otherwise, as stated by this concern, librarians become invisible and that seems to be an almost assured way to make sure our funding does the same. As evidenced by this recently updated chart on the Association of Research Libraries website, this seems to be the track we are on currently:
I ask Carl’s question more directly to make it clear that invisibility is a matter of personal choice for librarians.
Vast institutional and profession-wide initiatives are needed, but those do not relieve librarians of the personal responsibility for demonstrating the value add of library services in their day to day activities.
It is the users of libraries, those whose projects, research, and lives are impacted by librarians, who can (and will) come to the support of libraries and librarians, but only if asked, and only if librarians stand out as the value-adds in libraries.
Without librarians, libraries may as well be random crates of books. (That might be a good demonstration of the value-add of libraries, by the way.) All of the organization, retrieval, and other value adds are present due to librarians. Make that and other value adds visible. Market librarians as value adds at every opportunity.
At the risk of quoting too much, Grant gives a starting list of value adds for librarians:
… Going forward, we should be focusing on more fine-grained service goals and objectives and then selecting technology that supports those goals/objectives. For instance, in today’s (2012) environment, I think we should be focusing on providing products that support these types of services:
- Access to the library collections and services from any device, at anytime from anywhere. (Mobile products)
- Massive aggregates of information that have been selected for inclusion because of their quality by either: a) librarians, or b) filtered by communities of users through ranking systems and ultimately reviewed and signed-off by librarians for final inclusion in those aggregates. (Cloud computing products are the foundation technology here)
- Discovery workbenches or platforms that allow the library membership to discover existing knowledge and build new knowledge in highly personalized manners. (Discovery products serve as the foundation, but they don’t yet have the necessary extensions)
- Easy access and integration of the full range of library services into other products they use frequently, such as course or learning management systems, social networking, discussion forums, etc. (Products that offer rich API’s, extensive support of Apps and standards to support all types of other extensions)
- Contextual support, i.e. the ability for librarianship to help members understand the environment in which a particular piece of work was generated (for instance, Mark Twain’s writings, or scientific research—is this a peer reviewed publication? Who funded it and what are their biases?) is an essential value-add we provide. Some of this is conveyed by the fact that the item is in collections we provide access to, but other aspects of this will require new products we’ve yet to see.
- Unbiased information. I’ve written about this in another post and I strongly believe we aren’t conveying the distinction we offer our members by providing access to knowledge that is not biased by constructs based on data unknown and inaccessible to them. This is a huge differentiator and we must promote and ensure is understood. If we do decide to use filtering technologies, and there are strong arguments this is necessary to meet the need of providing “appropriate” knowledge, then we should provide members with the ability to see and/or modify the data driving that filtering. I’ve yet to see the necessary technology or products that provides good answers here.
- Pro-active services (Analytics). David Lankes makes the point in many of his presentations (here is one) that library services need to be far more pro-active. He and I couldn’t agree more. We need to get out there in front of our member needs. Someone is up for tenure? Let’s go to their office. Find out what they need and get it to them. (Analytic tools, coupled with massive aggregates of data are going to enable us to do this and a lot more.)
Which of these are you going to reduce down to an actionable item for discussion with your supervisor this week? It is really that simple. The recognition of the value add of librarians is up to you.
The Lady Librarian of Toronto
Cynthia Murrell writes:
Ah, the good old days. Canada’s The Globe and Mail profiles a powerhouse of a librarian who recently passed away at the age of 100 in “When Lady Librarians Always Wore Skirts and You didn’t Dare Make Noise.” When Alice Moulton began her career, libraries were very different than they are today. Writer Judy Stoffman describes:
When Alice Moulton went to work at the University of Toronto library in 1942, libraries were forbidding, restricted spaces organized around the near-sacred instrument known as the card catalogue. They were ruled by a chief librarian, always male, whose word was law. Staff usually consisted of prim maiden ladies, dressed in skirts and wearing serious glasses, like the character played by Donna Reed in It’s a Wonderful Life, in the alternate life she would have had without Jimmy Stewart.
The article about Alice Moulton is very much worth reading.
True enough that libraries are different today than they were say forty or more years ago, but not all of that has been for the good.
Libraries in my memory were places where librarians, who are experts at uncovering information, would help patrons find more information than they thought possible, often teaching the techniques necessary to use arcane publications to do so.
Make no mistake, librarians still fulfill that role in many places but it is a popular mistake among funders to think that searching the WWW should be good enough for anyone. Why spend extra money for reference services?
True, if you are interested in superficial information, then by all means, use the WWW. Ask your doctor to consult it on your next visit. Or your lawyer. Good enough for anyone else, should be good enough for you.
I read posts about “big data” everyday and post reports of some of them here. It will take technological innovations to master “big data,” but that is only part of the answer.
To find useful information in the universe of “big data,” we are going to need something else.
The something else is going to be librarians like Alice Moulton, who can find resources we never imagined existed.
Ned Potter outlines a call to arms for librarians!
Librarians need to aggressively make the case for libraries…, but I would tweak Ned’s message a bit.
Once upon a time, being the best, most complete, skilled, collection point or guide to knowledge was enough for libraries. People knew libraries and education were their tickets to economic/social mobility, out of the slums, to a better life.
Today people are mired in the vast sea of the “middle-class” and information is pushed upon willing and/or unwilling information consumers. Infotainment, advertising, and spam of all types vie for our attention, with little basis for distinguishing the useful from the useless, the important from the idiotic, and the graceful from the graceless.
Libraries and librarians cannot be heard in the vortex of noise that surrounds the average information consumer, while passively waiting for a question or reference interview.
Let’s drop the pose of passivity. Librarians are passionate about libraries and the principles they represent.
Become information pushers. Bare-fisted information brawlers who fight for the attention of information consumers.
Push information on news channels, politicians, even patrons. When a local story breaks, feed the news sources with background material and expect credit for it. Same for political issues. Position papers that explore both sides of issues. Not bland finding aids but web-based multi-media resources with a mixture of web and more traditional resources.
Information consumers can be dependent on the National Enquirer, Jon Stewart, Rupert Murdoch, or libraries/librarians. Your choice.
CENDI: Federal STI Managers Group
From the webpage:
Welcome to the CENDI web site
CENDI’s vision is to provide its member federal STI agencies a cooperative enterprise where capabilities are shared and challenges are faced together so that the sum of accomplishments is greater than each individual agency can achieve on its own.
CENDI’s mission is to help improve the productivity of federal science- and technology-based programs through effective scientific, technical, and related information-support systems. In fulfilling its mission, CENDI agencies play an important role in addressing science- and technology-based national priorities and strengthening U.S. competitiveness.
CENDI is an interagency working group of senior scientific and technical information (STI) managers from 14 U.S. federal agencies:
- Defense Technical Information Center (Department of Defense)
- Office of Research and Development & Office of Environmental Information (Environmental Protection Agency)
- Government Printing Office
- Library of Congress
- NASA Scientific and Technical Information Program
- National Agricultural Library (Department of Agriculture)
- National Archives and Records Administration
- National Library of Education (Department of Education)
- National Library of Medicine (Department of Health and Human Services)
- National Science Foundation
- National Technical Information Service (Department of Commerce)
- National Transportation Library (Department of Transportation)
- Office of Scientific and Technical Information (Department of Energy)
- USGS/Core Science Systems (Department of Interior)
These programs represent over 97% of the federal research and development budget.
The CENDI web site is hosted by the Defense Technical Information Center (DTIC), and is maintained by the CENDI secretariat. (emphasis added)
Yeah, I thought the 97% figure would catch your attention. 😉 Not sure how it compares with spending on IT and information systems in law enforcement and the spook agencies.
Topic Maps Class Project: Select one of the fourteen members and prepare a report for the class on their primary web interface. What did you like/dislike about the interface? How would you integrate the information you found there with your “home” library site (for students already employed elsewhere) or with the GSLIS site?
BTW, I think you will find that these agencies and their personnel have been thinking deeply about information integration for decades. It is an extremely difficult problem that has no fixed or easy solution.
Linked Literature, Linked TV – Everything Looks like a Graph
From the post:
…When do graphs become maps?
I report here on some experiments that stem from two collaborations around Linked Data. All the visuals in the post are views of bibliographic data, based on similarity measures derived from book / subject keyword associations, with visualization and a little additional analysis using Gephi. Click-through to Flickr to see larger versions of any image. You can’t always see the inter-node links, but the presentation is based on graph layout tools.
Firstly, in my ongoing work in the NoTube project, we have been working with TV-related data, ranging from ‘social Web’ activity streams, user profiles, TV archive catalogues and classification systems like Lonclass. Secondly, over the summer I have been working with the Library Innovation Lab at Harvard, looking at ways of opening up bibliographic catalogues to the Web as Linked Data, and at ways of cross-linking Web materials (e.g. video materials) to a Webbified notion of ‘bookshelf‘.
I like the exploratory perspective of this post.
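The similarity idea in that first quoted paragraph is easy to sketch. Here is a toy example (my own, not the post’s actual pipeline) that derives book-to-book similarity from shared subject keywords and emits a weighted edge list a tool like Gephi can import; the books and keywords are invented:

```python
from itertools import combinations

# Toy catalogue: book -> subject keywords.
books = {
    "Book A": {"slavery", "history", "boston"},
    "Book B": {"slavery", "abolition"},
    "Book C": {"television", "linked data"},
}

def jaccard(a, b):
    """Jaccard similarity: shared keywords over all keywords."""
    return len(a & b) / len(a | b)

# One "source,target,weight" line per linked pair of books.
for (b1, k1), (b2, k2) in combinations(books.items(), 2):
    weight = jaccard(k1, k2)
    if weight > 0:
        print(f"{b1},{b2},{weight:.2f}")
```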
What other data could you link to materials in a library holding?
Since I live in the Deep South, what if entries in the library catalog on desegregation had links to local residents who participated in civil rights (or resisted) activities? The stories of the leadership are well known. What about all the thousands of others who played their own parts, without being sought after by PBS during Pledge week years later?
Or people who resisted the draft, were interned during WW II by the Axis or Allied Powers, or who were missile launch officers, sworn to “turn the keys” on receipt of a valid launch order.
Would that help make your library a more obvious resource of community and continuity?
An interesting bibliographic/library blog that I encountered. Posts on URLs, microdata, etc.