Archive for the ‘Library’ Category

New: Library of Congress Demographic Group Terms (LCDGT)

Wednesday, May 13th, 2015

From an email:

As part of its ongoing effort to provide effective access to library materials, the Library of Congress is developing a new vocabulary, entitled Library of Congress Demographic Group Terms (LCDGT). This vocabulary will be used to describe the creators of, and contributors to, resources, and also the intended audience of resources. It will be created and maintained by the Policy and Standards Division, and be distinct from the other vocabularies that are maintained by that division: Library of Congress Subject Headings (LCSH), Library of Congress Genre/Form Terms for Library and Archival Materials (LCGFT), and the Library of Congress Medium of Performance Thesaurus for Music (LCMPT).

A general rationale for the development of LCDGT, information about the pilot vocabulary, and a link to the Tentative List of terms in the pilot may be found on LC’s Acquisitions and Bibliographic Access website at

The Policy and Standards Division is accepting comments on the pilot vocabulary and the principles guiding its development through June 5, 2015. Comments may be sent to Janis L. Young at

A follow-up question to this post asked:

Is there a list of the codes used in field 072 in these lists? Some I can figure out, but it would be nice to see a list of the categories you’re using.

The list in question is: DEMOGRAPHIC GROUP TERMS.

To which Adam Schiff replied:

The list of codes is in and online at (although the latter is still lacking a few of the codes found in the former).


Digital Approaches to Hebrew Manuscripts

Friday, May 8th, 2015

Digital Approaches to Hebrew Manuscripts

Monday 18th – Tuesday 19th of May 2015

From the webpage:

We are delighted to announce the programme for On the Same Page: Digital Approaches to Hebrew Manuscripts at King’s College London. This two-day conference will explore the potential for the computer-assisted study of Hebrew manuscripts; discuss the intersection of Jewish Studies and Digital Humanities; and share methodologies. Amongst the topics covered will be Hebrew palaeography and codicology, the encoding and transcription of Hebrew texts, the practical and theoretical consequences of the use of digital surrogates and the visualisation of manuscript evidence and data. For the full programme and our Call for Posters, please see below.

Organised by the Departments of Digital Humanities and Theology & Religious Studies (Jewish Studies)
Co-sponsor: Centre for Late Antique & Medieval Studies (CLAMS), King’s College London

I saw this at the blog for DigiPal: Digital Resource and Database of Palaeography, Manuscript Studies and Diplomatic. Confession: I have never understood how the English derive acronyms, and this one confounds me as much as it may you. 😉

Be sure to look around at the DigiPal site. There are numerous manuscript images, annotation techniques, and other resources for those who foster scholarship by contributing to it.

One Subject, Three Locators

Tuesday, May 5th, 2015

As you may know, the Library of Congress actively maintains its subject headings. Not surprising to anyone other than purveyors of fixed ontologies. New subjects appear, terminology changes, old subjects have new names, etc.

The Subject Authority Cooperative Program (SACO) has a mailing list:

About the SACO Listserv:

The SACO Program welcomes all interested parties to subscribe to the SACO listserv. This listserv was established first and foremost to facilitate communication with SACO contributors throughout the world. The Summaries of the Weekly Subject Editorial Review Meeting are posted to enable SACO contributors to keep abreast of changes and know if proposed headings have been approved or not. The listserv may also be used as a vehicle to foster discussions on the construction, use, and application of subject headings. Questions posted may be answered by any list member and not necessarily by staff in the Cooperative Programs Section (Coop) or PSD. Furthermore, participants are encouraged to provide comments, share examples, experiences, etc.

On the list this week was the question:

Does anyone know how these three sites differ as sources for consulting approved subject lists?

Janis L. Young, Policy and Standards Division, Library of Congress replied:

Just to clarify: all of the links that you and Paul listed take you to the same Approved Lists. We provide multiple access points to the information in order to accommodate users who approach our web site in different ways.

Depending upon your goals, the Approved Lists could be treated as a subject that has three locators.


Friday, May 1st, 2015

OPenn: Primary Digital Resources Available to All through Penn Libraries’ New Online Platform by Jessie Dummer.

From the post:

The Penn Libraries and the Schoenberg Institute for Manuscript Studies are thrilled to announce the launch of OPenn: Primary Resources Available to Everyone, a new website that makes digitized cultural heritage material freely available and accessible to the public. OPenn is a major step in the Libraries’ strategic initiative to embrace open data, with all images and metadata on this site available as free cultural works to be freely studied, applied, copied, or modified by anyone, for any purpose. It is crucial to the mission of SIMS and the Penn Libraries to make these materials of great interest and research value easy to access and reuse. The OPenn team at SIMS has been working towards launching the website for the past year. Director Will Noel’s original idea to make our Medieval and Renaissance manuscripts open to all has grown into a space where the Libraries can collaborate with other institutions who want to open their data to the world.

Images of the manuscripts are currently available on OPenn at full resolution, with derivatives also provided for easy reuse on the web. Downloading, whether several select images or the entire dataset, is easily accomplished by following instructions or recipes posted in the Technical Read Me on OPenn. The website is designed to be machine-readable, but easy for individuals to use, too.

Oh, the manuscripts themselves?

Licensing is a real treat:

All images and their contents from the Lawrence J. Schoenberg Collection are free of known copyright restrictions and in the public domain. See the Creative Commons Public Domain Mark page for more information on terms of use:

Unless otherwise stated, all manuscript descriptions and other cataloging metadata are ©2015 The University of Pennsylvania Libraries. They are licensed for use under a Creative Commons Attribution License version 4.0 (CC-BY-4.0):

For a description of the terms of use, see the Creative Commons Deed:

In substance and licensing, this is quite a departure from academic societies that still consider comping travel and hotel rooms to be “fostering scholarship.” “Ye shall know them by their fruits.” (Matthew 7:16)

Almost a Topic Map? Or Just a Mashup?

Thursday, April 9th, 2015

WikipeDPLA by Eric Phetteplace.

From the webpage:

See relevant results from the Digital Public Library of America on any Wikipedia article. This extension queries the DPLA each time you visit a Wikipedia article, using the article’s title, redirects, and categories to find relevant items. If you click a link at the top of the article, it loads in a series of links to the items. The original code behind WikiDPLA was written at LibHack, a hackathon at the American Library Association’s 2014 Midwinter Meeting in Philadelphia:

Google Chrome App Home Page

GitHub page

Wikipedia:The Wikipedia Library/WikipeDPLA

How you resolve the topic map versus mashup question depends on how much precision you expect from a topic map. While knowing additional places to search is useful, I never have a problem with assembling more materials than can be read in the time allowed. On the other hand, some people may need more prompting than others, so I can’t say that general references are out of bounds.

Assuming you maintain data sets with locally unique identifiers, a modification of this script that queries an index of all local scripts (say, Pig scripts) to discover other scripts using the same data could be quite useful.
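For anyone who wants to experiment with the extension’s core move, the DPLA side is just a keyword search against DPLA’s v2 API. Here is a minimal sketch; the endpoint and parameter names follow DPLA’s public API documentation, and the API key is a placeholder you would request from DPLA:

```python
import json
import urllib.parse
import urllib.request

DPLA_ENDPOINT = "https://api.dp.la/v2/items"  # DPLA's v2 item-search API

def dpla_query_url(title, api_key, page_size=10):
    """Build an item-search URL from, e.g., a Wikipedia article title."""
    params = {"q": title, "page_size": page_size, "api_key": api_key}
    return DPLA_ENDPOINT + "?" + urllib.parse.urlencode(params)

def dpla_items(title, api_key):
    """Fetch matching items and return the parsed JSON 'docs' list."""
    with urllib.request.urlopen(dpla_query_url(title, api_key)) as resp:
        return json.load(resp)["docs"]

# Build (but don't send) a query for a sample article title:
print(dpla_query_url("Topic Maps", api_key="YOUR_KEY"))
```

The same `dpla_query_url` helper could just as easily take identifiers drawn from a local script index, per the Pig-script idea above.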

BTW, you need to have a Wikipedia account and be logged in for the extension to work. Or at least that was my experience.


Galleries, Libraries, Archives, and Museums (GLAM CC Licensing)

Friday, March 6th, 2015

Galleries, Libraries, Archives, and Museums (GLAM CC Licensing)

A very extensive list of galleries, libraries, archives, and museums (GLAM) that are using CC licensing.

A good resource to have at hand if you need to argue for CC licensing with your gallery, library, archive, or museum.

I first saw this in a tweet by Adrianne Russell.

Update: Resource List for March 5 Open Licensing Online Program

Data Visualization as a Communication Tool

Friday, March 6th, 2015

Data Visualization as a Communication Tool by Susan [Gardner] Archambault, Joanne Helouvry, Bonnie Strohl, and Ginger Williams.


This paper provides a framework for thinking about meaningful data visualization in ways that can be applied to routine statistics collected by libraries. An overview of common data display methods is provided, with an emphasis on tables, scatter plots, line charts, bar charts, histograms, pie charts, and infographics. Research on “best practices” in data visualization design is presented as well as a comparison of free online data visualization tools. Different data display methods are best suited for different quantitative relationships. There are rules to follow for optimal data visualization design. Ten free online data visualization tools are recommended by the authors.

Good review of basic visualization techniques with an emphasis on library data. You don’t have to be in Tufte’s league to make effective data visualizations.

D-Lib Magazine January/February 2015

Monday, January 19th, 2015

D-Lib Magazine January/February 2015

From the table of contents (see the original toc for abstracts):


2nd International Workshop on Linking and Contextualizing Publications and Datasets by Laurence Lannom, Corporation for National Research Initiatives

Data as “First-class Citizens” by Łukasz Bolikowski, ICM, University of Warsaw, Poland; Nikos Houssos, National Documentation Centre / National Hellenic Research Foundation, Greece; Paolo Manghi, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Italy and Jochen Schirrwagen, Bielefeld University Library, Germany


Semantic Enrichment and Search: A Case Study on Environmental Science Literature by Kalina Bontcheva, University of Sheffield, UK; Johanna Kieniewicz and Stephen Andrews, British Library, UK; Michael Wallis, HR Wallingford, UK

A-posteriori Provenance-enabled Linking of Publications and Datasets via Crowdsourcing by Laura Drăgan, Markus Luczak-Rösch, Elena Simperl, Heather Packer and Luc Moreau, University of Southampton, UK; Bettina Berendt, KU Leuven, Belgium

A Framework Supporting the Shift from Traditional Digital Publications to Enhanced Publications by Alessia Bardi and Paolo Manghi, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Italy

Science 2.0 Repositories: Time for a Change in Scholarly Communication by Massimiliano Assante, Leonardo Candela, Donatella Castelli, Paolo Manghi and Pasquale Pagano, Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, Italy

Data Citation Practices in the CRAWDAD Wireless Network Data Archive by Tristan Henderson, University of St Andrews, UK and David Kotz, Dartmouth College, USA

A Methodology for Citing Linked Open Data Subsets by Gianmaria Silvello, University of Padua, Italy

Challenges in Matching Dataset Citation Strings to Datasets in Social Science by Brigitte Mathiak and Katarina Boland, GESIS — Leibniz Institute for the Social Sciences, Germany

Enabling Living Systematic Reviews and Clinical Guidelines through Semantic Technologies by Laura Slaughter; The Interventional Centre, Oslo University Hospital (OUS), Norway; Christopher Friis Berntsen and Linn Brandt, Internal Medicine Department, Innlandet Hosptial Trust and MAGICorg, Norway and Chris Mavergames, Informatics and Knowledge Management Department, The Cochrane Collaboration, Germany

Data without Peer: Examples of Data Peer Review in the Earth Sciences by Sarah Callaghan, British Atmospheric Data Centre, UK

The Tenth Anniversary of Assigning DOI Names to Scientific Data and a Five Year History of DataCite by Jan Brase and Irina Sens, German National Library of Science and Technology, Germany and Michael Lautenschlager, German Climate Computing Centre, Germany


N E W S   &   E V E N T S

In Brief: Short Items of Current Awareness

In the News: Recent Press Releases and Announcements

Clips & Pointers: Documents, Deadlines, Calls for Participation

Meetings, Conferences, Workshops: Calendar of Activities Associated with Digital Libraries Research and Technologies

The quality of D-Lib Magazine meets or exceeds the quality claimed by pay-per-view publishers.


Harvard Library adopts LibraryCloud

Wednesday, January 7th, 2015

Harvard Library adopts LibraryCloud by David Weinberger.

From the post:

According to a post by the Harvard Library, LibraryCloud is now officially a part of the Library toolset. It doesn’t even have the word “pilot” next to it. I’m very happy and a little proud about this.

LibraryCloud is two things at once. Internal to Harvard Library, it’s a metadata hub that lets lots of different data inputs be normalized, enriched, and distributed. As those inputs change, you can change LibraryCloud’s workflow process once, and all the apps and services that depend upon those data can continue to work without making any changes. That’s because LibraryCloud makes the data that’s been input available through an API which provides a stable interface to that data. (I am overstating the smoothness here. But that’s the idea.)

To the Harvard community and beyond, LibraryCloud provides open APIs to access tons of metadata gathered by Harvard Library. LibraryCloud already has metadata about 18M items in the Harvard Library collection — one of the great collections — including virtually all the books and other items in the catalog (nearly 13M), a couple of million of images in the VIA collection, and archives at the folder level in Harvard OASIS. New data can be added relatively easily, and because LibraryCloud is workflow based, that data can be updated, normalized and enriched automatically. (Note that we’re talking about metadata here, not the content. That’s a different kettle of copyrighted fish.)

LibraryCloud began as an idea of mine (yes, this is me taking credit for the idea) about 4.5 years ago. With the help of the Harvard Library Innovation Lab, which I co-directed until a few months ago, we invited in local libraries and had a great conversation about what could be done if there were an open API to metadata from multiple libraries. Over time, the Lab built an initial version of LibraryCloud primarily with Harvard data, but with scads of data from non-Harvard sources. (Paul Deschner, take many many bows. Matt Phillips, too.) This version of LibraryCloud — now called lilCloud — is still available and is still awesome.

Very impressive news from Harvard!

Plus, the LibraryCloud is open source!

Documentation. Well, that’s the future home of the documentation. For now, the current documentation is in a Google Doc: LibraryCloud Item API


The LibraryCloud Item API provides access to metadata about items in the Harvard Library collections. For the purposes of this API, an “item” is the metadata describing a catalog record within the Harvard Library.
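As a sketch of what a call against the Item API might look like: the endpoint URL and parameter names below are my assumptions based on the documentation mentioned above, so verify them against the current docs before relying on this.

```python
import urllib.parse

# Assumed LibraryCloud Item API endpoint (check the current documentation).
LC_ENDPOINT = "https://api.lib.harvard.edu/v2/items.json"

def item_search_url(query, limit=5):
    """Build a keyword-search URL for LibraryCloud item metadata."""
    return LC_ENDPOINT + "?" + urllib.parse.urlencode({"q": query, "limit": limit})

# A sample search for item metadata:
print(item_search_url("topic maps"))
```

Because the API returns metadata rather than content, a query like this stays well clear of the “different kettle of copyrighted fish” Weinberger mentions.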


Serendipity in the Stacks:…

Wednesday, January 7th, 2015

Serendipity in the Stacks: Libraries, Information Architecture, and the Problems of Accidental Discovery by Patrick L. Carr.


Serendipity in the library stacks is generally regarded as a positive occurrence. While acknowledging its benefits, this essay draws on research in library science, information systems, and other fields to argue that, in two important respects, this form of discovery can be usefully framed as a problem. To make this argument, the essay examines serendipity both as the outcome of a process situated within the information architecture of the stacks and as a user perception about that outcome.

A deeply dissatisfying essay on serendipity, as evidenced by the author’s conclusion, which reads in part:

While acknowledging the validity of Morville’s points, I nevertheless believe that, along with its positive aspects, serendipity in the stacks can be usefully framed as a problem. From a process-based standpoint, serendipity is problematic because it is an indicator of a potential misalignment between user intention and process outcome. And, from a perception-based standpoint, serendipity is problematic because it can encourage user-constructed meanings for libraries that are rooted in opposition to change rather than in users’ immediate and evolving information needs.

To illustrate the “…potential misalignment between user intention and process outcome,” Carr offers the example of a user who looks for a specific volume by call number but, because the book is absent from its location, discovers an even more useful book nearby. That discovery Carr describes as:

Even if this information were to prove to be more valuable to the user than the information in the book that was sought, the user’s serendipitous discovery nevertheless signifies a misalignment of user intention and process outcome.

Sorry, that went by rather quickly. If the user considers the discovery to be a favorable outcome, why should we take Carr’s word that it “signifies a misalignment of user intention and process outcome?” What other measure for success should an information retrieval system have other than satisfaction of its users? What other measure would be meaningful?

Carr refuses to consider how libraries could maximize what users see as a positive experience because:

By situating the library as a tool that functions to facilitate serendipitous discovery in the stacks, librarians risk also situating the library as a mechanism that functions as a symbolic antithesis to the tools for discovery that are emerging in online environments. In this way, the library could signify a kind of bastion against change. Rather than being cast as a vital tool for meeting discovery needs in emergent online environments, the library could be marginalized in a way that suggests to users that they perceive it as a means of retreat from online environments.

I don’t doubt that the same people who think librarians are superfluous because “everyone can find what they need on the Internet” would be quick to see libraries as “bastion[s] against change,” for any number of reasons. But the opinions of semi-literates should not dictate library policy.

What Carr fails to take into account is that a stacks “environment,” which he concedes does facilitate serendipitous discovery, can be replicated in digital space.

For example, while it is currently a prototype, StackLife at Harvard is an excellent demonstration of a virtual stack environment.


Jonathan Zittrain, Vice-Dean for Library and Information Resources, Harvard Law School; Professor of Law at Harvard Law School and the Harvard Kennedy School of Government; Professor of Computer Science at the Harvard School of Engineering and Applied Sciences; Co-founder of the Berkman Center for Internet & Society, nominated StackLife for the Stanford Prize for Innovation in Research Libraries, saying in part:

  • It always shows a book (or other item) in a context of other books.
  • That context is represented visually as a scrollable stack of items — a shelf rotated so that users can more easily read the information on the spines.
  • The stack integrates holdings from multiple libraries.
  • That stack is sorted by “StackScore,” a measure of how often the library’s community has used a book. At the Harvard Library installation, the computation includes ten year aggregated checkouts weighted by faculty, grad, or undergrad; number of holdings in the 73 campus libraries, times put on reserve, etc.
  • The visualization is simple and clean but also information-rich. (a) The horizontal length of the book reflects the physical book’s height. (b) The vertical height of the book in the stack represents its page count. (c) The depth of the color blue of the spine indicates its StackScore; a deeper blue means that the work is more often used by the community.
  • When clicked, a work displays its Library of Congress Subject Headings (among other metadata). Clicking one of those headings creates a new stack consisting of all the library’s items that share that heading.
  • If there is a Wikipedia page about that work, Stacklife also displays the Wikipedia categories on that page, and lets the user explore by clicking on them.
  • Clicking on a work creates an information box that includes bibliographic information, real-time availability at the various libraries, and, when available: (a) the table of contents; (b) a link to Google Books’ online reader; (c) a link to the Wikipedia page about that book; (d) a link to any National Public Radio audio about the work; (e) a link to the book’s page at Amazon.
  • Every author gets a page that shows all of her works in the library in a virtual stack. The user can click to see any of those works on a shelf with works on the same topic by other authors.
  • Stacklife is scalable, presenting enormous collections of items in a familiar way, and enabling one-click browsing, faceting, and subject-based clustering.

Does StackLife sound like a library “…that [is] rooted in opposition to change rather than in users’ immediate and evolving information needs”?

I can’t speak for you but it doesn’t sound that way to me. It sounds like a library that isn’t imposing its definition of satisfaction upon users (good for Harvard) and that is working to blend the familiar with new to the benefit of its users.

We can only hope that College & Research Libraries will have a response from the StackLife project to Carr’s essay in the same issue.

PS: If you have library friends who don’t read this blog, please forward a link to this post to their attention. I know they are consumed with their current tasks but the StackLife project is one they need to be aware of. Thanks!

I first saw the essay on Facebook in a posting by Simon St.Laurent.

Early English Books Online – Good News and Bad News

Friday, January 2nd, 2015

Early English Books Online

The very good news is that 25,000 volumes from the Early English Books Online collection have been made available to the public!

From the webpage:

The EEBO corpus consists of the works represented in the English Short Title Catalogue I and II (based on the Pollard & Redgrave and Wing short title catalogs), as well as the Thomason Tracts and the Early English Books Tract Supplement. Together these trace the history of English thought from the first book printed in English in 1475 through to 1700. The content covers literature, philosophy, politics, religion, geography, science and all other areas of human endeavor. The assembled collection of more than 125,000 volumes is a mainstay for understanding the development of Western culture in general and the Anglo-American world in particular. The STC collections have perhaps been most widely used by scholars of English, linguistics, and history, but these resources also include core texts in religious studies, art, women’s studies, history of science, law, and music.

Even better news from Sebastian Rahtz (Chief Data Architect, IT Services, University of Oxford):

The University of Oxford is now making this collection, together with Gale Cengage’s Eighteenth Century Collections Online (ECCO), and Readex’s Evans Early American Imprints, available in various formats (TEI P5 XML, HTML and ePub) initially via the University of Oxford Text Archive at, and offering the source XML for community collaborative editing via Github. For the convenience of UK universities who subscribe to JISC Historic Books, a link to page images is also provided. We hope that the XML will serve as the base for enhancements and corrections.

This catalogue also lists EEBO Phase 2 texts, but the HTML and ePub versions of these can only be accessed by members of the University of Oxford.

[Technical note]
Those interested in working on the TEI P5 XML versions of the texts can check them out of GitHub, where each of the texts is in its own repository. There is a CSV file listing all the texts, and a simple Linux/OSX shell script to clone all 32853 unrestricted repositories.
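The bulk clone the shell script performs can be sketched in a few lines of Python. Note that the GitHub organization name and the CSV column name below are assumptions for illustration; take the real values from the CSV file and repositories mentioned in the technical note:

```python
import csv
import subprocess

# Assumed GitHub organization hosting the per-text repositories.
GITHUB_ORG = "https://github.com/textcreationpartnership"

def repo_url(idno):
    """Map a text's identifier to its repository URL (assumed layout)."""
    return f"{GITHUB_ORG}/{idno}.git"

def clone_all(csv_path, limit=None):
    """Clone each repository listed in the CSV ('idno' column name assumed)."""
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            if limit is not None and i >= limit:
                break
            subprocess.run(["git", "clone", repo_url(row["idno"])], check=True)

# Show the URL pattern for one hypothetical identifier:
print(repo_url("A00001"))
```

With 32,853 repositories in play, it would be prudent to test with something like `clone_all("texts.csv", limit=5)` before attempting the full set.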

Now for the BAD NEWS:

An additional 45,000 books:

Currently, EEBO-TCP Phase II texts are available to authorized users at partner libraries. Once the project is done, the corpus will be available for sale exclusively through ProQuest for five years. Then, the texts will be released freely to the public.

Can you guess why the public is barred from what are obviously public domain texts?

Because our funding is limited, we aim to key as many different works as possible, in the language in which our staff has the most expertise.

Academic projects are supposed to fund themselves and be self-sustaining. When anyone asks about the sustainability of an academic project, ask them when your country’s military was last “self-sustaining.” The U.S. has spent $2.6 trillion on a “war on terrorism” and has nothing to show for it other than dead and injured military personnel, perversion of budgetary policies, and loss of privacy on a worldwide scale.

It is hard to imagine that lifetime access for everyone on Earth could not be secured for less than $1 trillion. No more special pricing and contracts depending on whether you are in countries A to Zed. Eliminate all that paperwork for publishers; to get access, all you would need is a connection to the Internet. The publishers would have a guaranteed income stream and less overhead from sales personnel, administrative staff, etc. And people would have access (whether used or not) to educate themselves, to make new discoveries, etc.

My proposal does not involve payments to large military contractors or subversion of legitimate governments or imposition of American values on other cultures. Leaving those drawbacks to one side, what do you think about it otherwise?

Ferguson Municipal Public Library

Tuesday, November 25th, 2014

Ferguson Municipal Public Library

Ashley Ford tweeted that donations should be made to the Ferguson Municipal Public Library.

While schools are closed in Ferguson, the library has stayed open and has been a safe refuge.

Support the Ferguson Municipal Public Library as well as your own.

Libraries are where our tragedies, triumphs, and history live on for future generations.

Treasury Island: the film

Tuesday, November 25th, 2014

Treasury Island: the film by Lauren Willmott, Boyce Keay, and Beth Morrison.

From the post:

We are always looking to make the records we hold as accessible as possible, particularly those which you cannot search for by keyword in our catalogue, Discovery. And we are experimenting with new ways to do it.

The Treasury series, T1, is a great example of a series which holds a rich source of information but is complicated to search. T1 covers a wealth of subjects (from epidemics to horses) but people may overlook it as most of it is only described in Discovery as a range of numbers, meaning it can be difficult to search if you don’t know how to look. There are different processes for different periods dating back to 1557 so we chose to focus on records after 1852. Accessing these records requires various finding aids and multiple stages to access the papers. It’s a tricky process to explain in words so we thought we’d try demonstrating it.

We wanted to show people how to access these hidden treasures, by providing a visual aid that would work in conjunction with our written research guide. Armed with a tablet and a script, we got to work creating a video.

Our remit was:

  • to produce a video guide no more than four minutes long
  • to improve accessibility to these records through a simple, step-by–step process
  • to highlight what the finding aids and documents actually look like

These records can be useful to a whole range of researchers, from local historians to military historians to social historians, given that virtually every area of government action involved the Treasury at some stage. We hope this new video, which we intend to be watched in conjunction with the written research guide, will also be of use to any researchers who are new to the Treasury records.

Adding video guides to our written research guides are a new venture for us and so we are very keen to hear your feedback. Did you find it useful? Do you like the film format? Do you have any suggestions or improvements? Let us know by leaving a comment below!

This is a great illustration that data management isn’t something new. The Treasury Board has kept records since 1557 and has accumulated a rather extensive set of materials.

The written research guide looks interesting but since I am very unlikely to ever research Treasury Board records, I am unlikely to need it.

However, the authors have anticipated that someone might be interested in process of record keeping itself and so provided this additional reference:

Thomas L Heath, The Treasury (The Whitehall Series, 1927, GP Putnam’s Sons Ltd, London and New York)

That would be an interesting find!

I first saw this in a tweet by Andrew Janes.

On Excess: Susan Sontag’s Born-Digital Archive

Tuesday, October 28th, 2014

On Excess: Susan Sontag’s Born-Digital Archive by Jeremy Schmidt & Jacquelyn Ardam.

From the post:

In the case of the Sontag materials, the end result of Deep Freeze and a series of other processing procedures is a single IBM laptop, which researchers can request at the Special Collections desk at UCLA’s Research Library. That laptop has some funky features. You can’t read its content from home, even with a VPN, because the files aren’t online. You can’t live-Tweet your research progress from the laptop — or access the internet at all — because the machine’s connectivity features have been disabled. You can’t copy Annie Leibovitz’s first-ever email — “Mat and I just wanted to let you know we really are working at this. See you at dinner. xxxxxannie” (subject line: “My first Email”) — onto your thumb drive because the USB port is locked. And, clearly, you can’t save a new document, even if your desire to type yourself into recent intellectual history is formidable. Every time it logs out or reboots, the laptop goes back to ground zero. The folders you’ve opened slam shut. The files you’ve explored don’t change their “Last Accessed” dates. The notes you’ve typed disappear. It’s like you were never there.

Despite these measures, real limitations to our ability to harness digital archives remain. The born-digital portion of the Sontag collection was donated as a pair of external hard drives, and that portion is composed of documents that began their lives electronically and in most cases exist only in digital form. While preparing those digital files for use, UCLA archivists accidentally allowed certain dates to refresh while the materials were in “thaw” mode; the metadata then had to be painstakingly un-revised. More problematically, a significant number of files open as unreadable strings of symbols because the software with which they were created is long out of date. Even the fully accessible materials, meanwhile, exist in so many versions that the hapless researcher not trained in computer forensics is quickly overwhelmed.

No one would dispute the need for an authoritative copy of Sontag's archive, or at least as close to authoritative as is humanly possible. The heavily protected laptop makes sense to me, assuming the archive considers it the authoritative copy.

What has me puzzled, particularly since the archive contains binary formats it cannot read, is why a non-authoritative copy of the archive isn't online. Any number of people may still possess the software necessary to read the files and/or be able to decode the file formats. If recovery could be practiced on a non-authoritative copy, that would be a net gain for the archive, which may well encounter such files again in the future.

After searching the Online Archive of California, I did encounter Finding Aid for the Susan Sontag papers, ca. 1939-2004 which reports:

Restrictions Property rights to the physical object belong to the UCLA Library, Department of Special Collections. Literary rights, including copyright, are retained by the creators and their heirs. It is the responsibility of the researcher to determine who holds the copyright and pursue the copyright owner or his or her heir for permission to publish where The UC Regents do not hold the copyright.

Availability Open for research, with following exceptions: Boxes 136 and 137 of journals are restricted until 25 years after Susan Sontag’s death (December 28, 2029), though the journals may become available once they are published.

Unfortunately, this finding aid does not mention Sontag’s computer or the transfer of the files to a laptop. A search of Melvyl (library catalog) finds only one archival collection and that is the one mentioned above.

I have written to the special collections library for clarification and will update this post when an answer arrives.

I mention this collection because of Sontag's importance for a generation and because digital archives will soon be the majority of cases. One hopes the standard practice will be to donate all rights to an archival repository to ensure its availability to future generations of scholars.

analyze the public libraries survey (pls) with r

Friday, October 24th, 2014

analyze the public libraries survey (pls) with r by Anthony Damico.

From the post:

each and every year, the institute of museum and library services coaxes librarians around the country to put down their handheld “shhhh…” sign and fill out a detailed online questionnaire about their central library, branch, even bookmobile. the public libraries survey (pls) is actually a census: nearly every public library in the nation responds annually. that microdata is waiting for you to check it out, no membership required. the american library association estimates well over one hundred thousand libraries in the country, but less than twenty thousand outlets are within the sample universe of this survey since most libraries in the nation are enveloped by some sort of school system. a census of only the libraries that are open to the general public, the pls typically hits response rates of 98% from the 50 states and dc. check that out.

A great way to practice your R skills!

Not to mention generating analysis to support your local library.
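The post works the PLS microdata in R, but the same kind of tabulation is easy to sketch in Python. The column names below (STABR for state, VISITS for annual visits) match the PLS layout as I recall it, but treat them as assumptions and check the survey documentation; the rows are invented sample data, not real survey responses.

```python
# Tabulate a PLS-style extract: total library visits by state.
# Sample rows are illustrative only; real data comes from the IMLS files.
import csv
import io
from collections import defaultdict

sample = """STABR,LIBNAME,VISITS
GA,Main Street Public Library,120000
GA,Riverside Branch,45000
OH,Lakeview Public Library,98000
"""

visits_by_state = defaultdict(int)
for row in csv.DictReader(io.StringIO(sample)):
    visits_by_state[row["STABR"]] += int(row["VISITS"])

print(visits_by_state["GA"])  # 165000
```

Swap the inline string for an open file handle on the real PLS CSV and the loop is unchanged.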

Medical Heritage Library (MHL)

Sunday, September 21st, 2014

Medical Heritage Library (MHL)

From the post:

The Medical Heritage Library (MHL) and DPLA are pleased to announce that MHL content can now be discovered through DPLA.

The MHL, a specialized research collection stored in the Internet Archive, currently includes nearly 60,000 digital rare books, serials, audio and video recordings, and ephemera in the history of medicine, public health, biomedical sciences, and popular medicine from the medical special collections of 22 academic, special, and public libraries. MHL materials have been selected through a rigorous process of curation by subject specialist librarians and archivists and through consultation with an advisory committee of scholars in the history of medicine, public health, gender studies, digital humanities, and related fields. Items, selected for their educational and research value, extend from 1235 (Liber Aristotil[is] de nat[u]r[a] a[nima]li[u]m ag[res]tium [et] marino[rum]), to 2014 (The Grog Issue 40 2014) with the bulk of the materials dating from the 19th century.

“The rich history of medicine content curated by the MHL is available for the first time alongside collections like those from the Biodiversity Heritage Library and the Smithsonian, and offers users a single access point to hundreds of thousands of scientific and history of science resources,” said DPLA Assistant Director for Content Amy Rudersdorf.

The collection is particularly deep in American and Western European medical publications in English, although more than a dozen languages are represented. Subjects include anatomy, dental medicine, surgery, public health, infectious diseases, forensics and legal medicine, gynecology, psychology, anatomy, therapeutics, obstetrics, neuroscience, alternative medicine, spirituality and demonology, diet and dress reform, tobacco, and homeopathy. The breadth of the collection is illustrated by these popular items: the United States Naval Bureau of Medical History’s audio oral history with Doctor Walter Burwell (1994) who served in the Pacific theatre during World War II and witnessed the first Japanese kamikaze attacks; History and medical description of the two-headed girl : sold by her agents for her special benefit, at 25 cents (1869), the first edition of Gray’s Anatomy (1858) (the single most-downloaded MHL text at more than 2,000 downloads annually), and a video collection of Hanna – Barbera Production Flintstones (1960) commercials for Winston cigarettes.

“As is clear from today’s headlines, science, health, and medicine have an impact on the daily lives of Americans,” said Scott H. Podolsky, chair of the MHL’s Scholarly Advisory Committee. “Vaccination, epidemics, antibiotics, and access to health care are only a few of the ongoing issues the history of which are well documented in the MHL. Partnering with the DPLA offers us unparalleled opportunities to reach new and underserved audiences, including scholars and students who don’t have access to special collections in their home institutions and the broader interested public.“

Quick links:

Digital Public Library of America

Internet Archive

Medical Heritage Library website

I remember the Flintstone commercials for Winston cigarettes. Not that effective a campaign: I smoked Marlboros (reds in a box) for almost forty-five (45) years. 😉

As old vices die out, new ones, like texting and driving, take their place. On behalf of current and former smokers, I am confident that smoking was not a factor in 1,600,000 accidents per year and 11 teen deaths every day.

Libraries may digitize books without permission, EU top court rules [Nation-wide Site Licenses?]

Friday, September 19th, 2014

Libraries may digitize books without permission, EU top court rules by Loek Essers.

From the post:

European libraries may digitize books and make them available at electronic reading points without first gaining consent of the copyright holder, the highest European Union court ruled Thursday.

The Court of Justice of the European Union (CJEU) ruled in a case in which the Technical University of Darmstadt digitized a book published by German publishing house Eugen Ulmer in order to make it available at its electronic reading posts, but refused to license the publisher’s electronic textbooks.

A spot of good news to remember on the next 9/11 anniversary: "A Member State may authorise libraries to digitise, without the consent of the rightholders, books they hold in their collection so as to make them available at electronic reading points."

Users can't make copies onto a USB stick, but under the contemporary fictions about property rights represented in copyright statutes, that isn't surprising.

What is surprising is that nations have not yet stumbled upon the idea of nation-wide site licenses for digital materials.

A nation acquiring a site license to the ACM Digital Library, IEEE, Springer and a dozen or so other resources/collections would have these positive impacts:

  1. Access to core computer science publications for everyone located in that nation
  2. A single payor for publishers, who could reduce or eliminate the staff that manage digital access subscriptions
  3. No subscriptions for universities and colleges, nor the staff to manage them (integration of those materials into collections would remain a library task)
  4. Simplified access software based on geographic IP location (fewer user/password issues)
  5. Funds now dedicated to subscriptions freed for universities and colleges to spend on other materials
  6. Encouragement of digitization of both periodical and monograph literature
  7. Avoidance of tiresome and not-likely-to-succeed arguments about balancing the public interest in IP rights discussions

For me, #7 is the most important advantage of nation-wide licensing of digital materials. As you can tell by my reference to “contemporary fictions about property rights” I fall quite firmly on a particular side of the digital rights debate. However, I am more interested in gaining access to published materials for everyone than trying to convince others of the correctness of my position. Therefore, let’s adopt a new strategy: “Pay the man.”

As I outline above, there are obvious financial advantages to publishers from nation-wide site licenses, in the form of reduced internal costs, reduced infrastructure costs and a greater certainty in cash flow. There are advantages for the public as well as universities and colleges, so I would call that a win-win solution.

The Developing World Initiatives by Taylor & Francis is described as:

Taylor & Francis Group is committed to the widest distribution of its journals to non-profit institutions in developing countries. Through agreements with worldwide organisations, academics and researchers in more than 110 countries can access vital scholarly material, at greatly reduced or no cost.

Why limit access to "non-profit institutions in developing countries"? Granted, the site-license fees for the United States would be higher than for Liberia, but the underlying principle is the same. The less you regulate access, the simpler the delivery model and the higher the profit to the publisher. What publisher would object to that?

There are armies of clerks currently invested in the maintenance of one-off subscription models but the greater public interest in access to materials consistent with publisher IP rights should carry the day.

If Tim O’Reilly and friends are serious about changing access models to information, let’s use nation-wide site licenses to eliminate firewalls and make effective linking and transclusion a present day reality.

Publishers get paid, readers get access. It’s really that simple. Just on a larger scale than is usually discussed.

PS: Before anyone raises the issue of the cost of nation-wide site licenses, remember that the United States has spent more than $1 trillion in a "war" on terrorism that has made no progress in making the United States or its citizens more secure.

If the United States had instead paid Springer Science+Business Media its total 2012 revenue of €866m ($1,113.31m), the cost of its "war" on terrorism would have bought a site license to all Springer Science+Business Media content for the entire United States for 898.47 years. (Check my math: 1,000,000,000,000 / 1,113,000,000 = 898.472.)
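The back-of-the-envelope check is easy to script, using the post's own figures (the dollar conversion of Springer's revenue is the post's, not mine):

```python
# Verify the post's arithmetic: a $1 trillion outlay divided by
# Springer's 2012 revenue (EUR 866m, ~$1,113m per the post).
war_cost_usd = 1_000_000_000_000
springer_revenue_usd = 1_113_000_000

years = war_cost_usd / springer_revenue_usd
print(round(years, 2))  # 898.47
```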

I first saw this in Nat Torkington’s Four short links: 15 September 2014.

Shanghai Library adds 2 million records to WorldCat…

Tuesday, September 16th, 2014

Shanghai Library adds 2 million records to WorldCat to share its collection with the world.

From the post:

Shanghai Library, the largest public library in China and one of the largest libraries in the world, has contributed 2 million holdings to WorldCat, including some 770,000 unique bibliographic records, to share its collection worldwide.

These records, which represent books and journals published between 1911 and 2013, were loaded in WorldCat earlier this year. The contribution from Shanghai Library, an OCLC member since 1996, enhances the richness and depth of Chinese materials in WorldCat as well as the discoverability of these collections around the world.

“We are pleased to add Shanghai Library’s holdings to WorldCat, which is the global union catalog of library collections,” said Dr. Jianzhong Wu, Director, Shanghai Library. “Shanghai is a renowned, global city, and the library should be as well. With WorldCat, we not only raise the visibility of our collection to a global level but we also share our national heritage and identity with other libraries and their users through the OCLC WorldShare Interlibrary Loan service.”

“The leadership of Shanghai Library has a bold global vision,” says Andrew H. Wang, Vice President, OCLC Asia Pacific. “The addition of Shanghai Library’s holdings and unique records enriches coverage of the Chinese collection in WorldCat for researchers everywhere.”

I don’t have a feel for how many unique Chinese bibliographic records are online but 770,000 sounds like a healthy addition.

You may also be interested in: Online Resources for Chinese Studies in North American Libraries, compiled by Ming POON, Josephine SCHE, and Mi Chu WIENS (November 2004).

Given the compilation date, 2004, I ran the W3C Link Checker on

You can review the results at:

Summary of results:

  • (N/A), 6 occurrences: The link was not checked due to robots exclusion rules. Check the link manually, and see also the link checker documentation on robots exclusion.
  • (N/A), 2 occurrences: The hostname could not be resolved. Check the link for typos.
  • 403, 1 occurrence: The link is forbidden! This needs fixing. Usual suspects: a missing index.html or Overview.html, or a missing ACL.
  • 404, 61 occurrences: The link is broken. Double-check that you have not made any typo, or mistake in copy-pasting. If the link points to a resource that no longer exists, you may want to remove or fix the link.
  • 500, 5 occurrences: This is a server side problem. Check the URI.

(emphasis added)

At a minimum, the broken links need to be corrected but updating the listing to include new resources would make a nice graduate student project.

I don’t have the background or language skills with Chinese resources to embark on such a project but would be happy to assist anyone who undertakes the task.

Cooper Hewitt, Color Interface

Tuesday, July 29th, 2014

From the about page:

Cooper Hewitt, Smithsonian Design Museum is the only museum in the nation devoted exclusively to historic and contemporary design. The Museum presents compelling perspectives on the impact of design on daily life through active educational and curatorial programming.

It is the mission of Cooper Hewitt’s staff and Board of Trustees to advance the public understanding of design across the thirty centuries of human creativity represented by the Museum’s collection. The Museum was founded in 1897 by Amy, Eleanor, and Sarah Hewitt—granddaughters of industrialist Peter Cooper—as part of The Cooper Union for the Advancement of Science and Art. A branch of the Smithsonian since 1967, Cooper-Hewitt is housed in the landmark Andrew Carnegie Mansion on Fifth Avenue in New York City.

I thought some background might be helpful because the Cooper Hewitt has a new interface:


Color, or colour, is one of the attributes we’re interested in exploring for collection browsing. Bearing in mind that only a fraction of our collection currently has images, here’s a first pass.

Objects with images now have up to five representative colors attached to them. The colors have been selected by our robotic eye machines who scour each image in small chunks to create color averages. These have then been harvested and “snapped” to the grid of 120 different colors — derived from the CSS3 palette and naming conventions — below to make navigation a little easier.

My initial reaction was to recall the old library joke where a patron comes to the circulation desk and doesn’t know a book’s title or author, but does remember it had a blue cover. At which point you wish Basil from Fawlty Towers was manning the circulation desk. 😉

It may be a good idea with physical artifacts because color/colour is a fixed attribute that may be associated with a particular artifact.

If you know the collection, you can amuse yourself by trying to guess what objects will be returned for particular colors.

BTW, the collection is interlinked by people, roles, periods, types, countries. Very impressive!

Don’t miss the resources for developers at: and their GitHub account.

I first saw this in a tweet by Lyn Marie B.

PS: The use of people, roles, objects, etc. for browsing has a topic map-like feel. Since their data and other resources are downloadable, more investigation will follow.
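The "averaged in chunks, then snapped to a palette" process Cooper Hewitt describes can be sketched in a few lines. Everything below is illustrative: the palette is a tiny hand-picked subset of CSS3 named colors, not Cooper Hewitt's actual 120-color grid, and real images would be read with an imaging library rather than hard-coded pixel lists.

```python
# Sketch of the color pipeline: average a region's pixels, then map the
# average to the nearest color in a small named palette.

PALETTE = {
    "red": (255, 0, 0),
    "navy": (0, 0, 128),
    "olive": (128, 128, 0),
    "silver": (192, 192, 192),
}

def average_color(pixels):
    """Integer average of a list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) // n for i in range(3))

def snap(rgb):
    """Nearest palette entry by squared Euclidean distance in RGB space."""
    return min(
        PALETTE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name])),
    )

chunk = [(250, 10, 5), (240, 20, 30), (255, 0, 0)]
print(snap(average_color(chunk)))  # red
```

Plain RGB distance is the simplest choice; a perceptual color space like CIELAB would snap more naturally, at the cost of a conversion step.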

Darwin’s ship library goes online

Wednesday, July 16th, 2014

Darwin’s ship library goes online by Dennis Normile.

From the post:

As Charles Darwin cruised the world on the HMS Beagle, he had access to an unusually well-stocked 400-volume library. That collection, which contained the observations of numerous other naturalists and explorers, has now been recreated online. As of today, all of more than 195,000 pages and 5000 illustrations from the works are available for the perusal of scholars and armchair naturalists alike, thanks to the Darwin Online project.

Perhaps it isn’t the amount of information you have available but how deeply you understand it that makes a difference.


Early Canadiana Online

Friday, May 23rd, 2014

Early Canadiana Online

From the webpage:

These collections contain over 80,000 rare books, magazines and government publications from the 1600s to the 1940s.

This rare collection of documentary heritage will be of interest to scholars, genealogists, history buffs and anyone who enjoys reading about Canada’s early days.

The Early Canadiana Online collection of rare books, magazines and government publications has over 80,000 titles (3,500,000 pages) and is growing. The collection includes material published from the time of the first European settlers to the first four decades of the 20th Century.

You will find books written in 21 languages including French, English, 10 First Nations languages and several European languages, Latin and Greek.

Every online collection such as this one, increases the volume of information that is accessible and also increases the difficulty of finding related information for any given subject. But the latter is such a nice problem to have!

I first saw this in a tweet from Lincoln Mullen.

APIs for Scholarly Resources

Friday, May 16th, 2014

APIs for Scholarly Resources

From the webpage:

APIs, short for application programming interface, are tools used to share content and data between software applications. APIs are used in a variety of contexts, but some examples include embedding content from one website into another, dynamically posting content from one application to display in another application, or extracting data from a database in a more programmatic way than a regular user interface might allow.

Many scholarly publishers, databases, and products offer APIs to allow users with programming skills to more powerfully extract data to serve a variety of research purposes. With an API, users might create programmatic searches of a citation database, extract statistical data, or dynamically query and post blog content.

Below is a list of commonly used scholarly resources at MIT that make their APIs available for use. If you have programming skills and would like to use APIs in your research, use the table below to get an overview of some available APIs.

If you have any questions or know of an API you would like to see included in this list, please contact Mark Clemente, Library Fellow for Scholarly Publishing and Licensing in the MIT Libraries (contact information at the bottom of this page).

A nice listing of scholarly resources with public APIs and your opportunity to contribute back to this listing with APIs that you discover.
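As a taste of what such APIs look like, here is a hedged sketch against Crossref's public REST API, one widely used scholarly metadata source. The response shape below is reconstructed from memory and canned inline so the example runs offline; verify the actual fields against Crossref's documentation before relying on them.

```python
# Build a query URL for a scholarly API, then parse a response in
# (approximately) Crossref's shape. The canned JSON stands in for a
# live HTTP call so this runs without a network connection.
import json
from urllib.parse import urlencode

base = "https://api.crossref.org/works"
url = base + "?" + urlencode({"query": "topic maps", "rows": 2})

canned = json.loads("""
{"message": {"items": [
  {"DOI": "10.1000/example.1", "title": ["An Example Article"]},
  {"DOI": "10.1000/example.2", "title": ["Another Example"]}
]}}
""")

for item in canned["message"]["items"]:
    print(item["DOI"], "-", item["title"][0])
```

In a live script, `urllib.request.urlopen(url)` (or a library like `requests`) would replace the canned string.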

Sadly, as far as I know (subject to your corrections), the ACM Digital Library has no public API.

Not all that surprising considering the other shortcomings of the ACM Digital Library interface. For example, you can only save items (their citations) to a binder one item at a time. Customer service will opine that they have had this request before, but no, you can’t contact the committee that makes decisions about Digital Library features. Nor will they tell you who is on that committee. Sounds like the current White House, doesn’t it?

I first saw this in a tweet by Scott Chamberlain.

Codex Sinaiticus Added to Digitised Manuscripts

Tuesday, March 25th, 2014

Codex Sinaiticus Added to Digitised Manuscripts by Julian Harrison.

From the post (I have omitted the images, see the original post for those):

Codex Sinaiticus is one of the great treasures of the British Library. Written in the mid-4th century in the Eastern Mediterranean (possibly at Caesarea), it is one of the two oldest surviving copies of the Greek Bible, along with Codex Vaticanus, in Rome. Written in four narrow columns to the page (aside from in the Poetic books, in two columns), its visual appearance is particularly striking.

The significance of Codex Sinaiticus for the text of the New Testament is incalculable, not least because of the many thousands of corrections made to the manuscript between the 4th and 12th centuries.

The manuscript itself is now distributed between four institutions: the British Library, the Universitäts-Bibliothek at Leipzig, the National Library of Russia in St Petersburg, and the Monastery of St Catherine at Mt Sinai. Several years ago, these four institutions came together to collaborate on the Codex Sinaiticus Project, which resulted in full digital coverage and transcription of all extant parts of the manuscript. The fruits of these labours, along with many additional essays and scholarly resources, can be found on the Codex Sinaiticus website.

The British Library owns the vast majority of Codex Sinaiticus and only the British Library portion is being released as part of the Digitised Manuscripts project.

The world in which biblical scholarship is done has changed radically over the last 20 years.

This effort by the British Library should be applauded and supported.

Google Search Appliance and Libraries

Monday, March 24th, 2014

Using Google Search Appliance (GSA) to Search Digital Library Collections: A Case Study of the INIS Collection Search by Dobrica Savic.

From the post:

In February 2014, I gave a presentation at the conference on Faster, Smarter and Richer: Reshaping the library catalogue (FSR 2014), which was organized by the Associazione Italiana Biblioteche (AIB) and Biblioteca Apostolica Vaticana in Rome, Italy. My presentation focused on the experience of the International Nuclear Information System (INIS) in using Google Search Appliance (GSA) to search digital library collections at the International Atomic Energy Agency (IAEA). 

Libraries are facing many challenges today. In addition to diminished funding and increased user expectations, the use of classic library catalogues is becoming an additional challenge. Library users require fast and easy access to information resources, regardless of whether the format is paper or electronic. Google Search, with its speed and simplicity, has established a new standard for information retrieval which did not exist with previous generations of library search facilities. Put in a position of David versus Goliath, many small, and even larger libraries, are losing the battle to Google, letting many of its users utilize it rather than library catalogues.

The International Nuclear Information System (INIS)

The International Nuclear Information System (INIS) hosts one of the world's largest collections of published information on the peaceful uses of nuclear science and technology. It offers on-line access to a unique collection of 3.6 million bibliographic records and 483,000 full texts of non-conventional (grey) literature. This large digital library collection suffered from most of the well-known shortcomings of the classic library catalogue. Searching was complex and complicated, it required training in Boolean logic, full-text searching was not an option, and response time was slow. An opportune moment to improve the system came with the retirement of the previous catalogue software and the adoption of Google Search Appliance (GSA) as an organization-wide search engine standard.

To be completely honest, my first reaction wasn’t a favorable one.

But even the complete blog post does not do justice to the project in question.

Take a look at the slides, which include screen shots of the new interface before reaching an opinion.

Take this as a lesson on what your search interface should be offering by default.

There are always other screens you can fill with advanced features.

Cataloguing projects

Tuesday, March 11th, 2014

Cataloguing projects (UK National Archive)

From the webpage:

The National Archives’ Cataloguing Strategy

The overall objective of our cataloguing work is to deliver more comprehensive and searchable catalogues, thus improving access to public records. To make online searches work well we need to provide adequate data and prioritise cataloguing work that tackles less adequate descriptions. For example, we regard ranges of abbreviated names or file numbers as inadequate.

I was led to this delightful resource by a tweet from David Underdown, advising that his presentation from National Catalogue Day in 2013 was now online.

His presentation along with several others and reports about projects in prior years are available at this projects page.

I thought the presentation titled: Opening up of Litigation: 1385-1875 by Amanda Bevan and David Foster, was quite interesting in light of various projects that want to create new “public” citation systems for law and litigation.

I haven’t seen such a proposal yet that gives sufficient consideration to the enormity of what to do with old legal materials.

The litigation presentation could be a poster child for topic maps.

I am looking forward to reading the other presentations as well.

UX Crash Course: 31 Fundamentals

Monday, February 3rd, 2014

UX Crash Course: 31 Fundamentals by Joel Marsh.

From the post:

Basic UX Principles: How to get started

The following list isn’t everything you can learn in UX. It’s a quick overview, so you can go from zero-to-hero as quickly as possible. You will get a practical taste of all the big parts of UX, and a sense of where you need to learn more. The order of the lessons follows a real-life UX process (more or less) so you can apply these ideas as-you-go. Each lesson also stands alone, so feel free to bookmark them as a reference!

Main topics:

Introduction & Key Ideas

How to Understand Users

Information Architecture

Visual Design Principles

Functional Layout Design

User Psychology

Designing with Data

Users who interact with designers (librarians and library students come to mind) would do well to review these posts. If nothing else, it will give users better questions to ask vendors about their web interface design process.

Wellcome Images

Tuesday, January 21st, 2014

Thousands of years of visual culture made free through Wellcome Images

From the post:

We are delighted to announce that over 100,000 high resolution images including manuscripts, paintings, etchings, early photography and advertisements are now freely available through Wellcome Images.

Drawn from our vast historical holdings, the images are being released under the Creative Commons Attribution (CC-BY) licence.

This means that they can be used for commercial or personal purposes, with an acknowledgement of the original source (Wellcome Library, London). All of the images from our historical collections can be used free of charge.

The images can be downloaded in high-resolution directly from the Wellcome Images website for users to freely copy, distribute, edit, manipulate, and build upon as you wish, for personal or commercial use. The images range from ancient medical manuscripts to etchings by artists such as Vincent Van Gogh and Francisco Goya.

The earliest item is an Egyptian prescription on papyrus, and treasures include exquisite medieval illuminated manuscripts and anatomical drawings, from delicate 16th century fugitive sheets, whose hinged paper flaps reveal hidden viscera to Paolo Mascagni’s vibrantly coloured etching of an ‘exploded’ torso.

Other treasures include a beautiful Persian horoscope for the 15th-century prince Iskandar, sharply sketched satires by Rowlandson, Gillray and Cruikshank, as well as photography from Eadweard Muybridge’s studies of motion. John Thomson’s remarkable nineteenth century portraits from his travels in China can be downloaded, as well as a newly added series of photographs of hysteric and epileptic patients at the famous Salpêtrière Hospital.

Semantics, or should I say semantic confusion, is never far away. While viewing an image of Gladstone as Scrooge:


When “search by keyword” offered “colonies,” I assumed it meant the colonies of the UK at the time.

Imagine my surprise when among other images, Wellcome Images offered:

petri dish

The search by keywords had found fourteen petri dish images, three images of Batavia, seven maps of India (salt, leprosy), one half-naked woman being held down, and the Gladstone image from earlier.

About what one expects from search these days but we could do better. Much better.

I first saw this in a tweet by Neil Saunders.

Library of Congress RSS Feeds

Thursday, January 16th, 2014

Library of Congress RSS Feeds

Quite by accident I stumbled upon a list of Library of Congress RSS feeds and email subscriptions in the following categories:

  • Collections Preservation
  • Copyright
  • Digital Preservation
  • Events
  • Folklife
  • For Librarians
  • For Teachers
  • General News
  • Hispanic Division
  • Legal
  • Music Division
  • Journalism
  • Poetry & Literature
  • Science
  • Site Updates
  • Veterans History
  • Visual Resources

If you think about it, libraries are aggregations of diverse semantics from across many domains.

Quite at odds with any particular cultural monotone of the day.

Subversive places. That must be why I like them so much!
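Consuming one of these feeds programmatically takes only the standard library. The feed XML below is an invented sample in ordinary RSS 2.0 shape; a real URL from the LC list would be fetched with `urllib.request` and parsed the same way.

```python
# Parse an RSS 2.0 feed and pull out the item titles.
import xml.etree.ElementTree as ET

sample_rss = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Library of Congress: General News</title>
    <item><title>New Collection Online</title><link>https://example.loc.gov/1</link></item>
    <item><title>Exhibit Opens</title><link>https://example.loc.gov/2</link></item>
  </channel>
</rss>"""

root = ET.fromstring(sample_rss)
titles = [item.findtext("title") for item in root.iter("item")]
print(titles)  # ['New Collection Online', 'Exhibit Opens']
```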

…Digital Asset Sustainability…

Thursday, January 16th, 2014

A National Agenda Bibliography for Digital Asset Sustainability and Preservation Cost Modeling by Butch Lazorchak.

From the post:

The 2014 National Digital Stewardship Agenda, released in July 2013, is still a must-read (have you read it yet?). It integrates the perspective of dozens of experts to provide funders and decision-makers with insight into emerging technological trends, gaps in digital stewardship capacity and key areas for development.

The Agenda suggests a number of important research areas for the digital stewardship community to consider, but the need for more coordinated applied research in cost modeling and sustainability is high on the list of areas prime for research and scholarship.

The section in the Agenda on “Applied Research for Cost Modeling and Audit Modeling” suggests some areas for exploration:

“Currently there are limited models for cost estimation for ongoing storage of digital content; cost estimation models need to be robust and flexible. Furthermore, as discussed below…there are virtually no models available to systematically and reliably predict the future value of preserved content. Different approaches to cost estimation should be explored and compared to existing models with emphasis on reproducibility of results. The development of a cost calculator would benefit organizations in making estimates of the long‐term storage costs for their digital content.”

In June of 2012 I put together a bibliography of resources touching on the economic sustainability of digital resources. I’m pleasantly surprised at all the new work that’s been done in the meantime, but as the Agenda suggests, there’s more room for directed research in this area. Or perhaps, as Paul Wheatley suggests in this blog post, what’s really needed are coordinated responses to sustainability challenges that build directly on this rich body of work, and that effectively communicate the results out to a wide audience.

I’ve updated the bibliography, hoping that researchers and funders will explore the existing body of projects, approaches and research, note the gaps in coverage suggested by the Agenda and make efforts to address the gaps in the near future through new research or funding.

I count some seventy-one (71) items in this bibliography.

Digital preservation is an area where topic maps can help maintain access over changing customs and vocabularies, but just like migrating from one form of media to another, it doesn’t happen by itself.

Nor is there any “free lunch,” because the data is culturally important, rare, etc. Someone has to pay the bill for it being preserved.
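A toy version of the "cost calculator" the Agenda calls for is easy to sketch. Every number below is an illustrative assumption (collection size, today's per-terabyte cost, a fixed annual cost decline), not output of any real preservation cost model:

```python
# Project long-term storage cost for a collection, assuming the
# per-TB annual cost declines at a fixed rate each year.
def projected_cost(tb, cost_per_tb_year, annual_decline, years):
    total = 0.0
    for year in range(years):
        total += tb * cost_per_tb_year * (1 - annual_decline) ** year
    return total

# 50 TB, $100/TB/year today, costs falling 15% a year, 20-year horizon.
print(round(projected_cost(50, 100.0, 0.15, 20), 2))
```

Real models (and the Agenda's point) add far more: staff time, format migration, refresh cycles, and the hard problem of predicting the future value of the content.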

Having the cost of semantic access included in digital preservation would not hurt the cause of topic maps.