Archive for the ‘Preservation’ Category

Create and Manage Data: Training Resources

Thursday, May 2nd, 2013


From the webpage:

Our Managing and Sharing Data: Training Resources present a suite of flexible training materials for people who are charged with training researchers and research support staff in how to look after research data.

The Training Resources complement the UK Data Archive’s popular guide on ‘Managing and Sharing Data: best practice for researchers’, the most recent version published in May 2011.

They have been designed and used as part of the Archive’s daily work in supporting ESRC applicants and award holders and have been made possible by a grant from the ESRC Researcher Development Initiative (RDI).

The Training Resources are modularised following the UK Data Archive’s seven key areas of managing and sharing data:

  • sharing data – why and how
  • data management planning for researchers and research centres
  • documenting data
  • formatting data
  • storing data, including data security, data transfer, encryption, and file sharing
  • ethics and consent
  • data copyright

Each section contains:

  • introductory PowerPoint(s)
  • presenter’s guide – where necessary
  • exercises and introduction to exercises
  • quizzes
  • answers

The materials are presented as used in our own training courses and are mostly geared towards social scientists. We anticipate trainers will create their own personalised and more context-relevant examples, for example by discipline, country, relevant laws and regulations.

You can download individual modules from the relevant sections or download the whole resource in pdf format. Updates to pages were last made on 20 June 2012.

Download all resources.

Quite an impressive set of materials that will introduce you to some aspects of research data in the UK. Not all, but some.

What you don’t learn here you will pick up from interaction with people actively engaged with research data.

But it will give you a head start on understanding the research data community.

Unlike some technologies, topic maps are more about a community’s world view than the world view of topic maps.

Ultimate library challenge: taming the internet

Saturday, April 6th, 2013

Ultimate library challenge: taming the internet by Jill Lawless.

From the post:

Capturing the unruly, ever-changing internet is like trying to pin down a raging river. But the British Library is going to try.

For centuries, the library has kept a copy of every book, pamphlet, magazine and newspaper published in Britain. Starting on Saturday, it will also be bound to record every British website, e-book, online newsletter and blog in a bid to preserve the nation’s “digital memory”.

As if that’s not a big enough task, the library also has to make this digital archive available to future researchers – come time, tide or technological change.

The library says the work is urgent. Ever since people began switching from paper and ink to computers and mobile phones, material that would fascinate future historians has been disappearing into a digital black hole. The library says firsthand accounts of everything from the 2005 London transit bombings to Britain’s 2010 election campaign have already vanished.

“Stuff out there on the web is ephemeral,” said Lucie Burgess, the library’s head of content strategy. “The average life of a web page is only 75 days, because websites change, the contents get taken down.

“If we don’t capture this material, a critical piece of the jigsaw puzzle of our understanding of the 21st century will be lost.”

For more details, see Jill’s post, Click to save the nation’s digital memory (British Library press release), or 100 websites: Capturing the digital universe (a sample of archiving results from only 100 sites).

A welcome venture, particularly since the content gathered by the project will be made available to the public.

An unanswerable question, but I do wonder: how would we view Greek drama if all of it had been preserved?

Hundreds if not thousands of plays were written and performed every year.

The Complete Greek Drama lists only forty-seven (47) that have survived to this day.

If wholesale preservation is the first step, how do we preserve paths to what’s worth reading in a data labyrinth as a second step?

I first saw this in a tweet by Jason Ronallo.

Making It Happen:…

Friday, March 22nd, 2013

Making It Happen: Sustainable Data Preservation and Use by Anita de Waard.

Great set of overview slides on why research data should be preserved.

Not to mention making the case that semantic diversity (in systems for capturing research data, between researchers, and so on) must be addressed by any proffered solution.

If you don’t know Anita de Waard’s work, search for “Anita de Waard” on Slideshare.

As of today, I am getting one hundred and forty (140) presentations.

You will find all of them useful on a variety of data-related topics.

….Comparing Digital Preservation Glossaries [Why Do We Need Common Vocabularies?]

Friday, August 10th, 2012

From AIP to Zettabyte: Comparing Digital Preservation Glossaries

Emily Reynolds (2012 Junior Fellow) writes:

As we mentioned in our introductory post last month, the OSI Junior Fellows are working on a project involving a draft digital preservation policy framework. One component of our work is revising a glossary that accompanies the framework. We’ve spent the last two weeks poring through more than two dozen glossaries relating to digital preservation concepts to locate and refine definitions to fit the terms used in the document.

We looked at dictionaries from well-established archival entities like the Society of American Archivists, as well as more strictly technical organizations like the Internet Engineering Task Force. While some glossaries take a traditional archival approach, others were more technical; we consulted documents primarily focusing on electronic records, archives, digital storage and other relevant fields. Because of influential frameworks like the OAIS Reference Model, some terms were defined similarly across the glossaries that we looked at. But the variety in the definitions for other terms points to the range of practitioners discussing digital preservation issues, and highlights the need for a common vocabulary. Based on what we found, that vocabulary will have to be broadly drawn and flexible to meet different kinds of requirements.

OSI = Office of Strategic Initiatives (Library of Congress)

Not to be overly critical, but I stumble over:

Because of influential frameworks like the OAIS Reference Model, some terms were defined similarly across the glossaries that we looked at. But the variety in the definitions for other terms points to the range of practitioners discussing digital preservation issues, and highlights the need for a common vocabulary.

Why does a “variety in the definitions for other terms…highlight[s] the need for a common vocabulary?”

I take it as a given that we have diverse vocabularies.

And that attempts at “common” vocabularies succeed in creating yet another “diverse” vocabulary.

So, why would anyone looking at “diverse” vocabularies jump to the conclusion that a “common” vocabulary is required?

Perhaps what is missing is the definition of the problem presented by “diverse” vocabularies.

Hard to solve a problem if you don’t know what it is. (Hasn’t stopped some people that I know, but that is a story for another day.)

I put it to you (and in your absence I will answer, so answer quickly):

What is the problem (or problems) presented by diverse vocabularies? (Feel free to use examples.)

Or if you prefer, Why do we need common vocabularies?

Digging into Data Challenge

Thursday, January 5th, 2012


From the homepage:

What is the “challenge” we speak of? The idea behind the Digging into Data Challenge is to address how “big data” changes the research landscape for the humanities and social sciences. Now that we have massive databases of materials used by scholars in the humanities and social sciences — ranging from digitized books, newspapers, and music to transactional data like web searches, sensor data or cell phone records — what new, computationally-based research methods might we apply? As the world becomes increasingly digital, new techniques will be needed to search, analyze, and understand these everyday materials. Digging into Data challenges the research community to help create the new research infrastructure for 21st century scholarship.

Winners for Round 2, some 14 projects out of 67, were announced on 3 January 2012.

I am interested to hear your comments on the projects, as I am sure the project teams would be as well.

Lots of Copies – Keeps Stuff Safe / Is Insecure – LOCKSS / LOCII

Saturday, January 1st, 2011

Is your enterprise accidentally practicing Lots of Copies Keeps Stuff Safe – LOCKSS?

Gartner analyst Drue Reeves says:

Use document management to make sure you don’t have copies everywhere, and purge nonrelevant material.

If you fall into the lots of copies category your slogan should be: Lots of Copies Is Insecure or LOCII (pronounced “lossee”).

Not all document preservation solutions depend upon being insecure.

Topic maps can help develop strategies to make your document management solution less LOCII.

One way they can help is by mapping out all the duplicate copies. Are they really necessary?
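One hedged sketch of what mapping out duplicate copies could look like at the file level: group files by a SHA-256 hash of their contents, so byte-identical copies land in the same group. This assumes the copies live on a filesystem you can walk and that "duplicate" means byte-identical; near-duplicates (a re-saved Word file, a PDF export of the same document) would need fuzzier matching. The function names are hypothetical, not part of any document management product.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file's contents in chunks so large files are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_copies(root: Path) -> dict[str, list[Path]]:
    """Group every regular file under root by content hash, keeping only groups of 2+ copies."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items() if len(paths) > 1}
```

Pointed at a (hypothetical) shared drive, `find_duplicate_copies(Path("shared-drive"))` returns one entry per set of identical files, which is the raw material for asking whether each extra copy is really necessary.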

Another way they can help is by showing who has access to each of those copies.

If you trust someone with access, that means you trust everyone they trust.

Check their Facebook or LinkedIn pages to see how many other people you are trusting, just by trusting the first person.
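The point that trusting one person means trusting everyone they trust is a statement about reachability: the people exposed to a copy are everyone reachable from its holders in a "who trusts whom" graph. A minimal Python sketch, assuming trust can be written down as a directed graph (the people, file names and both function names are hypothetical), combines that with the who-has-access-to-each-copy mapping mentioned above:

```python
from collections import deque


def trusted_by_extension(trust: dict[str, set[str]], start: str) -> set[str]:
    """Everyone reachable from start by following "trusts" edges, breadth-first.

    start itself is excluded; cycles are handled by the seen set.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for other in trust.get(person, set()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen - {start}


def exposure(access: dict[str, set[str]], trust: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each copy: the people with direct access plus everyone they trust, transitively."""
    result: dict[str, set[str]] = {}
    for copy, holders in access.items():
        reached = set(holders)
        for holder in holders:
            reached |= trusted_by_extension(trust, holder)
        result[copy] = reached
    return result


# Hypothetical people, trust edges and file copies, for illustration only.
trust = {
    "you": {"alice"},
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"bob"},
}
access = {"budget.xlsx": {"carol"}, "budget-copy.xlsx": {"you", "alice"}}

print(sorted(trusted_by_extension(trust, "you")))  # ['alice', 'bob', 'carol', 'dave']
for copy, people in sorted(exposure(access, trust).items()):
    print(copy, sorted(people))
```

In practice the trust and access data would come from access-control lists or a directory service rather than a hand-written dict; the sketch only shows how quickly the exposed set grows with each hop.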

Ask yourself: How bad would a WikiLeaks-like disclosure be?

Then get serious about information security and topic maps.

CASPAR (Cultural, Artistic, and Scientific Knowledge for Preservation, Access and Retrieval)

Saturday, October 23rd, 2010


From the website:

CASPAR methodological and technological solution:

  • is compliant with the OAIS Reference Model – the main standard of reference in digital preservation
  • is technology-neutral: the preservation environment could be implemented using any kind of emerging technology
  • adopts a distributed, asynchronous, loosely coupled architecture and each key component is self-contained and portable: it may be deployed without dependencies on different platforms and frameworks
  • is domain independent: it could be applied with low additional effort to multiple domains/contexts.
  • preserves knowledge and intelligibility, not just the “bits”
  • guarantees the integrity and identity of the information preserved as well as the protection of digital rights

FYI: OAIS Reference Model

As a librarian, you will be confronted with claims similar to these in vendor literature, grant applications and other marketing materials.

Questions:

  1. Pick one of these claims. What documentation/software produced by the project would you review to evaluate the claim you have chosen?
  2. What other materials do you think would be relevant to your review?
  3. Perform the actual review (10 – 15 pages, with citations, project)