The Dataverse Network Project

Tuesday, April 30th, 2013

The Dataverse Network Project is sponsored by the Institute for Quantitative Social Science, Harvard University.

Described on its homepage:

A repository for research data that takes care of long term preservation and good archival practices, while researchers can share, keep control of and get recognition for their data.

Dataverses currently in operation:

One shortfall I hope is corrected quickly is the lack of searching across instances of the Dataverse software.

For example, if I go to UC Davis and choose the Center for Poverty Research dataverse, I can find: “The Research Supplemental Poverty Measure Public Use Research Files” by Kathleen Short (a study).

But if I use the Harvard Dataverse Advanced Search to look for “Kathleen Short” or “The Research Supplemental Poverty Measure Public Use Research Files,” I get no results.

An isolated dataverse is more of a data island than a dataverse.

We have lots of experience with data islands. It’s time for something different.
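As a rough sketch of what "something different" might look like, here is a toy search that fans one query out across several instances. Everything in it is my assumption for illustration: the in-memory catalogs stand in for separate installations, the Harvard entry is an invented placeholder, and `search_instance` and `search_all` are hypothetical names, not part of the Dataverse software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    title: str
    author: str


# Hypothetical in-memory catalogs standing in for separate installations.
# The Harvard entry is an invented placeholder, not a real study.
INSTANCES = {
    "UC Davis": [
        Study(
            "The Research Supplemental Poverty Measure Public Use Research Files",
            "Kathleen Short",
        ),
    ],
    "Harvard": [
        Study("Placeholder study", "Placeholder Author"),
    ],
}


def search_instance(name, query):
    """Search a single installation only, as each instance does today."""
    needle = query.casefold()
    return [
        study
        for study in INSTANCES[name]
        if needle in study.title.casefold() or needle in study.author.casefold()
    ]


def search_all(query):
    """Send the same query to every installation and tag each hit with its source."""
    return [
        (name, study)
        for name in INSTANCES
        for study in search_instance(name, query)
    ]


if __name__ == "__main__":
    print(search_instance("Harvard", "Kathleen Short"))
    for name, study in search_all("Kathleen Short"):
        print(name, "->", study.title)
```

The per-instance search comes back empty at Harvard, while the fan-out version finds the study and reports which instance holds it. A real version would also have to cope with instances that are slow, offline, or describe their studies differently.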

PS: Semantic integration issues need to be addressed as well.

Harvard Dataverse Network

Tuesday, April 30th, 2013


From the webpage:

The Harvard Dataverse Network is open to all scientific data from all disciplines worldwide. It includes the world’s largest collection of social science research data. If you would like to upload your research data, first create a dataverse and then create a study. If you already have a dataverse, log in to add new studies.

Sharing of data that underlies published research.

Dataverses (520 of those) contain studies (52,289) which contain files (722,615).
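To make the containment concrete, here is a minimal sketch of the dataverse, study, and file nesting, plus the averages implied by the counts above. The class names, sample titles, ids, and file names are all my own placeholders, not the project's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    global_id: str
    title: str
    files: list = field(default_factory=list)  # file names only, for simplicity


@dataclass
class Dataverse:
    name: str
    studies: list = field(default_factory=list)


def totals(dataverses):
    """Return (dataverse count, study count, file count) for a collection."""
    n_studies = sum(len(d.studies) for d in dataverses)
    n_files = sum(len(s.files) for d in dataverses for s in d.studies)
    return len(dataverses), n_studies, n_files


# Hypothetical sample data; ids and file names are placeholders.
sample = [
    Dataverse(
        "Example dataverse",
        [
            Study("example:2", "Second study", ["data.tab", "codebook.pdf"]),
            Study("example:1", "First study", ["data.tab"]),
        ],
    ),
]

# A dataverse page lists its studies ordered by global id.
ordered = sorted(sample[0].studies, key=lambda s: s.global_id)

# Averages implied by the reported counts (520 / 52,289 / 722,615).
studies_per_dataverse = 52289 / 520
files_per_study = 722615 / 52289

if __name__ == "__main__":
    print(totals(sample))
    print([s.global_id for s in ordered])
    print(round(studies_per_dataverse, 1), round(files_per_study, 1))
```

On the reported numbers, that works out to roughly a hundred studies per dataverse and about fourteen files per study, though I would expect the real distribution to be far from even.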

For example, following the link for the Tom Clark dataverse provides a listing of five (5) studies, ordered by their global IDs.

Following the link to the Locating Supreme Court Opinions in Doctrine Space study opens, by default, the detailed cataloging information for the study.

The interface is under active development.

One feature that I hope is added soon is the ability to browse dataverses by author and self-assigned subjects.

Searching works, but is more reliable if you know the correct search terms to use.

I didn’t see any plans to deal with semantic ambiguity/diversity.