Dublin Core « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 18, 2016

Data Poverty At Youtube

Filed under: Conferences,Data,Dublin Core,Metadata — Patrick Durusau @ 3:26 pm

I was writing Clojure/west 2016 – Videos! [+ Unix Sort Trick] when the itch to use the YouTube APIs to facilitate extraction and re-use of conference videos struck yet again!

It lasted long enough this time for me to discover the data poverty at YouTube, even when using their APIs.

For my purposes, here’s what little relevant information YouTube captures for a video resource:

{
  "kind": "youtube#video",
  "etag": etag,
  "id": string,
  "snippet": {
    "publishedAt": datetime,
    "channelId": string,
    "title": string,
    "description": string,
    "thumbnails": {
      (key): {
        "url": string,
        "width": unsigned integer,
        "height": unsigned integer
      }
    },
    "channelTitle": string,
    "tags": [
      string
    ],
    "categoryId": string,
    "liveBroadcastContent": string,
    "defaultLanguage": string,
    "localized": {
      "title": string,
      "description": string
    },
    "defaultAudioLanguage": string
  },
...

  "topicDetails": {
    "topicIds": [
      string
    ],
    "relevantTopicIds": [
      string
    ]
  },
...

Hmmmm, do you see author, date, location, followed by any number of other bits of data that even minimal retrieval would warrant?

The response that all of those fall under “description” is true, but it leaves users with an information resource that is prone to fail on search.

The really sad part of this tale is that YouTube has built up such a large legacy of data-impoverished video that any curation will have to be automated and only spot-checked.

Rather than dig this dark-data hole any deeper, YouTube should add additional metadata by some fixed date.

Let’s not gin up new metadata categories/values but call upon librarians to suggest existing metadata standards, such as Dublin Core or others.
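
To make the gap concrete, here is a minimal sketch of a crosswalk from the snippet fields listed above to the fifteen Dublin Core 1.1 elements. The pairings in SNIPPET_TO_DC, the function name to_dublin_core and the sample video are my own assumptions for illustration, not a published mapping or anything the YouTube API provides.

```python
# Hypothetical crosswalk from YouTube "snippet" fields to Dublin Core 1.1
# elements. The pairings are illustrative assumptions, not a published mapping.
DC_ELEMENTS = [
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
]

SNIPPET_TO_DC = {
    "title": "title",
    "description": "description",
    "publishedAt": "date",         # upload date, not the date of the talk
    "channelTitle": "publisher",   # the channel, not necessarily the speaker
    "tags": "subject",
    "defaultLanguage": "language",
}


def to_dublin_core(video):
    """Return (record, missing) for a video resource shaped like the listing above."""
    snippet = video.get("snippet", {})
    record = {}
    if "id" in video:
        record["identifier"] = video["id"]
    for field, element in SNIPPET_TO_DC.items():
        if snippet.get(field):
            record[element] = snippet[field]
    # Elements that nothing in the resource could fill.
    missing = [e for e in DC_ELEMENTS if e not in record]
    return record, missing


# A made-up video resource; the id and values are placeholders.
sample = {
    "id": "abc123",
    "snippet": {
        "title": "A conference talk",
        "publishedAt": "2016-04-18T15:26:00Z",
        "tags": ["clojure", "conference"],
    },
}

record, missing = to_dublin_core(sample)
print("filled:", sorted(record))
print("missing:", missing)
```

Even on generous assumptions, creator and coverage (the speaker and the place) have nowhere to come from except free text in the description.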

Librarians have labored at this task for centuries, and YouTube is a good example of what results from their absence: usable, but only just, and that only with the aid of powerful digital computers.

Let’s stop spreading data darkness in YouTube and make its data reusable.


December 9, 2013

ROpenSci

Filed under: Data,Dublin Core,R,Science — Patrick Durusau @ 1:00 pm

ROpenSci

From the webpage:

At rOpenSci we are creating packages that allow access to data repositories through the R statistical programming environment that is already a familiar part of the workflow of many scientists. We hope that our tools will not only facilitate drawing data into an environment where it can readily be manipulated, but also one in which those analyses and methods can be easily shared, replicated, and extended by other researchers. While all the pieces for connecting researchers with these data sources exist as disparate entities, our efforts will provide a unified framework that will be quickly connect researchers to open data.

More than twenty (20) R packages are available today!

Great for data mining your favorite science data repository, but that isn’t the only reason I mention them.

One of the issues for topic maps has always been how to produce the grist for a topic map mill. There is a lot of data and production isn’t a thrilling task. 😉

But what if we could automate that production, at least to a degree?

The search functions in Treebase offer several examples of where auto-generation of semantics would benefit both the data set and potential users.

In Treebase: An R package for discovery, access and manipulation of online phylogenies Carl Boettiger and Duncan Temple Lang point out that Treebase has search functions for “author,” and “subject.”

Err, but Dublin Core 1.1 refers to authors as “creators.” And “subject,” for Treebase means: “Matches in the subject terms.”

The ACM would say “keywords,” as would many others, instead of “subject.”

Not a great semantic speed bump,* but one that, if left unnoticed, will result in poorer, not richer, search results.

What if for an R package like Treebase, a user could request what is identified by a field?

That is, in addition to the fields being returned, one or more key/value pairs would be returned for each field, defining what is identified by that field.

For example, for “author” an --iden switch could return:

Author Semantics
Creator http://purl.org/dc/elements/1.1/creator
Author/Creator http://catalog.loc.gov/help/author-keyword.htm

and so on, perhaps even including identifiers in other languages.

While this step only addresses identifying what a field identifies, it would be a first step towards documenting identifiers that could be used over and over again to improve access to scientific data.

Future changes, and we know there will be future changes, are accommodated by simply appending to the currently documented identifiers.

Document identifier mappings once, reuse identifier mappings many times.
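
A rough sketch of what such a lookup might look like follows. It is written in Python rather than R only for brevity, and the names FIELD_IDENTIFIERS, identify and add_identifier are hypothetical; nothing like this exists in the Treebase package as far as I know. The two entries for “author” are the ones from the table above.

```python
# Hypothetical registry: field name -> list of (label, identifier) pairs.
# The "author" entries are the ones from the table above.
FIELD_IDENTIFIERS = {
    "author": [
        ("Creator", "http://purl.org/dc/elements/1.1/creator"),
        ("Author/Creator", "http://catalog.loc.gov/help/author-keyword.htm"),
    ],
}


def identify(field):
    """Return the documented identifiers for a field (empty list if none)."""
    return list(FIELD_IDENTIFIERS.get(field, []))


def add_identifier(field, label, identifier):
    """Accommodate a future change by appending; existing entries are never edited."""
    entries = FIELD_IDENTIFIERS.setdefault(field, [])
    if (label, identifier) not in entries:
        entries.append((label, identifier))


# What an --iden style switch could print for "author".
for label, identifier in identify("author"):
    print(label, identifier)

# Documenting a mapping for another field later is a single append.
add_identifier("subject", "Subject", "http://purl.org/dc/elements/1.1/subject")
```

Appending rather than overwriting is the point: mappings documented earlier stay valid for anyone already relying on them.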

PS: The mapping I suggest above is a blind mapping: no information is given about “why” I thought the alternatives given were alternatives to the main entry “author.”

Blind mappings are sufficient for many cases but are terribly insufficient for others. Biological taxonomies, for example, do change and capturing what characteristics underlie a particular mapping may be important in terms of looking forwards or backwards from some point in time in the development of a taxonomy.
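
As a sketch of the difference, a mapping that is not blind would carry its reason and the date it was recorded, so that one can ask what was known at a given point in time. The Mapping class, the as_of function, the reasons and the dates below are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mapping:
    field: str
    identifier: str
    reason: str      # why the two are treated as the same subject
    recorded: date   # when the mapping was documented


def as_of(mappings, field, when):
    """Mappings for `field` that had been recorded on or before `when`."""
    return [m for m in mappings if m.field == field and m.recorded <= when]


# Illustrative entries only: the reasons and dates are invented.
MAPPINGS = [
    Mapping("author", "http://purl.org/dc/elements/1.1/creator",
            "both name the party responsible for making the work",
            date(2013, 12, 9)),
    Mapping("author", "http://catalog.loc.gov/help/author-keyword.htm",
            "the catalog help page treats author and creator together",
            date(2014, 1, 15)),
]

for m in as_of(MAPPINGS, "author", date(2013, 12, 31)):
    print(m.identifier, "because", m.reason)
```

For a taxonomy, the reason field is where the characteristics underlying a mapping would go.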

* I note for your amusement that Wikipedia offers “vertical deflection traffic calming devices,” as a class that includes “speed bump, speed hump, speed cushion, and speed table.”

Like many Library of Congress subject headings, “vertical deflection traffic calming devices” doesn’t really jump to mind when doing a search for “speed bump.” 😉


March 18, 2013

Dublin Core Mapping Comments [by 7 April 2013]

Filed under: Dublin Core,Provenance — Patrick Durusau @ 4:21 am

Stuart Sutton, Managing Director, DCMI, calls on the Dublin Core community to comment on a mapping from Dublin Core terms to the PROV provenance ontology.

His call reads:

The DCMI Metadata Provenance Task Group [1] is collaborating with the W3C Provenance Working Group [2] on a mapping from Dublin Core terms to the PROV provenance ontology [3], currently a W3C Proposed Recommendation. More precisely, the document describes a partial mapping from DCMI Metadata Terms [4] to the PROV-O OWL2 ontology [5] — a set of classes and properties usable for representing and interchanging information about provenance. Numerous terms in the DCMI vocabulary provide information about the provenance of a resource. Translating these terms into PROV relates this information explicitly to the W3C provenance model.

The mapping is currently a W3C Working Draft. The final state of the document will be that of a W3C Note, to be published as part of a suite of documents in support of a W3C Recommendation for provenance interchange [6].

DCMI would like to point to the W3C Note as a DCMI Recommended Resource and therefore encourages the Dublin Core community to provide feedback and take part in the finalization of the mapping.

The deadline for all comments is 7 April 2013. We recommend that comments be provided directly to the public W3C list for comments: public-prov-comments@w3.org [7], ideally with a Cc: to DCMI’s dc-provenance list [8]. Comments sent only to the dc-provenance list will be summarized on the W3C list and addressed, and discussions on the W3C list will be summarized back on the dc-provenance list when appropriate.

Stuart Sutton, Managing Director, DCMI

[1] http://dublincore.org/groups/provenance/
[2] http://www.w3.org/2011/prov/wiki/Main_Page
[3] http://www.w3.org/TR/2013/WD-prov-dc-20130312/
[4] http://dublincore.org/documents/dcmi-terms/
[5] http://www.w3.org/TR/prov-o/
[6] http://www.w3.org/TR/prov-overview/
[7] http://lists.w3.org/Archives/Public/public-prov-comments/
[8] https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=dc-provenance


December 1, 2011

DC-2012 Metadata for Meeting Global Challenges

Filed under: Conferences,Dublin Core,Metadata,RDF,Semantic Web — Patrick Durusau @ 7:39 pm

DC-2012 Metadata for Meeting Global Challenges 3-7 September 2012, Kuching, Sarawak, Malaysia

DEADLINES & IMPORTANT DATES:
Submission Deadline: 23 March 2012
Author Notification: 25 May 2012
Final Copy: 29 June 2012

From the call for papers:

DC-2012 will explore the global, national and regional roles of metadata in addressing global challenges such as food security, the digital divide, and sustainable development. Metadata plays a significant role globally in information systems shaping how we know, monitor and change social and governmental systems affecting everything from the environment, human rights and justice to education and peace. DC-2012 will bring together in Kuching the community of metadata scholars and practitioners to engage in the exchange of knowledge and best practices in developing languages of description to meet these global challenges. Beyond the conference theme, papers, reports, and poster submissions are welcome on a wide range of metadata topics, such as:

Metadata principles, guidelines, and best practices

Metadata quality (methods, tools, and practices)

Conceptual models and frameworks (e.g., RDF, DCAM, OAIS)

Application profiles

Metadata generation (methods, tools, and practices)

Metadata interoperability across domains, languages, time, structures, and scales

Cross-domain metadata uses (e.g., recordkeeping, preservation, curation, institutional repositories, publishing)

Domain metadata (e.g., for corporations, cultural memory institutions, education, government, and scientific fields)

Bibliographic standards (e.g., RDA, FRBR, subject headings) as Semantic Web vocabularies

Accessibility metadata

Metadata for scientific data, e-Science and grid applications

Social tagging and user participation in building metadata

Usage data (paradata/attention metadata)

Knowledge Organization Systems (e.g., ontologies, taxonomies, authority files, folksonomies, and thesauri) and Simple Knowledge Organization Systems (SKOS)

Ontology design and development

Integration of metadata and ontologies

Search engines and metadata

Linked data and the Semantic Web (metadata and applications)

Vocabulary registries and registry services


April 13, 2011

dc:subject meets the 5 Ws (and one H)

Filed under: Dublin Core,Marketing,Metadata,Topic Maps — Patrick Durusau @ 1:25 pm

A recent post by Andrew Townley, Mechanisms for expressing ‘aboutness’ effectively? made me reconsider dc:subject.

Or perhaps more accurately, what does dc:subject, and similar properties, lack?

It has a certain flatness that can best be illustrated by evaluating it using the 5 W’s (and one H). As listed by Wikipedia (Five Ws):

Who is it about?

What happened (what’s the story)?

Where did it take place?

When did it take place?

Why did it happen?

How did it happen?

The who is answered, but only by attachment to a particular item.

And it can be argued that dc:subject answers what by its value. A simple string value but that is a question of the degree of usefulness of an answer.

But what of where, when, why and how?

They go unanswered.

True enough, that is information about the subject being assigned.

To use misleading terminology poorly, metadata about metadata.

But knowing why a particular dc:subject property value was assigned to an item could help with consistent use of that property value.

Or even help discover when other dc:subject values were being used for the same subject.

Just as knowing when a subject was assigned to an item could be used to establish chronologies of subject classification as applied to particular items.

Or knowing where, in both the geographic and the institutional sense, may reveal differences in subject assignment for the same items.
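
A minimal sketch of a subject assignment that can answer those questions for itself might look like the following. The SubjectAssignment class, its field names and the sample records are assumptions of mine for illustration; none of it is part of Dublin Core.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectAssignment:
    item: str    # the resource the value was attached to
    value: str   # the dc:subject string itself (the "what")
    who: str     # who assigned it
    where: str   # geographic or institutional location
    when: str    # ISO 8601 date, which sorts chronologically as a string
    why: str     # the reason this value was chosen
    how: str     # the method, e.g. manual cataloging or automated tagging


def values_by_place(assignments, item):
    """Group the subject values given to one item by where they were assigned."""
    by_place = defaultdict(set)
    for a in assignments:
        if a.item == item:
            by_place[a.where].add(a.value)
    return dict(by_place)


def chronology(assignments, item):
    """Subject values for one item in the order they were assigned."""
    return sorted((a.when, a.value) for a in assignments if a.item == item)


# Made-up records for a single hypothetical item.
ASSIGNMENTS = [
    SubjectAssignment("report-42", "speed bumps", "cataloger A", "Library X",
                      "2011-04-13", "term used in the title", "manual"),
    SubjectAssignment("report-42", "traffic calming", "cataloger B", "Library Y",
                      "2010-06-01", "local thesaurus term", "manual"),
]

print(values_by_place(ASSIGNMENTS, "report-42"))
print(chronology(ASSIGNMENTS, "report-42"))
```

With where and when recorded, differences between institutions and a chronology of classification fall out of simple queries instead of guesswork.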

Data (by which I encompass the misnomer “metadata”) that lacks the means to answer the 5 Ws (and one H) for itself is impoverished, and unnecessarily so.


April 4, 2011

DC-2011 Extended Deadline

Filed under: Conferences,Dublin Core,Metadata — Patrick Durusau @ 6:33 pm

DC-2011 Extended Deadline

New submission deadline: 30 April 2011

See DC-2011 for conference details (Dublin Core Metadata)


March 15, 2011

DC-2011

Filed under: Conferences,Dublin Core,Metadata — Patrick Durusau @ 5:19 am

DC-2011

International Conference on Dublin Core and Metadata Applications:
Metadata Harmonization: Bridging Languages of Description

Bridging Languages of Description: It would be hard to think of a more apt description of topic maps in a metadata context.

21-23 September 2011, The Hague, Netherlands

“Metadata is an increasingly central tool in the current web environment, enabling large-scale, distributed management of resources. Recent years has seen a growth in interaction between previously relatively isolated metadata communities, driven by the need for cross-domain collaboration and exchange. However, metadata standards have not been able to meet the needs of interoperability between independent standardization communities. For this reason the notion of metadata harmonization, defined as interoperability of combinations of metadata specifications, has arisen as a core issue for the future of web-based metadata.”[1] Resting at the heart of application profiles, metadata harmonization presents a little understood, but critical challenge in design of languages of description. DC-2011 will explore the conceptual and practical issues of design when the language solution calls for cross-fertilization from different metadata specifications.

[1] Nilsson, Mikael. (2010). From Interoperability to Harmonization in Metadata Standardization: Designing an Evolvable Framework for Metadata Harmonization. Dissertation. KTH School of Computer Science and Communication. Stockholm, Sweden. http://kmr.nada.kth.se/papers/SemanticWeb/FromInteropToHarm-MikaelsThesis.pdf

Important dates:

Submission Deadline: 16 April 2011 – Extended 30 April 2011, see: DC-2011 Extended Deadline
Author Notification: 18 June 2011
Final Copy: 23 July 2011

Beyond the conference theme, papers, reports, and poster submissions are welcome on a wide range of metadata topics, such as:

Metadata principles, guidelines, and best practices

Metadata quality (methods, tools, and practices)

Conceptual models and frameworks (e.g., RDF, DCAM, OAIS)

Application profiles

Metadata generation (methods, tools, and practices)

Metadata interoperability across domains, languages, time, structures, and scales

Cross-domain metadata uses (e.g., recordkeeping, preservation, curation, institutional repositories, publishing)

Domain metadata (e.g., for corporations, cultural memory institutions, education, government, and scientific fields)

Bibliographic standards (e.g., RDA, FRBR, subject headings) as Semantic Web vocabularies

Accessibility metadata

Metadata for scientific data, e-Science and grid applications

Social tagging and user participation in building metadata

Usage data (paradata/attention metadata)

Knowledge Organization Systems (e.g., ontologies, taxonomies, authority files, folksonomies, and thesauri) and Simple Knowledge Organization Systems (SKOS)

Ontology design and development

Integration of metadata and ontologies

Search engines and metadata

Linked data and the Semantic Web (metadata and applications)

Vocabulary registries and registry services

*****
A couple of notes for veteran topic map folks:

First, the conference is in The Hague, a truly remarkable place to visit, so add one or two days to your trip to take a walk about the town. You won’t regret it.

Second, do note that the dissertation cited in the CFP doesn’t cite topic maps once in 235 pages. And concludes that RDF is the answer to semantic integration.

A rather remarkable claim considering RDF can’t distinguish between a locator and an identifier. Just so you know where to start the conversation.

Think of it as a missionary sort of adventure.
