I was writing Clojure/west 2016 – Videos! [+ Unix Sort Trick] when the itch to use the YouTube APIs to facilitate extraction and reuse of conference videos struck yet again!
It lasted long enough this time for me to discover the data poverty at YouTube, even when using their APIs.
Here’s what little information relevant to my purposes YouTube captures for a video resource:
{
  "kind": "youtube#video",
  "etag": etag,
  "id": string,
  "snippet": {
    "publishedAt": datetime,
    "channelId": string,
    "title": string,
    "description": string,
    "thumbnails": {
      (key): {
        "url": string,
        "width": unsigned integer,
        "height": unsigned integer
      }
    },
    "channelTitle": string,
    "tags": [ string ],
    "categoryId": string,
    "liveBroadcastContent": string,
    "defaultLanguage": string,
    "localized": {
      "title": string,
      "description": string
    },
    "defaultAudioLanguage": string
  },
  ...
  "topicDetails": {
    "topicIds": [ string ],
    "relevantTopicIds": [ string ]
  },
  ...
Hmmmm, do you see author, date, location, followed by any number of other bits of data that even minimal retrieval would warrant?
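If you want to check which fields actually come back for a given video, here is a minimal sketch in Python. It assumes the YouTube Data API v3 `videos` endpoint with the `part`, `id` and `key` query parameters; the helper names (`build_video_url`, `fetch_video`, `snippet_fields`) are mine, and you would need to supply your own API key and video id. Treat it as an illustration, not a finished client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: YouTube Data API v3, videos resource.
API_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_video_url(video_id, api_key, parts=("snippet", "topicDetails")):
    """Build the request URL for one video (helper name is hypothetical)."""
    query = urlencode({"part": ",".join(parts), "id": video_id, "key": api_key})
    return f"{API_URL}?{query}"


def fetch_video(video_id, api_key):
    """Fetch and parse the JSON response. Needs network access and a valid key."""
    with urlopen(build_video_url(video_id, api_key)) as resp:
        return json.load(resp)


def snippet_fields(response):
    """Return the sorted field names in the first item's snippet, if any."""
    items = response.get("items", [])
    if not items:
        return []
    return sorted(items[0].get("snippet", {}).keys())
```

Calling `snippet_fields(fetch_video(video_id, api_key))` lists the snippet keys for that video. Whatever comes back, there is no dedicated slot for speaker, event date or location.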
The response that all of those fall under “description” is true, but it leaves users with an information resource that is prone to fail on search.
The really sad part of this tale is that YouTube has built up such a large legacy of data-impoverished video that any curation will have to be automated and only spot-checked.
Rather than dig this dark-data hole any deeper, YouTube should start capturing additional metadata by some fixed date.
Let’s not gin up new metadata categories/values but call upon librarians to suggest existing metadata standards, such as Dublin Core or others.
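To make the gap concrete, here is a rough sketch that maps the snippet fields onto the fifteen elements of the Dublin Core Metadata Element Set and reports which elements have nothing to draw from. The mapping itself is my assumption, not anything YouTube or DCMI publishes: in particular, `publishedAt` is the upload date rather than the date of the talk, and treating `channelTitle` as `publisher` is a guess.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = [
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
]

# Hypothetical mapping from YouTube snippet fields to Dublin Core elements.
SNIPPET_TO_DC = {
    "title": "title",
    "description": "description",
    "publishedAt": "date",        # upload date, not the event date
    "channelTitle": "publisher",  # a guess; the channel is not the speaker
    "tags": "subject",
    "defaultLanguage": "language",
}


def to_dublin_core(video):
    """Build a partial Dublin Core record from one video resource dict."""
    snippet = video.get("snippet", {})
    record = {"identifier": video.get("id")}
    for field, element in SNIPPET_TO_DC.items():
        if field in snippet:
            record[element] = snippet[field]
    return record


def missing_elements(record):
    """List the Dublin Core elements the record has no value for."""
    return [e for e in DC_ELEMENTS if record.get(e) is None]
```

Run against any video resource, `missing_elements` will always include `creator` and `coverage` under this mapping, which is the author and location problem restated in a librarian's vocabulary.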
Librarians have labored at this task for centuries, and YouTube is a good example of what results from their absence: usable, but only just, and that only with the aid of powerful digital computers.
Let’s stop spreading data darkness in YouTube and make its data reusable.