Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 21, 2012

Making sense of Wikipedia categories

Filed under: Annotation,Classification,Wikipedia — Patrick Durusau @ 8:00 pm

Hal Daume III writes:

Wikipedia’s category hierarchy forms a graph. It’s definitely cyclic (Category:Ethology belongs to Category:Behavior, which in turn belongs to Category:Ethology).

At any rate, did you know that “Chicago Stags coaches” are a subcategory of “Natural sciences”? If you don’t believe me, go to the Wikipedia entry for the Natural sciences category, and expand the following list of subcategories:

(subcategories omitted)

I guess it kind of makes sense. There are some other fun ones, like “Rhaeto-Romance languages”, “American World War I flying aces” and “1911 films”. Of course, these are all quite deep in the “hierarchy” (all of those are at depth 15 or higher).

Hal examines several strategies and concludes by asking:

Has anyone else tried and succeeded at using the Wikipedia category structure?

Some other questions:

Is Hal right that hand annotation doesn’t “scale”?

I have heard that claim more times than I can count but have never seen any studies cited to support it.

After all, Wikipedia was manually edited and produced. Yes? No automated process created its content. So, what is the barrier to hand annotation?

The same could be said of email: most email (yes?) is written by hand, not produced by automated processes (well, except for spam). So why can’t it be hand annotated? Or at least, why can’t we capture the semantics of email at the point of composition and annotate it there by automated means?

Hand annotation may not scale for sensor data or financial data streams, but is hand annotation needed for such sources?

Hand annotation may not scale for, say, Twitter posts by non-English speakers, but only for agencies with very short-sighted, if not actively bigoted, hiring/contracting practices.

Has anyone loaded the Wikipedia categories into a graph database? What sort of interface would you suggest for trial arrangement of the categories?
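
Short of a full graph database, the two observations Hal makes (the graph is cyclic, and odd subcategories turn up many levels down) can be explored with a small in-memory graph. What follows is only a rough sketch in Python, not anything Hal or Wikipedia provides. The Ethology/Behavior pair comes from the quote above; the other edges, and the function names build_children, depths_from and find_cycle, are made up for illustration. A real run would read (subcategory, parent) pairs from a Wikipedia category-links dump.

```python
from collections import deque

# Toy (subcategory, parent category) pairs. Only the Ethology/Behavior
# cycle is taken from the post; the remaining edges are invented.
EDGES = [
    ("Behavior", "Ethology"),
    ("Ethology", "Behavior"),
    ("Ethology", "Zoology"),
    ("Zoology", "Biology"),
    ("Biology", "Natural sciences"),
]


def build_children(edges):
    """Map every category to the set of its direct subcategories."""
    children = {}
    for child, parent in edges:
        children.setdefault(parent, set()).add(child)
        children.setdefault(child, set())
    return children


def depths_from(root, children):
    """Breadth-first search: shortest depth of each category below root.

    Categories already seen are skipped, so cycles cannot loop forever.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in sorted(children.get(node, ())):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


def find_cycle(children):
    """Return one cycle as a list of categories, or None if there is none.

    Iterative depth-first search; a category that is still on the current
    path (GREY) and is reached again closes a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in children}
    for start in sorted(children):
        if color[start] != WHITE:
            continue
        color[start] = GREY
        path = [start]
        stack = [(start, iter(sorted(children[start])))]
        while stack:
            node, remaining = stack[-1]
            nxt = next(remaining, None)
            if nxt is None:
                # All subcategories explored: leave this category.
                color[node] = BLACK
                stack.pop()
                path.pop()
            elif color[nxt] == GREY:
                return path[path.index(nxt):] + [nxt]
            elif color[nxt] == WHITE:
                color[nxt] = GREY
                path.append(nxt)
                stack.append((nxt, iter(sorted(children[nxt]))))
    return None


children = build_children(EDGES)
print(depths_from("Natural sciences", children))
print(find_cycle(children))
```

Breadth-first search reports the shortest route to each category, so a “depth 15” category is at least that far from the root by every path. Listing the deep categories together with one cycle to break would be a modest starting point for a trial rearrangement.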

PS: If you are interested in discussing how to establish assisted annotation for Twitter, email or other data streams, with or without user awareness, send me a post.
