Wikidata « Another Word For It

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 2, 2014

Property Suggester

Filed under: Authoring Topic Maps,Wikidata — Patrick Durusau @ 7:09 pm

Wikidata just got 10 times easier to use by Lydia Pintscher.

From an email post:

We have just deployed the entity suggester. This helps you with suggesting properties. So when you now add a new statement to an item it will suggest what should most likely be added to that item. One example: You are on an item about a person but it doesn’t have a date of birth yet. Since a lot of other items about persons have a date of birth it will suggest you also add one to this item. This will make it a lot easier for you to figure out what the hell is missing on an item and which property to use.

Thank you so much to the student team who worked on this as part of their bachelor thesis over the last months as well as everyone who gave feedback and helped them along the way.

I’m really happy to see this huge improvement towards making Wikidata easier to use. I hope so are you.

I suspect such a suggester for topic map authoring would need to be domain specific but it would certainly be a useful feature.

At least so long as I can say: No more suggestions of X property.

An added wrinkle would be to show, alongside each suggested property, why it could be useful to include from a design standpoint.
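A rough sketch of how such a suggester might work, with a "no more suggestions of X" list and a reason attached to each suggestion. This is not the algorithm Wikidata deployed. The co-occurrence scoring, the toy data in ITEMS, and the names suggest_properties and blocked are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: each item is reduced to the set of property names it uses.
ITEMS = [
    {"instance of", "date of birth", "place of birth"},
    {"instance of", "date of birth", "occupation"},
    {"instance of", "date of birth", "place of birth", "occupation"},
    {"instance of", "population"},
]


def suggest_properties(item_props, corpus, blocked=frozenset(), limit=3):
    """Rank properties missing from item_props by co-occurrence.

    A candidate property gains one point for every property it shares
    an item with that is already present on the item being edited.
    Returns (property, score, reason) tuples, highest score first;
    ties are broken alphabetically.
    """
    counts = Counter()
    for other in corpus:
        shared = item_props & other
        if not shared:
            continue  # nothing in common, so this item tells us nothing
        for prop in other - item_props - blocked:
            counts[prop] += len(shared)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (prop, score, f"co-occurs with properties already on this item (score {score})")
        for prop, score in ranked[:limit]
    ]


if __name__ == "__main__":
    for prop, score, reason in suggest_properties({"instance of"}, ITEMS):
        print(prop, "-", reason)
```

Passing blocked={"occupation"} drops that property from every later suggestion. A domain-specific suggester for topic maps would presumably swap the corpus for one drawn from the domain at hand.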

March 20, 2014

Wikidata: A Free Collaborative Knowledge Base

Filed under: Wikidata,Wikipedia — Patrick Durusau @ 7:29 pm

Wikidata: A Free Collaborative Knowledge Base by Denny Vrandečić and Markus Krötzsch.

Abstract:

Unnoticed by most of its readers, Wikipedia is currently undergoing dramatic changes, as its sister project Wikidata introduces a new multilingual ‘Wikipedia for data’ to manage the factual information of the popular online encyclopedia. With Wikipedia’s data becoming cleaned and integrated in a single location, opportunities arise for many new applications.

In this article, we provide an extended overview of Wikidata, including its essential design choices and data model. Based on up-to-date statistics, we discuss the project’s development so far and outline interesting application areas for this new resource.

Denny Vrandečić, Markus Krötzsch. Wikidata: A Free Collaborative Knowledge Base. In Communications of the ACM (to appear). ACM 2014.

If you aren’t already impressed by Wikidata, this article should be the cure!

January 22, 2014

Wikidata in 2014 [stable identifiers]

Filed under: Identifiers,Merging,Wikidata — Patrick Durusau @ 3:00 pm

Wikidata in 2014

From the development plans for Wikidata in 2014, it looks like a busy year.

There are a number of interesting work items but one in particular caught my attention:

Merges and redirects

bugzilla:57744 and bugzilla:38664

When two different items about the same topic are created they can be merged. Labels, descriptions, aliases, sitelinks and statements are merged if they do not conflict. The item that is left empty can then be turned into a redirect to the other. This way, Wikidata IDs can be regarded as stable identifiers by 3rd-parties.

As more data sets come online, preserving “stable identifiers” from each data set is going to be important. You can’t know in advance which data set a particular researcher may have used as a source of identifiers.

Here of course they are talking about “stable identifiers” inside of Wikidata.

In principle, though, I don’t see any reason we can’t treat “foreign” identifiers as stable.

You?
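To make the merge-and-redirect idea concrete, here is a minimal sketch of a store in which a retired ID keeps resolving to the surviving item. The class ItemStore, its field names, and the sample IDs are hypothetical, not Wikibase code. Refusing the whole merge on a conflict is also an assumption, since the plan only says that non-conflicting data is merged. The add_alias method shows one way a “foreign” identifier could ride on the same redirect table.

```python
class ItemStore:
    """Toy item store; IDs and field names are hypothetical."""

    FIELDS = ("labels", "statements")

    def __init__(self):
        self.items = {}      # item ID -> {"labels": {...}, "statements": {...}}
        self.redirects = {}  # retired or foreign ID -> item ID

    def resolve(self, item_id):
        """Follow redirects so an old ID still reaches a live item."""
        seen = set()
        while item_id in self.redirects:
            if item_id in seen:
                raise ValueError(f"redirect loop at {item_id}")
            seen.add(item_id)
            item_id = self.redirects[item_id]
        return item_id

    def add_alias(self, foreign_id, item_id):
        """Register an identifier from another data set as a redirect."""
        self.redirects[foreign_id] = self.resolve(item_id)

    def merge(self, source_id, target_id):
        """Fold source into target, then leave a redirect behind."""
        source_id = self.resolve(source_id)
        target_id = self.resolve(target_id)
        if source_id == target_id:
            return target_id
        source, target = self.items[source_id], self.items[target_id]
        # Check for conflicts first so a refused merge changes nothing.
        for name in self.FIELDS:
            for key, value in source[name].items():
                if key in target[name] and target[name][key] != value:
                    raise ValueError(f"conflict on {name}[{key!r}]")
        for name in self.FIELDS:
            target[name].update(source[name])
        del self.items[source_id]
        self.redirects[source_id] = target_id
        return target_id
```

Anyone who recorded the retired ID, or the foreign one, still lands on the merged item through resolve.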

January 3, 2014

Wikibase DataModel released!

Filed under: Data Models,Identification,Precision,Subject Identity,Wikidata,Wikipedia — Patrick Durusau @ 5:04 pm

Wikibase DataModel released! by Jeroen De Dauw.

From the post:

I’m happy to announce the 0.6 release of Wikibase DataModel. This is the first real release of this component.

DataModel?

Wikibase is the software behind Wikidata.org. At its core, this software is about describing entities. Entities are collections of claims, which can have qualifiers, references and values of various different types. How this all fits together is described in the DataModel document written by Markus and Denny at the start of the project. The Wikibase DataModel component contains (PHP) domain objects representing entities and their various parts, as well as associated domain logic.
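As a reading aid, the shape described above (entities as collections of claims, each claim carrying a value plus qualifiers and references) can be sketched in a few lines. The real component is written in PHP. The Python classes and field names below are assumptions for illustration only and do not mirror its API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Claim:
    """One statement about an entity (hypothetical field names)."""
    prop: str                                        # the property being stated
    value: Any                                       # values can be of various types
    qualifiers: dict = field(default_factory=dict)   # extra context for the claim
    references: list = field(default_factory=list)   # sources backing the claim


@dataclass
class Entity:
    """An entity is a collection of claims."""
    id: str
    claims: list = field(default_factory=list)

    def values_of(self, prop):
        """Return every value stated for the given property."""
        return [c.value for c in self.claims if c.prop == prop]


if __name__ == "__main__":
    orca = Entity("item-1")  # placeholder ID, not a real Wikidata ID
    orca.claims.append(Claim("binomial name", "Orcinus orca (Linnaeus, 1758)"))
    print(orca.values_of("binomial name"))
```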

I wanted to draw your attention to this discussion of “items:”

Items are Entities that are typically represented by a Wikipage (at least in some Wikipedia languages). They can be viewed as “the thing that a Wikipage is about,” which could be an individual thing (the person Albert Einstein), a general class of things (the class of all Physicists), and any other concept that is the subject of some Wikipedia page (including things like History of Berlin).

The IRI of an Item will typically be closely related to the URL of its page on Wikidata. It is expected that Items store a shorter ID string (for example, as a title string in MediaWiki) that is used in both cases. ID strings might have a standardized technical format such as “wd1234567890” and will usually not be seen by users. The ID of an Item should be stable and not change after it has been created.

The exact meaning of an Item cannot be captured in Wikidata (or any technical system), but is discussed and decided on by the community of editors, just as it is done with the subject of Wikipedia articles now. It is possible that an Item has multiple “aspects” to its meaning. For example, the page Orca describes a species of whales. It can be viewed as a class of all Orca whales, and an individual whale such as Keiko would be an element of this class. On the other hand, the species Orca is also a concept about which we can make individual statements. For example, one could say that the binomial name (a Property) of the Orca species has the Value “Orcinus orca (Linnaeus, 1758).”

However, it is intended that the information stored in Wikidata is generally about the topic of the Item. For example, the Item for History of Berlin should store data about this history (if there is any such data), not about Berlin (the city). It is not intended that data about one subject is distributed across multiple Wikidata Items: each Item fully represents one thing. This also helps for data integration across languages: many languages have no separate article about Berlin’s history, but most have an article about Berlin.

What do you make of the claim:

The exact meaning of an Item cannot be captured in Wikidata (or any technical system), but is discussed and decided on by the community of editors, just as it is done with the subject of Wikipedia articles now. It is possible that an Item has multiple “aspects” to its meaning. For example, the page Orca describes a species of whales. It can be viewed as a class of all Orca whales, and an individual whale such as Keiko would be an element of this class. On the other hand, the species Orca is also a concept about which we can make individual statements. For example, one could say that the binomial name (a Property) of the Orca species has the Value “Orcinus orca (Linnaeus, 1758).”

I may write an information system that fails to distinguish between a species of whales, a class of whales and a particular whale, but that is a design choice, not a foregone conclusion.
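To show that the distinction is a design choice, here is a minimal sketch in which the kind of subject is part of its identity. The Kind enumeration, the Subject class, and the statements table are hypothetical names. Only the Orca, Keiko, and binomial name details come from the quoted passage.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    INDIVIDUAL = "individual"   # one particular whale
    CLASS = "class"             # the set of all such whales
    SPECIES = "species"         # the species as a concept in its own right


@dataclass(frozen=True)
class Subject:
    """A subject is identified by its name and its kind together."""
    name: str
    kind: Kind


orca_species = Subject("Orca", Kind.SPECIES)
orca_class = Subject("Orca", Kind.CLASS)
keiko = Subject("Keiko", Kind.INDIVIDUAL)

# Statements attach to exactly one of the three subjects.
statements = {
    (orca_species, "binomial name"): "Orcinus orca (Linnaeus, 1758)",
    (keiko, "element of"): orca_class,
}
```

Here the binomial name belongs to the species and nothing else, and Keiko is an element of the class rather than of a page that covers several “aspects” at once.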

In the case of Wikipedia, which relies upon individuals repeating the task of extracting relevant information from loosely gathered data, that approach works quite well.

But there isn’t one degree of precision of identification that works for all cases.

My suspicion is that for more demanding search applications, such as drug interactions, less precise identifications could lead to unfortunate, even fatal, results.

Yes?

August 13, 2013

Wikidata RDF export available [And a tale of “part of.”]

Filed under: RDF,Wikidata — Patrick Durusau @ 3:04 pm

Wikidata RDF export available by Markus Krötzsch.

From the post:

I am happy to report that an initial, yet fully functional RDF export for Wikidata is now available. The exports can be created using the wda-export-data.py script of the wda toolkit [1]. This script downloads recent Wikidata database dumps and processes them to create RDF/Turtle files. Various options are available to customize the output (e.g., to export statements but not references, or to export only texts in English and Wolof). The file creation takes a few (about three) hours on my machine depending on what exactly is exported.

Wikidata (homepage)

Wikidata:Database download.

Earlier today I read an article about combining data released under different licenses. That is no problem here, because the data is released under the Creative Commons CC0 license. Watch for content in other namespaces, though: different licensing may apply.

To run the Python script wda-export-data.py I had to install python-bitarray. I mention it in case you get an error message saying it is missing.

Use the data with caution.

The entry for Wikipedia reports in part:

part of List of Wikimedia projects

If you follow “part of” you will find:

this item is a part of that item

Also known as:

section of
system of
subsystem of
subassembly of
sub-system of
sub-assembly of
merged into
contained within
assembly of
within a set

“[P]art of” covers enough semantic range to return Google-like results (bad).
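A small sketch of what that semantic range costs at query time. The alias list is the one shown above. The sample triples, the normalize function, and the query function are made up for illustration.

```python
# Aliases listed for "part of" on its property page (copied from the list above).
PART_OF_ALIASES = {
    "section of", "system of", "subsystem of", "subassembly of",
    "sub-system of", "sub-assembly of", "merged into",
    "contained within", "assembly of", "within a set",
}

# Hypothetical assertions, each made with a more specific relation in mind.
RAW = [
    ("piston", "subassembly of", "engine"),
    ("Company A", "merged into", "Company B"),
    ("Wikipedia", "within a set", "List of Wikimedia projects"),
]


def normalize(relation):
    """Collapse every alias into the single property 'part of'."""
    return "part of" if relation in PART_OF_ALIASES else relation


def query(relation, triples):
    """Return (subject, object) pairs whose normalized relation matches."""
    return [(s, o) for s, r, o in triples if normalize(r) == relation]


if __name__ == "__main__":
    print(query("part of", RAW))      # all three, whatever was meant
    print(query("merged into", RAW))  # nothing: the distinction is gone
```

Once the aliases are collapsed, a mechanical assembly, a corporate merger, and list membership all come back from the same query, and the narrower question can no longer be asked.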

Not to mention that as a subject, I think “Wikipedia” is a bit more than an entry in a list.

Don’t you?

April 26, 2013

The Wikidata revolution is here:…

Filed under: Data,Wikidata,Wikipedia — Patrick Durusau @ 5:52 pm

The Wikidata revolution is here: enabling structured data on Wikipedia by Tilman Bayer.

From the post:

A year after its announcement as the first new Wikimedia project since 2006, Wikidata has now begun to serve the over 280 language versions of Wikipedia as a common source of structured data that can be used in more than 25 million articles of the free encyclopedia.

By providing Wikipedia editors with a central venue for their efforts to collect and vet such data, Wikidata leads to a higher level of consistency and quality in Wikipedia articles across the many language editions of the encyclopedia. Beyond Wikipedia, Wikidata’s universal, machine-readable knowledge database will be freely reusable by anyone, enabling numerous external applications.

“Wikidata is a powerful tool for keeping information in Wikipedia current across all language versions,” said Wikimedia Foundation Executive Director Sue Gardner. “Before Wikidata, Wikipedians needed to manually update hundreds of Wikipedia language versions every time a famous person died or a country’s leader changed. With Wikidata, such new information, entered once, can automatically appear across all Wikipedia language versions. That makes life easier for editors and makes it easier for Wikipedia to stay current.”
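The “entered once” idea can be sketched as a single shared record that every language edition reads when it renders a page. The store layout, the placeholder values, and the function render_infobox are assumptions for illustration, not how Wikipedia actually pulls data from Wikidata.

```python
# Hypothetical central store: one record per item, shared by all language editions.
CENTRAL = {"item-1": {"head of state": "Person A"}}

# Per-language display labels for each property (illustrative only).
LABELS = {
    "en": {"head of state": "Head of state"},
    "de": {"head of state": "Staatsoberhaupt"},
    "fr": {"head of state": "Chef d'État"},
}


def render_infobox(item_id, lang):
    """Build an infobox for one language from the shared record."""
    data = CENTRAL[item_id]
    return {LABELS[lang][prop]: value for prop, value in data.items()}


if __name__ == "__main__":
    # A single edit to the central record...
    CENTRAL["item-1"]["head of state"] = "Person B"
    # ...shows up in every language edition on the next render.
    for lang in LABELS:
        print(lang, render_infobox("item-1", lang))
```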

This is a great source of curated data!

October 31, 2012

Wikidata

Filed under: Data,Wikidata — Patrick Durusau @ 11:30 am

Wikidata

From the webpage:

Wikidata is a free knowledge base that can be read and edited by humans and machines alike. It is for data what Wikimedia Commons is for media files: it centralizes access and management of structured data, such as interwiki references and statistical information. Wikidata contains data in all languages for which there are Wikimedia projects.

Not fully operational but still quite interesting.

The re-use of information is particularly interesting, since re-use of data is an advantage commonly found in topic maps.
