Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 2, 2014

OpenPrism

Filed under: Open Data,Open Government — Patrick Durusau @ 2:50 pm

Searching Data Tables

From the webpage:

There are loads of open data portals. There’s even a portal about data portals. And each of these portals has loads of datasets.

OpenPrism is my most recent attempt at understanding what is going on in all of these portals. Read on if you want to see why I made it, or just go to the site and start playing with it.

Naive search method

One difficulty in discovering open data is the search paradigm.

Open data portals approach searching data as if data were normal prose; your search terms are some keywords, a category, &c., and your results are dataset titles and descriptions.

OpenPrism is one small attempt at making it easier to search. Rather than going to all of the different portals and making a separate search for each portal, you type your search in one search bar, and you get results from a bunch of different Socrata, CKAN and Junar portals.
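To make the one-search-bar idea concrete, here is a minimal Python sketch of sending a single query to several CKAN portals and merging the hits. It is not OpenPrism's code (that is on GitHub). It assumes CKAN's `package_search` action API (`/api/3/action/package_search` with the `q` and `rows` parameters) and returns only dataset titles and descriptions. Socrata and Junar portals would each need their own request and response handling and are left out. The function names and the injectable `fetch` parameter are hypothetical, chosen for illustration.

```python
# Hypothetical sketch, not OpenPrism's implementation: send one query to
# several CKAN portals and merge the dataset hits.
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A fetcher takes a URL and returns the decoded JSON body (injectable for testing).
Fetcher = Callable[[str], dict]


def http_get_json(url: str) -> dict:
    """Default fetcher: a plain HTTP GET whose body is decoded as JSON."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def ckan_search(base_url: str, query: str, fetch: Fetcher = http_get_json) -> list[dict]:
    """Search one CKAN portal through its package_search action."""
    params = urllib.parse.urlencode({"q": query, "rows": 10})
    url = f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"
    body = fetch(url)
    # CKAN wraps hits as {"result": {"results": [...]}}; "notes" holds the description.
    return [
        {"portal": base_url, "title": pkg.get("title", ""), "description": pkg.get("notes") or ""}
        for pkg in body.get("result", {}).get("results", [])
    ]


def federated_search(portals: list[str], query: str, fetch: Fetcher = http_get_json) -> list[dict]:
    """Query every portal concurrently; results keep the order of `portals`."""
    merged: list[dict] = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(ckan_search, p, query, fetch) for p in portals]
        for fut in futures:
            try:
                merged.extend(fut.result())
            except (OSError, ValueError):
                # An unreachable portal or malformed JSON should not sink the whole search.
                continue
    return merged
```

For example, `federated_search(["https://demo.ckan.org"], "water")` should query CKAN's public demo instance. Note that what comes back is still titles and descriptions: nothing here reconciles column names across portals.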

That is certainly more efficient than searching data portals separately, but searching data portals is highly problematic in any event.

Or at least more problematic than using one of the standard web search engines. Those search engines rely upon the choices of millions of users to fine-tune their results, and even then they are often a mixed bag.

Data portals do not share common schemas or metadata with each other, and I suspect most datasets within a single portal do not either. That means a search that succeeds in one data portal may return no results in another.

Not that I am about to advocate a “universal” schema for all data portals. 😉

A good first step would be enabling data silos to host searchable, user-suggested mappings for data columns. Not machine-implemented, just simple prose. Users researching particular areas are likely to encounter the same data sets, and recording their mappings could well assist other users.
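As a rough illustration of what "searchable mappings in simple prose" could mean, here is a minimal Python sketch of a registry that stores user notes linking a column in one dataset to a column in another and indexes the words in those notes. Every name in it (`ColumnRef`, `Mapping`, `MappingRegistry`, the example portals) is hypothetical; the mappings are only recorded and searched, never executed.

```python
# Hypothetical sketch: a searchable registry of user-suggested column mappings.
# Notes are free prose; nothing here enforces or executes a mapping.
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    portal: str
    dataset: str
    column: str


@dataclass(frozen=True)
class Mapping:
    source: ColumnRef
    target: ColumnRef
    note: str    # the user's plain-prose explanation
    author: str


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; 'zip_code' splits into 'zip' and 'code'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class MappingRegistry:
    def __init__(self) -> None:
        self._mappings: list[Mapping] = []
        self._index: dict[str, set[int]] = {}  # token -> positions in _mappings

    def suggest(self, mapping: Mapping) -> None:
        """Record a user's mapping and index its note and both column names."""
        pos = len(self._mappings)
        self._mappings.append(mapping)
        text = " ".join([mapping.note, mapping.source.column, mapping.target.column])
        for tok in tokens(text):
            self._index.setdefault(tok, set()).add(pos)

    def search(self, query: str) -> list[Mapping]:
        """Return mappings whose indexed text contains every query term."""
        terms = tokens(query)
        if not terms:
            return []
        hits = set.intersection(*(self._index.get(t, set()) for t in terms))
        return [self._mappings[i] for i in sorted(hits)]
```

A user searching for `zip` could then find a note explaining that another portal calls the same values `postal`, which is exactly the case where a keyword search across portals comes up empty.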

Relying on user-suggested mappings would also focus improvements on the data sets that get used the most, the ones users care about and may want to combine, rather than having IT guess which data mappings should have priority.
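One simple reading of "the data sets that get used the most" is to count how often users suggest mappings between each pair of datasets. The sketch below assumes suggestions arrive as hypothetical (source dataset, target dataset) pairs; a raw count is only a rough proxy for what users care about.

```python
# Hypothetical sketch: rank dataset pairs by how often users suggest mappings
# between them, as one signal of which combinations users care about.
from collections import Counter
from typing import Optional


def rank_dataset_pairs(
    suggestions: list[tuple[str, str]], top: Optional[int] = None
) -> list[tuple[tuple[str, str], int]]:
    """Count user mappings per unordered dataset pair, most frequent first."""
    counts = Counter(
        tuple(sorted((src, dst)))  # (a, b) and (b, a) are the same pair
        for src, dst in suggestions
        if src != dst  # ignore mappings within a single dataset
    )
    return counts.most_common(top)
```

Pairs at the top of the list would be the first candidates for cleanup or curated crosswalks.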

Sound like a plan?

See the source at GitHub.

I first saw this in a tweet by Felienne Hermans.
