Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 26, 2013

Ocean Data Interoperability Platform (ODIP)

Filed under: Interoperability,Open Data — Patrick Durusau @ 1:53 pm

Ocean Data Interoperability Platform (ODIP)

From the post:

The Ocean Data Interoperability Platform (ODIP) is a 3-year initiative (2013-2015) funded by the European Commission under the Seventh Framework Programme. It aims to contribute to the removal of barriers hindering the effective sharing of data across scientific domains and international boundaries.

ODIP brings together 11 organizations from the United Kingdom, Italy, Belgium, the Netherlands, Greece and France with the objective of providing a forum to harmonise the diverse regional systems.

The First Workshop will take place from Monday 25 February 2013 through Thursday 28 February 2013. More information about the workshop is available at 1st ODIP Workshop.

The workshop page offers a listing of topics with links to further materials.

Gathering a snapshot of our present-day semantic diversity is an extremely useful exercise, whatever your ultimate choice of “solution.”

February 19, 2013

G-8 International Conference on Open Data for Agriculture

Filed under: Government,Government Data,Open Data — Patrick Durusau @ 6:38 am

G-8 International Conference on Open Data for Agriculture

April 29-30, 2013 Washington, D.C.

Deadline for proposals: Midnight, February 28, 2013.

From the call for ideas:

Are you interested in addressing global challenges, such as food security, by providing open access to information? Would you like the opportunity to present to leaders from around the world?

We are seeking innovative products and ideas that demonstrate the potential of using open data to increase food security. This April 29-30th in Washington, D.C., the G-8 International Conference on Open Data for Agriculture will host policy makers, thought leaders, food security stakeholders, and data experts to build a strategy to share agriculture data and make innovation more accessible. As part of the conference, we are giving innovators a chance to showcase innovative uses of open data for food security in a lightning presentation or in the exhibit hall. This call for ideas is a chance to demonstrate the potential that open data can have in ensuring food security, and can inform an unprecedented global collaboration. Visit data.gov to see what agricultural data is already available and connect to other G-8 open data sites!

We are seeking top innovators to show the world what can be done with open data through:

  • Lightning Presentations: brief (3-5 minute), image-rich presentations intended to convey an idea.
  • Exhibit Hall: an opportunity to convey an idea through an image-rich exhibit.

Presentations should inspire others to share their data or imagine how open data could be used to increase food security. Presentations may include existing, new, or proposed applications of open data and should meet one or more of the following criteria:

  • Demonstrate the impact of open data on food security.
  • Demonstrate the impact of access to agriculturally-relevant data on developed and/or developing countries.
  • Demonstrate the impact of bringing multiple sources of agriculturally-relevant public and/or private open data together (think about the creation of an agriculture equivalent of weather.com).

For those with a new idea, we invite you to submit your proposal to present it to leading experts in food security, technology and data innovation. Proposals should identify which data is needed that is publicly available, for free, on the internet. Proposals must also include a design of the application including relevance to the target audience and plans for beta testing. A successful prototype will be mobile, interactive, and scalable. Proposals to showcase existing products or pitch new ideas will be reviewed by a global panel of technical experts from the G-8 countries.

Short notice, but according to the submission form on the website, you get only 75-100 words to summarize your proposal.

Hell, I have trouble identifying myself in 75-100 words. 😉

Still, if you are in D.C. and interested, it could be a good way to meet people in this area.

The nine flags for the G-8 are confusing at first, but it’s not an example of government committee counting: the EU has a representative at G-8 meetings.

I first saw this at: Open Call to Innovators: Apply to present at G-8 International Conference on Open Data for Agriculture.

February 13, 2013

Competition: visualise open government data and win $2,000

Filed under: Contest,Graphics,Open Data,Open Government,Visualization — Patrick Durusau @ 1:54 pm

Competition: visualise open government data and win $2,000 by Simon Rogers.

Closing date: 23:59 BST on 2 April 2013

What can you do with the thousands of open government datasets? With Google and Open Knowledge Foundation we are launching a competition to find the best dataviz out there. You might even win a prize.

(graphic omitted)

Governments around the world are releasing a tidal wave of open data – on everything from spending through to crime and health. Now you can compare national, regional and city-wide data from hundreds of locations around the world.

But how good is this data? We want to see what you can do with it. What apps and visualisations can you make with this data? We want to see how the data changes the way you see the world.

In conjunction with Google and the Open Knowledge Foundation (who will be helping us judge the results), see if you can win the $2,000 prize.

All we want you to do is to take an open dataset from any government open data website (there’s a list of them at the bottom of this article) and visualise it.

The competition is open to citizens of the UK, US, France, Germany, Spain, Netherlands, Sweden. The winner will take home $2,000 and the result will be published on the Guardian Datastore on our Show and Tell site.

Here are some of the key datasets we’ve found (list below) – and feel free to bring your own data to the party – we only ask that it is freely available and open as in OpenDefinition.org.

You are visualizing data anyway, so why not take a chance on free PR and $2,000?

February 2, 2013

Alpha.data.gov: From Open Data Provider to Open Data Hub

Filed under: Government,Government Data,Open Data,Topic Maps — Patrick Durusau @ 3:08 pm

Alpha.data.gov: From Open Data Provider to Open Data Hub by Andrea Di Maio.

From the post:

Those who happen to read my blog know that I am rather cynical about many enthusiastic pronouncements around open data. One of the points I keep banging on is that the most common perspective is that open data is just something that governments ought to publish for businesses and citizens to use it. This perspective misses both the importance of open data created elsewhere – such as by businesses or by people in social networks – and the impact of its use inside government. Also, there is a basic confusion between open and public data: not all open data is public and not all public data may be open (although they should, in the long run).

In this respect the new experimental site alpha.data.gov is a breath of fresh air. Announced in a recent post on the White House blog, it does not contain data, but explains which categories of open data can be used for which sort of purposes.

A step in the right direction.

Simply gathering the relevant data sets for any given project is a project in and of itself.

Followed by documenting the semantics of the relevant data sets.

Data hubs are a precursor to collections of semantic documentation for data found at data hubs.

You know what should follow from collections of semantic documentation. 😉 (Can you say topic maps?)

January 31, 2013

Open Data Protocols, DCIP [A Topic Map Song]

Filed under: DCIP,Open Data,Topic Maps — Patrick Durusau @ 7:25 pm

Open Data Protocols, DCIP

From the post:

Have you ever heard of the Data Catalog Interoperability Protocol (DCIP)? DCIP is a specification designed to facilitate interoperability between data catalogs published on the Web by defining:

  • a JSON and RDF representation for key data catalog entities such as Dataset (DatasetRecord) and Resource (Distribution), based on the DCAT vocabulary
  • a read-only, REST-based protocol for achieving basic catalog interoperability

Data Catalog Interoperability Protocol (DCIP) v0.2 discusses each of the above and provides examples. The approach is designed to be pragmatic and easily implementable. It merges existing work on DCAT with the real-life experiences of “harvesting” in various projects.

To learn more about DCIP, you can visit the Open Data Protocols website, which aims to make it easier to develop tools and services for working with data, and to ensure greater interoperability between new and existing tools and services.
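
I am not a DCIP implementer, so here is only a rough sketch, in Python, of what a DCAT-based DatasetRecord might look like as JSON. The field names follow the DCAT vocabulary; the titles, URLs and values are hypothetical examples, not taken from the specification.

import json

# A minimal sketch of a DCAT-style DatasetRecord; the titles, URLs and
# values are hypothetical, while the field names follow the DCAT vocabulary.
dataset_record = {
    "@type": "dcat:Dataset",
    "dct:title": "Harbour Water Temperatures 2012",
    "dct:description": "Monthly mean water temperatures for harbour stations.",
    "dct:publisher": "Example Oceanographic Institute",
    "dcat:keyword": ["oceanography", "temperature", "open data"],
    "dcat:distribution": [{
        "@type": "dcat:Distribution",
        "dcat:accessURL": "http://data.example.org/harbour-temps.csv",
        "dct:format": "text/csv",
    }],
}

# A read-only protocol means a harvester only ever issues GET requests for
# records like this one, e.g. GET http://catalog.example.org/datasets/42
print(json.dumps(dataset_record, indent=2))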

The news of new formats, protocols and the like are music to topic map ears.

The refrain is: “cha-ching, cha-ching, cha-ching!”

😉

Only partially in jest.

Every time a new format (read: set of subjects) is developed for the encoding of data (another set of subjects), it is by definition different from all that came before.

With good reason, of course: every sentient being on the planet will be flocking to format/protocol X for all their data.

Well, except that flocking is more like a trickle for most new formats. Particularly when compared to the historical record of formats.

In theory topic maps are an exception to that rule, except that when you map specific instances of other data formats, you have committed yourself to a particular set of mappings.

Still, that’s better than rip-and-replace or ETL processing of data. It maintains backwards compatibility with existing systems while anticipating future systems.

Saturday 23rd February is Open Data Day 2013!

Filed under: Contest,Open Data — Patrick Durusau @ 7:24 pm

Saturday 23rd February is Open Data Day 2013! from AIMS.

From the post:

Open Data Day is a gathering of citizens in cities around the world to write applications, liberate data, create visualizations and publish analyses using open public data to show support for and encourage the adoption of open data policies by the world’s local, regional and national governments. There are Open Data Day events taking place all around the world.

Are you planning to organize or participate in one of these events? Are you going to launch new open data catalogs on Open Data Day? Share with us your plans and highlight events that might be of interest to the agricultural information management community.

Learn more at http://opendataday.org/

As of today: 52 events.

Anyone interested in a virtual event on Open Data Day using open data and topic maps?

January 21, 2013

O’Reilly’s Open Government book [“…more equal than others” pigs]

Filed under: Government,Government Data,Open Data,Open Government,Transparency — Patrick Durusau @ 7:30 pm

We’re releasing the files for O’Reilly’s Open Government book by Laurel Ruma.

From the post:

I’ve read many eloquent eulogies from people who knew Aaron Swartz better than I did, but he was also a Foo and contributor to Open Government. So, we’re doing our part at O’Reilly Media to honor Aaron by posting the Open Government book files for free for anyone to download, read and share.

The files are posted on the O’Reilly Media GitHub account as PDF, Mobi, and EPUB files for now. There is a movement on the Internet (#PDFtribute) to memorialize Aaron by posting research and other material for the world to access, and we’re glad to be able to do this.

You can find the book here: github.com/oreillymedia/open_government

Daniel Lathrop, my co-editor on Open Government, says “I think this is an important way to remember Aaron and everything he has done for the world.” We at O’Reilly echo Daniel’s sentiment.

Be sure to read Chapter 25, “When Is Transparency Useful?”, by the late Aaron Swartz.

It includes this passage:

…When you create a regulatory agency, you put together a group of people whose job is to solve some problem. They’re given the power to investigate who’s breaking the law and the authority to punish them. Transparency, on the other hand, simply shifts the work from the government to the average citizen, who has neither the time nor the ability to investigate these questions in any detail, let alone do anything about it. It’s a farce: a way for Congress to look like it has done something on some pressing issue without actually endangering its corporate sponsors.

As a tribute to Aaron, are you going to dump data on the WWW or enable calling the “more equal than others” pigs to account?

January 20, 2013

OpenAIRE Study

Filed under: EU,Open Data — Patrick Durusau @ 8:04 pm

Implementing Open Access Mandates in Europe: OpenAIRE Study by Thembani Malapela.

From the webpage:

Implementing Open Access Mandates in Europe: OpenAIRE Study on the Development of Open Access Repository Communities in Europe is the title of a recent book authored by Birgit Schmidt and Iryna Kuchma. The book highlights the existing open access policies in Europe, provides an overview of publishers’ self-archiving policies, and gives strategies for policy implementation. Such strategies, both institutional and national, have been used in implementing open access policy mandates. This work provides a unique overview of national awareness of open access in 32 European countries: all EU member states plus Norway, Iceland, Croatia, Switzerland and Turkey.

What makes this book an interesting read is that it taps into activities implemented through OpenAIRE project and related repository projects by other stakeholders in Europe. Despite its extensive coverage on the implementation of Open Access Mandates in the region, the authors acknowledge, “the main issues that still need to be resolved in the coming years include the effective promotion of open access among research communities and support in copyright management for researchers and research institutions as well as intermediaries such as libraries and repositories”.

The more data that becomes “open,” the greater the semantic diversity you will find.

It is important to follow the discussion as you prepare to map more and more information into your topic map.

January 19, 2013

Could Governments Run Out of Patience with Open Data? [Semantic Web?]

Filed under: Government,Open Data,Semantic Web — Patrick Durusau @ 7:06 pm

Could Governments Run Out of Patience with Open Data? by Andrea Di Maio.

From the post:

Yesterday I had yet another client conversation – this time with a mid-size municipality in the north of Europe – on the topic of the economic value generated through open data. The problem we discussed is the same I highlighted in a post last year: nobody argues the potential long term value of open data but it may be difficult to maintain a momentum (and to spend time, money and management bandwidth) on something that will come to fruition in the more distant future, while more urgent problems need to be solved now, under growing budget constraints.

Faith is not enough, nor are the many examples that open data evangelists keep sharing to demonstrate value. Open data must help solve today’s problems too, in order to gain the credibility and the support required to realize future economic value.

While many agree that open data can contribute to shorter term goals, such as improving inter-agency transparency and data exchange or engaging citizens on solving concrete problems, making this happen in a more systematic way requires a change of emphasis and a change of leadership.

Emphasis must be on directing efforts – be they idea collections, citizen-developed dashboards or mobile apps – onto specific, concrete problems that government organizations need to solve. One might argue that this is not dissimilar from having citizens offer perspectives on how they see existing issues and related solutions. But there is an important difference: what usually happens is that citizens and other stakeholders are free to use whichever data they want to use. The required change is to entice them to help governments solve problems the way governments see them. In other terms, whereas citizens would clearly remain free to come up with whichever use of any open data they deem important, they should get incentives, awards, prizes only for those uses that meet clear government requirements. Citizens would be at the service of government rather than the other way around. For those who might be worried that this advocates for an unacceptable change of responsibility and that governments are at the service of citizens and not the other way around, what I mean is that citizens should help governments serve them.

January 5, 2013

The Semantic Link [ODI Drug Example?]

Filed under: Open Data,Semantic Web — Patrick Durusau @ 3:10 pm

The Semantic Link

Archive of the Semantic Link podcasts.

Semantic Link is a monthly podcast on Semantic Technologies from Semanticweb.com.

In December 2012, Nigel Shadbolt, chairman and co-founder of the ODI (Open Data Institute), was the special guest.

Nigel offers an odd example of the value of open data. See what you think:

The prescriptions written by all physicians, though not for whom, are made public. A start-up company noticed that many prescribed drugs were “off-license” (generic, to use the U.S. terminology) but doctors were still prescribing the brand-name drug.

Reported savings of £200 million in one drug area.

That success isn’t a function of having “open data” but of having an intelligent person review the data, whether open or not.

I can assure you my drug company knows the precise day when it anticipates a generic version of a drug will become available. 😉

December 27, 2012

Majuro.JS [Useful Open Data]

Filed under: Mapping,Maps,Open Data,Visualization — Patrick Durusau @ 11:13 am

Majuro.JS by Nick Doiron.

From the homepage:

Majuro.JS helps you make detailed, interactive maps with open buildings data.

Great examples on the homepage but I prefer the explanation at Github.

This is wicked cool!

I can see this type of open data as the basis for “innovation.”

It could serve as a target for rich annotation by a topic map based application.

December 26, 2012

Educated Guesses Decorated With Numbers

Filed under: Data,Data Analysis,Open Data — Patrick Durusau @ 1:48 pm

Researchers Say Much to Be Learned from Chicago’s Open Data by Sam Cholke.

From the post:

HYDE PARK — Chicago is a vain metropolis, publishing every minute detail about the movement of its buses and every little skirmish in its neighborhoods. A team of researchers at the University of Chicago is taking that flood of data and using it to understand and improve the city.

“Right now we have more data than we’re able to make use of — that’s one of our motivations,” said Charlie Catlett, director of the new Urban Center for Computation and Data at the University of Chicago.

Over the past two years the city has unleashed a torrent of data about bus schedules, neighborhood crimes, 311 calls and other information. Residents have put it to use, but Catlett wants his team of computational experts to get a crack at it.

“Most of what is happening with public data now is interesting, but it’s people building apps to visualize the data,” said Catlett, a computer scientist at the university and Argonne National Laboratory.

Catlett and a collection of doctors, urban planners and social scientists want to analyze that data to solve urban planning puzzles in some of Chicago’s most distressed neighborhoods and eliminate the old method of trial and error.

“Right now we look around and look for examples where something has worked or appeared to work,” said Keith Besserud, an architect at Skidmore, Owings and Merrill's Blackbox Studio and part of the new center. “We live in a city, so we think we understand it, but it’s really not seeing the forest for the trees, we really don’t understand it.”

Besserud said urban planners have theories but lack evidence to know for sure when greater density could improve a neighborhood, how increased access to public transportation could reduce unemployment and other fundamental questions.

“We’re going to try to break down some of the really tough problems we’ve never been able to solve,” Besserud said. “The issue in general is the field of urban design has been inadequately served by computational tools.”

In the past, policy makers would make educated guesses. Catlett hopes the work of the center will better predict such needs using computer models, and the data is only now available to answer some fundamental questions about cities.

…(emphasis added)

Some city services may be improved by increased data, such as staging ambulances near high-density shooting locations based upon past experience.

That isn’t the same as “planning” to reduce the incidence of unemployment or crime by urban planning.

If you doubt that statement, consider the vast sums of economic data available for the past century.

Despite that array of data, there are no universally acclaimed “truths” or “policies” for economic planning.

The temptation to say “more data,” “better data,” “better integration of data,” etc. will solve problem X is ever present.

Avoid disappointing your topic map customers.

Make sure a problem is one data can help solve before treating it like one.

I first saw this in a tweet by Tim O’Reilly.

November 21, 2012

Open Data Is Not for Sprinters [Or Open Data As Religion]

Filed under: Open Data — Patrick Durusau @ 7:13 am

Open Data Is Not for Sprinters by Andrea Di Maio.

Andrea’s response to the UK special envoy who was “disappointed” with open data usage was to point out that government should be making better internal use of open data, to justify its open data programs.

His view was challenged by a member of the audience, who said:

open data is for the sake of economic development and transparency, not for internal use.

Andrea’s response:

I do not disagree of course. All I am saying, and I have been saying for a while now, is that to realize this vision will take quite some time. Indeed more data must be available, of higher quality and timeliness; more entrepreneurs or “appreneurs” must be lured to extract value for businesses and the public at large from this data; and we need a stream of examples across sectors and regions to show that value can be generated everywhere.

A more direct answer would be to point out that statements like:

Opening up data is fundamentally about more efficient use of resources and improving service delivery for citizens. The effects of that are far reaching: innovation, transparency, accountability, better governance and economic growth. (Sir Tim Berners-Lee: Raw data, now!)

are religious dogma. Useful if you want to run your enterprise or government based on religious dogma, but you may as well use a Ouija board.

The astronomy community has a history of “open data” that spans decades. I find the data very interesting and it has led to discoveries in astronomy, but economic development?

The biological community apparently has a competition to see who can make more useful data available than the next lab. And it leads to better research, discoveries and innovation, but economic development?

The same holds true for the chemical community and numerous others.

The point being that claims such as “open data leads to economic development” are sure to disappoint.

Some open data might, but that is a question of research and proof, not mere cant.

A government, for example, could practice open data with regard to its tax policies and how it decides to audit taxpayers. I am sure startups would quickly take up the task of using that data to advise clients on how to avoid audits. (They are called tax advisors now.)

Or a government could practice open data on the White House visitor list and include non-tour visitors, some of them, in the thousands who visit every day. It’s “open data,” just not useful data. And not data that is likely to lead to economic development or transparency.

Governments should practice open data but with an eye towards selecting data that is likely to lead to economic development, innovation, etc. By tracking the use of “open data” now, governments can make rational decisions about what data to “open” in the future.

November 20, 2012

… ‘disappointed’ with open data use

Filed under: Open Data,Open Government — Patrick Durusau @ 8:20 pm

Prime minister’s special envoy ‘disappointed’ with open data use by Derek du Preez.

From the post:

Prime Minister David Cameron’s special envoy on the UN’s post-2015 development goals has said that he is ‘disappointed’ by how much the government’s open datasets have been used so far.

Speaking at a Reform event in London this week on open government and data transparency, Anderson said he recognises that the public sector needs to improve the way it pushes out the data so that it is easier to use.

“I am going to be really honest with you. As an official in a government department that has worked really hard to get a lot of data out in the last two years, I have been pretty disappointed by how much it has been used,” he said.

Easier-to-use data is one issue.

But the expectation that effort making data open = people interested in using it is another.

The article later reports there are 9,000 datasets available at data.gov.uk.

How relevant to everyday concerns are those 9,000 datasets?

When the government starts disclosing the financial relationships between members of government, their families and contributors, I suspect interest in open data will go up.

November 14, 2012

Socrata Open Data Server, Community Edition

Filed under: DCAT,Open Data,Socrata Open Data Server — Patrick Durusau @ 5:15 am

Socrata Open Data Server, Community Edition by Saf Rabah.

From the post:

Socrata, the leading provider of cloud-based open data systems, today announced the “Socrata Open Data Server, Community Edition,” to be offered as an open source reference implementation for open data standards. Designed expressly to promote data portability throughout the open data ecosystem, and support open source software policies in public organizations around the globe, the “Socrata Open Data Server, Community Edition” will be released in the first quarter of 2013, as freely downloadable open source software and fully integrated with other components of the company’s commercial software products.

To learn more about the proposed open data standards, or to get involved in this community effort, please visit http://open-data-standards.github.com.

Looking forward to the release!

The open-data-standards site collects resources on open data and various standards efforts related to the same.

Even with extensions, DCAT (Data Catalog Vocabulary) is going to leave a lot of room for mapping semantics between data sets.
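
To see why, consider a minimal sketch, in Python, of the mapping work that remains. The two catalog records and the field names below are hypothetical, illustrative rather than Socrata’s or anyone else’s actual schema; DCAT supplies the target terms, but the correspondences still have to be asserted by someone.

# Two hypothetical catalog records describing the same dataset.
socrata_record = {"name": "Building Permits", "dataUpdatedAt": "2012-11-01"}
ckan_record = {"title": "Building Permits", "metadata_modified": "2012-11-01"}

# The mapping itself is the part DCAT leaves to the integrator.
to_dcat = {
    "socrata": {"name": "dct:title", "dataUpdatedAt": "dct:modified"},
    "ckan": {"title": "dct:title", "metadata_modified": "dct:modified"},
}

def as_dcat(record, source):
    """Rewrite a catalog record's keys into DCAT terms."""
    mapping = to_dcat[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

# Same subject, two spellings, one DCAT rendering.
assert as_dcat(socrata_record, "socrata") == as_dcat(ckan_record, "ckan")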

October 23, 2012

Open Data vs. Private Data?

Filed under: Data,Open Data — Patrick Durusau @ 4:38 am

Why Government Should Care Less About Open Data and More About Data by Andrea Di Maio.

From the post:

Among the flurry of activities and deja-vu around open data that governments worldwide, in all tiers, are pursuing to increase transparency and fuel a data economy, I found something really worth reading in a report that was recently published by the Danish government.

“Good Basic Data for Everyone – A Driver for Growth and Efficiency” takes a different spin than many others by saying that:

Basic data is the core information authorities use in their day-to-day case processing. Basic data is e.g. data on individuals, businesses, properties, addresses and geography. This information, called basic data, is reused throughout the public sector. Reuse of high-quality data is an essential basis for public authorities to perform their tasks properly and efficiently. Basic data can include personal data.

While most of the categories are open data, the novelty is that for the first time personal and open data are seen for what they are, i.e. data. The document suggests the development of a Data Distributor, which would be responsible for conveying data from different data sources to its consumers, both inside and outside government. The document also assumes that personal data may be ultimately distributed via a common public-sector data distributor.

Besides what is actually written in the document, this opens the door for a much needed shift from service orientation to data orientation in government service delivery. Stating that data must flow freely across organizational boundaries, irrespective of the type of data (and of course within appropriate policy constraints) is hugely important to lay the foundations for effective integration of services and processes across agencies, jurisdictions, tiers and constituencies.

Combining this with some premises of the US Digital Strategy, which highlights an information layer distinct from a platform layer, which is in turn distinct from a presentation layer, one starts seeing a move toward the centrality of data, which may finally lead to the emergence of citizen data stores that would put control of service access and integration in the hands of individuals.

If there is novelty in the Danish approach, it is its being “open data.” That is, all citizens can draw equally on the “basic data” for whatever purpose.

Property records, geographic, geological and other maps, plus addresses were combined long ago in the United States as “private data.”

Despite being collected at taxpayer expense, private industry sells access to collated public data.

Open data may provide businesses with collated public data at a lower cost, but as an expense to the public.

What is known as a false dilemma: we can buy back data government collected on our behalf, or we can pay government to collect and collate it for the few.


The “individual being in charge of their data” is too obvious a fiction to delay us here. It isn’t true now, and there are no signs it will become true. If you doubt that, try restricting the distribution of your credit report. Post a note when you accomplish that task.

October 9, 2012

Code for America: open data and hacking the government

Filed under: Government,Government Data,Open Data,Open Government,Splunk — Patrick Durusau @ 12:50 pm

Code for America: open data and hacking the government by Rachel Perkins.

From the post:

Last week, I attended the Code for America Summit here in San Francisco. I attended as a representative of Splunk>4Good (we sponsored the event via a nice outdoor patio lounge area and gave away some of our (in)famous t-shirts and a few ponies). Since this wasn’t your typical “conference”, and I’m not so great at schmoozing, I was a little nervous–what would Christy Wilson, Clint Sharp, and I do there? As it turned out, there were so many amazing takeaways and so much potential for awesomeness that my nervousness was totally unfounded.

So what is Code for America?

Code for America is a program that sends technologists (who take a year off and apply to their Fellowship program) to cities throughout the US to work with advocates in city government. When they arrive, they spend a few weeks touring the city and its outskirts, meeting residents, getting to know the area and its issues, and brainstorming about how the city can harness its public data to improve things. Then they begin to hack.
Some of these partnerships have come up with amazing tools. For example:

  • Opencounter Santa Cruz mashes up several public datasets to provide tactical and strategic information for persons looking to start a small business: what forms and permits you’ll need, zoning maps with overlays of information about other businesses in the area, and then partners with http://codeforamerica.github.com/sitemybiz/ to help you find commercial space for rent that matches your zoning requirements.
  • Another Code for America Fellow created blightstatus.org, which uses public data in New Orleans to inform residents about the status and plans for blighted properties in their area.
  • Other apps from other cities do cool things like help city maintenance workers prioritize repairs of broken streetlights based on other public data like crime reports in the area, time of day the light was broken, and number of other broken lights in the vicinity, or get the citizenry involved with civic data, government, and each other by setting up a Stack Exchange type of site to ask and answer common questions.

Whatever your view of data sharing by the government (too little, too much, just right), Rachel points to good things that can come from open data.

Splunk has a corporate responsibility program: Splunk>4Good.

Check it out!

BTW, do you have a topic maps “corporate responsibility” program?

September 16, 2012

New Army Guide to Open-Source Intelligence

Filed under: Intelligence,Open Data,Open Source,Public Data — Patrick Durusau @ 4:06 pm

New Army Guide to Open-Source Intelligence

If you don’t know Full Text Reports, you should.

A top-tier research professional’s hand-picked selection of documents from academe, corporations, government agencies, interest groups, NGOs, professional societies, research institutes, think tanks, trade associations, and more.

You will winnow some chaff but also find jewels like Open Source Intelligence (PDF).

From the post:

  • Provides fundamental principles and terminology for Army units that conduct OSINT exploitation.
  • Discusses tactics, techniques, and procedures (TTP) for Army units that conduct OSINT exploitation.
  • Provides a catalyst for renewing and emphasizing Army awareness of the value of publicly available information and open sources.
  • Establishes a common understanding of OSINT.
  • Develops systematic approaches to plan, prepare, collect, and produce intelligence from publicly available information from open sources.

Impressive intelligence overview materials.

It would be nice to re-work this into a topic map intelligence approach document with the ability to insert a client’s name and industry-specific examples. It has that militaristic tone that is hard to capture with civilian writers.

July 20, 2012

…10 billion lines of code…

Filed under: Open Data,Programming,Search Data,Search Engines — Patrick Durusau @ 5:46 pm

Also known as (aka):

Black Duck’s Ohloh lets data from nearly 500,000 open source projects into the wild by Chris Mayer.

From the post:

In a bumper announcement, Black Duck Software have embraced the FOSS mantra by revealing their equivalent of a repository Yellow Pages, through the Ohloh Open Data Initiative.

The website tracks 488,823 projects, allowing users to compare data from a vast amount of repositories and forges. But now, Ohloh’s huge dataset has been licensed under the Creative Commons Attribution 3.0 Unported license, encouraging further transparency across the companies who have already bought into Ohloh’s aggregation mission directive.

“Licensing Ohloh data under Creative Commons offers both enterprises and the open source community a new level of access to FOSS data, allowing trending, tracking, and insight for the open source community,” said Tim Yeaton, President and CEO of Black Duck Software.

He added: “We are constantly looking for ways to help the open source developer community and enterprise consumers of open source. We’re proud to freely license Ohloh data under this respected license, and believe that making this resource more accessible will allow contributors and consumers of open source gain unique insight, leading to more rapid development and adoption.”

What sort of insight would you expect to gain from “…10 billion lines of code…?”

How would you capture it? Pass it on to others in your project?

Mix or match semantics with other lines of code? Perhaps your own?

June 30, 2012

Inside the Open Data white paper: what does it all mean?

Filed under: Data,Open Data — Patrick Durusau @ 6:49 pm

Inside the Open Data white paper: what does it all mean?

The Guardian reviews a recent white paper on open data in the UK:

Does anyone disagree with more open data? It’s a huge part of the coalition government’s transparency strategy, championed by Francis Maude in the Cabinet Office and key to the government’s self-image.

And – following on from a less-than-enthusiastic NAO report on its achievements in April – today’s Open Data White Paper is the government’s chance to seize the initiative.

Launching the paper, Maude said:

Today we’re at a pivotal moment – where we consider the rules and ways of working in a data‑rich world and how we can use this resource effectively, creatively and responsibly. This White Paper sets out clearly how the UK will continue to unlock and seize the benefits of data sharing in the future in a responsible way

And this one comes with a spreadsheet too – a list of each department’s commitments.

So, what does it actually include? White Papers are traditionally full of official, yet positive-sounding waffle, but what about specific announcements? We’ve extracted the key commitments below.

Just in case you are interested in open data from the UK or open data more generally.

It is amusing that the Guardian touts privacy concerns while at the same time bemoaning that access to the Postcode Address File (PAF®), “a database that lists all known UK Postcodes and addresses,” remains in doubt.

I would rather have a little less privacy and a little less junk mail, if you please.

June 24, 2012

Closing In On A Million Open Government Data Sets

Filed under: Dataset,Geographic Data,Government,Government Data,Open Data — Patrick Durusau @ 7:57 pm

Closing In On A Million Open Government Data Sets by Jennifer Zaino.

From the post:

A million data sets. That’s the number of government data sets out there on the web that we have closed in on.

“The question is, when you have that many, how do you search for them, find them, coordinate activity between governments, bring in NGOs,” says James A. Hendler, Tetherless World Senior Constellation Professor in the Department of Computer Science and Cognitive Science Department at Rensselaer Polytechnic Institute, a principal investigator of its Linking Open Government Data project, and Internet web expert for data.gov. He also is connected with many other governments’ open data projects. “Semantic web tools organize and link the metadata about these things, making them searchable, explorable and extensible.”

To be more specific, at SemTech a couple of weeks ago Hendler said there are 851,000 open government data sets across 153 catalogues from 30-something countries, with the three biggest representatives, in terms of numbers, at the moment being the U.S., the U.K., and France. Last week, the one million threshold was crossed.

About 410,000 of these data sets are from the U.S. (federal, state, city, county, tribal included), including quite a large number of geo-data sets. The U.S. government’s goal is to put “lots and lots and lots of stuff out there” and let people figure out what they want to do with it, he notes.

My question about data that “…[is] searchable, explorable and extensible” is whether anyone wants to search, explore or extend it.

Simply piling up data to say you have a large pile of data doesn’t sound very useful.

I would rather have a smaller pile of data that included contract/testing transparency on anti-terrorism IT projects, for example. If the systems aren’t working, then disclosing them isn’t going to make them work any less well.

Not that anyone need fear transparency or failure to perform. The TSA has failed to perform for more than a decade now, has failed to catch a single terrorist, and it remains funded. Even when it gropes children, passengers are so frightened that the outrage passes without serious opposition.

Still, it would be easier to get people excited about mining government data if the data weren’t so random or marginal.

June 9, 2012

The Power of Open Education Data [Semantic Content ~ 0]

Filed under: Education,Open Data — Patrick Durusau @ 7:19 pm

The Power of Open Education Data by Todd Park and Jim Shelton.

The title implies a description or example of the “power” of Open Education Data.

Here are ten examples of how this post disappoints:

  • …who pledged to provide…
  • …voting with their feet…
  • …can help with…
  • …as fuel to spur…
  • …seeks to (1) work with…
  • …and (2) collaborate with…
  • …will also include efforts…
  • …will enable them to create…
  • …will include work to develop…
  • …which can help fuel…

None of these have happened, just speculation on what might happen, maybe.

Let me call your attention to Consumers and Credit Disclosures: Credit Cards and Credit Insurance (2002) by Thomas A. Durkin, a Federal Reserve study of the impact of the Truth in Lending Act, one of the “major” consumer victories of its day (1968).

From the conclusion:

Conclusively evaluating the direct effects of disclosure legislation like Truth in Lending on either consumer behavior or the functioning of the credit marketplace is never a simple matter because there are always competing explanations for observed phenomena. From consumer surveys over time, however, it seems likely that disclosures required by Truth in Lending have had a favorable effect on the ready availability of information on credit transactions.

Let me save some future Federal Reserve researcher time and effort and observe that with Open Education Data, there will be more information about the cost of higher education available.

What impact that will have on behavior is unknown.

The Power of Open Education Data is a disservice to the data mining, open data, education and other communities. It is specious speculation, beneficial only to those seeking public office and the cronies they appoint.

May 16, 2012

OpenSource.com

Filed under: Open Data,Open Source — Patrick Durusau @ 1:30 pm

OpenSource.com

Not sure how I got to OpenSource.com but it showed up as a browser tab after a crash. Maybe it is a new feature and not a bug.

Thought I would take the opportunity to point it out (and record it here) as a source of projects and news from the open source community.

Not to mention data sets, source code, marketing opportunities, etc.

Identifying And Weighting Integration Hypotheses On Open Data Platforms

Filed under: Crowd Sourcing,Data Integration,Integration,Open Data — Patrick Durusau @ 12:58 pm

Identifying And Weighting Integration Hypotheses On Open Data Platforms by Julian Eberius, Katrin Braunschweig, Maik Thiele, and Wolfgang Lehner.

Abstract:

Open data platforms such as data.gov or opendata.socrata.com provide a huge amount of valuable information. Their free-for-all nature, the lack of publishing standards and the multitude of domains and authors represented on these platforms lead to new integration and standardization problems. At the same time, crowd-based data integration techniques are emerging as a new way of dealing with these problems. However, these methods still require input in the form of specific questions or tasks that can be passed to the crowd. This paper discusses integration problems on Open Data Platforms, and proposes a method for identifying and ranking integration hypotheses in this context. We will evaluate our findings by conducting a comprehensive evaluation using one of the largest Open Data platforms.
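
To give a flavor of what “identifying and ranking integration hypotheses” can mean in practice, here is a simplified sketch in Python. It is not the authors’ method, and the dataset names and columns are hypothetical; it merely scores dataset pairs by the overlap of their column headers and turns the high-scoring pairs into questions for the crowd.

from itertools import combinations

# Hypothetical datasets from an open data platform, reduced to their headers.
datasets = {
    "parking_tickets_2011": {"street", "ward", "violation", "fine"},
    "parking_tickets_2012": {"street", "ward", "violation", "fine"},
    "street_lights": {"street", "ward", "status"},
}

def jaccard(a, b):
    """Overlap of two header sets: 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b)

# Score every pair of datasets as a candidate integration hypothesis.
hypotheses = sorted(
    ((jaccard(ca, cb), na, nb)
     for (na, ca), (nb, cb) in combinations(datasets.items(), 2)),
    reverse=True,
)

for score, a, b in hypotheses:
    if score > 0.5:  # threshold picked arbitrarily for the sketch
        print(f"Crowd task: are {a} and {b} duplicates or partitions? ({score:.2f})")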

This is interesting work on Open Data platforms but it is marred by claims such as:

Open Data Platforms have some unique integration problems that do not appear in classical integration scenarios and which can only be identified using a global view on the level of datasets. These problems include partial or duplicated datasets, partitioned datasets, versioned datasets and others, which will be described in detail in Section 4.

Really?

That would come as a surprise to the World Data Centre for Aerosols, whose Synthesis and INtegration of Global Aerosol Data Sets, Contract No. ENV4-CT98-0780 (DG 12 – EHKN), worked on data sets from 1999 to 2001. One of the specific issues they addressed was duplicate data sets.

More than a decade ago counts as a “classical integration scenario,” I think.

Another quibble. Cited sources do not support the text.

New forms of data management such as dataspaces and pay-as-you-go data integration [2, 6] are a hot topic in database research. They are strongly related to Open Data Platforms in that they assume large sets of heterogeneous data sources lacking a global or mediated schemata, which still should be queried uniformly.

2 M. Franklin, A. Halevy, and D. Maier. From databases to dataspaces: a new abstraction for information management. SIGMOD Rec., 34:27–33, December 2005.

6 J. Madhavan, S. R. Jeffery, S. Cohen, X. Dong, D. Ko, C. Yu, and A. Halevy. Web-scale Data Integration: You Can Only Afford to Pay As You Go. In Proc. of CIDR-07, 2007.

Articles written seven (7) and five (5) years ago do not justify a “hot topic in database research” claim today.

There are other issues, major and minor but for all that, this is important work.

I want to see reports that do justice to its importance.

May 15, 2012

Open Data Visualization: Keeping Traces of the Exploration Process

Filed under: Open Data,Visualization — Patrick Durusau @ 4:49 pm

Open Data Visualization: Keeping Traces of the Exploration Process by Benoît Otjacques, Mickaël Stefas, Maël Cornil, and Fernand Feltz.

Abstract:

This paper describes a system to support the visual exploration of Open Data. During his/her interactive experience with the graphics, the user can easily store the current complete state of the visualization application (called a viewpoint). Next, he/she can compose sequences of these viewpoints (called scenarios) that can easily be reloaded. This feature allows the user to keep traces of a former exploration process, which can be useful in single-user settings (to support investigations carried out over multiple sessions) as well as in collaborative settings (to share points of interest identified in the data set).
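
As a rough reconstruction in Python (mine, not the paper’s actual data model; the class and field names are hypothetical), a viewpoint is a snapshot of the visualization state and a scenario is a replayable sequence of snapshots:

import json
from dataclasses import dataclass, field, asdict

@dataclass
class Viewpoint:
    """A snapshot of the complete state of the visualization."""
    dataset: str
    chart: str
    filters: dict
    zoom: float = 1.0

@dataclass
class Scenario:
    """A replayable sequence of viewpoints from one exploration."""
    name: str
    viewpoints: list = field(default_factory=list)

    def record(self, vp):
        self.viewpoints.append(vp)

    def save(self, path):
        with open(path, "w") as f:
            json.dump([asdict(vp) for vp in self.viewpoints], f)

exploration = Scenario("budget-review")
exploration.record(Viewpoint("city_budget_2012", "treemap", {"dept": "Parks"}))
exploration.record(Viewpoint("city_budget_2012", "bar", {"dept": "Parks"}, zoom=2.0))
exploration.save("budget-review.json")  # reload later to replay the session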

I was unaware of this paper when I wrote my “knowledge toilet” post earlier today. This looks like an interesting starting point for discussion.

Just speculating but I think there will be a “sweet spot” for how much effort users will devote to recording their input. For some purposes it will need to be almost automatic. Like the relationship between search terms and links users choose. Crude but somewhat effective.

On the other hand, there will be professional researchers/authors who want to sell their semantic annotations/mappings of resources.

And applications/use cases in between.

April 25, 2012

NYCFacets

Filed under: Marketing,Open Data — Patrick Durusau @ 6:26 pm

NYCFacets: Smart Open Data Exchange

From the FAQ:

Smart Open Data Exchange?

A: We don’t just catalog the metadata for each datasource. We squeeze out additional metadata – extrametadata, as we call it – and correlate all the datasources to allow Open Data Users to see the “forest for the trees.” Or in the case of NYC – the “city for the streets”? (TODO: find urban equivalent of “See Forest for the Trees”)

The “Smart” comes from a process we call “Crowdknowing” – leveraging metadata + extrametadata to score each dataset from various perspectives, automatically correlate them, and in the near future, perform semi-automatic domain mapping.

Extrametadata?

A: Derived Metadata – Statistics (Quantitative and Qualitative), Ontologies, Semantic Mappings, Inferences, Federated Queries, Scores, Curations, Annotations plus various other Machine and Human-powered signals through a process we call “Crowdknowing“.

Crowdknowing?

A: Human-powered, machine-accelerated, collective knowledge systems cataloging metadata + derived extrametadata (derived using semantics, statistics, algorithm and the crowd). At this stage, the human-powered aspect is not emphasized because we found that the NYC Data Catalog community is still in its infancy – there were very few comments and ratings. But we hope to help improve that over time as we crawl secondary signals (e.g. votes and comments in NYCBigApps, Challengepost and Appstores; Facebook likes; Tweets, etc.).

OK, it was covered as the winner of the most recent NYCBigApps contest but I thought it needed a separate shout-out.

Take a close look at what this site has done with a minimum of software and some clever thinking.

April 22, 2012

Open Government Data

Filed under: Data,Government Data,Open Data — Patrick Durusau @ 7:06 pm

Open Government Data by Joshua Tauberer.

From the website:

This book is the culmination of several years of thinking about the principles behind the open government data movement in the United States. In the pages within, I frame the movement as the application of Big Data to civics. Topics include principles, uses for transparency and civic engagement, a brief legal history, data quality, civic hacking, and paradoxes in transparency.

Joshua’s book can be ordered in hard copy or ebook, or viewed online for free.

You may find this title useful in discussions of open government data.

April 20, 2012

With Perfect Timing, UK Audit Office Review Warns Open Government Enthusiasts

Filed under: Government Data,Open Data — Patrick Durusau @ 6:24 pm

With Perfect Timing, UK Audit Office Review Warns Open Government Enthusiasts

Andrea Di Maio writes:

Right in the middle of the Open Government Partnership conference, which I mentioned in my post yesterday, the UK National Audit Office (NAO) published its cross-government review on Implementing Transparency.

The report, while recognizing the importance and the potential for open data initiatives, highlights a few areas of concern that should be taken quite seriously by the OGP conference attendees, most of which are making open data more a self-fulfilling prophecy than an actual tool for government transparency and transformation.

The areas of concern highlighted in the review are insufficient attention to assessing the costs, risks and benefits of transparency, variation in the completeness of information, and mixed progress. While the latter two can improve with greater maturity, it is the first that requires the most attention.

Better late than never.

I have yet to hear a discouraging word in the U.S. about the rush to openness by the Obama administration.

Not that I object to “openness,” but I would like to see meaningful “openness.”

Take campaign finance for example. Treating all contributions over fifty dollars ($50) the same is hiding influence buying in the chaff of reporting.

What matters is any contribution of, say, over $100,000 to a candidate. That would make the real supporters (purchasers, really) of a particular office stand out.

The Obama White House uses hiding in the chaff to claim it is disclosing White House visitors, who are mixed into the weekly visitor log. Girl and Boy Scout troop visits don’t count the same as personal audiences with the President.

Government contract data should be limited to contracts over $500,000 and include individual owner and corporate names plus the names of their usual government contract officers. Might need to bump the $500,000 up, but could try it for a year.

If we bring up the house lights we have to search everyone. Why not a flashlight on the masher in the back row?

February 22, 2012

District of Columbia – Data Catalog

Filed under: Data,Government Data,Open Data,Transparency — Patrick Durusau @ 4:48 pm

District of Columbia – Data Catalog

This is an example of a city moving towards transparency.

A large number of data sets to access (485 as of today), with live feeds to some data streams.

The Open Data Handbook

Filed under: Open Data — Patrick Durusau @ 4:47 pm

The Open Data Handbook

From the website:

This handbook discusses the legal, social and technical aspects of open data. It can be used by anyone but is especially designed for those seeking to open up data. It discusses the why, what and how of open data – why to go open, what open is, and how to ‘open’ data.

To get started, you may wish to look at the Introduction. You can navigate through the report using the Table of Contents (see sidebar or below).

We warmly welcome comments on the text and will incorporate feedback as we go forward. We also welcome contributions or suggestions for additional sections and areas to examine.

The handbook provides more legal and social than technical guidance but there are other resources for the technical side of open data.

The Open Data Handbook provides a much needed comfort level for government and other data holders.

Open data isn’t something odd or to be feared. It will empower new services, even job creation. The more data that is available, the more connections creative people will find in that data.

