Archive for the ‘Open Data’ Category

Search Nonprofit Tax Forms

Friday, May 10th, 2013

ProPublica Launches Online Tool to Search Nonprofit Tax Forms by Doug Donovan.

From the post:

The investigative-journalism organization ProPublica started a free online service today for searching the federal tax returns of more than 615,000 nonprofits.

ProPublica began building its Nonprofit Explorer tool on its Web site shortly after the Internal Revenue Service announced in April that it was making nonprofit tax returns available in a digital, searchable format.

ProPublica’s database provides nonprofit Form 990 information free back to 2001, including executive compensation, total revenue, and other critical financial data.

Scott Klein, editor of news applications at ProPublica, said Nonprofit Explorer is not meant to replace GuideStar, the most familiar online service for searching nonprofit tax forms. Many search results on Nonprofit Explorer also offer links to GuideStar data.

“They have a much richer tool set,” Mr. Klein said.

For now, Nonprofit Explorer does not include the tax forms filed by private foundations but is expected to do so in a future update.

I guess copy limitations prevented reporting the URL for ProPublica’s Nonprofit Explorer.

Another place to look for smoke even if you are unlikely to find fire.

“We won’t get fooled again…”

Friday, May 10th, 2013

Landmark Steps to Liberate Open Data

There is no shortage of discussion of President Obama’s executive order that is alleged to result in greater access to government data.

Except then you read:

Agencies shall implement the requirements of the Open Data Policy and shall adhere to the deadlines for specific actions specified therein. When implementing the Open Data Policy, agencies shall incorporate a full analysis of privacy, confidentiality, and security risks into each stage of the information lifecycle to identify information that should not be released. These review processes should be overseen by the senior agency official for privacy. It is vital that agencies not release information if doing so would violate any law or policy, or jeopardize privacy, confidentiality, or national security.

Gee, I wonder who is going to decide what information gets released?

How would we know when “open data” efforts succeed?

Here’s my test: When ordinary citizens can mine open data and their complaints result in the arrest and conviction of public officials or government staff.

Unless and until that sort of information is public data, you are being distracted from important data by platitudes and flattery.

Cassava database becomes open access

Tuesday, May 7th, 2013

Cassava database becomes open access

From the post:

Cassavabase is a database of phenotypic and genotypic data generated by cassava breeding programs within the Next Generation Cassava Breeding (NEXTGEN Cassava) project*.

The database makes breeding data immediately available, thereby providing cassava researchers and breeders a key reference data source. The Cassava plant (Manihot esculenta) feeds more than 500 million people mainly in Africa.

Besides phenotypic and genotypic data, Cassavabase contains cassava geographical maps, genome and sequences and other datasets produced within the NEXTGEN Cassava project. Data can be accessed through the web interface and also various tools are available to view the datasets. Cassavabase, and the advantages of open access data were presented at the recent G8 International Conference on Open Data for Agriculture held in Washington, D.C.

Cassava is a plant that isn’t subject to a Monsanto patent (I don’t think) and doesn’t require Monsanto chemicals to grow properly.

That alone means you are unlikely to encounter references to it in globalization of agriculture discussions.

Why grow something you can’t sell internationally? While paying homage to Monsanto?

Answers suggest themselves to me but for now I simply wanted to make you aware of this dataset.

G8 – Open Data for Agriculture

Friday, May 3rd, 2013

G8 – Open Data for Agriculture (World Bank)

From the webpage:

At the 2012 G-8 Summit, G-8 leaders committed to the New Alliance for Food Security and Nutrition, the next phase of a shared commitment to achieving global food security.

As part of this commitment, they agreed to “share relevant agricultural data available from G-8 countries with African partners and convene an international conference on Open Data for Agriculture, to develop options for the establishment of a global platform to make reliable agricultural and related information available to African farmers, researchers and policymakers, taking into account existing agricultural data systems.”

On April 29-30, the G8 International Conference on Open Data for Agriculture brought together open data and agriculture experts along with U.S. Agriculture Secretary Tom Vilsack, U.S. Chief Technology Officer Todd Park, and World Bank Vice President for Sustainable Development Rachel Kyte to explore more opportunities for open data and knowledge sharing that can help farmers and governments in Africa and around the globe protect their crops from pests and extreme weather, increase their yields, monitor water supplies, and anticipate planting seasons that are shifting with climate change.

Webcasts on open data for agriculture.

This is immediately applicable to some work I am doing (more on that, hopefully later in May) but I discovered that the webcasts are single session casts. That is, the one I am watching now is almost nine (9) hours long.

Fortunately I have the agenda and can guess fairly closely at the part that I want to see.

Good background information if you are interested in topic maps in this space.

Open Data On The Web : April 2013

Thursday, April 25th, 2013

Open Data On The Web : April 2013 by Kal Ahmed.

From the post:

I was privileged to be one of the attendees of the Open Data on the Web workshop organized by W3C and hosted by Google in London this week. I say privileged because the gathering brought together researchers, developers and entrepreneurs from all around the world together in a unique mix that I’m sure won’t be achieved again until Phil Archer at W3C organizes the next one.

In the following I have not used direct quotes from those named as I didn’t make many notes of direct quotations. I hope that I have not misrepresented anyone, but if I have, please let me know and I will fix the text. This is not a journalistic report, it’s more a reflection of my concerns through the prism of a lot of people way smarter than me saying a lot of interesting things.

Covers sustainability, make it simpler?, data as a service, discoverability, attribution & licensing.

Kal has an engaging writing style and you will gain a lot just from his summary.

The issues he reports are largely the same across the datasphere, whatever your technological preference.

The Project With No Name

Thursday, April 4th, 2013

Fujitsu Labs And DERI To Offer Free, Cloud-Based Platform To Store And Query Linked Open Data by Jennifer Zaino.

From the post:

The Semantic Web Blog reported last year about a relationship formed between the Digital Enterprise Research Institute (DERI) and Fujitsu Laboratories Ltd. in Japan, focused on a project to build a large-scale RDF store in the cloud capable of processing hundreds of billions of triples. At the time, Dr. Michael Hausenblas, who was then a DERI research fellow, discussed Fujitsu Lab’s research efforts related to the cloud, its huge cloud infrastructure, and its identification of Big Data as an important trend, noting that “Linked Data is involved with answering at least two of the three Big Data questions” – that is, how to deal with volume and variety (velocity is the third).

This week, the DERI and Fujitsu Lab partners have announced a new data storage technology that stores and queries interconnected Linked Open Data, to be available this year, free of charge, on a cloud-based platform. According to a press release about the announcement, the data store technology collects and stores Linked Open Data that is published across the globe, and facilitates search processing through the development of a caching structure that is specifically adapted to LOD.

Typically, search performance deteriorates when searching for common elements that are linked together within data because of requirements around cross-referencing of massive data sets, the release says. The algorithm it has developed — which takes advantage of links in LOD link structures typically being concentrated in only a portion of server nodes, and of past usage frequency — caches only the data that is heavily accessed in cross-referencing to reduce disk accesses, and so accelerate searching.
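
The caching idea in the release can be illustrated with a toy sketch. This is not the DERI/Fujitsu algorithm, which the press release does not spell out. It only assumes a slow store, a count of how often each key has been read, and a small in-memory cache that keeps the most frequently read entries. The names `FrequencyCache`, `backing_store` and `capacity` are hypothetical.

```python
from collections import Counter


class FrequencyCache:
    """Toy cache that keeps only the most frequently read keys in memory."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # stands in for slow disk storage
        self.capacity = capacity            # maximum number of cached entries
        self.hits = Counter()               # past usage frequency per key
        self.cache = {}
        self.disk_reads = 0

    def get(self, key):
        self.hits[key] += 1
        if key in self.cache:
            return self.cache[key]
        self.disk_reads += 1
        value = self.backing_store[key]
        self._maybe_cache(key, value)
        return value

    def _maybe_cache(self, key, value):
        if len(self.cache) < self.capacity:
            self.cache[key] = value
            return
        # Evict the least-used cached key only if the new key is used more.
        coldest = min(self.cache, key=lambda k: self.hits[k])
        if self.hits[key] > self.hits[coldest]:
            del self.cache[coldest]
            self.cache[key] = value
```

A key read once does not displace a key that has been read often, which is the point of caching by usage frequency rather than by recency.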

Not sure what it means for the project between DERI and Fujitsu to have no name. Or at least no name in the press releases.

Until that changes, may I suggest: DERI and Fujitsu Project With No Name (DFPWNN)? ;-)

With or without a name I was glad for DERI because, well, I like research and they do it quite well.

DFPWNN’s better query technology for LOD will demonstrate, in my opinion, the same semantic diversity found at Swoogle.

Linking up semantically diverse content means just that, a lot of semantically diverse content, linked up.

The bill for leaving semantic diversity as a problem to address “later” is about to come due.

Open Data for Africa Launched by AfDB

Thursday, March 28th, 2013

Open Data for Africa Launched by AfDB

From the post:

The African Development Bank Group has recently launched the ‘Open Data for Africa’ as part of the bank’s goal to improve data management and dissemination in Africa. The Open Data for Africa is a user friendly tool for extracting data, creating and sharing own customized reports, and visualising data across themes, sectors and countries in tables, charts and maps. The platform currently holds data from 20 African countries: Algeria, Cameroon, Cape Verde, Democratic Republic of Congo, Ethiopia, Malawi, Morocco, Mozambique, Namibia, Nigeria, Ghana, Rwanda, Republic of Congo, Senegal, South Africa, South Sudan, Tanzania, Tunisia, Zambia and Zimbabwe.

Not a lot of resources but a beginning.

One trip to one country isn’t enough to form an accurate opinion of a continent but I must report my impression of South Africa from several years ago.

I was at a conference with mid-level government and academic types for a week.

In a country where “child head of household” is a real demographic category, I came away deeply impressed with the optimism of everyone I met.

You can just imagine the local news in the United States and/or Europe if a quarter of the population was dying.

Vows to “…never let this happen again…,” blah, blah, would choke the channels.

Not in South Africa. They readily admit to having a variety of serious issues but are equally serious about developing ways to meet those challenges.

If you want to see optimism in the face of stunning odds, I would strongly recommend a visit.

Data.ac.uk

Thursday, March 21st, 2013

Data.ac.uk

From the website:

This is a landmark site for academia providing a single point of contact for linked open data development. It not only provides access to the know-how and tools to discuss and create linked data and data aggregation sites, but also enables access to, and the creation of, large aggregated data sets providing powerful and flexible collections of information.

Here at Data.ac.uk we’re working to inform national standards and assist in the development of national data aggregation subdomains.

I can’t imagine a greater contrast to my poor web authoring skills than this website.

But having said that, I think you will be as disappointed as I was when you start looking for data on this “landmark site.”

There is some but not nearly enough to match the promise of such a cleverly designed website.

Perhaps they are hoping that someday RDF data (they also offer comma and tab delimited versions) will catch up to the site design.

I first saw this in a tweet by Frank van Harmelen.

Open Data: The World Bank Data Blog

Wednesday, March 20th, 2013

Open Data: The World Bank Data Blog

In case you are following open data/government issues, you will want to add this blog to your RSS feed.

Not a high traffic blog but with twenty-seven contributing authors, you get a diversity of viewpoints.

Not to mention that the World Bank is a great source for general data.

I persist in thinking that transparency means identifying individuals responsible for decisions, expenditures and the beneficiaries of those decisions and expenditures.

That isn’t a popular position among those who make decisions and approve expenditures for unidentified beneficiaries.

You will either have to speculate on your own or ask someone else why that is an unpopular position.

Scenes from a Dive

Wednesday, March 20th, 2013

Scenes from a Dive – what’s big data got to do with fighting poverty and fraud? by Prasanna Lal Das.

From the post:

A more detailed recap will follow soon but here’s a very quick hats off to the about 150 data scientists, civic hackers, visual analytics savants, poverty specialists, and fraud/anti-corruption experts that made the Big Data Exploration at Washington DC over the weekend such an eye-opener. We invite you to explore the work that the volunteers did (these are rough documents and will likely change as you read them so it’s okay to hold off if you would rather wait for a ‘final’ consolidated document). The projects that the volunteers worked on include:

Here are some visualizations that some project teams built. A few photos from the event are here (thanks @neilfantom). More coming soon (and yes, videos too!). Thanks @francisgagnon for the first blog about the event. The event hashtag was #data4good (follow @datakind and @WBopenfinances for more updates on Twitter).

Great meeting and projects but I would suggest a different sort of “big data”:

Requiring recipients to grant reporting access to all bank accounts where funds will be transferred and requiring the same for any entity paid out of those accounts to the point where transfers over 90 days are less than $1,000 for any entity (or related entity), would be a better start.

With the exception of the “related entity” information, banks already keep transfer of funds information as a matter of routine business. It would be “big data” that is rich in potential for spotting fraud and waste.
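
As a rough illustration of the tracing rule suggested above, here is a minimal sketch. It assumes one reading of the rule: keep following payees as long as an account paid them at least $1,000 in total during the 90 days before a reporting date. The function name `trace_funds`, the tuple layout and the account labels are hypothetical, and “related entity” matching is not modeled at all.

```python
from collections import defaultdict, deque
from datetime import date, timedelta

THRESHOLD = 1000                 # dollars, from the proposal above
WINDOW = timedelta(days=90)      # look-back period, from the proposal above


def trace_funds(transfers, start_account, as_of):
    """Return the accounts reachable from start_account by following
    payees that received at least THRESHOLD in the 90 days up to as_of.

    transfers: iterable of (payer, payee, amount, date) tuples.
    """
    # Total paid per (payer, payee) pair inside the window.
    totals = defaultdict(float)
    for payer, payee, amount, when in transfers:
        if as_of - WINDOW <= when <= as_of:
            totals[(payer, payee)] += amount

    # Keep only the payer -> payee links that meet the threshold.
    payees = defaultdict(list)
    for (payer, payee), total in totals.items():
        if total >= THRESHOLD:
            payees[payer].append(payee)

    # Breadth-first walk; the reportable set also guards against cycles.
    reportable = {start_account}
    queue = deque([start_account])
    while queue:
        account = queue.popleft()
        for payee in payees[account]:
            if payee not in reportable:
                reportable.add(payee)
                queue.append(payee)
    return reportable
```

Given routine transfer records, the result is the set of accounts that would fall under the reporting requirement for one funded recipient.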

The reporting banks should also be required to deliver other banking records they have on the accounts where funds are transferred and other activity in those accounts.

Before crying “invasion of privacy,” remember World Bank funding is voluntary.

As is acceptance of payment from World Bank funded projects. Anyone and everyone is free to decline such funding and avoid the proposed reporting requirements.

“Big data” to track fraud and waste is already collected by the banking industry.

The question is whether we will use that “big data” to effectively track fraud and waste or wait for particularly egregious cases to come to light?

The Biggest Failure of Open Data in Government

Monday, March 18th, 2013

Many open data initiatives forget to include the basic facts about the government itself by Philip Ashlock.

From the post:

In the past few years we’ve seen a huge shift in the way governments publish information. More and more governments are proactively releasing information as raw open data rather than simply putting out reports or responding to requests for information. This has enabled all sorts of great tools like the ones that help us find transportation or the ones that let us track the spending and performance of our government. Unfortunately, somewhere in this new wave of open data we forgot some of the most fundamental information about our government, the basic “who”, “what”, “when”, and “where”.

Do you know all the different government bodies and districts that you’re a part of? Do you know who all your elected officials are? Do you know where and when to vote or when the next public meeting is? Now perhaps you’re thinking that this information is easy enough to find, so what does this have to do with open data? It’s true, it might not be too hard to learn about the highest office or who runs your city, but it usually doesn’t take long before you get lost down the rabbit hole. Government is complex, particularly in America where there can be a vast multitude of government districts and offices at the local level.

How can we have a functioning democracy when we don’t even know the local government we belong to or who our democratically elected representatives are? It’s not that Americans are simply too ignorant or apathetic to know this information, it’s that the system of government really is complex. With what often seems like chaos on the national stage it can be easy to think of local government as simple, yet that’s rarely the case. There are about 35,000 municipal governments in the US, but when you count all the other local districts there are nearly 90,000 government bodies (US Census 2012) with a total of more than 500,000 elected officials (US Census 1992). The average American might struggle to name their representatives in Washington D.C., but that’s just the tip of the iceberg. They can easily belong to 15 government districts with more than 50 elected officials representing them.

We overlook the fact that it’s genuinely difficult to find information about all our levels of government. We unconsciously assume that this information is published on some government website well enough that we don’t need to include it as part of any kind of open data program.

Yes, the number of subdivisions of government and the number of elected officials are drawn from two different census reports, the first from the 2012 census and the second from the 1992 census, a gap of twenty (20) years.

The Census Bureau has the 1992 list, saying:

1992 (latest available) 1992 Census of Governments vol. I no. 2 [PDF, 2.45MB] * Report has been discontinued

Makes me curious why such a report would be discontinued?

A report that did not address the various agencies, offices, etc. that are also part of various levels of government.

Makes me think you need an “insider” and/or a specialist just to navigate the halls of government.

Philip’s post illustrates that “open data” dumps from government are distractions from more effective questions of open government.

Questions such as:

  • Which officials have authority over what questions?
  • How to effectively contact those officials?
  • What actions are under consideration now?
  • Rules and deadlines for comments on actions?
  • Hearing and decision calendars?
  • Comments and submissions by others?
  • etc.

It never really is “…the local board of education (substitute your favorite board) decided….” but “…member A, B, D, and F decided that….”

Transparency means not allowing people and their agendas to hide behind the veil of government.

From President Obama, The Opaque

Thursday, February 28th, 2013

Leaked BLM Draft May Hinder Public Access to Chemical Information

From the post:

On Feb. 8, EnergyWire released a leaked draft proposal from the U.S. Department of the Interior’s Bureau of Land Management on natural gas drilling and extraction on federal public lands. If finalized, the proposal could greatly reduce the public’s ability to protect our resources and communities. The new draft indicates a disappointing capitulation to industry recommendations.

The draft rule affects oil and natural gas drilling operations on the 700 million acres of public land administered by BLM, plus 56 million acres of Indian lands. This includes national forests, which are the sources of drinking water for tens of millions of Americans, national wildlife refuges, and national parks, which are widely used for recreation.

The Department of the Interior estimates that 90 percent of the 3,400 wells drilled each year on public and Indian lands use natural gas fracking, a process that pumps large amounts of water, sand, and toxic chemicals into gas wells at very high pressure to cause fissures in shale rock that contains methane gas. Fracking fluid is known to contain benzene (which causes cancer), toluene, and other harmful chemicals. Studies link fracking-related activities to contaminated groundwater, air pollution, and health problems in animals and humans.

If the leaked draft is finalized, the changes in chemical disclosure requirements would represent a major concession to the oil and gas industry. The rule would allow drilling companies to report the chemicals used in fracking to an industry-funded website, called FracFocus.org. Though the move by the federal government to require online disclosure is encouraging, the choice of FracFocus as the vehicle is problematic for many reasons.

First, the site is not subject to federal laws or oversight. The site is managed by the Ground Water Protection Council (GWPC) and the Interstate Oil and Gas Compact Commission (IOGCC), nonprofit intergovernmental organizations comprised of state agencies that promote oil and gas development. However, the site is paid for by the American Petroleum Institute and America’s Natural Gas Alliance, industry associations that represent the interests of member companies.

BLM would have little to no authority to ensure the quality and accuracy of the data reported directly to such a third-party website. Additionally, the data will not be accessible through the Freedom of Information Act since BLM is not collecting the information. The IOGCC has already declared that it is not subject to federal or state open records laws, despite its role in collecting government-mandated data.

Second, FracFocus.org makes it difficult for the public to use the data on wells and chemicals. The leaked BLM proposal fails to include any provisions to ensure minimum functionality on searching, sorting, downloading, or other mechanisms to make complex data more usable. Currently, the site only allows users to download PDF files of reports on fracked wells, which makes it very difficult to analyze data in a region or track chemical use. Despite some plans to improve searching on FracFocus.org, the oil and gas industry opposes making chemical data easier to download or evaluate for fear that the public “might misinterpret it or use it for political purposes.”

Don’t you feel safer? Knowing the oil and gas industry is working so hard to protect you from misinterpreting data?

Why the government is helping the oil and gas industry protect us from data I cannot say.

I mention this as an example of testing for “transparency.”

Anything the government freely makes available with spreadsheet capabilities isn’t transparency. It’s distraction.

Any data that the government tries to hide has potential value.

The Center for Effective Government points out these are draft rules and when published, you need to comment.

Not a bad plan but not very reassuring given the current record of President Obama, the Opaque.

Alternatives? Suggestions for how data mining could expose those who own floors of the BLM, who drill the wells, etc?

Ocean Data Interoperability Platform (ODIP)

Tuesday, February 26th, 2013

Ocean Data Interoperability Platform (ODIP)

From the post:

The Ocean Data Interoperability Platform (ODIP) is a 3-year initiative (2013-2015) funded by the European Commission under the Seventh Framework Programme. It aims to contribute to the removal of barriers hindering the effective sharing of data across scientific domains and international boundaries.

ODIP brings together 11 organizations from United Kingdom, Italy, Belgium, The Netherlands, Greece and France with the objective to provide a forum to harmonise the diverse regional systems.

The First Workshop will take place from Monday 25 February 2013 to and including Thursday 28 February 2013. More information about the workshop at 1st ODIP Workshop.

From the workshop page, a listing of topics with links to further materials:

Gathering a snapshot of our present day semantic diversity is an extremely useful exercise, whatever your ultimate choice for a “solution.”

G-8 International Conference on Open Data for Agriculture

Tuesday, February 19th, 2013

G-8 International Conference on Open Data for Agriculture

April 29-30, 2013 Washington, D.C.

Deadline for proposals: Midnight, February 28, 2013.

From the call for ideas:

Are you interested in addressing global challenges, such as food security, by providing open access to information? Would you like the opportunity to present to leaders from around the world?

We are seeking innovative products and ideas that demonstrate the potential of using open data to increase food security. This April 29-30th in Washington, D.C., the G-8 International Conference on Open Data for Agriculture will host policy makers, thought leaders, food security stakeholders, and data experts to build a strategy to share agriculture data and make innovation more accessible. As part of the conference, we are giving innovators a chance to showcase innovative uses of open data for food security in a lightning presentation or in the exhibit hall. This call for ideas is a chance to demonstrate the potential that open data can have in ensuring food security, and can inform an unprecedented global collaboration. Visit data.gov to see what agricultural data is already available and connect to other G-8 open data sites!

We are seeking top innovators to show the world what can be done with open data through:

  • Lightning Presentations: brief (3-5 minute), image rich presentations intended to convey an idea
  • Exhibit Hall: an opportunity to convey an idea through an image-rich exhibit.

Presentations should inspire others to share their data or imagine how open data could be used to increase food security. Presentations may include existing, new, or proposed applications of open data and should meet one or more of the following criteria:

  • Demonstrate the impact of open data on food security.
  • Demonstrate the impact of access to agriculturally-relevant data on developed and/or developing countries.
  • Demonstrate the impact of bringing multiple sources of agriculturally-relevant public and/or private open data together (think about the creation of an agriculture equivalent of weather.com)

For those with a new idea, we invite you to submit your proposal to present it to leading experts in food security, technology and data innovation. Proposals should identify which data is needed that is publicly available, for free, on the internet. Proposals must also include a design of the application including relevance to the target audience and plans for beta testing. A successful prototype will be mobile, interactive, and scalable. Proposals to showcase existing products or pitch new ideas will be reviewed by a global panel of technical experts from the G-8 countries.

Short notice but from the submission form on the website, you only get 75-100 words to summarize your proposal.

Hell, I have trouble identifying myself in 75-100 words. ;-)

Still, if you are in D.C. and interested, it could be a good way to meet people in this area.

The nine flags for the G-8 are confusing at first. Not an example of government committee counting. The EU has a representative at G-8 meetings.

I first saw this at: Open Call to Innovators: Apply to present at G-8 International Conference on Open Data for Agriculture.

Competition: visualise open government data and win $2,000

Wednesday, February 13th, 2013

Competition: visualise open government data and win $2,000 by Simon Rogers.

Closing date: 23:59 BST on 2 April 2013

What can you do with the thousands of open government datasets? With Google and Open Knowledge Foundation we are launching a competition to find the best dataviz out there. You might even win a prize.

(graphic omitted)

Governments around the world are releasing a tidal wave of open data – on everything from spending through to crime and health. Now you can compare national, regional and city-wide data from hundreds of locations around the world.

But how good is this data? We want to see what you can do with it. What apps and visualisations can you make with this data? We want to see how the data changes the way you see the world.

In conjunction with Google and the Open Knowledge Foundation (who will be helping us judge the results), see if you can win the $2,000 prize.

All we want you to do is to take an open dataset from any government open data website (there’s a list of them at the bottom of this article) and visualise it.

The competition is open to citizens of the UK, US, France, Germany, Spain, Netherlands, Sweden. The winner will take home $2,000 and the result will be published on the Guardian Datastore on our Show and Tell site.

Here are some of the key datasets we’ve found (list below) – and feel free to bring your own data to the party – we only ask that it is freely available and open as in OpenDefinition.org.

You are visualizing data anyway, why not take a chance on free PR and $2,000?

Alpha.data.gov: From Open Data Provider to Open Data Hub

Saturday, February 2nd, 2013

Alpha.data.gov: From Open Data Provider to Open Data Hub by Andrea Di Maio.

From the post:

Those who happen to read my blog know that I am rather cynical about many enthusiastic pronouncements around open data. One of the points I keep banging on is that the most common perspective is that open data is just something that governments ought to publish for businesses and citizens to use it. This perspective misses both the importance of open data created elsewhere – such as by businesses or by people in social networks – and the impact of its use inside government. Also, there is a basic confusion between open and public data: not all open data is public and not all public data may be open (although they should, in the long run).

In this respect the new experimental site alpha.data.gov is a breath of fresh air. Announced in a recent post on the White House blog, it does not contain data, but explains which categories of open data can be used for which sort of purposes.

A step in the right direction.

Simply gathering the relevant data sets for any given project is a project in and of itself.

Followed by documenting the semantics of the relevant data sets.

Data hubs are a precursor to collections of semantic documentation for data found at data hubs.

You know what should follow from collections of semantic documentation. ;-) (Can you say topic maps?)

Open Data Protocols, DCIP [A Topic Map Song]

Thursday, January 31st, 2013

Open Data Protocols, DCIP

From the post:

Have you ever heard about Data Catalog Interoperability Protocol (DCIP)? DCIP is a specification designed to facilitate interoperability between data catalogs published on the Web by defining:

  • a JSON and RDF representation for key data catalog entities such as Dataset (DatasetRecord) and Resource (Distribution) based on the DCAT vocabulary
  • a read only REST based protocol for achieving basic catalog interoperability

Data Catalog Interoperability Protocol (DCIP) v0.2 discusses each of the above and provides examples. The approach is designed to be pragmatic and easily implementable. It merges existing work on DCAT with the real-life experiences of “harvesting” in various projects.

To know more about DCIP, you can visit the Open Data Protocols website, which aims to make it easier to develop tools and services for working with data, and to ensure greater interoperability between new and existing tools and services.
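
To make “a JSON representation based on the DCAT vocabulary” a little more concrete, here is a minimal sketch. The property names (`dct:title`, `dcat:keyword`, `dcat:distribution`, `dcat:accessURL`, `dcat:mediaType`) come from the W3C DCAT vocabulary and Dublin Core terms. The key layout, the JSON-LD style `@type` marker, the helper name `dataset_record` and the example.org URL are assumptions for illustration and are not taken from the DCIP v0.2 specification.

```python
import json


def dataset_record(title, description, keywords, distributions):
    """Build a DCAT-flavoured dict for one catalog entry (layout is assumed)."""
    return {
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": url,
                "dcat:mediaType": media_type,
            }
            for url, media_type in distributions
        ],
    }


# Placeholder values only; no real dataset is described here.
record = dataset_record(
    title="Example crop yields",
    description="Placeholder dataset used only for illustration.",
    keywords=["agriculture", "yields"],
    distributions=[("http://example.org/yields.csv", "text/csv")],
)
serialized = json.dumps(record, indent=2, sort_keys=True)
```

A harvester that understands the shared vocabulary can read `serialized` from any catalog that publishes it, which is the interoperability the protocol is after.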

The news of new formats, protocols and the like is music to topic map ears.

The refrain is: “cha-ching, cha-ching, cha-ching!”

;-)

Only partially in jest.

Every time a new format (read set of subjects) is developed for the encoding of data (another set of subjects), it is by definition different from all that came before.

With good reason. Every sentient being on the planet will be flocking to format/protocol X for all their data.

Well, except that flocking is more like a trickle for most new formats. Particularly when compared to the historical record of formats.

In theory topic maps are an exception to that rule, except that when you map specific instances of other data formats, you have committed yourself to a particular set of mappings.

Still, that’s better than rip-and-replace or ETL processing of data. It maintains backwards compatibility with existing systems while anticipating future systems.
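A minimal sketch of that trade-off, assuming two hypothetical formats ("format_a" and "format_b") with made-up field names: the mapping table is the commitment you make, while the source records are read but never rewritten.

```python
# Hypothetical mappings from format-specific field names to shared subject names.
# Choosing these mappings is the commitment discussed above.
MAPPINGS = {
    "format_a": {"dataset_title": "title", "download": "access_url"},
    "format_b": {"name": "title", "url": "access_url"},
}


def to_subjects(record, source_format):
    """Re-key a record by shared subject, leaving unmapped fields untouched."""
    mapping = MAPPINGS[source_format]
    return {mapping.get(key, key): value for key, value in record.items()}


def merge(records):
    """Merge re-keyed records, collecting every distinct value seen per subject."""
    merged = {}
    for record in records:
        for subject, value in record.items():
            values = merged.setdefault(subject, [])
            if value not in values:
                values.append(value)
    return merged


# Made-up records in each format; the originals are not modified.
a = to_subjects(
    {"dataset_title": "Bus routes", "download": "http://example.org/a.csv"},
    "format_a",
)
b = to_subjects(
    {"name": "Bus routes", "url": "http://example.org/b.json"},
    "format_b",
)

print(merge([a, b]))
```

Supporting a future format means adding one more entry to the mapping table rather than converting the data already in place.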

Saturday 23rd February is Open Data Day 2013!

Thursday, January 31st, 2013

Saturday 23rd February is Open Data Day 2013! from AIMS.

From the post:

Open Data Day is a gathering of citizens in cities around the world to write applications, liberate data, create visualizations and publish analyses using open public data to show support for and encourage the adoption of open data policies by the world’s local, regional and national governments. There are Open Data Day events taking place all around the world.

Are you planning to organize or participate in one of these events? Are you going to launch new open data catalogs on the Open Data Day? Share with us your plans and highlight events that might be of interest for the agricultural information management community.

Know more at http://opendataday.org/

As of today: 52 events.

Anyone interested in a virtual event on Open Data Day using open data and topic maps?

O’Reilly’s Open Government book ["...more equal than others" pigs]

Monday, January 21st, 2013

We’re releasing the files for O’Reilly’s Open Government book by Laurel Ruma.

From the post:

I’ve read many eloquent eulogies from people who knew Aaron Swartz better than I did, but he was also a Foo and contributor to Open Government. So, we’re doing our part at O’Reilly Media to honor Aaron by posting the Open Government book files for free for anyone to download, read and share.

The files are posted on the O’Reilly Media GitHub account as PDF, Mobi, and EPUB files for now. There is a movement on the Internet (#PDFtribute) to memorialize Aaron by posting research and other material for the world to access, and we’re glad to be able to do this.

You can find the book here: github.com/oreillymedia/open_government

Daniel Lathrop, my co-editor on Open Government, says “I think this is an important way to remember Aaron and everything he has done for the world.” We at O’Reilly echo Daniel’s sentiment.

Be sure to read Chapter 25, “When Is Transparency Useful?”, by the late Aaron Swartz.

It includes this passage:

…When you create a regulatory agency, you put together a group of people whose job is to solve some problem. They’re given the power to investigate who’s breaking the law and the authority to punish them. Transparency, on the other hand, simply shifts the work from the government to the average citizen, who has neither the time nor the ability to investigate these questions in any detail, let alone do anything about it. It’s a farce: a way for Congress to look like it has done something on some pressing issue without actually endangering its corporate sponsors.

As a tribute to Aaron, are you going to dump data on the WWW or enable the calling of “more equal than others” pigs to account?

OpenAIRE Study

Sunday, January 20th, 2013

Implementing Open Access Mandates in Europe: OpenAIRE Study by Thembani Malapela.

From the webpage:

Implementing Open Access Mandates in Europe : OpenAIRE Study on the Development of Open Access Repository Communities in Europe is the title of a recent book authored by Birgit Schmidt and Iryna Kuchma. The book highlights the existing open access policies in Europe and provides an overview of publisher’s self archiving policies and it further gives strategies for policy implementation. Such strategies include both institutional and national – which have been used in implementing open access policy mandates. This work provides a unique overview of national awareness of open access in 32 European countries involving all EU member states and in addition, Norway, Iceland, Croatia, Switzerland and Turkey.

What makes this book an interesting read is that it taps into activities implemented through OpenAIRE project and related repository projects by other stakeholders in Europe. Despite its extensive coverage on the implementation of Open Access Mandates in the region, the authors acknowledge, “the main issues that still need to be resolved in the coming years include the effective promotion of open access among research communities and support in copyright management for researchers and research institutions as well as intermediaries such as libraries and repositories”.

The more data that becomes “open,” the greater the semantic diversity you will find.

Important to follow the discussion as you prepare to map more and more information into your topic map.

Could Governments Run Out of Patience with Open Data? [Semantic Web?]

Saturday, January 19th, 2013

Could Governments Run Out of Patience with Open Data? by Andrea Di Maio.

From the post:

Yesterday I had yet another client conversation – this time with a mid-size municipality in the north of Europe – on the topic of the economic value generated through open data. The problem we discussed is the same I highlighted in a post last year: nobody argues the potential long term value of open data but it may be difficult to maintain a momentum (and to spend time, money and management bandwidth) on something that will come to fruition in the more distant future, while more urgent problems need to be solved now, under growing budget constraints.

Faith is not enough, nor are the many examples that open data evangelists keep sharing to demonstrate value. Open data must help solve today’s problems too, in order to gain the credibility and the support required to realize future economic value.

While many agree that open data can contribute to shorter term goals, such as improving inter-agency transparency and data exchange or engaging citizens on solving concrete problems, making this happen in a more systematic way requires a change of emphasis and a change of leadership.

Emphasis must be on directing efforts – be they idea collections, citizen-developed dashboards or mobile apps – onto specific, concrete problems that government organizations need to solve. One might argue that this is not dissimilar from having citizens offer perspectives on how they see existing issues and related solutions. But there is an important difference: what usually happens is that citizens and other stakeholders are free to use whichever data they want to use. The required change is to entice them to help governments solve problems the way governments see them. In other terms, whereas citizens would clearly remain free to come up with whichever use of any open data they deem important, they should get incentives, awards, prizes only for those uses that meet clear government requirements. Citizens would be at the service of government rather than the other way around. For those who might be worried that this advocates for an unacceptable change of responsibility and that governments are at the service of citizens and not the other way around, what I mean is that citizens should help governments serve them.

The Semantic Link [ODI Drug Example?]

Saturday, January 5th, 2013

The Semantic Link

Archive of the Semantic Link podcasts.

Semantic Link is a monthly podcast on Semantic Technologies from Semanticweb.com.

In December of 2012, Nigel Shadbolt, chairman and co-founder of ODI (Open Data Institute), is a special guest.

Nigel offers an odd example of the value of open data. See what you think:

The prescriptions written by all physicians, but not for whom, are made public. A start-up company noticed that many prescribed drugs were “off-license” (generic to use the U.S. terminology) but doctors were still prescribing the brand name drug.

Reported savings of 200 million £ in one drug area.

That success isn’t a function of having “open data” but having an intelligent person review the data. Whether open or not.

I can assure you my drug company knows the precise day when it anticipates a generic version of a drug will become available. ;-)

Majuro.JS [Useful Open Data]

Thursday, December 27th, 2012

Majuro.JS by Nick Doiron.

From the homepage:

Majuro.JS helps you make detailed, interactive maps with open buildings data.

Great examples on the homepage but I prefer the explanation at Github.

This is wicked cool!

This type of open data I can see as the basis for “innovation.”

Resulting in a target for rich annotation by a topic map based application.

Educated Guesses Decorated With Numbers

Wednesday, December 26th, 2012

Researchers Say Much to Be Learned from Chicago’s Open Data by Sam Cholke.

From the post:

HYDE PARK — Chicago is a vain metropolis, publishing every minute detail about the movement of its buses and every little skirmish in its neighborhoods. A team of researchers at the University of Chicago is taking that flood of data and using it to understand and improve the city.

“Right now we have more data than we’re able to make use of — that’s one of our motivations,” said Charlie Catlett, director of the new Urban Center for Computation and Data at the University of Chicago.

Over the past two years the city has unleashed a torrent of data about bus schedules, neighborhood crimes, 311 calls and other information. Residents have put it to use, but Catlett wants his team of computational experts to get a crack at it.

“Most of what is happening with public data now is interesting, but it’s people building apps to visualize the data,” said Catlett, a computer scientist at the university and Argonne National Laboratory.

Catlett and a collection of doctors, urban planners and social scientists want to analyze that data so as to solve urban planning puzzles in some of Chicago’s most distressed neighborhoods and eliminate the old method of trial and error.

“Right now we look around and look for examples where something has worked or appeared to work,” said Keith Besserud, an architect at Skidmore, Owings and Merrill's Blackbox Studio and part of the new center. “We live in a city, so we think we understand it, but it’s really not seeing the forest for the trees, we really don’t understand it.”

Besserud said urban planners have theories but lack evidence to know for sure when greater density could improve a neighborhood, how increased access to public transportation could reduce unemployment and other fundamental questions.

“We’re going to try to break down some of the really tough problems we’ve never been able to solve,” Besserud said. “The issue in general is the field of urban design has been inadequately served by computational tools.”

In the past, policy makers would make educated guesses. Catlett hopes the work of the center will better predict such needs using computer models, and the data is only now available to answer some fundamental questions about cities.

…(emphasis added)

Some city services may be improved by increased data, such as staging ambulances near high density shooting locations based upon past experience.

That isn’t the same as “planning” to reduce the incidence of unemployment or crime by urban planning.

If you doubt that statement, consider the vast sums of economic data available for the past century.

Despite that array of data, there are no universally acclaimed “truths” or “policies” for economic planning.

The temptation to say “more data,” “better data,” “better integration of data,” etc. will solve problem X is ever present.

Avoid disappointing your topic map customers.

Make sure a problem is one data can help solve before treating it like one.

I first saw this in a tweet by Tim O’Reilly.

Open Data Is Not for Sprinters [Or Open Data As Religion]

Wednesday, November 21st, 2012

Open Data Is Not for Sprinters by Andrea Di Maio.

Andrea’s comment on the UK special envoy who was “disappointed” with open data usage was to point out that government should be making better internal use of open data, to justify the open data programs.

His view was challenged by a member of an audience who said:

open data is for the sake of economic development and transparency, not for internal use.

Andrea’s response:

I do not disagree of course. All I am saying, and I have been saying for a while now, is that to realize this vision will take quite some time. Indeed more data must be available, of higher quality and timeliness; more entrepreneurs or “appreneurs” must be lured to extract value for businesses and the public at large from this data; and we need a stream of examples across sectors and regions to show that value can be generated everywhere.

A more direct answer would be to point out that statements like:

Opening up data is fundamentally about more efficient use of resources and improving service delivery for citizens. The effects of that are far reaching: innovation, transparency, accountability, better governance and economic growth. (Sir Tim Berners-Lee: Raw data, now!)

Are religious dogma. Useful if you want to run your enterprise or government based on religious dogma but you may as well use a Ouija board.

The astronomy community has a history of “open data” that spans decades. I find the data very interesting and it has led to discoveries in astronomy, but economic development?

The biological community apparently has a competition to see who can make more useful data available than the next lab. And it leads to better research, discoveries and innovation, but economic development?

The same holds true for the chemical community and numerous others.

The point being that claims such as “open data leads to economic development” are sure to disappoint.

Some open data might, but that is a question of research and proof, not mere cant.

A government, for example, could practice open data with regard to its tax policies and how it decides to audit taxpayers. I am sure startups would quickly take up the task of using that data to advise clients on how to avoid audits. (They are called tax advisors now.)

Or a government could practice open data on the White House visitor list and include non-tour visitors, some of them, in the thousands who visit every day. It’s “open data,” just not useful data. And not data that is likely to lead to economic development or transparency.

Governments should practice open data but with an eye towards selecting data that is likely to lead to economic development, innovation, etc. By tracking the use of “open data” now, governments can make rational decisions about what data to “open” in the future.

… ‘disappointed’ with open data use

Tuesday, November 20th, 2012

Prime minister’s special envoy ‘disappointed’ with open data use by Derek du Preez.

From the post:

Prime Minister David Cameron’s special envoy on the UN’s post-2015 development goals has said that he is ‘disappointed’ by how much the government’s open datasets have been used so far.

Speaking at a Reform event in London this week on open government and data transparency, Anderson said he recognises that the public sector needs to improve the way it pushes out the data so that it is easier to use.

“I am going to be really honest with you. As an official in a government department that has worked really hard to get a lot of data out in the last two years, I have been pretty disappointed by how much it has been used,” he said.

Easier to use data is one issue.

But the expectation that effort making data open = people interested in using it is another.

The article later reports there are 9,000 datasets available at data.gov.uk.

How relevant to every day concerns are those 9,000 datasets?

When the government starts disclosing the financial relationships between members of government, their families and contributors, I suspect interest in open data will go up.

Socrata Open Data Server, Community Edition

Wednesday, November 14th, 2012

Socrata Open Data Server, Community Edition by Saf Rabah.

From the post:

Socrata, the leading provider of cloud-based open data systems, today announced the “Socrata Open Data Server, Community Edition,” to be offered as an open source reference implementation for open data standards. Designed expressly to promote data portability throughout the open data ecosystem, and support open source software policies in public organizations around the globe, the “Socrata Open Data Server, Community Edition” will be released in the first quarter of 2013, as freely downloadable open source software and fully integrated with other components of the company’s commercial software products.

To learn more about the proposed open data standards, or to get involved in this community effort, please visit http://open-data-standards.github.com.

Looking forward to the release!

Resources on open data and various standards efforts related to the same.

Even with extensions, DCAT (Data Catalog Vocabulary) is going to leave a lot of room for mapping semantics between data sets.

Open Data vs. Private Data?

Tuesday, October 23rd, 2012

Why Government Should Care Less About Open Data and More About Data by Andrea Di Maio.

From the post:

Among the flurry of activities and deja-vu around open data that governments worldwide, in all tiers are pursuing to increase transparency and fuel a data economy, I found something really worth reading in a report that was recently published by the Danish government.

“Good Basic Data for Everyone – A Driver for Growth and Efficiency” takes a different spin than many others by saying that:

Basic data is the core information authorities use in their day-to-day case processing. Basic data is e.g. data on individuals, businesses, properties, addresses and geography. This information, called basic data, is reused throughout the public sector. Reuse of high-quality data is an essential basis for public authorities to perform their tasks properly and efficiently. Basic data can include personal data.

While most of the categories are open data, the novelty is that for the first time personal and open data is seen for what it is, i.e. data. The document suggests the development of a Data Distributor, which would be responsible for conveying data from different data to its consumers, both inside and outside government. The document also assumes that personal data may be ultimately distributed via a common public-sector data distributor.

Besides what is actually written in the document, this opens the door for a much needed shift from service orientation to data orientation in government service delivery. Stating that data must flow freely across organizational boundaries, irrespective of the type of data (and of course within appropriate policy constraints) is hugely important to lay the foundations for effective integration of services and processes across agencies, jurisdictions, tiers and constituencies.

Combining this with some premises of the US Digital Strategy, which highlights an information layer distinct from a platform layer, which is in turn distinct from a presentation layer, one starts seeing a move toward the centrality of data, which may finally emerge to the emergence of citizen data stores that would put control of service access and integration in the hand of individuals.

If there is novelty in the Danish approach, it is from being “open data.” That is, all citizens can draw equally on the “basic data” for whatever purpose.

Property records, geographic, geological and other maps, plus addresses were combined long ago in the United States as “private data.”

Despite being collected at taxpayer expense, private industry sells access to collated public data.

Open data may provide businesses with collated public data at a lower cost, but as an expense to the public.

What is known as a false dilemma: We can buy back data government collected on our behalf or we can pay government to collect and collate it for the few.


The “individual being in charge of their data” is too obvious a fiction to delay us here. Isn’t true now, no signs it will become true. If you doubt that, restrict the distribution of your credit report. Post a note when you accomplish that task.

Code for America: open data and hacking the government

Tuesday, October 9th, 2012

Code for America: open data and hacking the government by Rachel Perkins.

From the post:

Last week, I attended the Code for America Summit here in San Francisco. I attended as a representative of Splunk>4Good (we sponsored the event via a nice outdoor patio lounge area and gave away some of our (in)famous tshirts and a few ponies). Since this wasn’t your typical “conference”, and I’m not so great at schmoozing, I was a little nervous–what would Christy Wilson, Clint Sharp, and I do there? As it turned out, there were so many amazing takeaways and so much potential for awesomeness that my nervousness was totally unfounded.

So what is Code for America?

Code for America is a program that sends technologists (who take a year off and apply to their Fellowship program) to cities throughout the US to work with advocates in city government. When they arrive, they spend a few weeks touring the city and its outskirts, meeting residents, getting to know the area and its issues, and brainstorming about how the city can harness its public data to improve things. Then they begin to hack.

Some of these partnerships have come up with amazing tools–for example,

  • Opencounter Santa Cruz mashes up several public datasets to provide tactical and strategic information for persons looking to start a small business: what forms and permits you’ll need, zoning maps with overlays of information about other businesses in the area, and then partners with http://codeforamerica.github.com/sitemybiz/ to help you find commercial space for rent that matches your zoning requirements.
  • Another Code for America Fellow created blightstatus.org, which uses public data in New Orleans to inform residents about the status and plans for blighted properties in their area.
  • Other apps from other cities do cool things like help city maintenance workers prioritize repairs of broken streetlights based on other public data like crime reports in the area, time of day the light was broken, and number of other broken lights in the vicinity, or get the citizenry involved with civic data, government, and each other by setting up a Stack Exchange type of site to ask and answer common questions.
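The streetlight example in the last bullet can be sketched as a simple scoring function. This is only an illustration under my own assumptions: the field names, the weights and the sample lights are hypothetical, and the post does not say how any actual app ranks repairs.

```python
from dataclasses import dataclass


@dataclass
class BrokenLight:
    light_id: str
    nearby_crime_reports: int   # count taken from public crime data
    broken_at_night: bool       # stand-in for "time of day the light was broken"
    other_broken_nearby: int    # other broken lights in the vicinity


def priority(light, crime_weight=1.0, night_bonus=2.0, cluster_weight=0.5):
    """Higher score means repair sooner. The weights are arbitrary assumptions."""
    score = crime_weight * light.nearby_crime_reports
    score += cluster_weight * light.other_broken_nearby
    if light.broken_at_night:
        score += night_bonus
    return score


# Made-up lights, purely for illustration.
lights = [
    BrokenLight("A", nearby_crime_reports=0, broken_at_night=False, other_broken_nearby=1),
    BrokenLight("B", nearby_crime_reports=4, broken_at_night=True, other_broken_nearby=2),
    BrokenLight("C", nearby_crime_reports=2, broken_at_night=False, other_broken_nearby=0),
]

# Sort so the highest-priority repair comes first.
queue = sorted(lights, key=priority, reverse=True)
print([light.light_id for light in queue])
```

Picking the weights is the hard part; a sketch like this only shows how several public datasets can feed a single ranking.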

Whatever your view of data sharing by the government, too little, too much, just right, Rachel points to good things that can come from open data.

Splunk has a “corporate responsibility” program: Splunk>4Good.

Check it out!

BTW, do you have a topic maps “corporate responsibility” program?

New Army Guide to Open-Source Intelligence

Sunday, September 16th, 2012

New Army Guide to Open-Source Intelligence

If you don’t know Full Text Reports, you should.

A top-tier research professional’s hand-picked selection of documents from academe, corporations, government agencies, interest groups, NGOs, professional societies, research institutes, think tanks, trade associations, and more.

You will winnow some chaff but also find jewels like Open Source Intelligence (PDF).

From the post:

  • Provides fundamental principles and terminology for Army units that conduct OSINT exploitation.
  • Discusses tactics, techniques, and procedures (TTP) for Army units that conduct OSINT exploitation.
  • Provides a catalyst for renewing and emphasizing Army awareness of the value of publicly available information and open sources.
  • Establishes a common understanding of OSINT.
  • Develops systematic approaches to plan, prepare, collect, and produce intelligence from publicly available information from open sources.

Impressive intelligence overview materials.

Would be nice to re-work into a topic map intelligence approach document with the ability to insert a client’s name and industry specific examples. Has that militaristic tone that is hard to capture with civilian writers.