Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

January 6, 2014

Open Census Data (UK)

Filed under: Census Data,Open Data — Patrick Durusau @ 4:12 pm

Open Census Data

From the post:

First off, congratulations to Jeni Tennison OBE and Keith Dugmore MBE on their gongs for services to Open Data. As we release our Census Data as Open Data it is worth remembering how ‘bad’ things were before Keith’s tireless campaign for Open Census data. Young data whippersnappers may not believe this, but when I first started working with Census data a corporate license for the ED boundaries (just the boundaries, no actual flippin’ data) was £80,000. In the late ’90s a simple census reporting tool in a GIS usually involved license fees of more than £250K. Today using QGIS, POSTGIS, open data and a bit of imagination you could have such a thing for £0K license costs.

Talking of Census data, we’ve released our full UK census data pack today as Open Data. You can access it here: http://www.geolytix.co.uk/geodata/census.

Good news on all fronts!
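To make the “£0K license costs” point concrete, here is a minimal census-mapping sketch built entirely on open-source tooling (GeoPandas over the same PostGIS-style stack the post names). The file names and the area_code/population columns are assumptions about what the GeoLytix pack contains, not documented facts:

  # Toy census choropleth with open-source tools only.
  # File and column names are hypothetical; adjust to the
  # actual contents of the GeoLytix census pack.
  import geopandas as gpd
  import pandas as pd

  boundaries = gpd.read_file("census_boundaries.shp")  # polygon layer
  counts = pd.read_csv("census_counts.csv")            # one row per area

  joined = boundaries.merge(counts, on="area_code")    # assumed shared key
  ax = joined.plot(column="population", legend=True)
  ax.set_title("Population by census area")
  ax.figure.savefig("census_map.png", dpi=150)

License cost of every library involved: £0.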

However, I am waiting for “open data” to trickle down to the drafts of agency budgets and details of purchases and other expenditures with the payees being identified.

With that data you could draw boundaries around the individuals and groups favored by an agency.

I don’t know what the results would be in the UK but I would wager considerable sums on the results if applied in Washington, D.C.

You would find out where retirees from federal “service” go when they retire. (Hint, it’s not Florida.)

December 29, 2013

Data Analytic Recidivism Tool (DART) [DAFT?]

Filed under: Government Data,Open Data,Open Government — Patrick Durusau @ 2:39 pm

Data Analytic Recidivism Tool (DART)

From the website:

The Data Analytic Recidivism Tool (DART) helps answer questions about recidivism in New York City.

  • Are people that commit a certain type of crime more likely to be re-arrested?
  • What about people in a certain age group or those with prior convictions?

DART lets users look at recidivism rates for selected groups defined by characteristics of defendants and their cases.

A direct link to the DART homepage.

After looking at the interface, which lumps recidivists into groups of 250, I’m not sure DART is all that useful.

It did spark an idea that might help with the federal government’s acquisition problems.

Why not create the equivalent of DART but call it:

Data Analytic Failure Tool (DAFT).

And in DAFT track federal contractors, their principals, contracts, and the program officers who play any role in those contracts.

So that when contractors fail, as so many of them do, it will be easy to track the individuals involved on both sides of the failure.

And every contract will have a preamble that recites any prior history of failure and the people involved in that failure, on all sides.

Such that any subsequent supervisor has to sign off with full knowledge of the prior lack of performance.

If criminal recidivism is to be avoided, shouldn’t failure recidivism be avoided as well?

December 19, 2013

UNESCO Open Access Publications [Update]

Filed under: Data,Government,Government Data,Open Data — Patrick Durusau @ 7:22 pm

UNESCO Open Access Publications

From the webpage:

Building peaceful, democratic and inclusive knowledge societies across the world is at the heart of UNESCO’s mandate. Universal access to information is one of the fundamental conditions to achieve global knowledge societies. This condition is not a reality in all regions of the world.

In order to help reduce the gap between industrialized countries and those in the emerging economy, UNESCO has decided to adopt an Open Access Policy for its publications by making use of a new dimension of knowledge sharing – Open Access.

Open Access means free access to scientific information and unrestricted use of electronic data for everyone. With Open Access, expensive prices and copyrights will no longer be obstacles to the dissemination of knowledge. Everyone is free to add information, modify contents, translate texts into other languages, and disseminate an entire electronic publication.

For UNESCO, adopting an Open Access Policy means to make thousands of its publications freely available to the public. Furthermore, Open Access is also a way to provide the public with an insight into the work of the Organization so that everyone is able to discover and share what UNESCO is doing.

You can access and use our resources for free by clicking here.

In May of 2013 UNESCO announced its Open Access policy.

Many organizations profess a belief in “Open Access.”

The real test is whether they practice “Open Access.”

DataViva

Filed under: Government Data,Open Data — Patrick Durusau @ 4:00 pm

DataViva

I don’t know enough about the Brazilian economy to say if the visualizations are helpful or not.

What I can tell you is the visualizations are impressive!

Thoughts on the site as an interface to open data?

PS: This appears to be a government-supported website, so not all government-sponsored websites are poor performers.

December 3, 2013

Announcing Open LEIs:…

Filed under: Business Intelligence,Identifiers,Open Data — Patrick Durusau @ 11:04 am

Announcing Open LEIs: a user-friendly interface to the Legal Entity Identifier system

From the post:

Today, OpenCorporates announces a new sister website, Open LEIs, a user-friendly interface on the emerging Global Legal Entity Identifier System.

At this point many, possibly most, of you will be wondering: what on earth is the Global Legal Entity Identifier System? And that’s one of the reasons why we built Open LEIs.

The Global Legal Entity Identifier System (aka the LEI system, or GLEIS) is a G20/Financial Stability Board-driven initiative to solve the issues of identifiers in the financial markets. As we’ve explained in the past, there are a number of identifiers out there, nearly all of them proprietary, and all of them with quality issues (specifically not mapping one-to-one with legal entities). Sometimes just company names are used, which are particularly bad identifiers, as not only can they be represented in many ways, they frequently change, and are even reused between different entities.

This problem is particularly acute in the financial markets, meaning that regulators, banks, and market participants often don’t know who they are dealing with, affecting everything from the ability to process trades automatically to performing credit calculations to understanding systemic risk.

The LEI system aims to solve this problem, by providing permanent, IP-free, unique identifiers for all entities participating in the financial markets (not just companies but also municipalities who issue bonds, for example, and mutual funds whose legal status is a little greyer than companies).

The post cites five key features for Open LEIs:

  1. Search on names (despite slight misspellings) and addresses
  2. Browse the entire (100,000 record) database and/or filter by country, legal form, or the registering body
  3. A permanent URL for each LEI
  4. Links to OpenCorporates for additional data
  5. Data is available as XML or JSON
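Features 3 and 5 together mean a record should be retrievable straight from its permanent URL as JSON. A minimal sketch; the URL pattern and field names are my guesses from the feature list, not a documented API:

  # Fetch one LEI record as JSON from its permanent URL.
  # URL pattern and field names are assumptions, not a documented API.
  import json
  import urllib.request

  lei = "549300EXAMPLE0000037"  # 20-character LEI-format identifier
  url = "http://openleis.com/legal_entities/%s.json" % lei
  with urllib.request.urlopen(url) as resp:
      record = json.load(resp)
  print(record.get("legal_name"), record.get("country"))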

As the post points out, the data isn’t complete but dragging legal entities out into the light is never easy.

Use this resource and support it if you are interested in more and not less financial transparency.

November 26, 2013

BBC throws weight behind open data movement

Filed under: News,Open Data — Patrick Durusau @ 5:24 pm

BBC throws weight behind open data movement by Sophie Curtis.

From the post:

The BBC has signed Memoranda of Understanding (MoUs) with the Europeana Foundation, the Open Data Institute, the Open Knowledge Foundation and the Mozilla Foundation, supporting free and open internet technologies.

The agreements will enable closer collaboration between the BBC and each of the four organisations on a range of mutual interests, including the release of structured open data and the use of open standards in web development, according to the BBC.

One aim of the agreement is to give clear technical standards and models to organisations who want to work with the BBC, and give those using the internet a deeper understanding of the technologies involved.

The MoUs also bring together several existing areas of research and provide a framework to explore future opportunities. Through this and other initiatives, the BBC hopes to become a catalyst for open innovation by publishing clear technical standards, models, expertise and – where feasible – data.

That’s good news!

I think.

I looked in Sophie’s story for links to the four Memoranda of Understanding (MoUs) but could not find them.

So I visited the press releases from the various participants:

BBC: BBC signs Memorandums of Understanding with open internet organisations

Europeana Foundation: Press Releases (no news about the BBC posted, as of 17:00 EST, 26 November 2013).

Mozilla Foundation: Press Releases (no news about the BBC posted, as of 17:00 EST, 26 November 2013).

Open Data Institute: BBC signs open data agreement with ODI…

Open Knowledge Foundation: BBC signs Memorandum of Understanding with the Open Knowledge Foundation

Five out of five, no Memoranda of Understanding (MoUs), at least not in their news releases.

It seems inconsistent to have open data “Memoranda of Understanding (MoUs)” that aren’t themselves “open data.”

For all I know the BBC may be about to mail everyone a copy of them, but the logical place to release the memoranda would be with the signing news.

Yes?

Please make a comment if I have overlooked the public posting of these “Memoranda of Understanding (MoUs).”

Thanks!

November 7, 2013

Creating Knowledge out of Interlinked Data…

Filed under: Linked Data,LOD,Open Data,Semantic Web — Patrick Durusau @ 6:55 pm

Creating Knowledge out of Interlinked Data – STATISTICAL OFFICE WORKBENCH by Bert Van Nuffelen and Karel Kremer.

From the slides:

LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. This 4-year project comprises leading Linked Open Data technology researchers, companies, and service providers. Coming from across 12 countries the partners are coordinated by the Agile Knowledge Engineering and Semantic Web Research Group at the University of Leipzig, Germany.

LOD2 will integrate and syndicate Linked Data with existing large-scale applications. The project shows the benefits in the scenarios of Media and Publishing, Corporate Data intranets and eGovernment.

LOD2 Stack Release 3.0 overview

Connecting the dots: Workbench for Statistical Office

In case you are interested:

LOD2 homepage

Ubuntu 12.04 Repository

VM User / Password: lod2demo / lod2demo

LOD2 blog

The LOD2 project expires in August of 2014.

Linked Data is going to be around, one way or the other, for quite some time.

My suggestion: grab the last VM from LOD2 and a copy of its OS, and store them in a location that will migrate as data systems change.

November 4, 2013

Open Data Index

Filed under: Government,Government Data,Open Data — Patrick Durusau @ 7:48 pm

Open Data Index by Armin Grossenbacher.

From the post:

There are lots of indexes.

The most famous one may be the Index Librorum Prohibitorum listing books prohibited by the Catholic church. It contained eminent scientists and intellectuals (see the list in Wikipedia) and was abolished only in 1966, after more than 400 years.

Open Data Index

One index everybody would like to be registered in, and with a high rank at that, is the Open Data Index.

‘An increasing number of governments have committed to open up data, but how much key information is actually being released? …. Which countries are the most advanced and which are lagging in relation to open data? The Open Data Index has been developed to help answer such questions by collecting and presenting information on the state of open data around the world – to ignite discussions between citizens and governments.’

I haven’t seen the movie review guide that appeared in Our Sunday Visitor in years but when I was in high school it was the best movie guide around. Just pick the ones rated as morally condemned. 😉

There are two criteria I don’t see mentioned for rating open data:

  1. How easy/hard is it to integrate a particular data set with other data from the same source or organization?
  2. Is the data supportive, neutral or negative with regard to established government policies?

Do you know of any open data sets where those questions are used to rate them?

September 22, 2013

India…1,745 datasets for agriculture

Filed under: Agriculture,Data,Open Data — Patrick Durusau @ 2:09 pm

Open Data Portal India launched: Already 1,745 datasets for agriculture

From the post:

The Government of India has launched its open Data Portal India (data.gov.in), a portal for the public to access and use datasets and applications provided by ministries and departments of the Government of India.

Aim: “To increase transparency in the functioning of Government and also open avenues for many more innovative uses of Government Data to give different perspective.” (“About portal,” data.gov.in)

The story goes on to report there are more than 4,000 data sets from over 51 offices. An adviser to the prime minister of India is hopeful there will be more than 10,000 data sets in six months.

Not quite as much fun as the IMDB, but on the other hand, the data is more likely to be of interest to business types.

August 30, 2013

OpenAGRIS 0.9 released:…

Filed under: Agriculture,Data Mining,Linked Data,Open Data — Patrick Durusau @ 7:25 pm

OpenAGRIS 0.9 released: new functionalities, resources & look by Fabrizio Celli.

From the post:

The AGRIS team has released OpenAGRIS 0.9, a new version of the Web application that aggregates information from different Web sources to expand the AGRIS knowledge, providing as much data as possible about a topic or a bibliographical resource within the agricultural domain.

OpenAGRIS 0.9 contains new functionalities and resources, and received a new interface in English and Spanish, with French, Arabic, Chinese and Russian translations on their way.

Mission: To make information on agricultural research globally available, interlinked with other data resources (e.g. DBPedia, World Bank, Geopolitical Ontology, FAO fisheries dataset, AGRIS serials dataset etc.) following Linked Open Data principles, allowing users to access the full text of a publication and all the information the Web holds about a specific research area in the agricultural domain (1).

Curious what agricultural experts make of this resource?

As of today, the site claims 5,076,594 records. And with all the triple bulking up, some 134,276,804 triples based on those records.

That works out to roughly 26 triples per record (134,276,804 ÷ 5,076,594 ≈ 26.4).

Which is no mean feat, but I wonder about the granularity of the information being offered.

That is, how useful is it to find 10,000 resources when each will take an hour to read?

More granular retrieval, that is far below the level of a file or document, is going to be necessary to avoid repetition of human data mining.

Repetitive human data mining being one of the earmarks of today’s search technology.

July 23, 2013

Sites and Services for Accessing Data

Filed under: Data,Open Data — Patrick Durusau @ 2:21 pm

Sites and Services for Accessing Data by Andy Kirk.

From the site:

This collection presents the key sites that provide data, whether through curated collections, offering access under the Open Data movement or through Software/Data-as-a-Service platforms. Please note, I may not have personally used all the services, sites or tools presented but have seen sufficient evidence of their value from other sources. Also, to avoid re-inventing the wheel, descriptive text may have been reproduced from the native websites for many resources.

You will see there is clearly a certain bias towards US and UK based sites and services. This is simply because they are the most visible, most talked about, most shared and/or useful resources on my radar. I will keep updating this site to include as many other finds and suggestions as possible, extending (ideally) around the world.

I count ninety-nine (99) resources.

A well-organized listing, but like many other listings, you have to explore each resource to discover its contents.

A mapping of resources across collections would be far more useful.

July 15, 2013

Better Corporate Data!

Filed under: Open Data,Open Government — Patrick Durusau @ 2:43 pm

Announcing open corporate network data: not just good, but better

OpenCorporates announces three projects:

1. An open data corporate network platform

The most important part is a new platform for collecting, collating and allowing access to different types of corporate relationship data – subsidiary data, parent company data, and shareholding data. This means that governments around the world (and companies too) can publish corporate network data and they can be combined in a single open-data repository, for a more complete picture. We think this is a game-changer, as it not only allows seamless, lightweight co-operation, but will identify errors and contradictions. We’ll be blogging about the platform in more detail over the coming weeks, but it’s been a genuinely hard computer-science problem that has resulted in some really innovative work.

2. Three key initial datasets

(…)

The shareholder data from the New Zealand company register, for example, is granular and up to date, and, if you have API access, is available as data. It talks about parental control, often down to very granular data, and importing this data allows you to see not just shareholders (which you can also see on the NZ Companies House pages) but also what companies are owned by another company (which you can’t). And it’s throwing up some interesting examples, of which more in a later blog post.

The data from the Federal Reserve’s National Information Center is also fairly up to date, but is (for the biggest banks) locked away in horrendous PDFs and talks about companies controlled by other companies.

The data from the 10-K and 20-F filings from the US Securities and Exchange Commission is the most problematic of all, being published once a year, as arbitrary text (pretty shocking in the 21st century for this still to be the case), and talks about ‘significant subsidiaries’.

(…)

3. An example of the power of this dataset.

We think just pulling the data together as open data is pretty cool, and that many of the best uses will come from other users (we’re going to include the data in the next version of our API in a couple of weeks). But we’ve built in some network visualisations to allow the information to be explored. Check out Barclays Bank PLC, Pearson PLC, The Gap or Starbucks.

OpenCorporates is engineering the critical move from “open data,” ho-hum, to “corporate visibility using open data.”

Not quite to the point of “accountability,” but you have to identify evildoers before they can be held accountable.

A project that merits your interest, donations and support. Please pass this on. Thanks!

July 10, 2013

Data Sharing and Management Snafu in 3 Short Acts

Filed under: Archives,Astroinformatics,Open Access,Open Data — Patrick Durusau @ 1:43 pm

As you may suspect, my concerns are focused on the preservation of the semantics of the field names, Sam1, Sam2, Sam3, but also with the field names that will be generated by the requesting researcher.

I found this video embedded in: A call for open access to all data used in AJ and ApJ articles by Kelle Cruz.

From the post:

I don’t fully understand it, but I know the Astronomical Journal (AJ) and Astrophysical Journal (ApJ) are different than many other journals: They are run by the American Astronomical Society (AAS) and not by a for-profit publisher. That means that the AAS Council and the members (the people actually producing and reading the science) have a lot of control over how the journals are run. In a recent President’s Column, the AAS President, David Helfand proposed a radical, yet obvious, idea for propelling our field into the realm of data sharing and open access: require all journal articles to be accompanied by the data on which the conclusions are based.

We are a data-rich—and data-driven—field [and] I am advocating [that authors provide] a link in articles to the data that underlies a paper’s conclusions…In my view, the time has come—and the technological resources are available—to make the conclusion of every ApJ or AJ article fully reproducible by publishing the data that underlie that conclusion. It would be an important step toward enhancing and sharing our scientific understanding of the universe.

Kelle points out several reasons why existing efforts are insufficient to meet the sharing and archiving needs of the astronomical community.

Suggested reading if you are concerned with astronomical data or archives more generally.

June 24, 2013

OpenGLAM

Filed under: Archives,Library,Museums,Open Data — Patrick Durusau @ 9:14 am

OpenGLAM

From the FAQ:

What is OpenGLAM?

OpenGLAM (Galleries, Libraries, Archives and Museums) is an initiative coordinated by the Open Knowledge Foundation that is committed to building a global cultural commons for everyone to use, access and enjoy.

OpenGLAM helps cultural institutions to open up their content and data through hands-on workshops, documentation and guidance and it supports a network of open culture evangelists through its Working Group.

What do we mean by “open”?

“Open” is a term you hear a lot these days. We’ve tried to get some clarity around this important issue by developing a clear and succinct definition of openness – see Open Definition.

The Open Definition says that a piece of content or data is open if “anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.”

There are a number of Open Definition-compliant licenses that GLAMs are increasingly using to license digital content and data that they hold. Popular ones for data include CC0, while for content CC-BY or CC-BY-SA are often used.

Open access to cultural heritage materials will grow the need for better indexing/organization. As if you needed another reason to support it. 😉

June 17, 2013

G8 countries must work harder to open up essential data

Filed under: Government,Government Data,NSA,Open Data — Patrick Durusau @ 9:22 am

G8 countries must work harder to open up essential data by Rufus Pollock.

From the post:

Open data and transparency will be one of the three main topics at the G8 Summit in Northern Ireland next week. Today transparency campaigners released preview results from the global Open Data Census showing that G8 countries still have a long way to go in releasing essential information as open data.

The Open Data Census is run by the Open Knowledge Foundation, with the help of a network of local data experts around the globe. It measures the openness of data in ten key areas including those essential for transparency and accountability (such as election results and government spending data), and those vital for providing critical services to citizens (such as maps and transport timetables). Full results for the 2013 Open Data Census will be released later this year.

[Graphic: Open Data Census preview results for the G8 countries]

The preview results show that while both the UK and the US (who top the table of G8 countries) have made significant progress towards opening up key datasets, both countries still have work to do. Postcode data, which is required for almost all location-based applications and services, remains a major issue for all G8 countries except Germany. No G8 country scored the top mark for company registry data. Russia is the only G8 country not to have published any of the information included in the census as open data. The full results for G8 countries are online at: http://census.okfn.org/g8/

Apologies for the graphic, it is too small to read. See the original post for a more legible version.

The U.S. came in first with a score of 54 out of a possible 60.

I assume this evaluation was done prior to the revelation of the NSA data snooping?

The U.S. government has massive collections of data that are not only invisible, their very existence is denied.

How is that for government transparency?

The most disappointing part is that other major players, China, Russia, you take your pick, hold largely the same secret data as the United States. Probably not full sets of the day-to-day memos, but the data that really counts, they all have.

So, who is it they are keeping information from?

Ah, that would be their citizens.

Who might not approve of their privileges, goals, tactics, and favoritism.

For example, despite the U.S. government’s disapproval/criticism of many other countries (or rather their governments), I can’t think of any reason for me to dislike unknown citizens of another country.

Whatever goals the U.S. government is pursuing in disadvantaging citizens of another country, it’s not on my behalf.

If the public knew who was benefiting from U.S. policy, perhaps new officials would change those policies.

But that isn’t the goal of the specter of government transparency that the United States leads.

June 16, 2013

Open Data Certificates

Filed under: Open Data — Patrick Durusau @ 6:17 pm

Open Data Certificates

From the website:

1. Publish your data

Good news! You’ve already done this bit (or you’re about to). Now let’s make your data easier for people to find, use and share.

2. Check it with our questionnaire

Our helpful questions act like a checklist. They explain your options about how to publish good open data and give you clear and recognised targets to aim for.

3. Share it with a certificate

Your answers determine which of our four certificates you generate. Each one means success in a unique way and demonstrates you are a leader in open data.

From the questionnaire:

This self-assessment questionnaire generates an open data certificate and badge you can publish to tell people all about this open data. We also use your answers to learn how organisations publish open data.

When you answer these questions it demonstrates your efforts to comply with relevant UK legislation. You should also check which other laws and policies apply to your sector, especially if you’re outside the UK (which these questions don’t cover).

The self-assessment aspect of the certificate seems problematic to me.

Too many bad experiences with SDOs that rely on self-assessment in place of independent review.

Having said that, the checklist will help those interested in producing quality data products.

Perhaps there is a commercial opportunity in assessing open data sets?

May 22, 2013

Open Access to Weather Data for International Development

Filed under: Agriculture,Open Data,Weather Data — Patrick Durusau @ 3:28 pm

Open Access to Weather Data for International Development

From the post:

Farming communities in Africa and South Asia are becoming increasingly vulnerable to shock as the effects of climate change become a reality. This increased vulnerability, however, comes at a time when improved technology makes critical information more accessible than ever before. aWhere Weather, an online platform offering free weather data for locations in Western, Eastern and Southern Africa and South Asia provides instant and interactive access to highly localized weather data, instrumental for improved decision making and providing greater context in shaping policies relating to agricultural development and global health.

Weather Data in 9km Grid Cells

Weather data is collected at meteorological stations around the world and interpolated to create accurate data in detailed 9km grids. Within each cell, users can access historical, daily-observed and 8 days of daily forecasted ‘localized’ weather data for the following variables:

  • Precipitation 
  • Minimum and Maximum Temperature
  • Minimum and Maximum Relative Humidity 
  • Solar Radiation 
  • Maximum and Morning Wind Speed
  • Growing degree days (dynamically calculated for your base and cap temperature) 

These data prove essential for risk adaptation efforts, food security interventions, climate-smart decision making, and agricultural or environmental research activities.

Sign up Now

Access is free and easy. Register at http://www.awhere.com/en-us/weather-p. Then, you can log back in anytime at me.awhere.com.  

For questions on the platform, please contact weather@awhere.com

At least as a public observer, I could not determine how much “interpolation” is going into the weather data. That would have a major impact on the risk of accepting the data at face value.

I suspect it varies from little interpolation at all in heavily instrumented areas to quite a bit in areas with sparser readings. How much is unclear.

It may be that the amount of interpolation in the data depends on whether you use the free version or some upgraded commercial version.

Still, an interesting data source to combine with others, if you are mindful of the risks.
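If you do pull the data, the growing-degree-days variable is one you can recompute yourself as a sanity check. A common recipe clips the daily extremes to your base and cap temperatures before averaging; a minimal sketch with assumed thresholds:

  # Growing degree days from daily min/max temperatures (Celsius).
  # The base and cap defaults are illustrative, not aWhere's.
  def growing_degree_days(t_min, t_max, base=10.0, cap=30.0):
      t_min = min(max(t_min, base), cap)  # clip extremes to [base, cap]
      t_max = min(max(t_max, base), cap)
      return (t_min + t_max) / 2.0 - base

  # One 9km grid cell, one week of (min, max) readings:
  week = [(12.1, 27.4), (13.0, 29.8), (11.2, 31.5), (9.8, 24.0),
          (14.3, 28.9), (15.0, 33.2), (13.7, 30.1)]
  print(round(sum(growing_degree_days(lo, hi) for lo, hi in week), 1))

If your recomputed values diverge sharply from the platform’s, that is a clue about how much interpolation (or a different formula) sits behind the published numbers.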

May 18, 2013

Open Data and Wishful Thinking

Filed under: Government,Government Data,Open Data — Patrick Durusau @ 12:58 pm

BLM Fracking Rule Violates New Executive Order on Open Data by Sofia Plagakis.

From the post:

Today, the U.S. Department of the Interior’s Bureau of Land Management (BLM) released its revised proposed rule for natural gas drilling (commonly referred to as fracking) on federal and tribal lands. The much-anticipated rule violates President Obama’s recently issued executive order that requires new government information to be made available to the public in open, machine-readable formats.

Last week, President Obama signed an executive order requiring that all newly generated public data be pushed out in open, machine-readable formats. Concurrently, the Office of Management and Budget (OMB) and the Office of Science and Technology Policy (OSTP) released an Open Data Policy designed to make previously unavailable government data accessible to entrepreneurs, researchers, and the public.

The executive order and accompanying policy must have been in development for months, and agencies, including BLM, should have been fully aware of the new policy. But instead of establishing a modern example of government information collection and sharing, BLM’s proposed rule would allow drilling companies to report the chemicals used in fracking to a third-party, industry-funded website, called FracFocus.org, which does not provide data in machine-readable formats. FracFocus.org only allows users to download PDF files of reports on fracked wells. Because PDF files are not machine-readable, the site makes it very difficult for the public to use and analyze data on wells and chemicals that the government requires companies to collect and make available.

I wonder if Sofia simply overlooked:

When implementing the Open Data Policy, agencies shall incorporate a full analysis of privacy, confidentiality, and security risks into each stage of the information lifecycle to identify information that should not be released. These review processes should be overseen by the senior agency official for privacy. It is vital that agencies not release information if doing so would violate any law or policy, or jeopardize privacy, confidentiality, or national security. [From “We won’t get fooled again…”]

Or if her “…requires new government information to be made available to the public in open, machine-readable formats” is wishful thinking?

The Obama administration just released the Benghazi emails in PDF format, so we have an example of the White House violating its own “open data” policy.

We don’t need more “open data.”

What we need are more leakers. A lot more leakers.

Just be sure you leak or pass on leaks in “open, machine-readable formats.”

The foreign adventures, environmental pollution, failures in drug or food safety, etc., avoided by leaks may save your life, the lives of your children or grandchildren.

Leak today!

May 10, 2013

Search Nonprofit Tax Forms

Filed under: Government Data,Non-Profit,Open Data — Patrick Durusau @ 5:46 pm

ProPublica Launches Online Tool to Search Nonprofit Tax Forms by Doug Donovan.

From the post:

The investigative-journalism organization ProPublica started a free online service today for searching the federal tax returns of more than 615,000 nonprofits.

ProPublica began building its Nonprofit Explorer tool on its Web site shortly after the Internal Revenue Service announced in April that it was making nonprofit tax returns available in a digital, searchable format.

ProPublica’s database provides nonprofit Form 990 information free back to 2001, including executive compensation, total revenue, and other critical financial data.

Scott Klein, editor of news applications at ProPublica, said Nonprofit Explorer is not meant to replace GuideStar, the most familiar online service for searching nonprofit tax forms. Many search results on Nonprofit Explorer also offer links to GuideStar data.

“They have a much richer tool set,” Mr. Klein said.

For now, Nonprofit Explorer does not include the tax forms filed by private foundations but is expected to do so in a future update.

I guess copy limitations prevented reporting the URL for ProPublica’s Nonprofit Explorer.

Another place to look for smoke even if you are unlikely to find fire.

“We won’t get fooled again…”

Filed under: Government,Government Data,Open Data — Patrick Durusau @ 4:21 pm

Landmark Steps to Liberate Open Data

There is no shortage of discussion of President Obama’s executive order that is alleged to result in greater access to government data.

Except then you read:

Agencies shall implement the requirements of the Open Data Policy and shall adhere to the deadlines for specific actions specified therein. When implementing the Open Data Policy, agencies shall incorporate a full analysis of privacy, confidentiality, and security risks into each stage of the information lifecycle to identify information that should not be released. These review processes should be overseen by the senior agency official for privacy. It is vital that agencies not release information if doing so would violate any law or policy, or jeopardize privacy, confidentiality, or national security.

Gee, I wonder who is going to decide what information gets released?

How would we know when “open data” efforts succeed?

Here’s my test: When ordinary citizens can mine open data and their complaints result in the arrest and conviction of public officials or government staff.

Unless and until that sort of information is public data, you are being distracted from important data by platitudes and flattery.

May 7, 2013

Cassava database becomes open access

Filed under: Agriculture,Open Access,Open Data — Patrick Durusau @ 3:50 pm

Cassava database becomes open access

From the post:

Cassavabase is a database of phenotypic and genotypic data generated by cassava breeding programs within the Next Generation Cassava Breeding (NEXTGEN Cassava) project*.

The database makes breeding data immediately available, thereby providing cassava researchers and breeders with a key reference data source. The cassava plant (Manihot esculenta) feeds more than 500 million people, mainly in Africa.

Besides phenotypic and genotypic data, Cassavabase contains cassava geographical maps, genome sequences and other datasets produced within the NEXTGEN Cassava project. Data can be accessed through the web interface and various tools are available to view the datasets. Cassavabase, and the advantages of open access data, were presented at the recent G8 International Conference on Open Data for Agriculture held in Washington, D.C.

Cassava is a plant that isn’t subject to a Monsanto patent (I don’t think) and doesn’t require Monsanto chemicals to grow properly.

That alone means you are unlikely to encounter references to it in globalization of agriculture discussions.

Why grow something you can’t sell internationally? While paying homage to Monsanto?

Answers suggest themselves to me but for now I simply wanted to make you aware of this dataset.

May 3, 2013

G8 – Open Data for Agriculture

Filed under: Agriculture,Open Data — Patrick Durusau @ 3:21 pm

G8 – Open Data for Agriculture (World Bank)

From the webpage:

At the 2012 G-8 Summit, G-8 leaders committed to the New Alliance for Food Security and Nutrition, the next phase of a shared commitment to achieving global food security.

As part of this commitment, they agreed to “share relevant agricultural data available from G-8 countries with African partners and convene an international conference on Open Data for Agriculture, to develop options for the establishment of a global platform to make reliable agricultural and related information available to African farmers, researchers and policymakers, taking into account existing agricultural data systems.”

On April 29-30, the G8 International Conference on Open Data for Agriculture brought together open data and agriculture experts along with U.S. Agriculture Secretary Tom Vilsack, U.S. Chief Technology Officer Todd Park, and World Bank Vice President for Sustainable Development Rachel Kyte to explore more opportunities for open data and knowledge sharing that can help farmers and governments in Africa and around the globe protect their crops from pests and extreme weather, increase their yields, monitor water supplies, and anticipate planting seasons that are shifting with climate change.

Webcasts on open data for agriculture.

This is immediately applicable to some work I am doing (more on that, hopefully later in May) but I discovered that the webcasts are single-session casts. That is, the one I am watching now is almost nine (9) hours long.

Fortunately I have the agenda and can guess fairly closely about the part that I want to see.

Good background information if you are interested in topic maps in this space.

April 25, 2013

Open Data On The Web : April 2013

Filed under: Open Data,W3C — Patrick Durusau @ 10:09 am

Open Data On The Web : April 2013 by Kal Ahmed.

From the post:

I was privileged to be one of the attendees of the Open Data on the Web workshop organized by W3C and hosted by Google in London this week. I say privileged because the gathering brought together researchers, developers and entrepreneurs from all around the world in a unique mix that I’m sure won’t be achieved again until Phil Archer at W3C organizes the next one.

In the following I have not used direct quotes from those named as I didn’t make many notes of direct quotations. I hope that I have not misrepresented anyone, but if I have, please let me know and I will fix the text. This is not a journalistic report, it’s more a reflection of my concerns through the prism of a lot of people way smarter than me saying a lot of interesting things.

Covers sustainability, make it simpler?, data as a service, discoverability, attribution & licensing.

Kal has an engaging writing style and you will gain a lot just from his summary.

The issues he reports are largely the same across the datasphere, whatever your technological preference.

April 4, 2013

The Project With No Name

Filed under: Linked Data,LOD,Open Data — Patrick Durusau @ 4:53 am

Fujitsu Labs And DERI To Offer Free, Cloud-Based Platform To Store And Query Linked Open Data by Jennifer Zaino.

From the post:

The Semantic Web Blog reported last year about a relationship formed between the Digital Enterprise Research Institute (DERI) and Fujitsu Laboratories Ltd. in Japan, focused on a project to build a large-scale RDF store in the cloud capable of processing hundreds of billions of triples. At the time, Dr. Michael Hausenblas, who was then a DERI research fellow, discussed Fujitsu Lab’s research efforts related to the cloud, its huge cloud infrastructure, and its identification of Big Data as an important trend, noting that “Linked Data is involved with answering at least two of the three Big Data questions” – that is, how to deal with volume and variety (velocity is the third).

This week, the DERI and Fujitsu Lab partners have announced a new data storage technology that stores and queries interconnected Linked Open Data, to be available this year, free of charge, on a cloud-based platform. According to a press release about the announcement, the data store technology collects and stores Linked Open Data that is published across the globe, and facilitates search processing through the development of a caching structure that is specifically adapted to LOD.

Typically, search performance deteriorates when searching for common elements that are linked together within data because of requirements around cross-referencing of massive data sets, the release says. The algorithm it has developed — which takes advantage of links in LOD link structures typically being concentrated in only a portion of server nodes, and of past usage frequency — caches only the data that is heavily accessed in cross-referencing to reduce disk accesses, and so accelerate searching.
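The release gives the idea but not the design: spend cache space on the triples that cross-referencing hits most often. A toy frequency-weighted cache in that spirit, emphatically not Fujitsu’s implementation:

  # Toy frequency-weighted cache: keep the most-hit keys in memory,
  # fall back to a dict standing in for disk. Illustrates the idea in
  # the press release, not Fujitsu's actual algorithm.
  from collections import Counter

  class HotSpotCache:
      def __init__(self, backing_store, capacity=1000):
          self.store = backing_store  # dict-like stand-in for disk
          self.capacity = capacity
          self.hits = Counter()
          self.cache = {}

      def get(self, key):
          self.hits[key] += 1
          if key in self.cache:
              return self.cache[key]  # hot path: no disk access
          value = self.store[key]     # simulated disk read
          if len(self.cache) < self.capacity:
              self.cache[key] = value
          else:
              coldest = min(self.cache, key=self.hits.__getitem__)
              if self.hits[key] > self.hits[coldest]:
                  del self.cache[coldest]  # evict the least-used entry
                  self.cache[key] = value
          return value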

Not sure what it means for the project between DERI and Fujitsu to have no name. Or at least no name in the press releases.

Until that changes, may I suggest: DERI and Fujitsu Project With No Name (DFPWNN)? 😉

With or without a name I was glad for DERI because, well, I like research and they do it quite well.

DFPWNN’s better query technology for LOD will demonstrate, in my opinion, the same semantic diversity found at Swoogle.

Linking up semantically diverse content means just that, a lot of semantically diverse content, linked up.

The bill for leaving semantic diversity as a problem to address “later” is about to come due.

March 28, 2013

Open Data for Africa Launched by AfDB

Filed under: Government,Government Data,Open Data — Patrick Durusau @ 6:15 pm

Open Data for Africa Launched by AfDB

From the post:

The African Development Bank Group has recently launched the ‘Open Data for Africa’ as part of the bank’s goal to improve data management and dissemination in Africa. The Open Data for Africa is a user friendly tool for extracting data, creating and sharing own customized reports, and visualising data across themes, sectors and countries in tables, charts and maps. The platform currently holds data from 20 African countries: Algeria, Cameroon, Cape Verde, Democratic Republic of Congo, Ethiopia, Malawi, Morocco, Mozambique, Namibia, Nigeria, Ghana, Rwanda, Republic of Congo, Senegal, South Africa, South Sudan, Tanzania, Tunisia, Zambia and Zimbabwe.

Not a lot of resources but a beginning.

One trip to one country isn’t enough to form an accurate opinion of a continent but I must report my impression of South Africa from several years ago.

I was at a conference with mid-level government and academic types for a week.

In a country where “child head of household” is a real demographic category, I came away deeply impressed with the optimism of everyone I met.

You can just imagine the local news in the United States and/or Europe if a quarter of the population was dying.

Vows to “…never let this happen again…,” blah, blah, would choke the channels.

Not in South Africa. They readily admit to having a variety of serious issues but are equally serious about developing ways to meet those challenges.

If you want to see optimism in the face of stunning odds, I would strongly recommend a visit.

March 21, 2013

Data.ac.uk

Filed under: Data,Open Data,RDF — Patrick Durusau @ 2:38 pm

Data.ac.uk

From the website:

This is a landmark site for academia providing a single point of contact for linked open data development. It not only provides access to the know-how and tools to discuss and create linked data and data aggregation sites, but also enables access to, and the creation of, large aggregated data sets providing powerful and flexible collections of information.
Here at Data.ac.uk we’re working to inform national standards and assist in the development of national data aggregation subdomains.

I can’t imagine a greater contrast between my poor web authoring skills and a website as well designed as this one.

But having said that, I think you will be as disappointed as I was when you start looking for data on this “landmark site.”

There is some but not nearly enough to match the promise of such a cleverly designed website.

Perhaps they are hoping that someday RDF data (they also offer comma and tab delimited versions) will catch up to the site design.

I first saw this in a tweet by Frank van Harmelen.

March 20, 2013

Open Data: The World Bank Data Blog

Filed under: Government,Government Data,Open Data,Open Government — Patrick Durusau @ 1:25 pm

Open Data: The World Bank Data Blog

In case you are following open data/government issues, you will want to add this blog to your RSS feed.

Not a high traffic blog but with twenty-seven contributing authors, you get a diversity of viewpoints.

Not to mention that the World Bank is a great source for general data.

I persist in thinking that transparency means identifying individuals responsible for decisions, expenditures and the beneficiaries of those decisions and expenditures.

That isn’t a popular position among those who make decisions and approve expenditures for unidentified beneficiaries.

You will either have to speculate on your own or ask someone else why that is an unpopular position.

Scenes from a Dive

Filed under: BigData,Data Mining,Open Data,Public Data — Patrick Durusau @ 10:27 am

Scenes from a Dive – what’s big data got to do with fighting poverty and fraud? by Prasanna Lal Das.

From the post:

A more detailed recap will follow soon but here’s a very quick hats off to the about 150 data scientists, civic hackers, visual analytics savants, poverty specialists, and fraud/anti-corruption experts that made the Big Data Exploration at Washington DC over the weekend such an eye-opener. We invite you to explore the work that the volunteers did (these are rough documents and will likely change as you read them so it’s okay to hold off if you would rather wait for a ‘final’ consolidated document). The projects that the volunteers worked on include: 

Here are some visualizations that some project teams built. A few photos from the event are here (thanks @neilfantom). More coming soon (and yes, videos too!). Thanks @francisgagnon for the first blog about the event. The event hashtag was #data4good (follow @datakind and @WBopenfinances for more updates on Twitter).

Great meeting and projects, but I would suggest a different sort of “big data”:

Requiring recipients to grant reporting access to all bank accounts where funds will be transferred, and requiring the same for any entity paid out of those accounts, down to the point where transfers over 90 days total less than $1,000 for any entity (or related entity), would be a better start.

With the exception of the “related entity” information, banks already keep transfer of funds information as a matter of routine business. It would be “big data” that is rich in potential for spotting fraud and waste.

The reporting banks should also be required to deliver other banking records they have on the accounts where funds are transferred and other activity in those accounts.
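A toy version of the screening such reporting would enable, assuming a flat table of transfers with date, payee_id and amount columns (all hypothetical):

  # Sum what each payee received in the trailing 90 days and flag
  # totals of $1,000 or more for continued reporting.
  # File and column names are hypothetical.
  import pandas as pd

  transfers = pd.read_csv("transfers.csv", parse_dates=["date"])
  cutoff = transfers["date"].max() - pd.Timedelta(days=90)
  window = transfers[transfers["date"] >= cutoff]
  totals = window.groupby("payee_id")["amount"].sum()
  flagged = totals[totals >= 1000].sort_values(ascending=False)
  print(flagged.head(20))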

Before crying “invasion of privacy,” remember World Bank funding is voluntary.

As is acceptance of payment from World Bank funded projects. Anyone and everyone is free to decline such funding and avoid the proposed reporting requirements.

“Big data” to track fraud and waste is already collected by the banking industry.

The question is whether we will use that “big data” to track fraud and waste effectively, or wait for particularly egregious cases to come to light.

March 18, 2013

The Biggest Failure of Open Data in Government

Filed under: Government,Government Data,Open Data,Open Government — Patrick Durusau @ 3:35 pm

Many open data initiatives forget to include the basic facts about the government itself by Philip Ashlock.

From the post:

In the past few years we’ve seen a huge shift in the way governments publish information. More and more governments are proactively releasing information as raw open data rather than simply putting out reports or responding to requests for information. This has enabled all sorts of great tools like the ones that help us find transportation or the ones that let us track the spending and performance of our government. Unfortunately, somewhere in this new wave of open data we forgot some of the most fundamental information about our government, the basic “who”, “what”, “when”, and “where”.

[Graphic: US map of government districts]

Do you know all the different government bodies and districts that you’re a part of? Do you know who all your elected officials are? Do you know where and when to vote or when the next public meeting is? Now perhaps you’re thinking that this information is easy enough to find, so what does this have to do with open data? It’s true, it might not be too hard to learn about the highest office or who runs your city, but it usually doesn’t take long before you get lost down the rabbit hole. Government is complex, particularly in America where there can be a vast multitude of government districts and offices at the local level.

How can we have a functioning democracy when we don’t even know the local government we belong to or who our democratically elected representatives are? It’s not that Americans are simply too ignorant or apathetic to know this information, it’s that the system of government really is complex. With what often seems like chaos on the national stage it can be easy to think of local government as simple, yet that’s rarely the case. There are about 35,000 municipal governments in the US, but when you count all the other local districts there are nearly 90,000 government bodies (US Census 2012) with a total of more than 500,000 elected officials (US Census 1992). The average American might struggle to name their representatives in Washington D.C., but that’s just the tip of the iceberg. They can easily belong to 15 government districts with more than 50 elected officials representing them.

We overlook the fact that it’s genuinely difficult to find information about all our levels of government. We unconsciously assume that this information is published on some government website well enough that we don’t need to include it as part of any kind of open data program.

Yes, the number of subdivisions of government and the number of elected officials are drawn from two different census reports, the first from the 2012 census and the second from the 1992 census, a gap of twenty (20) years.

The Census Bureau has the 1992 list, saying:

1992 (latest available) 1992 Census of Governments vol. I no. 2 [PDF, 2.45MB] * Report has been discontinued

Makes me curious why such a report would be discontinued?

A report that did not address the various agencies, offices, etc. that are also part of various levels of government.

Makes me think you need an “insider” and/or a specialist just to navigate the halls of government.

Philip’s post illustrates that “open data” dumps from government are distractions from more effective questions of open government.

Questions such as:

  • Which officials have authority over what questions?
  • How to effectively contact those officials?
  • What actions are under consideration now?
  • Rules and deadlines for comments on actions?
  • Hearing and decision calendars?
  • Comments and submissions by others?
  • etc.

It never really is “…the local board of education (substitute your favorite board) decided….” but “…members A, B, D, and F decided that….”

Transparency means not allowing people and their agendas to hide behind the veil of government.

February 28, 2013

From President Obama, The Opaque

Filed under: Government,Government Data,Open Data,Open Government,Transparency — Patrick Durusau @ 5:26 pm

Leaked BLM Draft May Hinder Public Access to Chemical Information

From the post:

On Feb. 8, EnergyWire released a leaked draft proposal from the U.S. Department of the Interior’s Bureau of Land Management on natural gas drilling and extraction on federal public lands. If finalized, the proposal could greatly reduce the public’s ability to protect our resources and communities. The new draft indicates a disappointing capitulation to industry recommendations.

The draft rule affects oil and natural gas drilling operations on the 700 million acres of public land administered by BLM, plus 56 million acres of Indian lands. This includes national forests, which are the sources of drinking water for tens of millions of Americans, national wildlife refuges, and national parks, which are widely used for recreation.

The Department of the Interior estimates that 90 percent of the 3,400 wells drilled each year on public and Indian lands use natural gas fracking, a process that pumps large amounts of water, sand, and toxic chemicals into gas wells at very high pressure to cause fissures in shale rock that contains methane gas. Fracking fluid is known to contain benzene (which causes cancer), toluene, and other harmful chemicals. Studies link fracking-related activities to contaminated groundwater, air pollution, and health problems in animals and humans.

If the leaked draft is finalized, the changes in chemical disclosure requirements would represent a major concession to the oil and gas industry. The rule would allow drilling companies to report the chemicals used in fracking to an industry-funded website, called FracFocus.org. Though the move by the federal government to require online disclosure is encouraging, the choice of FracFocus as the vehicle is problematic for many reasons.

First, the site is not subject to federal laws or oversight. The site is managed by the Ground Water Protection Council (GWPC) and the Interstate Oil and Gas Compact Commission (IOGCC), nonprofit intergovernmental organizations comprised of state agencies that promote oil and gas development. However, the site is paid for by the American Petroleum Institute and America’s Natural Gas Alliance, industry associations that represent the interests of member companies.

BLM would have little to no authority to ensure the quality and accuracy of the data reported directly to such a third-party website. Additionally, the data will not be accessible through the Freedom of Information Act since BLM is not collecting the information. The IOGCC has already declared that it is not subject to federal or state open records laws, despite its role in collecting government-mandated data.

Second, FracFocus.org makes it difficult for the public to use the data on wells and chemicals. The leaked BLM proposal fails to include any provisions to ensure minimum functionality on searching, sorting, downloading, or other mechanisms to make complex data more usable. Currently, the site only allows users to download PDF files of reports on fracked wells, which makes it very difficult to analyze data in a region or track chemical use. Despite some plans to improve searching on FracFocus.org, the oil and gas industry opposes making chemical data easier to download or evaluate for fear that the public “might misinterpret it or use it for political purposes.”

Don’t you feel safer? Knowing the oil and gas industry is working so hard to protect you from misinterpreting data?

Why the government is helping the oil and gas industry protect us from data I cannot say.

I mention this as an example of testing for “transparency.”

Anything the government freely makes available with spreadsheet capabilities isn’t transparency. It’s distraction.

Any data that the government tries to hide, that data has potential value.

The Center for Effective Government points out these are draft rules and when published, you need to comment.

Not a bad plan but not very reassuring given the current record of President Obama, the Opaque.

Alternatives? Suggestions for how data mining could expose those who own floors of the BLM, who drill the wells, etc?
