Archive for the ‘Open Government’ Category

(Legal) Office of Personnel Management Data!

Friday, June 9th, 2017

We’re Sharing A Vast Trove Of Federal Payroll Records by Jeremy Singer-Vine.

From the post:

Today, BuzzFeed News is sharing an enormous dataset — one that sheds light on four decades of the United States’ federal payroll.

The dataset contains hundreds of millions of rows and stretches all the way back to 1973. It provides salary, title, and demographic details about millions of U.S. government employees, as well as their migrations into, out of, and through the federal bureaucracy. In many cases, the data also contains employees’ names.

We obtained the information — nearly 30 gigabytes of it — from the U.S. Office of Personnel Management, via the Freedom of Information Act (FOIA). Now, we’re sharing it with the public. You can download it for free on the Internet Archive.

This is the first time, it seems, that such extensive federal payroll data is freely available online, in bulk. (The Asbury Park Press and FedsDataCenter.com both publish searchable databases. They’re great for browsing, but don’t let you download the data.)

We hope that policy wonks, sociologists, statisticians, fellow journalists — or anyone else, for that matter — find the data useful.

We obtained the information through two Freedom of Information Act requests to OPM. The first chunk of data, provided in response to a request filed in September 2014, covers late 1973 through mid-2014. The second, provided in response to a request filed in December 2015, covers late 2014 through late 2016. We have submitted a third request, pending with the agency, to update the data further.

Between our first and second requests, OPM announced it had suffered a massive computer hack. As a result, the agency told us, it would no longer release certain information, including the employee “pseudo identifier” that had previously disambiguated employees with common names.

What a great data release! Kudos and thanks to BuzzFeed News!

If you need the “pseudo identifiers” for the second or following releases and/or data for the employees withheld (generally the more interesting ones), consult data from the massive computer hack.

Or obtain the excluded data directly from the Office of Personnel Management without permission.


Open Data = Loss of Bureaucratic Power

Friday, June 9th, 2017

James Comey’s leaked memos about meetings with President Trump illustrate one reason for the lack of progress on open data reported in FOIA This! The Depressing State of Open Data by Toby McIntosh.

From Former DOJ Official on Comey Leak: ‘Standard Operating Procedure’ Among Bureaucrats:

On “Fox & Friends” today, J. Christian Adams said the leak of the memos by Comey was in line with “standard operating procedure” among Beltway bureaucrats.

“[They] were using the media, using confidential information to advance attacks on the President of the United States. That’s what they do,” said Adams, adding he saw it go on at DOJ.

Access to information is one locus of bureaucratic power, which makes the story in FOIA This! The Depressing State of Open Data a non-surprise:

In our latest look at FOIA around the world, we examine the state of open data sets. According to the new report by the World Wide Web Foundation, the news is not good.

“The number of global truly open datasets remains at a standstill,” according to the group’s researchers, who say that only seven percent of government data is fully open.

The findings come in the fourth edition of the Open Data Barometer, an annual assessment which was enlarged this year to include 1,725 datasets from 15 different sectors across 115 countries. The report summarizes:

Only seven governments include a statement on open data by default in their current policies. Furthermore, we found that only 7 percent of the data is fully open, only one of every two datasets is machine readable and only one in four datasets has an open license. While more data has become available in a machine-readable format and under an open license since the first edition of the Barometer, the number of global truly open datasets remains at a standstill.

Based on the detailed country-by-country rankings, the report says some countries continue to be leaders on open data, a few have stepped up their game, but some have slipped backwards.

With open data efforts at a standstill and/or sliding backwards, waiting for bureaucrats to voluntarily relinquish power is a non-starter.

There are other options.

Need I mention the Office of Personnel Management hack? The highly touted but apparently fundamentally vulnerable NSA?

If you need a list of cyber-vulnerable U.S. government agencies, see: A-Z Index of U.S. Government Departments and Agencies.

You can:

  • wait for bureaucrats to abase themselves,
  • post how government “…ought to be transparent and accountable…”,
  • echo opinions of others on calling for open data,

or, help yourself to government collected, generated, or produced data.

Which one do you think is most effective?

Guarantees Of Public Access In Trump Administration (A Perfect Data Storm)

Saturday, December 31st, 2016

I read hand wringing over the looming secrecy of the Trump administration on a daily basis.

More truthfully, I skip over daily hand wringing over the looming secrecy of the Trump administration.

For two reasons.

First, as reported in US government subcontractor leaks confidential military personnel data by Charlie Osborne, government data doesn’t require hacking, just a little initiative.

In this particular case, it was rsync without a username or password that made this data leak possible.

Editors should ask their reporters before funding FOIA suits: “Have you tried rsync?”
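The rsync check really is that simple to script. A minimal sketch, assuming rsync is installed and using a made-up host name for illustration (an rsync daemon answering `host::` with its module list, no credentials required, is exactly the misconfiguration behind the leak above):

```python
import subprocess


def rsync_listing_cmd(host: str, module: str = "") -> list[str]:
    """Build the rsync command that lists a daemon's contents.

    `host::` with no module name asks the daemon for its module list;
    `host::module/` lists files inside a module. No username or
    password is involved unless the daemon demands one.
    """
    return ["rsync", f"{host}::{module}"]


def list_rsync(host: str, module: str = "") -> str:
    """Run the listing command and return the daemon's response."""
    result = subprocess.run(
        rsync_listing_cmd(host, module),
        capture_output=True, text=True, timeout=30, check=True,
    )
    return result.stdout


# list_rsync("data.example.org") would return the module list of any
# daemon that answers anonymously; data.example.org is hypothetical.
```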

Second, the alleged-to-be-Trump-nominees for cabinet and lesser positions, remind me of this character from Dilbert: November 2, 1992:


Trump appointees may have mastered the pointy end of pencils but their ability to use cyber-security will be as shown.

When you add up the cyber-security incompetence of Trump appointees, complaints from Inspectors General about agency security, and agencies leaking to protect their positions/turf, you have the conditions for a perfect data storm.

A perfect data storm that may see the US government hemorrhaging data like never before.

PS: You know my preference, post leaks on receipt in their entirety. As for “consequences,” consider those a down payment on what awaits people who betray humanity, their people, colleagues and family. They could have chosen differently and didn’t. What more can one say?

Why the Open Government Partnership Needs a Reboot [Governments Too]

Saturday, December 12th, 2015

Why the Open Government Partnership Needs a Reboot by Steve Adler.

From the post:

The Open Government Partnership was created in 2011 as an international forum for nations committed to implementing Open Government programs for the advancement of their societies. The idea of open government started in the 1980s after CSPAN was launched to broadcast U.S. Congressional proceedings and hearings to the American public on TV. While the galleries above the House of Representatives and Senate had been “open” to the “public” (if you got permission from your representative to attend) for decades, never before had all public democratic deliberations been broadcast on TV for the entire nation to behold at any time they wished to tune in.

I am a big fan of OGP and feel that the ideals and ambition of this partnership are noble and essential to the survival of democracy in this millennium. But OGP is a startup, and every startup business or program faces a chasm it must cross from early adopters and innovators to early majority market implementation and OGP is very much at this crossroads today. It has expanded membership at a furious pace the past three years and it’s clear to me that expansion is now far more important to OGP than the delivery of the benefits of open government to the hundreds of millions of citizens who need transparent transformation.

OGP needs a reboot.

The structure of a system produces its own behavior. OGP needs a new organizational structure with new methods for evaluating national commitments. But that reboot needs to happen within its current mission. We should see clearly that the current structure is straining due to the rapid expansion of membership. There aren’t enough support unit resources to manage the expansion. We have to rethink how we manage national commitments and how we evaluate what it means to be an open government. It’s just not right that countries can celebrate baby steps at OGP events while at the same time passing odious legislation, sidestepping OGP accomplishments, buckling to corruption, and cracking down on journalists.

Unlike Steve I didn’t and don’t have a lot of faith in governments being voluntarily transparent.

As I pointed out in Congress: More XQuery Fodder, sometime in 2016, full bill status data will be available for all legislation before the United States Congress.

That is a lot more data than is easy to access now, but it is more smoke than fire.

With legislation status data, you can track the civics lesson progression of a bill through Congress, but that leaves you at least 3 to 4 degrees short of knowing who was behind the legislation.

Just a short list of what more would be useful:

  • Visitor/caller lists for everyone who spoke to a member of Congress or their staff, with date and subject of the call
  • All visits and calls tied to particular legislation and/or classes of legislation
  • All fundraising calls made by members of Congress and/or their staffs, with date, results, and substance of the call
  • Representatives’ conversations with reconciliation committee members or their staffers about legislation and requested “corrections”
  • All conversations between a representative or member of their staff and agency staff, identifying all parties and the substance of the conversation
  • Notes, proposals, and discussion notes for all agency decisions

Current transparency proposals are sufficient to confuse the public with mounds of nearly useless data. None of it reflects the real decision making processes of government.

Before someone shouts “privacy,” I would point out that no citizen has a right to privacy when their request is for a government representative to favor them over other citizens of the same government.

Real government transparency will require breaking the mini-star chamber proceedings from the lowest to the highest levels of government.

What we need is a rebooting of governments.

The Closed United States Government

Wednesday, December 17th, 2014

U.S. providing little information to judge progress against Islamic State by Nancy A. Youssef.

From the post:

The American war against the Islamic State has become the most opaque conflict the United States has undertaken in more than two decades, a fight that’s so underreported that U.S. officials and their critics can make claims about progress, or lack thereof, with no definitive data available to refute or bolster their positions.

The result is that it’s unclear what impact more than 1,000 airstrikes on Iraq and Syria have had during the past four months. That confusion was on display at a House Foreign Affairs Committee hearing earlier this week, where the topic – “Countering ISIS: Are We Making Progress?” – proved to be a question without an answer.

“Although the administration notes that 60-plus countries have joined the anti-ISIS campaign, some key partners continue to perceive the administration’s strategy as misguided,” Rep. Ed Royce, R-Calif., the committee’s chairman, said in his opening statement at the hearing, using a common acronym for the Islamic State. “Meanwhile, there are grave security consequences to allowing ISIS to control a territory of the size of western Iraq and eastern Syria.”

Nancy does a great job teasing out reasons for the opaqueness of the war against ISIS, which include:

  1. Disclosure of the lack of coordination between any policy goal and military action
  2. Disclosure of odd alliances with countries and “groups” (terrorist groups?)
  3. Disclosure of timing and location of attacks might be used to detect trends

The first two are classic reasons for openness. If the public knew what was happening in the war with ISIS, it might well have Congress defund the war as incompetently led. Take it up some other time with better leadership.

But the public can’t make that call so long as the government remains a closed (not open) government and the press remains too timid to seek facts out for itself.

I don’t credit #3 at all because ISIS should know with a fair degree of accuracy where bombing raids are occurring and when. Unless the military is bombing sand to throw off their trend analysis.

Lack of openness from the government, about wars, about torture, about its alliances, will lead to future generations asking Americans: “How could you have supported a government like that?” Are you really going to say that you didn’t know?

Open-Data Data Dumps

Friday, July 25th, 2014

Government’s open-data portal at risk of becoming a data dump by Jj Worrall.

From the post:

The Government’s new open-data portal is not yet where it would like it to be, Minister Brendan Howlin said in a Department of Public Expenditure and Reform meeting room earlier this week.

In case expectations are too high, the word “pilot” is in italics when you visit the site in question –

Meanwhile the words “start” and “beginning” pepper the conversation with the Minister and a variety of data experts from the Insight Centre in NUI Galway who have helped create the site. Data.gov.ie allows those in the Government, as well as interested businesses and citizens, to examine data from a variety of public bodies, opening opportunities for Government efficiencies and commercial possibilities along the way.

The main problem is that there is not much of it, and a lot of what is there can’t be utilised in a particularly useful fashion.

As director of the US Open Data Institute Waldo Jaquith told The Irish Times, with “almost no data” available in a format that’s genuinely usable by app developers, businesses or interested parties, for the moment the site represents “a haphazard collection of data”.

It is important to realize that governments and their staffs have very little experience at being open and/or sharing data. Among the reasons for being reluctant to post open-data are:

  1. Less power over you since requests for data cannot be delayed or denied
  2. Less power in general because others will have the data
  3. Less power to confer on others by exclusive access to the data
  4. Less security since data may show poor results or performance
  5. Less security since data may show favoritism or fraud
  6. Less prestige as the source of answers on the data

Not an exhaustive list, but it is a reminder that changing attitudes about open data is probably beyond your reach.

What you can do with a site such as data.gov.ie is to find a dataset of interest to you and make concrete suggestions for improvements.

There are a number of government staffers whose motives I didn’t capture in my list of reasons not to share data. Side with them and facilitate their work.

For example:

Met Éireann Climate Products. A polite note to the listed contact should point out that an order form and price list don’t really constitute “open data” in the sense citizens and developers expect. The “resource” should come off the listing and be made available elsewhere, as “data products to order,” for example.


Weather Buoy Network Real Time Data, where if you dig long enough, you will find that you can download csv-formatted data by blindly guessing at buoy names. A map of buoy locations would greatly assist at that point, not to mention having an RSS feed for buoy data as it is received. Downloading a file tells me I am not getting “Real Time Data.” Yes?

Not major improvements, but they would improve those two items at any rate.

It will take time but ultimately the staff who favor sharing will prevail. You can hasten that day’s arrival or you can retard it. Your choice.

I first saw this in a tweet by Deirdre Lee.


Searching Data Tables

Wednesday, July 2nd, 2014

From the webpage:

There are loads of open data portals. There’s even a portal about data portals. And each of these portals has loads of datasets.

OpenPrism is my most recent attempt at understanding what is going on in all of these portals. Read on if you want to see why I made it, or just go to the site and start playing with it.

Naive search method

One difficulty in discovering open data is the search paradigm.

Open data portals approach searching data as if data were normal prose; your search terms are some keywords, a category, &c., and your results are dataset titles and descriptions.

OpenPrism is one small attempt at making it easier to search. Rather than going to all of the different portals and making a separate search for each portal, you type your search in one search bar, and you get results from a bunch of different Socrata, CKAN and Junar portals.

Certainly more efficient than searching data portals separately, but searching data portals is highly problematic in any event.

Or at least more problematic than using one of the standard web search engines. Even search engines, which rely upon the choices of millions of users to fine-tune their results, often return a mixed bag.

Data portals, and I suspect most datasets within a single portal, do not share common schemas or metadata. Which means a search that is successful in one data portal may return no results in another data portal.

Not that I am about to advocate a “universal” schema for all data portals. 😉

A good first step would be enabling data silos to have searchable mappings for data columns, as suggested by users. Not machine implemented, just simple prose. Users researching particular areas are likely to encounter the same data sets, and recording their mappings could well assist other users.

Relying on user-suggested mappings would also focus improvements on the data sets that get used the most, the ones users actually care about combining. As opposed to having IT guess which data mappings should have priority.
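The mechanism could be as plain as a store of prose notes keyed by portal, dataset, and column. A minimal sketch, with all portal, dataset, and column names invented for illustration and nothing machine-interpreted:

```python
from collections import defaultdict

# Each entry: user-contributed prose linking a column in one portal's
# dataset to its counterpart elsewhere. Searching the notes is the only
# "mapping" machinery; no shared schema is imposed.
mappings = defaultdict(list)


def suggest(portal: str, dataset: str, column: str, note: str) -> None:
    """Record a user's prose mapping for a dataset column."""
    mappings[(portal, dataset, column)].append(note)


def search(term: str) -> list[tuple]:
    """Find columns whose mapping notes mention `term`."""
    term = term.lower()
    return [key for key, notes in mappings.items()
            if any(term in n.lower() for n in notes)]


suggest("portal-a", "street-trees", "dbh",
        "Diameter at breast height, in inches; portal-b calls this 'diameter_in'.")
print(search("diameter"))  # [('portal-a', 'street-trees', 'dbh')]
```

Frequency counts over `search` queries would fall out for free, identifying which mappings users care about most.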

Sound like a plan?

See the source at GitHub.

I first saw this in a tweet by Felienne Hermans.

Open government: getting beyond impenetrable online data

Saturday, May 31st, 2014

Open government: getting beyond impenetrable online data by Jed Miller.

From the post:

Mathematician Blaise Pascal famously closed a long letter by apologising that he hadn’t had time to make it shorter. Unfortunately, his pithy point about “download time” is regularly attributed to Mark Twain and Henry David Thoreau, probably because the public loves writers more than it loves statisticians. Scientists may make things provable, but writers make them memorable.

The World Bank confronted a similar reality of data journalism earlier this month when it revealed that, of the 1,600 bank reports posted online from 2008 to 2012, 32% had never been downloaded at all and another 40% were downloaded under 100 times each.

Taken together, these cobwebbed documents represent millions of dollars in World Bank funds and hundreds of thousands of person-hours, spent by professionals who themselves represent millions of dollars in university degrees. It’s difficult to see the return on investment in producing expert research and organising it into searchable web libraries when almost three quarters of the output goes largely unseen.

You won’t find any ways to make documents less impenetrable in Jed’s post but it is a source for quotes on the issue.

For example:

For nonprofits and governments that still publish 100-page pdfs on their websites and do not optimise the content to share in other channels such as social: it is a huge waste of time and ineffective. Stop it now.

OK, so that’s easy: “Stop it now.”

The harder question: “What should we put in its place?”

Shouting “stop it” without offering examples of better documents or approaches, is like a car horn in New York City. It’s just noise pollution.

Do you have any examples of documents, standards, etc. that are “good” and non impenetrable?

Let’s make this more concrete: Suggest an “impenetrable” document*, hopefully not a one-hundred-page one, and I will take a shot at revising it to make it less “impenetrable.” I will post a revised version here with notes as to why revisions were made. We won’t all agree, but it might result in an example document that isn’t “impenetrable.”

*Please omit tax statutes or regulations, laws, etc. I could probably make them less impenetrable but only with a great deal of effort. That sort of text is “impenetrable” by design.

Follow the Money (OpenTED)

Tuesday, May 20th, 2014

Opening Up EU Procurement Data by Friedrich Lindenberg.

From the post:

What is the next European dataset that investigative journalists should look at? Back in 2012 at the DataHarvest conference, Brigitte, investigative superstar from FarmSubsidy and co-host of the conference, had a clear answer: let’s open up TED (Tenders Electronic Daily). TED is the EU’s shared procurement mechanism, and is at the heart of the EU contracting process. Opening it up would shine a light on the key questions of who receives public money, and what they receive it for.

Her suggestion triggered a two-year project, OpenTED, which, as of last week, has finally matured into a useful resource for journalists and researchers. While gaps remain, we hope it will now start to be used by journalists, NGOs, analysts and citizens to get information on everything from large scale trends to local municipal developments.

(image omitted)


TED collects tender notices for large public projects so that companies from all EU countries can bid on those contracts. For journalists, there are many exciting questions such a database would be able to answer: What major projects are being announced? Who is winning the contracts for these projects, and is that decision made prudently and impartially? Who are the biggest suppliers in a particular country or industry?

A data dictionary for the project remains unfinished and there are plenty of other opportunities to contribute to this project.

The phrase “large public project” means projects with budgets in excess of €200,000. If experience in the United States holds true for the EU, there can be a lot of FGC (Fraud, Greed, Corruption) in under €200,000 contracts.

If you are looking for volunteer opportunities, the data needs to be used and explored, a data dictionary remains unfinished, current code can be improved and I assume documentation would be appreciated.

Certainly the type of project that merits widespread public support.

I find the project interesting because once you connect the players based on this data set, folding in other sets of connections, such as school, social, club, agency, employer, will improve the value of the original data set. Topic maps of course being my preferred method for the folding.

I first saw this in a tweet by ePSIplatform.

R Client for the U.S. Federal Register API

Thursday, May 8th, 2014

R Client for the U.S. Federal Register API by Thomas Leeper.

From the webpage:

This package provides access to the API for the United States Federal Register. The API provides access to all Federal Register contents since 1994, including Executive Orders by Presidents Clinton, Bush, and Obama and all “Public Inspection” Documents made available prior to publication in the Register. The API returns basic details about each entry in the Register and provides URLs for HTML, PDF, and plain text versions of the contents thereof, and the data are fully searchable. The federalregister package provides access to all version 1 API endpoints.

If you are interested in law, policy development, or just general awareness of government activity, this is an important client for you!

More than 30 years ago I had a hard-copy subscription to the Federal Register. Even then it was a mind-numbing amount of detail. Today it is even worse.

This API enables any number of business models based upon quick access to current and historical Federal Register data.
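You don’t even need R to exercise the underlying service. A sketch that builds a search URL against the Federal Register v1 API; the endpoint and `conditions[term]` parameter follow the public API documentation as I understand it, and the search term is arbitrary:

```python
from urllib.parse import urlencode

API = "https://www.federalregister.gov/api/v1"


def documents_url(term: str, per_page: int = 20) -> str:
    """Build a Federal Register v1 search URL for documents matching `term`."""
    query = urlencode({"conditions[term]": term, "per_page": per_page})
    return f"{API}/documents.json?{query}"


url = documents_url("endangered species", per_page=5)
print(url)
# https://www.federalregister.gov/api/v1/documents.json?conditions%5Bterm%5D=endangered+species&per_page=5
# Fetching that URL (urllib.request or any HTTP client) returns JSON with
# each entry's title, publication date, and HTML/PDF/plain-text URLs.
```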


Accessible Government vs. Open Government

Sunday, March 30th, 2014

Congressional Officials Grant Access Due To Campaign Contributions: A Randomized Field Experiment


From the abstract:

Concern that lawmakers grant preferential treatment to individuals because they have contributed to political campaigns has long occupied jurists, scholars, and the public. However, the effects of campaign contributions on legislators’ behavior have proven notoriously difficult to assess. We report the first randomized field experiment on the topic. In the experiment, a political organization attempted to schedule meetings between 191 Members of Congress and their constituents who had contributed to political campaigns. However, the organization randomly assigned whether it informed legislators’ offices that individuals who would attend the meetings were contributors. Congressional offices made considerably more senior officials available for meetings when offices were informed the attendees were donors, with senior officials attending such meetings more than three times as often (p < 0.01). Influential policymakers thus appear to make themselves much more accessible to individuals because they have contributed to campaigns, even in the absence of quid pro quo arrangements. These findings have significant implications for ongoing legal and legislative debates. The hypothesis that individuals can command greater attention from influential policymakers by contributing to campaigns has been among the most contested explanations for how financial resources translate into political power. The simple but revealing experiment presented here elevates this hypothesis from extensively contested to scientifically supported.

Donors really are different from the rest of us, they have access.

One hopes the next randomized experiment distinguishes where the break points are in donations.

I suspect less than $500 is one group, $500 to $1,000 a second, $1,000 to $2,500 a third, and so on. Just guesses on my part, but it would help the political process if potential donors had a bidding sheet for candidates. You don’t want to appear foolish and pay too much for access to a junior member of Congress, but on the other hand, you don’t want to insult a senior member with too small a donation.

Think of it as transparency of access.

I first saw this at Full Text Reports.

UK statistics and open data…

Tuesday, March 18th, 2014

UK statistics and open data: MPs’ inquiry report published, by Owen Boswarva.

From the post:

This morning the Public Administration Select Committee (PASC), a cross-party group of MPs chaired by Bernard Jenkin, published its report on Statistics and Open Data.

This report is the product of an inquiry launched in July 2013. Witnesses gave oral evidence in three sessions; you can read the transcripts and written evidence as well.

Useful if you are looking for rhetoric and examples of use of government data.

Ironic that just last week the news broke that Google has given British security the power to censor “unsavory” (but legal) content from Youtube. UK gov wants to censor legal but “unsavoury” YouTube content by Lisa Vaas.

Lisa writes:

Last week, the Financial Times revealed that Google has given British security the power to quickly yank terrorist content offline.

The UK government doesn’t want to stop there, though – what it really wants is the power to pull “unsavoury” content, regardless of whether it’s actually illegal – in other words, it wants censorship power.

The news outlet quoted UK’s security and immigration minister, James Brokenshire, who said that the government must do more to deal with material “that may not be illegal but certainly is unsavoury and may not be the sort of material that people would want to see or receive.”

I’m not sure why the UK government wants to block content that people don’t want to see or receive. They simply won’t look at it. Yes?

But, intellectual coherence has never been a strong point of most governments and the UK in particular of late.

Is this more evidence for my contention that “open data” for government means only the data government wants you to have?

Beyond Transparency

Tuesday, March 4th, 2014

Beyond Transparency, edited by Brett Goldstein and Lauren Dyson.

From the webpage:

The rise of open data in the public sector has sparked innovation, driven efficiency, and fueled economic development. And in the vein of high-profile federal initiatives like Data.gov and the White House’s Open Government Initiative, more and more local governments are making their foray into the field with Chief Data Officers, open data policies, and open data catalogs.

While still emerging, we are seeing evidence of the transformative potential of open data in shaping the future of our civic life. It’s at the local level that government most directly impacts the lives of residents—providing clean parks, fighting crime, or issuing permits to open a new business. This is where there is the biggest opportunity to use open data to reimagine the relationship between citizens and government.

Beyond Transparency is a cross-disciplinary survey of the open data landscape, in which practitioners share their own stories of what they’ve accomplished with open civic data. It seeks to move beyond the rhetoric of transparency for transparency’s sake and towards action and problem solving. Through these stories, we examine what is needed to build an ecosystem in which open data can become the raw materials to drive more effective decision-making and efficient service delivery, spur economic activity, and empower citizens to take an active role in improving their own communities.

Let me list the titles for two (2) parts out of five (5):

  • PART 1 Opening Government Data
    • Open Data and Open Discourse at Boston Public Schools, Joel Mahoney
    • Open Data in Chicago: Game On, Brett Goldstein
    • Building a Smarter Chicago, Dan X O’Neil
    • Lessons from the London Datastore, Emer Coleman
    • Asheville’s Open Data Journey: Pragmatics, Policy, and Participation, Jonathan Feldman
  • PART 2 Building on Open Data
    • From Entrepreneurs to Civic Entrepreneurs, Ryan Alfred, Mike Alfred
    • Hacking FOIA: Using FOIA Requests to Drive Government Innovation, Jeffrey D. Rubenstein
    • A Journalist’s Take on Open Data, Elliott Ramos
    • Oakland and the Search for the Open City, Steve Spiker
    • Pioneering Open Data Standards: The GTFS Story, Bibiana McHugh

Steve Spiker captures my concerns about efficacy of “open data” in his opening sentence:

At the center of the Bay Area lies an urban city struggling with the woes of many old, great cities in the USA, particularly those in the rust belt: disinvestment, white flight, struggling schools, high crime, massive foreclosures, political and government corruption, and scandals. (Oakland and the Search for the Open City)

It may well be that I agree with “open data,” in part because I have no real data to share. So any sharing of data is going to benefit me and whatever agenda I want to pursue.

People who are pursuing their own agendas without open data, have nothing to gain by an open playing field and more than a little to lose. Particularly if they are on the corrupt side of public affairs.

All the more reason to pursue open data in my view but with the understanding that every line of data access benefits some and penalizes others.

Take the long-standing tradition of not publishing who meets with the President of the United States. Justified on the basis that the President needs open and frank advice from people who feel free to speak openly.

That’s one explanation. Another is that being clubby with media moguls would look inconvenient while the U.S. trade delegation is pushing a pro-media position, to the detriment of us all.

When open data is used to take down members of Congress, the White House, heads and staffs of agencies, it will truly have arrived.

Until then, open data is just whistling as it walks past a graveyard in the dark.

I first saw this in a tweet by ladyson.

Project Open Data

Tuesday, February 25th, 2014

Project Open Data

From the webpage:

Data is a valuable national resource and a strategic asset to the U.S. Government, its partners, and the public. Managing this data as an asset and making it available, discoverable, and usable – in a word, open – not only strengthens our democracy and promotes efficiency and effectiveness in government, but also has the potential to create economic opportunity and improve citizens’ quality of life.

For example, when the U.S. Government released weather and GPS data to the public, it fueled an industry that today is valued at tens of billions of dollars per year. Now, weather and mapping tools are ubiquitous and help everyday Americans navigate their lives.

The ultimate value of data can often not be predicted. That’s why the U.S. Government released a policy that instructs agencies to manage their data, and information more generally, as an asset from the start and, wherever possible, release it to the public in a way that makes it open, discoverable, and usable.

The White House developed Project Open Data – this collection of code, tools, and case studies – to help agencies adopt the Open Data Policy and unlock the potential of government data. Project Open Data will evolve over time as a community resource to facilitate broader adoption of open data practices in government. Anyone – government employees, contractors, developers, the general public – can view and contribute. Learn more about Project Open Data Governance and dive right in and help to build a better world through the power of open data.

An impressive list of tools and materials for federal (United States) agencies seeking to release data.
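Under the Open Data Policy, agencies describe their datasets in a machine-readable catalog, conventionally published at /data.json. As a rough sketch of what “discoverable and usable” can mean in practice, here is a minimal example that summarizes such a catalog; the catalog and its two records are invented for illustration:

```python
import json

# Hypothetical /data.json catalog in the Project Open Data style;
# a real agency catalog has many more fields per dataset entry.
SAMPLE_CATALOG = """
{
  "dataset": [
    {"title": "Payroll Records", "accessLevel": "public",
     "distribution": [{"mediaType": "text/csv"}]},
    {"title": "Internal Audit Logs", "accessLevel": "non-public",
     "distribution": []}
  ]
}
"""

def summarize_catalog(raw_json):
    """Bucket dataset titles by whether they are marked public."""
    catalog = json.loads(raw_json)
    summary = {"public": [], "other": []}
    for ds in catalog.get("dataset", []):
        bucket = "public" if ds.get("accessLevel") == "public" else "other"
        summary[bucket].append(ds.get("title", "(untitled)"))
    return summary

if __name__ == "__main__":
    print(summarize_catalog(SAMPLE_CATALOG))
```

Because every agency publishes the same shape of catalog, one small script like this can survey holdings across the whole federal government, which is precisely the leverage a common standard buys.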

And as Ryan Swanstrom says in his post:

Best of all, the entire project is available on GitHub and contributions are welcomed.

Thoughts on possible contributions?

R Markdown:… [Open Analysis, successor to Open Data?]

Tuesday, February 25th, 2014

R Markdown: Integrating A Reproducible Analysis Tool into Introductory Statistics by Ben Baumer et al.


Nolan and Temple Lang argue that “the ability to express statistical computations is an essential skill.” A key related capacity is the ability to conduct and present data analysis in a way that another person can understand and replicate. The copy-and-paste workflow that is an artifact of antiquated user-interface design makes reproducibility of statistical analysis more difficult, especially as data become increasingly complex and statistical methods become increasingly sophisticated. R Markdown is a new technology that makes creating fully-reproducible statistical analysis simple and painless. It provides a solution suitable not only for cutting edge research, but also for use in an introductory statistics course. We present evidence that R Markdown can be used effectively in introductory statistics courses, and discuss its role in the rapidly-changing world of statistical computation. (emphasis in original)

I would have made the author’s third point for R Markdown the first:

Third, the separation of computing from presentation is not necessarily honest… More subtly and less perniciously, the copy-and-paste paradigm enables, and in many cases even encourages, selective reporting. That is, the tabular output from R is admittedly not of presentation quality. Thus the student may be tempted or even encouraged to prettify tabular output before submitting. But while one is fiddling with margins and headers, it is all too tempting to remove rows or columns that do not suit the student’s purpose. Since the commands used to generate the table are not present, the reader is none the wiser.

Although I have to admit that reproducibility has a lot going for it.

Can you imagine reproducible analysis from the OMB, complete with machine-readable data sets? Or for any other agency report? Or, for that matter, for all publications by registered lobbyists? That could be real interesting.

Open Analysis (OA) as a natural successor to Open Data.

That works for me.


PS: More resources:

Create Dynamic R Statistical Reports Using R Markdown

R Markdown

Using R Markdown with RStudio

Writing papers using R Markdown

If journals started requiring R Markdown as a condition for publication, some aspects of research would become more transparent.

Some will say that authors will resist.

Assume Science or Nature has accepted your article on the condition that you use R Markdown.

Honestly, are you really going to say no?

I first saw this in a tweet by Scott Chamberlain.


OpenRFPs: Open RFP Data for All 50 States

Saturday, February 22nd, 2014

OpenRFPs: Open RFP Data for All 50 States by Clay Johnson.

From the post:

Tomorrow at CodeAcross we’ll be launching our first community-based project, OpenRFPs. The goal is to liberate the data inside of every state RFP listing website in the country. We hope you’ll find your own state’s RFP site, and contribute a parser.

The Department of Better Technology’s goal is to improve the way government works by making it easier for small, innovative businesses to provide great technology to government. But those businesses can barely make it through the front door when the RFPs themselves are stored in archaic systems, with sloppy user interfaces and disparate data formats, or locked behind paywalls.

I have posted a comment on the announcement suggesting they use UBL. But in any event, mapping the semantics of RFPs to enable wider participation would make an interesting project.
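A state RFP parser of the kind OpenRFPs solicits might, as a minimal sketch, look like the following. The HTML markup here is invented for illustration, since every state’s listing site differs (which is the problem the project exists to solve):

```python
from html.parser import HTMLParser

# Hypothetical state RFP listing markup; a real contribution would
# target one state's actual site and its quirks.
SAMPLE_PAGE = """
<table>
  <tr class="rfp"><td>RFP-001</td><td>Road resurfacing</td><td>2014-03-15</td></tr>
  <tr class="rfp"><td>RFP-002</td><td>IT services</td><td>2014-04-01</td></tr>
</table>
"""

class RFPParser(HTMLParser):
    """Collect each <tr class="rfp"> row as an {id, title, due} record."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = []
        self._in_row = False
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and ("class", "rfp") in attrs:
            self._in_row, self._row = True, []
        elif tag == "td" and self._in_row:
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._in_row:
            self._in_row = False
            self.rows.append(dict(zip(["id", "title", "due"], self._row)))
        elif tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

parser = RFPParser()
parser.feed(SAMPLE_PAGE)
```

The point of a common output shape (a list of id/title/due records) is that fifty differently scraped sites can feed one shared dataset.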

I first saw this in a tweet by Tim O’Reilly.

Fiscal Year 2015 Budget (US) Open Government?

Friday, February 21st, 2014

Fiscal Year 2015 Budget

From the description:

Each year, the Office of Management and Budget (OMB) prepares the President’s proposed Federal Government budget for the upcoming Federal fiscal year, which includes the Administration’s budget priorities and proposed funding.

For Fiscal Year (FY) 2015– which runs from October 1, 2014, through September 30, 2015– OMB has produced the FY 2015 Federal Budget in four print volumes plus an all-in-one CD-ROM:

  1. the main “Budget” document with the Budget Message of the President, information on the President’s priorities and budget overviews by agency, and summary tables;
  2. “Analytical Perspectives,” which contains analyses designed to highlight specified subject areas;
  3. “Historical Tables,” which provides data on budget receipts, outlays, surpluses or deficits, and Federal debt over an extended time period;
  4. an “Appendix” with detailed information on individual Federal agency programs and appropriation accounts that constitute the budget; and
  5. a CD-ROM version of the Budget, containing all the FY 2015 budget documents in PDF format along with additional supporting material in spreadsheet format.

You will also want a “Green Book”; the 2014 version carried this description:

Each February when the President releases his proposed Federal Budget for the following year, Treasury releases the General Explanations of the Administration’s Revenue Proposals. Known as the “Green Book” (or Greenbook), the document provides a concise explanation of each of the Administration’s Fiscal Year 2014 tax proposals for raising revenue for the Government. This annual document clearly recaps each proposed change, reviewing the provisions in the Current Law, outlining the Administration’s Reasons for Change to the law, and explaining the Proposal for the new law. Ideal for anyone wanting a clear summary of the Administration’s policies and proposed tax law changes.

Did I mention that the four print volumes of the budget plus the CD-ROM are $250? And that last year’s Green Book was $75?

For $325.00, you can have print and PDF versions of the Budget plus a print copy of the Green Book.


  1. Would machine readable versions of the Budget + Green Book make it easier to explore and compare the information within?
  2. Are PDFs and print volumes what President Obama considers to be “open government?”
  3. Who has the advantage in policy debates, the OMB and Treasury with machine readable versions of these documents or the average citizen who has the PDFs and print?
  4. Do you think OMB and Treasury didn’t get the memo? Open Data Policy-Managing Information as an Asset

Public policy debates cannot be fairly conducted without meaningful access to data on public policy issues.

Data Analytic Recidivism Tool (DART) [DAFT?]

Sunday, December 29th, 2013

Data Analytic Recidivism Tool (DART)

From the website:

The Data Analytic Recidivism Tool (DART) helps answer questions about recidivism in New York City.

  • Are people that commit a certain type of crime more likely to be re-arrested?
  • What about people in a certain age group or those with prior convictions?

DART lets users look at recidivism rates for selected groups defined by characteristics of defendants and their cases.

A direct link to the DART homepage.

After looking at the interface, which aggregates recidivists into groups of 250, I’m not sure DART is all that useful.
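The underlying computation DART performs is simple enough once record-level data is available. A minimal sketch, with invented case records standing in for the real data:

```python
from collections import defaultdict

# Hypothetical record-level cases; DART itself only exposes
# aggregates, which is part of the complaint above.
cases = [
    {"age_group": "16-19", "prior_convictions": True,  "rearrested": True},
    {"age_group": "16-19", "prior_convictions": True,  "rearrested": False},
    {"age_group": "16-19", "prior_convictions": False, "rearrested": False},
    {"age_group": "30-39", "prior_convictions": True,  "rearrested": True},
]

def recidivism_rates(records, key):
    """Re-arrest rate for each value of a defendant characteristic."""
    totals, rearrests = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        rearrests[r[key]] += r["rearrested"]  # bool counts as 0 or 1
    return {k: rearrests[k] / totals[k] for k in totals}
```

Grouping by `age_group`, `prior_convictions`, or any other characteristic is the same one-liner, which is why the 250-person bucketing feels like an arbitrary limitation.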

It did spark an idea that might help with the federal government’s acquisition problems.

Why not create the equivalent of DART but call it:

Data Analytic Failure Tool (DAFT).

And in DAFT, track federal contractors, their principals, their contracts, and the program officers who play any role in those contracts.

So that when contractors fail, as so many of them do, it will be easy to track the individuals involved on both sides of the failure.

And every contract will have a preamble that recites any prior history of failure and the people involved in that failure, on all sides.

Such that any subsequent supervisor has to sign off with full knowledge of the prior lack of performance.

If criminal recidivism is to be avoided, shouldn’t failure recidivism be avoided as well?

Better Corporate Data!

Monday, July 15th, 2013

Announcing open corporate network data: not just good, but better

OpenCorporates announces three projects:

1. An open data corporate network platform

The most important part is a new platform for collecting, collating and allowing access to different types of corporate relationship data – subsidiary data, parent company data, and shareholding data. This means that governments around the world (and companies too) can publish corporate network data and they can be combined in a single open-data repository, for a more complete picture. We think this is a game-changer, as it not only allows seamless, lightweight co-operation, but will identify errors and contradictions. We’ll be blogging about the platform in more details over the coming weeks, but it’s been a genuinely hard computer-science problem that has resulted in some really innovative work.

2. Three key initial datasets


The shareholder data from the New Zealand company register, for example, is granular and up to date, and, if you have API access, is available as data. It talks about parental control, often to very granular data, and importing this data allows you to see not just shareholders (which you can also see on the NZ Companies House pages) but also what companies are owned by another company (which you can’t). And it’s throwing up some interesting examples, of which more in a later blog post.

The data from the Federal Reserve’s National Information Center is also fairly up to date, but is (for the biggest banks) locked away in horrendous PDFs and talks about companies controlled by other companies.

The data from the 10-K and 20-F filings from the US Securities and Exchange Commission is the most problematic of all, being published once a year, as arbitrary text (pretty shocking in the 21st century for this still to be the case), and talks about ‘significant subsidiaries’.


3. An example of the power of this dataset.

We think just pulling the data together as open data is pretty cool, and that many of the best uses will come from other users (we’re going to include the data in the next version of our API in a couple of weeks). But we’ve built in some network visualisations to allow the information to be explored. Check out Barclays Bank PLC, Pearson PLC, The Gap or Starbucks.

OpenCorporates is engineering the critical move from “open data,” ho-hum, to “corporate visibility using open data.”

Not quite to the point of “accountability,” but you have to identify evildoers before they can be held accountable.
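Holding-company chains are one reason that visibility matters: the entity of interest may sit several ownership links away from the company you can actually see. A minimal sketch of walking such a network, with invented company names:

```python
# Sketch of traversing a parent-subsidiary network of the kind
# OpenCorporates aggregates; the companies below are invented.
OWNERSHIP = {  # parent -> direct subsidiaries
    "Holding PLC": ["Bank Ltd", "Insurance Ltd"],
    "Bank Ltd": ["Offshore SPV 1", "Offshore SPV 2"],
}

def all_subsidiaries(company, ownership):
    """Every company reachable from `company` via ownership links."""
    seen = set()
    stack = list(ownership.get(company, []))
    while stack:
        child = stack.pop()
        if child not in seen:       # guard against cyclic shareholdings
            seen.add(child)
            stack.extend(ownership.get(child, []))
    return seen
```

With data from several registers merged into one repository, the same traversal crosses jurisdictions, which is where the contradictions and errors OpenCorporates mentions start to surface.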

A project that merits your interest, donations and support. Please pass this on. Thanks!

Open Government and Benghazi Emails

Thursday, May 16th, 2013

The controversy over the “Benghazi emails” is a good measure of what the Obama Administration means by “open government.”

News of the release of the Benghazi emails broke yesterday at NPR, USA Today, and elsewhere.

I saw the news at Benghazi Emails Released, Wall Street Journal, which includes a PDF of the emails.

If you go to and search for “Benghazi emails,” can you find the White House release of the emails?

I thought not.

The emails show congressional concern over the “talking points” on Benghazi to be a tempest in a teapot, as many of us already suspected.

Early release of the emails would have avoided some of the endless discussion rooted in congressional ignorance and bigotry.

But, the Obama administration has so little faith in “open government” that it conceals information that would be to its advantage if revealed.

Now imagine how the Obama administration must view information that puts it at a disadvantage.

Does that help to clarify the commitment of the Obama administration to open government?

It does for me.

Open Data: The World Bank Data Blog

Wednesday, March 20th, 2013

Open Data: The World Bank Data Blog

In case you are following open data/government issues, you will want to add this blog to your RSS feed.

Not a high traffic blog but with twenty-seven contributing authors, you get a diversity of viewpoints.

Not to mention that the World Bank is a great source for general data.

I persist in thinking that transparency means identifying individuals responsible for decisions, expenditures and the beneficiaries of those decisions and expenditures.

That isn’t a popular position among those who make decisions and approve expenditures for unidentified beneficiaries.

You will either have to speculate on your own or ask someone else why that is an unpopular position.

The Biggest Failure of Open Data in Government

Monday, March 18th, 2013

Many open data initiatives forget to include the basic facts about the government itself by Philip Ashlock.

From the post:

In the past few years we’ve seen a huge shift in the way governments publish information. More and more governments are proactively releasing information as raw open data rather than simply putting out reports or responding to requests for information. This has enabled all sorts of great tools like the ones that help us find transportation or the ones that let us track the spending and performance of our government. Unfortunately, somewhere in this new wave of open data we forgot some of the most fundamental information about our government, the basic “who”, “what”, “when”, and “where”.

Do you know all the different government bodies and districts that you’re a part of? Do you know who all your elected officials are? Do you know where and when to vote or when the next public meeting is? Now perhaps you’re thinking that this information is easy enough to find, so what does this have to do with open data? It’s true, it might not be too hard to learn about the highest office or who runs your city, but it usually doesn’t take long before you get lost down the rabbit hole. Government is complex, particularly in America where there can be a vast multitude of government districts and offices at the local level.

How can we have a functioning democracy when we don’t even know the local government we belong to or who our democratically elected representatives are? It’s not that Americans are simply too ignorant or apathetic to know this information, it’s that the system of government really is complex. With what often seems like chaos on the national stage it can be easy to think of local government as simple, yet that’s rarely the case. There are about 35,000 municipal governments in the US, but when you count all the other local districts there are nearly 90,000 government bodies (US Census 2012) with a total of more than 500,000 elected officials (US Census 1992). The average American might struggle to name their representatives in Washington D.C., but that’s just the tip of the iceberg. They can easily belong to 15 government districts with more than 50 elected officials representing them.

We overlook the fact that it’s genuinely difficult to find information about all our levels of government. We unconsciously assume that this information is published on some government website well enough that we don’t need to include it as part of any kind of open data program.

Yes, the number of subdivisions of government and the number of elected officials are drawn from two different census reports, the first from the 2012 census and the second from the 1992 census, a gap of twenty (20) years.

The Census Bureau has the 1992 list, saying:

1992 (latest available) 1992 Census of Governments vol. I no. 2 [PDF, 2.45MB] * Report has been discontinued

Makes me curious why such a report would be discontinued?

A report that did not address the various agencies, offices, etc. that are also part of various levels of government.

Makes me think you need an “insider” and/or a specialist just to navigate the halls of government.

Philip’s post illustrates that “open data” dumps from government are distractions from more effective questions of open government.

Questions such as:

  • Which officials have authority over what questions?
  • How to effectively contact those officials?
  • What actions are under consideration now?
  • Rules and deadlines for comments on actions?
  • Hearing and decision calendars?
  • Comments and submissions by others?
  • etc.

It never really is “…the local board of education (substitute your favorite board) decided….” but “…member A, B, D, and F decided that….”

Transparency means not allowing people and their agendas to hide behind the veil of government.

From President Obama, The Opaque

Thursday, February 28th, 2013

Leaked BLM Draft May Hinder Public Access to Chemical Information

From the post:

On Feb. 8, EnergyWire released a leaked draft proposal from the U.S. Department of the Interior’s Bureau of Land Management on natural gas drilling and extraction on federal public lands. If finalized, the proposal could greatly reduce the public’s ability to protect our resources and communities. The new draft indicates a disappointing capitulation to industry recommendations.

The draft rule affects oil and natural gas drilling operations on the 700 million acres of public land administered by BLM, plus 56 million acres of Indian lands. This includes national forests, which are the sources of drinking water for tens of millions of Americans, national wildlife refuges, and national parks, which are widely used for recreation.

The Department of the Interior estimates that 90 percent of the 3,400 wells drilled each year on public and Indian lands use natural gas fracking, a process that pumps large amounts of water, sand, and toxic chemicals into gas wells at very high pressure to cause fissures in shale rock that contains methane gas. Fracking fluid is known to contain benzene (which causes cancer), toluene, and other harmful chemicals. Studies link fracking-related activities to contaminated groundwater, air pollution, and health problems in animals and humans.

If the leaked draft is finalized, the changes in chemical disclosure requirements would represent a major concession to the oil and gas industry. The rule would allow drilling companies to report the chemicals used in fracking to an industry-funded website called FracFocus. Though the move by the federal government to require online disclosure is encouraging, the choice of FracFocus as the vehicle is problematic for many reasons.

First, the site is not subject to federal laws or oversight. The site is managed by the Ground Water Protection Council (GWPC) and the Interstate Oil and Gas Compact Commission (IOGCC), nonprofit intergovernmental organizations comprised of state agencies that promote oil and gas development. However, the site is paid for by the American Petroleum Institute and America’s Natural Gas Alliance, industry associations that represent the interests of member companies.

BLM would have little to no authority to ensure the quality and accuracy of the data reported directly to such a third-party website. Additionally, the data will not be accessible through the Freedom of Information Act since BLM is not collecting the information. The IOGCC has already declared that it is not subject to federal or state open records laws, despite its role in collecting government-mandated data.

Second, FracFocus makes it difficult for the public to use the data on wells and chemicals. The leaked BLM proposal fails to include any provisions to ensure minimum functionality on searching, sorting, downloading, or other mechanisms to make complex data more usable. Currently, the site only allows users to download PDF files of reports on fracked wells, which makes it very difficult to analyze data in a region or track chemical use. Despite some plans to improve searching on FracFocus, the oil and gas industry opposes making chemical data easier to download or evaluate for fear that the public “might misinterpret it or use it for political purposes.”

Don’t you feel safer? Knowing the oil and gas industry is working so hard to protect you from misinterpreting data?

Why the government is helping the oil and gas industry protect us from data I cannot say.

I mention this as an example of testing for “transparency.”

Anything the government freely makes available with spreadsheet capabilities isn’t transparency. It’s distraction.

Any data that the government tries to hide, that data has potential value.

The Center for Effective Government points out that these are draft rules and that, when they are published, you need to comment.

Not a bad plan but not very reassuring given the current record of President Obama, the Opaque.

Alternatives? Suggestions for how data mining could expose those who own floors of the BLM, who drill the wells, etc?

Competition: visualise open government data and win $2,000

Wednesday, February 13th, 2013

Competition: visualise open government data and win $2,000 by Simon Rogers.

Closing date: 23:59 BST on 2 April 2013

What can you do with the thousands of open government datasets? With Google and Open Knowledge Foundation we are launching a competition to find the best dataviz out there. You might even win a prize.

(graphic omitted)

Governments around the world are releasing a tidal wave of open data – on everything from spending through to crime and health. Now you can compare national, regional and city-wide data from hundreds of locations around the world.

But how good is this data? We want to see what you can do with it. What apps and visualisations can you make with this data? We want to see how the data changes the way you see the world.

In conjunction with Google and the Open Knowledge Foundation (who will be helping us judge the results), see if you can win the $2,000 prize.

All we want you to do is to take an open dataset from any government open data website (there’s a list of them at the bottom of this article) and visualise it.

The competition is open to citizens of the UK, US, France, Germany, Spain, Netherlands, Sweden. The winner will take home $2,000 and the result will be published on the Guardian Datastore on our Show and Tell site.

Here are some of the key datasets we’ve found (list below) – and feel free to bring your own data to the party – we only ask that it is freely available and open as in

You are visualizing data anyway, so why not take a chance on free PR and $2,000?

O’Reilly’s Open Government book [“…more equal than others” pigs]

Monday, January 21st, 2013

We’re releasing the files for O’Reilly’s Open Government book by Laurel Ruma.

From the post:

I’ve read many eloquent eulogies from people who knew Aaron Swartz better than I did, but he was also a Foo and contributor to Open Government. So, we’re doing our part at O’Reilly Media to honor Aaron by posting the Open Government book files for free for anyone to download, read and share.

The files are posted on the O’Reilly Media GitHub account as PDF, Mobi, and EPUB files for now. There is a movement on the Internet (#PDFtribute) to memorialize Aaron by posting research and other material for the world to access, and we’re glad to be able to do this.

You can find the book here:

Daniel Lathrop, my co-editor on Open Government, says “I think this is an important way to remember Aaron and everything he has done for the world.” We at O’Reilly echo Daniel’s sentiment.

Be sure to read Chapter 25, “When Is Transparency Useful?”, by the late Aaron Swartz.

It includes this passage:

…When you create a regulatory agency, you put together a group of people whose job is to solve some problem. They’re given the power to investigate who’s breaking the law and the authority to punish them. Transparency, on the other hand, simply shifts the work from the government to the average citizen, who has neither the time nor the ability to investigate these questions in any detail, let alone do anything about it. It’s a farce: a way for Congress to look like it has done something on some pressing issue without actually endangering its corporate sponsors.

As a tribute to Aaron, are you going to dump data on the WWW or enable the calling of “more equal than others” pigs to account?

… ‘disappointed’ with open data use

Tuesday, November 20th, 2012

Prime minister’s special envoy ‘disappointed’ with open data use by Derek du Preez.

From the post:

Prime Minister David Cameron’s special envoy on the UN’s post-2015 development goals has said that he is ‘disappointed’ by how much the government’s open datasets have been used so far.

Speaking at a Reform event in London this week on open government and data transparency, Anderson said he recognises that the public sector needs to improve the way it pushes out the data so that it is easier to use.

“I am going to be really honest with you. As an official in a government department that has worked really hard to get a lot of data out in the last two years, I have been pretty disappointed by how much it has been used,” he said.

Easier-to-use data is one issue.

But the expectation that the effort of making data open equals people being interested in using it is another.

The article later reports there are 9,000 datasets available at

How relevant to everyday concerns are those 9,000 datasets?

When the government starts disclosing the financial relationships between members of government, their families and contributors, I suspect interest in open data will go up.

Code for America: open data and hacking the government

Tuesday, October 9th, 2012

Code for America: open data and hacking the government by Rachel Perkins.

From the post:

Last week, I attended the Code for America Summit here in San Francisco. I attended as a representative of Splunk>4Good (we sponsored the event via a nice outdoor patio lounge area and gave away some of our (in)famous t-shirts and a few ponies). Since this wasn’t your typical “conference”, and I’m not so great at schmoozing, I was a little nervous–what would Christy Wilson, Clint Sharp, and I do there? As it turned out, there were so many amazing takeaways and so much potential for awesomeness that my nervousness was totally unfounded.

So what is Code for America?

Code for America is a program that sends technologists (who take a year off and apply to their Fellowship program) to cities throughout the US to work with advocates in city government. When they arrive, they spend a few weeks touring the city and its outskirts, meeting residents, getting to know the area and its issues, and brainstorming about how the city can harness its public data to improve things. Then they begin to hack.
Some of these partnerships have come up with amazing tools–for example,

  • Opencounter Santa Cruz mashes up several public datasets to provide tactical and strategic information for persons looking to start a small business: what forms and permits you’ll need, zoning maps with overlays of information about other businesses in the area, and then partners with to help you find commercial space for rent that matches your zoning requirements.
  • Another Code for America Fellow created, which uses public data in New Orleans to inform residents about the status and plans for blighted properties in their area.
  • Other apps from other cities do cool things like help city maintenance workers prioritize repairs of broken streetlights based on other public data like crime reports in the area, time of day the light was broken, and number of other broken lights in the vicinity, or get the citizenry involved with civic data, government, and each other by setting up a Stack Exchange type of site to ask and answer common questions.

Whatever your view of data sharing by the government (too little, too much, or just right), Rachel points to good things that can come from open data.

Splunk has a “corporate responsibility” program: Splunk>4Good.

Check it out!

BTW, do you have a topic maps “corporate responsibility” program?

Yu and Robinson on The Ambiguity of “Open Government”

Saturday, August 11th, 2012

Yu and Robinson on The Ambiguity of “Open Government”

Legal Informatics calls our attention to the use of ambiguity to blunt, at least in one view, the potency of the phrase “open government.”

Whatever your politics, it is a reminder that for good or ill, semantics originate with us.

Topic maps are one tool to map those semantics, to remove (or enhance) ambiguity.