Archive for the ‘Government Data’ Category

SODA Developers

Wednesday, January 14th, 2015

SODA Developers

From the webpage:

The Socrata Open Data API allows you to programatically access a wealth of open data resources from governments, non-profits, and NGOs around the world.

I have mentioned Socrata and their Open Data efforts more than once on this blog but I don’t think I have ever pointed to their developer site.

Very much worth spending time here if you are interested in governmental data.

Not that I take any data, government or otherwise, at face value. Data is created and released/leaked for reasons that may or may not coincide with your assumptions or goals. Access to data is just the first step in uncovering whose interests the data represents.
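If you want to poke at a SODA endpoint yourself, here is a minimal sketch of how a query URL is put together. It assumes the usual `/resource/<dataset id>.json` endpoint pattern and SoQL parameters prefixed with `$`. The domain and dataset id below are made-up placeholders, not real datasets, and `soda_url` is my own helper name, not part of any Socrata library.

```python
from urllib.parse import urlencode


def soda_url(domain, dataset_id, **soql):
    """Build a SODA query URL; SoQL parameters are passed without the '$'."""
    base = f"https://{domain}/resource/{dataset_id}.json"
    # SoQL parameters such as $limit and $order carry a leading dollar sign.
    params = {f"${name}": value for name, value in soql.items()}
    # safe="$" keeps the dollar sign readable instead of encoding it as %24.
    return f"{base}?{urlencode(params, safe='$')}" if params else base


# Hypothetical domain and dataset id, for illustration only.
url = soda_url("data.example.gov", "abcd-1234", limit=5, order="date DESC")
print(url)
```

Fetching the URL and checking what comes back against the publisher's own description of the dataset is a small first step toward not taking the data at face value.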

Project Open Data Dashboard

Sunday, January 4th, 2015

Project Open Data Dashboard

From the about page:

This website shows how Federal agencies are performing on the latest Open Data Policy (M-13-13) using the guidance provided by Project Open Data. It also provides many other other tools and resources to help agencies and other interested parties implement their open data programs. Features include:

  • A dashboard to track the progress of agencies implementing Project Open Data on a quarterly basis
  • Automated analysis of URLs provided within metadata to see if the links work as expected
  • A validator for v1.0 and v1.1 of the Project Open Data Metadata Schema
  • A converter to transform CSV files into JSON as defined by the Project Open Data Metadata Schema [Link broken as of 4 January 2015. Site notified.]
  • An export API to export from the CKAN API and transform the metadata into JSON as defined by the Project Open Data Metadata Schema
  • A changeset viewer to compare a data.json file to the metadata currently available in CKAN (eg

You can learn more by reading the main documentation page.

The main documentation defines the “Number of Datasets” on the dashboard as:

This element accounts for the total number of all datasets listed in the Enterprise Data Inventory. This includes those marked as “Public”, “Non-Public” and “Restricted”.

If you compare the “Milestone – May 31st 2014” to November, the number of data sets increases in most cases, as you would expect. However, both the Department of Commerce and the Department of Health and Human Services had decreases in the number of available data sets.

On May 31st, the Department of Commerce listed 20488 data sets but on November 30th, only 372. A decrease of more than 20,000 data sets.

On May 31st, the Department of Health and Human Services listed 1507 data sets but on November 30th, only 1064, a decrease of 443 data sets.

Looking further, the sudden decrease for both agencies occurred between Milestone 3 and Milestone 4 (August 31st 2014).

Sounds exciting! Yes?

Yes, but this illustrates why you should “drill down” in data whenever possible. And if that is not possible in the interface, check other sources.

I followed the Department of Commerce link (the first column on the left) to the details of the crawl and thence the data link to determine the number of publicly available data sets.

As of today, 04 January 2015, the Department of Commerce has 23,181 datasets and not the 372 reported for Milestone 5 or the 268 reported for Milestone 4.

As of today, 04 January 2015, the Department of Health and Human Services has 1,672 datasets and not the 1064 reported for Milestone 5 or the 1088 reported for Milestone 4.

The reason(s) for the differences are unclear and the dashboard itself offers no explanation for the disparate figures. I suspect there is some glitch in the automatic harvesting of the information and/or in the representation of those results in the dashboard.
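The check I did by hand can be sketched in a few lines. This assumes a data.json file in the Project Open Data layout, where (as I understand the schema) v1.1 wraps the dataset list in a top-level "dataset" key and v1.0 is a bare JSON array. The function names are mine and the tiny catalog is a stand-in, not agency data.

```python
import json


def count_datasets(catalog):
    """Count dataset entries in a parsed data.json catalog.

    Assumption: v1.1 is an object with a "dataset" list,
    v1.0 is a bare list of dataset records.
    """
    if isinstance(catalog, dict):
        return len(catalog.get("dataset", []))
    return len(catalog)


def discrepancy(reported, actual):
    """How many more datasets the source lists than the dashboard reports."""
    return actual - reported


# Stand-in catalog; a real check would json.load() the agency's data.json.
sample = json.loads('{"dataset": [{"title": "a"}, {"title": "b"}, {"title": "c"}]}')
print(count_datasets(sample))

# Figures quoted above for the Department of Commerce (Milestone 5 vs. today).
print(discrepancy(reported=372, actual=23181))
```

For the Department of Commerce the dashboard is short by 22,809 datasets. For Health and Human Services the same subtraction (1,672 less 1,064) gives 608.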

Always remember that just because a representation* claims some “fact,” that doesn’t necessarily make it so.

*Representation: Bear in mind that anything you see on a computer screen is a “representation.” There isn’t anything in data storage that has any resemblance to what you see on the screen. Choices have been made out of your sight as to how information will be represented to you.

As I mentioned yesterday, there is a common and naive assumption that data as represented to us has a reliable correspondence with data held in storage. And that the data held in storage has a reliable correspondence to data as entered or obtained from other sources.

Those assumptions aren’t unreasonable, at least until they are. Can you think of ways to illustrate those principles? I ask because at least one way to illustrate those principles makes an excellent case for open source software. More on that anon.

U.S. Appropriations by Fiscal Year

Wednesday, December 31st, 2014

U.S. Appropriations by Fiscal Year

Congressdotgov tweeted about this resource earlier today.

It’s a great starting place for research on U.S. appropriations but it is more of a bulk resource than a granular one.

You will have to wade through this resource and many others to piece together some of the details on any particular line item in the budget. Not surprisingly, anyone interested in the same line item will have to repeat that mechanical process. For every line in the budget.

There are collected resources on different aspects of the budget process, hearing documents, campaign donation records, etc. but they are for the most part all separated and not easily collated. Perhaps that is due to lack of foresight. Perhaps.

In any event, it is a starting place if you have a particular line item in mind. Think about creating a result that can be re-used and shared if at all possible.

Collection of CRS reports released to the public

Friday, December 19th, 2014

Collection of CRS reports released to the public by Kevin Kosar.

From the post:

Something rare has occurred—a collection of reports authored by the Congressional Research Service has been published and made freely available to the public. The 400-page volume, titled, “The Evolving Congress,” and was produced in conjunction with CRS’s celebration of its 100th anniversary this year. Congress, not CRS, published it. (Disclaimer: Before departing CRS in October, I helped edit a portion of the volume.)

The Congressional Research Service does not release its reports publicly. CRS posts its reports at, a website accessible only to Congress and its staff. The agency has a variety of reasons for this policy, not least that its statute does not assign it this duty. Congress, with ease, could change this policy. Indeed, it already makes publicly available the bill digests (or “summaries”) CRS produces at

“The Evolving Congress” is a remarkable collection of essays that cover a broad range of topic. Readers would be advised to start from the beginning. Walter Oleszek provides a lengthy essay on how Congress has changed over the past century. Michael Koempel then assesses how the job of Congressman has evolved (or devolved depending on one’s perspective). “Over time, both Chambers developed strategies to reduce the quantity of time given over to legislative work in order to accommodate Members’ other duties,” Koempel observes.

The NIH (National Institutes of Health) requires that NIH funded research be made available to the public. Other government agencies are following suit. Isn’t it time for the Congressional Research Service to make its publicly funded research available to the public that paid for it?

Congress needs to require it. Contact your member of Congress today. Ask that all Congressional Research Service reports, past, present and future, be made available to the public.

You have already paid for the reports. Why shouldn’t you be able to read them?

Senate Joins House In Publishing Legislative Information In Modern Formats [No More Sneaking?]

Friday, December 19th, 2014

Senate Joins House In Publishing Legislative Information In Modern Formats by Daniel Schuman.

From the post:

There’s big news from today’s Legislative Branch Bulk Data Task Force meeting. The United States Senate announced it would begin publishing text and summary information for Senate legislation, going back to the 113th Congress, in bulk XML. It would join the House of Representatives, which already does this. Both chambers also expect to have bill status information available online in XML format as well, but a little later on in the year.

This move goes a long way to meet the request made by a coalition of transparency organizations, which asked for legislative information be made available online, in bulk, in machine-processable formats. These changes, once implemented, will hopefully put an end to screen scraping and empower users to build impressive tools with authoritative legislative data. A meeting to spec out publication methods will be hosted by the Task Force in late January/early February.

The Senate should be commended for making the leap into the 21st century with respect to providing the American people with crucial legislative information. We will watch closely to see how this is implemented and hope to work with the Senate as it moves forward.

In addition, the Clerk of the House announced significant new information will soon be published online in machine-processable formats. This includes data on nominees, election statistics, and members (such as committee assignments, bioguide IDs, start date, preferred name, etc.) Separately, House Live has been upgraded so that all video is now in H.264 format. The Clerk’s website is also undergoing a redesign.

The Office of Law Revision Counsel, which publishes the US Code, has further upgraded its website to allow pinpoint citations for the US Code. Users can drill down to the subclause level simply by typing the information into their search engine. This is incredibly handy.

This is great news!

Law is a notoriously opaque domain and the process of creating it even more so. Getting the data is a great first step; parsing out steps in the process and their meaning is another. To say nothing of the content of the laws themselves.

Still, progress is progress and always welcome!

Perhaps citizen review will stop the Senate from sneaking changes past sleepy members of the House.

GovTrack’s Summer/Fall Updates

Thursday, December 18th, 2014

GovTrack’s Summer/Fall Updates by Josh Tauberer.

From the post:

Here’s what’s been improved on GovTrack in the summer and fall of this year.


  • Permalinks to individual paragraphs in bill text is now provided (example).
  • We now ask for your congressional district so that we can customize vote and bill pages to show how your Members of Congress voted.
  • Our bill action/status flow charts on bill pages now include activity on certain related bills, which are often crucially important to the main bill.
  • The bill cosponsors list now indicates when a cosponsor of a bill is no longer serving (i.e. because of retirement or death).
  • We switched to gender neutral language when referring to Members of Congress. Instead of “congressman/woman”, we now use “representative.”
  • Our historical votes database (1979-1989) from was refreshed to correct long-standing data errors.
  • We dropped support for Internet Explorer 6 in order to address with POODLE SSL security vulnerability that plagued most of the web.
  • We dropped support for Internet Explorer 7 in order to allow us to make use of more modern technologies, which has always been the point of GovTrack.

The comment I posted was:

Great work! But I read the other day about legislation being “snuck” by the House (Senate changes), US Congress OKs ‘unprecedented’ codification of warrantless surveillance.

Do you have plans for a diff utility that warns members of either house of changes to pending legislation?

In case you aren’t familiar with

From the about page:, a project of Civic Impulse, LLC now in its 10th year, is one of the worldʼs most visited government transparency websites. The site helps ordinary citizens find and track bills in the U.S. Congress and understand their representatives’ legislative record.

In 2013, was used by 8 million individuals. We sent out 3 million legislative update email alerts. Our embeddable widgets were deployed on more than 80 official websites of Members of Congress.

We bring together the status of U.S. federal legislation, voting records, congressional district maps, and more (see the table at the right) and make it easier to understand. Use GovTrack to track bills for updates or get alerts about votes with email updates and RSS feeds. We also have unique statistical analyses to put the information in context. Read the «Analysis Methodology».

GovTrack openly shares the data it brings together so that other websites can build other tools to help citizens engage with government. See the «Developer Documentation» for more.

Melville House to Publish CIA Torture Report:… [Publishing Gone Awry?]

Tuesday, December 16th, 2014

Melville House to Publish CIA Torture Report: An Interview with Publisher Dennis Johnson by Jonathon Sturgeon.

From the post:

In what must be considered a watershed moment in contemporary publishing, Brooklyn-based independent publisher Melville House will release the Senate Intelligence Committee’s executive summary of a government report — “Study of the Central Intelligence Agency’s Detention and Interrogation Program” — that is said to detail the monstrous torture methods employed by the Central Intelligence Agency in its counter-terrorism efforts.

Melville House’s co-publisher and co-founder Dennis Johnson has called the report “probably the most important government document of our generation, even one of the most significant in the history of our democracy.”

Melville House’s press release confirms that they are releasing both print and digital editions on December 30, 2014.

As of December 30, 2014, I can read and mark my copy, print or digital and you can mark your copy, print or digital, but no collaboration on the torture report.

For the “…most significant [document] in the history of our democracy” that seems rather sad. Each of us is going to be limited to whatever we know or can find out while reading our own copy of the same report.

If there was ever a report (and there have been others) that merited a collaborative reading/annotation, the CIA Torture Report would be one of them.

Given the large number of people who worked on this report and the diverse knowledge required to evaluate it, that sounds like a bad publishing choice. Or at least, better publishing choices are available.

What about casting the entire report into the form of wiki pages, broken down by paragraphs? Once proofed, the original text can be locked and comments only allowed on the text. Free to view but $fee to comment.
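To make the idea concrete, here is a rough sketch of the data model I have in mind: the proofed text is split into paragraphs with stable anchors and then frozen, while comments live in a separate structure keyed by anchor. All of the names here (`Paragraph`, `Annotations`, the `p0001` anchor format) are invented for illustration and do not describe any existing wiki software.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Paragraph:
    anchor: str  # stable id such as "p0002", used as the comment target
    text: str    # frozen: the proofed text cannot be reassigned


def split_paragraphs(report_text):
    """Split text on blank lines and number the paragraphs from 1."""
    blocks = [b.strip() for b in report_text.split("\n\n") if b.strip()]
    return [Paragraph(f"p{i:04d}", b) for i, b in enumerate(blocks, start=1)]


@dataclass
class Annotations:
    comments: dict = field(default_factory=dict)

    def add(self, anchor, author, note):
        """Attach a comment to a paragraph anchor without touching the text."""
        self.comments.setdefault(anchor, []).append((author, note))


paras = split_paragraphs("First paragraph.\n\nSecond paragraph.\n")
notes = Annotations()
notes.add(paras[1].anchor, "reader1", "See related testimony.")
print(paras[1].anchor, notes.comments[paras[1].anchor])
```

Keeping comments apart from the text means the locked original never changes, and every reader's notes point at the same paragraph anchors.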

What do you think? Viable way to present such a text? Other ways to host the text?

PS: Unlike other significant government reports, major publishing houses did not receive incentives to print the report. Jerry attributes that to Dianne Feinstein not wanting to favor any particular publisher. That’s one explanation. Another would be that if published in hard copy at all, a small press will mean it fades more quickly from public view. Your call.

US Congress OKs ‘unprecedented’ codification of warrantless surveillance

Tuesday, December 16th, 2014

US Congress OKs ‘unprecedented’ codification of warrantless surveillance by Lisa Vaas.

From the post:

Congress last week quietly passed a bill to reauthorize funding for intelligence agencies, over objections that it gives the government “virtually unlimited access to the communications of every American”, without warrant, and allows for indefinite storage of some intercepted material, including anything that’s “enciphered”.

That’s how it was summed up by Rep. Justin Amash, a Republican from Michigan, who pitched and lost a last-minute battle to kill the bill.

The bill is titled the Intelligence Authorization Act for Fiscal Year 2015.

Amash said that the bill was “rushed to the floor” of the house for a vote, following the Senate having passed a version with a new section – Section 309 – that the House had never considered.

Lisa reports that the bill codifies Executive Order 12333, a Ronald Reagan remnant from an earlier attempt to dismantle the United States Constitution.

There is a petition underway to ask President Obama to veto the bill. Are you a large bank? Skip the petition and give the President a call.

From Lisa’s report, it sounds like Congress needs a DEW Line for legislation:

Rep. Zoe Lofgren, a California Democrat who voted against the bill, told the National Journal that the Senate’s unanimous passage of the bill was sneaky and ensured that the House would rubberstamp it without looking too closely:

If this hadn’t been snuck in, I doubt it would have passed. A lot of members were not even aware that this new provision had been inserted last-minute. Had we been given an additional day, we may have stopped it.

How do you “sneak in” legislation in a public body?

Suggestions on an early warning system for changes to legislation between the two houses of Congress?
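As a starting point, even Python's standard `difflib` can flag lines that appear in one chamber's version of a bill but not the other's. The sketch below assumes you already have both versions as plain text. The section lines are placeholders I made up, not actual bill text, and `added_lines` is a hypothetical helper name.

```python
import difflib


def added_lines(house_text, senate_text):
    """Return lines present in the Senate version but not the House version."""
    diff = difflib.ndiff(house_text.splitlines(), senate_text.splitlines())
    # ndiff prefixes lines unique to the second sequence with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]


# Placeholder text standing in for two versions of the same bill.
house = "Sec. 307. (placeholder)\nSec. 308. (placeholder)\n"
senate = house + "Sec. 309. (placeholder: new section)\n"

for line in added_lines(house, senate):
    print("NEW IN SENATE VERSION:", line)
```

A real early warning system would need the bulk XML discussed above, plus some way to notify members, but the core comparison is no harder than this.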

Global Open Data Index

Friday, December 12th, 2014

Global Open Data Index

From the about page:

For more information on the Open Data Index, you may contact the team at:

Each year, governments are making more data available in an open format. The Global Open Data Index tracks whether this data is actually released in a way that is accessible to citizens, media and civil society and is unique in crowd-sourcing its survey of open data releases around the world. Each year the open data community and Open Knowledge produces an annual ranking of countries, peer reviewed by our network of local open data experts.

Crowd-sourcing this data provides a tool for communities around the world to learn more about the open data available locally and by country, and ensures that the results reflect the experience of civil society in finding open information, rather than government claims. It also ensures that those who actually collect the information that builds the Index are the very people who use the data and are in a strong position to advocate for more and higher quality open data.

The Global Open Data Index measures and benchmarks the openness of data around the world, and then presents this information in a way that is easy to understand and use. This increases its usefulness as an advocacy tool and broadens its impact.

In 2014 we are expanding to more countries (from 70 in 2013) with an emphasis on countries of the Global South.

See the blog post launching the 2014 Index. For more information, please see the FAQ and the methodology section. Join the conversation with our Open Data Census discussion list.

It is better to have some data rather than none but look at the data by which countries are ranked for openness:

Transport Timetables, Government Budget, Government Spending, Election Results, Company Register, National Map, National Statistics, Postcodes/Zipcodes, Pollutant Emissions.

A listing of data that puts the United Kingdom in first place with a 97% score.

It is hard to imagine a less threatening set of data than those listed. I am sure someone will find a use for them but in the great scheme of things, they are a distraction from the data that isn’t being released.

Off-hand, in the United States at least, public data should include who meets with appointed or elected members of government along with transcripts of those meetings (including phone calls). It should also include all personal or corporate donations made to any organization for any reason of greater than $100.00. It should include documents prepared and/or submitted to the U.S. government and its agencies. And those are just the ones that come to mind rather quickly.

Current disclosures by the U.S. government are a fiction of openness that conceals a much larger dark data set, waiting to be revealed at some future date.

I first saw this in a tweet by ChemConnector.

Timeline of sentences from the CIA Torture Report

Wednesday, December 10th, 2014

Chris R. Albon has created a timeline of sentences from the CIA torture report!


1997,"The FBI information included that al-Mairi’s brother ""traveled to Afghanistan in 1997-1998 to train in Bin – Ladencamps."""
1997,"The FBI information included that al-Mairi’s brother ""traveled to Afghanistan in 1997-1998 to train in Bin – Ladencamps."""
1997,"For example, on October 12, 2004, another CIA detainee explained how he met al-Kuwaiti at a guesthouse that was operated by Ibn Shaykh al-Libi and Abu Zubaydah in 1997."

Cleanly imports into Apache OpenOffice Calc and is 6163 rows (after subtracting the header).
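If you would rather not use a spreadsheet, the same file reads easily with Python's `csv` module. The sketch below uses two rows from the excerpt above and assumes a two-column header of `year,statement`; the header names are my guess, since the header row is not shown here.

```python
import csv
import io
from collections import Counter

# Two rows from the excerpt, under an assumed "year,statement" header.
sample = '''year,statement
1997,"The FBI information included that al-Mairi's brother ""traveled to Afghanistan in 1997-1998 to train in Bin – Ladencamps."""
1997,"For example, on October 12, 2004, another CIA detainee explained how he met al-Kuwaiti at a guesthouse that was operated by Ibn Shaykh al-Libi and Abu Zubaydah in 1997."
'''

# Doubled quotes inside a quoted field are read back as a single quote mark.
rows = list(csv.DictReader(io.StringIO(sample)))
per_year = Counter(row["year"] for row in rows)
print(len(rows), dict(per_year))
```

Swap `io.StringIO(sample)` for an open file handle to count sentences per year across the whole timeline.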

Please acknowledge Chris if you use this data.

What other data would you pull from the executive summary?

What other data do you think would convince Senator Udall to release the entire 6,000 page report?

A Tranche of Climate Data

Wednesday, December 10th, 2014

FACT SHEET: Harnessing Climate Data to Boost Ecosystem & Water Resilience

From the document:

Today, the Administration is making a new tranche of data about ecosystems and water resilience available as part of the Climate Data Initiative—including key datasets related water quality, streamflow, land cover, soils, and biodiversity.

In addition to the datasets being added today to, the Department of Interior (DOI) is launching a suite of geospatial mapping tools on that will enable users to visualize and overlay datasets related to ecosystems, land use, water, and wildlife. Together, the data and tools unleashed today will help natural-resource managers, decision makers, and communities on the front lines of climate change build resilience to climate impacts and better plan for the future. (emphasis added)

I had to look up “tranche.” Google offers: “a portion of something, especially money.”

Assume that your contacts and interactions with both sites are monitored and recorded.

Treasury Island: the film

Tuesday, November 25th, 2014

Treasury Island: the film by Lauren Willmott, Boyce Keay, and Beth Morrison.

From the post:

We are always looking to make the records we hold as accessible as possible, particularly those which you cannot search for by keyword in our catalogue, Discovery. And we are experimenting with new ways to do it.

The Treasury series, T1, is a great example of a series which holds a rich source of information but is complicated to search. T1 covers a wealth of subjects (from epidemics to horses) but people may overlook it as most of it is only described in Discovery as a range of numbers, meaning it can be difficult to search if you don’t know how to look. There are different processes for different periods dating back to 1557 so we chose to focus on records after 1852. Accessing these records requires various finding aids and multiple stages to access the papers. It’s a tricky process to explain in words so we thought we’d try demonstrating it.

We wanted to show people how to access these hidden treasures, by providing a visual aid that would work in conjunction with our written research guide. Armed with a tablet and a script, we got to work creating a video.

Our remit was:

  • to produce a video guide no more than four minutes long
  • to improve accessibility to these records through a simple, step-by–step process
  • to highlight what the finding aids and documents actually look like

These records can be useful to a whole range of researchers, from local historians to military historians to social historians, given that virtually every area of government action involved the Treasury at some stage. We hope this new video, which we intend to be watched in conjunction with the written research guide, will also be of use to any researchers who are new to the Treasury records.

Adding video guides to our written research guides are a new venture for us and so we are very keen to hear your feedback. Did you find it useful? Do you like the film format? Do you have any suggestions or improvements? Let us know by leaving a comment below!

This is a great illustration that data management isn’t something new. The Treasury Board has kept records since 1557 and has accumulated a rather extensive set of materials.

The written research guide looks interesting but since I am very unlikely to ever research Treasury Board records, I am unlikely to need it.

However, the authors have anticipated that someone might be interested in the process of record keeping itself and so provided this additional reference:

Thomas L Heath, The Treasury (The Whitehall Series, 1927, GP Putnam’s Sons Ltd, London and New York)

That would be an interesting find!

I first saw this in a tweet by Andrew Janes.

Would You Protect Nazi Torturers And Their Superiors?

Saturday, November 15th, 2014

If you answered “Yes,” this post won’t interest you.

If you answered “No,” read on:

Senator Mark Udall faces the question: “Would You Protect Nazi Torturers And Their Superiors?” as reported by Mike Masnick in:

Mark Udall’s Open To Releasing CIA Torture Report Himself If Agreement Isn’t Reached Over Redactions.

Mike writes in part:

As we were worried might happen, Senator Mark Udall lost his re-election campaign in Colorado, meaning that one of the few Senators who vocally pushed back against the surveillance state is about to leave the Senate. However, Trevor Timm pointed out that, now that there was effectively “nothing to lose,” Udall could go out with a bang and release the Senate Intelligence Committee’s CIA torture report. The release of some of that report (a redacted version of the 400+ page “executive summary” — the full report is well over 6,000 pages) has been in limbo for months since the Senate Intelligence Committee agreed to declassify it months ago. The CIA and the White House have been dragging out the process hoping to redact some of the most relevant info — perhaps hoping that a new, Republican-controlled Senate would just bury the report.

Mike details why Senator Udall’s recent reelection defeat makes release of the report, either in full or in summary, a distinct possibility.

In addition to Mike’s report, here is some additional information you may find useful

Contact Information for Senator Udall

Senator Mark Udall
Hart Office Building Suite SH-730
Washington, D.C. 20510

P: 202-224-5941
F: 202-224-6471

An informed electorate is essential to the existence of self-governance.

No less a figure than Thomas Jefferson spoke about the star chamber proceedings we now take for granted saying:

An enlightened citizenry is indispensable for the proper functioning of a republic. Self-government is not possible unless the citizens are educated sufficiently to enable them to exercise oversight. It is therefore imperative that the nation see to it that a suitable education be provided for all its citizens. It should be noted, that when Jefferson speaks of “science,” he is often referring to knowledge or learning in general. “I know no safe depositary of the ultimate powers of the society but the people themselves; and if we think them not enlightened enough to exercise their control with a wholesome discretion, the remedy is not to take it from them, but to inform their discretion by education. This is the true corrective of abuses of constitutional power.” –Thomas Jefferson to William C. Jarvis, 1820. ME 15:278

“Every government degenerates when trusted to the rulers of the people alone. The people themselves, therefore, are its only safe depositories. And to render even them safe, their minds must be improved to a certain degree.” –Thomas Jefferson: Notes on Virginia Q.XIV, 1782. ME 2:207

“The most effectual means of preventing [the perversion of power into tyranny are] to illuminate, as far as practicable, the minds of the people at large, and more especially to give them knowledge of those facts which history exhibits, that possessed thereby of the experience of other ages and countries, they may be enabled to know ambition under all its shapes, and prompt to exert their natural powers to defeat its purposes.” –Thomas Jefferson: Diffusion of Knowledge Bill, 1779. FE 2:221, Papers 2:526

Jefferson didn’t have to contend with Middle East terrorists, only the English terrorizing the countryside. Since more Americans died in British prison camps than in the Revolution proper, I would say they were as bad as terrorists. (See: Prisoners of war in the American Revolutionary War.)

Noise about the CIA torture program post 9/11 is plentiful. But the electorate, that would be voters in the United States, lacks facts about the CIA torture program, its oversight (or lack thereof) and those responsible for torture, from top to bottom. There isn’t enough information to “connect the dots,” a common phrase in the intelligence community.

Connecting those dots is what could bring the accountability and transparency necessary to prevent torture from returning as an instrument of US policy.

Thirty retired generals are urging President Obama to declassify the Senate Intelligence Committee’s report on CIA torture, arguing that without accountability and transparency the practice could be resumed. (Even Generals in US Military Oppose CIA Torture)

Hiding the guilty will produce an expectation among potential future torturers that they too will get a free pass on torture.

Voters are responsible for turning out those who authorized the use of torture and for holding their subordinates accountable for their crimes. To do so, voters must have the information contained in the full CIA torture report.

Release of the Full CIA Torture Report: No Doom and Gloom

Senator Udall should ignore speculation that release of the full CIA torture report will “doom the nation.”


There have been similar claims in the past and none of them, not one, has ever proven to be true. Here are some of the ones that I remember personally:

Documents Released | Date | Nation Doomed?
Pentagon Papers | 1971 | No
Nixon White House Tapes | 1974 | No
The Office of Special Investigations: Striving for Accountability in the Aftermath of the Holocaust | 2010 | No
United States diplomatic cables leak | 2010 | No
Snowden (Global Surveillance Disclosures (2013–present)) | 2013 | No

Others that I should add to this list?

Is Saying “Nazi” Inflammatory?

Is using the term “Nazi” inflammatory in this context? The only difference between CIA and Nazi torture is the government that ordered or tolerated the torture. Unless you know of some other classification of torture. The United States military apparently doesn’t and I am willing to take their word for it.

Some will say the torturers were “serving the American people.” The same could be and was said by many a death camp guard for the Nazis. Wrapping yourself in a flag, any flag, does not put criminal activity beyond the reach of the law. It didn’t at Nuremberg and it should not work here.


A functioning democracy requires an informed electorate. Not elected officials, not a star chamber group but an informed electorate. To date the American people lack details about illegal torture carried out by a government agency, the CIA. To exercise their rights and responsibilities as an informed electorate, American voters must have full access to the full CIA torture report.

Release of anything less than the full CIA torture report protects torture participants and their superiors. I have no interest in protecting those who engage in illegal activities nor their superiors. As an American citizen, do you?

Experience with prior “sensitive” reports indicates that despite the wailing and gnashing of teeth, the United States will not fall when the guilty are exposed, prosecuted and led off to jail. This case is no different.

As many retired US generals point out, transparency and accountability are the only ways to keep illegal torture from returning as an instrument of United States policy.

Is there any reason to wait until American torturers are in their nineties, suffering from dementia and living in New Jersey to hold them accountable for their crimes?

I don’t think so either.

PS: When Senator Udall releases the full CIA torture report in the Congressional Record (not to the New York Times or Wikileaks, both of which censor information for reasons best known to themselves), I hereby volunteer to assist in the extraction of names, dates, places and the association of those items with other, public data, both in topic map form as well as in other formats.

How about you?

PPS: On the relationship between Nazis and the CIA, see: Nazis Were Given ‘Safe Haven’ in U.S., Report Says. The special report that informed that article: The Office of Special Investigations: Striving for Accountability in the Aftermath of the Holocaust. (A leaked document)

When you compare Aryanism to American Exceptionalism the similarities between the CIA and the Nazi regime are quite pronounced. How could any act that protects the fatherland/homeland be a crime?

data.parliament @ Accountability Hack 2014

Friday, November 7th, 2014

data.parliament @ Accountability Hack 2014 by Zeid Hadi.

From the post:

We are pleased to announce that data.parliament will be providing data to be used during the Accountability Hack 2014

data.parliament is a platform that enables the sharing of UK Parliament’s data with consumers both within and outside of Parliament. Designed to complement existing data services it aims to be the central publishing platform and data repository for data that is produced by Parliament. Note our release is in Alpha.

It provides both a repository (link omitted) for data and a Linked Data API (link omitted). The platform’s ‘shop front’ or data catalogue can be found here (link omitted).

The following datasets and APIs are now available on data.parliament

  • Commons Written Parliamentary Questions and Answers
  • Lords Written Parliamentary Questions and Answers
  • Commons Oral Questions and Question Times
  • Early Day Motions
  • Lords Divisions
  • Commons Divisions
  • Commons Members
  • Lords Members
  • Constituencies
  • Briefing Papers
  • Papers Laid

A description of the APIs and their usage can be found at (link omitted). All the data exposed by the endpoints can be returned in a variety of formats not least JSON.

To get you started the team has coded two publicly available demonstrators that make use of the data in data.parliament. The source code for these can be found at (link omitted). One of the demonstrators, a client app, can be found working at (link omitted). Also be sure to read our blog (link omitted) for quick start guides, updates, and news about upcoming datasets.

The data.parliament team will be on hand at the Hack, both participating and networking through the event to gather feedback and ideas.

I don’t know enough about British parliamentary procedure to comment on the completeness of the interface.

I am quite interested in the Briefing Papers data feed:

This dataset contains the data for research briefings produced by the Libraries of the House of Commons and House of Lords and the Parliamentary Office of Science and Technology. Each briefing has a pdf document for the briefing itself as well as a set of metadata to accompany it.

A great project, but even a complete set of documents and transcripts of every word spoken at Parliament does not document relationships between members of Parliament, their relationships to economic interests, etc.

Looking forward to collation of information from this project with other data to form a clearer picture of the legislative process in the UK.

I first saw this in a tweet by data.parliament UK.

Core Econ: a free economics textbook

Wednesday, November 5th, 2014

Core Econ: a free economics textbook by Cathy O’Neil.

From the post:

Today I want to tell you guys about Core Econ, a free (although you do have to register) textbook my buddy Suresh Naidu is using this semester to teach out of and is also contributing to, along with a bunch of other economists.

(image omitted)

It’s super cool, and I wish a class like that had been available when I was an undergrad. In fact I took an economics course at UC Berkeley and it was a bad experience – I couldn’t figure out why anyone would think that people behaved according to arbitrary mathematical rules. There was no discussion of whether the assumptions were valid, no data to back it up. I decided that anybody who kept going had to be either religious or willing to say anything for money.

Not much has changed, and that means that Econ 101 is a terrible gateway for the subject, letting in people who are mostly kind of weird. This is a shame because, later on in graduate level economics, there really is no reason to use toy models of society without argument and without data; the sky’s the limit when you get through the bullshit at the beginning. The goal of the Core Econ project is to give students a taste for the good stuff early; the subtitle on the webpage is teaching economics as if the last three decades happened.

Skepticism of government economic forecasts and data requires knowledge of the lingo and assumptions of economics. This introduction won’t get you to that level but it is a good starting place.

Enjoy!

Congress.gov Officially Out of Beta

Tuesday, October 28th, 2014

Congress.gov Officially Out of Beta

From the post:

The free legislative information website, Congress.gov, is officially out of beta form, and beginning today includes several new features and enhancements. URLs that include beta.Congress.gov will be redirected to Congress.gov. The site now includes the following:

New Feature: Resources

  • A new resources section providing an A to Z list of hundreds of links related to Congress
  • An expanded list of “most viewed” bills each day, archived to July 20, 2014

New Feature: House Committee Hearing Videos

  • Live streams of House Committee hearings and meetings, and an accompanying archive to January, 2012

Improvement: Advanced Search

  • Support for 30 new fields, including nominations, Congressional Record and name of member

Improvement: Browse

  • Days in session calendar view
  • Roll Call votes
  • Bill by sponsor/co-sponsor

When the Library of Congress, in collaboration with the U.S. Senate, U.S. House of Representatives and the Government Printing Office (GPO) released Congress.gov as a beta site in the fall of 2012, it included bill status and summary, member profiles and bill text from the two most recent congresses at that time – the 111th and 112th.

Since that time, Congress.gov has expanded with the additions of the Congressional Record, committee reports, direct links from bills to cost estimates from the Congressional Budget Office, legislative process videos, committee profile pages, nominations, historic access reaching back to the 103rd Congress and user accounts enabling saved personal searches. Users have been invited to provide feedback on the site’s functionality, which has been incorporated along with the data updates.

Plans are in place for ongoing enhancements in the coming year, including addition of treaties, House and Senate Executive Communications and the Congressional Record Index.

Field Value Lists:

Use search fields in the main search box (available on most pages), or via the advanced and command line search pages. Use terms or codes from the Field Value Lists with corresponding search fields: Congress [congressId], Action – Words and Phrases [billAction], Subject – Policy Area [billSubject], or Subject (All) [allBillSubjects].

Congresses (44, stops with 70th Congress (1927-1929))

Legislative Subject Terms, Subject Terms (541), Geographic Entities (279), Organizational Names (173). (total 993)

Major Action Codes (98)

Policy Area (33)

Search options:

Search Form: “Choose collections and fields from dropdown menus. Add more rows as needed. Use Major Action Codes and Legislative Subject Terms for more precise results.”

Command Line: “Combine fields with operators. Refine searches with field values: Congresses, Major Action Codes, Policy Areas, and Legislative Subject Terms. To use facets in search results, copy your command line query and paste it into the home page search box.”

Search Tips Overview: “You can search using the quick search available on most pages or via the advanced search page. Advanced search gives you the option of using a guided search form or a command line entry box.” (includes examples)


You can follow this project @congressdotgov.

Orientation to Legal Research & Congress.gov is available both as a seminar (in-person) and webinar (online).


I first saw this at Congress.gov is Out of Beta with New Features by Africa S. Hands.

A $23 million venture fund for the government tech set

Tuesday, September 16th, 2014

A $23 million venture fund for the government tech set by Nancy Scola.

Nancy tells a compelling story of a new VC firm, GovTech, which is looking for startups focused on providing governments with better technology infrastructure.

Three facts from the story stand out:

“The U.S. government buys 10 eBays’ worth of stuff just to operate,” from software to heavy-duty trucking equipment.

…working with government might be a tortuous slog, but Bouganim says that he saw that behind that red tape lay a market that could be worth in the neighborhood of $500 billion a year.

What most people don’t realize is government spends nearly $74 billion on technology annually. As a point of comparison, the video game market is a $15 billion annual market.

See Nancy’s post for the full flavor of the story but it sounds like there is gold buried in government IT.

Another way to look at it is the government is already spending $74 billion a year on technology that is largely an object of mockery and mirth. Effective software may be sufficiently novel and threatening to either attract business or a buy-out.

While you are pondering possible opportunities, existing systems, their structures and data are “subjects” in topic map terminology. Which means topic maps can protect existing contracts and relationships, while delivering improved capabilities and data.

Promote topic maps as “in addition to” existing IT systems and you will encounter less resistance both from within and without the government.

Don’t be squeamish about associating with governments, of whatever side. Their money spends just like everyone else’s. You can ask AT&T and IBM about supporting both sides in a conflict.

I first saw this in a tweet by Mike Bracken.

Army can’t track spending on $4.3b system to track spending, IG finds

Sunday, September 14th, 2014

Army can’t track spending on $4.3b system to track spending, IG finds by Mark Flatten.

From the post:

More than $725 million was spent by the Army on a high-tech network for tracking supplies and expenses that failed to comply with federal financial reporting rules meant to allow auditors to track spending, according to an inspector general’s report issued Wednesday.

The Global Combat Support System-Army, a logistical support system meant to track supplies, spare parts and other equipment, was launched in 1997. In 2003, the program switched from custom software to a web-based commercial software system.

About $95 million was spent before the switch was made, according to the report from the Department of Defense IG.

As of this February, the Army had spent $725.7 million on the system, which is ultimately expected to cost about $4.3 billion.

The problem, according to the IG, is that the Army has failed to comply with a variety of federal laws that require agencies to standardize reporting and prepare auditable financial statements.

The report is full of statements like this one:

PMO personnel provided a system change request, which they indicated would correct four account attributes in July 2014. In addition, PMO personnel provided another system change request they indicated would correct the remaining account attribute (Prior Period Adjustment) in late FY 2015.

PMO = Project Management Office (in this case, of GCSS–Army).

The lack of identification of personnel speaking on behalf of the project or various offices pervades the report. Moreover, the same is true for twenty-seven (27) other reports on issues with this project.

If the sources of statements and information were identified in these reports, then it would be possible to track people across reports and to identify who has failed to follow up on representations made in the reports.

The first step towards accountability is identification of decision makers in audit reports.

Tracking decision makers from one position to another and linking them to specific decisions is a natural application of topic maps.

I first saw this in Links I Liked by Chris Blattman, September 7, 2014.

6,482 Datasets Available

Tuesday, August 26th, 2014

6,482 Datasets Available Across 22 Federal Agencies In Data.json Files by Kin Lane.

From the post:

It has been a few months since I ran any of my federal government data.json harvesting, so I picked back up my work, and will be doing more work around datasets that federal agencies have been making available, and telling the stories across my network.

I’m still surprised at how many people are unaware that 22 of the top federal agencies have data inventories of their public data assets, available in the root of their domain as a data.json file. This means you can go to many agency domains, and there is a machine-readable list of that agency’s current inventory of public datasets.

See Kin’s post for links to the agency data.json files.

You may also want to read: What Happened With Federal Agencies And Their Data.json Files, which details Kin’s earlier efforts with tracking agency data.json files.

Kin points out that these data.json files are governed by: OMB M-13-13 Open Data Policy—Managing Information as an Asset. It’s pretty joyless reading but if you are interested in the policy details or the requirements agencies must meet, it’s required reading.

If you are looking for datasets to clean up or combine together, it would be hard to imagine a more diverse set to choose from.
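As a starting point for that kind of clean-up, here is a minimal sketch of reading one agency inventory. It rests on my understanding of the Project Open Data schema: a v1.0 data.json is a bare JSON array of datasets, while v1.1 wraps the array in an object under a "dataset" key, and each entry carries an "accessLevel" field. The sample inventory and the function names are hypothetical; in practice you would feed in the text fetched from an agency's /data.json.

```python
import json
from collections import Counter


def load_datasets(data_json_text):
    """Return the list of dataset entries from a data.json document.

    Assumption: v1.0 files are a bare JSON array of datasets, while
    v1.1 files wrap that array in an object under the "dataset" key.
    """
    doc = json.loads(data_json_text)
    if isinstance(doc, list):
        return doc
    return doc.get("dataset", [])


def summarize(datasets):
    """Count datasets by their accessLevel field ("public", etc.)."""
    return Counter(d.get("accessLevel", "unknown") for d in datasets)


# Hypothetical miniature inventory, standing in for a real agency file.
SAMPLE = json.dumps({
    "dataset": [
        {"title": "Example A", "accessLevel": "public"},
        {"title": "Example B", "accessLevel": "public"},
        {"title": "Example C", "accessLevel": "restricted public"},
    ],
})

if __name__ == "__main__":
    datasets = load_datasets(SAMPLE)
    print(len(datasets), dict(summarize(datasets)))
```

Running the same two functions over each agency's file would give a rough per-agency count to compare against the number in the post's title.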

FDA Recall Data

Wednesday, July 16th, 2014

OpenFDA Provides Ready Access to Recall Data by Taha A. Kass-Hout.

From the post:

Every year, hundreds of foods, drugs, and medical devices are recalled from the market by manufacturers. These products may be labeled incorrectly or might pose health or safety issues. Most recalls are voluntary; in some cases they may be ordered by the U.S. Food and Drug Administration. Recalls are reported to the FDA, and compiled into its Recall Enterprise System, or RES. Every week, the FDA releases an enforcement report that catalogues these recalls. And now, for the first time, there is an Application Programming Interface (API) that offers developers and researchers direct access to all of the drug, device, and food enforcement reports, dating back to 2004.

The recalls in this dataset provide an illuminating window into both the safety of individual products and the safety of the marketplace at large. Recent reports have included such recalls as certain food products (for not containing the vitamins listed on the label), a soba noodle salad (for containing unlisted soy ingredients), and a pain reliever (for not following laboratory testing requirements).

You will get warnings that this data is “not for clinical use.”

Sounds like a treasure trove of data if you are looking for products still being sold despite being recalled.

Or if you want to advertise for “victims” of faulty products that have been recalled.

I think both of those are non-clinical uses. ;-)
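For anyone who wants to poke at the recall data, here is a small sketch of building a query URL. It assumes, from my reading of the openFDA documentation, that enforcement reports live at api.fda.gov under /drug/, /device/ and /food/ paths ending in enforcement.json, and that the API accepts `search` and `limit` parameters. The `status` field used in the usage line is likewise an assumption, and the function name is mine.

```python
from urllib.parse import urlencode

# Assumption: openFDA serves enforcement reports under these categories.
BASE = "https://api.fda.gov"
CATEGORIES = ("drug", "device", "food")


def enforcement_url(category, search=None, limit=10):
    """Build a query URL for one category of enforcement reports."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    params = {"limit": limit}
    if search:
        # e.g. 'status:"Ongoing"' (field name assumed, not verified here)
        params["search"] = search
    return f"{BASE}/{category}/enforcement.json?{urlencode(params)}"


if __name__ == "__main__":
    print(enforcement_url("food", search='status:"Ongoing"', limit=5))
```

The function only builds the URL, so you can inspect the query before sending it with whatever HTTP client you prefer.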

Free Companies House data to boost UK economy

Tuesday, July 15th, 2014

Free Companies House data to boost UK economy

From the post:

Companies House is to make all of its digital data available free of charge. This will make the UK the first country to establish a truly open register of business information.

As a result, it will be easier for businesses and members of the public to research and scrutinise the activities and ownership of companies and connected individuals. Last year (2013/14), customers searching the Companies House website spent £8.7 million accessing company information on the register.

This is a considerable step forward in improving corporate transparency; a key strand of the G8 declaration at the Lough Erne summit in 2013.

It will also open up opportunities for entrepreneurs to come up with innovative ways of using the information.

This change will come into effect from the second quarter of 2015 (April – June).

In a side bar, Business Secretary Vince Cable said in part:

Companies House is making the UK a more transparent, efficient and effective place to do business.

I’m not sure about “efficient,” but providing incentives for lawyers and others to track down insider trading and other business-as-usual practices, and arming them with open data, would be a start in the right direction.

I first saw this in a tweet by Hadley Beeman.


Wednesday, July 9th, 2014


From the webpage:

MuckRock is an open news tool powered by state and federal Freedom of Information laws and you: Requests are based on your questions, concerns and passions, and you are free to embed, share and write about any of the verified government documents hosted here. Want to learn more? Check out our about page. MuckRock has been funded in part by grants from the Sunlight Foundation, the Freedom of the Press Foundation and the Knight Foundation.

Join Our Mailing List »

An amazing site.

I found MuckRock while looking for documents released by mistake by DHS. DHS Releases Trove of Documents Related to Wrong “Aurora” in Response to Freedom of Information Act (FOIA) Request (Maybe the DHS needs a topic map?)

I’ve signed up for their mailing list. Thinking about what government lies I want to read. ;-)

Looks like a great place to use your data mining/analysis skills.


Graphing 173 Million Taxi Rides

Thursday, June 26th, 2014

Interesting taxi rides dataset by Danny Bickson.

From the post:

I got the following from my collaborator Zach Nation. NY taxi ride dataset that was not properly anonymized and was reverse engineered to find interesting insights in the data.

Danny mapped the data using GraphLab and asks some interesting questions of the data.

BTW, Danny is offering the iPython notebook to play with!


This is the same data set I mentioned in: On Taxis and Rainbows

Friendly Fire: Death, Delay, and Dismay at the VA

Wednesday, June 25th, 2014

Friendly Fire: Death, Delay, and Dismay at the VA by Sen. Tom Coburn, M.D.

From the introduction:

Too many men and women who bravely fought for our freedom are losing their lives, not at the hands of terrorists or enemy combatants, but from friendly fire in the form of medical malpractice and neglect by the Department of Veterans Affairs (VA).

Split-second medical decisions in a war zone or in an emergency room can mean the difference between life and death. Yet at the VA, the urgency of the battlefield is lost in the lethargy of the bureaucracy. Veterans wait months just to see a doctor and the Department has systemically covered up delays and deaths they have caused. For decades, the Department has struggled to deliver timely care to veterans.

The reason veterans care has suffered for so long is Congress has failed to hold the VA accountable. Despite years of warnings from government investigators about efforts to cook the books, it took the unnecessary deaths of veterans denied care from Atlanta to Phoenix to prompt Congress to finally take action. On June 11, 2014, the Senate recently approved a bipartisan bill to allow veterans who cannot receive a timely doctor’s appointment to go to another doctor outside of the VA.

But the problems at the VA are far deeper than just scheduling. After all, just getting to see a doctor does not guarantee appropriate treatment. Veterans in Boston receive top-notch care, while those treated in Phoenix suffer from subpar treatment. Over the past decade, more than 1,000 veterans may have died as a result of VA malfeasance, and the VA has paid out nearly $1 billion to veterans and their families for its medical malpractice.

The waiting list cover-ups and uneven care are reflective of a much larger culture within the VA, where administrators manipulate both data and employees to give an appearance that all is well.

I am digesting the full report but I’m not sure enabling veterans to see doctors outside the VA is the same thing as holding the VA “accountable.”

From the early reports in this growing tragedy, there appear to be any number of “dark places” where data failed to be collected, where data was altered, or where the VA simply refused to collect data that might have driven better oversight.

I don’t think the VA is unique in any of these practices, so mapping what is known, what could have been known and the dark places in the VA data flow could be informative both for the VA and for other agencies.

I first saw this at Full Text Reports, Beyond the Waiting Lists, New Senate Report Reveals a Culture of Crime, Cover-Up and Coercion within the VA.

Congress.gov Adds New Features

Tuesday, June 24th, 2014

Congress.gov Adds New Features

From the post:

  • User Accounts & Saved Searches: Users have the option of creating a private account that lets them save their personal searches. The feature gives users a quick and easy index from which to re-run their searches for new and updated information.
  • Congressional Record Search-by-Speaker: New metadata has been added to the Congressional Record that enables searching the daily transcript of congressional floor action by member name from 2009 – present. The member profile pages now also feature a link that returns a list of all Congressional Record articles in which that member was speaking.
  • Nominations: Users can track presidential nominees from appointment to hearing to floor votes with the new nominations function. The data goes back to 1981 and features faceted search, like the rest of Congress.gov, so users can narrow their searches by congressional session, type of nomination and status.

Other updates include expanded “About” and “Frequently Asked Questions” sections and the addition of committee referral and committee reports to bill-search results.

The website describes itself as:

Congress.gov is the official source for federal legislative information. A collaboration among the Library of Congress, the U.S. Senate, the U.S. House of Representatives and the Government Printing Office, Congress.gov is a free resource that provides searchable access to bill status and summary, bill text, member profiles, the Congressional Record, committee reports, direct links from bills to cost estimates from the Congressional Budget Office, legislative process videos, committee profile pages and historic access reaching back to the 103rd Congress.

Before you get too excited, the 103rd Congress was in session 1993-1994. A considerable amount of material but far from complete.

The utility of topic maps is easy to demonstrate with the increased ease of tracking presidential nominations.

Rather than just tracking a bald nomination, wouldn’t it be handy to have all the political donations made by the nominee from the FEC? Or for that matter, their “friend” graph that shows their relationships with members of the president’s insider group?

All of that is easy enough to find, but then every searcher has to find the same information. If it were found and presented with the nominee, then other users would not have to work to re-find the information.


I first saw this in New Features Added to Congress.gov by Africa S. Hands.

Feds and Big Data

Thursday, June 12th, 2014

Federal Agencies and the Opportunities and Challenges of Big Data by Nicole Wong.

June 19, 2014
1:00 pm – 5:00 pm

From the post:

On June 19, the Obama Administration will continue the conversation on big data as we co-host our fourth big data conference, this time with the Georgetown University McCourt School of Public Policy’s Massive Data Institute.  The conference, “Improving Government Performance in the Era of Big Data; Opportunities and Challenges for Federal Agencies”,  will build on prior workshops at MIT, NYU, and Berkeley, and continue to engage both subject matter experts and the public in a national discussion about the future of data innovation and policy.

Drawing from the recent White House working group report, Big Data: Seizing Opportunities, Preserving Values, this event will focus on the opportunities and challenges posed by Federal agencies’ use of data, best practices for sharing data within and between agencies and other partners, and measures the government may use to ensure the protection of privacy and civil liberties in a big data environment.

You can find more information about the workshop and the webcast here.

We hope you will join us!

Nicole Wong is U.S. Deputy Chief Technology Officer at the White House Office of Science & Technology Policy

Between approximately 1:30 and 2:25 p.m., Panel One: Open Data and Information Sharing, Moderator: Nick Sinai, Deputy U.S. Chief Technology Officer.

Could be some useful intelligence on how sharing of data is viewed now. Perhaps you could throttle back a topic map to be just a little ahead of where agencies are now. So it would not look like such a big step.


New Data Sets Available in Census Bureau API

Monday, June 9th, 2014

New Data Sets Available in Census Bureau API

From the post:

Today the U.S. Census Bureau added several data sets to its application programming interface, including 2013 population estimates and 2012 nonemployer statistics.

The Census Bureau API allows developers to create a variety of apps and tools, such as ones that allow homebuyers to find detailed demographic information about a potential new neighborhood. By combining Census Bureau statistics with other data sets, developers can create tools for researchers to look at a variety of topics and how they impact a community.

Data sets now available in the API are:

  • July 1, 2013, national, state, county and Puerto Rico population estimates
  • 2012-2060 national population projections
  • 2007 Economic Census national, state, county, place and region economy-wide key statistics
  • 2012 Economic Census national economy-wide key statistics
  • 2011 County Business Patterns at the national, state and county level (2012 forthcoming)
  • 2012 national, state and county nonemployer statistics (businesses without paid employees)

The API also includes three decades (1990, 2000 and 2010) of census statistics and statistics from the American Community Survey covering one-, three- and five-year periods of data collection. Developers can access the API online and share ideas through the Census Bureau’s Developers Forum. Developers can use the Discovery Tool to examine the variables available in each dataset.

In case you are looking for census data to crunch!
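If you do start crunching, the first chore is usually reshaping the response. As I understand it, the Census Bureau API returns JSON as an array of rows, with the first row holding the column names. The sketch below turns that shape into a list of dictionaries. The column names and values in the sample are placeholders of my own, not real Census variables or figures.

```python
import json


def rows_to_records(payload_text):
    """Convert a header-first array of rows into a list of dicts.

    Assumption: the payload is a JSON array of arrays whose first
    row names the columns, as the Census API is understood to return.
    """
    rows = json.loads(payload_text)
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Placeholder payload: column names and values are illustrative only.
SAMPLE = json.dumps([
    ["NAME", "POP", "state"],
    ["Example State One", "1000", "01"],
    ["Example State Two", "2500", "02"],
])

if __name__ == "__main__":
    for record in rows_to_records(SAMPLE):
        print(record["NAME"], int(record["POP"]))
```

Note that the values arrive as strings in this shape, so numeric columns need an explicit conversion before any arithmetic.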


UK Houses of Parliament launches Open Data portal

Friday, June 6th, 2014

UK Houses of Parliament launches Open Data portal

From the webpage:

Datasets related to the UK Houses of Parliament are now available via (link omitted) – the institution’s new dedicated Open Data portal.

Site developers are currently seeking feedback on the portal ahead of the next release, details of how to get in touch can be found by clicking here.

From the alpha release of the portal:

Welcome to the first release of (link omitted) – the home of Open Data from the UK Houses of Parliament. This is an alpha release and contains a limited set of features and data. We are seeking feedback from users about the platform and the data on it so please contact us.

I would have to agree that the portal presently contains “limited data.” ;-)

What would be helpful for non-U.K. data miners, as well as ones in the U.K., is some sense of what data is available.

A PDF file listing data that is currently maintained on the UK Houses of Parliament, their members, record of proceedings, transcripts, etc. would be a good starting point.

Pointers anyone?


Monday, June 2nd, 2014


Not all the news out of government is bad.

Consider openFDA which is putting

More than 3 million adverse drug event reports at your fingertips.

From the “about” page:

OpenFDA is an exciting new initiative in the Food and Drug Administration’s Office of Informatics and Technology Innovation spearheaded by FDA’s Chief Health Informatics Officer. OpenFDA offers easy access to FDA public data and highlight projects using these data in both the public and private sector to further regulatory or scientific missions, educate the public, and save lives.

What does it do?

OpenFDA provides API and raw download access to a number of high-value structured datasets. The platform is currently in public beta with one featured dataset, FDA’s publicly available drug adverse event reports.

In the future, openFDA will provide a platform for public challenges issued by the FDA and a place for the community to interact with each other and FDA domain experts with the goal of spurring innovation around FDA data.

We’re currently focused on working on datasets in the following areas:

  • Adverse Events: FDA’s publicly available drug adverse event reports, a database that contains millions of adverse event and medication error reports submitted to FDA covering all regulated drugs.
  • Recalls (coming soon): Enforcement Report and Product Recalls Data, containing information gathered from public notices about certain recalls of FDA-regulated products
  • Documentation (coming soon): Structured Product Labeling Data, containing detailed product label information on many FDA-regulated products

We’ll be releasing a number of updates and additional datasets throughout the upcoming months.
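To give a flavor of what can be done with the adverse event data, here is a rough sketch of ranking the terms in a "count" style response. I am assuming the openFDA count queries return an object whose "results" list holds entries with "term" and "count" keys. The sample payload is invented for illustration, and the reaction names are deliberately placeholders rather than real report data.

```python
import json


def top_terms(response_text, n=3):
    """Return the n most frequent (term, count) pairs from a count response.

    Assumption: the response has a "results" list of objects with
    "term" and "count" keys.
    """
    payload = json.loads(response_text)
    results = payload.get("results", [])
    ranked = sorted(results, key=lambda r: r["count"], reverse=True)
    return [(r["term"], r["count"]) for r in ranked[:n]]


# Invented payload: terms and counts are placeholders, not FDA data.
SAMPLE = json.dumps({"results": [
    {"term": "REACTION A", "count": 12},
    {"term": "REACTION B", "count": 30},
    {"term": "REACTION C", "count": 7},
]})

if __name__ == "__main__":
    for term, count in top_terms(SAMPLE, n=2):
        print(term, count)
```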

OK, I’m Twitter follower #522 @openFDA.

What’s your @openFDA number?

A good experience, i.e., people making good use of released data, asking for more data, etc., is what will drive more open data. Make every useful government data project count.

A New Nation Votes

Thursday, May 15th, 2014

A New Nation Votes: American Election Returns 1787-1825

From the webpage:

A New Nation Votes is a searchable collection of election returns from the earliest years of American democracy. The data were compiled by Philip Lampi. The American Antiquarian Society and Tufts University Digital Collections and Archives have mounted it online for you with funding from the National Endowment for the Humanities.

Currently there are 18040 elections that have been digitized.

Interesting data set and certainly one that could be supplemented with all manner of other materials.

Among other things, the impact or lack thereof from extension of the voting franchise would make an interesting study.