Archive for the ‘Government Data’ Category

FBI Records: The Vault

Wednesday, February 11th, 2015

FBI Records: The Vault

From the webpage:

The Vault is our new FOIA Library, containing 6,700 documents and other media that have been scanned from paper into digital copies so you can read them in the comfort of your home or office. 

Included here are many new FBI files that have been released to the public but never added to this website; dozens of records previously posted on our site but removed as requests diminished; files from our previous FOIA Library, and new, previously unreleased files.

The Vault includes several new tools and resources for your convenience:

  • Searching for Topics: You can browse or search for specific topics or persons (like Al Capone or Marilyn Monroe) by viewing our alphabetical listing, by using the search tool in the upper right of this site, or by checking the different category lists that can be found in the menu on the right side of this page. In the search results, click on the folder to see all of the files for that particular topic.
  • Searching for Key Words: Thanks to new technology we have developed, you can now search for key words or phrases within some individual files. You can search across all of our electronic files by using the search tool in the upper right of this site, or you can search for key words within a specific document by typing in terms in the search box in the upper right hand of the file after it has been opened and loaded. Note: since many of the files include handwritten notes or are not always in optimal condition due to age, this search feature does not always work perfectly.
  • Viewing the Files: We are now using an open source web document viewer, so you no longer need your own file software to view our records. When you click on a file, it loads in a reader that enables you to view one or two pages at a time, search for key words, shrink or enlarge the size of the text, use different scroll features, and more. In many cases, the quality and clarity of the individual files has also been improved.
  • Requesting a Status Update: Use our new Check the Status of Your FOI/PA Request tool to determine where your request stands in our process. Status information is updated weekly. Note: You need your FOI/PA request number to use this feature.

Please note: the content of the files in the Vault encompasses all time periods of Bureau history and do not always reflect the current views, policies, and priorities of the FBI.

New files will be added on a regular basis, so please check back often.

This may be meant as a distraction but I don’t know from what?

I suppose there is some value in knowing that ineffectual law enforcement investigations did not begin with 9/11.

Encouraging open data usage…

Saturday, February 7th, 2015

Encouraging open data usage by commercial developers: Report

From the post:

The second Share-PSI workshop was very different from the first. Apart from presentations in two short plenary sessions, the majority of the two days was spent in facilitated discussions around specific topics. This followed the success of the bar camp sessions at the first workshop, that is, sessions proposed and organised in an ad hoc fashion, enabling people to discuss whatever subject interests them.

Each session facilitator was asked to focus on three key questions:

  1. What X is the thing that should be done to publish or reuse PSI?
  2. Why does X facilitate the publication or reuse of PSI?
  3. How can one achieve X and how can you measure or test it?

This report summarises the 7 plenary presentations, 17 planned sessions and 7 bar camp sessions. As well as the Share-PSI project itself, the workshop benefited from sessions lead by 8 other projects. The agenda for the event includes links to all papers, slides and notes, with many of those notes being available on the project wiki. In addition, the #sharepsi tweets from the event are archived, as are a number of photo albums from Makx Dekkers,
Peter Krantz and José Luis Roda. The event received a generous write up
on the host’s Web site (in Portuguese). The spirit of the event is captured in this video by Noël Van Herreweghe of CORVe.

To avoid confusion, PSI in this context means Public Sector Information, not Published Subject Identifier (PSI).

Amazing coincidence that the W3C has smudged yet another name. You may recall the W3C decided to confuse URIs and IRIs in its latest attempt to re-write history, calling both the the acronym, URI:

Within this specification, the term URI refers to a Universal Resource Identifier as defined in [RFC 3986] and extended in [RFC 2987] [RFC 3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as “Base URI” that are defined or referenced across the whole family of XML specifications. (Corrected the RFC listing as shown.) (XQuery and XPath Data Model 3.1 , N. Walsh, J. Snelson, Editors, W3C Candidate Recommendation (work in progress), 18 December 2014, . Latest version available at

Interesting discussion but I would pay very close attention to market demand, perhaps I should say, commercial market demand, before planning a start-up based on government data. There is unlimited demand for free data or even better, free enhanced data, but that should not be confused with enhanced data that can be sold to support a start-up on an ongoing basis.

To give you an idea of the uncertainly of conditions for start-ups relying on open data, let me quote the final bullet points of this article:

  • There is a lack of knowledge of what can be done with open data which is hampering uptake.
  • There is a need for many examples of success to help show what can be done.
  • Any long term re-use of PSI must be based on a business plan.
  • Incubators/accelerators should select projects to support based on the business plan.
  • Feedback from re-users is an important component of the ecosystem and can be used to enhance metadata.
  • The boundary between what the public and private sectors can, should and should not do do needs to be better defined to allow the public sector to focus on its core task and businesses to invest with confidence.
  • It is important to build an open data infrastructure, both legal and technical, that supports the sharing of PSI as part of normal activity.
  • Licences and/or rights statements are essential and should be machine readable. This is made easier if the choice of licences is minimised.
  • The most valuable data is the data that the public sector already charges for.
  • Include domain experts who can articulate real problems in hackathons (whether they write code or not).
  • Involvement of the user community and timely response to requests is essential.
  • There are valid business models that should be judged by their effectiveness and/or social impact rather than financial gain.

Just so you know, that last point:

There are valid business models that should be judged by their effectiveness and/or social impact rather than financial gain.

that is not a business model, unless you have renewal financing from some source other than by financial gain. That is a charity model where you are the object of the charity.

Forty and Seven Inspector Generals Hit a Stone Wall

Thursday, February 5th, 2015

Inspectors general testify against agency ‘stonewalling’ before Congress by Sarah Westwood.

From the post:

Frustration with federal agencies that block probes from their inspectors general bubbled over Tuesday in a congressional hearing that dug into allegations of obstruction from a number of government watchdogs.

The Peace Corps, Environmental Protection Agency and Justice Department inspectors general each argued to members of the House Oversight and Government Reform Committee that some of their investigations had been thwarted or stalled by officials who refused to release necessary information to their offices.

Committee members from both parties doubled down on criticisms of the Justice Department’s lack of transparency and called for solutions to the government-wide problem during their first official hearing of the 114th Congress.

“If you can’t do your job, then we can’t do our job in Congress,” Chairman Jason Chaffetz, R-Utah, told the three witnesses and the scores of agency watchdogs who also attended, including the Department of Homeland Security and General Service Administration inspectors general.

Michael Horowitz, the Justice Department’s inspector general, testified that the FBI began reviewing requested documents in 2010 in what he said was a clear violation of federal law that is supposed to grant watchdogs unfettered access to agency records.

The FBI’s process, which involves clearing the release of documents with the attorney general or deputy attorney general, “seriously impairs inspector general independence, creates excessive delays, and may lead to incomplete, inaccurate or significantly delayed findings or recommendations,” Horowitz said.

Perhaps no surprise that the FBI shows up in the non-transparency column. But given the number of inspector generals with similar problems (47), it seems to be part of a larger herd.

If you are interested in going further into this issue, there was a hearing last August 2014), Obstructing Oversight: Concerns from Inspectors General, which is here in ASCII and here with video and witness statements in PDF.

Both sources omit the following documents:

Sept. 9, 2014, letter to Chairman Issa from OMB submitted by Chairman Issa.. 58
Aug. 5, 2014, letter to Reps. Issa, Cummings, Carper, and Coburn from 47 IGs, submitted by Rep. Chaffetz.. 61
Aug. 8, 2014, letter to OMB from Reps. Carper, Coburn, Issa and Cummings, submitted by Rep. Walberg.. 69
Statement for the record from The Institute of Internal Auditors. 71

Isn’t that rather lame? To leave these items in the table of contents but to omit them from the ASCII version and to not even include them with the witness statements.

I’m curious who the other forty-four (44) inspector generals might be. Aren’t you?

If you know where to find these appendix materials, please send me a pointer.

I think it will be more effective to list all of the Inspector Generals who have encountered this stone wall treatment than treat them as all and sundry.

Chairman Jason Chaffetz suggests that by controlling funding that Congress can force transparency. I would use a finer knife. Cut all funding for health care and retirement benefits in the agencies/departments in question. See how the rank and file in the agencies like them apples.

Assuming transparency results, I would not restore those benefits retroactively. Staff chose to support, explicitly or implicitly, illegal behavior. Making bad choices has negative consequences. It would be a teaching opportunity for all future federal staff members.

[U.S.] President’s Fiscal Year 2016 Budget

Wednesday, February 4th, 2015

Data for the The President’s Fiscal Year 2016 Budget

From the webpage:

Each year, after the President’s State of the Union address, the Office of Management and Budget releases the Administration’s Budget, offering proposals on key priorities and newly announced initiatives. This year we are releasing all of the data included in the President’s Fiscal Year 2016 Budget in a machine-readable format here on GitHub. The Budget process should be a reflection of our values as a country, and we think it’s important that members of the public have as many tools at their disposal as possible to see what is in the President’s proposals. And, if they’re motivated to create their own visualizations or products from the data, they should have that chance as well.

You can see the full Budget on Medium.

About this Repository

This repository includes three data files that contain an extract of the Office of Management and Budget (OMB) budget database. These files can be used to reproduce many of the totals published in the Budget and examine unpublished details below the levels of aggregation published in the Budget.

The user guide file contains detailed information about this data, its format, and its limitations. In addition, OMB provides additional data tables, explanations and other supporting documents in XLS format on its website.

Feedback and Issues

Please submit any feedback or comments on this data, or the Budget process here.

Before you start cheering too loudly, spend a few minutes with the User Guide. Not impenetrable but not an easy stroll either. I suspect the additional data tables, etc. are going to be necessary for interpretation of the main files.

Writing up how to use this data set would be a large but worthwhile undertaking.

A larger in scope but also worthwhile project would be to track how the initial allocations in the budget change through the legislative process. That is to know on a day to day basis, which departments, programs, etc. are up or down. Tied to votes in Congress and particular amendments that could prove to be very interesting.

Update: A tweet from Aaron Kirschenfeld directed us to: The U.S. Tax Code Is a Travesty by John Cassidy. Cassidy says to take a look at table S-9 in the numbers section under “Loophole closers.” The trick to the listed loopholes is that very few people qualify for the loophole. See Cassidy’s post for the details.

Other places that merit special attention?

Update: DHS Budget Justification 2016 (3906 pages, PDF). First saw this in a tweet by Dave Maass.

Project Blue Book Collection (UFO’s)

Thursday, January 22nd, 2015

Project Blue Book Collection

From the webpage:

This site was created by The Black Vault to house 129,491 pages, comprising of more than 10,000 cases of the Project Blue Book, Project Sign and Project Grudge files declassifed. Project Blue Book (along with Sign and Grudge) was the name that was given to the official investigation by the United States military to determine what the Unidentified Flying Object (UFO) phenomena was. It lasted from 1947 – 1969. Below you will find the case files compiled for research, and available free to download.

The CNN report Air Force UFO files land on Internet by Emanuella Grinberg reports Roswell is omitted from these files.

You won’t find anything new here, the files have been available on microfilm for years but being searchable and on the Internet is a step forward in terms of accessibility.

When I say “searchable,” the site notes:

1) A search is a good start — but is not 100% – There are more than 10,000 .pdf files here and although all of them are indexed in the search engine, the quality of the original documents, given the fact that many of them are more than 6 decades old, is very poor. This means that when they are converted to text for searching, many of the words are not readable to a computer. As a tip: make your search as basic as possible. Searching for a location? Just search a city, then the state, to see what comes up. Searching for a type of UFO? Use “saucer” vs. “flying saucer” or longer expression. It will increase the chances of finding what you are looking for.

2) The text may look garbled on the search results page (but not the .pdf!) – This is normal. For the same reason above… converting a sentence that may read ok to the human eye, may be gibberish to a computer due to the quality of the decades old state of many of the records. Don’t let that discourage you. Load the .PDF and see what you find. If you searched for “Hollywood” and a .pdf hit came up for Rome, New York, there is a reason why. The word “Hollywood” does appear in the file…so check it out!

3) Not everything was converted to .pdfs – There are a few case files in the Blue Book system that were simply too large to convert. They are:

undated/xxxx-xx-9667997-[BLANK][ 8,198 Pages ]
undated/xxxx-xx-9669100-[ILLEGIBLE]-[ILLEGIBLE]-/ [ 1,450 Pages ]
undated/xxxx-xx-9669191-[ILLEGIBLE]/ [ 3,710 Pages ]

These files will be sorted at a later date. If you are interested in helping, please email

I tried to access the files not yet processed but was redirected. I will see what is required to see the not yet processed files.

If you are interested in trying your skills at PDF conversion/improvement, the main data set should be more than sufficient.

If you are interested in automatic discovery of what or who was blacked out of government reports, this is also an interesting data set. Personally I think blacking out passages should be forbidden. People should have to accept the consequences of their actions, good or bad. We require that of citizens, why not government staff?

I assume crowd sourcing corrections has already been considered. 130K of pages is a fairly small number when it comes to crowd sourcing. Surely there are more than 10,000 people interested in the data set, which would be 13 pages each. Assuming each one did 100 pages each, you would have more than enough overlap to do statistics to choose the best corrections.

For those of you who see patterns in UFO reports, a good way to reach across the myriad sightings and reports would be to topic map the entire collection.

Personally I suspect at least some of the reports do concern alien surveillance and the absence in the intervening years indicates they have lost interest. Given our performance since the 1940’s, that’s not hard to understand.

Key Court Victory Closer for IRS Open-Records Activist

Friday, January 16th, 2015

Key Court Victory Closer for IRS Open-Records Activist by Suzanne Perry.

From the post:

The open-records activist Carl Malamud has moved a step closer to winning his legal battle to give the public greater access to the wealth of information on Form 990 tax returns that nonprofits file.

During a hearing in San Francisco on Wednesday, U.S. District Judge William Orrick said he tentatively planned to rule in favor of Mr. Malamud’s group, Public. Resource. Org, which filed a lawsuit to force the Internal Revenue Service to release nonprofit tax forms in a format that computers can read. That would make it easier to conduct online searches for data about organizations’ finances, governance, and programs.

“It looks like a win for Public. Resource and for the people who care about electronic access to public documents,” said Thomas Burke, the group’s lawyer.

The suit asks the IRS to release Forms 990 in machine-readable format for nine nonprofits that had submitted their forms electronically. Under current practice, the IRS converts all Forms 990 to unsearchable image files, even those that have been filed electronically.

That’s a step in the right direction but not all that will be required.

Suzanne goes on to note that the IRS removes donor lists from the 990 forms.

Any number of organizations will object but I think the donor lists should be public information as well.

Making all donors public may discourage some people from donating to unpopular causes but that’s a hit I would be willing to take to know who owns the political non-profits. And/or who funds the NRA for example.

Data that isn’t open enough to know who is calling the shots at organizations isn’t open data, its an open data tease.

What Counts: Harnessing Data for America’s Communities

Friday, January 16th, 2015

What Counts: Harnessing Data for America’s Communities Senior Editors: Naomi Cytron, Kathryn L.S. Pettit, & G. Thomas Kingsley. (new book, free pdf)

From: A Roadmap: How To Use This Book

This book is a response to the explosive interest in and availability of data, especially for improving America’s communities. It is designed to be useful to practitioners, policymakers, funders, and the data intermediaries and other technical experts who help transform all types of data into useful information. Some of the essays—which draw on experts from community development, population health, education, finance, law, and information systems—address high-level systems-change work. Others are immensely practical, and come close to explaining “how to.” All discuss the incredibly exciting opportunities and challenges that our ever-increasing ability to access and analyze data provide.

As the book’s editors, we of course believe everyone interested in improving outcomes for low-income communities would benefit from reading every essay. But we’re also realists, and know the demands of the day-to-day work of advancing opportunity and promoting well-being for disadvantaged populations. With that in mind, we are providing this roadmap to enable readers with different needs to start with the essays most likely to be of interest to them.

For everyone, but especially those who are relatively new to understanding the promise of today’s data for communities, the opening essay is a useful summary and primer. Similarly, the final essay provides both a synthesis of the book’s primary themes and a focus on the systems challenges ahead.

Section 2, Transforming Data into Policy-Relevant Information (Data for Policy), offers a glimpse into the array of data tools and approaches that advocates, planners, investors, developers and others are currently using to inform and shape local and regional processes.

Section 3, Enhancing Data Access and Transparency (Access and Transparency), should catch the eye of those whose interests are in expanding the range of data that is commonly within reach and finding ways to link data across multiple policy and program domains, all while ensuring that privacy and security are respected.

Section 4, Strengthening the Validity and Use of Data (Strengthening Validity), will be particularly provocative for those concerned about building the capacity of practitioners and policymakers to employ appropriate data for understanding and shaping community change.

The essays in section 5, Adopting More Strategic Practices (Strategic Practices), examine the roles that practitioners, funders, and policymakers all have in improving the ways we capture the multi-faceted nature of community change, communicate about the outcomes and value of our work, and influence policy at the national level.

There are of course interconnections among the essays in each section. We hope that wherever you start reading, you’ll be inspired to dig deeper into the book’s enormous richness, and will join us in an ongoing conversation about how to employ the ideas in this volume to advance policy and practice.

Thirty-one (31) essays by dozens of authors on data and its role in public policy making.

From the acknowledgements:

This book is a joint project of the Federal Reserve Bank of San Francisco and the Urban Institute. The Robert Wood Johnson Foundation provided the Urban Institute with a grant to cover the costs of staff and research that were essential to this project. We also benefited from the field-building work on data from Robert Wood Johnson grantees, many of whom are authors in this volume.

If you are pitching data and/or data projects where the Federal Reserve Bank of San Francisco/Urban Institute set the tone of policy making conversations, a must read. It is likely to have an impact on other policy discussions, but adjusted for local concerns and conventions. You could also use it to shape your local policy discussions.

I first saw this in There is no seamless link between data and transparency by Jennifer Tankard.

Open Addresses

Thursday, January 15th, 2015

Open Addresses

From the homepage:

At Open Addresses, we are bringing together information about the places where we live, work and go about our daily lives. By gathering information provided to us by people about their own addresses, and from open sources on the web, we are creating an open address list for the UK, available to everyone.

Do you want to enter our photography competition?

Or do you want to get involved by submitting an address?

It’s as simple as entering it below.

Addresses are a vital part of the UK’s National Information Infrastructure. Open Addresses will be used by a whole range of individuals and organisations (academics, charities, public sector and private sector). By having accurate information about addresses, we’ll all benefit from getting more of the things we want, and less of the things we don’t.

Datasets as of 10 December 2014 are available for download now. Via BitTorrent so I assume the complete datasets are fairly large. Anyone downloaded them?

If you do download all or part of the records, curious what other public data sets would you combine with them?

SODA Developers

Wednesday, January 14th, 2015

SODA Developers

From the webpage:

The Socrata Open Data API allows you to programatically access a wealth of open data resources from governments, non-profits, and NGOs around the world.

I have mentioned Socrata and their Open Data efforts more than once on this blog but I don’t think I have ever pointed to their developer site.

Very much worth spending time here if you are interested in governmental data.

Not that I take any data, government or otherwise, at face value. Data is created and released/leaked for reasons that may or may not coincide with your assumptions or goals. Access to data is just the first step in uncovering whose interests the data represents.

Project Open Data Dashboard

Sunday, January 4th, 2015

Project Open Data Dashboard

From the about page:

This website shows how Federal agencies are performing on the latest Open Data Policy (M-13-13) using the guidance provided by Project Open Data. It also provides many other other tools and resources to help agencies and other interested parties implement their open data programs. Features include:

  • A dashboard to track the progress of agencies implementing Project Open Data on a quarterly basis
  • Automated analysis of URLs provided within metadata to see if the links work as expected
  • A validator for v1.0 and v1.1 of the Project Open Data Metadata Schema
  • A converter to transform CSV files into JSON as defined by the Project Open Data Metadata Schema Link broken as of 4 January 2014. Site notified.
  • An export API to export from the CKAN API and transform the metadata into JSON as defined by the Project Open Data Metadata Schema
  • A changeset viewer to compare a data.json file to the metadata currently available in CKAN (eg

You can learn more by reading the main documentation page.

The main documentation defines the “Number of Datasets” on the dashboard as:

This element accounts for the total number of all datasets listed in the Enterprise Data Inventory. This includes those marked as “Public”, “Non-Public” and “Restricted”.

If you compare the “Milestone – May 31st 2014″ to November, the number of data sets increases in most cases, as you would expect. However, both the Department of Commerce and the Department of Health and Human Services, had decreases in the number of available data sets.

On May 31st, the Department of Commerce listed 20488 data sets but on November 30th, only 372. A decrease of more than 20,000 data sets.

On May 31st, the Department of Health and Human Services listed 1507 data sets but on November 30th, only 1064, a decrease of 443 data sets.

Looking further, the sudden decrease for both agencies occurred between Milestone 3 and Milestone 4 (August 31st 2014).

Sounds exciting! Yes?

Yes, but this illustrates why you should “drill down” in data whenever possible. And if not possible in interface, check other sources.

I followed the Department of Commerce link (the first column on the left) to the details of the crawl and thence the data link to determine the number of publicly available data sets.

As of today, 04 January 2014, the Department of Commerce has 23,181 datasets and not the 372 reported for Milestones 5 or the 268 reported for Milestone 4.

As of today, 04 January 2014, the Department of Health and Human Services has 1,672 datasets and not the 1064 reported for Milestones 5 or the 1088 reported for Milestone 4.

The reason(s) for the differences are unclear and the dashboard itself offers no explanation for the disparate figures. I suspect there is some glitch in the automatic harvesting of the information and/or in the representation of those results in the dashboard.

Always remember that just because a representation* claims some “fact,” that doesn’t necessarily make it so.

*Representation: Bear in mind that anything you see on a computer screen is a “representation.” There isn’t anything in data storage that has any resemblance to what you see on the screen. Choices have been made out of your sight as to how information will be represented to you.

As I mentioned yesterday, there is a common and naive assumption that data as represented to us has a reliable correspondence with data held in storage. And that the data held in storage has a reliable correspondence to data as entered or obtained from other sources.

Those assumptions aren’t unreasonable, at least until they are. Can you think of ways to illustrate those principles? I ask because at least one way to illustrate those principles makes an excellent case for open source software. More on that anon.

U.S. Appropriations by Fiscal Year

Wednesday, December 31st, 2014

U.S. Appropriations by Fiscal Year

Congressdotgov tweeted about this resource earlier today.

It’s a great starting place for research on U.S. appropriations but it is more of a bulk resource than a granular one.

You will have to wade through this resource and many others to piece together some of the details on any particular line item in the budget. Not surprisingly, anyone interested in the same line item will have to repeat that mechanical process. For every line in the budget.

There are collected resources on different aspects of the budget process, hearing documents, campaign donation records, etc. but they are for the most part all separated and not easily collated. Perhaps that is due to lack of foresight. Perhaps.

In any event, it is a starting place if you have a particular line item in mind. Think about creating a result that can be re-used and shared if at all possible.

Collection of CRS reports released to the public

Friday, December 19th, 2014

Collection of CRS reports released to the public by Kevin Kosar.

From the post:

Something rare has occurred—a collection of reports authored by the Congressional Research Service has been published and made freely available to the public. The 400-page volume, titled, “The Evolving Congress,” and was produced in conjunction with CRS’s celebration of its 100th anniversary this year. Congress, not CRS, published it. (Disclaimer: Before departing CRS in October, I helped edit a portion of the volume.)

The Congressional Research Service does not release its reports publicly. CRS posts its reports at, a website accessible only to Congress and its staff. The agency has a variety of reasons for this policy, not least that its statute does not assign it this duty. Congress, with ease, could change this policy. Indeed, it already makes publicly available the bill digests (or “summaries”) CRS produces at

The Evolving Congress” is a remarkable collection of essays that cover a broad range of topic. Readers would be advised to start from the beginning. Walter Oleszek provides a lengthy essay on how Congress has changed over the past century. Michael Koempel then assesses how the job of Congressman has evolved (or devolved depending on one’s perspective). “Over time, both Chambers developed strategies to reduce the quantity of time given over to legislative work in order to accommodate Members’ other duties,” Koempel observes.

The NIH (National Institutes of Health) requires that NIH funded research be made available to the public. Other government agencies are following suite. Isn’t it time for the Congressional Research Service to make its publicly funded research available to the public that paid for it?

Congress needs to require it. Contact your member of Congress today. Ask for all Congressional Research Service reports, past, present and future be made available to the public.

You have already paid for the reports, why shouldn’t you be able to read them?

Senate Joins House In Publishing Legislative Information In Modern Formats [No More Sneaking?]

Friday, December 19th, 2014

Senate Joins House In Publishing Legislative Information In Modern Formats by Daniel Schuman.

From the post:

There’s big news from today’s Legislative Branch Bulk Data Task Force meeting. The United States Senate announced it would begin publishing text and summary information for Senate legislation, going back to the 113th Congress, in bulk XML. It would join the House of Representatives, which already does this. Both chambers also expect to have bill status information available online in XML format as well, but a little later on in the year.

This move goes a long way to meet the request made by a coalition of transparency organizations, which asked for legislative information be made available online, in bulk, in machine-processable formats. These changes, once implemented, will hopefully put an end to screen scraping and empower users to build impressive tools with authoritative legislative data. A meeting to spec out publication methods will be hosted by the Task Force in late January/early February.

The Senate should be commended for making the leap into the 21st century with respect to providing the American people with crucial legislative information. We will watch closely to see how this is implemented and hope to work with the Senate as it moves forward.

In addition, the Clerk of the House announced significant new information will soon be published online in machine-processable formats. This includes data on nominees, election statistics, and members (such as committee assignments, bioguide IDs, start date, preferred name, etc.) Separately, House Live has been upgraded so that all video is now in H.264 format. The Clerk’s website is also undergoing a redesign.

The Office of Law Revision Counsel, which publishes the US Code, has further upgraded its website to allow pinpoint citations for the US Code. Users can drill down to the subclause level simply by typing the information into their search engine. This is incredibly handy.

This is great news!

Law is a notoriously opaque domain and the process of creating it even more so. Getting the data is a great first step, parsing out steps in the process and their meaning is another. To say nothing of the content of the laws themselves.

Still, progress is progress and always welcome!

Perhaps citizen review will stop the Senate from sneaking changes past sleepy members of the House.

GovTrack’s Summer/Fall Updates

Thursday, December 18th, 2014

GovTrack’s Summer/Fall Updates by Josh Tauberer.

From the post:

Here’s what’s been improved on GovTrack in the summer and fall of this year.


  • Permalinks to individual paragraphs in bill text is now provided (example).
  • We now ask for your congressional district so that we can customize vote and bill pages to show how your Members of Congress voted.
  • Our bill action/status flow charts on bill pages now include activity on certain related bills, which are often crucially important to the main bill.
  • The bill cosponsors list now indicates when a cosponsor of a bill is no longer serving (i.e. because of retirement or death).
  • We switched to gender neutral language when referring to Members of Congress. Instead of “congressman/woman”, we now use “representative.”
  • Our historical votes database (1979-1989) from was refreshed to correct long-standing data errors.
  • We dropped support for Internet Explorer 6 in order to address with POODLE SSL security vulnerability that plagued most of the web.
  • We dropped support for Internet Explorer 7 in order to allow us to make use of more modern technologies, which has always been the point of GovTrack.

The comment I posted was:

Great work! But I read the other day about legislation being “snuck” by the House (Senate changes), US Congress OKs ‘unprecedented’ codification of warrantless surveillance.

Do you have plans for a diff utility that warns members of either house of changes to pending legislation?

In case you aren’t familiar with

From the about page:, a project of Civic Impulse, LLC now in its 10th year, is one of the worldʼs most visited government transparency websites. The site helps ordinary citizens find and track bills in the U.S. Congress and understand their representatives’ legislative record.

In 2013, was used by 8 million individuals. We sent out 3 million legislative update email alerts. Our embeddable widgets were deployed on more than 80 official websites of Members of Congress.

We bring together the status of U.S. federal legislation, voting records, congressional district maps, and more (see the table at the right).
and make it easier to understand. Use GovTrack to track bills for updates or get alerts about votes with email updates and RSS feeds. We also have unique statistical analyses to put the information in context. Read the «Analysis Methodology».

GovTrack openly shares the data it brings together so that other websites can build other tools to help citizens engage with government. See the «Developer Documentation» for more.

Melville House to Publish CIA Torture Report:… [Publishing Gone Awry?]

Tuesday, December 16th, 2014

Melville House to Publish CIA Torture Report: An Interview with Publisher Dennis Johnson by Jonathon Sturgeon.

From the post:

In what must be considered a watershed moment in contemporary publishing, Brooklyn-based independent publisher Melville House will release the Senate Intelligence Committee’s executive summary of a government report — “Study of the Central Intelligence Agency’s Detention and Interrogation Program” — that is said to detail the monstrous torture methods employed by the Central Intelligence Agency in its counter-terrorism efforts.

Melville House’s co-publisher and co-founder Dennis Johnson has called the report “probably the most important government document of our generation, even one of the most significant in the history of our democracy.”

Melville House’s press release confirms that they are releasing both print and digital editions on December 30, 2014.

As of December 30, 2014, I can read and mark my copy, print or digital and you can mark your copy, print or digital, but no collaboration on the torture report.

For the “…most significant [document] in the history of our democracy” that seems rather sad. That is that each of us is going to be limited to whatever we know or can find out when we are reading our copies of the same report.

If there was ever a report (and there have been others) that merited a collaborative reading/annotation, the CIA Torture Report would be one of them.

Given the large number of people who worked on this report and the diverse knowledge required to evaluate it, that sounds like bad publishing choices. Or at least that there are better publishing choices available.

What about casting the entire report into the form of wiki pages, broken down by paragraphs? Once proofed, the original text can be locked and comments only allowed on the text. Free to view but $fee to comment.

What do you think? Viable way to present such a text? Other ways to host the text?

PS: Unlike other significant government reports, major publishing houses did not receive incentives to print the report. Jerry attributes that to Dianne Feinstein not wanting to favor any particular publisher. That’s one explanation. Another would be that if published in hard copy at all, a small press will mean it fades more quickly from public view. Your call.

US Congress OKs ‘unprecedented’ codification of warrantless surveillance

Tuesday, December 16th, 2014

US Congress OKs ‘unprecedented’ codification of warrantless surveillance by Lisa Vaas.

From the post:

Congress last week quietly passed a bill to reauthorize funding for intelligence agencies, over objections that it gives the government “virtually unlimited access to the communications of every American”, without warrant, and allows for indefinite storage of some intercepted material, including anything that’s “enciphered”.

That’s how it was summed up by Rep. Justin Amash, a Republican from Michigan, who pitched and lost a last-minute battle to kill the bill.

The bill is titled the Intelligence Authorization Act for Fiscal Year 2015.

Amash said that the bill was “rushed to the floor” of the house for a vote, following the Senate having passed a version with a new section – Section 309 – that the House had never considered.

Lisa reports that the bill codifies Executive Order 12333, a Ronald Reagan remnant from an earlier attempt to dismantle the United States Constitution.

There is a petition underway to ask President Obama to veto the bill. Are you a large bank? Skip the petition and give the President a call.

From Lisa’s report, it sounds like Congress needs a DEW Line for legislation:

Rep. Zoe Lofgren, a California Democrat who voted against the bill, told the National Journal that the Senate’s unanimous passage of the bill was sneaky and ensured that the House would rubberstamp it without looking too closely:

If this hadn’t been snuck in, I doubt it would have passed. A lot of members were not even aware that this new provision had been inserted last-minute. Had we been given an additional day, we may have stopped it.

How do you “sneak in” legislation in a public body?

Suggestions on an early warning system for changes to legislation between the two houses of Congress?

Global Open Data Index

Friday, December 12th, 2014

Global Open Data Index

From the about page:

For more information on the Open Data Index, you may contact the team at:

Each year, governments are making more data available in an open format. The Global Open Data Index tracks whether this data is actually released in a way that is accessible to citizens, media and civil society and is unique in crowd-sourcing its survey of open data releases around the world. Each year the open data community and Open Knowledge produces an annual ranking of countries, peer reviewed by our network of local open data experts.

Crowd-sourcing this data provides a tool for communities around the world to learn more about the open data available locally and by country, and ensures that the results reflect the experience of civil society in finding open information, rather than government claims. it also ensures that those who actually collect the information that builds the Index are the very people who use the data and are in a strong position to advocate for more and higher quality open data.

The Global Open Data Index measures and benchmarks the openness of data around the world, and then presents this information in a way that is easy to understand and use. This increases its usefulness as an advocacy tool and broadens its impact.

In 2014 we are expanding to more countries (from 70 in 2013) with an emphasis on countries of the Global South.

See the blog post launching the 2014 Index. For more information, please see the FAQ and the methodology section. Join the conversation with our Open Data Census discussion list.

It is better to have some data rather than none but look at the data by which countries are ranked for openness:

Transport Timetables, Government Budget, Government Spending, Election Results, Company Register, National Map, National Statistics, Postcodes/Zipcodes, Pollutant Emissions.

A listing of data that results in the United Kingdom with a 97% score and first place.

It is hard to imagine a less threatening set of data than those listed. I am sure someone will find a use for them but in the great scheme of things, they are a distraction from the data that isn’t being released.

Off-hand, in the United States at least, public data should include who meets with appointed or elected members of government along with transcripts of those meetings (including phone calls). It should also include all personal or corporate donations made to any organization for any reason of greater than $100.00. It should include documents prepared and/or submitted to the U.S. government and its agencies. And those are just the ones that come to mind rather quickly.

Current disclosures by the U.S. government are a fiction of openness that conceals a much larger dark data set, waiting to be revealed at some future date.

I first saw this in a tweet by ChemConnector.

Timeline of sentences from the CIA Torture Report

Wednesday, December 10th, 2014

Chris R. Albon has created a timeline of sentences from the CIA torture report!


1997,”The FBI information included that al-Mairi’s brother “”traveled to Afghanistan in 1997-1998 to train in Bin – Ladencamps.”””
1997,”The FBI information included that al-Mairi’s brother “”traveled to Afghanistan in 1997-1998 to train in Bin – Ladencamps.”””
1997,”For example, on October 12, 2004, another CIA detainee explained how he met al-Kuwaiti at a guesthouse that was operated by Ibn Shaykh al-Libi and Abu Zubaydah in 1997.”

Cleanly imports into Apache OpenOffice Calc and is 6163 rows (after subtracting the header).

Please acknowledge Chris if you use this data.

What other data would you pull from the executive summary?

What other data do you think would convince Senator Udall to release the entire 6,000 page report?

A Tranche of Climate Data

Wednesday, December 10th, 2014

FACT SHEET: Harnessing Climate Data to Boost Ecosystem & Water Resilience

From the document:

Today, the Administration is making a new tranche of data about ecosystems and water resilience available as part of the Climate Data Initiative—including key datasets related water quality, streamflow, land cover, soils, and biodiversity.

In addition to the datasets being added today to, the Department of Interior (DOI) is launching a suite of geospatial mapping tools on that will enable users to visualize and overlay datasets related to ecosystems, land use, water, and wildlife. Together, the data and tools unleashed today will help natural-resource managers, decision makers, and communities on the front lines of climate change build resilience to climate impacts and better plan for the future. (emphasis added)

I had to look up “tranche.” Google offers: “a portion of something, especially money.”

Assume that your contacts and interactions with both sites are monitored and recorded.

Treasury Island: the film

Tuesday, November 25th, 2014

Treasury Island: the film by Lauren Willmott, Boyce Keay, and Beth Morrison.

From the post:

We are always looking to make the records we hold as accessible as possible, particularly those which you cannot search for by keyword in our catalogue, Discovery. And we are experimenting with new ways to do it.

The Treasury series, T1, is a great example of a series which holds a rich source of information but is complicated to search. T1 covers a wealth of subjects (from epidemics to horses) but people may overlook it as most of it is only described in Discovery as a range of numbers, meaning it can be difficult to search if you don’t know how to look. There are different processes for different periods dating back to 1557 so we chose to focus on records after 1852. Accessing these records requires various finding aids and multiple stages to access the papers. It’s a tricky process to explain in words so we thought we’d try demonstrating it.

We wanted to show people how to access these hidden treasures, by providing a visual aid that would work in conjunction with our written research guide. Armed with a tablet and a script, we got to work creating a video.

Our remit was:

  • to produce a video guide no more than four minutes long
  • to improve accessibility to these records through a simple, step-by–step process
  • to highlight what the finding aids and documents actually look like

These records can be useful to a whole range of researchers, from local historians to military historians to social historians, given that virtually every area of government action involved the Treasury at some stage. We hope this new video, which we intend to be watched in conjunction with the written research guide, will also be of use to any researchers who are new to the Treasury records.

Adding video guides to our written research guides are a new venture for us and so we are very keen to hear your feedback. Did you find it useful? Do you like the film format? Do you have any suggestions or improvements? Let us know by leaving a comment below!

This is a great illustration that data management isn’t something new. The Treasury Board has kept records since 1557 and has accumulated a rather extensive set of materials.

The written research guide looks interesting but since I am very unlikely to ever research Treasury Board records, I am unlikely to need it.

However, the authors have anticipated that someone might be interested in process of record keeping itself and so provided this additional reference:

Thomas L Heath, The Treasury (The Whitehall Series, 1927, GP Putnam’s Sons Ltd, London and New York)

That would be an interesting find!

I first saw this in a tweet by Andrew Janes.

Would You Protect Nazi Torturers And Their Superiors?

Saturday, November 15th, 2014

If you answered “Yes,” this post won’t interest you.

If you answered “No,” read on:

Senator Mark Udall faces the question: “Would You Protect Nazi Torturers And Their Superiors?” as reported by Mike Masnick in:

Mark Udall’s Open To Releasing CIA Torture Report Himself If Agreement Isn’t Reached Over Redactions.

Mike writes in part:

As we were worried might happen, Senator Mark Udall lost his re-election campaign in Colorado, meaning that one of the few Senators who vocally pushed back against the surveillance state is about to leave the Senate. However, Trevor Timm pointed out that, now that there was effectively “nothing to lose,” Udall could go out with a bang and release the Senate Intelligence Committee’s CIA torture report. The release of some of that report (a redacted version of the 400+ page “executive summary” — the full report is well over 6,000 pages) has been in limbo for months since the Senate Intelligence Committee agreed to declassify it months ago. The CIA and the White House have been dragging out the process hoping to redact some of the most relevant info — perhaps hoping that a new, Republican-controlled Senate would just bury the report.

Mike details why Senator Udall’s recent reelection defeat makes release of the report, either in full or in summary, a distinct possibility.

In addition to Mike’s report, here is some additional information you may find useful

Contact Information for Senator Udall

Senator Mark Udall
Hart Office Building Suite SH-730
Washington, D.C. 20510

P: 202-224-5941
F: 202-224-6471

An informed electorate is essential to the existence of self-governance.

No less a figure than Thomas Jefferson spoke about the star chamber proceedings we now take for granted saying:

An enlightened citizenry is indispensable for the proper functioning of a republic. Self-government is not possible unless the citizens are educated sufficiently to enable them to exercise oversight. It is therefore imperative that the nation see to it that a suitable education be provided for all its citizens. It should be noted, that when Jefferson speaks of “science,” he is often referring to knowledge or learning in general. “I know no safe depositary of the ultimate powers of the society but the people themselves; and if we think them not enlightened enough to exercise their control with a wholesome discretion, the remedy is not to take it from them, but to inform their discretion by education. This is the true corrective of abuses of constitutional power.” –Thomas Jefferson to William C. Jarvis, 1820. ME 15:278

“Every government degenerates when trusted to the rulers of the people alone. The people themselves, therefore, are its only safe depositories. And to render even them safe, their minds must be improved to a certain degree.” –Thomas Jefferson: Notes on Virginia Q.XIV, 1782. ME 2:207

“The most effectual means of preventing [the perversion of power into tyranny are] to illuminate, as far as practicable, the minds of the people at large, and more especially to give them knowledge of those facts which history exhibits, that possessed thereby of the experience of other ages and countries, they may be enabled to know ambition under all its shapes, and prompt to exert their natural powers to defeat its purposes.” –Thomas Jefferson: Diffusion of Knowledge Bill, 1779. FE 2:221, Papers 2:526

Jefferson didn’t have to contend with Middle East terrorists, only the English terrorizing the country side. Since more Americans died in British prison camps than in the Revolution proper, I would say they were as bad as terrorists. Prisoners of war in the American Revolutionary War

Noise about the CIA torture program post 9/11 is plentiful. But the electorate, that would be voters in the United States, lack facts about the CIA torture program, its oversight (or lack thereof) and those responsible for torture, from top to bottom. There isn’t enough information to “connect the dots,” a common phrase in the intelligence community.

Connecting those dots are what could bring the accountability and transparency necessary to prevent torture from returning as an instrument of US policy.

Thirty retired generals are urging President Obama to declassify the Senate Intelligence Committee’s report on CIA torture, arguing that without accountability and transparency the practice could be resumed. (Even Generals in US Military Oppose CIA Torture)

Hiding the guilty will produce an expectation of potential future torturers that they too will get a free pass on torture.

Voters are responsible for turning out those who authorized the use of torture and to hold their subordinates are held accountable for their crimes. To do so voters must have the information contained in the full CIA torture report.

Release of the Full CIA Torture Report: No Doom and Gloom

Senator Udall should ignore speculation that release of the full CIA torture report will “doom the nation.”


There have been similar claims in the past and none of them, not one, has ever proven to be true. Here are some of the ones that I remember personally:

Documents Released Date Nation Doomed?
Pentagon Papers 1971 No
Nixon White House Tapes 1974 No
The Office of Special Investigations: Striving for Accountability in the Aftermath of the Holocaust 2010 No
United States diplomatic cables leak 2010 No
Snowden (Global Surveillance Disclosures (2013—present)) 2013 No

Others that I should add to this list?

Is Saying “Nazi” Inflammatory?

Is using the term “Nazi” inflammatory in this context? The only difference between CIA and Nazi torture is the government that ordered or tolerated the torture. Unless you know of some other classification of torture. The United States military apparently doesn’t and I am willing to take their word for it.

Some will say the torturers were “serving the American people.” The same could be and was said by many a death camp guard for the Nazis. Wrapping yourself in a flag, any flag, does not put criminal activity beyond the reach of the law. It didn’t at Nuremberg and it should not work here.


A functioning democracy requires an informed electorate. Not elected officials, not a star chamber group but an informed electorate. To date the American people lack details about illegal torture carried out by a government agency, the CIA. To exercise their rights and responsibilities an an informed electorate, American voters must have full access to the full CIA torture report.

Release of anything less than the full CIA torture report protects torture participants and their superiors. I have no interest in protecting those who engage in illegal activities nor their superiors. As an American citizen, do you?

Experience with prior “sensitive” reports indicates that despite the wailing and gnashing of teeth, the United States will not fall when the guilty are exposed, prosecuted and lead off to jail. This case is no different.

As many retired US generals point out, transparency and accountability are the only ways to keep illegal torture from returning as an instrument of United States policy.

Is there any reason to wait until American torturers are in their nineties, suffering from dementia and living in New Jersey to hold them accountable for their crimes?

I don’t think so either.

PS: When Senator Udall releases the full CIA torture report in the Congressional Record (not to the New York Times or Wikileaks, both of which censor information for reasons best known to themselves), I hereby volunteer to assist in the extraction of names, dates, places and the association of those items with other, pubic data, both in topic map form as well as in other formats.

How about you?

PPS: On the relationship between Nazis and the CIA, see: Nazis Were Given ‘Safe Haven’ in U.S., Report Says. The special report that informed that article: The Office of Special Investigations: Striving for Accountability in the Aftermath of the Holocaust. (A leaked document)

When you compare Aryanism to American Exceptionalism the similarities between the CIA and the Nazi regime are quite pronounced. How could any act that protects the fatherland/homeland be a crime?

data.parliament @ Accountability Hack 2014

Friday, November 7th, 2014

data.parliament @ Accountability Hack 2014 by Zeid Hadi.

From the post:

We are pleased to announce that data.parliament will be providing data to be used during the Accountability Hack 2014

data.parliament is a platform that enables the sharing of UK Parliament’s data with consumers both within and outside of Parliament. Designed to complement existing data services it aims to be the central publishing platform and data repository for data that is produced by Parliament. Note our release is in Alpha.

It provides both a repository ( for data and a Linked Data API ( The platform’s ‘shop front’ or data catalogue can be found here (

The following datasets and APIs are now available on data.parliament

  • Commons Written Parliamentary Questions and Answers
  • Lords Written Parliamentary Questions and Answers
  • Commons Oral Questions and Question Times
  • Early Day Motions
  • Lords Divisions
  • Commons Divisions
  • Commons Members
  • Lords Members
  • Constituencies
  • Briefing Papers
  • Papers Laid

A description of the APIs and their usage can be found at All the data exposed by the endpoints can be returned in a variety of formats not least JSON.

To get you started the team has coded two publically available demonstrators that make use of the data in data.parliament. The source code for these can found at One of the demonstrators, a client app, can be found working at Also be sure to read our blog ( for quick start guides, updates, and news about upcoming datasets.

The data.parliament team will be on hand at the Hack, both participating and networking through the event to gather feedback and ideas..

I don’t know enough about British parliamentary procedure to comment on the completeness of the interface.

I am quite interested in the Briefing Papers data feed:

This dataset contains the data for research briefings produced by the Libraries of the House of Commons and House of Lords and the Parliamentary Office of Science and Technology. Each briefing has a pdf document for the briefing itself as well as a set of metadata to accompany it. (

A great project but even a complete set of documents and transcripts of every word spoken at Parliament does not document relationships between members of Parliment, their relationships to economic interests, etc.

Looking forward to collation of information from this project with other data to form a clearer picture of the legislative process in the UK.

I first saw this in a tweet by data.parliament UK.

Core Econ: a free economics textbook

Wednesday, November 5th, 2014

Core Econ: a free economics textbook by Cathy O’Neil.

From the post:

Today I want to tell you guys about, a free (although you do have to register) textbook my buddy Suresh Naidu is using this semester to teach out of and is also contributing to, along with a bunch of other economists.

(image omitted)

It’s super cool, and I wish a class like that had been available when I was an undergrad. In fact I took an economics course at UC Berkeley and it was a bad experience – I couldn’t figure out why anyone would think that people behaved according to arbitrary mathematical rules. There was no discussion of whether the assumptions were valid, no data to back it up. I decided that anybody who kept going had to be either religious or willing to say anything for money.

Not much has changed, and that means that Econ 101 is a terrible gateway for the subject, letting in people who are mostly kind of weird. This is a shame because, later on in graduate level economics, there really is no reason to use toy models of society without argument and without data; the sky’s the limit when you get through the bullshit at the beginning. The goal of the Core Econ project is to give students a taste for the good stuff early; the subtitle on the webpage is teaching economics as if the last three decades happened.

Skepticism of government economic forecasts and data requires knowledge of the lingo and assumptions of economics. This introduction won’t get you to that level but it is a good starting place.

Enjoy! Officially Out of Beta

Tuesday, October 28th, 2014 Officially Out of Beta

From the post:

The free legislative information website,, is officially out of beta form, and beginning today includes several new features and enhancements. URLs that include will be redirected to The site now includes the following:

New Feature: Resources

  • A new resources section providing an A to Z list of hundreds of links related to Congress
  • An expanded list of “most viewed” bills each day, archived to July 20, 2014

New Feature: House Committee Hearing Videos

  • Live streams of House Committee hearings and meetings, and an accompanying archive to January, 2012

Improvement: Advanced Search

  • Support for 30 new fields, including nominations, Congressional Record and name of member

Improvement: Browse

  • Days in session calendar view
  • Roll Call votes
  • Bill by sponsor/co-sponsor

When the Library of Congress, in collaboration with the U.S. Senate, U.S. House of Representatives and the Government Printing Office (GPO) released as a beta site in the fall of 2012, it included bill status and summary, member profiles and bill text from the two most recent congresses at that time – the 111th and 112th.

Since that time, has expanded with the additions of the Congressional Record, committee reports, direct links from bills to cost estimates from the Congressional Budget Office, legislative process videos, committee profile pages, nominations, historic access reaching back to the 103rd Congress and user accounts enabling saved personal searches. Users have been invited to provide feedback on the site’s functionality, which has been incorporated along with the data updates.

Plans are in place for ongoing enhancements in the coming year, including addition of treaties, House and Senate Executive Communications and the Congressional Record Index.

Field Value Lists:

Use search fields in the main search box (available on most pages), or via the advanced and command line search pages. Use terms or codes from the Field Value Lists with corresponding search fields: Congress [congressId], Action – Words and Phrases [billAction], Subject – Policy Area [billSubject], or Subject (All) [allBillSubjects].

Congresses (44, stops with 70th Congress (1927-1929))

Legislative Subject Terms, Subject Terms (541), Geographic Entities (279), Organizational Names (173). (total 993)

Major Action Codes (98)

Policy Area (33)

Search options:

Search Form: “Choose collections and fields from dropdown menus. Add more rows as needed. Use Major Action Codes and Legislative Subject Terms for more precise results.”

Command Line: “Combine fields with operators. Refine searches with field values: Congresses, Major Action Codes, Policy Areas, and Legislative Subject Terms. To use facets in search results, copy your command line query and paste it into the home page search box.”

Search Tips Overview: “You can search using the quick search available on most pages or via the advanced search page. Advanced search gives you the option of using a guided search form or a command line entry box.” (includes examples)


You can follow this project @congressdotgov.

Orientation to Legal Research & is available both as a seminar (in-person) and webinar (online).


I first saw this at is Out of Beta with New Features by Africa S. Hands.

A $23 million venture fund for the government tech set

Tuesday, September 16th, 2014

A $23 million venture fund for the government tech set by Nancy Scola.

Nancy tells a compelling story of a new VC firm, GovTech, which is looking for startups focused on providing governments with better technology infrastructure.

Three facts from the story stand out:

“The U.S. government buys 10 eBays’ worth of stuff just to operate,” from software to heavy-duty trucking equipment.

…working with government might be a tortuous slog, but Bouganim says that he saw that behind that red tape lay a market that could be worth in the neighborhood of $500 billion a year.

What most people don’t realize is government spends nearly $74 billion on technology annually. As a point of comparison, the video game market is a $15 billion annual market.

See Nancy’s post for the full flavor of the story but it sounds like there is gold buried in government IT.

Another way to look at it is the government is already spending $74 billion a year on technology that is largely an object of mockery and mirth. Effective software may be sufficiently novel and threatening to either attract business or a buy-out.

While you are pondering possible opportunities, existing systems, their structures and data are “subjects” in topic map terminology. Which means topic maps can protect existing contracts and relationships, while delivering improved capabilities and data.

Promote topic maps as “in addition to” existing IT systems and you will encounter less resistance both from within and without the government.

Don’t be squeamish about associating with governments, of whatever side. Their money spends just like everyone else’s. You can ask At&T and IBM about supporting both sides in a conflict.

I first saw this in a tweet by Mike Bracken.

Army can’t track spending on $4.3b system to track spending, IG finds

Sunday, September 14th, 2014

Army can’t track spending on $4.3b system to track spending, IG finds. by Mark Flatten.

From the post:

More than $725 million was spent by the Army on a high-tech network for tracking supplies and expenses that failed to comply with federal financial reporting rules meant to allow auditors to track spending, according to an inspector general’s report issued Wednesday.

The Global Combat Support System-Army, a logistical support system meant to track supplies, spare parts and other equipment, was launched in 1997. In 2003, the program switched from custom software to a web-based commercial software system.

About $95 million was spent before the switch was made, according to the report from the Department of Defense IG.

As of this February, the Army had spent $725.7 million on the system, which is ultimately expected to cost about $4.3 billion.

The problem, according to the IG, is that the Army has failed to comply with a variety of federal laws that require agencies to standardize reporting and prepare auditable financial statements.

The report is full of statements like this one:

PMO personnel provided a system change request, which they indicated would correct four account attributes in July 2014. In addition, PMO personnel provided another system change request they indicated would correct the remaining account attribute (Prior Period Adjustment) in late FY 2015.

PMO = Project Management Office (in this case, of GCSS–Army).

The lack of identification of personnel speaking on behalf of the project or various offices pervades the report. Moreover, the same is true for twenty-seven (27) other reports on issues with this project.

If the sources of statements and information were identified in these reports, then it would be possible to track people across reports and to identify who has failed to follow up on representations made in the reports.

The first step towards accountability is identification of decision makers in audit reports.

Tracking decision makers from one position to another and linking them to specific decisions is a natural application of topic maps.

I first saw this in Links I Liked by Chris Blattman, September 7, 2014.

6,482 Datasets Available

Tuesday, August 26th, 2014

6,482 Datasets Available Across 22 Federal Agencies In Data.json Files by Kin Lane.

From the post:

It has been a few months since I ran any of my federal government data.json harvesting, so I picked back up my work, and will be doing more work around datasets that federal agnecies have been making available, and telling the stories across my network.

I’m still surprised at how many people are unaware that 22 of the top federal agencies have data inventories of their public data assets, available in the root of their domain as a data.json file. This means you can go to many and there is a machine readable list of that agencies current inventory of public datasets.

See Kin’s post for links to the agency data.json files.

You may also want to read: What Happened With Federal Agencies And Their Data.json Files, which details Kin’s earlier efforts with tracking agency data.json files.

Kin points out that these data.json files are governed by: OMB M-13-13 Open Data Policy—Managing Information as an Asset. It’s pretty joyless reading but if you are interested in the the policy details or the requirements agencies must meet, it’s required reading.

If you are looking for datasets to clean up or combine together, it would be hard to imagine a more diverse set to choose from.

FDA Recall Data

Wednesday, July 16th, 2014

OpenFDA Provides Ready Access to Recall Data by Taha A. Kass-Hout.

From the post:

Every year, hundreds of foods, drugs, and medical devices are recalled from the market by manufacturers. These products may be labeled incorrectly or might pose health or safety issues. Most recalls are voluntary; in some cases they may be ordered by the U.S. Food and Drug Administration. Recalls are reported to the FDA, and compiled into its Recall Enterprise System, or RES. Every week, the FDA releases an enforcement report that catalogues these recalls. And now, for the first time, there is an Application Programming Interface (API) that offers developers and researchers direct access to all of the drug, device, and food enforcement reports, dating back to 2004.

The recalls in this dataset provide an illuminating window into both the safety of individual products and the safety of the marketplace at large. Recent reports have included such recalls as certain food products (for not containing the vitamins listed on the label), a soba noodle salad (for containing unlisted soy ingredients), and a pain reliever (for not following laboratory testing requirements).

You will get warnings that this data is “not for clinical use.”

Sounds like a treasure trove of data if you are looking for products still being sold despite being recalled.

Or if you want to advertise for “victims” of faulty products that have been recalled.

I think both of those are non-clinical uses. ;-)

Free Companies House data to boost UK economy

Tuesday, July 15th, 2014

Free Companies House data to boost UK economy

From the post:

Companies House is to make all of its digital data available free of charge. This will make the UK the first country to establish a truly open register of business information.

As a result, it will be easier for businesses and members of the public to research and scrutinise the activities and ownership of companies and connected individuals. Last year (2013/14), customers searching the Companies House website spent £8.7 million accessing company information on the register.

This is a considerable step forward in improving corporate transparency; a key strand of the G8 declaration at the Lough Erne summit in 2013.

It will also open up opportunities for entrepreneurs to come up with innovative ways of using the information.

This change will come into effect from the second quarter of 2015 (April – June).

In a side bar, Business Secretary Vince Cable said in part:

Companies House is making the UK a more transparent, efficient and effective place to do business.

I’m not sure about “efficient,” but providing incentives for lawyers and others to track down insider trading and other business as usual practices and arming them with open data would be a start in the right direction.

I first saw this in a tweet by Hadley Beeman.


Wednesday, July 9th, 2014


From the webpage:

MuckRock is an open news tool powered by state and federal Freedom of Information laws and you: Requests are based on your questions, concerns and passions, and you are free to embed, share and write about any of the verified government documents hosted here. Want to learn more? Check out our about page. MuckRock has been funded in part by grants from the Sunlight Foundation, the Freedom of the Press Foundation and the Knight Foundation.

Join Our Mailing List »

An amazing site.

I found MuckRock while looking for documents released by mistake by DHS. DHS Releases Trove of Documents Related to Wrong “Aurora” in Response to Freedom of Information Act (FOIA) Request (Maybe the DHS needs a topic map?)

I’ve signed up for their mailing list. Thinking about what government lies I want to read. ;-)

Looks like a great place to use your data mining/analysis skills.