Archive for the ‘Government Data’ Category

The US Patent and Trademark Office should switch from documents to data

Sunday, February 15th, 2015

The US Patent and Trademark Office should switch from documents to data by Justin Duncan.

From the post:

The debate over patent reform — one of Silicon Valley’s top legislative priorities — is once again in focus with last week’s introduction of the Innovation Act (H.R. 9) by House Judiciary Committee Chairman Bob Goodlatte (R-Va.), Rep. Peter DeFazio (D-Ore.), Subcommittee on Courts, Intellectual Property, and the Internet Chairman Darrell Issa (R-Calif.) and Ranking Member Jerrold Nadler (D-N.Y.), and 15 other original cosponsors.

The Innovation Act largely takes aim at patent trolls (formally “non-practicing entities”), who use patent litigation as a business strategy and make money by threatening lawsuits against other companies. While cracking down on litigious patent trolls is important, that challenge is only one facet of what should be a larger context for patent reform.

The need to transform patent information into open data deserves some attention, too.

The United States Patent and Trademark Office (PTO), the agency within the Department of Commerce that grants patents and registers trademarks, plays a crucial role in empowering American innovators and entrepreneurs to create new technologies. Ironically, many of the PTO’s own systems and technologies are out of date.

Last summer, Data Transparency Coalition advisor Joel Gurin and his colleagues organized an Open Data Roundtable with the Department of Commerce, co-hosted by the Governance Lab at New York University (GovLab) and the White House Office of Science and Technology Policy (OSTP). The roundtable focused on ways to improve data management, dissemination, and use at the Department of Commerce. It shed some light on problems faced by the PTO.

According to GovLab’s report of the day’s findings and recommendations, the PTO is currently working to improve the use and availability of some patent data by putting it in a more centralized, easily searchable form.

To make patent applications easier to navigate – for inventors, investors, the public, and the agency itself – the PTO should more fully embrace the use of structured data formats, like XML, to express the information currently collected as PDFs or text documents.

Justin’s post is a brief history of efforts to improve access to patent and trademark information, mostly focusing on the need for the USPTO (US Patent and Trademark Office) to stop relying on PDF as its default format.
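To make the PDF versus structured data point concrete, here is a minimal sketch of what a structured record buys you. The element names (patent, expiration-date, claim) are assumptions made up for illustration, not the USPTO's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the element names are illustrative only,
# not the USPTO's actual schema.
SAMPLE = """
<patent>
  <number>0000000</number>
  <title>Example widget</title>
  <expiration-date>2030-01-15</expiration-date>
  <claims>
    <claim id="1">A widget comprising a frame.</claim>
    <claim id="2">The widget of claim 1, further comprising a handle.</claim>
  </claims>
</patent>
"""

def summarize(xml_text):
    """Pull a few named fields out of a structured patent record."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("title"),
        "expires": root.findtext("expiration-date"),
        # Map each claim id to its text.
        "claims": {c.get("id"): c.text for c in root.iter("claim")},
    }

record = summarize(SAMPLE)
print(record["title"], record["expires"], len(record["claims"]))
```

With a PDF, the same expiration date or claim text has to be recovered by text extraction and guesswork. With named elements, it is a lookup.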

Other potential improvements:

Additional GovLab recommendations included:

  • PTO [should] make more information available about the scope of patent rights, including expiration dates, or decisions by the agency and/or courts about patent claims.
  • PTO should add more context to its data to make it usable by non-experts – e.g. trademark transaction data and trademark assignment.
  • Provide Application Programming Interfaces (APIs) to enable third parties to build better interfaces for the existing legacy systems. Access to Patent Application Information Retrieval (PAIR) and Patent Trial and Appeal Board (PTAB) data are most important here.
  • Improve access to Cooperative Patent Classification (CPC)/U.S. Patent Classification (USPC) harmonization data; tie this data more closely to economic data to facilitate analysis.

Tying in related information, as in the first and last recommendations on the GovLab list, is another step in the right direction.

But only a step.

If you have ever searched the USPTO patent database, you know that making the data “searchable” is only a nod and a wink towards accessibility. Making the data searchable is nothing to sneeze at, but USPTO reform should have a higher target than simply being “searchable.”

Outside of patent search specialists (and not all of them), what ordinary citizen is going to be able to navigate the terms of art across domains when searching patents?

The USPTO should go beyond making patents literally “searchable” and instead make patents “reliably” searchable. By “reliable” searching I mean searching that returns all the relevant patents. A safe harbor, if you will, that protects inventors, investors and implementers from costly suits arising out of the murky wood filled with traps, intellectual quicksand and formulaic chants that is the USPTO patent database.
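"Returns all the relevant patents" is what information retrieval calls recall. A minimal sketch of the measure, using made-up patent identifiers and assuming someone has already judged which patents are relevant (the hard part in practice):

```python
def recall(returned, relevant):
    """Fraction of the relevant patents that the search actually returned."""
    relevant = set(relevant)
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(relevant & set(returned)) / len(relevant)

# Hypothetical patent identifiers, for illustration only.
relevant = {"P1", "P2", "P3", "P4"}
returned = ["P1", "P3", "P9"]

print(recall(returned, relevant))  # 2 of the 4 relevant patents found: 0.5
```

A "reliable" search in the sense used above is one whose recall is 1.0 for every query, which is a far higher bar than merely returning some results.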

I first saw this in a tweet by Joel Gurin.

Federal Spending Data Elements

Sunday, February 15th, 2015

Federal Spending Data Elements

From the webpage:

The data elements in the below list represent the existing Federal Funding Accountability and Transparency Act (FFATA) data elements currently displayed on and the additional data elements that will be posted pursuant to the DATA Act. These elements are currently being deliberated on and discussed by the Federal community as a part of DATA Act implementation. At this point, this list is exhaustive. However, additional data elements may be standardized for transparency reporting in the future based on agency or community needs.

Join the Conversation

At this time, we are asking for comments in response to the following questions:

  1. Which data elements are most crucial to your current reporting and/or analysis?
  2. In setting standards, what are industry standards the Treasury and OMB should be considering?
  3. What are some of the considerations that Treasury and OMB should take into account when establishing data standards?

Just reading the responses to the questions on GitHub will give you a sense of what other community members are thinking about.

What responses are you going to contribute?

I first saw this in a tweet by Hudson Hollister.

Mercury [March 5, 2015, Washington, DC]

Saturday, February 14th, 2015

Mercury Registration Deadline: February 17, 2015.

From the post:

The Intelligence Advanced Research Projects Activity (IARPA) will host a Proposers’ Day Conference for the Mercury Program on March 5, in anticipation of the release of a new solicitation in support of the program. The Conference will be held from 8:30 AM to 5:00 PM EST in the Washington, DC metropolitan area. The purpose of the conference will be to provide introductory information on Mercury and the research problems that the program aims to address, to respond to questions from potential proposers, and to provide a forum for potential proposers to present their capabilities and identify potential team partners.

Program Description and Goals

Past research has found that publicly available data can be used to accurately forecast events such as political crises and disease outbreaks. However, in many cases, relevant data are not available, have significant lag times, or lack accuracy. Little research has examined whether data from foreign Signals Intelligence (SIGINT) can be used to improve forecasting accuracy in these cases.

The Mercury Program seeks to develop methods for continuous, automated analysis of SIGINT in order to anticipate and/or detect political crises, disease outbreaks, terrorist activity, and military actions. Anticipated innovations include: development of empirically driven sociological models for population-level behavior change in anticipation of, and response to, these events; processing and analysis of streaming data that represent those population behavior changes; development of data extraction techniques that focus on volume, rather than depth, by identifying shallow features of streaming SIGINT data that correlate with events; and development of models to generate probabilistic forecasts of future events. Successful proposers will combine cutting-edge research with the ability to develop robust forecasting capabilities from SIGINT data.

Mercury will not fund research on U.S. events, or on the identification or movement of specific individuals, and will only leverage existing foreign SIGINT data for research purposes.

The Mercury Program will consist of both unclassified and classified research activities and expects to draw upon the strengths of academia and industry through collaborative teaming. It is anticipated that teams will be multidisciplinary, and might include social scientists, mathematicians, statisticians, computer scientists, content extraction experts, information theorists, and SIGINT subject matter experts with applied experience in the U.S. SIGINT System.

Attendees must register no later than 6:00 pm EST, February 27, 2015 at Directions to the conference facility and other materials will be provided upon registration. No walk-in registrations will be allowed.

I might be interested if you can hide me under a third or fourth level sub-contractor. ;-)

Seriously, it isn’t that I despair of the legitimate missions of intelligence agencies but I do despise waste on ways known to not work. Government funding, even unlimited funding, isn’t going to magically confer the correct semantics on data or enable analysts to meaningfully share their work products across domains.

You would think that going on fourteen (14) years post-9/11 without being one step closer to preventing a similar event would be a “wake-up” call to someone. If not in the U.S. intelligence community, perhaps in intelligence communities who tire of aping the U.S. community with no better results.

OpenGov Voices: Bringing transparency to earmarks buried in the budget

Saturday, February 14th, 2015

OpenGov Voices: Bringing transparency to earmarks buried in the budget by Matthew Heston, Madian Khabsa, Vrushank Vora, Ellery Wulczyn and Joe Walsh.

From the post:

Last week, President Obama kicked off the fiscal year 2016 budget cycle by unveiling his $3.99 trillion budget proposal. Congress has the next eight months to write the final version, leaving plenty of time for individual senators and representatives, state and local governments, corporate lobbyists, bureaucrats, citizens groups, think tanks and other political groups to prod and cajole for changes. The final bill will differ from Obama’s draft in major and minor ways, and it won’t always be clear how those changes came about. Congress will reveal many of its budget decisions after voting on the budget, if at all.

We spent this past summer with the Data Science for Social Good program trying to bring transparency to this process. We focused on earmarks – budget allocations to specific people, places or projects – because they are “the best known, most notorious, and most misunderstood aspect of the congressional budgetary process” — yet remain tedious and time-consuming to find. Our goal: to train computers to extract all the earmarks from the hundreds of pages of mind-numbing legalese and numbers found in each budget.

Watchdog groups such as Citizens Against Government Waste and Taxpayers for Common Sense have used armies of human readers to sift through budget documents, looking for earmarks. The White House Office of Management and Budget enlisted help from every federal department and agency, and the process still took three months. In comparison, our software is free and transparent and generates similar results in only 15 minutes. We used the software to construct the first publicly available database of earmarks that covers every year back to 1995.

Despite our success, we barely scratched the surface of the budget. Not only do earmarks comprise a small portion of federal spending but senators and representatives who want to hide the money they budget for friends and allies have several ways to do it:

I was checking the Sunlight Foundation Blog for any updated information on the soon to be released indexes of federal data holdings when I encountered this jewel on earmarks.

Important to read/support because:

  1. By dramatically reducing the human time investment to find earmarks, it frees up that time to be spent gathering deeper information about each earmark.
  2. It represents a major step forward in the ability to discover relationships between players in the data (what the NSA wants to do but with a rationally chosen data set).
  3. It will educate you on earmarks and their hiding places.
  4. It is an inspirational example of how darkness can be replaced with transparency, some of it anyway.

Will transparency reduce earmarks? I rather doubt it because a sense of shame doesn’t seem to motivate elected and appointed officials.

What transparency can do is create a more level playing field for those who want to buy government access and benefits.

For example, if I knew what it cost to have the following exemption in the FOIA:

Exemption 9: Geological information on wells.

it might be possible to raise enough funds to purchase the deletion of:

Exemption 5: Information that concerns communications within or between agencies which are protected by legal privileges, that include but are not limited to:

3. Deliberative Process Privilege

Which is where some staffers hide their negotiations with former staffers as they prepare to exit the government.

I don’t know that matching what Big Oil paid for the geological information on wells exemption would be enough but it would set a baseline for what it takes to start the conversation.

I say “Big Oil paid…” assuming that most of us don’t equate matters of national security with geological information. Do you have another explanation for such an offbeat provision?

If government is (and I think it is) for sale, then let’s open up the bidding process.

A big win for open government: Sunlight gets U.S. to…

Saturday, February 14th, 2015

A big win for open government: Sunlight gets U.S. to release indexes of federal data by Matthew Rumsey, Sean Vitka and John Wonderlich.

From the post:

For the first time, the United States government has agreed to release what we believe to be the largest index of government data in the world.

On Friday, the Sunlight Foundation received a letter from the Office of Management and Budget (OMB) outlining how they plan to comply with our FOIA request from December 2013 for agency Enterprise Data Inventories. EDIs are comprehensive lists of a federal agency’s information holdings, providing an unprecedented view into data held internally across the government. Our FOIA request was submitted 14 months ago.

These lists of the government’s data were not public, however, until now. More than a year after Sunlight’s FOIA request and with a lawsuit initiated by Sunlight about to be filed, we’re finally going to see what data the government holds.

Since 2013, federal agencies have been required to construct a list of all of their major data sets, subject only to a few exceptions detailed in President Obama’s executive order as well as some information exempted from disclosure under the FOIA.

Many kudos to the Sunlight Foundation!

As to using the word “win,” do we need to wait and see what Enterprise Data Inventories are in fact produced?

I say that because the executive order of President Obama that is cited in the post provides these exemptions from disclosure:

4 (d) Nothing in this order shall compel or authorize the disclosure of privileged information, law enforcement information, national security information, personal information, or information the disclosure of which is prohibited by law.

Will that be taken as an excuse to not list the data collections at all?

Or, will the NSA say:

one (1) collection of telephone metadata, timeSpan: 4 (d) exempt, size: 4 (d) exempt, metadataStructure: 4 (d) exempt, source: 4 (d) exempt

Do they mean internal NSA phone logs? Do they mean some other source?

Or will they simply not list telephone metadata at all?

What’s exempt under FOIA? (From

Not all records can be released under the FOIA.  Congress established certain categories of information that are not required to be released in response to a FOIA request because release would be harmful to governmental or private interests.   These categories are called "exemptions" from disclosures.  Still, even if an exemption applies, agencies may use their discretion to release information when there is no foreseeable harm in doing so and disclosure is not otherwise prohibited by law.  There are nine categories of exempt information and each is described below.  

Exemption 1: Information that is classified to protect national security.  The material must be properly classified under an Executive Order.

Exemption 2: Information related solely to the internal personnel rules and practices of an agency.

Exemption 3: Information that is prohibited from disclosure by another federal law. Additional resources on the use of Exemption 3 can be found on the Department of Justice FOIA Resources page.

Exemption 4: Information that concerns business trade secrets or other confidential commercial or financial information.

Exemption 5: Information that concerns communications within or between agencies which are protected by legal privileges, that include but are not limited to:

  1. Attorney-Work Product Privilege
  2. Attorney-Client Privilege
  3. Deliberative Process Privilege
  4. Presidential Communications Privilege

Exemption 6: Information that, if disclosed, would invade another individual’s personal privacy.

Exemption 7: Information compiled for law enforcement purposes if one of the following harms would occur.  Law enforcement information is exempt if it: 

  • 7(A). Could reasonably be expected to interfere with enforcement proceedings
  • 7(B). Would deprive a person of a right to a fair trial or an impartial adjudication
  • 7(C). Could reasonably be expected to constitute an unwarranted invasion of personal privacy
  • 7(D). Could reasonably be expected to disclose the identity of a confidential source
  • 7(E). Would disclose techniques and procedures for law enforcement investigations or prosecutions
  • 7(F). Could reasonably be expected to endanger the life or physical safety of any individual

Exemption 8: Information that concerns the supervision of financial institutions.

Exemption 9: Geological information on wells.

And the exclusions:

Congress has provided special protection in the FOIA for three narrow categories of law enforcement and national security records. The provisions protecting those records are known as “exclusions.” The first exclusion protects the existence of an ongoing criminal law enforcement investigation when the subject of the investigation is unaware that it is pending and disclosure could reasonably be expected to interfere with enforcement proceedings. The second exclusion is limited to criminal law enforcement agencies and protects the existence of informant records when the informant’s status has not been officially confirmed. The third exclusion is limited to the Federal Bureau of Investigation and protects the existence of foreign intelligence or counterintelligence, or international terrorism records when the existence of such records is classified. Records falling within an exclusion are not subject to the requirements of the FOIA. So, when an office or agency responds to your request, it will limit its response to those records that are subject to the FOIA.

You can spot, as well as I can, the truck-sized holes that may prevent disclosure.

One analytic challenge upon the release of the Enterprise Data Inventories will be to determine what is present and what is missing but should be present. Another will be to assist the Sunlight Foundation in its pursuit of additional FOIAs to obtain data listed but not available. Perhaps I should call this an important victory, although in a battle and not in the long-term war for government transparency.
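The "listed but not available" half of that challenge is a set comparison once both lists are in hand. A rough sketch, assuming each list can be reduced to dataset titles; the titles below are made up, and real inventories would need far more careful matching than lowercasing:

```python
def compare_inventories(inventory, published):
    """Compare dataset titles in an agency inventory with those publicly available."""
    def norm(titles):
        # Crude normalization; real titles would need fuzzier matching.
        return {t.strip().lower() for t in titles}

    inv, pub = norm(inventory), norm(published)
    return {
        # Candidates for follow-up FOIA requests.
        "listed_but_unavailable": sorted(inv - pub),
        # Gaps in the inventory itself.
        "available_but_unlisted": sorted(pub - inv),
    }

# Hypothetical dataset titles, for illustration only.
inventory = ["Grant Awards 2014", "Facility Inspections", "Call Center Logs"]
published = ["grant awards 2014", "Budget Summaries"]

report = compare_inventories(inventory, published)
print(report)
```

What this cannot find is data that appears in neither list. Spotting what is missing but should be present takes outside knowledge of what an agency collects.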


FBI Records: The Vault

Wednesday, February 11th, 2015

FBI Records: The Vault

From the webpage:

The Vault is our new FOIA Library, containing 6,700 documents and other media that have been scanned from paper into digital copies so you can read them in the comfort of your home or office. 

Included here are many new FBI files that have been released to the public but never added to this website; dozens of records previously posted on our site but removed as requests diminished; files from our previous FOIA Library, and new, previously unreleased files.

The Vault includes several new tools and resources for your convenience:

  • Searching for Topics: You can browse or search for specific topics or persons (like Al Capone or Marilyn Monroe) by viewing our alphabetical listing, by using the search tool in the upper right of this site, or by checking the different category lists that can be found in the menu on the right side of this page. In the search results, click on the folder to see all of the files for that particular topic.
  • Searching for Key Words: Thanks to new technology we have developed, you can now search for key words or phrases within some individual files. You can search across all of our electronic files by using the search tool in the upper right of this site, or you can search for key words within a specific document by typing in terms in the search box in the upper right hand of the file after it has been opened and loaded. Note: since many of the files include handwritten notes or are not always in optimal condition due to age, this search feature does not always work perfectly.
  • Viewing the Files: We are now using an open source web document viewer, so you no longer need your own file software to view our records. When you click on a file, it loads in a reader that enables you to view one or two pages at a time, search for key words, shrink or enlarge the size of the text, use different scroll features, and more. In many cases, the quality and clarity of the individual files has also been improved.
  • Requesting a Status Update: Use our new Check the Status of Your FOI/PA Request tool to determine where your request stands in our process. Status information is updated weekly. Note: You need your FOI/PA request number to use this feature.

Please note: the content of the files in the Vault encompasses all time periods of Bureau history and do not always reflect the current views, policies, and priorities of the FBI.

New files will be added on a regular basis, so please check back often.

This may be meant as a distraction, but from what I don’t know.

I suppose there is some value in knowing that ineffectual law enforcement investigations did not begin with 9/11.

Encouraging open data usage…

Saturday, February 7th, 2015

Encouraging open data usage by commercial developers: Report

From the post:

The second Share-PSI workshop was very different from the first. Apart from presentations in two short plenary sessions, the majority of the two days was spent in facilitated discussions around specific topics. This followed the success of the bar camp sessions at the first workshop, that is, sessions proposed and organised in an ad hoc fashion, enabling people to discuss whatever subject interests them.

Each session facilitator was asked to focus on three key questions:

  1. What X is the thing that should be done to publish or reuse PSI?
  2. Why does X facilitate the publication or reuse of PSI?
  3. How can one achieve X and how can you measure or test it?

This report summarises the 7 plenary presentations, 17 planned sessions and 7 bar camp sessions. As well as the Share-PSI project itself, the workshop benefited from sessions led by 8 other projects. The agenda for the event includes links to all papers, slides and notes, with many of those notes being available on the project wiki. In addition, the #sharepsi tweets from the event are archived, as are a number of photo albums from Makx Dekkers, Peter Krantz and José Luis Roda. The event received a generous write up on the host’s Web site (in Portuguese). The spirit of the event is captured in this video by Noël Van Herreweghe of CORVe.

To avoid confusion, PSI in this context means Public Sector Information, not Published Subject Identifier (PSI).

Amazing coincidence that the W3C has smudged yet another name. You may recall the W3C decided to confuse URIs and IRIs in its latest attempt to re-write history, calling both by the acronym URI:

Within this specification, the term URI refers to a Universal Resource Identifier as defined in [RFC 3986] and extended in [RFC 2987] [RFC 3987] with the new name IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as “Base URI” that are defined or referenced across the whole family of XML specifications. (Corrected the RFC listing as shown.) (XQuery and XPath Data Model 3.1 , N. Walsh, J. Snelson, Editors, W3C Candidate Recommendation (work in progress), 18 December 2014, . Latest version available at

Interesting discussion but I would pay very close attention to market demand, perhaps I should say, commercial market demand, before planning a start-up based on government data. There is unlimited demand for free data or even better, free enhanced data, but that should not be confused with enhanced data that can be sold to support a start-up on an ongoing basis.

To give you an idea of the uncertainty of conditions for start-ups relying on open data, let me quote the final bullet points of this article:

  • There is a lack of knowledge of what can be done with open data which is hampering uptake.
  • There is a need for many examples of success to help show what can be done.
  • Any long term re-use of PSI must be based on a business plan.
  • Incubators/accelerators should select projects to support based on the business plan.
  • Feedback from re-users is an important component of the ecosystem and can be used to enhance metadata.
  • The boundary between what the public and private sectors can, should and should not do needs to be better defined to allow the public sector to focus on its core task and businesses to invest with confidence.
  • It is important to build an open data infrastructure, both legal and technical, that supports the sharing of PSI as part of normal activity.
  • Licences and/or rights statements are essential and should be machine readable. This is made easier if the choice of licences is minimised.
  • The most valuable data is the data that the public sector already charges for.
  • Include domain experts who can articulate real problems in hackathons (whether they write code or not).
  • Involvement of the user community and timely response to requests is essential.
  • There are valid business models that should be judged by their effectiveness and/or social impact rather than financial gain.

Just so you know, that last point:

There are valid business models that should be judged by their effectiveness and/or social impact rather than financial gain.

that is not a business model, unless you have renewal financing from some source other than financial gain. That is a charity model where you are the object of the charity.

Forty and Seven Inspector Generals Hit a Stone Wall

Thursday, February 5th, 2015

Inspectors general testify against agency ‘stonewalling’ before Congress by Sarah Westwood.

From the post:

Frustration with federal agencies that block probes from their inspectors general bubbled over Tuesday in a congressional hearing that dug into allegations of obstruction from a number of government watchdogs.

The Peace Corps, Environmental Protection Agency and Justice Department inspectors general each argued to members of the House Oversight and Government Reform Committee that some of their investigations had been thwarted or stalled by officials who refused to release necessary information to their offices.

Committee members from both parties doubled down on criticisms of the Justice Department’s lack of transparency and called for solutions to the government-wide problem during their first official hearing of the 114th Congress.

“If you can’t do your job, then we can’t do our job in Congress,” Chairman Jason Chaffetz, R-Utah, told the three witnesses and the scores of agency watchdogs who also attended, including the Department of Homeland Security and General Service Administration inspectors general.

Michael Horowitz, the Justice Department’s inspector general, testified that the FBI began reviewing requested documents in 2010 in what he said was a clear violation of federal law that is supposed to grant watchdogs unfettered access to agency records.

The FBI’s process, which involves clearing the release of documents with the attorney general or deputy attorney general, “seriously impairs inspector general independence, creates excessive delays, and may lead to incomplete, inaccurate or significantly delayed findings or recommendations,” Horowitz said.

Perhaps no surprise that the FBI shows up in the non-transparency column. But given the number of inspectors general with similar problems (47), it seems to be part of a larger herd.

If you are interested in going further into this issue, there was a hearing last August (2014), Obstructing Oversight: Concerns from Inspectors General, which is here in ASCII and here with video and witness statements in PDF.

Both sources omit the following documents:

Sept. 9, 2014, letter to Chairman Issa from OMB, submitted by Chairman Issa (p. 58)
Aug. 5, 2014, letter to Reps. Issa, Cummings, Carper, and Coburn from 47 IGs, submitted by Rep. Chaffetz (p. 61)
Aug. 8, 2014, letter to OMB from Reps. Carper, Coburn, Issa and Cummings, submitted by Rep. Walberg (p. 69)
Statement for the record from The Institute of Internal Auditors (p. 71)

Isn’t that rather lame? To leave these items in the table of contents but to omit them from the ASCII version and to not even include them with the witness statements.

I’m curious who the other forty-four (44) inspectors general might be. Aren’t you?

If you know where to find these appendix materials, please send me a pointer.

I think it will be more effective to list all of the inspectors general who have encountered this stone wall treatment than to treat them as all and sundry.

Chairman Jason Chaffetz suggests that by controlling funding Congress can force transparency. I would use a finer knife. Cut all funding for health care and retirement benefits in the agencies/departments in question. See how the rank and file in the agencies like them apples.

Assuming transparency results, I would not restore those benefits retroactively. Staff chose to support, explicitly or implicitly, illegal behavior. Making bad choices has negative consequences. It would be a teaching opportunity for all future federal staff members.

[U.S.] President’s Fiscal Year 2016 Budget

Wednesday, February 4th, 2015

Data for The President’s Fiscal Year 2016 Budget

From the webpage:

Each year, after the President’s State of the Union address, the Office of Management and Budget releases the Administration’s Budget, offering proposals on key priorities and newly announced initiatives. This year we are releasing all of the data included in the President’s Fiscal Year 2016 Budget in a machine-readable format here on GitHub. The Budget process should be a reflection of our values as a country, and we think it’s important that members of the public have as many tools at their disposal as possible to see what is in the President’s proposals. And, if they’re motivated to create their own visualizations or products from the data, they should have that chance as well.

You can see the full Budget on Medium.

About this Repository

This repository includes three data files that contain an extract of the Office of Management and Budget (OMB) budget database. These files can be used to reproduce many of the totals published in the Budget and examine unpublished details below the levels of aggregation published in the Budget.

The user guide file contains detailed information about this data, its format, and its limitations. In addition, OMB provides additional data tables, explanations and other supporting documents in XLS format on its website.

Feedback and Issues

Please submit any feedback or comments on this data, or the Budget process here.

Before you start cheering too loudly, spend a few minutes with the User Guide. Not impenetrable but not an easy stroll either. I suspect the additional data tables, etc. are going to be necessary for interpretation of the main files.

Writing up how to use this data set would be a large but worthwhile undertaking.

A larger in scope but also worthwhile project would be to track how the initial allocations in the budget change through the legislative process. That is, to know on a day-to-day basis which departments, programs, etc. are up or down. Tied to votes in Congress and particular amendments, that could prove to be very interesting.

Update: A tweet from Aaron Kirschenfeld directed us to: The U.S. Tax Code Is a Travesty by John Cassidy. Cassidy says to take a look at table S-9 in the numbers section under “Loophole closers.” The trick to the listed loopholes is that very few people qualify for the loophole. See Cassidy’s post for the details.

Other places that merit special attention?

Update: DHS Budget Justification 2016 (3906 pages, PDF). First saw this in a tweet by Dave Maass.

Project Blue Book Collection (UFO’s)

Thursday, January 22nd, 2015

Project Blue Book Collection

From the webpage:

This site was created by The Black Vault to house 129,491 pages, comprising of more than 10,000 cases of the Project Blue Book, Project Sign and Project Grudge files declassifed. Project Blue Book (along with Sign and Grudge) was the name that was given to the official investigation by the United States military to determine what the Unidentified Flying Object (UFO) phenomena was. It lasted from 1947 – 1969. Below you will find the case files compiled for research, and available free to download.

The CNN report Air Force UFO files land on Internet by Emanuella Grinberg reports Roswell is omitted from these files.

You won’t find anything new here; the files have been available on microfilm for years. But being searchable and on the Internet is a step forward in terms of accessibility.

When I say “searchable,” the site notes:

1) A search is a good start — but is not 100% – There are more than 10,000 .pdf files here and although all of them are indexed in the search engine, the quality of the original documents, given the fact that many of them are more than 6 decades old, is very poor. This means that when they are converted to text for searching, many of the words are not readable to a computer. As a tip: make your search as basic as possible. Searching for a location? Just search a city, then the state, to see what comes up. Searching for a type of UFO? Use “saucer” vs. “flying saucer” or longer expression. It will increase the chances of finding what you are looking for.

2) The text may look garbled on the search results page (but not the .pdf!) – This is normal. For the same reason above… converting a sentence that may read ok to the human eye, may be gibberish to a computer due to the quality of the decades old state of many of the records. Don’t let that discourage you. Load the .PDF and see what you find. If you searched for “Hollywood” and a .pdf hit came up for Rome, New York, there is a reason why. The word “Hollywood” does appear in the file…so check it out!

3) Not everything was converted to .pdfs – There are a few case files in the Blue Book system that were simply too large to convert. They are:

undated/xxxx-xx-9667997-[BLANK][ 8,198 Pages ]
undated/xxxx-xx-9669100-[ILLEGIBLE]-[ILLEGIBLE]-/ [ 1,450 Pages ]
undated/xxxx-xx-9669191-[ILLEGIBLE]/ [ 3,710 Pages ]

These files will be sorted at a later date. If you are interested in helping, please email

I tried to access the files not yet processed but was redirected. I will see what is required to see the not yet processed files.

If you are interested in trying your skills at PDF conversion/improvement, the main data set should be more than sufficient.

If you are interested in automatic discovery of what or who was blacked out of government reports, this is also an interesting data set. Personally I think blacking out passages should be forbidden. People should have to accept the consequences of their actions, good or bad. We require that of citizens, why not government staff?

I assume crowd sourcing corrections has already been considered. 130K pages is a fairly small number when it comes to crowd sourcing. Surely there are more than 10,000 people interested in the data set, which would be about 13 pages each. If each one did 100 pages, every page would be corrected seven or eight times, more than enough overlap to do statistics to choose the best corrections.
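The arithmetic behind that estimate, plus one simple way to "do statistics" on overlapping corrections, can be sketched as follows. This is only an illustration: the volunteer count and workload are the guesses made above, and `best_correction` is a hypothetical helper that uses plain majority voting, not anything The Black Vault is known to run.

```python
from collections import Counter

TOTAL_PAGES = 129_491   # page count quoted from the site
VOLUNTEERS = 10_000     # guess at the number of interested readers
PAGES_EACH = 100        # assumed workload per volunteer

# Share per volunteer if every page is corrected exactly once.
pages_per_volunteer = TOTAL_PAGES / VOLUNTEERS

# Average number of independent corrections per page at 100 pages each.
reviews_per_page = VOLUNTEERS * PAGES_EACH / TOTAL_PAGES


def best_correction(candidates):
    """Return the most common transcription and the share of votes it won.

    Hypothetical helper: simple majority voting over the transcriptions
    that different volunteers submitted for the same passage.
    """
    text, votes = Counter(candidates).most_common(1)[0]
    return text, votes / len(candidates)
```

With seven or so independent transcriptions per page, a low winning share from `best_correction` would flag passages that still need a human look.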

For those of you who see patterns in UFO reports, a good way to reach across the myriad sightings and reports would be to topic map the entire collection.

Personally I suspect at least some of the reports do concern alien surveillance and the absence in the intervening years indicates they have lost interest. Given our performance since the 1940’s, that’s not hard to understand.

Key Court Victory Closer for IRS Open-Records Activist

Friday, January 16th, 2015

Key Court Victory Closer for IRS Open-Records Activist by Suzanne Perry.

From the post:

The open-records activist Carl Malamud has moved a step closer to winning his legal battle to give the public greater access to the wealth of information on Form 990 tax returns that nonprofits file.

During a hearing in San Francisco on Wednesday, U.S. District Judge William Orrick said he tentatively planned to rule in favor of Mr. Malamud’s group, Public.Resource.Org, which filed a lawsuit to force the Internal Revenue Service to release nonprofit tax forms in a format that computers can read. That would make it easier to conduct online searches for data about organizations’ finances, governance, and programs.

“It looks like a win for Public.Resource and for the people who care about electronic access to public documents,” said Thomas Burke, the group’s lawyer.

The suit asks the IRS to release Forms 990 in machine-readable format for nine nonprofits that had submitted their forms electronically. Under current practice, the IRS converts all Forms 990 to unsearchable image files, even those that have been filed electronically.

That’s a step in the right direction but not all that will be required.

Suzanne goes on to note that the IRS removes donor lists from the 990 forms.

Any number of organizations will object but I think the donor lists should be public information as well.

Making all donors public may discourage some people from donating to unpopular causes but that’s a hit I would be willing to take to know who owns the political non-profits. And/or who funds the NRA for example.

Data that isn’t open enough to know who is calling the shots at organizations isn’t open data; it’s an open data tease.

What Counts: Harnessing Data for America’s Communities

Friday, January 16th, 2015

What Counts: Harnessing Data for America’s Communities Senior Editors: Naomi Cytron, Kathryn L.S. Pettit, & G. Thomas Kingsley. (new book, free pdf)

From: A Roadmap: How To Use This Book

This book is a response to the explosive interest in and availability of data, especially for improving America’s communities. It is designed to be useful to practitioners, policymakers, funders, and the data intermediaries and other technical experts who help transform all types of data into useful information. Some of the essays—which draw on experts from community development, population health, education, finance, law, and information systems—address high-level systems-change work. Others are immensely practical, and come close to explaining “how to.” All discuss the incredibly exciting opportunities and challenges that our ever-increasing ability to access and analyze data provide.

As the book’s editors, we of course believe everyone interested in improving outcomes for low-income communities would benefit from reading every essay. But we’re also realists, and know the demands of the day-to-day work of advancing opportunity and promoting well-being for disadvantaged populations. With that in mind, we are providing this roadmap to enable readers with different needs to start with the essays most likely to be of interest to them.

For everyone, but especially those who are relatively new to understanding the promise of today’s data for communities, the opening essay is a useful summary and primer. Similarly, the final essay provides both a synthesis of the book’s primary themes and a focus on the systems challenges ahead.

Section 2, Transforming Data into Policy-Relevant Information (Data for Policy), offers a glimpse into the array of data tools and approaches that advocates, planners, investors, developers and others are currently using to inform and shape local and regional processes.

Section 3, Enhancing Data Access and Transparency (Access and Transparency), should catch the eye of those whose interests are in expanding the range of data that is commonly within reach and finding ways to link data across multiple policy and program domains, all while ensuring that privacy and security are respected.

Section 4, Strengthening the Validity and Use of Data (Strengthening Validity), will be particularly provocative for those concerned about building the capacity of practitioners and policymakers to employ appropriate data for understanding and shaping community change.

The essays in section 5, Adopting More Strategic Practices (Strategic Practices), examine the roles that practitioners, funders, and policymakers all have in improving the ways we capture the multi-faceted nature of community change, communicate about the outcomes and value of our work, and influence policy at the national level.

There are of course interconnections among the essays in each section. We hope that wherever you start reading, you’ll be inspired to dig deeper into the book’s enormous richness, and will join us in an ongoing conversation about how to employ the ideas in this volume to advance policy and practice.

Thirty-one (31) essays by dozens of authors on data and its role in public policy making.

From the acknowledgements:

This book is a joint project of the Federal Reserve Bank of San Francisco and the Urban Institute. The Robert Wood Johnson Foundation provided the Urban Institute with a grant to cover the costs of staff and research that were essential to this project. We also benefited from the field-building work on data from Robert Wood Johnson grantees, many of whom are authors in this volume.

If you are pitching data and/or data projects where the Federal Reserve Bank of San Francisco/Urban Institute set the tone of policy making conversations, a must read. It is likely to have an impact on other policy discussions, but adjusted for local concerns and conventions. You could also use it to shape your local policy discussions.

I first saw this in There is no seamless link between data and transparency by Jennifer Tankard.

Open Addresses

Thursday, January 15th, 2015

Open Addresses

From the homepage:

At Open Addresses, we are bringing together information about the places where we live, work and go about our daily lives. By gathering information provided to us by people about their own addresses, and from open sources on the web, we are creating an open address list for the UK, available to everyone.

Do you want to enter our photography competition?

Or do you want to get involved by submitting an address?

It’s as simple as entering it below.

Addresses are a vital part of the UK’s National Information Infrastructure. Open Addresses will be used by a whole range of individuals and organisations (academics, charities, public sector and private sector). By having accurate information about addresses, we’ll all benefit from getting more of the things we want, and less of the things we don’t.

Datasets as of 10 December 2014 are available for download now. Via BitTorrent so I assume the complete datasets are fairly large. Anyone downloaded them?

If you do download all or part of the records, I am curious: what other public data sets would you combine with them?

SODA Developers

Wednesday, January 14th, 2015

SODA Developers

From the webpage:

The Socrata Open Data API allows you to programmatically access a wealth of open data resources from governments, non-profits, and NGOs around the world.

I have mentioned Socrata and their Open Data efforts more than once on this blog but I don’t think I have ever pointed to their developer site.

Very much worth spending time here if you are interested in governmental data.
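To give a flavor of what "programmatic access" looks like, here is a minimal sketch that builds a SODA query URL. SODA exposes each dataset at a `/resource/<id>.json` path and takes SoQL parameters such as `$limit` and `$where`. The domain `data.example.gov` and dataset id `abcd-1234` below are placeholders I made up, and `soda_url` is my own helper, not part of any Socrata library.

```python
from urllib.parse import urlencode


def soda_url(domain, dataset_id, **soql):
    """Build a SODA resource URL; keyword arguments become SoQL parameters.

    For example, limit=10 becomes $limit=10. The domain and dataset id
    are supplied by the caller and are hypothetical in the usage below.
    """
    params = {f"${name}": value for name, value in soql.items()}
    query = urlencode(params, safe="$")  # keep the "$" prefix readable
    base = f"https://{domain}/resource/{dataset_id}.json"
    return f"{base}?{query}" if query else base


# Placeholder domain and dataset id, not a real endpoint.
example = soda_url("data.example.gov", "abcd-1234", limit=10)
```

The resulting URL can be fetched with any HTTP client. Check the developer site for the parameters a given dataset actually supports.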

Not that I take any data, government or otherwise, at face value. Data is created and released/leaked for reasons that may or may not coincide with your assumptions or goals. Access to data is just the first step in uncovering whose interests the data represents.

Project Open Data Dashboard

Sunday, January 4th, 2015

Project Open Data Dashboard

From the about page:

This website shows how Federal agencies are performing on the latest Open Data Policy (M-13-13) using the guidance provided by Project Open Data. It also provides many other tools and resources to help agencies and other interested parties implement their open data programs. Features include:

  • A dashboard to track the progress of agencies implementing Project Open Data on a quarterly basis
  • Automated analysis of URLs provided within metadata to see if the links work as expected
  • A validator for v1.0 and v1.1 of the Project Open Data Metadata Schema
  • A converter to transform CSV files into JSON as defined by the Project Open Data Metadata Schema [Link broken as of 4 January 2015. Site notified.]
  • An export API to export from the CKAN API and transform the metadata into JSON as defined by the Project Open Data Metadata Schema
  • A changeset viewer to compare a data.json file to the metadata currently available in CKAN (eg

You can learn more by reading the main documentation page.

The main documentation defines the “Number of Datasets” on the dashboard as:

This element accounts for the total number of all datasets listed in the Enterprise Data Inventory. This includes those marked as “Public”, “Non-Public” and “Restricted”.

If you compare the “Milestone – May 31st 2014” to November, the number of data sets increases in most cases, as you would expect. However, both the Department of Commerce and the Department of Health and Human Services had decreases in the number of available data sets.

On May 31st, the Department of Commerce listed 20488 data sets but on November 30th, only 372. A decrease of more than 20,000 data sets.

On May 31st, the Department of Health and Human Services listed 1507 data sets but on November 30th, only 1064, a decrease of 443 data sets.

Looking further, the sudden decrease for both agencies occurred between Milestone 3 and Milestone 4 (August 31st 2014).

Sounds exciting! Yes?

Yes, but this illustrates why you should “drill down” in data whenever possible. And if that is not possible in the interface, check other sources.

I followed the Department of Commerce link (the first column on the left) to the details of the crawl and thence the data link to determine the number of publicly available data sets.

As of today, 04 January 2015, the Department of Commerce has 23,181 datasets and not the 372 reported for Milestone 5 or the 268 reported for Milestone 4.

As of today, 04 January 2015, the Department of Health and Human Services has 1,672 datasets and not the 1064 reported for Milestone 5 or the 1088 reported for Milestone 4.

The reason(s) for the differences are unclear and the dashboard itself offers no explanation for the disparate figures. I suspect there is some glitch in the automatic harvesting of the information and/or in the representation of those results in the dashboard.
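If you want to repeat the check yourself rather than trust the dashboard, counting the entries in an agency's data.json is straightforward. The sketch below is mine, not the dashboard's code. It assumes the v1.1 schema wraps entries in a top-level "dataset" array and that v1.0 files are a bare JSON array. Verify both against the schema documentation before relying on it.

```python
import json


def count_datasets(data_json_text):
    """Count the dataset entries in a Project Open Data data.json document."""
    doc = json.loads(data_json_text)
    # Assumption: v1.1 files are an object with a "dataset" array,
    # v1.0 files are a bare array of dataset entries.
    entries = doc["dataset"] if isinstance(doc, dict) else doc
    return len(entries)


def discrepancy(dashboard_count, harvested_count):
    """How many datasets the dashboard figure is missing (negative if over)."""
    return harvested_count - dashboard_count


# Department of Commerce figures reported in this post.
commerce_gap = discrepancy(372, 23181)
```

A gap of that size is hard to explain as ordinary churn, which is why I suspect the harvesting or the display rather than the agencies.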

Always remember that just because a representation* claims some “fact,” that doesn’t necessarily make it so.

*Representation: Bear in mind that anything you see on a computer screen is a “representation.” There isn’t anything in data storage that has any resemblance to what you see on the screen. Choices have been made out of your sight as to how information will be represented to you.

As I mentioned yesterday, there is a common and naive assumption that data as represented to us has a reliable correspondence with data held in storage. And that the data held in storage has a reliable correspondence to data as entered or obtained from other sources.

Those assumptions aren’t unreasonable, at least until they are. Can you think of ways to illustrate those principles? I ask because at least one way to illustrate those principles makes an excellent case for open source software. More on that anon.

U.S. Appropriations by Fiscal Year

Wednesday, December 31st, 2014

U.S. Appropriations by Fiscal Year

Congressdotgov tweeted about this resource earlier today.

It’s a great starting place for research on U.S. appropriations but it is more of a bulk resource than a granular one.

You will have to wade through this resource and many others to piece together some of the details on any particular line item in the budget. Not surprisingly, anyone interested in the same line item will have to repeat that mechanical process. For every line in the budget.

There are collected resources on different aspects of the budget process, hearing documents, campaign donation records, etc. but they are for the most part all separated and not easily collated. Perhaps that is due to lack of foresight. Perhaps.

In any event, it is a starting place if you have a particular line item in mind. Think about creating a result that can be re-used and shared if at all possible.

Collection of CRS reports released to the public

Friday, December 19th, 2014

Collection of CRS reports released to the public by Kevin Kosar.

From the post:

Something rare has occurred—a collection of reports authored by the Congressional Research Service has been published and made freely available to the public. The 400-page volume, titled, “The Evolving Congress,” and was produced in conjunction with CRS’s celebration of its 100th anniversary this year. Congress, not CRS, published it. (Disclaimer: Before departing CRS in October, I helped edit a portion of the volume.)

The Congressional Research Service does not release its reports publicly. CRS posts its reports at, a website accessible only to Congress and its staff. The agency has a variety of reasons for this policy, not least that its statute does not assign it this duty. Congress, with ease, could change this policy. Indeed, it already makes publicly available the bill digests (or “summaries”) CRS produces at

“The Evolving Congress” is a remarkable collection of essays that cover a broad range of topic. Readers would be advised to start from the beginning. Walter Oleszek provides a lengthy essay on how Congress has changed over the past century. Michael Koempel then assesses how the job of Congressman has evolved (or devolved depending on one’s perspective). “Over time, both Chambers developed strategies to reduce the quantity of time given over to legislative work in order to accommodate Members’ other duties,” Koempel observes.

The NIH (National Institutes of Health) requires that NIH funded research be made available to the public. Other government agencies are following suit. Isn’t it time for the Congressional Research Service to make its publicly funded research available to the public that paid for it?

Congress needs to require it. Contact your member of Congress today. Ask that all Congressional Research Service reports, past, present and future, be made available to the public.

You have already paid for the reports, why shouldn’t you be able to read them?

Senate Joins House In Publishing Legislative Information In Modern Formats [No More Sneaking?]

Friday, December 19th, 2014

Senate Joins House In Publishing Legislative Information In Modern Formats by Daniel Schuman.

From the post:

There’s big news from today’s Legislative Branch Bulk Data Task Force meeting. The United States Senate announced it would begin publishing text and summary information for Senate legislation, going back to the 113th Congress, in bulk XML. It would join the House of Representatives, which already does this. Both chambers also expect to have bill status information available online in XML format as well, but a little later on in the year.

This move goes a long way to meet the request made by a coalition of transparency organizations, which asked for legislative information be made available online, in bulk, in machine-processable formats. These changes, once implemented, will hopefully put an end to screen scraping and empower users to build impressive tools with authoritative legislative data. A meeting to spec out publication methods will be hosted by the Task Force in late January/early February.

The Senate should be commended for making the leap into the 21st century with respect to providing the American people with crucial legislative information. We will watch closely to see how this is implemented and hope to work with the Senate as it moves forward.

In addition, the Clerk of the House announced significant new information will soon be published online in machine-processable formats. This includes data on nominees, election statistics, and members (such as committee assignments, bioguide IDs, start date, preferred name, etc.) Separately, House Live has been upgraded so that all video is now in H.264 format. The Clerk’s website is also undergoing a redesign.

The Office of Law Revision Counsel, which publishes the US Code, has further upgraded its website to allow pinpoint citations for the US Code. Users can drill down to the subclause level simply by typing the information into their search engine. This is incredibly handy.

This is great news!

Law is a notoriously opaque domain and the process of creating it even more so. Getting the data is a great first step, parsing out steps in the process and their meaning is another. To say nothing of the content of the laws themselves.

Still, progress is progress and always welcome!

Perhaps citizen review will stop the Senate from sneaking changes past sleepy members of the House.

GovTrack’s Summer/Fall Updates

Thursday, December 18th, 2014

GovTrack’s Summer/Fall Updates by Josh Tauberer.

From the post:

Here’s what’s been improved on GovTrack in the summer and fall of this year.


  • Permalinks to individual paragraphs in bill text is now provided (example).
  • We now ask for your congressional district so that we can customize vote and bill pages to show how your Members of Congress voted.
  • Our bill action/status flow charts on bill pages now include activity on certain related bills, which are often crucially important to the main bill.
  • The bill cosponsors list now indicates when a cosponsor of a bill is no longer serving (i.e. because of retirement or death).
  • We switched to gender neutral language when referring to Members of Congress. Instead of “congressman/woman”, we now use “representative.”
  • Our historical votes database (1979-1989) from was refreshed to correct long-standing data errors.
  • We dropped support for Internet Explorer 6 in order to address with POODLE SSL security vulnerability that plagued most of the web.
  • We dropped support for Internet Explorer 7 in order to allow us to make use of more modern technologies, which has always been the point of GovTrack.

The comment I posted was:

Great work! But I read the other day about legislation being “snuck” by the House (Senate changes), US Congress OKs ‘unprecedented’ codification of warrantless surveillance.

Do you have plans for a diff utility that warns members of either house of changes to pending legislation?
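The core of such a diff utility is not hard to sketch with Python's standard `difflib`. This is only an illustration of the idea, not anything GovTrack has built. The function names are mine, and real bill text would need normalizing (line wrapping, page numbers and so on) before a line-based diff is useful.

```python
import difflib


def bill_diff(old_text, new_text, old_label="house", new_label="senate"):
    """Return the unified-diff lines between two versions of a bill's text."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm=""))


def added_lines(old_text, new_text):
    """Lines that appear only in the new version (hypothetical helper)."""
    body = bill_diff(old_text, new_text)[2:]  # skip the two file-header lines
    return [line[1:] for line in body if line.startswith("+")]
```

Mailing the output of `added_lines` to every member whenever the other chamber posts a new version would be a crude but serviceable warning.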

In case you aren’t familiar with GovTrack:

From the about page:

GovTrack, a project of Civic Impulse, LLC now in its 10th year, is one of the worldʼs most visited government transparency websites. The site helps ordinary citizens find and track bills in the U.S. Congress and understand their representatives’ legislative record.

In 2013, GovTrack was used by 8 million individuals. We sent out 3 million legislative update email alerts. Our embeddable widgets were deployed on more than 80 official websites of Members of Congress.

We bring together the status of U.S. federal legislation, voting records, congressional district maps, and more (see the table at the right) and make it easier to understand. Use GovTrack to track bills for updates or get alerts about votes with email updates and RSS feeds. We also have unique statistical analyses to put the information in context. Read the «Analysis Methodology».

GovTrack openly shares the data it brings together so that other websites can build other tools to help citizens engage with government. See the «Developer Documentation» for more.

Melville House to Publish CIA Torture Report:… [Publishing Gone Awry?]

Tuesday, December 16th, 2014

Melville House to Publish CIA Torture Report: An Interview with Publisher Dennis Johnson by Jonathon Sturgeon.

From the post:

In what must be considered a watershed moment in contemporary publishing, Brooklyn-based independent publisher Melville House will release the Senate Intelligence Committee’s executive summary of a government report — “Study of the Central Intelligence Agency’s Detention and Interrogation Program” — that is said to detail the monstrous torture methods employed by the Central Intelligence Agency in its counter-terrorism efforts.

Melville House’s co-publisher and co-founder Dennis Johnson has called the report “probably the most important government document of our generation, even one of the most significant in the history of our democracy.”

Melville House’s press release confirms that they are releasing both print and digital editions on December 30, 2014.

As of December 30, 2014, I can read and mark my copy, print or digital and you can mark your copy, print or digital, but no collaboration on the torture report.

For the “…most significant [document] in the history of our democracy” that seems rather sad. That is, each of us is going to be limited to whatever we know or can find out when we are reading our copies of the same report.

If there was ever a report (and there have been others) that merited a collaborative reading/annotation, the CIA Torture Report would be one of them.

Given the large number of people who worked on this report and the diverse knowledge required to evaluate it, that sounds like bad publishing choices. Or at least that there are better publishing choices available.

What about casting the entire report into the form of wiki pages, broken down by paragraphs? Once proofed, the original text can be locked and comments only allowed on the text. Free to view but $fee to comment.

What do you think? Viable way to present such a text? Other ways to host the text?

PS: Unlike other significant government reports, major publishing houses did not receive incentives to print the report. Jerry attributes that to Dianne Feinstein not wanting to favor any particular publisher. That’s one explanation. Another would be that if published in hard copy at all, a small press will mean it fades more quickly from public view. Your call.

US Congress OKs ‘unprecedented’ codification of warrantless surveillance

Tuesday, December 16th, 2014

US Congress OKs ‘unprecedented’ codification of warrantless surveillance by Lisa Vaas.

From the post:

Congress last week quietly passed a bill to reauthorize funding for intelligence agencies, over objections that it gives the government “virtually unlimited access to the communications of every American”, without warrant, and allows for indefinite storage of some intercepted material, including anything that’s “enciphered”.

That’s how it was summed up by Rep. Justin Amash, a Republican from Michigan, who pitched and lost a last-minute battle to kill the bill.

The bill is titled the Intelligence Authorization Act for Fiscal Year 2015.

Amash said that the bill was “rushed to the floor” of the house for a vote, following the Senate having passed a version with a new section – Section 309 – that the House had never considered.

Lisa reports that the bill codifies Executive Order 12333, a Ronald Reagan remnant from an earlier attempt to dismantle the United States Constitution.

There is a petition underway to ask President Obama to veto the bill. Are you a large bank? Skip the petition and give the President a call.

From Lisa’s report, it sounds like Congress needs a DEW Line for legislation:

Rep. Zoe Lofgren, a California Democrat who voted against the bill, told the National Journal that the Senate’s unanimous passage of the bill was sneaky and ensured that the House would rubberstamp it without looking too closely:

If this hadn’t been snuck in, I doubt it would have passed. A lot of members were not even aware that this new provision had been inserted last-minute. Had we been given an additional day, we may have stopped it.

How do you “sneak in” legislation in a public body?

Suggestions on an early warning system for changes to legislation between the two houses of Congress?
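One possibility, sketched below, is to fingerprint each section of a bill and compare the fingerprints between the House and Senate versions, so that a new Section 309 shows up as "added" the moment the text is published. The sketch assumes sections begin with a heading like "SEC. 309." at the start of a line. That heading format, and the function names, are my assumptions for illustration. Real bill XML would call for a proper parser.

```python
import hashlib
import re

# Assumed heading format: "SEC. <number>." at the start of a line.
SECTION_RE = re.compile(r"^SEC\. (\d+)\.", re.MULTILINE)


def section_fingerprints(bill_text):
    """Map each section number to a SHA-256 hash of that section's text."""
    matches = list(SECTION_RE.finditer(bill_text))
    prints = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(bill_text)
        body = bill_text[match.start():end].strip()
        prints[match.group(1)] = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return prints


def early_warning(house_text, senate_text):
    """Report sections added or changed in the Senate version."""
    house = section_fingerprints(house_text)
    senate = section_fingerprints(senate_text)
    added = sorted(set(senate) - set(house), key=int)
    changed = sorted(
        (n for n in house.keys() & senate.keys() if house[n] != senate[n]),
        key=int)
    return {"added": added, "changed": changed}
```

Hashing keeps the comparison cheap enough to run on every published version of every pending bill.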

Global Open Data Index

Friday, December 12th, 2014

Global Open Data Index

From the about page:

Each year, governments are making more data available in an open format. The Global Open Data Index tracks whether this data is actually released in a way that is accessible to citizens, media and civil society and is unique in crowd-sourcing its survey of open data releases around the world. Each year the open data community and Open Knowledge produces an annual ranking of countries, peer reviewed by our network of local open data experts.

Crowd-sourcing this data provides a tool for communities around the world to learn more about the open data available locally and by country, and ensures that the results reflect the experience of civil society in finding open information, rather than government claims. It also ensures that those who actually collect the information that builds the Index are the very people who use the data and are in a strong position to advocate for more and higher quality open data.

The Global Open Data Index measures and benchmarks the openness of data around the world, and then presents this information in a way that is easy to understand and use. This increases its usefulness as an advocacy tool and broadens its impact.

In 2014 we are expanding to more countries (from 70 in 2013) with an emphasis on countries of the Global South.

See the blog post launching the 2014 Index. For more information, please see the FAQ and the methodology section. Join the conversation with our Open Data Census discussion list.

It is better to have some data rather than none but look at the data by which countries are ranked for openness:

Transport Timetables, Government Budget, Government Spending, Election Results, Company Register, National Map, National Statistics, Postcodes/Zipcodes, Pollutant Emissions.

That list of data earns the United Kingdom a 97% score and first place.

It is hard to imagine a less threatening set of data than those listed. I am sure someone will find a use for them but in the great scheme of things, they are a distraction from the data that isn’t being released.

Off-hand, in the United States at least, public data should include who meets with appointed or elected members of government along with transcripts of those meetings (including phone calls). It should also include all personal or corporate donations made to any organization for any reason of greater than $100.00. It should include documents prepared and/or submitted to the U.S. government and its agencies. And those are just the ones that come to mind rather quickly.

Current disclosures by the U.S. government are a fiction of openness that conceals a much larger dark data set, waiting to be revealed at some future date.

I first saw this in a tweet by ChemConnector.

Timeline of sentences from the CIA Torture Report

Wednesday, December 10th, 2014

Chris R. Albon has created a timeline of sentences from the CIA torture report!


1997,"The FBI information included that al-Mairi’s brother ""traveled to Afghanistan in 1997-1998 to train in Bin – Ladencamps."""
1997,"The FBI information included that al-Mairi’s brother ""traveled to Afghanistan in 1997-1998 to train in Bin – Ladencamps."""
1997,"For example, on October 12, 2004, another CIA detainee explained how he met al-Kuwaiti at a guesthouse that was operated by Ibn Shaykh al-Libi and Abu Zubaydah in 1997."

Cleanly imports into Apache OpenOffice Calc and is 6163 rows (after subtracting the header).
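
If you would rather script against the file than open it in a spreadsheet, here is a minimal sketch in Python, assuming the layout shown in the excerpt: a year, a comma, then a quoted sentence with embedded quotation marks doubled. The header names (year, sentence) are my assumption, not taken from Chris's file, and the sample rows use ASCII punctuation.

```python
import csv
import io
from collections import Counter

# Two rows shaped like the excerpt above. The header line is hypothetical;
# check the real file for its actual column names.
SAMPLE = '''year,sentence
1997,"The FBI information included that al-Mairi's brother ""traveled to Afghanistan in 1997-1998 to train in Bin - Ladencamps."""
1997,"For example, on October 12, 2004, another CIA detainee explained how he met al-Kuwaiti at a guesthouse that was operated by Ibn Shaykh al-Libi and Abu Zubaydah in 1997."
'''


def load_rows(text):
    """Parse CSV text into (year, sentence) tuples.

    The csv module turns doubled quotes ("") inside a quoted field back
    into single quotation marks and keeps commas inside quoted fields.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["year"]), row["sentence"]) for row in reader]


def sentences_per_year(rows):
    """Count how many sentences are attached to each year."""
    return Counter(year for year, _ in rows)


rows = load_rows(SAMPLE)
counts = sentences_per_year(rows)
```

To work on the full file, read its text and pass it to load_rows in place of SAMPLE.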

Please acknowledge Chris if you use this data.

What other data would you pull from the executive summary?

What other data do you think would convince Senator Udall to release the entire 6,000 page report?

A Tranche of Climate Data

Wednesday, December 10th, 2014

FACT SHEET: Harnessing Climate Data to Boost Ecosystem & Water Resilience

From the document:

Today, the Administration is making a new tranche of data about ecosystems and water resilience available as part of the Climate Data Initiative—including key datasets related to water quality, streamflow, land cover, soils, and biodiversity.

In addition to the datasets being added today to, the Department of Interior (DOI) is launching a suite of geospatial mapping tools on that will enable users to visualize and overlay datasets related to ecosystems, land use, water, and wildlife. Together, the data and tools unleashed today will help natural-resource managers, decision makers, and communities on the front lines of climate change build resilience to climate impacts and better plan for the future. (emphasis added)

I had to look up “tranche.” Google offers: “a portion of something, especially money.”

Assume that your contacts and interactions with both sites are monitored and recorded.

Treasury Island: the film

Tuesday, November 25th, 2014

Treasury Island: the film by Lauren Willmott, Boyce Keay, and Beth Morrison.

From the post:

We are always looking to make the records we hold as accessible as possible, particularly those which you cannot search for by keyword in our catalogue, Discovery. And we are experimenting with new ways to do it.

The Treasury series, T1, is a great example of a series which holds a rich source of information but is complicated to search. T1 covers a wealth of subjects (from epidemics to horses) but people may overlook it as most of it is only described in Discovery as a range of numbers, meaning it can be difficult to search if you don’t know how to look. There are different processes for different periods dating back to 1557 so we chose to focus on records after 1852. Accessing these records requires various finding aids and multiple stages to access the papers. It’s a tricky process to explain in words so we thought we’d try demonstrating it.

We wanted to show people how to access these hidden treasures, by providing a visual aid that would work in conjunction with our written research guide. Armed with a tablet and a script, we got to work creating a video.

Our remit was:

  • to produce a video guide no more than four minutes long
  • to improve accessibility to these records through a simple, step-by-step process
  • to highlight what the finding aids and documents actually look like

These records can be useful to a whole range of researchers, from local historians to military historians to social historians, given that virtually every area of government action involved the Treasury at some stage. We hope this new video, which we intend to be watched in conjunction with the written research guide, will also be of use to any researchers who are new to the Treasury records.

Adding video guides to our written research guides is a new venture for us and so we are very keen to hear your feedback. Did you find it useful? Do you like the film format? Do you have any suggestions or improvements? Let us know by leaving a comment below!

This is a great illustration that data management isn’t something new. The Treasury Board has kept records since 1557 and has accumulated a rather extensive set of materials.

The written research guide looks interesting but since I am very unlikely to ever research Treasury Board records, I am unlikely to need it.

However, the authors have anticipated that someone might be interested in the process of record keeping itself and so provided this additional reference:

Thomas L Heath, The Treasury (The Whitehall Series, 1927, GP Putnam’s Sons Ltd, London and New York)

That would be an interesting find!

I first saw this in a tweet by Andrew Janes.

Would You Protect Nazi Torturers And Their Superiors?

Saturday, November 15th, 2014

If you answered “Yes,” this post won’t interest you.

If you answered “No,” read on:

Senator Mark Udall faces the question: “Would You Protect Nazi Torturers And Their Superiors?” as reported by Mike Masnick in:

Mark Udall’s Open To Releasing CIA Torture Report Himself If Agreement Isn’t Reached Over Redactions.

Mike writes in part:

As we were worried might happen, Senator Mark Udall lost his re-election campaign in Colorado, meaning that one of the few Senators who vocally pushed back against the surveillance state is about to leave the Senate. However, Trevor Timm pointed out that, now that there was effectively “nothing to lose,” Udall could go out with a bang and release the Senate Intelligence Committee’s CIA torture report. The release of some of that report (a redacted version of the 400+ page “executive summary” — the full report is well over 6,000 pages) has been in limbo for months since the Senate Intelligence Committee agreed to declassify it months ago. The CIA and the White House have been dragging out the process hoping to redact some of the most relevant info — perhaps hoping that a new, Republican-controlled Senate would just bury the report.

Mike details why Senator Udall’s recent reelection defeat makes release of the report, either in full or in summary, a distinct possibility.

In addition to Mike’s report, here is some additional information you may find useful:

Contact Information for Senator Udall

Senator Mark Udall
Hart Office Building Suite SH-730
Washington, D.C. 20510

P: 202-224-5941
F: 202-224-6471

An informed electorate is essential to the existence of self-governance.

No less a figure than Thomas Jefferson spoke about the star chamber proceedings we now take for granted saying:

An enlightened citizenry is indispensable for the proper functioning of a republic. Self-government is not possible unless the citizens are educated sufficiently to enable them to exercise oversight. It is therefore imperative that the nation see to it that a suitable education be provided for all its citizens. It should be noted, that when Jefferson speaks of “science,” he is often referring to knowledge or learning in general. “I know no safe depositary of the ultimate powers of the society but the people themselves; and if we think them not enlightened enough to exercise their control with a wholesome discretion, the remedy is not to take it from them, but to inform their discretion by education. This is the true corrective of abuses of constitutional power.” –Thomas Jefferson to William C. Jarvis, 1820. ME 15:278

“Every government degenerates when trusted to the rulers of the people alone. The people themselves, therefore, are its only safe depositories. And to render even them safe, their minds must be improved to a certain degree.” –Thomas Jefferson: Notes on Virginia Q.XIV, 1782. ME 2:207

“The most effectual means of preventing [the perversion of power into tyranny are] to illuminate, as far as practicable, the minds of the people at large, and more especially to give them knowledge of those facts which history exhibits, that possessed thereby of the experience of other ages and countries, they may be enabled to know ambition under all its shapes, and prompt to exert their natural powers to defeat its purposes.” –Thomas Jefferson: Diffusion of Knowledge Bill, 1779. FE 2:221, Papers 2:526

Jefferson didn’t have to contend with Middle East terrorists, only the English terrorizing the countryside. Since more Americans died in British prison camps than in the Revolution proper, I would say they were as bad as terrorists. (See Prisoners of war in the American Revolutionary War.)

Noise about the CIA torture program post 9/11 is plentiful. But the electorate, that would be voters in the United States, lacks facts about the CIA torture program, its oversight (or lack thereof) and those responsible for torture, from top to bottom. There isn’t enough information to “connect the dots,” a common phrase in the intelligence community.

Connecting those dots is what could bring the accountability and transparency necessary to prevent torture from returning as an instrument of US policy.

Thirty retired generals are urging President Obama to declassify the Senate Intelligence Committee’s report on CIA torture, arguing that without accountability and transparency the practice could be resumed. (Even Generals in US Military Oppose CIA Torture)

Hiding the guilty will produce an expectation among potential future torturers that they too will get a free pass on torture.

Voters are responsible for turning out those who authorized the use of torture and for holding their subordinates accountable for their crimes. To do so, voters must have the information contained in the full CIA torture report.

Release of the Full CIA Torture Report: No Doom and Gloom

Senator Udall should ignore speculation that release of the full CIA torture report will “doom the nation.”


There have been similar claims in the past and none of them, not one, has ever proven to be true. Here are some of the ones that I remember personally:

Documents Released | Date | Nation Doomed?
Pentagon Papers | 1971 | No
Nixon White House Tapes | 1974 | No
The Office of Special Investigations: Striving for Accountability in the Aftermath of the Holocaust | 2010 | No
United States diplomatic cables leak | 2010 | No
Snowden (Global Surveillance Disclosures (2013–present)) | 2013 | No

Others that I should add to this list?

Is Saying “Nazi” Inflammatory?

Is using the term “Nazi” inflammatory in this context? The only difference between CIA and Nazi torture is the government that ordered or tolerated the torture. Unless you know of some other classification of torture. The United States military apparently doesn’t and I am willing to take their word for it.

Some will say the torturers were “serving the American people.” The same could be and was said by many a death camp guard for the Nazis. Wrapping yourself in a flag, any flag, does not put criminal activity beyond the reach of the law. It didn’t at Nuremberg and it should not work here.


A functioning democracy requires an informed electorate. Not elected officials, not a star chamber group but an informed electorate. To date the American people lack details about illegal torture carried out by a government agency, the CIA. To exercise their rights and responsibilities as an informed electorate, American voters must have full access to the full CIA torture report.

Release of anything less than the full CIA torture report protects torture participants and their superiors. I have no interest in protecting those who engage in illegal activities nor their superiors. As an American citizen, do you?

Experience with prior “sensitive” reports indicates that despite the wailing and gnashing of teeth, the United States will not fall when the guilty are exposed, prosecuted and led off to jail. This case is no different.

As many retired US generals point out, transparency and accountability are the only ways to keep illegal torture from returning as an instrument of United States policy.

Is there any reason to wait until American torturers are in their nineties, suffering from dementia and living in New Jersey to hold them accountable for their crimes?

I don’t think so either.

PS: When Senator Udall releases the full CIA torture report in the Congressional Record (not to the New York Times or Wikileaks, both of which censor information for reasons best known to themselves), I hereby volunteer to assist in the extraction of names, dates, places and the association of those items with other, public data, both in topic map form as well as in other formats.

How about you?

PPS: On the relationship between Nazis and the CIA, see: Nazis Were Given ‘Safe Haven’ in U.S., Report Says. The special report that informed that article: The Office of Special Investigations: Striving for Accountability in the Aftermath of the Holocaust. (A leaked document)

When you compare Aryanism to American Exceptionalism the similarities between the CIA and the Nazi regime are quite pronounced. How could any act that protects the fatherland/homeland be a crime?

data.parliament @ Accountability Hack 2014

Friday, November 7th, 2014

data.parliament @ Accountability Hack 2014 by Zeid Hadi.

From the post:

We are pleased to announce that data.parliament will be providing data to be used during the Accountability Hack 2014

data.parliament is a platform that enables the sharing of UK Parliament’s data with consumers both within and outside of Parliament. Designed to complement existing data services it aims to be the central publishing platform and data repository for data that is produced by Parliament. Note our release is in Alpha.

It provides both a repository for data and a Linked Data API. The platform also has a ‘shop front’ or data catalogue.

The following datasets and APIs are now available on data.parliament

  • Commons Written Parliamentary Questions and Answers
  • Lords Written Parliamentary Questions and Answers
  • Commons Oral Questions and Question Times
  • Early Day Motions
  • Lords Divisions
  • Commons Divisions
  • Commons Members
  • Lords Members
  • Constituencies
  • Briefing Papers
  • Papers Laid

A description of the APIs and their usage can be found at All the data exposed by the endpoints can be returned in a variety of formats not least JSON.

To get you started the team has coded two publicly available demonstrators that make use of the data in data.parliament. The source code for these can be found at One of the demonstrators, a client app, can be found working at Also be sure to read our blog for quick start guides, updates, and news about upcoming datasets.

The data.parliament team will be on hand at the Hack, both participating and networking through the event to gather feedback and ideas.

I don’t know enough about British parliamentary procedure to comment on the completeness of the interface.

I am quite interested in the Briefing Papers data feed:

This dataset contains the data for research briefings produced by the Libraries of the House of Commons and House of Lords and the Parliamentary Office of Science and Technology. Each briefing has a pdf document for the briefing itself as well as a set of metadata to accompany it.

A great project but even a complete set of documents and transcripts of every word spoken at Parliament does not document relationships between members of Parliament, their relationships to economic interests, etc.

Looking forward to collation of information from this project with other data to form a clearer picture of the legislative process in the UK.

I first saw this in a tweet by data.parliament UK.

Core Econ: a free economics textbook

Wednesday, November 5th, 2014

Core Econ: a free economics textbook by Cathy O’Neil.

From the post:

Today I want to tell you guys about Core Econ, a free (although you do have to register) textbook my buddy Suresh Naidu is using this semester to teach out of and is also contributing to, along with a bunch of other economists.

It’s super cool, and I wish a class like that had been available when I was an undergrad. In fact I took an economics course at UC Berkeley and it was a bad experience – I couldn’t figure out why anyone would think that people behaved according to arbitrary mathematical rules. There was no discussion of whether the assumptions were valid, no data to back it up. I decided that anybody who kept going had to be either religious or willing to say anything for money.

Not much has changed, and that means that Econ 101 is a terrible gateway for the subject, letting in people who are mostly kind of weird. This is a shame because, later on in graduate level economics, there really is no reason to use toy models of society without argument and without data; the sky’s the limit when you get through the bullshit at the beginning. The goal of the Core Econ project is to give students a taste for the good stuff early; the subtitle on the webpage is teaching economics as if the last three decades happened.

Skepticism of government economic forecasts and data requires knowledge of the lingo and assumptions of economics. This introduction won’t get you to that level but it is a good starting place.

Enjoy!

Officially Out of Beta

Tuesday, October 28th, 2014

Officially Out of Beta

From the post:

The free legislative information website,, is officially out of beta form, and beginning today includes several new features and enhancements. URLs that include will be redirected to The site now includes the following:

New Feature: Resources

  • A new resources section providing an A to Z list of hundreds of links related to Congress
  • An expanded list of “most viewed” bills each day, archived to July 20, 2014

New Feature: House Committee Hearing Videos

  • Live streams of House Committee hearings and meetings, and an accompanying archive to January, 2012

Improvement: Advanced Search

  • Support for 30 new fields, including nominations, Congressional Record and name of member

Improvement: Browse

  • Days in session calendar view
  • Roll Call votes
  • Bill by sponsor/co-sponsor

When the Library of Congress, in collaboration with the U.S. Senate, U.S. House of Representatives and the Government Printing Office (GPO) released as a beta site in the fall of 2012, it included bill status and summary, member profiles and bill text from the two most recent congresses at that time – the 111th and 112th.

Since that time, has expanded with the additions of the Congressional Record, committee reports, direct links from bills to cost estimates from the Congressional Budget Office, legislative process videos, committee profile pages, nominations, historic access reaching back to the 103rd Congress and user accounts enabling saved personal searches. Users have been invited to provide feedback on the site’s functionality, which has been incorporated along with the data updates.

Plans are in place for ongoing enhancements in the coming year, including addition of treaties, House and Senate Executive Communications and the Congressional Record Index.

Field Value Lists:

Use search fields in the main search box (available on most pages), or via the advanced and command line search pages. Use terms or codes from the Field Value Lists with corresponding search fields: Congress [congressId], Action – Words and Phrases [billAction], Subject – Policy Area [billSubject], or Subject (All) [allBillSubjects].

Congresses (44, stops with 70th Congress (1927-1929))

Legislative Subject Terms, Subject Terms (541), Geographic Entities (279), Organizational Names (173). (total 993)

Major Action Codes (98)

Policy Area (33)

Search options:

Search Form: “Choose collections and fields from dropdown menus. Add more rows as needed. Use Major Action Codes and Legislative Subject Terms for more precise results.”

Command Line: “Combine fields with operators. Refine searches with field values: Congresses, Major Action Codes, Policy Areas, and Legislative Subject Terms. To use facets in search results, copy your command line query and paste it into the home page search box.”

Search Tips Overview: “You can search using the quick search available on most pages or via the advanced search page. Advanced search gives you the option of using a guided search form or a command line entry box.” (includes examples)
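
To make the "combine fields with operators" idea concrete, here is a small sketch in Python that assembles a command line style query from the search fields named above (congressId, billAction, billSubject). The field:value form, the quoting of multi-word values, the AND operator and the sample values are all assumptions for illustration. Check the site's search tips for the syntax it actually accepts.

```python
def build_query(terms, operator="AND"):
    """Join (field, value) pairs into one query string.

    Assumed, not confirmed, syntax: each pair becomes field:value,
    values containing spaces are wrapped in double quotes, and the
    pairs are joined with the given operator.
    """
    parts = []
    for field, value in terms:
        text = str(value)
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{field}:{text}")
    return f" {operator} ".join(parts)


# Hypothetical values; take real ones from the Field Value Lists.
query = build_query([("congressId", 113), ("billSubject", "Health")])
phrase_query = build_query([("billAction", "Became Law")])
```

Keeping the field names in one place like this makes it easy to swap in values from the Field Value Lists without retyping the whole query.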


You can follow this project @congressdotgov.

Orientation to Legal Research & is available both as a seminar (in-person) and webinar (online).


I first saw this at is Out of Beta with New Features by Africa S. Hands.

A $23 million venture fund for the government tech set

Tuesday, September 16th, 2014

A $23 million venture fund for the government tech set by Nancy Scola.

Nancy tells a compelling story of a new VC firm, GovTech, which is looking for startups focused on providing governments with better technology infrastructure.

Three facts from the story stand out:

“The U.S. government buys 10 eBays’ worth of stuff just to operate,” from software to heavy-duty trucking equipment.

…working with government might be a tortuous slog, but Bouganim says that he saw that behind that red tape lay a market that could be worth in the neighborhood of $500 billion a year.

What most people don’t realize is government spends nearly $74 billion on technology annually. As a point of comparison, the video game market is a $15 billion annual market.

See Nancy’s post for the full flavor of the story but it sounds like there is gold buried in government IT.

Another way to look at it is that the government is already spending $74 billion a year on technology that is largely an object of mockery and mirth. Effective software may be sufficiently novel and threatening to attract either business or a buy-out.

While you are pondering possible opportunities, existing systems, their structures and data are “subjects” in topic map terminology. Which means topic maps can protect existing contracts and relationships, while delivering improved capabilities and data.

Promote topic maps as “in addition to” existing IT systems and you will encounter less resistance both from within and without the government.

Don’t be squeamish about associating with governments, of whatever side. Their money spends just like everyone else’s. You can ask AT&T and IBM about supporting both sides in a conflict.

I first saw this in a tweet by Mike Bracken.