Archive for the ‘Government Data’ Category

What’s Up With Data Padding?

Wednesday, March 29th, 2017

I forgot to mention in Copyright Troll Hunting – 92,398 Possibles -> 146 Possibles that, while using LibreOffice, I deleted a large number of columns that were either N/A-only or not relevant.

After removal of “no last name,” these fields had N/A for all records, except as noted:

  1. L – Implementation Date
  2. M – Effective Date
  3. N – Related RINs
  4. O – Document SubType (Comment(s))
  5. P – Subject
  6. Q – Abstract
  7. R – Status – (Posted, except for 2)
  8. S – Source Citation
  9. T – OMB Approval Number
  10. U – FR Citation
  11. V – Federal Register Number (8 exceptions)
  12. W – Start End Page (8 exceptions)
  13. X – Special Instructions
  14. Y – Legacy ID
  15. Z – Post Mark Date
  16. AA – File Type (1 docx)
  17. AB – Number of Pages
  18. AC – Paper Width
  19. AD – Paper Length
  20. AE – Exhibit Type
  21. AF – Exhibit Location
  22. AG – Document Field_1
  23. AH – Document Field_2

The site hosting the comments, not the Copyright Office, is responsible for the collection and management of comments, including the bulked-up export of comments.
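Culling those columns by hand in LibreOffice is tedious; the same cleanup can be scripted. A minimal sketch (the sample rows and document IDs are invented; only the column names echo the list above):

```python
def drop_na_columns(rows, na_values=frozenset({"N/A", ""})):
    """Drop every column whose body cells are all N/A (or empty)."""
    header, body = rows[0], rows[1:]
    keep = [i for i in range(len(header))
            if any(row[i] not in na_values for row in body)]
    return [[row[i] for i in keep] for row in rows]

# Toy export: the Subject column is N/A for every record.
raw = [
    ["Document ID", "Subject", "File Type"],
    ["doc-0001", "N/A", "docx"],
    ["doc-0002", "N/A", "N/A"],
]
cleaned = drop_na_columns(raw)
# Subject disappears; File Type survives because of the single docx.
```

The same rule reproduces the exceptions noted above: a single non-N/A value is enough to keep a column.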

From the state of the records, one suspects the “bulking up” is NOT an artifact of the export but represents the storage of each record.

One way to test that theory would be a query on the noise fields via the site’s API.

The documentation for the API is outdated; the Field References documentation lacks the Document Detail field (AI), which contains the URL for accessing the comment.

The closest thing I could find was:

fileFormats – Formats of the document, included as URLs to download from the API

How easy/hard it will be to download attachments isn’t clear.

BTW, the comment pages themselves are seriously puffed up. Take one comment page as an example:

Saved to disk: 148.6 KB.

Content of the comment: 2.5 KB.

The content of the comment is about 1.7% of the delivered webpage.

It must have taken serious effort to achieve a 98.3% noise to 1.7% signal ratio.

How transparent is data when you have to mine for the 1.7% that is actual content?
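That ratio is easy to measure for any page: strip the markup, then compare the size of the visible text to the size of the page as delivered. A standard-library sketch (the sample page is a toy; a real measurement would feed in the saved comment page):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style contents."""
    def __init__(self):
        super().__init__()
        self.chunks, self.skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
    def handle_data(self, data):
        if not self.skip:
            self.chunks.append(data)

def visible_text(html):
    p = TextExtractor()
    p.feed(html)
    return "".join(p.chunks).strip()

def signal_ratio(html):
    """Bytes of visible text divided by bytes of delivered markup."""
    return len(visible_text(html).encode()) / len(html.encode())

page = ("<html><head><script>var padding = 1;</script></head>"
        "<body><div><div><p>My comment.</p></div></div></body></html>")
```

Run `signal_ratio` over a saved page to get its signal fraction; the 2.5 KB / 148.6 KB figures above work out to roughly 0.017.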

Executive Orders (Bulk Data From Federal Register)

Tuesday, January 31st, 2017

Executive Orders

From the webpage:

The President of the United States manages the operations of the Executive branch of Government through Executive orders. After the President signs an Executive order, the White House sends it to the Office of the Federal Register (OFR). The OFR numbers each order consecutively as part of a series, and publishes it in the daily Federal Register shortly after receipt.

Executive orders issued since 1994 are available as a single bulk download and as a bulk download by President, or you can browse by President and year from the list below. More details about our APIs and other developer tools can be found on our developer pages.

Don’t ignore the developer pages.

Whether friend or foe of the current regime in Washington, the API gives you access to all the regulatory material published in the Federal Register. Use it.
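For example, a search for executive orders can be assembled against the documents endpoint. A sketch that only builds the request URL (the endpoint, parameter names, and the president slug are my recollection of the developer pages, not verified; check them against the current documentation before relying on this):

```python
from urllib.parse import urlencode

BASE = "https://www.federalregister.gov/api/v1/documents.json"

def executive_orders_url(president="donald-trump", per_page=20):
    """Assemble a Federal Register documents search for executive orders.

    Parameter names follow the developer docs as I recall them; verify.
    """
    params = [
        ("conditions[type][]", "PRESDOCU"),
        ("conditions[presidential_document_type][]", "executive_order"),
        ("conditions[president][]", president),
        ("per_page", str(per_page)),
        ("order", "executive_order_number"),
    ]
    return BASE + "?" + urlencode(params)

url = executive_orders_url()
```

Fetching `url` with any HTTP client should return JSON results you can page through; bulk downloads remain the better option for full-corpus work.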

It should be especially useful in light of Presidential Executive Order on Reducing Regulation and Controlling Regulatory Costs, which provides in part:

Sec. 2. Regulatory Cap for Fiscal Year 2017. (a) Unless prohibited by law, whenever an executive department or agency (agency) publicly proposes for notice and comment or otherwise promulgates a new regulation, it shall identify at least two existing regulations to be repealed.

Disclaimer: Any resemblance to an executive order is purely coincidental:

The CIA’s Secret History Is Now Online [Indexing, Mapping, NLP Anyone?]

Wednesday, January 18th, 2017

The CIA’s Secret History Is Now Online by Jason Leopold.

From the post:

Decades ago, the CIA declassified a 26-page secret document cryptically titled “clarifying statement to Fidel Castro concerning assassination.”

It was a step toward greater transparency for one of the most secretive of all federal agencies. But to find out what the document actually said, you had to trek to the National Archives in College Park, Maryland, between the hours of 9 a.m. and 4:30 p.m. and hope that one of only four computers designated by the CIA to access its archives would be available.

But today the CIA posted the Castro record on its website along with more than 12 million pages of the agency’s other declassified documents that have eluded the public, journalists, and historians for nearly two decades. You can view the documents here.

The title of the Castro document, as it turns out, was far more interesting than the contents. It includes a partial transcript of a 1977 interview between Barbara Walters and Fidel Castro in which she asked the late Cuban dictator whether he had “proof” of the CIA’s last attempt to assassinate him. The transcript was sent to Adm. Stansfield Turner, the CIA director at the time, by a public affairs official at the agency with a note highlighting all references to CIA.

But that’s just one of millions of documents, which date from the 1940s to the 1990s and are wide-ranging, covering everything from Nazi war crimes to mind-control experiments to the role the CIA played in overthrowing governments in Chile and Iran. There are also secret documents about a telepathy and precognition program known as Star Gate, files the CIA kept on certain media publications, such as Mother Jones, photographs, more than 100,000 pages of internal intelligence bulletins, policy papers, and memos written by former CIA directors.

Michael Best, @NatSecGeek has pointed out the “CIA de-OCRed at least some of the CREST files before they uploaded them.”

Spy-agency-class petty: grant public access, but force the restoration of text search.

Restoration of text search is underway, so the next steps will be indexing, NLP, mapping, etc.
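Once the text is restored, the indexing step is conceptually simple. A toy inverted index over document text (the CREST IDs and contents here are invented for illustration):

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, *words):
    """Return the documents containing every query word."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

# Toy stand-ins for re-OCRed CREST pages (contents invented).
docs = {
    "crest-0001": "clarifying statement to Fidel Castro concerning assassination",
    "crest-0002": "internal intelligence bulletin on Star Gate",
}
index = build_index(docs)
```

At 12 million pages you would reach for a real search engine rather than a dict, but the data structure is the same.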

A great set of documents to get ready for future official and unofficial leaks of CIA documents.


PS: Curious if any of the search engine vendors will use CREST as demonstration data? Non-trivial size, interesting search issues, etc.

Ask at the next search conference.

CIA Cartography [Comparison to other maps?]

Monday, November 28th, 2016

CIA Cartography

From the webpage:

Tracing its roots to October 1941, CIA’s Cartography Center has a long, proud history of service to the Intelligence Community (IC) and continues to respond to a variety of finished intelligence map requirements. The mission of the Cartography Center is to provide a full range of maps, geographic analysis, and research in support of the Agency, the White House, senior policymakers, and the IC at large. Its chief objectives are to analyze geospatial information, extract intelligence-related geodata, and present the information visually in creative and effective ways for maximum understanding by intelligence consumers.

Since 1941, the Cartography Center maps have told the stories of post-WWII reconstruction, the Suez crisis, the Cuban Missile crisis, the Falklands War, and many other important events in history.

There you will find:

Cartography Tools 211 photos

Cartography Maps 1940s 22 photos

Cartography Maps 1950s 14 photos

Cartography Maps 1960s 16 photos

Cartography Maps 1970s 19 photos

Cartography Maps 1980s 12 photos

Cartography Maps 1990s 16 photos

Cartography Maps 2000s 16 photos

Cartography Maps 2010s 15 photos

The albums have this motto at the top:

CIA Cartography Center has been making vital contributions to our Nation’s security, providing policymakers with crucial insights that simply cannot be conveyed through words alone.

President-elect Trump is said to be gaining foreign intelligence from sources other than his national security briefings. Trump is ignoring daily intelligence briefings, relying on ‘a number of sources’ instead. That report is based on a Washington Post account, which puts its credibility somewhere between a conversation overheard in a laundromat and a stump speech by a member of Congress.

Assuming Trump is gaining intelligence from other sources, just how good are other sources of intelligence?

This release of maps by the CIA, some 160 maps spread from the 1940s to the 2010s, provides one axis for evaluating CIA intelligence versus what was commonly known at the time.

As a starting point, may I suggest: Image matching for historical maps comparison by C. Balletti and F. Guerra, e-Perimetron, Vol. 4, No. 3, 2009, pp. 180-186, ISSN 1790-3769?


In cartographic heritage we suddenly find maps of the same mapmaker and of the same area, published in different years, or new editions due to integration of cartographic data, such as in national cartographic series. These maps have the same projective system and the same cut, but they present very small differences. The manual comparison can be very difficult and with uncertain results, because it’s easy to leave some particulars out. It is necessary to find an automatic procedure to compare these maps and a solution can be given by digital maps comparison.

In recent years our experience in cartographic data processing led us to look for new tools for digital comparison, and today a solution is given by new software, ACM (Automatic Correlation Map), which finds areas that are candidates to contain differences between two maps. ACM is based on image matching, a key component in almost any image analysis process.

Interesting paper, but it presupposes a closeness of the maps that is likely to be missing when comparing CIA maps to other maps of the same places and time period.
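For intuition about what such tools compute, the core primitive behind ACM-style image matching is normalized cross-correlation. A pure-Python sketch on flat toy patches (a real comparison would slide this over windows of georeferenced scans):

```python
from math import sqrt

def ncc(a, b):
    """Normalized cross-correlation of two equal-size grayscale patches,
    given as flat lists of pixel values. 1.0 means identical up to
    brightness shift; values near 0 mean no linear similarity."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = sqrt(sum(x * x for x in da)) * sqrt(sum(y * y for y in db))
    return num / den if den else 0.0

patch = [10, 20, 30, 40, 50, 60, 70, 80, 90]       # a 3x3 patch, flattened
brighter = [p + 15 for p in patch]                  # same map, lighter scan
noise_patch = [90, 10, 50, 30, 70, 20, 80, 40, 60]  # unrelated content
```

Patches from two scans of the same map should correlate near 1.0 even under different exposure; low scores flag candidate differences, which is the closeness assumption the paper depends on.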

I am in the process of locating other tools for map comparison.

Any favorites you would like to suggest?

OPM Farce Continues – 2016 Inspector General Report

Monday, November 21st, 2016

U.S. Office of Personnel Management – Office of the Inspector General – Office of Audits

The Office of Personnel Management hack was back in the old days when China was being blamed for every hack. There’s no credible evidence of that, but the Chinese were blamed in any event.

The OPM hack illustrated the danger inherent in appointing campaign staff to run mission-critical federal agencies. For just a sampling of the impressive depth of Archuleta’s incompetence, read the Flash Audit on OPM Infrastructure Update Plan.

The executive summary of the current report offers little room for hope:

This audit report again communicates a material weakness related to OPM’s Security Assessment and Authorization (Authorization) program. In April 2015, the then Chief Information Officer issued a memorandum that granted an extension of the previous Authorizations for all systems whose Authorization had already expired, and for those scheduled to expire through September 2016. Although the moratorium on Authorizations has since been lifted, the effects of the April 2015 memorandum continue to have a significant negative impact on OPM. At the end of fiscal year (FY) 2016, the agency still had at least 18 major systems without a valid Authorization in place.

However, OPM did initiate an “Authorization Sprint” during FY 2016 in an effort to get all of the agency’s systems compliant with the Authorization requirements. We acknowledge that OPM is once again taking system Authorization seriously. We intend to perform a comprehensive audit of OPM’s Authorization process in early FY 2017.

This audit report also re-issues a significant deficiency related to OPM’s information security management structure. Although OPM has developed a security management structure that we believe can be effective, there has been an extremely high turnover rate of critical positions. The negative impact of these staffing issues is apparent in the results of our current FISMA audit work. There has been a significant regression in OPM’s compliance with FISMA requirements, as the agency failed to meet requirements that it had successfully met in prior years. We acknowledge that OPM has placed significant effort toward filling these positions, but simply having the staff does not guarantee that the team can effectively manage information security and keep OPM compliant with FISMA requirements. We will continue to closely monitor activity in this area throughout FY 2017.

It’s illegal, but hacking the OPM remains easier than hacking the NSA.

Hacking the NSA requires a job at Booz Allen and a USB drive.

“connecting the dots” requires dots (Support Michael Best)

Friday, November 11th, 2016

Michael Best is creating a massive archive of government documents.

From the post:

Since 2015, I’ve published millions of government documents (about 10% of the text items on the Internet Archive, with some items containing thousands of documents) and terabytes of data; but in order to keep going, I need your help. Since I’ve gotten started, no outlet has matched the number of government documents that I’ve published and made freely available. The only non-governmental publisher that rivals the size and scope of the government files I’ve uploaded is WikiLeaks. While I analyze and write about these documents, I consider publishing them to be more important because it enables and empowers an entire generation of journalists, researchers and students of history.

I’ve also pressured government agencies into making their documents more widely available. This includes the more than 13,000,000 pages of CIA documents that are being put online soon, partially in response to my Kickstarter and publishing efforts. These documents are coming from CREST, which is a special CIA database of declassified records. Currently, it can only be accessed from four computers in the world, all of them just outside of Washington D.C. These records, which represent more than 3/4 of a million CIA files, will soon be more accessible than ever – but even once that’s done, there’s a lot more work left to do.

Question: Do you want a transparent and accountable Trump presidency?

Potential Answers include:

1) Yes, but I’m going to spend time and resources hyper-ventilating with others and roaming the streets.

2) Yes, and I’m going to support Michael Best and FOIA efforts.

Governments, even Trump’s presidency, don’t spring from ocean foam.


The people chosen to fill cabinet and other posts have histories, in many cases government histories.

For example, I heard a rumor today that Ed Meese, a former government crime lord, is on the Trump transition team. Hell, I thought he was dead.

Michael’s efforts produce the dots that connect past events, places, people, and even present administrations.

The dots Michael produces may support your exposé, winning story and/or indictment.

Are you in or out?

Attn: Secrecy Bed-Wetters! All Five Volumes of Bay of Pigs History Released!

Thursday, November 3rd, 2016

Hand-wringers and bed-wetters who use government secrecy to hide incompetence and errors will sleep less easy tonight.

All Five Volumes of Bay of Pigs History Released and Together at Last: FRINFORMSUM 11/3/2016 by Lauren Harper.

From the post:

“After more than twenty years, it appears that fear of exposing the Agency’s dirty linen, rather than any significant security information, is what prompts continued denial of requests for release of these records. Although this volume may do nothing to modify that position, hopefully it does put one of the nastiest internal power struggles into proper perspective for the Agency’s own record.” This is according to Agency historian Jack Pfeiffer, author of the CIA’s long-contested Volume V of its official history of the Bay of Pigs invasion that was released after years of work by the National Security Archive to win the volume’s release. Chief CIA Historian David Robarge states in the cover letter announcing the document’s release that the agency is “releasing this draft volume today because recent 2016 changes in the Freedom of Information Act (FOIA) requires us to release some drafts that are responsive to FOIA requests if they are more than 25 years old.” This improvement – codified by the FOIA Improvement Act of 2016 – came directly from the National Security Archive’s years of litigation.

The CIA argued in court for years – backed by Department of Justice lawyers – that the release of this volume would “confuse the public.” National Security Archive Director Tom Blanton says, “Now the public gets to decide for itself how confusing the CIA can be. How many thousands of taxpayer dollars were wasted trying to hide a CIA historian’s opinion that the Bay of Pigs aftermath degenerated into a nasty internal power struggle?”

To read all five volumes of the CIA’s Official History of the Bay of Pigs Operation – together at last – visit the National Security Archive’s website.

Even the CIA’s own retelling of the story, The Bay of Pigs Invasion, ends with a chilling reminder for all “rebels” being presently supported by the United States.

Brigade 2506’s pleas for air and naval support were refused at the highest US Government levels, although several CIA contract pilots dropped munitions and supplies, resulting in the deaths of four of them: Pete Ray, Leo Baker, Riley Shamburger, and Wade Gray.

Kennedy refused to authorize any extension beyond the hour granted. To this day, there has been no resolution as to what caused this discrepancy in timing.

Without direct air support—no artillery and no weapons—and completely outnumbered by Castro’s forces, members of the Brigade either surrendered or returned to the turquoise water from which they had come.

Two American destroyers attempted to move into the Bay of Pigs to evacuate these members, but gunfire from Cuban forces made that impossible.

In the following days, US entities continued to monitor the waters surrounding the bay in search of survivors, with only a handful being rescued. A few members of the Brigade managed to escape and went into hiding, but soon surrendered due to a lack of food and water. When all was said and done, more than seventy-five percent of Brigade 2506 ended up in Cuban prisons.

100% captured or killed. There’s an example of US support.

In a less media-savvy time, the US did pay $53 million (in 1962 dollars, about $424 million today) for the release of 1,113 members of Brigade 2506.

Another important fact is that fifty-seven (57) years of delay enabled the participants to escape censure and/or a trip to the gallows for their misdeeds and crimes.

Let’s not let that happen with the full CIA Torture Report. Even the sanitized 6,700 page version would be useful. More so the documents upon which it was based.

All of that exists somewhere. We lack a person with access and moral courage to inform their fellow citizens of the full truth about the CIA torture program. So far.

Update: Michael Best, @NatSecGeek, advises that CIA Histories has the most complete CIA history collection. Thanks Michael!

Hackers May Fake Documents, Congress Publishes False Ones

Monday, September 19th, 2016

I pointed out in Lions, Tigers, and Lies! Oh My! that Bruce Schneier’s concerns over the potential for hackers faking documents to be leaked pale beside the misinformation distributed by government.

Executive Summary of Review of the Unauthorized Disclosures of Former National Security Agency Contractor Edward Snowden (their title, not mine), is a case in point.

Barton Gellman in The House Intelligence Committee’s Terrible, Horrible, Very Bad Snowden Report leaves no doubt the House Permanent Select Committee on Intelligence (HPSCI) report is a sack of lies.

Not mistakes, not exaggerations, not simply misleading, but actual, factual lies.

For example:

Since I’m on record claiming the report is dishonest, let’s skip straight to the fourth section. That’s the one that describes Snowden as “a serial exaggerator and fabricator,” with “a pattern of intentional lying.” Here is the evidence adduced for that finding, in its entirety.

“He claimed to have obtained a high school degree equivalent when in fact he never did.”

I do not know how the committee could get this one wrong in good faith. According to the official Maryland State Department of Education test report, which I have reviewed, Snowden sat for the high school equivalency test on May 4, 2004. He needed a score of 2250 to pass. He scored 3550. His Diploma No. 269403 was dated June 2, 2004, the same month he would have graduated had he returned to Arundel High School after losing his sophomore year to mononucleosis. In the interim, he took courses at Anne Arundel Community College.

See Gellman’s post for more examples.

All twenty-two members of the HPSCI signed the report. To save you time in the future, here’s a listing of the members of Congress who agreed to report these lies:



I sorted each group into alphabetical order. The original listings were in an order that no doubt makes sense to fellow rodents but not to the casual reader.

That’s twenty-two members of Congress who are willing to distribute known falsehoods.

Does anyone have an equivalent list of hackers?

Corrects Clinton-Impeachment Search Results

Monday, September 19th, 2016

After posting Search Alert: “…previous total of 261 to the new total of 0.” [Solved] yesterday, pointing out that a change from http:// to https:// altered a search result for Clinton w/in 5 words impeachment, I got an email this morning:


I appreciate the update and correction for saved searches, but my point about remote data changing without notice to you remains valid.

I’m still waiting for word on bulk downloads from both Wikileaks and DC Leaks.

Why leak information vital to public discussion and then limit access to search?

Search Alert: “…previous total of 261 to the new total of 0.” [Solved]

Sunday, September 18th, 2016

Odd message from the search alert this AM:


Here’s the search I created back in June, 2016:


My probably inaccurate recall is that I was searching for a quote from the impeachment of Bill Clinton and was too lazy to specify a term of Congress, hence:

all congresses – searching for Clinton within five words, impeachment

Fairly trivial search that produced 261 “hits.”

I set the search alert more to explore the search options than any expectation of different future results.

Imagine my surprise to find that all congresses – searching for Clinton within five words, impeachment performed today, results in 0 “hits.”

Suspecting some internal changes to the search interface, I re-entered the search today and got 0 “hits.”

Are other saved searches returning radically different results as of today?

This is not, repeat not, the result of some elaborate conspiracy to assist Secretary Clinton in her bid for the presidency.

I do think something fundamental has gone wrong with searching at the site, and it needs to be fixed.

This is an illustration of why Wikileaks, DC Leaks and other data sites should provide easy-to-access bulk downloads of their materials.

Providing search interfaces to document collections is a public service, but document collections, or access to them, can change in ways not transparent to search users, as demonstrated by the CIA removing documents previously delivered to the Senate.

Petition Wikileaks, DC Leaks and other data sites for easy bulk downloads.

That will ensure the “evidence” does not shift under your feet and enable more sophisticated means of analysis than brute-force search.
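Once you hold a bulk download, detecting silently shifting evidence is straightforward: fingerprint every file and diff the fingerprints between runs. A sketch (file names and contents invented):

```python
import hashlib

def fingerprint(documents):
    """Map each document name to a SHA-256 digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest()
            for name, data in documents.items()}

def changed(old, new):
    """Names whose content differs, disappeared, or appeared between runs."""
    return {n for n in old.keys() | new.keys() if old.get(n) != new.get(n)}

# First bulk download vs. a later one where a record was silently emptied.
run1 = fingerprint({"doc-261.txt": b"Clinton ... impeachment",
                    "doc-001.txt": b"unchanged"})
run2 = fingerprint({"doc-261.txt": b"",
                    "doc-001.txt": b"unchanged"})
```

Store the manifest alongside the download; any later `changed()` hit is your notice that the remote collection moved under you.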

Update: The site’s change from http:// to https:// trashed my saved query, as well as my attempt to use http:// to re-perform the same search.

Using https:// returns the same 261 search results.

What is your experience with other saved searches at the site?

Inside the fight to reveal the CIA’s torture secrets [Support The Guardian]

Monday, September 12th, 2016

Inside the fight to reveal the CIA’s torture secrets by Spencer Ackerman.

Part one: Crossing the bridge

Part two: A constitutional crisis

Part three: The aftermath

Ackerman captures the drama of a failed attempt by the United States Senate to exercise oversight on the Central Intelligence Agency (CIA) in this series.

I say “failed attempt” because even if the full 6,700+ page report is ever released, the lead Senate investigator, Daniel Jones, obscured the identities of all the responsible CIA personnel and sources of information in the report.

Even if the full report is serialized in your local newspaper, the CIA contractors and staff guilty of multiple felonies will not be one step closer to being brought to justice.

To that extent, the “full” report is itself a disservice to the American people, who elect their congressional leaders and expect them to oversee agencies such as the CIA.

From Ackerman’s account you will learn that the CIA can dictate to its overseers the location and conditions under which they view documents, decide which documents they are allowed to see, and, in cases of conflict, spy on the Senate Select Committee on Intelligence.

Does that sound like effective oversight to you?

BTW, you will also learn that members of the “most transparent administration in history” aided and abetted the CIA in preventing an effective investigation into the CIA and its torture program. I use “aided and abetted” deliberately and in their legal sense.

I mention in my header that you should support The Guardian.

This story by Spencer Ackerman is one reason.

Another reason is that given the plethora of names and transfers recited in Ackerman’s story, we need The Guardian to cover future breaks in this story.

Despite the tales of superhuman security, nobody is that good.

I leave you with the thought that if more than one person knows a secret, then it can be discovered.

Check Ackerman’s story for a starting list of those who know secrets about the CIA torture program.

Good hunting!

New Plea: Charges Don’t Reflect Who I Am Today

Wednesday, September 7th, 2016

Traditionally, pleas have been guilty, not guilty, not guilty by reason of insanity and nolo contendere (no contest).

Beth Cobert, acting director at the OPM, has added a fifth plea:

Charges Don’t Reflect Who I Am Today

Greg Masters captures the new plea in Congressional report faults OPM over breach preparedness and response:

While welcoming the committee’s acknowledgement of the OPM’s progress, Beth Cobert, acting director at the OPM, disagreed with the committee’s findings in a blog post published on the OPM site on Wednesday, responding that the report does “not fully reflect where this agency stands today.”
… (emphasis added)

Any claims about “…where this agency stands today…” are a distraction from the question of responsibility for a system wide failure of security.

If you know any criminal defense lawyers, suggest they quote Beth Cobert as setting a precedent for responding to allegations of prior misconduct with:

Charges Don’t Reflect Who I Am Today

Please forward links to news reports of successful use of that plea to my attention.

Congressional Research Service Fiscal 2015 – Full Report List

Saturday, August 6th, 2016

Congressional Research Service Fiscal 2015

The Director’s Message:

From international conflicts and humanitarian crises, to immigration, transportation, and secondary education, the Congressional Research Service (CRS) helped every congressional office and committee navigate the wide range of complex and controversial issues that confronted Congress in FY2015.

We kicked off the year strongly, preparing for the newly elected Members of the 114th Congress with the tenth biannual CRS Seminar for New Members, and wrapped up 2015 supporting the transition to a new Speaker and the crafting of the omnibus appropriations bill. In between, CRS experts answered over 62,000 individual requests; hosted over 7,400 Congressional participants at seminars, briefings and trainings; provided over 3,600 new or refreshed products; and summarized over 8,000 pieces of legislation.

While the CRS mission remains the same, Congress and the environment in which it works are continually evolving. To ensure that the Service is well positioned to anticipate and meet the information and research needs of a 21st-century Congress, we launched a comprehensive strategic planning effort that has identified the most critical priorities, goals, and objectives that will enable us to most efficiently and effectively serve Congress as CRS moves into its second century.

Responding to the increasingly rapid pace of congressional business, and taking advantage of new technologies, we continued to explore new and innovative ways to deliver authoritative information and timely analysis to Congress. For example, we introduced shorter report formats and added infographics to our website to better serve congressional needs.

It is an honor and privilege to work for the U.S. Congress. With great dedication, our staff creatively supports Members, staff and committees as they help shape and direct the legislative process and our nation’s future. Our accomplishments in fiscal 2015 reflect that dedication.

All true, but it is also true that the funders of all those wonderful efforts, taxpayers, have spotty and/or erratic access to those research goodies.

Perhaps that will change in the not too distant future.

But until then, perhaps a list of all the new CRS products in 2015, which runs from page 47 to page 124, may be of interest.

Not all entries are unique as they may appear under different categories.

Sadly the only navigation you are offered is by chunky categories like “Health” and “Law and Justice.”

Hmmm, perhaps that can be fixed, at least to some degree.

Watch for more CRS news this coming week.

How-To Track Projects Like A Defense Contractor

Sunday, July 31st, 2016

Transparency Tip: How to Track Government Projects Like a Defense Contractor by Dave Maass.

From the post:

Over the last year, thousands of pages of sensitive documents outlining the government’s intelligence practices have landed on our desktops.

One set of documents describes the Director of National Intelligence’s goal of funding “dramatic improvements in unconstrained face recognition.” A presentation from the Navy uses examples from Star Trek to explain its electronic warfare program. Other records show the FBI was purchasing mobile phone extraction devices, malware and fiber network-tapping systems. A sign-in list shows the names and contact details of hundreds of cybersecurity contractors who turned up at a Department of Homeland Security “Industry Day.” Yet another document, a heavily redacted contract, provides details of U.S. assistance with drone surveillance programs in Burundi, Kenya and Uganda.

But these aren’t top-secret records carefully leaked to journalists. They aren’t classified dossiers pasted haphazardly on the Internet by hacktivists. They weren’t even liberated through the Freedom of Information Act. No, these public documents are available to anyone who looks at the U.S. government’s contracting website. In this case, “anyone” is usually just contractors looking to sell goods, services, or research to the government. But, because the government often makes itself more accessible to businesses than the general public, it’s also a useful tool for watchdogs. Every government program costs money, and whenever money is involved, there’s a paper trail.

Searching the site is difficult enough that there are firms offering search services to assist contractors with locating business opportunities.

Collating data with topic maps (read: adding data) will be a value-add to watchdogs, potential contractors (including yourself), or watchers watching watchers.
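A simple value-add along those lines: keep a keyword watchlist and run it over each day’s notice metadata. A sketch with invented records (a real pipeline would pull the contracting site’s daily exports):

```python
def matches(notice, watchlist):
    """Watchlist terms found in a notice's title or description (case-insensitive)."""
    haystack = (notice["title"] + " " + notice["description"]).lower()
    return [term for term in watchlist if term.lower() in haystack]

# Invented notice records for illustration.
notices = [
    {"title": "Industry Day", "description": "cybersecurity contractors sign-in"},
    {"title": "Road resurfacing", "description": "asphalt, county route 9"},
]
watchlist = ["face recognition", "cybersecurity", "drone"]
hits = [(n["title"], matches(n, watchlist)) for n in notices if matches(n, watchlist)]
```

Run daily and diff against yesterday’s hits, and you have the paper-trail tracking the post describes without paying a search firm.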

Dave’s post will get you started on your way.

U.S. Climate Resilience Toolkit

Thursday, July 28th, 2016

Bringing climate information to your backyard: the U.S. Climate Resilience Toolkit by Tamara Dickinson and Kathryn Sullivan.

From the post:

Climate change is a global challenge that requires local solutions. Today, a new version of the Climate Resilience Toolkit brings climate information to your backyard.

The Toolkit, called for in the President’s Climate Action Plan and developed by the National Oceanic and Atmospheric Administration (NOAA), in collaboration with a number of Federal agencies, was launched in 2014. After collecting feedback from a diversity of stakeholders, the team has updated the Toolkit to deliver more locally-relevant information and to better serve the needs of its users. Starting today, Toolkit users will find:

  • A redesigned user interface that is responsive to mobile devices;
  • County-scale climate projections through the new version of the Toolkit’s Climate Explorer;
  • A new “Reports” section that includes state and municipal climate-vulnerability assessments, adaptation plans, and scientific reports; and
  • A revised “Steps to Resilience” guide, which communicates steps to identifying and addressing climate-related vulnerabilities.

Thanks to the Toolkit’s Climate Explorer, citizens, communities, businesses, and policy leaders can now visualize both current and future climate risk on a single interface by layering up-to-date, county-level, climate-risk data with maps. The Climate Explorer allows coastal communities, for example, to overlay anticipated sea-level rise with bridges in their jurisdiction in order to identify vulnerabilities. Water managers can visualize which areas of the country are being impacted by flooding and drought. Tribal nations can see which of their lands will see the greatest mean daily temperature increases over the next 100 years.  

A number of decision makers, including the members of the State, Local, and Tribal Leaders Task Force, have called on the Federal Government to develop actionable information at local-to-regional scales.  The place-based, forward-looking information now available through the Climate Explorer helps to meet this demand.

The Climate Resilience Toolkit update builds upon the Administration’s efforts to boost access to data and information through resources such as the National Climate Assessment and the Climate Data Initiative. The updated Toolkit is a great example of the kind of actionable information that the Federal Government can provide to support community and business resilience efforts. We look forward to continuing to work with leaders from across the country to provide the tools, information, and support they need to build healthy and climate-ready communities.

Check out the new capabilities today at!

I have only started to explore this resource but thought I should pass it along.

Of particular interest to me is the integration of data/analysis from this resource with other data.


Accessing IRS 990 Filings (Old School)

Monday, July 25th, 2016

Like many others, I was glad to see: IRS 990 Filings on AWS.

From the webpage:

Machine-readable data from certain electronic 990 forms filed with the IRS from 2011 to present are available for anyone to use via Amazon S3.

Form 990 is the form used by the United States Internal Revenue Service to gather financial information about nonprofit organizations. Data for each 990 filing is provided in an XML file that contains structured information that represents the main 990 form, any filed forms and schedules, and other control information describing how the document was filed. Some non-disclosable information is not included in the files.

This data set includes Forms 990, 990-EZ and 990-PF which have been electronically filed with the IRS and is updated regularly in an XML format. The data can be used to perform research and analysis of organizations that have electronically filed Forms 990, 990-EZ and 990-PF. Forms 990-N (e-Postcard) are not available within this data set. Forms 990-N can be viewed and downloaded from the IRS website.

I could use AWS but I’m more interested in deep analysis of a few returns than analysis of the entire dataset.

Fortunately the webpage continues:

An index listing all of the available filings is available at s3://irs-form-990/index.json. This file includes basic information about each filing including the name of the filer, the Employer Identification Number (EIN) of the filer, the date of the filing, and the path to download the filing.

All of the data is publicly accessible via the S3 bucket’s HTTPS endpoint at No authentication is required to download data over HTTPS. For example, the index file can be accessed at and the example filing mentioned above can be accessed at (emphasis in original).
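The endpoint URL itself hasn’t survived in the quote above, but going by AWS’s standard path-style convention (https://s3.amazonaws.com/&lt;bucket&gt;/&lt;key&gt; — an assumption on my part, not something the quoted page confirms here), the index can be pulled with something like:

```shell
# Build the HTTPS URL from the s3://irs-form-990/index.json URI quoted above,
# assuming AWS's path-style addressing: https://s3.amazonaws.com/<bucket>/<key>
bucket="irs-form-990"
key="index.json"
url="https://s3.amazonaws.com/${bucket}/${key}"
wget "$url"
```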

I open a terminal window and type:


which as of today, results in:

-rw-rw-r-- 1 patrick patrick 1036711819 Jun 16 10:23 index.json

A trial grep:

grep "NATIONAL RIFLE" index.json > nra.txt

Which produces:

{"EIN": "530116130", "SubmittedOn": "2014-11-25", "TaxPeriod": "201312", "DLN": "93493309004174", "LastUpdated": "2016-03-21T17:23:53", "URL": "", "FormType": "990", "ObjectId": "201423099349300417", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2013-12-20", "TaxPeriod": "201212", "DLN": "93493260005203", "LastUpdated": "2016-03-21T17:23:53", "URL": "", "FormType": "990", "ObjectId": "201302609349300520", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2012-12-06", "TaxPeriod": "201112", "DLN": "93493311011202", "LastUpdated": "2016-03-21T17:23:53", "URL": "", "FormType": "990", "ObjectId": "201203119349301120", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "396056607", "SubmittedOn": "2011-05-12", "TaxPeriod": "201012", "FormType": "990EZ", "LastUpdated": "2016-06-14T01:22:09.915971Z", "OrganizationName": "EAU CLAIRE NATIONAL RIFLE CLUB", "IsElectronic": false, "IsAvailable": false},
{"EIN": "530116130", "SubmittedOn": "2011-11-09", "TaxPeriod": "201012", "DLN": "93493270005081", "LastUpdated": "2016-03-21T17:23:53", "URL": "", "FormType": "990", "ObjectId": "201132709349300508", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2016-01-11", "TaxPeriod": "201412", "DLN": "93493259005035", "LastUpdated": "2016-04-29T13:40:20", "URL": "", "FormType": "990", "ObjectId": "201532599349300503", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},

We have one errant result, the “EAU CLAIRE NATIONAL RIFLE CLUB,” so let’s delete that and re-order by year. The NATIONAL RIFLE ASSOCIATION OF AMERICA results then read (most recent to oldest):

{"EIN": "530116130", "SubmittedOn": "2016-01-11", "TaxPeriod": "201412", "DLN": "93493259005035", "LastUpdated": "2016-04-29T13:40:20", "URL": "", "FormType": "990", "ObjectId": "201532599349300503", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2014-11-25", "TaxPeriod": "201312", "DLN": "93493309004174", "LastUpdated": "2016-03-21T17:23:53", "URL": "", "FormType": "990", "ObjectId": "201423099349300417", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2013-12-20", "TaxPeriod": "201212", "DLN": "93493260005203", "LastUpdated": "2016-03-21T17:23:53", "URL": "", "FormType": "990", "ObjectId": "201302609349300520", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2012-12-06", "TaxPeriod": "201112", "DLN": "93493311011202", "LastUpdated": "2016-03-21T17:23:53", "URL": "", "FormType": "990", "ObjectId": "201203119349301120", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2011-11-09", "TaxPeriod": "201012", "DLN": "93493270005081", "LastUpdated": "2016-03-21T17:23:53", "URL": "", "FormType": "990", "ObjectId": "201132709349300508", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
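Filtering on the EIN rather than the organization name would have avoided the errant Eau Claire match in the first place. A minimal sketch, assuming the downloaded index.json keeps one record per line and uses plain straight quotes:

```shell
# Grep on the NRA's EIN (530116130, taken from the records above) instead of
# the name, so same-named local clubs don't slip into the results.
grep '"EIN": "530116130"' index.json > nra.txt
wc -l nra.txt   # sanity check: one line per matching filing
```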

Of course, now you want the XML 990 returns, so extract the URLs for the 990s to a file, here nra-urls.txt (I would use awk if it is more than a handful):
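One possible extraction, hedged: this assumes the raw JSON uses straight quotes and that each record carries its download link in a `"URL": "..."` field, as the index layout suggests:

```shell
# Pull the value of each "URL" field out of the filtered records and
# write one URL per line to nra-urls.txt.
grep -o '"URL": "[^"]*"' nra.txt |
  sed -e 's/^"URL": "//' -e 's/"$//' > nra-urls.txt
```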

Back to wget:

wget -i nra-urls.txt


-rw-rw-r-- 1 patrick patrick 111798 Mar 21 16:12 201132709349300508_public.xml
-rw-rw-r-- 1 patrick patrick 123490 Mar 21 19:47 201203119349301120_public.xml
-rw-rw-r-- 1 patrick patrick 116786 Mar 21 22:12 201302609349300520_public.xml
-rw-rw-r-- 1 patrick patrick 122071 Mar 21 15:20 201423099349300417_public.xml
-rw-rw-r-- 1 patrick patrick 132081 Apr 29 10:10 201532599349300503_public.xml

Ooooh, it’s in XML! 😉

For the XML you are going to need: Current Valid XML Schemas and Business Rules for Exempt Organizations Modernized e-File, not to mention a means of querying the data (may I suggest XQuery?).

Once you have the index.json file, with grep, a little awk and wget, you can quickly explore IRS 990 filings for further analysis or to prepare queries for running on AWS (such as discovery of common directors, etc.).


What’s the “CFR” and Why Is It So Important to Me?

Wednesday, July 20th, 2016

What’s the “CFR” and Why Is It So Important to Me? Government Printing Office (GPO) blog, GovernmentBookTalk.

From the post:

If you’re a GPO Online Bookstore regular or public official you probably know we’re speaking about the “Code of Federal Regulations.” CFRs are produced routinely by all federal departments and agencies to inform the public and government officials of regulatory changes and updates for literally every subject that the federal government has jurisdiction to manage.

For the general public these constantly updated federal regulations can spell fantastic opportunity. Farmer, lawyer, construction owner, environmentalist, it makes no difference. Within the 50 codes are a wide variety of regulations that impact citizens from all walks of life. Federal Rules, Regulations, Processes, or Procedures on the surface can appear daunting, confusing, and even may seem to impede progress. In fact, the opposite is true. By codifying critical steps to anyone who operates within the framework of any of these sectors, the CFR focused on a particular issue can clarify what’s legal, how to move forward, and how to ultimately successfully translate one’s projects or ideas into reality.

Without CFR documentation the path could be strewn with uncertainty, unknown liabilities, and lost opportunities, especially regarding federal development programs, simply because an interested party wouldn’t know where or how to find what’s available within their area of interest.

The authors of CFRs are immersed in the technical and substantive issues associated within their areas of expertise. For a private sector employer or entrepreneur who becomes familiar with the content of CFRs relative to their field of work, it’s like having an expert staff on board.

I like the CFRs but I stumbled on:

For a private sector employer or entrepreneur who becomes familiar with the content of CFRs relative to their field of work, it’s like having an expert staff on board.

I don’t doubt the expertise of the CFR authors, but their writing often requires an expert for accurate interpretation. If you doubt that statement, test your reading skills on any section of CFR Title 26, Internal Revenue.

Try your favorite NLP parser out on any of the CFRs.

The post lists a number of ways to acquire the CFRs but personally I would use the free Electronic Code of Federal Regulations unless you need to impress clients with the paper version.


IRS E-File Bucket – Internet Archive

Saturday, June 18th, 2016

IRS E-File Bucket courtesy of Carl Malamud and Public.Resource.Org.

From the webpage:

This bucket contains a mirror of the IRS e-file release as of June 16, 2016. You may access the source files at The present bucket may or may not be updated in the future.

To access this bucket, use the download links.

Note that tarballs of image scans from 2002-2015 are also available in this IRS 990 Forms collection.

Many thanks to the Internal Revenue Service for making this information available. Here is their announcement on June 16, 2016. Here is a statement from Public.Resource.Org congratulating the IRS on a job well done.

As I noted in IRS 990 Filing Data (2001 to date):

990* disclosures aren’t detailed enough to pinch but when combined with other data, say leaked data, the results can be remarkable.

It’s up to you to see that public disclosures pinch.

IRS 990 Filing Data (2001 to date)

Thursday, June 16th, 2016

IRS 990 Filing Data Now Available as an AWS Public Data Set

From the post:

We are excited to announce that over one million electronic IRS 990 filings are available via Amazon Simple Storage Service (Amazon S3). Filings from 2011 to the present are currently available and the IRS will add new 990 filing data each month.

(image omitted)

Form 990 is the form used by the United States Internal Revenue Service (IRS) to gather financial information about nonprofit organizations. By making electronic 990 filing data available, the IRS has made it possible for anyone to programmatically access and analyze information about individual nonprofits or the entire nonprofit sector in the United States. This also makes it possible to analyze it in the cloud without having to download the data or store it themselves, which lowers the cost of product development and accelerates analysis.

Each electronic 990 filing is available as a unique XML file in the “irs-form-990” S3 bucket in the AWS US East (N. Virginia) region. Information on how the data is organized and what it contains is available on the IRS 990 Filings on AWS Public Data Set landing page.

Some of the forms and instructions that will help you make sense of the data reported:

990 – Form 990 Return of Organization Exempt from Income Tax, Annual Form 990 Requirements for Tax-Exempt Organizations

990-EZ – 2015 Form 990-EZ, Instructions for IRS 990 EZ – Internal Revenue Service

990-PF – 2015 Form 990-PF, 2015 Instructions for Form 990-PF

As always, use caution with law related data as words may have unusual nuances and/or unexpected meanings.

These forms and instructions are only a tiny part of a vast iceberg of laws, regulations, rulings, court decisions and the like.

990* disclosures aren’t detailed enough to pinch but when combined with other data, say leaked data, the results can be remarkable.

Breaking Californication (An Act Performed On The Public)

Monday, June 6th, 2016

Law Enforcement Lobby Succeeds In Killing California Transparency Bill by Kit O’Connell.

From the post:

A California Senate committee killed a bill to increase transparency in police misconduct investigations, hampering victims’ efforts to obtain justice.

Chauncee Smith, legislative advocate at the ACLU of California, told MintPress News that the state Legislature “caved to the tremendous influence and power of the law enforcement lobby” and “failed to listen to the demands and concerns of everyday Californian people.”

California has some of the most secretive rules in the country when it comes to investigations into police misconduct and excessive use of force. Records are kept sealed, regardless of the outcome, as the ACLU of Northern California explains on its website:

“In places like Texas, Kentucky, and Utah, peace officer records are made public when an officer is found guilty of misconduct. Other states make records public regardless of whether misconduct is found. This is not the case in California.”

“Right now, there is a tremendous cloud of secrecy that is unparalleled compared to many other states,” Smith added. “California is in the minority in which the public do not know basic information when someone is killed or potentially harmed by those who are sworn to serve and protect them.”

In February, Sen. Mark Leno, a Democrat from San Francisco, introduced SB 1286, the “Enhance Community Oversight on Police Misconduct and Serious Uses of Force” bill. It would have allowed “public access to investigations, findings and discipline information on serious uses of force by police” and would have increased transparency in other cases of police misconduct, according to an ACLU fact sheet. Polling data cited by the ACLU suggests about 80 percent of Californians would support the measure.

But the bill’s progress through the legislature ended on May 27, when it failed to pass out of the Senate Appropriations committee.

“Today is a sad day for transparency, accountability, and justice in California,” said Peter Bibring, police practices director for the ACLU of California, in a May 27 press release.

Mistrust between police officers and citizens makes the job of police officers more difficult and dangerous, while denying citizens the full advantages of a trained police force, paid for by their tax dollars.

The state legislature, finding that sowing and fueling mistrust between police officers and citizens has electoral upsides, fans those flames with secrecy over police misconduct investigations.

Open, not secret (read: grand jury), proceedings in which witnesses can be fairly examined (unlike the deliberately thrown Michael Brown investigation) can go a long way toward re-establishing trust between the police and the public.

Members of the community know when someone was a danger to police officers and others, whether their family members will admit it or not. Likewise, police officers know which officers are far too quick to escalate to deadly force. Want better community policing? Want better citizen cooperation? That’s not going to happen with completely secret police misconduct investigations.

So the State of California is going to collect the evidence, statements, etc., in police misconduct investigations, but won’t share that information with the public. At least not willingly.

Official attempts to break illegitimate government secrecy failed. Even if it had succeeded, you’d be paying at least $0.25 per page plus a service fee.

Two observations about government networks:

  • Secret (and otherwise) government documents are usually printed on networked printers.
  • Passively capturing Ethernet traffic (network tap) captures printer traffic too.

Whistleblowers don’t have to hack heavily monitored systems or steal logins and passwords; leaking illegally withheld documents is within the reach of anyone who can plug in an Ethernet cable.

There’s a bit more to it than that, but remember all those network cables running through the ceilings, walls and closets the next time your security consultant assures you of your network’s security.

As a practical matter, if you start leaking party menus and football pools, someone will start looking for a network tap.

Leak when it makes a significant difference to public discussion and/or legal proceedings. Even then, look for ways to attribute the leak to factions within the government.

Remember the DoD’s amused reaction to State’s huffing and puffing over the Afghan diplomatic cables? That sort of rivalry exists at every level of government. You should use it to your advantage.

The State of California would have you believe that government information sharing is at its sufferance.

I beg to differ.

So should you.

11 Million Pages of CIA Files [+ Allen Dulles, war criminal]

Thursday, March 3rd, 2016

11 Million Pages of CIA Files May Soon Be Shared By This Kickstarter by Joseph Cox.

From the post:

Millions of pages of CIA documents are stored in Room 3000. The CIA Records Search Tool (CREST), the agency’s database of declassified intelligence files, is only accessible via four computers in the National Archives Building in College Park, MD, and contains everything from Cold War intelligence, research and development files, to images.

Now one activist is aiming to get those documents more readily available to anyone who is interested in them, by methodically printing, scanning, and then archiving them on the internet.

“It boils down to freeing information and getting as much of it as possible into the hands of the public, not to mention journalists, researchers and historians,” Michael Best, analyst and freedom of information activist told Motherboard in an online chat.

Best is trying to raise $10,000 on Kickstarter in order to purchase the high speed scanner necessary for such a project, a laptop, office supplies, and to cover some other costs. If he raises more than the main goal, he might be able to take on the archiving task full-time, as well as pay for FOIAs to remove redactions from some of the files in the database. As a reward, backers will help to choose what gets archived first, according to the Kickstarter page.

“Once those “priority” documents are done, I’ll start going through the digital folders more linearly and upload files by section,” Best said. The files will be hosted on the Internet Archive, which converts documents into other formats too, such as for Kindle devices, and sometimes text-to-speech for e-books. The whole thing has echoes of Cryptome—the freedom of information duo John Young and Deborah Natsios, who started off scanning documents for the infamous cypherpunk mailing list in the 1990s.

Good news! Kickstarter has announced this project funded!

Additional funding will help make this archive of documents available sooner rather than later.

As opposed to an attempt to boil the ocean of 11 million pages of CIA files, what about smaller topic mapping/indexing projects that focus on bounded sub-sets of documents of interest to particular communities?

I don’t have any interest in the STAR GATE project (clairvoyance, precognition, or telepathy, continued now by the DHS at airport screening facilities) but would be very interested in the records of Allen Dulles, a war criminal of some renown.

Just so you know, Michael has already uploaded documents on Allen Dulles from the CIA Records Search Tool (CREST):

History of Allen Welsh Dulles as CIA Director – Volume I: The Man

History of Allen Welsh Dulles as CIA Director – Volume II: Coordination of Intelligence

History of Allen Welsh Dulles as CIA Director – Volume III: Covert Activities

History of Allen Welsh Dulles as CIA Director – Volume IV: Congressional Oversight and Internal Administration

History of Allen Welsh Dulles as CIA Director – Volume V: Intelligence Support of Policy

To describe Allen Dulles as a war criminal is no hyperbole. The overthrow of President Jacobo Arbenz Guzman of Guatemala (think United Fruit Company) and the removal of Mohammad Mossadeq, prime minister of Iran (think Shah of Iran), are only two of his crimes, the full extent of which will probably never be known.

Files are being uploaded to That 1 Archive.

Scripting FOIA Requests

Thursday, March 3rd, 2016

An Activist Wrote a Script to FOIA the Files of 7,000 Dead FBI Officials by Joseph Cox.

From the post:

One of the best times to file a Freedom of Information request with the FBI is when someone dies; after that, any files that the agency holds on them can be requested. Asking for FBI files on the deceased is therefore pretty popular, with documents released on Steve Jobs, Malcolm X and even the Insane Clown Posse.

One activist is turning this back onto the FBI itself, by requesting files on nearly 7,000 dead FBI employees en masse, and releasing a script that allows anyone else to do the same.

“At the very least, it’ll be like having an extensive ‘Who’s Who in the FBI’ to consult, without worrying that anyone in there is still alive and might face retaliation for being in law enforcement,” Michael Best told Motherboard in an online chat. “For some folks, they’ll probably show allegations of wrongdoing while others probably highlight some of the FBI’s best and brightest.”

On Monday, Best will file FOIAs for FBI records and files relating to 6,912 employees named in the FBI’s own “Dead List,” a list of people that the FBI understands to be deceased. A recent copy of the list, which includes special agents and section chiefs, was FOIA’d by MuckRock editor JPat Brown in January.

Points to remember:

  • Best’s script works for any FOIA office that accepts email requests (not just the FBI, be creative)
  • Get 3 or more people to file the same FOIA requests
  • Publicize your spreadsheet of FOIA targets

Don’t forget the need to scan, OCR and index (topic map) the results of your FOIA requests.

Information that cannot be found may as well still be concealed by the FBI (and others).

Earthdata Search – Smells Like A Topic Map?*

Sunday, February 28th, 2016

Earthdata Search

From the webpage:

Search NASA Earth Science data by keyword and filter by time or space.

After choosing tour:

Keyword Search

Here you can enter search terms to find relevant data. Search terms can be science terms, instrument names, or even collection IDs. Let’s start by searching for Snow Cover NRT to find near real-time snow cover data. Type Snow Cover NRT in the keywords box and press Enter.

Which returns a screen in three sections, left to right: Browse Collections, 21 Matching Collections (Add collections to your project to compare and retrieve their data), and the third section displays a world map (navigate by grabbing the view)

Under Browse Collections:

In addition to searching for keywords, you can narrow your search through this list of terms. Click Platform to expand the list of platforms (still in a tour box)

Next step:

Now click Terra to select the Terra satellite.

Comment: Wondering how I will know which “platform” or “instrument” to select? There may be more/better documentation but I haven’t seen it yet.

The data follows the Unified Metadata Model (UMM):

NASA’s Common Metadata Repository (CMR) is a high-performance, high-quality repository for earth science metadata records that is designed to handle metadata at the Concept level. Collections and Granules are common metadata concepts in the Earth Observation (EO) world, but this can be extended out to Visualizations, Parameters, Documentation, Services, and more. The CMR metadata records are supplied by a diverse array of data providers, using a variety of supported metadata standards, including:


Initially, designers of the CMR considered standardizing all CMR metadata to a single, interoperable metadata format – ISO 19115. However, NASA decided to continue supporting multiple metadata standards in the CMR — in response to concerns expressed by the data provider community over the expense involved in converting existing metadata systems to systems capable of generating ISO 19115. In order to continue supporting multiple metadata standards, NASA designed a method to easily translate from one supported standard to another and constructed a model to support the process. Thus, the Unified Metadata Model (UMM) for EOSDIS metadata was born as part of the EOSDIS Metadata Architecture Studies (MAS I and II) conducted between 2012 and 2013.

What is the UMM?

The UMM is an extensible metadata model which provides a ‘Rosetta stone’ or cross-walk for mapping between CMR-supported metadata standards. Rather than create mappings from each CMR-supported metadata standard to each other, each standard is mapped centrally to the UMM model, thus reducing the number of translations required from n x (n-1) to 2n.
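The arithmetic in that claim is easy to check; for example, with ten supported standards:

```shell
# Pairwise mappings between n standards vs. hub-and-spoke through the UMM.
n=10
echo "pairwise: $((n * (n - 1))) translations"   # 90
echo "via UMM:  $((2 * n)) translations"         # 20
```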

Here’s the mapping graphic:

(image omitted)

Granted, the profiles don’t make the basis for the mappings explicit, but the mappings have the same impact post-mapping as a topic map would post-merging.

The site could use better documentation for the interface and data, at least in the view of this non-expert in the area.

Thoughts on documentation for the interface or making the mapping more robust via use of a topic map?

I first saw this in a tweet by Kirk Borne.

*Smells Like A Topic Map – Sorry, culture bound reference to a routine on the first Cheech & Chong album. No explanation would do it justice.

No Patriotic Senators, Senate Staffers, Agency Heads – CIA Torture Report

Saturday, February 20th, 2016

I highly recommend your reading The CIA torture report belongs to the public by Lauren Harper.

I have quoted from Lauren’s introduction below as an inducement to read the article in full, but she fails to explore why a patriotic senator, staffer or agency head has not already leaked the CIA Torture Report.

It has already been printed, bound, etc., and who knows how many people were involved in every step of that process.

Do you seriously believe that report has gone unread except for its authors?

So far as I know, members of Congress, that “other” branch of the government, are entitled to make their own decisions about the handling of their reports.

What America needs now is a Senator or even a Senate staffer with more loyalty to the USA than to the bed wetters and torturers (same group) in the DoJ.

If you remember a part of the constitution that grants the DoJ the role of censor for the American public, please point it out in comments below.

From the post:

The American public’s ability to read the Senate Intelligence Committee’s full, scathing report on the Central Intelligence Agency’s torture program is in danger because David Ferriero, the archivist of the United States, will not call the report what it is, a federal record. He is refusing to use his clear statutory authority to label the report a federal record, which would be subject to Freedom of Information Act (FOIA) disclosure requirements, because the Justice Department has told the National Archives and Records Administration (NARA) not to. The DOJ has a long history of breaking the law to avoid releasing information in response to FOIA requests. The NARA does not have such a legacy and should not allow itself to be bullied by the DOJ.

The DOJ instructed the NARA not to make any determination on the torture report’s status as a federal record, ostensibly because it would jeopardize the government’s position in a FOIA lawsuit seeking the report’s release. The DOJ, however, has no right to tell the NARA not to weigh in on the record’s status, and the Presidential and Federal Records Act Amendments of 2014 gives the archivist of the United States the binding legal authority to make precisely that determination.

Democratic Sens. Patrick Leahy of Vermont and Dianne Feinstein of California revealed the DOJ’s insistence that the archivist of the United States not faithfully fulfill his duty in a Nov. 5, 2015, letter to Attorney General Loretta Lynch. They protested the DOJ’s refusal to allow its officials as well as those of the Defense Department, the CIA and the State Department to read the report. Leahy and Feinstein’s letter notes that “personnel at the National Archives and Records Administration have stated that, based on guidance from the Department of Justice, they will not respond to questions about whether the study constitutes a federal record under the Federal Records Act because the FOIA case is pending.” Rather than try to win the FOIA case on a technicality and step on the NARA’s statutory toes, the DOJ should allow the FOIA review process to determine on the case’s merits whether the document may be released.

Not even officials with security clearances may read the document while its status as a congressional or federal record is debated. The New York Times reported in November 2015 that in December of the previous year, a Senate staffer delivered envelopes containing the 6,700-page top secret report to the DOJ, the State Department, the Federal Bureau of Investigation and the Pentagon. Yet a year later, none of the envelopes had been opened, and none of the country’s top officials had read the report’s complete findings. This is because the DOJ, the Times wrote, “prohibited officials from the government agencies that possess it from even opening the report, effectively keeping the people in charge of America’s counterterrorism future from reading about its past.” The DOJ contends that if any agency officials read the report, it could alter the outcome of the FOIA lawsuit.

American war criminals who are identified or who can be discovered because of the CIA Torture Report should be prosecuted to the full extent of national and international law.

Anyone who has participated in attempts to conceal those events or to prevent disclosure of the CIA Torture Report, should be tried as accomplices after the fact to those war crimes.

The facility at Guantanamo Bay can be converted into a holding facility for DoJ staffers who tried to conceal war crimes. Poetic justice, I would say.

For What It’s Worth: CIA Releases Declassified Documents to National Archives

Wednesday, February 17th, 2016

CIA Releases Declassified Documents to National Archives

From the webpage:

Today, CIA released about 750,000 pages of declassified intelligence papers, records, research files and other content which are now accessible through CIA’s Records Search Tool (CREST) at the National Archives in College Park, MD. This release will include nearly 100,000 pages of analytic intelligence publication files, and about 20,000 pages of research and development files from CIA’s Directorate of Science and Technology, among others.

The newly available documents are being released in partnership with the National Geospatial Intelligence Agency (NGA) and are available by accessing CREST at the National Archives. This release continues CIA’s efforts to systematically review and release documents under Executive Order 13526. With this release, the CIA collection of records on the CREST system increases to nearly 13 million declassified pages.

That was posted on 16 February 2016.

Disclaimer: No warranty express or implied is made with regards to the accuracy of the notice quoted above or as to the accuracy of anything you may or may not find in the released documents, if they in fact exist.

I merely report that the quoted material was posted to the CIA website at the location and on the date recited.

Sunlight launches Hall of Justice… [ Topic Map “like” features?]

Tuesday, February 2nd, 2016

Sunlight launches Hall of Justice, a massive data inventory on criminal justice across the U.S. by Josh Stewart.

From the post:

Today, Sunlight is launching Hall of Justice, a robust, searchable data inventory of nearly 10,000 datasets and research documents from across all 50 states, the District of Columbia and the federal government. Hall of Justice is the culmination of 18 months of work gathering data and refining technology.

The process was no easy task: Building Hall of Justice required manual entry of publicly available data sources from a multitude of locations across the country.

Sunlight’s team went from state to state, meeting and calling local officials to inquire about and find data related to criminal justice. Some states like California have created a data portal dedicated to making criminal justice data easily accessible to the public; others had their data buried within hard to find websites. We also found data collected by state departments of justice, police forces, court systems, universities and everything in between.

“Data is shaping the future of how we address some of our most pressing problems,” said John Wonderlich, executive director of the Sunlight Foundation. “This new resource is an experiment in how a robust snapshot of data can inform policy and research decisions.”

In addition to being a great data collection, the Hall of Justice attempts to deliver topic map-like capability for searches:

The resource attempts to consolidate different terminology across multiple states, which is far from uniform or standardized. For example, if you search solitary confinement you will return results for data around solitary confinement, but also for the terms “segregated housing unit,” “SHU,” “administrative segregation” and “restrictive housing.” This smart search functionality makes finding datasets much easier and accessible.
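The synonym consolidation described above can be sketched as a simple term-expansion layer over a keyword search. Hall of Justice has not published its actual mapping, so the term groups, function names, and sample datasets below are all hypothetical illustrations of the technique, not its implementation.

```python
# Sketch of synonym-expanded search. The SYNONYMS mapping and the
# sample datasets are fabricated for illustration.

SYNONYMS = {
    "solitary confinement": {
        "solitary confinement",
        "segregated housing unit",
        "shu",
        "administrative segregation",
        "restrictive housing",
    },
}

def expand(query):
    """Return the query plus any mapped equivalent terms."""
    q = query.lower().strip()
    for terms in SYNONYMS.values():
        if q in terms:
            return terms
    return {q}

def search(datasets, query):
    """Match a dataset if its description contains any equivalent term."""
    terms = expand(query)
    return [d for d in datasets
            if any(t in d["description"].lower() for t in terms)]

datasets = [
    {"name": "CA SHU census",
     "description": "Population counts for each Segregated Housing Unit (SHU)."},
    {"name": "GA reform report",
     "description": "Juvenile justice reform, restrictive custody classes."},
]

print([d["name"] for d in search(datasets, "solitary confinement")])
# → ['CA SHU census']
```

Note that, as in the Georgia example below, "restrictive custody" does not match any of the mapped terms, so term expansion alone still misses near-synonyms that are not in the mapping.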


Looking at all thirteen results for a search on “solitary confinement,” I don’t see the mapping in question. Or certainly no mapping based on characteristics of the subject, “solitary confinement.”

The closest match is Georgia’s 2013 Juvenile Justice Reform, which uses the word “restrictive,” as in:

Create a two-class system within the Designated Felony Act. Designated felony offenses are divided into two classes, based on severity—Class A and Class B—that continue to allow restrictive custody while also adjusting available sanctions to account for both offense severity and risk level.

Restrictive custody is what jail systems are about so that doesn’t trip the wire for “solitary confinement.”

Of course, the links are to entire reports/documents/data sets so each researcher will have to extract and collate content individually. When that happens, a means to contribute that collation/mapping to the Hall of Justice would be a boon for other researchers. (Can you say “topic map?”)

As I write this, you will need to use Firefox rather than Chrome, at least on Ubuntu.

Trigger Warning: If you are sensitive to traumatic events and/or reports of traumatic events, you may want to ask someone less sensitive to review these data sources.

The only difference between a concentration camp and American prisons is the lack of mass gas chambers. Every horror and abuse that you can imagine, and some you probably can’t, is visited on people in U.S. prisons every day.

As Joan Baez sings in Prison Trilogy:

And we’re gonna raze, raze the prisons

To the ground

Help us raze, raze the prisons

To the ground

Sunlight’s Hall of Justice is a great step forward in documenting the chambers of horror we call American prisons.

Are you ready?

Voter Record Privacy? WTF?

Monday, December 28th, 2015

Leaky database tramples privacy of 191 million American voters by Dell Cameron.

From the post:

The voter information of more than 191 million Americans—including full names, dates of birth, home addresses, and more—was exposed online for anyone who knew the right IP address.

The misconfigured database, which was reportedly shut down at around 7pm ET Monday night, was discovered by security researcher Chris Vickery. Less than two weeks ago, Vickery also exposed a flaw in MacKeeper’s database, similarly exposing 13 million customer records.

What amazes me about this “leak” is that the outrage focuses on the 191+ million records being online.


What about the six or seven organizations who denied being the owners of the IP address in question?

I take it none of them denied having possession of the same or essentially the same data, just that they didn’t “leak” it.

Quick question: Was voter privacy breached when these six or seven organizations got the same data or when it went online?

I would say when the Gang of Six or Seven got the same data.

You don’t have any meaningful voter privacy, aside from your actual ballot, and with your credit record (also for sale), your voting behavior can be nailed down too.

You don’t have privacy but the Gang of Six or Seven do.

Attempting to protect lost privacy is pointless.

Making corporate overlords lose their privacy as well has promise.

PS: Torrents of corporate overlord data? Much more interesting than voter data.

New Congress.gov End of Year Enhancements: Quick Search, Congressional Record Index, and More

Monday, December 14th, 2015

New Congress.gov End of Year Enhancements: Quick Search, Congressional Record Index, and More by Andrew Weber.

From the post:

In our quest to retire THOMAS, we have made many enhancements to Congress.gov this year.  Our first big announcement was the addition of email alerts, which notify users of the status of legislation, new issues of the Congressional Record, and when Members of Congress sponsor and cosponsor legislation.  That development was soon followed by the addition of treaty documents and better default bill text in early spring; improved search, browse, and accessibility in late spring; user driven feedback in the summer; and Senate Executive Communications and a series of Two-Minute Tip videos in the fall.

Today’s update on end of year enhancements includes a new Quick Search for legislation, the Congressional Record Index (back to 1995), and the History of Bills from the Congressional Record Index (available from the Actions tab).  We have also brought over the State Legislature Websites page from THOMAS, which has links to state level websites similar to Congress.gov.

Text of legislation from the 101st and 102nd Congresses (1989-1992) has been migrated to Congress.gov. The Legislative Process infographic that has been available from the homepage as a JPG and PDF is now available in Spanish as a JPG and PDF (translated by Francisco Macías). Margaret and Robert added Fiscal Year 2003 and 2004 to the Appropriations Table. There is also a new About page on the site for XML Bulk Data.

The Quick Search provides a form-based search with fields similar to those available from the Advanced Legislation Search on THOMAS.  The Advanced Search on Congress.gov is still there with many additional fields and ways to search for those who want to delve deeper into the data.  We are providing the new Quick Search interface based on user feedback, which highlights selected fields most likely needed for a search.

There’s an impressive summary of changes!

Speaking of practicing programming, are you planning on practicing XQuery on congressional data in the coming year?

Why the Open Government Partnership Needs a Reboot [Governments Too]

Saturday, December 12th, 2015

Why the Open Government Partnership Needs a Reboot by Steve Adler.

From the post:

The Open Government Partnership was created in 2011 as an international forum for nations committed to implementing Open Government programs for the advancement of their societies. The idea of open government started in the 1980s after CSPAN was launched to broadcast U.S. Congressional proceedings and hearings to the American public on TV. While the galleries above the House of Representatives and Senate had been “open” to the “public” (if you got permission from your representative to attend) for decades, never before had all public democratic deliberations been broadcast on TV for the entire nation to behold at any time they wished to tune in.

I am a big fan of OGP and feel that the ideals and ambition of this partnership are noble and essential to the survival of democracy in this millennium. But OGP is a startup, and every startup business or program faces a chasm it must cross from early adopters and innovators to early majority market implementation and OGP is very much at this crossroads today. It has expanded membership at a furious pace the past three years and it’s clear to me that expansion is now far more important to OGP than the delivery of the benefits of open government to the hundreds of millions of citizens who need transparent transformation.

OGP needs a reboot.

The structure of a system produces its own behavior. OGP needs a new organizational structure with new methods for evaluating national commitments. But that reboot needs to happen within its current mission. We should see clearly that the current structure is straining due to the rapid expansion of membership. There aren’t enough support unit resources to manage the expansion. We have to rethink how we manage national commitments and how we evaluate what it means to be an open government. It’s just not right that countries can celebrate baby steps at OGP events while at the same time passing odious legislation, sidestepping OGP accomplishments, buckling to corruption, and cracking down on journalists.

Unlike Steve I didn’t and don’t have a lot of faith in governments being voluntarily transparent.

As I pointed out in Congress: More XQuery Fodder, sometime in 2016, full bill status data will be available for all legislation before the United States Congress.

That is a lot more data than is easy to access now, but it is more smoke than fire.

With legislation status data, you can track the civics lesson progression of a bill through Congress, but that leaves you at least 3 to 4 degrees short of knowing who was behind the legislation.

Just a short list of what more would be useful:

  • Visitor/caller lists for everyone who spoke to a member of Congress or their staff, with the date and subject of the call
  • All visits and calls tied to particular legislation and/or classes of legislation
  • All fundraising calls made by members of Congress and/or their staffs, with the date, results, and substance of each call
  • Representatives’ conversations with reconciliation committee members or their staffers about legislation and requested “corrections”
  • All conversations between a representative or member of their staff and agency staff, identifying all parties and the substance of the conversation
  • Notes, proposals, and discussion records for all agency decisions

Current transparency proposals are sufficient to confuse the public with mounds of nearly useless data. None of it reflects the real decision making processes of government.

Before someone shouts “privacy,” I would point out that no citizen has a right to privacy when their request is for a government representative to favor them over other citizens of the same government.

Real government transparency will require breaking the mini-star chamber proceedings from the lowest to the highest levels of government.

What we need is a rebooting of governments.

Congress: More XQuery Fodder

Tuesday, December 8th, 2015

Congress Poised for Leap to Open Up Legislative Data by Daniel Schuman.

From the post:

Following bills in Congress requires three major pieces of information: the text of the bill, a summary of what the bill is about, and the status information associated with the bill. For the last few years, Congress has been publishing the text and summaries for all legislation moving in Congress, but has not published bill status information. This key information is necessary to identify the bill author, where the bill is in the legislative process, who introduced the legislation, and so on.

While it has been in the works for a while, this week Congress confirmed it will make “Bill Statuses in XML format available through the GPO’s Federal Digital System (FDsys) Bulk Data repository starting with the 113th Congress,” (i.e. January 2013). In “early 2016,” bill status information will be published online in bulk. This should mean that people who wish to use the legislative information published on Congress.gov and THOMAS will no longer need to scrape those websites for current legislative information, but instead should be able to access it automatically.

Congress isn’t just going to pull the plug without notice, however. Through the good offices of the Bulk Data Task Force, Congress will hold a public meeting with power users of legislative information to review how this will work. Eight sample bill status XML files and draft XML User Guides were published on GPO’s GitHub page this past Monday. Based on past positive experiences with the Task Force, the meeting is a tremendous opportunity for public feedback to make sure the XML files serve their intended purposes. It will take place next Tuesday, Dec. 15, from 1-2:30. RSVP details below.

If all goes as planned, this milestone has great significance.

  • It marks the publication of essential legislative information in a format that supports unlimited public reuse, analysis, and republication. It will be possible to see much of a bill’s life cycle.
  • It illustrates the positive relationship that has grown between Congress and the public on access to legislative information, where there is growing open dialog and conversation about how to best meet our collective needs.
  • It is an example of how different components within the legislative branch are engaging with one another on a range of data-related issues, sometimes for the first time ever, under the aegis of the Bulk Data Task Force.
  • It means the Library of Congress and GPO will no longer be tied to the antiquated THOMAS website and can focus on more rapid technological advancement.
  • It shows how a diverse community of outside organizations and interests came together and built a community to work with Congress for the common good.

To be sure, this is not the end of the story. There is much that Congress needs to do to address its antiquated technological infrastructure. But considering where things were a decade ago, the bulk publication of information about legislation is a real achievement, the culmination of a process that overcame high political barriers and significant inertia to support better public engagement with democracy and smarter congressional processes.

Much credit is due in particular to leadership in both parties in the House who have partnered together to push for public access to legislative information, as well as the staff who worked tirelessly to make it happen.

If you look at the sample XML files, pay close attention to the <bioguideID> element and its contents. It is the same value you will find in roll-call vote files, but there it appears in the name-id attribute of the <legislator> element. (View the source of a roll-call vote XML file to confirm.)
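Extracting the basics from a bill status file is straightforward with any XML toolkit. The inline sample below is fabricated for illustration and only loosely modeled on the published samples; element names, values, and nesting are assumptions, so check GPO’s actual sample files and user guides before relying on any of them. (Python is used here for illustration; the same queries are trivial in XQuery.)

```python
# Minimal sketch: pull the bill number, sponsor ID, and latest action
# from a (fabricated) bill status XML document.
import xml.etree.ElementTree as ET

SAMPLE = """\
<billStatus>
  <bill>
    <billNumber>1234</billNumber>
    <congress>113</congress>
    <sponsors>
      <item>
        <fullName>Rep. Doe, Jane</fullName>
        <bioguideID>D000000</bioguideID>
      </item>
    </sponsors>
    <latestAction><text>Referred to committee.</text></latestAction>
  </bill>
</billStatus>
"""

root = ET.fromstring(SAMPLE)
bill = root.find("bill")
sponsor = bill.find("./sponsors/item")

print(bill.findtext("billNumber"))           # 1234
print(sponsor.findtext("bioguideID"))        # D000000
print(bill.findtext("./latestAction/text"))  # Referred to committee.
```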

Oddly, the <bioguideID> element does not appear in the documentation on GitHub; you just have to know its correspondence to the name-id attribute of the <legislator> element.
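That undocumented correspondence is exactly what makes the two data sets joinable. Here is a sketch of cross-referencing a bill’s sponsor against a roll-call vote; both XML fragments are fabricated minimal examples, so treat the element and attribute names as assumptions to verify against the real files.

```python
# Sketch: the same legislator identifier appears as the <bioguideID>
# element in bill status files and as the name-id attribute of
# <legislator> in roll-call vote files. Both fragments are fabricated.
import xml.etree.ElementTree as ET

bill_status = ET.fromstring(
    "<billStatus><bill><sponsors><item>"
    "<bioguideID>D000000</bioguideID>"
    "</item></sponsors></bill></billStatus>"
)
roll_call = ET.fromstring(
    '<rollcall-vote><vote-data>'
    '<recorded-vote><legislator name-id="D000000">Doe</legislator>'
    '<vote>Yea</vote></recorded-vote>'
    '</vote-data></rollcall-vote>'
)

# Index recorded votes by the legislator's name-id attribute.
votes = {
    rv.find("legislator").get("name-id"): rv.findtext("vote")
    for rv in roll_call.iter("recorded-vote")
}

# Join on the shared identifier.
sponsor_id = bill_status.findtext(".//bioguideID")
print(sponsor_id, "voted", votes[sponsor_id])  # D000000 voted Yea
```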

As I said in the title, this is going to be XQuery fodder.