Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 19, 2016

Congress.gov Corrects Clinton-Impeachment Search Results

Filed under: Government,Government Data,Searching — Patrick Durusau @ 8:14 am

After posting Congress.gov Search Alert: “…previous total of 261 to the new total of 0.” [Solved] yesterday, pointing out that a change from http:// to https:// altered a search result for Clinton w/in 5 words impeachment, I got an email this morning:

(image omitted: congress-gov-correction-460)

I appreciate the update and correction for saved searches, but my point about remote data changing without notice to you remains valid.

I’m still waiting for word on bulk downloads from both Wikileaks and DC Leaks.

Why leak information vital to public discussion and then limit access to search?

September 18, 2016

Congress.gov Search Alert: “…previous total of 261 to the new total of 0.” [Solved]

Filed under: Government,Government Data,Searching — Patrick Durusau @ 11:03 am

Odd message from the Congress.gov search alert this AM:

(image omitted: congress-alert-460)

Here’s the search I created back in June, 2016:

(image omitted: congress-alert-search-460)

My (probably inaccurate) recollection is that I was searching for some quote from the impeachment of Bill Clinton and was too lazy to specify a term of Congress, hence:

all congresses – searching for Clinton within five words, impeachment

Fairly trivial search that produced 261 “hits.”

I set the search alert more to explore the search options than any expectation of different future results.

Imagine my surprise to find that all congresses – searching for Clinton within five words, impeachment performed today, results in 0 “hits.”

Suspecting some internal changes to the search interface, I re-entered the search today and got 0 “hits.”

Are other saved searches returning radically different results as of today?

This is not, repeat not, the result of some elaborate conspiracy to assist Secretary Clinton in her bid for the presidency.

I do think something fundamental has gone wrong with searching at Congress.gov and it needs to be fixed.

This is an illustration of why Wikileaks, DC Leaks and other data sites should provide easy-to-access bulk downloads of their materials.

Providing search interfaces to document collections is a public service, but document collections, or access to them, can change in ways not transparent to search users, as demonstrated by the CIA removing documents previously delivered to the Senate.

Petition Wikileaks, DC Leaks and other data sites for easy bulk downloads.

That will ensure the “evidence” does not shift under your feet and enable more sophisticated means of analysis than brute-force search.


Update: The change from http:// to https:// by the congress.gov site trashed both my saved query and re-performing the same search using http://.

Using https:// returns the same 261 search results.

What is your experience with other saved searches at congress.gov?

September 12, 2016

Inside the fight to reveal the CIA’s torture secrets [Support The Guardian]

Filed under: Government,Government Data,Journalism,News,Politics,Reporting,Transparency — Patrick Durusau @ 3:19 pm

Inside the fight to reveal the CIA’s torture secrets by Spencer Ackerman.

Part one: Crossing the bridge

Part two: A constitutional crisis

Part three: The aftermath

Ackerman captures the drama of a failed attempt by the United States Senate to exercise oversight on the Central Intelligence Agency (CIA) in this series.

I say “failed attempt” because even if the full 6,200+ page report is ever released, the lead Senate investigator, Daniel Jones, obscured the identities of all the responsible CIA personnel and sources of information in the report.

Even if the full report is serialized in your local newspaper, the CIA contractors and staff guilty of multiple felonies will not be one step closer to being brought to justice.

To that extent, the “full” report is itself a disservice to the American people, who elect their congressional leaders and expect them to oversee agencies such as the CIA.

From Ackerman’s account you will learn that the CIA can dictate to its overseers the location and conditions under which they may view documents, decide which documents they are allowed to see, and, in cases of conflict, spy on the Senate Select Committee on Intelligence.

Does that sound like effective oversight to you?

BTW, you will also learn that members of the “most transparent administration in history” aided and abetted the CIA in preventing an effective investigation into the CIA and its torture program. I use “aided and abetted” deliberately and in their legal sense.

I mention in my header that you should support The Guardian.

This story by Spencer Ackerman is one reason.

Another reason is that given the plethora of names and transfers recited in Ackerman’s story, we need The Guardian to cover future breaks in this story.

Despite the tales of superhuman security, nobody is that good.

I leave you with the thought that if more than one person knows a secret, then it can be discovered.

Check Ackerman’s story for a starting list of those who know secrets about the CIA torture program.

Good hunting!

September 7, 2016

New Plea: Charges Don’t Reflect Who I Am Today

Filed under: Cybersecurity,Government,Government Data,Security — Patrick Durusau @ 3:20 pm

Traditionally, pleas have been guilty, not guilty, not guilty by reason of insanity and nolo contendere (no contest).

Beth Cobert, acting director at the OPM, has added a fifth plea:

Charges Don’t Reflect Who I Am Today

Greg Masters captures the new plea in Congressional report faults OPM over breach preparedness and response:


While welcoming the committee’s acknowledgement of the OPM’s progress, Beth Cobert, acting director at the OPM, disagreed with the committee’s findings in a blog post published on the OPM site on Wednesday, responding that the report does “not fully reflect where this agency stands today.”
… (emphasis added)

Any claims about “…where this agency stands today…” are a distraction from the question of responsibility for a system-wide failure of security.

If you know any criminal defense lawyers, suggest they quote Beth Cobert as setting a precedent for responding to allegations of prior misconduct with:

Charges Don’t Reflect Who I Am Today

Please forward links to news reports of successful use of that plea to my attention.

August 6, 2016

Congressional Research Service Fiscal 2015 – Full Report List

Filed under: CRS,Government,Government Data,Open Access — Patrick Durusau @ 4:25 pm

Congressional Research Service Fiscal 2015

The Director’s Message:

From international conflicts and humanitarian crises, to immigration, transportation, and secondary education, the Congressional Research Service (CRS) helped every congressional office and committee navigate the wide range of complex and controversial issues that confronted Congress in FY2015.

We kicked off the year strongly, preparing for the newly elected Members of the 114th Congress with the tenth biannual CRS Seminar for New Members, and wrapped up 2015 supporting the transition to a new Speaker and the crafting of the omnibus appropriations bill. In between, CRS experts answered over 62,000 individual requests; hosted over 7,400 Congressional participants at seminars, briefings and trainings; provided over 3,600 new or refreshed products; and summarized over 8,000 pieces of legislation.

While the CRS mission remains the same, Congress and the environment in which it works are continually evolving. To ensure that the Service is well positioned to anticipate and meet the information and research needs of a 21st-century Congress, we launched a comprehensive strategic planning effort that has identified the most critical priorities, goals, and objectives that will enable us to most efficiently and effectively serve Congress as CRS moves into its second century.

Responding to the increasingly rapid pace of congressional business, and taking advantage of new technologies, we continued to explore new and innovative ways to deliver authoritative information and timely analysis to Congress. For example, we introduced shorter report formats and added infographics to our website CRS.gov to better serve congressional needs.

It is an honor and privilege to work for the U.S. Congress. With great dedication, our staff creatively supports Members, staff and committees as they help shape and direct the legislative process and our nation’s future. Our accomplishments in fiscal 2015 reflect that dedication.

All true, but it is also true that the funders of all those wonderful efforts, taxpayers, have spotty and/or erratic access to those research goodies.

Perhaps that will change in the not too distant future.

But until then, perhaps the list of all new CRS products in 2015, which runs from page 47 to page 124, may be of interest.

Not all entries are unique as they may appear under different categories.

Sadly the only navigation you are offered is by chunky categories like “Health” and “Law and Justice.”

Hmmm, perhaps that can be fixed, at least to some degree.

Watch for more CRS news this coming week.

July 31, 2016

How-To Track Projects Like A Defense Contractor

Filed under: Funding,Government,Government Data,Open Source Intelligence — Patrick Durusau @ 7:17 pm

Transparency Tip: How to Track Government Projects Like a Defense Contractor by Dave Maass.

From the post:

Over the last year, thousands of pages of sensitive documents outlining the government’s intelligence practices have landed on our desktops.

One set of documents describes the Director of National Intelligence’s goal of funding “dramatic improvements in unconstrained face recognition.” A presentation from the Navy uses examples from Star Trek to explain its electronic warfare program. Other records show the FBI was purchasing mobile phone extraction devices, malware and fiber network-tapping systems. A sign-in list shows the names and contact details of hundreds of cybersecurity contractors who turned up at a Department of Homeland Security “Industry Day.” Yet another document, a heavily redacted contract, provides details of U.S. assistance with drone surveillance programs in Burundi, Kenya and Uganda.

But these aren’t top-secret records carefully leaked to journalists. They aren’t classified dossiers pasted haphazardly on the Internet by hacktivists. They weren’t even liberated through the Freedom of Information Act. No, these public documents are available to anyone who looks at the U.S. government’s contracting website, FBO.gov. In this case “anyone,” is usually just contractors looking to sell goods, services, or research to the government. But, because the government often makes itself more accessible to businesses than the general public, it’s also a useful tool for watchdogs. Every government program costs money, and whenever money is involved, there’s a paper trail.

Searching FBO.gov is difficult enough that there are firms that offer search services to assist contractors with locating business opportunities.

Collating FBO.gov data with topic maps (read adding non-FBO.gov data) will be a value-add to watchdogs, potential contractors (including yourself), or watchers watching watchers.

Dave’s post will get you started on your way.

July 28, 2016

U.S. Climate Resilience Toolkit

Filed under: Environment,Government Data — Patrick Durusau @ 8:28 pm

Bringing climate information to your backyard: the U.S. Climate Resilience Toolkit by Tamara Dickinson and Kathryn Sullivan.

From the post:

Climate change is a global challenge that will require local solutions. Today, a new version of the Climate Resilience Toolkit brings climate information to your backyard.

The Toolkit, called for in the President’s Climate Action Plan and developed by the National Oceanic and Atmospheric Administration (NOAA), in collaboration with a number of Federal agencies, was launched in 2014. After collecting feedback from a diversity of stakeholders, the team has updated the Toolkit to deliver more locally-relevant information and to better serve the needs of its users. Starting today, Toolkit users will find:

  • A redesigned user interface that is responsive to mobile devices;
  • County-scale climate projections through the new version of the Toolkit’s Climate Explorer;
  • A new “Reports” section that includes state and municipal climate-vulnerability assessments, adaptation plans, and scientific reports; and
  • A revised “Steps to Resilience” guide, which communicates steps to identifying and addressing climate-related vulnerabilities.

Thanks to the Toolkit’s Climate Explorer, citizens, communities, businesses, and policy leaders can now visualize both current and future climate risk on a single interface by layering up-to-date, county-level, climate-risk data with maps. The Climate Explorer allows coastal communities, for example, to overlay anticipated sea-level rise with bridges in their jurisdiction in order to identify vulnerabilities. Water managers can visualize which areas of the country are being impacted by flooding and drought. Tribal nations can see which of their lands will see the greatest mean daily temperature increases over the next 100 years.  

A number of decision makers, including the members of the State, Local, and Tribal Leaders Task Force, have called on the Federal Government to develop actionable information at local-to-regional scales.  The place-based, forward-looking information now available through the Climate Explorer helps to meet this demand.

The Climate Resilience Toolkit update builds upon the Administration’s efforts to boost access to data and information through resources such as the National Climate Assessment and the Climate Data Initiative. The updated Toolkit is a great example of the kind of actionable information that the Federal Government can provide to support community and business resilience efforts. We look forward to continuing to work with leaders from across the country to provide the tools, information, and support they need to build healthy and climate-ready communities.

Check out the new capabilities today at toolkit.climate.gov!

I have only started to explore this resource but thought I should pass it along.

Of particular interest to me is the integration of data/analysis from this resource with other data.

Suggestions/comments?

July 25, 2016

Accessing IRS 990 Filings (Old School)

Filed under: Amazon Web Services AWS,Government Data,XML — Patrick Durusau @ 2:36 pm

Like many others, I was glad to see: IRS 990 Filings on AWS.

From the webpage:

Machine-readable data from certain electronic 990 forms filed with the IRS from 2011 to present are available for anyone to use via Amazon S3.

Form 990 is the form used by the United States Internal Revenue Service to gather financial information about nonprofit organizations. Data for each 990 filing is provided in an XML file that contains structured information that represents the main 990 form, any filed forms and schedules, and other control information describing how the document was filed. Some non-disclosable information is not included in the files.

This data set includes Forms 990, 990-EZ and 990-PF which have been electronically filed with the IRS and is updated regularly in an XML format. The data can be used to perform research and analysis of organizations that have electronically filed Forms 990, 990-EZ and 990-PF. Forms 990-N (e-Postcard) are not available within this data set. Forms 990-N can be viewed and downloaded from the IRS website.

I could use AWS but I’m more interested in deep analysis of a few returns than analysis of the entire dataset.

Fortunately the webpage continues:


An index listing all of the available filings is available at s3://irs-form-990/index.json. This file includes basic information about each filing including the name of the filer, the Employer Identification Number (EIN) of the filer, the date of the filing, and the path to download the filing.

All of the data is publicly accessible via the S3 bucket’s HTTPS endpoint at https://s3.amazonaws.com/irs-form-990. No authentication is required to download data over HTTPS. For example, the index file can be accessed at https://s3.amazonaws.com/irs-form-990/index.json and the example filing mentioned above can be accessed at https://s3.amazonaws.com/irs-form-990/201541349349307794_public.xml (emphasis in original).

I open a terminal window and type:

wget https://s3.amazonaws.com/irs-form-990/index.json

which as of today, results in:

-rw-rw-r-- 1 patrick patrick 1036711819 Jun 16 10:23 index.json

A trial grep:

grep "NATIONAL RIFLE" index.json > nra.txt

Which produces:

{"EIN": "530116130", "SubmittedOn": "2014-11-25", "TaxPeriod": "201312", "DLN": "93493309004174", "LastUpdated": "2016-03-21T17:23:53", "URL": "https://s3.amazonaws.com/irs-form-990/201423099349300417_public.xml", "FormType": "990", "ObjectId": "201423099349300417", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2013-12-20", "TaxPeriod": "201212", "DLN": "93493260005203", "LastUpdated": "2016-03-21T17:23:53", "URL": "https://s3.amazonaws.com/irs-form-990/201302609349300520_public.xml", "FormType": "990", "ObjectId": "201302609349300520", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2012-12-06", "TaxPeriod": "201112", "DLN": "93493311011202", "LastUpdated": "2016-03-21T17:23:53", "URL": "https://s3.amazonaws.com/irs-form-990/201203119349301120_public.xml", "FormType": "990", "ObjectId": "201203119349301120", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "396056607", "SubmittedOn": "2011-05-12", "TaxPeriod": "201012", "FormType": "990EZ", "LastUpdated": "2016-06-14T01:22:09.915971Z", "OrganizationName": "EAU CLAIRE NATIONAL RIFLE CLUB", "IsElectronic": false, "IsAvailable": false},
{"EIN": "530116130", "SubmittedOn": "2011-11-09", "TaxPeriod": "201012", "DLN": "93493270005081", "LastUpdated": "2016-03-21T17:23:53", "URL": "https://s3.amazonaws.com/irs-form-990/201132709349300508_public.xml", "FormType": "990", "ObjectId": "201132709349300508", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2016-01-11", "TaxPeriod": "201412", "DLN": "93493259005035", "LastUpdated": "2016-04-29T13:40:20", "URL": "https://s3.amazonaws.com/irs-form-990/201532599349300503_public.xml", "FormType": "990", "ObjectId": "201532599349300503", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},

We have one errant result, the “EAU CLAIRE NATIONAL RIFLE CLUB,” so let’s delete it and re-order by year; the NATIONAL RIFLE ASSOCIATION OF AMERICA results read (most recent to oldest):

{"EIN": "530116130", "SubmittedOn": "2016-01-11", "TaxPeriod": "201412", "DLN": "93493259005035", "LastUpdated": "2016-04-29T13:40:20", "URL": "https://s3.amazonaws.com/irs-form-990/201532599349300503_public.xml", "FormType": "990", "ObjectId": "201532599349300503", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2014-11-25", "TaxPeriod": "201312", "DLN": "93493309004174", "LastUpdated": "2016-03-21T17:23:53", "URL": "https://s3.amazonaws.com/irs-form-990/201423099349300417_public.xml", "FormType": "990", "ObjectId": "201423099349300417", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2013-12-20", "TaxPeriod": "201212", "DLN": "93493260005203", "LastUpdated": "2016-03-21T17:23:53", "URL": "https://s3.amazonaws.com/irs-form-990/201302609349300520_public.xml", "FormType": "990", "ObjectId": "201302609349300520", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2012-12-06", "TaxPeriod": "201112", "DLN": "93493311011202", "LastUpdated": "2016-03-21T17:23:53", "URL": "https://s3.amazonaws.com/irs-form-990/201203119349301120_public.xml", "FormType": "990", "ObjectId": "201203119349301120", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},
{"EIN": "530116130", "SubmittedOn": "2011-11-09", "TaxPeriod": "201012", "DLN": "93493270005081", "LastUpdated": "2016-03-21T17:23:53", "URL": "https://s3.amazonaws.com/irs-form-990/201132709349300508_public.xml", "FormType": "990", "ObjectId": "201132709349300508", "OrganizationName": "NATIONAL RIFLE ASSOCIATION OF AMERICA", "IsElectronic": true, "IsAvailable": true},

Of course, now you want the XML 990 returns, so extract the URLs for the 990s to a file, here nra-urls.txt (I would use awk, or the script sketched after the list, if there were more than a handful):

https://s3.amazonaws.com/irs-form-990/201532599349300503_public.xml
https://s3.amazonaws.com/irs-form-990/201423099349300417_public.xml
https://s3.amazonaws.com/irs-form-990/201302609349300520_public.xml
https://s3.amazonaws.com/irs-form-990/201203119349301120_public.xml
https://s3.amazonaws.com/irs-form-990/201132709349300508_public.xml
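If there were more than a handful, a short script can pull the URLs straight from index.json. A minimal sketch, assuming (as the grep output above suggests) one filing record per line; the file names are the ones used in this post:

import json

urls = []
with open("index.json") as f:
    for line in f:
        if "NATIONAL RIFLE ASSOCIATION OF AMERICA" not in line:
            continue
        # Each matching line is a single JSON object, possibly with a trailing comma.
        record = json.loads(line.strip().rstrip(","))
        if record.get("FormType") == "990" and record.get("IsAvailable"):
            urls.append((record["TaxPeriod"], record["URL"]))

# Most recent tax period first, one URL per line, ready for wget -i.
with open("nra-urls.txt", "w") as out:
    for _, url in sorted(urls, reverse=True):
        out.write(url + "\n")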

Back to wget:

wget -i nra-urls.txt

Results:

-rw-rw-r-- 1 patrick patrick 111798 Mar 21 16:12 201132709349300508_public.xml
-rw-rw-r-- 1 patrick patrick 123490 Mar 21 19:47 201203119349301120_public.xml
-rw-rw-r-- 1 patrick patrick 116786 Mar 21 22:12 201302609349300520_public.xml
-rw-rw-r-- 1 patrick patrick 122071 Mar 21 15:20 201423099349300417_public.xml
-rw-rw-r-- 1 patrick patrick 132081 Apr 29 10:10 201532599349300503_public.xml

Ooooh, it’s in XML! 😉

For the XML you are going to need: Current Valid XML Schemas and Business Rules for Exempt Organizations Modernized e-File, not to mention a means of querying the data (may I suggest XQuery?).
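Before (or instead of) XQuery, a quick look with Python’s standard library will confirm the files parse and show their shape. A minimal sketch, assuming the http://www.irs.gov/efile namespace; the element paths are illustrative, so verify them against the schemas:

import xml.etree.ElementTree as ET

# Namespace used by IRS e-file returns (verify against the published schemas).
NS = {"e": "http://www.irs.gov/efile"}

tree = ET.parse("201532599349300503_public.xml")
root = tree.getroot()

# Element paths are illustrative; inspect a file to confirm the exact names.
ein = root.findtext(".//e:ReturnHeader/e:Filer/e:EIN", namespaces=NS)
print("Filer EIN:", ein)

# List the top-level parts of the return to get oriented.
for child in root:
    print(child.tag)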

Once you have the index.json file, with grep, a little awk and wget, you can quickly explore IRS 990 filings for further analysis or to prepare queries for running on AWS (such as discovery of common directors, etc.).

Enjoy!

July 20, 2016

What’s the “CFR” and Why Is It So Important to Me?

Filed under: Government,Government Data,Law,Law - Sources — Patrick Durusau @ 7:40 pm

What’s the “CFR” and Why Is It So Important to Me? Government Printing Office (GPO) blog, GovernmentBookTalk.

From the post:

If you’re a GPO Online Bookstore regular or public official you probably know we’re speaking about the “Code of Federal Regulations.” CFRs are produced routinely by all federal departments and agencies to inform the public and government officials of regulatory changes and updates for literally every subject that the federal government has jurisdiction to manage.

For the general public these constantly updated federal regulations can spell fantastic opportunity. Farmer, lawyer, construction owner, environmentalist, it makes no difference. Within the 50 codes are a wide variety of regulations that impact citizens from all walks of life. Federal Rules, Regulations, Processes, or Procedures on the surface can appear daunting, confusing, and even may seem to impede progress. In fact, the opposite is true. By codifying critical steps to anyone who operates within the framework of any of these sectors, the CFR focused on a particular issue can clarify what’s legal, how to move forward, and how to ultimately successfully translate one’s projects or ideas into reality.

Without CFR documentation the path could be strewn with uncertainty, unknown liabilities, and lost opportunities, especially regarding federal development programs, simply because an interested party wouldn’t know where or how to find what’s available within their area of interest.

The authors of CFRs are immersed in the technical and substantive issues associated within their areas of expertise. For a private sector employer or entrepreneur who becomes familiar with the content of CFRs relative to their field of work, it’s like having an expert staff on board.

I like the CFRs but I stumbled on:

For a private sector employer or entrepreneur who becomes familiar with the content of CFRs relative to their field of work, it’s like having an expert staff on board.

I don’t doubt the expertise of the CFR authors, but their writing often requires an expert for accurate interpretation. If you doubt that statement, test your reading skills on any section of CFR Title 26, Internal Revenue.

Try your favorite NLP parser out on any of the CFRs.

The post lists a number of ways to acquire the CFRs but personally I would use the free Electronic Code of Federal Regulations unless you need to impress clients with the paper version.

Enjoy!

June 18, 2016

IRS E-File Bucket – Internet Archive

Filed under: Government,Government Data,Non-Profit — Patrick Durusau @ 4:41 pm

IRS E-File Bucket courtesy of Carl Malamud and Public.Resource.Org.

From the webpage:

This bucket contains a mirror of the IRS e-file release as of June 16, 2016. You may access the source files at https://aws.amazon.com/public-data-sets/irs-990/. The present bucket may or may not be updated in the future.

To access this bucket, use the download links.

Note that tarballs of image scans from 2002-2015 are also available in this IRS 990 Forms collection.

Many thanks to the Internal Revenue Service for making this information available. Here is their announcement on June 16, 2016. Here is a statement from Public.Resource.Org congratulating the IRS on a job well done.

As I noted in IRS 990 Filing Data (2001 to date):

990* disclosures aren’t detailed enough to pinch but when combined with other data, say leaked data, the results can be remarkable.

It’s up to you to see that public disclosures pinch.

June 16, 2016

IRS 990 Filing Data (2001 to date)

Filed under: Amazon Web Services AWS,Government Data,Non-Profit — Patrick Durusau @ 4:21 pm

IRS 990 Filing Data Now Available as an AWS Public Data Set

From the post:

We are excited to announce that over one million electronic IRS 990 filings are available via Amazon Simple Storage Service (Amazon S3). Filings from 2011 to the present are currently available and the IRS will add new 990 filing data each month.

(image omitted)

Form 990 is the form used by the United States Internal Revenue Service (IRS) to gather financial information about nonprofit organizations. By making electronic 990 filing data available, the IRS has made it possible for anyone to programmatically access and analyze information about individual nonprofits or the entire nonprofit sector in the United States. This also makes it possible to analyze it in the cloud without having to download the data or store it themselves, which lowers the cost of product development and accelerates analysis.

Each electronic 990 filing is available as a unique XML file in the “irs-form-990” S3 bucket in the AWS US East (N. Virginia) region. Information on how the data is organized and what it contains is available on the IRS 990 Filings on AWS Public Data Set landing page.

Some of the forms and instructions that will help you make sense of the data reported:

990 – Form 990 Return of Organization Exempt from Income Tax, Annual Form 990 Requirements for Tax-Exempt Organizations

990-EZ – 2015 Form 990-EZ, Instructions for IRS 990 EZ – Internal Revenue Service

990-PF – 2015 Form 990-PF, 2015 Instructions for Form 990-PF

As always, use caution with law-related data as words may have unusual nuances and/or unexpected meanings.

These forms and instructions are only a tiny part of a vast iceberg of laws, regulations, rulings, court decisions and the like.

990* disclosures aren’t detailed enough to pinch but when combined with other data, say leaked data, the results can be remarkable.

June 6, 2016

Breaking Californication (An Act Performed On The Public)

Filed under: Government,Government Data,Transparency — Patrick Durusau @ 4:43 pm

Law Enforcement Lobby Succeeds In Killing California Transparency Bill by Kit O’Connell.

From the post:

A California Senate committee killed a bill to increase transparency in police misconduct investigations, hampering victims’ efforts to obtain justice.

Chauncee Smith, legislative advocate at the ACLU of California, told MintPress News that the state Legislature “caved to the tremendous influence and power of the law enforcement lobby” and “failed to listen to the demands and concerns of everyday Californian people.”

California has some of the most secretive rules in the country when it comes to investigations into police misconduct and excessive use of force. Records are kept sealed, regardless of the outcome, as the ACLU of Northern California explains on its website:

“In places like Texas, Kentucky, and Utah, peace officer records are made public when an officer is found guilty of misconduct. Other states make records public regardless of whether misconduct is found. This is not the case in California.”

“Right now, there is a tremendous cloud of secrecy that is unparalleled compared to many other states,” Smith added. “California is in the minority in which the public do not know basic information when someone is killed or potentially harmed by those are sworn to serve and protect them.”

In February, Sen. Mark Leno, a Democrat from San Francisco, introduced SB 1286, the “Enhance Community Oversight on Police Misconduct and Serious Uses of Force” bill. It would have allowed “public access to investigations, findings and discipline information on serious uses of force by police” and would have increased transparency in other cases of police misconduct, according to an ACLU fact sheet. Polling data cited by the ACLU suggests about 80 percent of Californians would support the measure.

But the bill’s progress through the legislature ended on May 27, when it failed to pass out of the Senate Appropriations committee.

“Today is a sad day for transparency, accountability, and justice in California,” said Peter Bibring, police practices director for the ACLU of California, in a May 27 press release.

Mistrust between police officers and citizens makes the job of police officers more difficult and dangerous, while denying citizens the full advantages of a trained police force, paid for by their tax dollars.

The state legislature, finding that sowing and fueling mistrust between police officers and citizens has electoral upsides, fans those flames with secrecy over police misconduct investigations.

Open, not secret (read: grand jury), proceedings, where witnesses can be fairly examined (unlike the deliberately thrown Michael Brown investigation), can go a long way toward re-establishing trust between the police and the public.

Members of the community know when someone was a danger to police officers and others, whether their family members will admit it or not. Likewise, police officers know which officers are far too quick to escalate to deadly force. Want better community policing? Want better citizen cooperation? Neither is going to happen with completely secret police misconduct investigations.

So the State of California is going to collect the evidence, statements, etc., in police misconduct investigations, but won’t share that information with the public. At least not willingly.

Official attempts to break illegitimate government secrecy failed. Even if they had succeeded, you’d be paying at least $0.25 per page plus a service fee.

Two observations about government networks:

  • Secret (and otherwise) government documents are usually printed on networked printers.
  • Passively capturing Ethernet traffic (a network tap) captures printer traffic too (see the sketch below).
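To make the second point concrete: raw print jobs commonly travel unencrypted over TCP port 9100 (the JetDirect port). A minimal sketch with scapy, offered only to illustrate the claim; the interface name is a placeholder:

from scapy.all import sniff, wrpcap

# Raw/JetDirect printing commonly uses TCP port 9100, usually unencrypted.
# "eth0" is a placeholder; substitute the tapped interface.
packets = sniff(iface="eth0", filter="tcp port 9100", count=100)
wrpcap("print-jobs.pcap", packets)  # inspect offline, e.g. with Wireshark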

Whistleblowers don’t have to hack heavily monitored systems or steal logins/passwords; leaking illegally withheld documents is within the reach of anyone who can plug in an Ethernet cable.

There’s a bit more to it than that, but remember all those network cables running through the ceilings, walls and closets the next time your security consultant assures you of your network’s security.

As a practical matter, if you start leaking party menus and football pools, someone will start looking for a network tap.

Leak when it makes a significant difference to public discussion and/or legal proceedings. Even then, look for ways to attribute the leak to factions within the government.

Remember the DoD’s amused reaction to State’s huffing and puffing over the Afghan diplomatic cables? That sort of rivalry exists at every level of government. You should use it to your advantage.

The State of California would have you believe that government information sharing is at its sufferance.

I beg to differ.

So should you.

March 3, 2016

11 Million Pages of CIA Files [+ Allen Dulles, war criminal]

Filed under: Government,Government Data,Topic Maps — Patrick Durusau @ 6:54 pm

11 Million Pages of CIA Files May Soon Be Shared By This Kickstarter by Joseph Cox.

From the post:

Millions of pages of CIA documents are stored in Room 3000. The CIA Records Search Tool (CREST), the agency’s database of declassified intelligence files, is only accessible via four computers in the National Archives Building in College Park, MD, and contains everything from Cold War intelligence, research and development files, to images.

Now one activist is aiming to get those documents more readily available to anyone who is interested in them, by methodically printing, scanning, and then archiving them on the internet.

“It boils down to freeing information and getting as much of it as possible into the hands of the public, not to mention journalists, researchers and historians,” Michael Best, analyst and freedom of information activist told Motherboard in an online chat.

Best is trying to raise $10,000 on Kickstarter in order to purchase the high speed scanner necessary for such a project, a laptop, office supplies, and to cover some other costs. If he raises more than the main goal, he might be able to take on the archiving task full-time, as well as pay for FOIAs to remove redactions from some of the files in the database. As a reward, backers will help to choose what gets archived first, according to the Kickstarter page.

“Once those “priority” documents are done, I’ll start going through the digital folders more linearly and upload files by section,” Best said. The files will be hosted on the Internet Archive, which converts documents into other formats too, such as for Kindle devices, and sometimes text-to-speech for e-books. The whole thing has echoes of Cryptome—the freedom of information duo John Young and Deborah Natsios, who started off scanning documents for the infamous cypherpunk mailing list in the 1990s.

Good news! Kickstarter has announced this project is funded!

Additional funding will help make this archive of documents available sooner rather than later.

As opposed to an attempt to boil the ocean of 11 million pages of CIA files, what about smaller topic mapping/indexing projects that focus on bounded sub-sets of documents of interest to particular communities?

I don’t have any interest in the STAR GATE project (clairvoyance, precognition, or telepathy, continued now by the DHS at airport screening facilities) but would be very interested in the records of Allen Dulles, a war criminal of some renown.

Just so you know, Michael has already uploaded documents on Allen Dulles from the CIA Records Search Tool (CREST) tool:

History of Allen Welsh Dulles as CIA Director – Volume I: The Man

History of Allen Welsh Dulles as CIA Director – Volume II: Coordination of Intelligence

History of Allen Welsh Dulles as CIA Director – Volume III: Covert Activities

History of Allen Welsh Dulles as CIA Director – Volume IV: Congressional Oversight and Internal Administration

History of Allen Welsh Dulles as CIA Director – Volume V: Intelligence Support of Policy

To describe Allen Dulles as a war criminal is no hyperbole. The overthrow of President Jacobo Arbenz Guzman of Guatemala (think United Fruit Company) and the removal of Mohammad Mossadeq, prime minister of Iran (think Shah of Iran), are only two of his crimes, the full extent of which will probably never be known.

Files are being uploaded to That 1 Archive.

Scripting FOIA Requests

Filed under: Government,Government Data — Patrick Durusau @ 4:17 pm

An Activist Wrote a Script to FOIA the Files of 7,000 Dead FBI Officials by Joseph Cox.

From the post:

One of the best times to file a Freedom of Information request with the FBI is when someone dies; after that, any files that the agency holds on them can be requested. Asking for FBI files on the deceased is therefore pretty popular, with documents released on Steve Jobs, Malcolm X and even the Insane Clown Posse.

One activist is turning this back onto the FBI itself, by requesting files on nearly 7,000 dead FBI employees en masse, and releasing a script that allows anyone else to do the same.

“At the very least, it’ll be like having an extensive ‘Who’s Who in the FBI’ to consult, without worrying that anyone in there is still alive and might face retaliation for being in law enforcement,” Michael Best told Motherboard in an online chat. “For some folks, they’ll probably show allegations of wrongdoing while others probably highlight some of the FBI’s best and brightest.”

On Monday, Best will file FOIAs for FBI records and files relating to 6,912 employees named in the FBI’s own “Dead List,” a list of people that the FBI understands to be deceased. A recent copy of the list, which includes special agents and section chiefs, was FOIA’d by MuckRock editor JPat Brown in January.

Points to remember:

  • Best’s script works for any FOIA office that accepts email requests (not just the FBI, be creative; see the sketch after this list)
  • Get 3 or more people to file the same FOIA requests
  • Publicize your spreadsheet of FOIA targets
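This is not Best’s script, but the mechanics are simple enough to sketch: read names from a spreadsheet, fill in a request template, and email each request. The addresses, file name and wording below are placeholders:

import csv
import smtplib
from email.message import EmailMessage

TEMPLATE = """To whom it may concern:

Pursuant to the Freedom of Information Act, I request all records
and files relating to {name}, who is deceased.
"""

# foia-targets.csv is a hypothetical spreadsheet with a "name" column.
with open("foia-targets.csv") as f, smtplib.SMTP("localhost") as smtp:
    for row in csv.DictReader(f):
        msg = EmailMessage()
        msg["From"] = "requester@example.com"   # placeholder
        msg["To"] = "foia-office@example.gov"   # placeholder
        msg["Subject"] = "FOIA Request: " + row["name"]
        msg.set_content(TEMPLATE.format(name=row["name"]))
        smtp.send_message(msg)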

Don’t forget the need to scan, OCR and index (topic map) the results of your FOIA requests.

Information that cannot be found may as well still be concealed by the FBI (and others).

February 28, 2016

Earthdata Search – Smells Like A Topic Map?*

Filed under: Ecoinformatics,Government Data,Topic Maps — Patrick Durusau @ 3:06 pm

Earthdata Search

From the webpage:

Search NASA Earth Science data by keyword and filter by time or space.

After choosing tour:

Keyword Search

Here you can enter search terms to find relevant data. Search terms can be science terms, instrument names, or even collection IDs. Let’s start by searching for Snow Cover NRT to find near real-time snow cover data. Type Snow Cover NRT in the keywords box and press Enter.

Which returns a screen in three sections, left to right: Browse Collections, 21 Matching Collections (Add collections to your project to compare and retrieve their data), and the third section displays a world map (navigate by grabbing the view)

Under Browse Collections:

In addition to searching for keywords, you can narrow your search through this list of terms. Click Platform to expand the list of platforms (still in a tour box)

Next step:

Now click Terra to select the Terra satellite.

Comment: Wondering how I will know which “platform” or “instrument” to select? There may be more/better documentation but I haven’t seen it yet.

The data follows the Unified Metadata Model (UMM):

NASA’s Common Metadata Repository (CMR) is a high-performance, high-quality repository for earth science metadata records that is designed to handle metadata at the Concept level. Collections and Granules are common metadata concepts in the Earth Observation (EO) world, but this can be extended out to Visualizations, Parameters, Documentation, Services, and more. The CMR metadata records are supplied by a diverse array of data providers, using a variety of supported metadata standards, including:

(image omitted: umm-page-table-1-metadata-standards)

Initially, designers of the CMR considered standardizing all CMR metadata to a single, interoperable metadata format – ISO 19115. However, NASA decided to continue supporting multiple metadata standards in the CMR — in response to concerns expressed by the data provider community over the expense involved in converting existing metadata systems to systems capable of generating ISO 19115. In order to continue supporting multiple metadata standards, NASA designed a method to easily translate from one supported standard to another and constructed a model to support the process. Thus, the Unified Metadata Model (UMM) for EOSDIS metadata was born as part of the EOSDIS Metadata Architecture Studies (MAS I and II) conducted between 2012 and 2013.

What is the UMM?

The UMM is an extensible metadata model which provides a ‘Rosetta stone’ or cross-walk for mapping between CMR-supported metadata standards. Rather than create mappings from each CMR-supported metadata standard to each other, each standard is mapped centrally to the UMM model, thus reducing the number of translations required from n x (n-1) to 2n.

Here’s the mapping graphic:

(image omitted: umm-page-benefits-diagram)

Granted, the profiles don’t make the basis for the mappings explicit, but the mappings have the same impact post-mapping as a topic map would post-merging. (To see the savings: with ten supported standards, pairwise mappings would require 10 × 9 = 90 translations, while the hub model needs only 2 × 10 = 20.)

The site could use better documentation for the interface and data, at least in the view of this non-expert in the area.

Thoughts on documentation for the interface or making the mapping more robust via use of a topic map?

I first saw this in a tweet by Kirk Borne.


*Smells Like A Topic Map – Sorry, culture-bound reference to a routine on the first Cheech & Chong album. No explanation would do it justice.

February 20, 2016

No Patriotic Senators, Senate Staffers, Agency Heads – CIA Torture Report

Filed under: Government,Government Data — Patrick Durusau @ 3:05 pm

I highly recommend your reading The CIA torture report belongs to the public by Lauren Harper.

I have quoted from Lauren’s introduction below as an inducement to read the article in full, but she fails to explore why a patriotic senator, staffer or agency head has not already leaked the CIA Torture Report.

It has already been printed, bound, etc., and who knows how many people were involved in every step of that process.

Do you seriously believe that report has gone unread except by its authors?

So far as I know, members of Congress, that “other” branch of the government, are entitled to make their own decisions about the handling of their reports.

What America needs now is a Senator or even a Senate staffer with more loyalty to the USA than to the bed wetters and torturers (same group) in the DoJ.

If you remember a part of the constitution that grants the DoJ the role of censor for the American public, please point it out in comments below.

From the post:

The American public’s ability to read the Senate Intelligence Committee’s full, scathing report on the Central Intelligence Agency’s torture program is in danger because David Ferriero, the archivist of the United States, will not call the report what it is, a federal record. He is refusing to use his clear statutory authority to label the report a federal record, which would be subject to Freedom of Information Act (FOIA) disclosure requirements, because the Justice Department has told the National Archives and Records Administration (NARA) not to. The DOJ has a long history of breaking the law to avoid releasing information in response to FOIA requests. The NARA does not have such a legacy and should not allow itself to be bullied by the DOJ.

The DOJ instructed the NARA not to make any determination on the torture report’s status as a federal record, ostensibly because it would jeopardize the government’s position in a FOIA lawsuit seeking the report’s release. The DOJ, however, has no right to tell the NARA not to weigh in on the record’s status, and the Presidential and Federal Records Act Amendments of 2014 gives the archivist of the United States the binding legal authority to make precisely that determination.

Democratic Sens. Patrick Leahy of Vermont and Dianne Feinstein of California revealed the DOJ’s insistence that the archivist of the United States not faithfully fulfill his duty in a Nov. 5, 2015, letter to Attorney General Loretta Lynch. They protested the DOJ’s refusal to allow its officials as well as those of the Defense Department, the CIA and the State Department to read the report. Leahy and Feinstein’s letter notes that “personnel at the National Archives and Records Administration have stated that, based on guidance from the Department of Justice, they will not respond to questions about whether the study constitutes a federal record under the Federal Records Act because the FOIA case is pending.” Rather than try to win the FOIA case on a technicality and step on the NARA’s statutory toes, the DOJ should allow the FOIA review process to determine on the case’s merits whether the document may be released.

Not even officials with security clearances may read the document while its status as a congressional or federal record is debated. The New York Times reported in November 2015 that in December of the previous year, a Senate staffer delivered envelopes containing the 6,700-page top secret report to the DOJ, the State Department, the Federal Bureau of Investigation and the Pentagon. Yet a year later, none of the envelopes had been opened, and none of the country’s top officials had read the report’s complete findings. This is because the DOJ, the Times wrote, “prohibited officials from the government agencies that possess it from even opening the report, effectively keeping the people in charge of America’s counterterrorism future from reading about its past.” The DOJ contends that if any agency officials read the report, it could alter the outcome of the FOIA lawsuit.

American war criminals who are identified or who can be discovered because of the CIA Torture Report should be prosecuted to the full extent of national and international law.

Anyone who has participated in attempts to conceal those events or to prevent disclosure of the CIA Torture Report, should be tried as accomplices after the fact to those war crimes.

The facility at Guantanamo Bay can be converted into a holding facility for DoJ staffers who tried to conceal war crimes. Poetic justice I would say.

February 17, 2016

For What It’s Worth: CIA Releases Declassified Documents to National Archives

Filed under: Government,Government Data — Patrick Durusau @ 3:19 pm

CIA Releases Declassified Documents to National Archives

From the webpage:

Today, CIA released about 750,000 pages of declassified intelligence papers, records, research files and other content which are now accessible through CIA’s Records Search Tool (CREST) at the National Archives in College Park, MD. This release will include nearly 100,000 pages of analytic intelligence publication files, and about 20,000 pages of research and development files from CIA’s Directorate of Science and Technology, among others.

The newly available documents are being released in partnership with the National Geospatial Intelligence Agency (NGA) and are available by accessing CREST at the National Archives. This release continues CIA’s efforts to systematically review and release documents under Executive Order 13526. With this release, the CIA collection of records on the CREST system increases to nearly 13 million declassified pages.

That was posted on 16 February 2016.

Disclaimer: No warranty express or implied is made with regards to the accuracy of the notice quoted above or as to the accuracy of anything you may or may not find in the released documents, if they in fact exist.

I merely report that the quoted material was posted to the CIA website at the location and on the date recited.

February 2, 2016

Sunlight launches Hall of Justice… [ Topic Map “like” features?]

Filed under: Government,Government Data,Topic Maps — Patrick Durusau @ 6:53 pm

Sunlight launches Hall of Justice, a massive data inventory on criminal justice across the U.S. by Josh Stewart.

From the post:

Today, Sunlight is launching Hall of Justice, a robust, searchable data inventory of nearly 10,000 datasets and research documents from across all 50 states, the District of Columbia and the federal government. Hall of Justice is the culmination of 18 months of work gathering data and refining technology.

The process was no easy task: Building Hall of Justice required manual entry of publicly available data sources from a multitude of locations across the country.

Sunlight’s team went from state to state, meeting and calling local officials to inquire about and find data related to criminal justice. Some states like California have created a data portal dedicated to making criminal justice data easily accessible to the public; others had their data buried within hard to find websites. We also found data collected by state departments of justice, police forces, court systems, universities and everything in between.

“Data is shaping the future of how we address some of our most pressing problems,” said John Wonderlich, executive director of the Sunlight Foundation. “This new resource is an experiment in how a robust snapshot of data can inform policy and research decisions.”

In addition to being a great data collection, the Hall of Justice attempts to deliver topic map like capability for searches:

The resource attempts to consolidate different terminology across multiple states, which is far from uniform or standardized. For example, if you search solitary confinement you will return results for data around solitary confinement, but also for the terms “segregated housing unit,” “SHU,” “administrative segregation” and “restrictive housing.” This smart search functionality makes finding datasets much easier and accessible.

(image omitted: solitary)

Looking at all thirteen results for a search on “solitary confinement,” I don’t see the mapping in question. Or certainly no mapping based on characteristics of the subject, “solitary confinement.”

The closest is Georgia’s 2013 Juvenile Justice Reform, which uses the word “restrictive,” as in:

Create a two-class system within the Designated Felony Act. Designated felony offenses are divided into two classes, based on severity—Class A and Class B—that continue to allow restrictive custody while also adjusting available sanctions to account for both offense severity and risk level.

Restrictive custody is what jail systems are about, so that doesn’t trip the wire for “solitary confinement.”

Of course, the links are to entire reports/documents/data sets so each researcher will have to extract and collate content individually. When that happens, a means to contribute that collation/mapping to the Hall of Justice would be a boon for other researchers. (Can you say “topic map?”)
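As a toy illustration of that kind of “smart search,” here is a minimal sketch; the synonym table is invented for the example, not Hall of Justice’s actual mapping:

# Map a preferred term to the variant terms used across states.
SYNONYMS = {
    "solitary confinement": [
        "solitary confinement",
        "segregated housing unit",
        "SHU",
        "administrative segregation",
        "restrictive housing",
    ],
}

def expand(query):
    # Return the query plus any known variants.
    return SYNONYMS.get(query.lower(), [query])

def search(documents, query):
    terms = [t.lower() for t in expand(query)]
    return [d for d in documents if any(t in d.lower() for t in terms)]

docs = ["Report on SHU placements, 2014", "State budget overview, 2015"]
print(search(docs, "Solitary Confinement"))  # matches the SHU report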

As I write this, you will need to prefer Mozilla over Chrome, at least on Ubuntu.

Trigger Warning: If you are sensitive to traumatic events and/or reports of traumatic events, you may want to ask someone less sensitive to review these data sources.

The only difference between a concentration camp and American prisons is the lack of mass gas chambers. Every horror and abuse that you can imagine and some you probably can’t, are visited on people in U.S. prisons everyday.

As Joan Baez sings in Prison Trilogy:

And we’re gonna raze, raze the prisons

To the ground

Help us raze, raze the prisons

To the ground

Sunlight’s Hall of Justice is a great step forward in documenting the chambers of horror we call American prisons.

Are you ready?

December 28, 2015

Voter Record Privacy? WTF?

Filed under: Cybersecurity,Government,Government Data,Privacy — Patrick Durusau @ 10:23 pm

Leaky database tramples privacy of 191 million American voters by Dell Cameron.

From the post:

The voter information of more than 191 million Americans—including full names, dates of birth, home addresses, and more—was exposed online for anyone who knew the right IP address.

The misconfigured database, which was reportedly shut down at around 7pm ET Monday night, was discovered by security researcher Chris Vickery. Less than two weeks ago, Vickery also exposed a flaw in MacKeeper’s database, similarly exposing 13 million customer records.

What amazes me about this “leak” is the outrage is focused on the 191+ million records being online.

??

What about the six or seven organizations who denied being the owners of the IP address in question?

I take it none of them denied having possession of the same or essentially the same data, just that they didn’t “leak” it.

Quick question: Was voter privacy breached when these six or seven organizations got the same data or when it went online?

I would say when the Gang of Six or Seven got the same data.

You don’t have any meaningful voter privacy, aside from your actual ballot, and with your credit record (also for sale), your voting behavior can be nailed too.

You don’t have privacy but the Gang of Six or Seven do.

Attempting to protect lost privacy is pointless.

Making corporate overlords lose their privacy as well has promise.

PS: Torrents of corporate overlord data? Much more interesting than voter data.

December 14, 2015

Congress.gov Enhancements: Quick Search, Congressional Record Index, and More

Filed under: Government,Government Data,XML,XQuery — Patrick Durusau @ 9:12 pm

New End of Year Congress.gov Enhancements: Quick Search, Congressional Record Index, and More by Andrew Weber.

From the post:

In our quest to retire THOMAS, we have made many enhancements to Congress.gov this year.  Our first big announcement was the addition of email alerts, which notify users of the status of legislation, new issues of the Congressional Record, and when Members of Congress sponsor and cosponsor legislation.  That development was soon followed by the addition of treaty documents and better default bill text in early spring; improved search, browse, and accessibility in late spring; user driven feedback in the summer; and Senate Executive Communications and a series of Two-Minute Tip videos in the fall.

Today’s update on end of year enhancements includes a new Quick Search for legislation, the Congressional Record Index (back to 1995), and the History of Bills from the Congressional Record Index (available from the Actions tab).  We have also brought over the State Legislature Websites page from THOMAS, which has links to state level websites similar to Congress.gov.

Text of legislation from the 101st and 102nd Congresses (1989-1992) has been migrated to Congress.gov. The Legislative Process infographic that has been available from the homepage as a JPG and PDF is now available in Spanish as a JPG and PDF (translated by Francisco Macías). Margaret and Robert added Fiscal Year 2003 and 2004 to the Congress.gov Appropriations Table. There is also a new About page on the site for XML Bulk Data.

The Quick Search provides a form-based search with fields similar to those available from the Advanced Legislation Search on THOMAS.  The Advanced Search on Congress.gov is still there with many additional fields and ways to search for those who want to delve deeper into the data.  We are providing the new Quick Search interface based on user feedback, which highlights selected fields most likely needed for a search.

There’s an impressive summary of changes!

Speaking of practicing programming, are you planning on practicing XQuery on congressional data in the coming year?
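If you want to limber up before the bulk data lands, a sketch along these lines fetches and pokes at a single bill status file. The URL pattern and element names are assumptions based on the bulk-data announcement, so verify them against the FDsys repository:

import urllib.request
import xml.etree.ElementTree as ET

# URL pattern is an assumption; browse the FDsys bulk data repository for the real layout.
URL = "https://www.gpo.gov/fdsys/bulkdata/BILLSTATUS/113/hr/BILLSTATUS-113hr1.xml"

with urllib.request.urlopen(URL) as resp:
    root = ET.fromstring(resp.read())

# Element names are illustrative; inspect one file to confirm.
print("Title:", root.findtext(".//bill/title"))
print("Latest action:", root.findtext(".//latestAction/text"))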

December 12, 2015

Why the Open Government Partnership Needs a Reboot [Governments Too]

Filed under: Government,Government Data,Open Government,Transparency — Patrick Durusau @ 7:31 pm

Why the Open Government Partnership Needs a Reboot by Steve Adler.

From the post:

The Open Government Partnership was created in 2011 as an international forum for nations committed to implementing Open Government programs for the advancement of their societies. The idea of open government started in the 1980s after CSPAN was launched to broadcast U.S. Congressional proceedings and hearings to the American public on TV. While the galleries above the House of Representatives and Senate had been “open” to the “public” (if you got permission from your representative to attend) for decades, never before had all public democratic deliberations been broadcast on TV for the entire nation to behold at any time they wished to tune in.

I am a big fan of OGP and feel that the ideals and ambition of this partnership are noble and essential to the survival of democracy in this millennium. But OGP is a startup, and every startup business or program faces a chasm it must cross from early adopters and innovators to early majority market implementation and OGP is very much at this crossroads today. It has expanded membership at a furious pace the past three years and it’s clear to me that expansion is now far more important to OGP than the delivery of the benefits of open government to the hundreds of millions of citizens who need transparent transformation.

OGP needs a reboot.

The structure of a system produces its own behavior. OGP needs a new organizational structure with new methods for evaluating national commitments. But that reboot needs to happen within its current mission. We should see clearly that the current structure is straining due to the rapid expansion of membership. There aren’t enough support unit resources to manage the expansion. We have to rethink how we manage national commitments and how we evaluate what it means to be an open government. It’s just not right that countries can celebrate baby steps at OGP events while at the same time passing odious legislation, sidestepping OGP accomplishments, buckling to corruption, and cracking down on journalists.

Unlike Steve, I didn’t and don’t have a lot of faith in governments being voluntarily transparent.

As I pointed out in Congress: More XQuery Fodder, sometime in 2016, full bill status data will be available for all legislation before the United States Congress.

That is a lot more data than is easy to access now, but it is more smoke than fire.

With legislation status data, you can track the civics lesson progression of a bill through Congress, but that leaves you at least 3 to 4 degrees short of knowing who was behind the legislation.

Just a short list of what more would be useful:

  • Visitor/caller lists for everyone who spoke to a member of Congress or their staff, with the date and subject of each contact.
  • All visits and calls tied to particular legislation and/or classes of legislation.
  • All fundraising calls made by members of Congress and/or their staffs, with the date, results, and substance of each call.
  • Conversations between representatives and reconciliation committee members or their staffers about legislation and requested “corrections.”
  • All conversations between a representative or member of their staff and agency staff, identifying all parties and the substance of the conversation.
  • Notes, proposals, and discussion notes for all agency decisions.

Current transparency proposals are sufficient to confuse the public with mounds of nearly useless data. None of it reflects the real decision-making processes of government.

Before someone shouts “privacy,” I would point out that no citizen has a right to privacy when asking a government representative to favor them over other citizens of the same government.

Real government transparency will require breaking open the mini star-chamber proceedings that run from the lowest to the highest levels of government.

What we need is a rebooting of governments.

December 8, 2015

Congress: More XQuery Fodder

Filed under: Government,Government Data,Law - Sources,XML,XQuery — Patrick Durusau @ 8:07 pm

Congress Poised for Leap to Open Up Legislative Data by Daniel Schuman.

From the post:

Following bills in Congress requires three major pieces of information: the text of the bill, a summary of what the bill is about, and the status information associated with the bill. For the last few years, Congress has been publishing the text and summaries for all legislation moving in Congress, but has not published bill status information. This key information is necessary to identify the bill author, where the bill is in the legislative process, who introduced the legislation, and so on.

While it has been in the works for a while, this week Congress confirmed it will make “Bill Statuses in XML format available through the GPO’s Federal Digital System (FDsys) Bulk Data repository starting with the 113th Congress,” (i.e. January 2013). In “early 2016,” bill status information will be published online in bulk – here. This should mean that people who wish to use the legislative information published on Congress.gov and THOMAS will no longer need to scrape those websites for current legislative information, but instead should be able to access it automatically.

Congress isn’t just going to pull the plug without notice, however. Through the good offices of the Bulk Data Task Force, Congress will hold a public meeting with power users of legislative information to review how this will work. Eight sample bill status XML files and draft XML User Guides were published on GPO’s GitHub page this past Monday. Based on past positive experiences with the Task Force, the meeting is a tremendous opportunity for public feedback to make sure the XML files serve their intended purposes. It will take place next Tuesday, Dec. 15, from 1-2:30. RSVP details below.

If all goes as planned, this milestone has great significance.

  • It marks the publication of essential legislative information in a format that supports unlimited public reuse, analysis, and republication. It will be possible to see much of a bill’s life cycle.
  • It illustrates the positive relationship that has grown between Congress and the public on access to legislative information, where there is growing open dialog and conversation about how to best meet our collective needs.
  • It is an example of how different components within the legislative branch are engaging with one another on a range of data-related issues, sometimes for the first time ever, under the aegis of the Bulk Data Task Force.
  • It means the Library of Congress and GPO will no longer be tied to the antiquated THOMAS website and can focus on more rapid technological advancement.
  • It shows how a diverse community of outside organizations and interests came together and built a community to work with Congress for the common good.

To be sure, this is not the end of the story. There is much that Congress needs to do to address its antiquated technological infrastructure. But considering where things were a decade ago, the bulk publication of information about legislation is a real achievement, the culmination of a process that overcame high political barriers and significant inertia to support better public engagement with democracy and smarter congressional processes.

Much credit is due in particular to leadership in both parties in the House who have partnered together to push for public access to legislative information, as well as the staff who worked tirelessly to make it happen.

If you look at the sample XML files, pay close attention to the <bioguideID> element and its contents. It is the same value you will find for roll-call votes, but there the value appears in the name-id attribute of the <legislator> element. See: http://clerk.house.gov/evs/2015/roll643.xml and do view source.

Oddly, the <bioguideID> element does not appear in the documentation on GitHub; you just have to know that it corresponds to the name-id attribute of the <legislator> element.

As I said in the title, this is going to be XQuery fodder.
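To see the correspondence in action, here is a minimal sketch that looks up a bill sponsor’s roll call vote by bioguideID. The file names and the path to the <bioguideID> element are assumptions; verify them against the sample files on GitHub.

(: Look up a sponsor's roll call vote via the bioguideID /   :)
(: name-id correspondence. File names and the bioguideID     :)
(: path are assumptions -- verify against the samples.       :)
let $status := doc("bill-status-sample.xml"),
    $roll   := doc("roll643.xml"),
    $id     := string(($status//bioguideID)[1])
for $rv in $roll//recorded-vote[legislator/@name-id = $id]
return concat($rv/legislator, " (", $rv/legislator/@party,
              ") voted ", $rv/vote)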

December 3, 2015

Beta Testing eFOIA (FBI)

Filed under: Government,Government Data — Patrick Durusau @ 11:11 am

Want to Obtain FBI Records a Little Quicker? Try New eFOIA System

From the post:

The FBI recently began open beta testing of eFOIA, a system that puts Freedom of Information Act (FOIA) requests into a medium more familiar to an ever-increasing segment of the population. This new system allows the public to make online FOIA requests for FBI records and receive the results from a website where they have immediate access to view and download the released information.

Previously, FOIA requests have only been made through regular mail, fax, or e-mail, and all responsive material was sent to the requester through regular mail either in paper or disc format. “The eFOIA system,” says David Hardy, chief of the FBI’s Record/Information Dissemination Section, “is for a new generation that’s not paper-based.” Hardy also notes that the new process should increase FBI efficiency and decrease administrative costs.

The eFOIA system continues in an open beta format to optimize the process for requesters. The Bureau encourages requesters to try eFOIA and to e-mail foipaquestions@ic.fbi.gov with any questions or difficulties encountered while using it. In several months, the FBI plans to move eFOIA into full production mode.

The post gives a list of things you need to know/submit in order to help with beta testing of the eFOIA system.

Why help the FBI?

It’s true, I often chide the FBI for padding its terrorism statistics by framing the mentally ill, and certainly its project management skills are nothing to write home about.

Still, there are men and women in the FBI who do capture real criminals and not just the gullible or people who have offended the recording or movie industries. There are staffers, like the ones behind the eFOIA project, who are trying to do a public service, despite the bad apples in the FBI barrel.

Let’s give them a hand, even though decisions on particular FOIA requests may be quite questionable. That’s not the fault of the technology or of the people trying to make it work.

What are you going to submit a FOIA request about?

I first saw this in a tweet by Nieman Lab.

December 1, 2015

Progress on Connecting Votes and Members of Congress (XQuery)

Filed under: Government,Government Data,XQuery — Patrick Durusau @ 5:34 pm

I am not nearly at the planned end point, but I have corrected a file I generated with XQuery that provides the name-id numbers for members of the House and links to their websites at house.gov.

It is a rough draft but you can find it at: http://www.durusau.net/publications/name-id-member-website-draft.html.

While I was casting about for the resources for this posting, I had the sinking feeling that I had wasted a lot of time and effort when I found: http://clerk.house.gov/xml/lists/MemberData.xml.

But, if you read that file carefully, what is the one thing it lacks?

A link to each member’s website at “….house.gov.”

Isn’t that interesting?

Of all the things to omit, why that one?

Especially since you can’t auto-generate the website names from member names. What appear to be older sites use just the member’s last name, but that strategy must have broken down quickly when members with the same last name appeared.

The conflicting names and even some non-conflicting names follow a new naming protocol that appears to be firstname+lastname.house.gov.

That will work for a while until the next generation starts inheriting positions in the House.

Anyway, that is as far as I got today, but at least it is a useful list for going from the name-id of a member of the House to their website.

The next step will be hitting the websites to extract contact information.

Yes, I know that http://clerk.house.gov/xml/lists/MemberData.xml has the “official” contact information, along with their forms for email, etc.

If I wanted to throw my comment into a round file I could do that myself.

No, what I want to extract is their local office data, so that when they are “back home” meeting with constituents, the average voter has a better chance of being one of those constituents, not just those who maxed out campaign donation limits.
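If you want to use the draft as a drop-in lookup table, one shape it could take is an XQuery 3.1 map. The entries below are illustrative, taken from the roll call examples in the next post down; the full mapping lives in the draft file linked above.

(: A fragment of the name-id -> house.gov lookup as an XQuery 3.1 :)
(: map. Entries are illustrative; the full table is in the draft. :)
declare variable $site := map {
  "A000374" : "https://abraham.house.gov/",
  "A000370" : "https://adams.house.gov/",
  "A000055" : "https://aderholt.house.gov/"
};

$site("A000374")   (: => "https://abraham.house.gov/" :)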

November 30, 2015

Connecting Roll Call Votes to Members of Congress (XQuery)

Filed under: Data Mining,Government,Government Data,XQuery — Patrick Durusau @ 10:29 pm

Apologies for the lack of posting today, but I have been trying to connect up roll call votes in the House of Representatives with additional information on members of Congress.

In case you didn’t know, roll call votes are reported in XML and have this form:

<recorded-vote>
  <legislator name-id="A000374" sort-field="Abraham"
    unaccented-name="Abraham" party="R" state="LA"
    role="legislator">Abraham</legislator>
  <vote>Aye</vote>
</recorded-vote>
<recorded-vote>
  <legislator name-id="A000370" sort-field="Adams"
    unaccented-name="Adams" party="D" state="NC"
    role="legislator">Adams</legislator>
  <vote>No</vote>
</recorded-vote>
<recorded-vote>
  <legislator name-id="A000055" sort-field="Aderholt"
    unaccented-name="Aderholt" party="R" state="AL"
    role="legislator">Aderholt</legislator>
  <vote>Aye</vote>
</recorded-vote>
<recorded-vote>
  <legislator name-id="A000371" sort-field="Aguilar"
    unaccented-name="Aguilar" party="D" state="CA"
    role="legislator">Aguilar</legislator>
  <vote>Aye</vote>
</recorded-vote>
...

For a full example: http://clerk.house.gov/evs/2015/roll643.xml

With the name-id attribute value, I can automatically construct URIs to the Biographical Directory of the United States Congress, for example, the entry on Abraham, Ralph.

More information than a poke with a sharp stick would give you, but it’s only self-serving cant.
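As for the construction itself, here is a minimal sketch; the biodisplay.pl URI pattern is my reading of the entry linked above, so treat it as an assumption:

(: Build a Biographical Directory URI for each legislator in :)
(: a roll call. The biodisplay.pl pattern is an assumption   :)
(: taken from the entry linked above.                        :)
let $roll := doc("http://clerk.house.gov/evs/2015/roll643.xml")
for $leg in $roll//recorded-vote/legislator
return concat($leg, ": ",
  "http://bioguide.congress.gov/scripts/biodisplay.pl?index=",
  $leg/@name-id)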

One of the things that would be nice to link up with roll call votes would be the homepages of those voting.

Continuing with Ralph Abraham, mapping A000374 to https://abraham.house.gov/ would be helpful in gathering other information, such as the various offices where Representative Abraham can be contacted.

If you are reading the URIs, you might think just prepending the last name of each representative to “house.gov” would be sufficient. It would be, except that there are eighty-three cases where representatives share last names and/or a newer naming scheme uses more than last name + house.gov.

After I was satisfied that there wasn’t a direct mapping between the current uses of name-id and House member websites, I started creating such a mapping that you can drop into XQuery as a lookup table and/or use as an external file.

The lookup table should be finished tomorrow so check back.

PS: Yes, I am aware there are tables of contact information for members of Congress, but I have yet to see one that lists all their local offices. Moreover, a lookup table for XQuery may encourage people to connect more data to their representatives, such as articles in local newspapers, property deeds, and other such material.

October 7, 2015

Now over 1,000,000 Items to Search on Congress.gov [Cause to Celebrate?]

Filed under: Government,Government Data,Law,Law - Sources,Library — Patrick Durusau @ 4:08 pm

Now over 1,000,000 Items to Search on Congress.gov: Communications and More Added by Andrew Weber.

From the post:

This has been a great year as we continue our push to develop and refine Congress.gov.  There were email alerts added in February, treaties and better default text in March, the Federalist Papers and more browse options in May, and accessibility and user requested features in July.  With this October update, Senate Executive Communications from THOMAS have migrated to Congress.gov.  There is an About Executive Communications page that provides more detail about the scope of coverage, searching, viewing, and obtaining copies.

Not to mention a new video “help” series, Legislative Subject Terms and Popular and Short Titles.

All good and from one of the few government institutions that merits respect, the Library of Congress.

Why the “Cause to Celebrate?”

This is an excellent start and certainly Congress.gov has shown itself to be far more responsive to user requests than vendors are to reports of software vulnerabilities.

But we are still at the higher levels of data: legislation, regulations, etc.

What needs to follow is a dive downward to identify who obtains the benefits of legislation/regulations. Who obtains permits, for what, and at what market value? Who obtains benefits, credits, allowances? Who wins contracts, and where does that money go as it moves down the prime contractor -> subcontractor -> etc. pipeline?

It is ironic that when candidates for president talk about tax reform, they tend to focus on the tax tables, which are two (2) pages out of the current 6,455 pages of the IRC (in pdf, http://uscode.house.gov/download/releasepoints/us/pl/114/51/pdf_usc26@114-51.zip).

Knowing who benefits and by how much for the rest of the pages of the IRC isn’t going to make government any cleaner.

But, when paired with campaign contributions, it will give everyone an even footing on buying favors from the government.

Just as public disclosure enables a relatively fair stock exchange, in the case of government it will enable relative fairness in corruption.

August 21, 2015

Disclosing Government Contracts

Filed under: Government,Government Data,Public Data,Transparency — Patrick Durusau @ 4:37 pm

The More the Merrier? How much information on government contracts should be published and who will use it by Gavin Hayman.

From the post:

A huge bunch of flowers to Rick Messick for his excellent post asking two key questions about open contracting. And some luxury cars, expensive seafood and a vat or two of cognac.

Our lavish offerings all come from Slovakia, where in 2013 the Government Public Procurement Office launched a new portal publishing all its government contracts. All these items were part of the excessive government contracting uncovered by journalists, civil society and activists. In the case of the flowers, teachers investigating spending at the Department of Education uncovered florists’ bills for thousands of euros. Spending on all of these has subsequently declined: a small victory for fiscal probity.

The flowers, cars, and cognac help to answer the first of two important questions that Rick posed: Will anyone look at contracting information? In the case of Slovakia, it is clear that lowering the barriers to access information did stimulate some form of response and oversight.

The second question was equally important: “How much contracting information should be disclosed?”, especially in commercially sensitive circumstances.

These are two of the key questions that we have been grappling with in our strategy at the Open Contracting Partnership. We thought that we would share our latest thinking below, in a post that is a bit longer than usual. So grab a cup of tea and have a read. We’ll definitely be looking forward to your continued thoughts on these issues.

Not a short read, so do grab some coffee (outside of Europe) and settle in for a good read.

Disclosure: I’m financially interested in government disclosure in general and contracts in particular. With openness comes more effort to conceal semantics, which increases the need for topic maps to pierce the darkness.

I don’t think openness reduces the amount of fraud and misconduct in government; it only gives an alignment between citizens’ interests and the career interests of a prosecutor a sporting chance to catch someone out.

Disclosure should be as open as possible, and for what isn’t disclosed voluntarily, well, one hopes for brave souls who will leak the remainder.

Support disclosure of government contracts and leakers of the same.

If you need help “connecting the dots,” consider topic maps.

July 15, 2015

data.parliament.uk (beta)

Filed under: Government,Government Data — Patrick Durusau @ 4:58 pm

data.parliament.uk (beta)

From the announcement post:

In February this year we announced that we will be iteratively improving the user experience. Today we are launching the new Beta site. There are many changes and we hope you will like them.

  • Dataset pages have been greatly simplified so that you can get to your data within two clicks.
  • We have re-written many of the descriptions to simplify explanations.
  • We have launched explore.data.parliament.uk which is aimed at non-developers to search and then download data.
  • We have also greatly improved and revised our API documentation. For example have a look here
  • We have added content from our blog and twitter feeds into the home page and I hope you agree that we are now presenting a more cohesive offering.

We are still working on datasets, and those in the pipeline waiting for release imminently are

  • Bills meta-data for bills going through the Parliamentary process.
  • Commons Select Committee meta-data.
  • Deposited Papers
  • Lords Attendance data

Let us know what you think.

There could be some connection between what the government says publicly and what it does privately. As they say, “anything is possible.”

Curious, what do you make of the Thesaurus?

Typing the “related” link to say how the terms are related would be a step in the right direction. Apparently there is an organization with the title “‘Sdim Curo Plant!” (other sources report it is Welsh for “Children are Unbeatable”), which turns out to be the preferred label.

The entire set has 107,337 records and can be downloaded, albeit in 500-record chunks. That should improve over time, according to Downloading data from data.parliament.
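Until then, fetching the whole set means paging. Here is a sketch of what that could look like; the endpoint, the result element name, and the _page/_pageSize parameters are all assumptions to be checked against the downloading guidance.

(: Page through the thesaurus in 500-record chunks. Endpoint, :)
(: element names, and the _page/_pageSize parameters are      :)
(: assumptions -- check the downloading guidance.             :)
let $base := "http://lda.data.parliament.uk/terms.xml"
for $page in 0 to 4   (: first 2,500 of the 107,337 records :)
let $url := concat($base, "?_pageSize=500&amp;_page=", $page)
return doc($url)//item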

I have always been interested in the terms other people use, and this looks like an interesting data set that is part of a larger interesting data set.

Enjoy!

July 13, 2015

Nominations by the U.S. President

Filed under: Government,Government Data,Politics — Patrick Durusau @ 3:42 pm

Nominations by the U.S. President

The Library of Congress created this resource, which enables you to search for nominations by U.S. Presidents starting in 1981. There is information about the nomination process, the records, and related nomination resources at About Nominations of the U.S. Congress.

Unfortunately, I did not find a link to bulk data for presidential nominations, nor an API for the search engine behind this webpage.

I say that because matching up nominees and/or their sponsors with campaign contributions would help establish a price range on becoming the ambassador to Uruguay, etc.

I wrote to Ask a Law Librarian to check on the status of bulk data and/or an API. Will amend this post when I get a response.

Oh, there will be a response. For all the ills and failures of the U.S. government, which are legion, it is capable of assembling vast amounts of information and training people to perform research on it. Not in every case, but if it falls within the purview of the Law Library of Congress, I am confident of a useful answer.

June 24, 2015

World Factbook 2015 (paper, online, downloadable)

Filed under: Geography,Government,Government Data — Patrick Durusau @ 3:22 pm

World Factbook 2015 (GPO)

From the webpage:

The Central Intelligence Agency’s World Factbook provides brief information on the history, geography, people, government, economy, communications, transportation, military, and transnational issues for 267 countries and regions around the world.

The CIA’s World Factbook also contains several appendices and maps of major world regions, which are located at the very end of the publication. The appendices cover abbreviations, international organizations and groups, selected international environmental agreements, weights and measures, cross-reference lists of country and hydrographic data codes, and geographic names.

For maps, it provides a country map for each country entry and a total of 12 regional reference maps that display the physical features and political boundaries of each world region. It also includes a pull-out Flags of the World, a Physical Map of the World, a Political Map of the World, and a Standard Time Zones of the World map.

Who should read The World Factbook? It is a great one-stop reference for anyone looking for an expansive body of international data on world statistics, and has been a must-have publication for:

  • US Government officials and diplomats
  • News organizations and researchers
  • Corporations and geographers
  • Teachers, professors, librarians, and students
  • Anyone who travels abroad or who is interested in foreign countries

The print version is $89.00 (U.S.), is 923 pages long and weighs in at 5.75 lb. in paperback.

A convenient and frequently updated alternative is the online CIA World Factbook.

I can’t compare the two versions because I am not going to spend $89.00 for an arm wrecker. 😉

You can also download a copy of the HTML version.

I downloaded and unzipped the file, only to find that the last update was in June, 2014.

That may be updated soon or it may not. I really don’t know.

If you just need background information that is unlikely to change or you want to avoid surveillance on what countries you look at and for how long, download the 2014 HTML version or pony up for the 2015 paper version.

