Archive for the ‘Government Data’ Category

Semantics and Delivery of Useful Information [Bills Before the U.S. House]

Monday, October 21st, 2013

Lars Marius Garshol pointed out in Semantic Web adoption and the users that the question “What do semantic technologies do better than non-semantic technologies?” has yet to be answered.

Tim O’Reilly tweeted about Madison Federal today, a resource that raises the semantic versus non-semantic technology question.

In a nutshell, Madison Federal has all the bills pending before the U.S. House of Representatives online.

If you log in with Facebook, you can:

  • Add a bill edit / comment
  • Enter a community suggestion
  • Enter a community comment
  • Subscribe to future edits/comments on a bill

So far, so good.

You can pick any bill but the one I chose as an example is: Postal Executive Accountability Act.

I will quote just a few lines of the bill:

2. Limits on executive pay

    (a) Limitation on compensation Section 1003 of title 39, United States Code,
         is amended:

         (1) in subsection (a), by striking the last sentence; and
         (2) by adding at the end the following:

                  (1) Subject to paragraph (2), an officer or employee of the Postal
                      Service may not be paid at a rate of basic pay that exceeds
                      the rate of basic pay for level II of the Executive Schedule
                      under section 5312 of title 5.

What would be the first thing you want to know?

Hmmm, what about subsection (a) of section 1003 of title 39 of the United States Code, since we are striking its last sentence?

39 USC § 1003 – Employment policy [Legal Information Institute], which reads:

(a) Except as provided under chapters 2 and 12 of this title, section 8G of the Inspector General Act of 1978, or other provision of law, the Postal Service shall classify and fix the compensation and benefits of all officers and employees in the Postal Service. It shall be the policy of the Postal Service to maintain compensation and benefits for all officers and employees on a standard of comparability to the compensation and benefits paid for comparable levels of work in the private sector of the economy. No officer or employee shall be paid compensation at a rate in excess of the rate for level I of the Executive Schedule under section 5312 of title 5.

OK, so now we know that (1) is striking:

No officer or employee shall be paid compensation at a rate in excess of the rate for level I of the Executive Schedule under section 5312 of title 5.

Semantics? No, just a hyperlink.

For the added text, we want to know what is meant by:

… rate of basic pay that exceeds the rate of basic pay for level II of the Executive Schedule under section 5312 of title 5.

The Legal Information Institute is already ahead of Congress because their system provides the hyperlink we need: 5312 of title 5.

If you notice something amiss when you follow that link, congratulations! You have discovered your first congressional typo and/or error.

5312 of title 5 defines Level I of the Executive Schedule, which includes the Secretary of State, Secretary of the Treasury, Secretary of Defense, Attorney General and others. Base rate for Executive Schedule Level I is $199,700.

On the other hand, 5313 of title 5 defines Level II of the Executive Schedule, which includes Department of Agriculture, Deputy Secretary of Agriculture; Department of Defense, Deputy Secretary of Defense, Secretary of the Army, Secretary of the Navy, Secretary of the Air Force, Under Secretary of Defense for Acquisition, Technology and Logistics; Department of Education, Deputy Secretary of Education; Department of Energy, Deputy Secretary of Energy and others. Base rate for Executive Schedule Level II is $178,700.

Assuming someone catches or comments that 5312 should be 5313, top earners at the Postal Service may be about to take a $21,000.00 pay reduction.
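The arithmetic behind that figure is easy to check. A minimal sketch in Python, using only the two base rates quoted above (the variable names are mine, not anything from the bill or the U.S. Code):

```python
# Base rates quoted above for the Executive Schedule.
LEVEL_I_BASE = 199_700   # current cap: level I (5 USC 5312)
LEVEL_II_BASE = 178_700  # proposed cap: level II (5 USC 5313)

# Reduction for anyone currently paid at the level I cap.
reduction = LEVEL_I_BASE - LEVEL_II_BASE
print(f"Maximum pay reduction: ${reduction:,}")
```

This only covers someone paid exactly at the current cap; anyone paid less than Level I but more than Level II would lose a smaller amount.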

We got all that from mechanical hyperlinks, no semantic technology required.

Where you might need semantic technology is when reading 39 USC § 1003 – Employment policy [Legal Information Institute] where it says (in part):

…It shall be the policy of the Postal Service to maintain compensation and benefits for all officers and employees on a standard of comparability to the compensation and benefits paid for comparable levels of work in the private sector of the economy….

Some questions:

Question: What are “comparable levels of work in the private sector of the economy?”

Question: On what basis is work for the Postal Service compared to work in the private economy?

Question: Examples of comparable jobs in the private economy and their compensation?

Question: What policy or guideline documents have been developed by the Postal Service for evaluation of Postal Service vs. work in the private economy?

Question: What studies have been done, by whom, using what practices, on comparing compensation for Postal Service work to work in the private economy?

That would be a considerable amount of information with what I suspect would be a large amount of duplication as reports or studies are cited by numerous sources.

Semantic technology would be necessary for the purpose of deduping and navigating such a body of information effectively.
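As a rough illustration of the deduping problem, here is a minimal Python sketch. It assumes each source cites a study by title and year, and that normalizing the title is enough to recognize the same study. The sample records and the `normalize` and `merge_citations` helpers are hypothetical; a real body of reports would need far richer identity rules, which is where semantic technology earns its keep.

```python
import re
from collections import defaultdict

def normalize(title):
    """Lowercase a title and collapse punctuation/whitespace into a match key."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def merge_citations(citations):
    """Group citations of the same study, keeping every source that cited it."""
    merged = defaultdict(set)
    for cite in citations:
        key = (normalize(cite["title"]), cite["year"])
        merged[key].add(cite["cited_by"])
    return dict(merged)

# Hypothetical sample data: two sources citing one study with different formatting.
citations = [
    {"title": "Postal Pay Comparability Study", "year": 2010, "cited_by": "Report A"},
    {"title": "Postal pay-comparability study.", "year": 2010, "cited_by": "Report B"},
    {"title": "Private Sector Wage Survey", "year": 2011, "cited_by": "Report A"},
]

merged = merge_citations(citations)
for (title, year), sources in sorted(merged.items()):
    print(title, year, sorted(sources))
```

String normalization catches only the easy cases. Two reports that name the same study differently, or the same name used for different studies, are beyond it.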

Pick a bill. Where would you put the divide between mechanical hyperlinks and semantic technologies?

PS: You may remember that the House of Representatives had their own “post office” which they ran as a slush fund. The thought of the House holding someone “accountable” is too bizarre for words.

G8 countries must work harder to open up essential data

Monday, June 17th, 2013

G8 countries must work harder to open up essential data by Rufus Pollock.

From the post:

Open data and transparency will be one of the three main topics at the G8 Summit in Northern Ireland next week. Today transparency campaigners released preview results from the global Open Data Census showing that G8 countries still have a long way to go in releasing essential information as open data.

The Open Data Census is run by the Open Knowledge Foundation, with the help of a network of local data experts around the globe. It measures the openness of data in ten key areas including those essential for transparency and accountability (such as election results and government spending data), and those vital for providing critical services to citizens (such as maps and transport timetables). Full results for the 2013 Open Data Census will be released later this year.

[Graphic: open data]

The preview results show that while both the UK and the US (who top the table of G8 countries) have made significant progress towards opening up key datasets, both countries still have work to do. Postcode data, which is required for almost all location-based applications and services, remains a major issue for all G8 countries except Germany. No G8 country scored the top mark for company registry data. Russia is the only G8 country not to have published any of the information included in the census as open data. The full results for G8 countries are online at:

Apologies for the graphic; it is too small to read. See the original post for a more legible version.

The U.S. came in first with a score of 54 out of a possible 60.

I assume this evaluation was done prior to the revelation of the NSA data snooping?

The U.S. government has massive collections of data that not only aren’t visible, but whose very existence is denied.

How is that for government transparency?

The most disappointing part is that other major players, China, Russia, you take your pick, have largely the same secret data as the United States. Probably not full sets of the day-to-day memos, but the data that really counts, they all have.

So, who is it they are keeping information from?

Ah, that would be their citizens.

Who might not approve of their privileges, goals, tactics, and favoritism.

For example, despite the U.S. government’s disapproval/criticism of many other countries (or rather their governments), I can’t think of any reason for me to dislike unknown citizens of another country.

Whatever goals the U.S. government is pursuing in disadvantaging citizens of another country, it’s not on my behalf.

If the public knew who was benefiting from U.S. policy, perhaps new officials would change those policies.

But that isn’t the goal of the specter of government transparency that the United States leads.

The Banality of ‘Don’t Be Evil’

Monday, June 3rd, 2013

The Banality of ‘Don’t Be Evil’ by Julian Assange.

From the post:

“THE New Digital Age” is a startlingly clear and provocative blueprint for technocratic imperialism, from two of its leading witch doctors, Eric Schmidt and Jared Cohen, who construct a new idiom for United States global power in the 21st century. This idiom reflects the ever closer union between the State Department and Silicon Valley, as personified by Mr. Schmidt, the executive chairman of Google, and Mr. Cohen, a former adviser to Condoleezza Rice and Hillary Clinton who is now director of Google Ideas.

The authors met in occupied Baghdad in 2009, when the book was conceived. Strolling among the ruins, the two became excited that consumer technology was transforming a society flattened by United States military occupation. They decided the tech industry could be a powerful agent of American foreign policy.

The book proselytizes the role of technology in reshaping the world’s people and nations into likenesses of the world’s dominant superpower, whether they want to be reshaped or not. The prose is terse, the argument confident and the wisdom — banal. But this isn’t a book designed to be read. It is a major declaration designed to foster alliances.

“The New Digital Age” is, beyond anything else, an attempt by Google to position itself as America’s geopolitical visionary — the one company that can answer the question “Where should America go?” It is not surprising that a respectable cast of the world’s most famous warmongers has been trotted out to give its stamp of approval to this enticement to Western soft power. The acknowledgments give pride of place to Henry Kissinger, who along with Tony Blair and the former C.I.A. director Michael Hayden provided advance praise for the book.

In the book the authors happily take up the white geek’s burden. A liberal sprinkling of convenient, hypothetical dark-skinned worthies appear: Congolese fisherwomen, graphic designers in Botswana, anticorruption activists in San Salvador and illiterate Masai cattle herders in the Serengeti are all obediently summoned to demonstrate the progressive properties of Google phones jacked into the informational supply chain of the Western empire.


I am less concerned with privacy and more concerned with the impact of technological imperialism.

I see no good coming from the infliction of Western TV and movies on other cultures.

Or in making local farmers part of the global agriculture market.

Or infecting Iraq with sterile wheat seeds.

Compared to those results, privacy is a luxury of the bourgeois who worry about such issues.

I first saw this at Chris Blattman’s Links I liked.

CIA, Solicitation and Government Transparency

Monday, June 3rd, 2013

IBM battles Amazon over $600M CIA cloud deal by Frank Konkel, reports that IBM has protested a contract award for cloud computing by the CIA to Amazon.

The “new age” of government transparency looks a lot like the old age in that:

  • How Amazon obtained the award is not public.
  • The nature of the cloud to be built by Amazon is not public.
  • Whether Amazon has started construction on the proposed cloud is not public.
  • The basis for the protest by IBM is not public.

“Not public” means opportunities for incompetence in contract drafting and/or fraud by contractors.

How are members of the public or less well-heeled potential bidders supposed to participate in this discussion?

Or should I say “meaningfully participate” in the discussion over the cloud computing award to Amazon?

And what if others know the terms of the contract? CIA CTO Gus Hunt is reported as saying:

It is very nearly within our grasp to be able to compute on all human generated information,

If the proposed system is supposed to “compute on all human generated information,” so what?

How does knowing that aid any alleged enemies of the United States?

Other than the comfort that the U.S. makes bad technology decisions?

Keeping the content of such a system secret might disadvantage enemies of the U.S.

Keeping the contract for such a system secret disadvantages the public and other contractors.


White House Releases New Tools… [Bank Robber's Defense]

Sunday, June 2nd, 2013

White House Releases New Tools For Digital Strategy Anniversary by Caitlin Fairchild.

From the post:

The White House marked the one-year anniversary of its digital government strategy Thursday with a slate of new releases, including a catalog of government APIs, a toolkit for developing government mobile apps and a new framework for ensuring the security of government mobile devices.

Those releases correspond with three main goals for the digital strategy: make more information available to the public; serve customers better; and improve the security of federal computing.

Just scanning down the API list, it is a very mixed bag.

For example, there are four hundred and ten (410) individual APIs listed: the National Library of Medicine has twenty-four (24) and the U.S. Senate has one (1).

Defenders of this release will say we should not talk about the lack of prior efforts but focus on what’s coming.

I call that the bank robber’s defense.

All prosecutors want to talk about is what a bank robber did in the past. They never want to focus on the future.

Bank robbers would love to have the “let’s talk about tomorrow” defense.

As far as I know, it isn’t allowed anywhere.

Question: Why do we allow avoidance of responsibility with the “let’s talk about tomorrow” defense for government and others?

If you review the APIs for semantic diversity I would appreciate a pointer to your paper/post.


Saturday, May 25th, 2013

Cato’s “Deepbills” Project Advances Government Transparency by Jim Harper.

From the post:

But there’s no sense in sitting around waiting for things to improve. Given the incentives, transparency is something that we will have to force on government. We won’t receive it like a gift.

So with software we acquired and modified for the purpose, we’ve been adding data to the bills in Congress, making it possible to learn automatically more of what they do. The bills published by the Government Printing Office have data about who introduced them and the committees to which they were referred. We are adding data that reflects:

- What agencies and bureaus the bills in Congress affect;

- What laws the bills in Congress effect: by popular name, U.S. Code section, Statutes at Large citation, and more;

- What budget authorities bills include, the amount of this proposed spending, its purpose, and the fiscal year(s).

We are capturing proposed new bureaus and programs, proposed new sections of existing law, and other subtleties in legislation. Our “Deepbills” project is documented at

This data can tell a more complete story of what is happening in Congress. Given the right Web site, app, or information service, you will be able to tell who proposed to spend your taxpayer dollars and in what amounts. You’ll be able to tell how your member of Congress and senators voted on each one. You might even find out about votes you care about before they happen!

Two important points:

First, transparency must be forced upon government (I would add businesses).

Second, transparency is up to us.

Do you know something the rest of us should know?

On your mark!

Get set!


I first saw this at: Harper: Cato’s “Deepbills” Project Advances Government Transparency.

US rendition map: what it means, and how to use it

Wednesday, May 22nd, 2013

US rendition map: what it means, and how to use it by James Ball.

From the post:

The Rendition Project, a collaboration between UK academics and the NGO Reprieve, has produced one of the most detailed and illuminating research projects shedding light on the CIA’s extraordinary rendition project to date. Here’s how to use it.

Truly remarkable project to date, but could be even more successful with your assistance.

Not likely that any of the principals will wind up in the dock at the Hague.

On the other hand, exposing their crimes may deter others from similar adventures.

U.S. Senate Panel Discovers Nowhere Man [Apple As Tax Dodger]

Monday, May 20th, 2013

Forty-seven years after Nowhere Man by the Beatles, a U.S. Senate panel discovers several nowhere men.

A Wall Street Journal Technology Alert:

Apple has set up corporate structures that have allowed it to pay little or no corporate tax–in any country–on much of its overseas income, according to the findings of a U.S. Senate examination.

The unusual result is possible because the iPhone maker’s key foreign subsidiaries argue they are residents of nowhere, according to the investigators’ report, which will be discussed at a hearing Tuesday where Apple CEO Tim Cook will testify. The finding comes from a lengthy investigation into the technology giant’s tax practices by the Senate Permanent Subcommittee on Investigations, led by Sens. Carl Levin (D., Mich.) and John McCain (R., Ariz.).

In additional coverage, Apple says:

Apple’s testimony also includes a call to overhaul: “Apple welcomes an objective examination of the US corporate tax system, which has not kept pace with the advent of the digital age and the rapidly changing global economy.”

Tax reform will be useful only if it is “transparent” tax reform.

Transparent tax reform means every provision with more than a $100,000 impact on any taxpayer names all the taxpayers impacted, whether they pay more or less tax.
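A minimal sketch of that rule in Python, assuming per-provision impact estimates were available. The `Impact` record, the sample provisions and the taxpayer names are all hypothetical; only the $100,000 threshold comes from the rule above.

```python
from dataclasses import dataclass

THRESHOLD = 100_000  # dollars; the threshold proposed above

@dataclass
class Impact:
    provision: str  # hypothetical provision identifier
    taxpayer: str   # hypothetical taxpayer name
    change: int     # change in tax owed, in dollars (negative = pays less)

def named_taxpayers(impacts):
    """Map each provision to the taxpayers it moves by more than THRESHOLD, up or down."""
    report = {}
    for item in impacts:
        if abs(item.change) > THRESHOLD:
            report.setdefault(item.provision, []).append((item.taxpayer, item.change))
    return report

# Hypothetical sample data.
impacts = [
    Impact("Sec. 101", "Example Corp", -2_500_000),
    Impact("Sec. 101", "Small Shop LLC", -400),
    Impact("Sec. 202", "Example Holdings", 150_000),
]
report = named_taxpayers(impacts)
for provision, names in report.items():
    print(provision, names)
```

The filter uses the absolute value of the change, so a provision that raises someone's taxes by more than the threshold is reported the same way as one that lowers them.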

We have the data, we need the will to apply the analysis.

A tax-impact topic map anyone?

UNESCO Publications and Data (Open Access)

Sunday, May 19th, 2013

UNESCO to make its publications available free of charge as part of a new Open Access policy

From the post:

The United Nations Education Scientific and Cultural Organisation (UNESCO) has announced that it is making available to the public free of charge its digital publications and data. This comes after UNESCO has adopted an Open Access Policy, becoming the first agency within the United Nations to do so.

The new policy implies that anyone can freely download, translate, adapt, and distribute UNESCO’s publications and data. The policy also states that from July 2013, hundreds of downloadable digital UNESCO publications will be available to users through a new Open Access Repository with a multilingual interface. The policy seeks also to apply retroactively to works that have been published.

There’s a treasure trove of information for mapping, say against the New York Times historical archives.

If presidential libraries weren’t concerned with helping former administration officials avoid accountability, digitizing presidential libraries for complete access would be another great treasure trove.

Open Data and Wishful Thinking

Saturday, May 18th, 2013

BLM Fracking Rule Violates New Executive Order on Open Data by Sofia Plagakis.

From the post:

Today, the U.S. Department of the Interior’s Bureau of Land Management (BLM) released its revised proposed rule for natural gas drilling (commonly referred to as fracking) on federal and tribal lands. The much-anticipated rule violates President Obama’s recently issued executive order that requires new government information to be made available to the public in open, machine-readable formats.

Last week, President Obama signed an executive order requiring that all newly generated public data be pushed out in open, machine-readable formats. Concurrently, the Office of Management and Budget (OMB) and the Office of Science and Technology Policy (OSTP) released an Open Data Policy designed to make previously unavailable government data accessible to entrepreneurs, researchers, and the public.

The executive order and accompanying policy must have been in development for months, and agencies, including BLM, should have been fully aware of the new policy. But instead of establishing a modern example of government information collection and sharing, BLM’s proposed rule would allow drilling companies to report the chemicals used in fracking to a third-party, industry-funded website, called, which does not provide data in machine-readable formats. only allows users to download PDF files of reports on fracked wells. Because PDF files are not machine-readable, the site makes it very difficult for the public to use and analyze data on wells and chemicals that the government requires companies to collect and make available.

I wonder if Sofia simply overlooked:

When implementing the Open Data Policy, agencies shall incorporate a full analysis of privacy, confidentiality, and security risks into each stage of the information lifecycle to identify information that should not be released. These review processes should be overseen by the senior agency official for privacy. It is vital that agencies not release information if doing so would violate any law or policy, or jeopardize privacy, confidentiality, or national security. [From “We won’t get fooled again…”]

Or if her “…requires new government information to be made available to the public in open, machine-readable formats” is wishful thinking?

The Obama administration just released the Benghazi emails in PDF format. So we have an example of the White House violating its own “open data” policy.

We don’t need more “open data.”

What we need are more leakers. A lot more leakers.

Just be sure you leak or pass on leaks in “open, machine-readable formats.”

The foreign adventures, environmental pollution, failures in drug or food safety, etc., avoided by leaks may save your life, the lives of your children or grandchildren.

Leak today!

Open Government and Benghazi Emails

Thursday, May 16th, 2013

The controversy over the “Benghazi emails” is a good measure of what the Obama Administration means by “open government.”

News of the release of the Benghazi emails broke yesterday at NPR and USA Today, among others.

I saw the news at Benghazi Emails Released, Wall Street Journal. PDF of the emails

If you go to and search for “Benghazi emails,” can you find the White House release of the emails?

I thought not.

The emails show congressional concern over the “talking points” on Benghazi to be a tempest in a teapot, as many of us already suspected.

Early release of the emails would have avoided some of the endless discussion rooted in congressional ignorance and bigotry.

But, the Obama administration has so little faith in “open government” that it conceals information that would be to its advantage if revealed.

Now imagine how the Obama administration must view information that puts it at a disadvantage.

Does that help to clarify the commitment of the Obama administration to open government?

It does for me.

Search Nonprofit Tax Forms

Friday, May 10th, 2013

ProPublica Launches Online Tool to Search Nonprofit Tax Forms by Doug Donovan.

From the post:

The investigative-journalism organization ProPublica started a free online service today for searching the federal tax returns of more than 615,000 nonprofits.

ProPublica began building its Nonprofit Explorer tool on its Web site shortly after the Internal Revenue Service announced in April that it was making nonprofit tax returns available in a digital, searchable format.

ProPublica’s database provides nonprofit Form 990 information free back to 2001, including executive compensation, total revenue, and other critical financial data

Scott Klein, editor of news applications at ProPublica, said Nonprofit Explorer is not meant to replace GuideStar, the most familiar online service for searching nonprofit tax forms. Many search results on Nonprofit Explorer also offer links to GuideStar data.

“They have a much richer tool set,” Mr. Klein said.

For now, Nonprofit Explorer does not include the tax forms filed by private foundations but is expected to do so in a future update.

I guess copy limitations prevented reporting the URL for ProPublica’s Nonprofit Explorer.

Another place to look for smoke even if you are unlikely to find fire.

“We won’t get fooled again…”

Friday, May 10th, 2013

Landmark Steps to Liberate Open Data

There is no shortage of discussion of President Obama’s executive order that is alleged to result in greater access to government data.

Except then you read:

Agencies shall implement the requirements of the Open Data Policy and shall adhere to the deadlines for specific actions specified therein. When implementing the Open Data Policy, agencies shall incorporate a full analysis of privacy, confidentiality, and security risks into each stage of the information lifecycle to identify information that should not be released. These review processes should be overseen by the senior agency official for privacy. It is vital that agencies not release information if doing so would violate any law or policy, or jeopardize privacy, confidentiality, or national security.

Gee, I wonder who is going to decide what information gets released?

How would we know when “open data” efforts succeed?

Here’s my test: When ordinary citizens can mine open data and their complaints result in the arrest and conviction of public officials or government staff.

Unless and until that sort of information is public data, you are being distracted from important data by platitudes and flattery.

Free Government Data… [Handicapping Congress?]

Wednesday, May 8th, 2013

Free Government Data: Access Sunlight Foundation APIs on a New Data Services Site by Liz Bartolomeo.

From the post:

The Sunlight Foundation is expanding its free data services with a new website – – to access our open government APIs. We offer APIs (a.k.a. application programming interfaces) for a number of our projects and tools and support a community of developers who create their own projects using this data.

Nonprofit organizations, political campaigns and media outlets use our collection of APIs, which cover topics such as the Congressional Record, lobbying records and state legislation. More than 7,000 people have registered for an API key, resulting in over 735 million API calls to date. Greenpeace uses congressional information available through Sunlight APIs on its activist tools, and the Wikimedia Foundation used Sunlight APIs to help people connect with their lawmakers in Congress during the SOPA debate last year. Those using Sunlight APIs run across the political spectrum, from the Obama-Biden campaign to the Tea Party Patriots.

From the API page:

Capitol Words API

The Capitol Words API is an API allowing access to the word frequency count data powering the Capitol Words project.

Congress API v3 API

A live JSON API for the people and work of Congress. Information on legislators, districts, committees, bills, votes, as well as real-time notice of hearings, floor activity and upcoming bills.

Influence Explorer API

The Influence Explorer API gives programmers and journalists the ability to easily create subsets of large data for their own research and development purposes. The API currently offers campaign contributions and lobbying records with more data sets coming soon.

Open States API

Information on the legislators and activities of all 50 state legislatures, Washington, D.C. and Puerto Rico.

Political Party Time API

Provides access to the underlying, raw data that the Sunlight Foundation creates based on fundraising invitations collected in Party Time. As we enter information on new invitations, the database updates automatically.

Commercial opportunity: The Sunlight Foundation data is a start towards public handicapping of members of Congress for votes on legislation.

Povcalnet – World Bank Poverty Stats

Sunday, May 5th, 2013

DIY: Measuring Global, Regional Poverty Using PovcalNet, the Online Computational Tool behind the World Bank’s Poverty Statistics by Shaohua Chen.

I’m surprised some Republican in the U.S. House or Senate isn’t citing Povcalnet as evidence there is no poverty in the United States.

The trick of course is in how you define “poverty.”

The World Bank uses $1, $1.25 and $2.00 a day as poverty lines.

While there is widespread global hunger and disease, is income sufficient to participate in the global economy really the best measure for poverty?

If the documentaries are to be believed, there are tribes of Indians who live in the rain forests of Brazil, quite healthily, without any form of money at all.

They are not buying iPods with foreign music to replace their own but that isn’t being impoverished. Is it?

There is the related issue that someone else is classifying people as impoverished.

I wonder how they would classify themselves?

Statistics could be made more transparent through the use of topic maps.

Spring Cleaning Data: 1 of 6… [Federal Reserve]

Tuesday, April 9th, 2013

Spring Cleaning Data: 1 of 6 – Downloading the Data & Opening Excel Files

From the post:

With spring in the air, I thought it would be fun to do a series on (spring) cleaning data. The posts will follow my efforts to to download the data, import into R, cleaned it up, merge the different files, add columns of information created, and then a master file exported. During the process I will be offering at times different ways to do things, this is an attempt to show how there is no one way of doing something, but there are several. When appropriate I will demonstrate as many as I can think of, given the data.

This series of posts will be focusing on the Discount Window of the Federal Reserve. I know I seem to be picking on the Feds, but I am genuinely interested in what they have. The fact that there is data on the discount window is, to be blunt, took legislation from congress to get. The first step in this project was to find the data. The data and additional information can be downloaded here.

I don’t have much faith in government data but if you are going to debate on the “data,” such as it is, you will need to clean it up and combine it with other data.

This is a good start in that direction for data from the Federal Reserve.

If you are interested in data from other government agencies, publishing the steps needed to clean/combine their data would move everyone forward.

A topic map of cleaning directions for government data could be a useful tool.

Not that clean data = government transparency but it might make it easier to spot the shadows.

Splitting a Large CSV File into…

Monday, April 8th, 2013

Splitting a Large CSV File into Separate Smaller Files Based on Values Within a Specific Column by Tony Hirst.

From the post:

One of the problems with working with data files containing tens of thousands (or more) rows is that they can become unwieldy, if not impossible, to use with “everyday” desktop tools. When I was Revisiting MPs’ Expenses, the expenses data I downloaded from IPSA (the Independent Parliamentary Standards Authority) came in one large CSV file per year containing expense items for all the sitting MPs.

In many cases, however, we might want to look at the expenses for a specific MP. So how can we easily split the large data file containing expense items for all the MPs into separate files containing expense items for each individual MP? Here’s one way using a handy little R script in RStudio
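Tony’s post uses an R script. For comparison, here is a minimal sketch of the same idea in Python, using only the standard library. It is not his method: the function name `split_csv_by_column` is mine, and it assumes a well-formed CSV with a header row that fits comfortably in memory.

```python
import csv
import os
from collections import defaultdict

def split_csv_by_column(path, column, out_dir):
    """Write one CSV per distinct value of `column`, each with the original header."""
    groups = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[column]].append(row)

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for value, rows in groups.items():
        # Build a filesystem-safe file name from the column value.
        safe = "".join(c if c.isalnum() else "_" for c in value) or "blank"
        out_path = os.path.join(out_dir, safe + ".csv")
        with open(out_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(out_path)
    return sorted(written)
```

A call might look like `split_csv_by_column("expenses.csv", "name", "by_mp")`, where the file name and column name are placeholders for whatever the downloaded file actually uses. One caveat: two values that differ only in punctuation would map to the same file name, and the later one would overwrite the earlier.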

Just because data is “open” doesn’t mean it will be easy to use. (Leaving the question of whether it is useful to one side.)

We have been kicking around ideas for a “killer” topic map application.

What about a plug-in for a browser that recognizes file types and suggests tools for processing them?

I am unlikely to remember this post a year from now when I have a CSV file from some site.

But if a browser plugin recognized the extension, .csv, and suggested a list of tools for exploring it….

Particularly if the plug-in called upon some maintained site of tools, so the list of tools is maintained.

Or for that matter, that it points to other data explorers who have examined the same file (voluntary disclosure).
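The core of such a plug-in is little more than a lookup from file extension to a list of tools. A rough sketch, assuming a hard-coded registry (a real plug-in would pull the list from the maintained site); the registry contents and the function name are purely illustrative:

```python
import os

# Illustrative registry only. In the plug-in imagined here, this would be
# fetched from a maintained site rather than hard-coded.
TOOL_SUGGESTIONS = {
    ".csv": ["R / RStudio", "OpenRefine", "csvkit"],
    ".json": ["jq"],
    ".pdf": ["pdftotext"],
}


def suggest_tools(filename_or_url):
    """Return the suggested tools for a file name or URL, or an empty list."""
    # Drop any query string or fragment before looking at the extension.
    path = filename_or_url.split("?", 1)[0].split("#", 1)[0]
    extension = os.path.splitext(path)[1].lower()
    return TOOL_SUGGESTIONS.get(extension, [])
```

The interesting part is not the lookup but who maintains the registry, which is where a topic map could earn its keep.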

Not the full monty of topic maps but a start towards collectively enhancing our experience with data files.

USPTO – New Big Data App [Value-Add Opportunity]

Monday, April 1st, 2013

U.S. Patent and Trademark Office Launches New Big Data Application on MarkLogic®

From the post:

Real-Time, Granular, Online Access to Complex Manuals Improves Efficiency and Transparency While Reducing Costs

MarkLogic Corporation, the provider of the MarkLogic® Enterprise NoSQL database, today announced that the U.S. Patent and Trademark Office (USPTO) has launched the Reference Document Management Service (RDMS), which uses MarkLogic for real-time searching of detailed, specific, up-to-date content within patent and trademark manuals. RDMS enables real-time search of the Manual of Patent Examining Procedure (MPEP) and the Trademark Manual of Examination Procedures (TMEP). These manuals provide a vital window into the complexities of U.S. patent and trademark laws for inventors, examiners, businesses, and patent and government attorneys.

The thousands of examiners working for USPTO need to be able to quickly locate relevant instructions and procedures to assist in their examinations. The RDMS is enabling faster, easier searches for these internal users.

Having the most current materials online also means that the government can reduce reliance on printed manuals that quickly go out of date. USPTO can also now create and publish revisions to its manuals more quickly, allowing them to be far more responsive to changes in legislation.

Additionally, for the first time ever, the tool has also been made available to the public increasing the MPEP and TMEP accessibility globally, furthering the federal government’s efforts to promote transparency and accountability to U.S. citizens. Patent creators and their trusted advisors can now search and reference the same content as the USPTO examiners, in real time — instead of having to thumb through a printed reference guide.

The date on this report was March 26, 2013.

I don’t know if the USPTO is just playing games but searching their site for “Reference Document Management Service” produces zero “hits.”

Searching for “RDMS” produces four (4) “hits,” none of which were pointers to an interface.

Maybe it was too transparent?

The value-add proposition I was going to suggest was mapping the results of searching into some coherent presentation, like TaxMap.

And/or linking the results of searches into current literature in rapidly developing fields of technology.

Guess both of those opportunities will have to wait for basic searching to be available.

If you have a status update on this announced but missing project please ping me.

Open Data for Africa Launched by AfDB

Thursday, March 28th, 2013

Open Data for Africa Launched by AfDB

From the post:

The African Development Bank Group has recently launched the ‘Open Data for Africa‘ as part of the bank’s goal to improve data management and dissemination in Africa. The Open Data for Africa is a user friendly tool for extracting data, creating and sharing own customized reports, and visualising data across themes, sectors and countries in tables, charts and maps. The platform currently holds data from 20 African countries : Algeria, Cameroon, Cape Verde, Democratic Republic of Congo, Ethiopia, Malawi, Morocco, Mozambique, Namibia, Nigeria, Ghana, Rwanda, Republic of Congo, Senegal, South Africa, South Sudan, Tanzania, Tunisia, Zambia and Zimbabwe.

Not a lot of resources but a beginning.

One trip to one country isn’t enough to form an accurate opinion of a continent but I must report my impression of South Africa from several years ago.

I was at a conference with mid-level government and academic types for a week.

In a country where “child head of household” is a real demographic category, I came away deeply impressed with the optimism of everyone I met.

You can just imagine the local news in the United States and/or Europe if a quarter of the population was dying.

Vows to “…never let this happen again…,” blah, blah, would choke the channels.

Not in South Africa. They readily admit to having a variety of serious issues but are equally serious about developing ways to meet those challenges.

If you want to see optimism in the face of stunning odds, I would strongly recommend a visit.

Lobbyists 2012: Out of the Game or Under the Radar?

Sunday, March 24th, 2013

Lobbyists 2012: Out of the Game or Under the Radar?

Executive Summary:

Over the past several years, both spending on lobbying and the number of active lobbyists has declined. A number of factors may be responsible, including the lackluster economy, a gridlocked Congress and changes in lobbying rules.

CRP finds that the biggest players in the influence game — lobbying clients across nearly all sectors — increased spending over the last five years. The top 100 lobbying firms income declined only 6 percent between 2007 and 2012 but the number of registered lobbyists dropped by 25 percent.

The more precipitous drop in the number of lobbyists is likely due to changes in the rules. More than 46 percent of lobbyists who were active in 2011 but not in 2012 continue to work for the same employers, suggesting that many have simply avoided the reporting limits while still contributing to lobbying efforts.

Whatever the cause, it is important to understand whether the same activity continues apace with less disclosure and to strengthen the disclosure regimen to ensure that it is clear, enforceable — and enforced. If there is a general sense that the rules don’t matter, there could be erosion to disclosure and a sense that this is an “honor system” that isn’t being honored any longer. This is important because, if people who are in fact lobbying do not register, citizens will be unable to understand the forces at work in shaping federal policy, and therefore can’t effectively participate in policy debates and counter proposals that are not in their interest. At a minimum, the Center for Responsive Politics will continue to aggregate, publish and scrutinize the data that is being reported, in order to explain trends in disclosure — or its omission.

A caution on relying on public records/disclosure for topic maps of political influence.

You can see the full report here.

My surprise was the discovery that:

[the] “honor system” that isn’t being honored any longer.

Lobbying for private advantage at public expense is contrary to any notion of “honor.”

Why the surprise that lobbyists are dishonorable? (However faithful they may be to their employers. Once bought, they stay bought.)

I first saw this at Full Text Reports.

Open Data: The World Bank Data Blog

Wednesday, March 20th, 2013

Open Data: The World Bank Data Blog

In case you are following open data/government issues, you will want to add this blog to your RSS feed.

Not a high traffic blog but with twenty-seven contributing authors, you get a diversity of viewpoints.

Not to mention that the World Bank is a great source for general data.

I persist in thinking that transparency means identifying individuals responsible for decisions, expenditures and the beneficiaries of those decisions and expenditures.

That isn’t a popular position among those who make decisions and approve expenditures for unidentified beneficiaries.

You will either have to speculate on your own or ask someone else why that is an unpopular position.

The Biggest Failure of Open Data in Government

Monday, March 18th, 2013

Many open data initiatives forget to include the basic facts about the government itself by Philip Ashlock.

From the post:

In the past few years we’ve seen a huge shift in the way governments publish information. More and more governments are proactively releasing information as raw open data rather than simply putting out reports or responding to requests for information. This has enabled all sorts of great tools like the ones that help us find transportation or the ones that let us track the spending and performance of our government. Unfortunately, somewhere in this new wave of open data we forgot some of the most fundamental information about our government, the basic “who”, “what”, “when”, and “where”.

US map

Do you know all the different government bodies and districts that you’re a part of? Do you know who all your elected officials are? Do you know where and when to vote or when the next public meeting is? Now perhaps you’re thinking that this information is easy enough to find, so what does this have to do with open data? It’s true, it might not be too hard to learn about the highest office or who runs your city, but it usually doesn’t take long before you get lost down the rabbit hole. Government is complex, particularly in America where there can be a vast multitude of government districts and offices at the local level.

How can we have a functioning democracy when we don’t even know the local government we belong to or who our democratically elected representatives are? It’s not that Americans are simply too ignorant or apathetic to know this information, it’s that the system of government really is complex. With what often seems like chaos on the national stage it can be easy to think of local government as simple, yet that’s rarely the case. There are about 35,000 municipal governments in the US, but when you count all the other local districts there are nearly 90,000 government bodies (US Census 2012) with a total of more than 500,000 elected officials (US Census 1992). The average American might struggle to name their representatives in Washington D.C., but that’s just the tip of the iceberg. They can easily belong to 15 government districts with more than 50 elected officials representing them.

We overlook the fact that it’s genuinely difficult to find information about all our levels of government. We unconsciously assume that this information is published on some government website well enough that we don’t need to include it as part of any kind of open data program

Yes, the number of subdivisions of government and the number of elected officials are drawn from two different census reports, the first from the 2012 census and the second from the 1992 census, a gap of twenty (20) years.

The Census Bureau has the 1992 list, saying:

1992 (latest available) 1992 Census of Governments vol. I no. 2 [PDF, 2.45MB] * Report has been discontinued

Makes me curious why such a report would be discontinued?

A report that did not address the various agencies, offices, etc. that are also part of various levels of government.

Makes me think you need an “insider” and/or a specialist just to navigate the halls of government.

Philip’s post illustrates that “open data” dumps from government are distractions from more effective questions of open government.

Questions such as:

  • Which officials have authority over what questions?
  • How to effectively contact those officials?
  • What actions are under consideration now?
  • Rules and deadlines for comments on actions?
  • Hearing and decision calendars?
  • Comments and submissions by others?
  • etc.

It never really is “…the local board of education (substitute your favorite board) decided….” but “…members A, B, D, and F decided that….”

Transparency means not allowing people and their agendas to hide behind the veil of government.

“Mixed Messages” on Cybersecurity [China ranks #12 among cyber-attackers]

Thursday, March 14th, 2013

Do you remember the “mixed messages” Dilbert cartoon?

Mixed Messages

Where an “honest” answer meant “mixed messages?”

I had that feeling this morning when I read Mark Rockwell’s post, German telecom company provides real-time map of Cyber attacks.

From the post:

In hopes of blunting mounting electronic assaults, a German telecommunications carrier unveiled a free online capability that shows where Cyber attacks are happening around the world in real time.

Deutsche Telekom, parent company of T-Mobile, put up what it calls its “Security dashboard” portal on March 6. The map, said the company, is based on attacks on its purpose-built network of decoy “honeypot” systems at 90 locations worldwide

Deutsche Telekom said it launched the online portal at the CeBIT telecommunications trade show in Hanover, Germany, to increase the visibility of advancing electronic threats.

“New cyber attacks on companies and institutions are found every day. Deutsche Telekom alone records up to 450,000 attacks per day on its honeypot systems and the number is rising. We need greater transparency about the threat situation. With its security radar, Deutsche Telekom is helping to achieve this,” said Thomas Kremer, board member responsible for Data Privacy, Legal Affairs and Compliance.

Which has a handy chart of the sources of attacks over the last month:

Top 15 of Source Countries (Last month)

Source of Attack           Number of Attacks
Russian Federation                 2,402,722
Taiwan, Province of China            907,102
Germany                              780,425
Ukraine                              566,531
Hungary                              367,966
United States                        355,341
Romania                              350,948
Brazil                               337,977
Italy                                288,607
Australia                            255,777
Argentina                            185,720
China                                168,146
Poland                               162,235
Israel                               143,943
Japan                                133,908

By measured “attacks,” the geographic location of China (not the Chinese government) is #12 as an origin of cyber-attacks.

After Russia, Taiwan (Province of China), Germany, Ukraine, Hungary, United States, and others.
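If you want to check the ranking rather than take my word for it, a few lines of Python will do. The counts are copied from the table above; the variable and function names are mine, and the "share" is only a share of these fifteen countries, not of all recorded attacks.

```python
# Attack counts copied from the "Top 15 of Source Countries" table above.
attacks = {
    "Russian Federation": 2402722,
    "Taiwan, Province of China": 907102,
    "Germany": 780425,
    "Ukraine": 566531,
    "Hungary": 367966,
    "United States": 355341,
    "Romania": 350948,
    "Brazil": 337977,
    "Italy": 288607,
    "Australia": 255777,
    "Argentina": 185720,
    "China": 168146,
    "Poland": 162235,
    "Israel": 143943,
    "Japan": 133908,
}

# Sort by count, largest first, and number the countries from 1.
ranked = sorted(attacks.items(), key=lambda item: item[1], reverse=True)
rank = {country: position for position, (country, _) in enumerate(ranked, start=1)}


def share_of_top15(country):
    """Fraction of the top-15 total attributed to one country."""
    return attacks[country] / sum(attacks.values())


print(rank["China"], round(share_of_top15("China") * 100, 1))
```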

Just in case you missed several recent news cycles, the Chinese government was being singled out as a cyber-attacker for policy or marketing reasons that are not clear.

This service makes the specious nature of those accusations apparent, although the motivations behind the reports remain unclear.

Before you incorporate any government data or report into a topic map, you should verify the information with at least two independent sources.

Man Bites Dog Story (EU Interest Groups and Legislation)

Friday, March 8th, 2013

Interest groups and the making of legislation

From the post:

How are the activities of interest groups related to the making of legislation? Does mobilization of interest groups lead to more legislation in the future? Alternatively, does the adoption of new policies motivate interest groups to get active? Together with Dave Lowery, Brendan Carroll and Joost Berkhout, we tackle these questions in the case of the European Union. What we find is that there is no discernible signal in the data indicating that the mobilization of interest groups and the volume of legislative production over time are significantly related. Of course, absence of evidence is the same as the evidence of absence, so a link might still exist, as suggested by theory, common wisdom and existing studies of the US (e.g. here). But using quite a comprehensive set of model specifications we can’t find any link in our time-series sample. The abstract of the paper is below and as always you can find at my website the data, the analysis scripts, and the pre-print full text. One a side-note – I am very pleased that we managed to publish what is essentially a negative finding. As everyone seems to agree, discovering which phenomena are not related might be as important as discovering which phenomena are. Still, there are few journals that would apply this principle in their editorial policy. So cudos for the journal of Interest Groups and Advocacy.

Different perspectives on the role of organized interests in democratic politics imply different temporal sequences in the relationship between legislative activity and the influence activities of organized interests. Unfortunately, lack of data has greatly limited any kind of detailed examination of this temporal relationship. We address this problem by taking advantage of the chronologically very precise data on lobbying activity provided by the door pass system of the European Parliament and data on EU legislative activity collected from EURLEX. After reviewing the several different theoretical perspectives on the timing of lobbying and legislative activity, we present a time-series analysis of the co-evolution of legislative output and interest groups for the period 2005-2011. Our findings show that, contrary to what pluralist and neo-corporatist theories propose, interest groups neither lead nor lag bursts in legislative activity in the EU.

You can read an earlier version of the paper at: Timing is Everything? Organized Interests and the Timing of Legislative Activity. (I say earlier version because the title is the same but the abstract is slightly different.)

Just a post or so ago, Untangling algorithmic illusions from reality in big data, the point was made that biases in data collection can make a significant difference in results.

The “negative” finding in this paper is an example of that hazard.

From the paper:

The European Parliament maintains a door pass system for lobbyists. Everyone entering the Parliament’s premises as a lobbyist is expected to register on this list ….

Now there’s a serious barrier to any special interest group that wants to influence the EU Parliament!

Certainly no special interest group would be so devious and under-handed as to meet with members of the EU Parliament away from the Parliament’s premises.

Say, in exotic vacation spots/spas? Or at meetings of financial institutions? Or just in the normal course of their day to day affairs?

The U.S. registers lobbyists, but like the EU “hall pass” system, it is the public side of influence.

People with actual influence don’t have to rely on anything as crude as lobbyists to ensure their goals are met.

The data you collect may exclude the most important data.

Unless it is your goal for it to be excluded, then carry on.

State Sequester Numbers [Is This Transparency?]

Wednesday, March 6th, 2013

A great visualization of the impact of sequestration state by state.

And, a post on the process followed to produce the visualization.

The only caveat being that one person read the numbers from PDF files supplied by the White House and another person typed them into a spreadsheet.

Doable with a small data set such as this one, but why was it necessary at all?

Once you have the data in a machine readable form, putting faces in the local community to the abstract categories should be the next step.

Topic maps anyone?

Transparency and the Digital Oil Drop

Tuesday, March 5th, 2013

I left off yesterday pointing out three critical failures in the Digital Accountability and Transparency Act (DATA Act).

Those failures were:

  • Undefined goals with unrealistic deadlines.
  • Lack of incentives for performance.
  • Lack of funding for assigned duties.

See: Digital Accountability and Transparency Act (DATA Act) [DOA]

Make no mistake, I think transparency, particularly in government spending, is very important.

Important enough that proposals for transparency should take it seriously.

In broad strokes, here is my alternative to the Digital Accountability and Transparency Act (DATA Act) proposal:

  • Ask the GAO, the federal agency with the most experience auditing other federal agencies, to prepare an estimate for:
    • Cost/Time for preparing a program internal to the GAO to produce mappings of agency financial records to a common report form.
    • Cost/Time to train GAO personnel on the mapping protocol.
    • Cost/Time for additional GAO staff for the creation of the mapping protocol and permanent GAO staff as liaisons with particular agencies.
    • Recommendations for incentives to promote assistance from agencies.
  • Upon approval and funding of the GAO proposal, which should include at least two federal agencies as test cases, ensure that:
    • Test case agencies are granted additional funding for training and staff to cooperate with the GAO mapping team.
    • Test case agencies are granted additional funding for training and staff to produce reports as specified by the GAO.
    • Staff in test case agencies are granted incentives to assist in the initial mapping effort and maintenance of the same. (Positive incentives.)
  • The program of mapping of accounts expands no more often than every two to three years and only if prior agencies have achieved and remain in conformance.
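To make "mappings of agency financial records to a common report form" less abstract, here is a small Python sketch. Everything in it is hypothetical: the agency names, the field names and the three-field common form are invented for illustration and say nothing about how the GAO would actually design a protocol.

```python
# Hypothetical per-agency mappings: agency field name -> common report field.
AGENCY_MAPPINGS = {
    "agency_a": {"AwardNo": "award_id", "Payee": "recipient", "Amt": "amount"},
    "agency_b": {
        "grant_number": "award_id",
        "vendor_name": "recipient",
        "obligated_usd": "amount",
    },
}

# Hypothetical common report form.
COMMON_FIELDS = ("award_id", "recipient", "amount")


def to_common_form(agency, record):
    """Map one agency record onto the common form.

    Returns (common_record, missing_common_fields, unmapped_agency_fields)
    so that gaps in the mapping are reported rather than silently dropped.
    """
    mapping = AGENCY_MAPPINGS[agency]
    common = {mapping[name]: value for name, value in record.items() if name in mapping}
    missing = [name for name in COMMON_FIELDS if name not in common]
    unmapped = [name for name in record if name not in mapping]
    return common, missing, unmapped
```

The lists of missing and unmapped fields are the point: they are the work queue for the agency liaisons, and the reason the mapping effort needs staff, training and funding rather than a deadline.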

Some critical differences between my sketch of a proposal and the Digital Accountability and Transparency Act (DATA Act):

  1. Additional responsibilities and requirements will be funded for agencies, including additional training and personnel.
  2. Agency staff will have incentives to learn the new skills and procedures necessary for exporting their data as required by the GAO.
  3. Instead of trying to swallow the Federal whale, the project proceeds incrementally and with demonstrable results.

Topic maps can play an important role in such a project but we should be mindful that projects rarely succeed or fail because of technology.

Projects fail because, like the DATA Act, they ignore basic human needs, experience in similar situations (9/11), and substitute abuse for legitimate incentives.

Digital Accountability and Transparency Act (DATA Act) [DOA]

Monday, March 4th, 2013

I started this series of posts in: Digital Accountability and Transparency Act (DATA Act) [The Details], where I concluded the DATA Act had the following characteristics:

  • Secretary of the Treasury has one (1) year to design a common data format for unknown financial data in Federal agencies.
  • Federal agencies have one (1) year to comply with the common data format from the Secretary of the Treasury.
  • No penalties or bonuses for the Secretary of the Treasury.
  • No penalties or bonuses for Federal agencies failing to comply.
  • No funding for the Secretary of the Treasury to carry out the assigned duties.
  • No funding for Federal agencies to carry out the assigned duties.

As written, the Digital Accountability and Transparency Act (DATA Act) will be DOA (Dead On Arrival) in the current or any future session of Congress.

There are three (3) main reasons why that is the case.

A Common Data Format

Let me ask a dumb question: Do you remember 9/11?

Of course you do. And the United States has been in a state of war on terrorism ever since.

I point that out because intelligence sharing (read common data format) was identified as a reason why the 9/11 attacks weren’t stopped and has been a high priority to solve since then.

Think about that: Reason why the attacks weren’t stopped and a high priority to correct.

This next September 11th will be the twelfth anniversary of those attacks.

Progress on intelligence sharing: Progress Made and Challenges Remaining in Sharing Terrorism-Related Information which I gloss in Read’em and Weep, along with numerous other GAO reports on intelligence sharing.

The good news is that we are less than five (5) years away from some unknown level of intelligence sharing.

The bad news is that puts us sixteen (16) years after 9/11 with some unknown level of intelligence sharing.

And that is for a subset of the entire Federal government.

A smaller set than will be addressed by the Secretary of the Treasury.

Common data format in a year? Really?

To say nothing of the likelihood of agencies changing the multitude of systems they have in place in a year.

No penalties or bonuses

You can think of this as the proverbial carrot and stick if you like.

What incentive do the Secretary of the Treasury and Federal agencies have to engage in this fool’s errand of pursuing a common data format?

In case you have forgotten, both the Secretary of the Treasury and Federal agencies have obligations under their existing missions.

Missions which they are designed by legislation and habit to discharge before they turn to additional reporting duties.

And what happens if they discharge their primary mission but don’t do the reporting?

Oh, they get reported to Congress. And ranked in public.

As Ben Stein would say, “Wow.”

No Funding

To add insult to injury, there is no additional funding for either the Secretary of the Treasury or Federal agencies to engage in any of the activities specified by the Digital Accountability and Transparency Act (DATA Act).

As I noted above, the Secretary of the Treasury and Federal agencies already have full plates with their current missions.

Now they are to be asked to undertake unfamiliar tasks, creation of a chimerical “common data format” and submitting reports based upon it.

Without any additional staff, training, or other resources.

Directives without resources to fulfill them are directives that are going to fail. (full stop)

Tentative Conclusion

If you are asking yourself, “Why would anyone advocate the Digital Accountability and Transparency Act (DATA Act)?,” five points for your house!

I don’t know of anyone who understands:

  1. the complexity of Federal data,
  2. the need for incentives,
  3. the need for resources to perform required tasks,

who thinks the Digital Accountability and Transparency Act (DATA Act) is viable.

Why advocate non-viable legislation?

Its non-viability makes it an attractive fund raising mechanism.

Advocates can email, fund raise, telethon, rant, etc., to their heart’s content.

Advocating non-viable transparency lines an organization’s pocket at no risk of losing its rationale for existence.

The third post in this series, suggesting a viable way forward, will appear tomorrow under: Transparency and the Digital Oil Drop.

Digital Accountability and Transparency Act (DATA Act) [The Details]

Monday, March 4th, 2013

The Data Transparency Coalition, the Sunlight Foundation and others are calling for reintroduction of the Digital Accountability and Transparency Act (DATA Act) in order to make U.S. government spending more transparent.

Transparency in government spending is essential for an informed electorate. An electorate that can call attention to spending that is inconsistent with policies voted for by the electorate. Accountability as it were.

But saying “transparency” is easy. Achieving transparency, not so easy.

Let’s look at some of the details in the DATA Act.


    ‘(A) IN GENERAL- The Secretary of the Treasury, in consultation with the Director of the Office of Management and Budget, the General Services Administration, and the heads of Federal agencies, shall establish Government-wide financial data standards for Federal funds, which may–

      ‘(i) include common data elements, such as codes, unique award identifiers, and fields, for financial and payment information required to be reported by Federal agencies;

      ‘(ii) to the extent reasonable and practicable, ensure interoperability and incorporate–

        ‘(I) common data elements developed and maintained by an international voluntary consensus standards body, as defined by the Office of Management and Budget, such as the International Organization for Standardization;

        ‘(II) common data elements developed and maintained by Federal agencies with authority over contracting and financial assistance, such as the Federal Acquisition Regulatory Council; and

        ‘(III) common data elements developed and maintained by accounting standards organizations; and

      ‘(iii) include data reporting standards that, to the extent reasonable and practicable–

        ‘(I) incorporate a widely accepted, nonproprietary, searchable, platform-independent computer-readable format;

        ‘(II) be consistent with and implement applicable accounting principles;

        ‘(III) be capable of being continually upgraded as necessary; and

        ‘(IV) incorporate nonproprietary standards in effect on the date of enactment of the Digital Accountability and Transparency Act of 2012.


      ‘(i) GUIDANCE- The Secretary of the Treasury, in consultation with the Director of the Office of Management and Budget, shall issue guidance on the data standards established under subparagraph (A) to Federal agencies not later than 1 year after the date of enactment of the Digital Accountability and Transparency Act of 2012.

      ‘(ii) AGENCIES- Not later than 1 year after the date on which the guidance under clause (i) is issued, each Federal agency shall collect, report, and maintain data in accordance with the data standards established under subparagraph (A).

OK, I have a confession to make: I was a lawyer for ten years and reading this sort of thing is second nature to me. Haven’t practiced law in decades but I still read legal stuff for entertainment. ;-)

First, read section A and write down the types of data you would have to collect for each of those items.

Don’t list the agencies/organizations you would have to contact, you probably don’t have enough paper in your office for that task.

Second, read section B and notice that the Secretary of the Treasury has one (1) year to issue guidance for all the data you listed under Section A.

That means gathering, analyzing, testing and designing a standard for all that data, most of which is unknown. Even to the GAO.

And, if they meet that one (1) year deadline, the various agencies have only one (1) year to comply with the guidance from the Secretary of the Treasury.

Do I need to comment on the likelihood of success?

As far as the Secretary of the Treasury, what happens if they don’t meet the one year deadline? Do you see any penalties?

Assuming some guidance emerges, what happens to any Federal agency that does not comply? Any penalties for failure? Any incentives to comply?

My reading is:

  • Secretary of the Treasury has one (1) year to design a common data format for unknown financial data in Federal agencies.
  • Federal agencies have one (1) year to comply with the common data format from the Secretary of the Treasury.
  • No penalties or bonuses for the Secretary of the Treasury.
  • No penalties or bonuses for Federal agencies failing to comply.
  • No funding for the Secretary of the Treasury to carry out the assigned duties.
  • No funding for Federal agencies to carry out the assigned duties.

Do you disagree with that reading of the Digital Accountability and Transparency Act (DATA Act)?

My analysis of that starting point appears in Digital Accountability and Transparency Act (DATA Act) [DOA].

$1.55 Trillion […] in 2011

Monday, March 4th, 2013


Updating Senator Dirksen for inflation: “A trillion here, a trillion there, and pretty soon you’re talking real money.” (Attributed to Senator Dirksen but not documented.)

From the press release:

The Data […] Foundation’s Clearspending […] analyzes the […] other data […] 26 percent […] The DATA […] requiring full […] grants, contracts, or […] the grantees […] the law […] and as […] shows, many […]

Note the limitation of the report to grant information.

That is, it does not include non-grant spending, such as defense contracts and similar 21st century follies.

I have questions about the feasibility of universal, even within the U.S. government, data standards for spending. But I will address those in a separate post.

Data models for version management…

Sunday, March 3rd, 2013

Data models for version management of legislative documents by María Hallo Carrasco, M. Mercedes Martínez-González, and Pablo de la Fuente Redondo.


This paper surveys the main data models used in projects including the management of changes in digital normative legislation. Models have been classified based on a set of criteria, which are also proposed in the paper. Some projects have been chosen as representative for each kind of model. The advantages and problems of each type are analysed, and future trends are identified.

I first saw this at Legal Informatics, which had already assembled the following resources:

The legislative metadata models discussed in the paper include:

Useful as models of change tracking should you want to express that in a topic map.

To say nothing of overcoming the semantic impedance between these models.
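As a feel for what even the simplest change-tracking model involves, here is a point-in-time sketch in Python. It is my own illustration, not one of the models surveyed in the paper; the class names, fields and the placeholder citation in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Version:
    valid_from: date                  # date this wording takes effect
    text: str                         # wording of the provision from that date
    amended_by: Optional[str] = None  # identifier of the amending act, if any


@dataclass
class Provision:
    citation: str
    versions: List[Version] = field(default_factory=list)

    def amend(self, valid_from, text, amended_by=None):
        """Record a new wording and keep versions in date order."""
        self.versions.append(Version(valid_from, text, amended_by))
        self.versions.sort(key=lambda version: version.valid_from)

    def as_of(self, when):
        """Return the version in force on `when`, or None if none applies yet."""
        current = None
        for version in self.versions:
            if version.valid_from <= when:
                current = version
        return current
```

With a placeholder provision, `Provision("Example Code § 1")`, two calls to `amend` and one to `as_of` answer the question "what did this section say on a given date?", which is exactly the question a reader of an amending bill needs answered. A topic map would add the step of recognizing that differently modeled versions are about the same provision.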