## Archive for the ‘Government Data’ Category

### State Sequester Numbers [Is This Transparency?]

Wednesday, March 6th, 2013

A great visualization of the impact of sequestration state by state.

And, a post on the process followed to produce the visualization.

The only caveat being that one person read the numbers from PDF files supplied by the White House and another person typed them into a spreadsheet.

Doable with a small data set such as this one, but why was it necessary at all?

Once you have the data in machine readable form, putting faces from the local community on the abstract categories should be the next step.
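Once the numbers are in machine readable form, per-state analysis is a few lines of code. A minimal sketch, assuming a hypothetical CSV of per-state, per-program cuts (the White House figures were only released as PDFs, which is exactly the problem):

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV of sequester cuts; column names are illustrative,
# not taken from any White House release.
raw = """state,program,cut_usd
Georgia,Education,28600000
Georgia,Defense,10900000
Ohio,Education,25100000
"""

# Total the cuts per state.
totals = defaultdict(int)
for row in csv.DictReader(io.StringIO(raw)):
    totals[row["state"]] += int(row["cut_usd"])

print(totals["Georgia"])  # 39500000
```

With data in this form, joining in a column for the affected communities is trivial; with PDFs, every analysis starts with retyping.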

Topic maps anyone?

### Transparency and the Digital Oil Drop

Tuesday, March 5th, 2013

I left off yesterday pointing out three critical failures in the Digital Accountability and Transparency Act (DATA Act).

Those failures were:

• Undefined goals with unrealistic deadlines.
• Lack of incentives for performance.
• Lack of funding for assigned duties.

Digital Accountability and Transparency Act (DATA Act) [DOA]

Make no mistake, I think transparency, particularly in government spending, is very important.

Important enough that proposals for transparency should take it seriously.

In broad strokes, here is my alternative to the Digital Accountability and Transparency Act (DATA Act) proposal:

• Ask the GAO, the federal agency with the most experience auditing other federal agencies, to prepare an estimate for:
  • Cost/Time for preparing a program internal to the GAO to produce mappings of agency financial records to a common report form.
  • Cost/Time to train GAO personnel on the mapping protocol.
  • Cost/Time for additional GAO staff for the creation of the mapping protocol and permanent GAO staff as liaisons with particular agencies.
  • Recommendations for incentives to promote assistance from agencies.
• Upon approval and funding of the GAO proposal, which should include at least two federal agencies as test cases:
  • Test case agencies are granted additional funding for training and staff to cooperate with the GAO mapping team.
  • Test case agencies are granted additional funding for training and staff to produce reports as specified by the GAO.
  • Staff in test case agencies are granted incentives to assist in the initial mapping effort and maintenance of the same. (Positive incentives.)
• The program of mapping of accounts expands no more often than every two to three years and only if prior agencies have achieved and remain in conformance.
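The mapping protocol at the heart of this sketch is essentially a crosswalk from agency-specific field names to common report fields. A minimal illustration, with entirely hypothetical field names (a real GAO mapping protocol would be far richer):

```python
# Hypothetical crosswalk from one agency's ledger fields to a common
# report form; none of these names come from any actual agency system.
AGENCY_TO_COMMON = {
    "oblig_amt": "obligation_amount",
    "vend_nm": "payee_name",
    "appn_cd": "appropriation_code",
}

def to_common_form(record: dict, crosswalk: dict) -> dict:
    """Re-key an agency record onto the common report fields,
    dropping fields the crosswalk does not cover."""
    return {crosswalk[k]: v for k, v in record.items() if k in crosswalk}

agency_record = {"oblig_amt": 125000, "vend_nm": "Acme Corp", "appn_cd": "97X4930"}
print(to_common_form(agency_record, AGENCY_TO_COMMON))
```

The hard part is not the code; it is discovering, documenting, and maintaining the crosswalk for each agency, which is why the sketch above budgets staff and liaisons for exactly that.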

Some critical differences between my sketch of a proposal and the Digital Accountability and Transparency Act (DATA Act):

1. Additional responsibilities and requirements will be funded for agencies, including additional training and personnel.
2. Agency staff will have incentives to learn the new skills and procedures necessary for exporting their data as required by the GAO.
3. Instead of trying to swallow the Federal whale, the project proceeds incrementally and with demonstrable results.

Topic maps can play an important role in such a project but we should be mindful that projects rarely succeed or fail because of technology.

Projects fail because, like the DATA Act, they ignore basic human needs and experience in similar situations (9/11), and substitute abuse for legitimate incentives.

### Digital Accountability and Transparency Act (DATA Act) [DOA]

Monday, March 4th, 2013

I started this series of posts in: Digital Accountability and Transparency Act (DATA Act) [The Details], where I concluded the DATA Act had the following characteristics:

• Secretary of the Treasury has one (1) year to design a common data format for unknown financial data in Federal agencies.
• Federal agencies have one (1) year to comply with the common data format from the Secretary of the Treasury.
• No penalties or bonuses for the Secretary of the Treasury.
• No penalties or bonuses for Federal agencies failing to comply.
• No funding for the Secretary of the Treasury to carry out the assigned duties.
• No funding for Federal agencies to carry out the assigned duties.

As written, the Digital Accountability and Transparency Act (DATA Act) will be DOA (Dead On Arrival) in the current or any future session of Congress.

There are three (3) main reasons why that is the case.

A Common Data Format

Let me ask a dumb question: Do you remember 9/11?

Of course you do. And the United States has been in a state of war on terrorism ever since.

I point that out because intelligence sharing (read common data format) was identified as a reason why the 9/11 attacks weren’t stopped and has been a high priority to solve since then.

Think about that: Reason why the attacks weren’t stopped and a high priority to correct.

This next September 11th will be the twelfth anniversary of those attacks.

Progress on intelligence sharing: Progress Made and Challenges Remaining in Sharing Terrorism-Related Information which I gloss in Read’em and Weep, along with numerous other GAO reports on intelligence sharing.

The good news is that we are less than five (5) years away from some unknown level of intelligence sharing.

The bad news is that puts us sixteen (16) years after 9/11 with some unknown level of intelligence sharing.

And that is for a subset of the entire Federal government.

A smaller set than will be addressed by the Secretary of the Treasury.

Common data format in a year? Really?

To say nothing of the likelihood of agencies changing the multitude of systems they have in place in a year.

No penalties or bonuses

You can think of this as the proverbial carrot and stick if you like.

What incentive does either the Secretary of the Treasury and/or Federal agencies have to engage in this fool’s errand pursuing a common data format?

In case you have forgotten, both the Secretary of the Treasury and Federal agencies have obligations under their existing missions.

Missions which they are designed by legislation and habit to discharge before they turn to additional reporting duties.

And what happens if they discharge their primary mission but don’t do the reporting?

Oh, they get reported to Congress. And ranked in public.

As Ben Stein would say, “Wow.”

No Funding

To add insult to injury, there is no additional funding for either the Secretary of the Treasury or Federal agencies to engage in any of the activities specified by the Digital Accountability and Transparency Act (DATA Act).

As I noted above, the Secretary of the Treasury and Federal agencies already have full plates with their current missions.

Now they are asked to undertake unfamiliar tasks: creation of a chimerical “common data format” and submitting reports based upon it.

Without any additional staff, training, or other resources.

Directives without resources to fulfill them are directives that are going to fail. (full stop)

Tentative Conclusion

If you are asking yourself, “Why would anyone advocate the Digital Accountability and Transparency Act (DATA Act)?”, five points for your house!

I don’t know of anyone who understands:

1. the complexity of Federal data,
2. the need for incentives,
3. the need for resources to perform required tasks,

who thinks the Digital Accountability and Transparency Act (DATA Act) is viable.

Its non-viability makes it an attractive fund raising mechanism.

Advocates can email, fund raise, telethon, rant, etc., to their heart’s content.

Advocating non-viable transparency lines an organization’s pocket at no risk of losing its rationale for existence.

The third post in this series, suggesting a viable way forward, will appear tomorrow under: Transparency and the Digital Oil Drop.

### Digital Accountability and Transparency Act (DATA Act) [The Details]

Monday, March 4th, 2013

The Data Transparency Coalition, the Sunlight Foundation and others are calling for reintroduction of the Digital Accountability and Transparency Act (DATA Act) in order to make U.S. government spending more transparent.

Transparency in government spending is essential for an informed electorate. An electorate that can call attention to spending that is inconsistent with policies voted for by the electorate. Accountability as it were.

But saying “transparency” is easy. Achieving transparency, not so easy.

Let’s look at some of the details in the DATA Act.

(2) DATA STANDARDS-

‘(A) IN GENERAL- The Secretary of the Treasury, in consultation with the Director of the Office of Management and Budget, the General Services Administration, and the heads of Federal agencies, shall establish Government-wide financial data standards for Federal funds, which may–

‘(i) include common data elements, such as codes, unique award identifiers, and fields, for financial and payment information required to be reported by Federal agencies;

‘(ii) to the extent reasonable and practicable, ensure interoperability and incorporate–

‘(I) common data elements developed and maintained by an international voluntary consensus standards body, as defined by the Office of Management and Budget, such as the International Organization for Standardization;

‘(II) common data elements developed and maintained by Federal agencies with authority over contracting and financial assistance, such as the Federal Acquisition Regulatory Council; and

‘(III) common data elements developed and maintained by accounting standards organizations; and

‘(iii) include data reporting standards that, to the extent reasonable and practicable–

‘(I) incorporate a widely accepted, nonproprietary, searchable, platform-independent computer-readable format;

‘(II) be consistent with and implement applicable accounting principles;

‘(III) be capable of being continually upgraded as necessary; and

‘(IV) incorporate nonproprietary standards in effect on the date of enactment of the Digital Accountability and Transparency Act of 2012.

‘(i) GUIDANCE- The Secretary of the Treasury, in consultation with the Director of the Office of Management and Budget, shall issue guidance on the data standards established under subparagraph (A) to Federal agencies not later than 1 year after the date of enactment of the Digital Accountability and Transparency Act of 2012.

‘(ii) AGENCIES- Not later than 1 year after the date on which the guidance under clause (i) is issued, each Federal agency shall collect, report, and maintain data in accordance with the data standards established under subparagraph (A).
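Clause (iii) asks for a “widely accepted, nonproprietary, searchable, platform-independent computer-readable format.” As a concreteness check, here is a hedged sketch of what a single award record could look like in JSON; every field name is illustrative, not drawn from the bill:

```python
import json

# Illustrative award record using the kinds of elements the bill gestures
# at (unique award identifiers, common codes); these names are assumptions.
award = {
    "unique_award_id": "ASST-2013-000123",  # hypothetical identifier scheme
    "agency_code": "097",
    "recipient": "Example University",
    "obligation_amount": 1500000,
    "object_class_code": "41.0",
}

# JSON satisfies "widely accepted, nonproprietary, searchable,
# platform-independent, computer-readable" about as simply as anything can.
serialized = json.dumps(award, sort_keys=True)
restored = json.loads(serialized)
```

The format is the easy part; agreeing on what the fields mean across every Federal agency is the part the bill gives the Secretary one year to solve.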

OK, I have a confession to make: I was a lawyer for ten years and reading this sort of thing is second nature to me. Haven’t practiced law in decades but I still read legal stuff for entertainment.

First, read section A and write down the types of data you would have to collect for each of those items.

Don’t list the agencies/organizations you would have to contact, you probably don’t have enough paper in your office for that task.

Second, read section B and notice that the Secretary of the Treasury has one (1) year to issue guidance for all the data you listed under Section A.

That means gathering, analyzing, testing and designing a standard for all that data, most of which is unknown. Even to the GAO.

And, if they meet that one (1) year deadline, the various agencies have only one (1) year to comply with the guidance from the Secretary of the Treasury.

Do I need to comment on the likelihood of success?

As far as the Secretary of the Treasury, what happens if they don’t meet the one year deadline? Do you see any penalties?

Assuming some guidance emerges, what happens to any Federal agency that does not comply? Any penalties for failure? Any incentives to comply?

• Secretary of the Treasury has one (1) year to design a common data format for unknown financial data in Federal agencies.
• Federal agencies have one (1) year to comply with the common data format from the Secretary of the Treasury.
• No penalties or bonuses for the Secretary of the Treasury.
• No penalties or bonuses for Federal agencies failing to comply.
• No funding for the Secretary of the Treasury to carry out the assigned duties.
• No funding for Federal agencies to carry out the assigned duties.

Do you disagree with that reading of the Digital Accountability and Transparency Act (DATA Act)?

My analysis of that starting point appears in Digital Accountability and Transparency Act (DATA Act) [DOA].

### $1.55 Trillion in Federal Spending Misreported in 2011

Monday, March 4th, 2013

With $1.55 Trillion in Federal Spending Misreported in 2011, Data Transparency Coalition Renews Call for Congressional Action

Updating Senator Dirksen for inflation: “A trillion here, a trillion there, and pretty soon you’re talking real money.” (Attributed to Senator Dirksen but not documented.)

From the press release:

The Data Transparency Coalition, the only group unifying the technology industry in support of federal data reform, applauded the release today of the Sunlight Foundation’s Clearspending report and called for the U.S. Congress to reintroduce and pass the Digital Accountability and Transparency Act (DATA Act) in order to rectify the misreporting of trillions of dollars in federal spending each year.

No Joy in Vindication

Seventh post. Confirmation by the GAO that the problem I describe in the $560+ Billion Shell Game exists in the DoD. (January 21, 2013)

Update: Refried Numbers from the OMB

In its current attempt at sequester obfuscation, the OMB combined the approaches used in Appendices A and B of its earlier report and reduced the percentage of sequestration. See: OMB REPORT TO THE CONGRESS ON THE JOINT COMMITTEE SEQUESTRATION FOR FISCAL YEAR 2013.

### From President Obama, The Opaque

Thursday, February 28th, 2013

Leaked BLM Draft May Hinder Public Access to Chemical Information

From the post:

On Feb. 8, EnergyWire released a leaked draft proposal from the U.S. Department of the Interior’s Bureau of Land Management on natural gas drilling and extraction on federal public lands. If finalized, the proposal could greatly reduce the public’s ability to protect our resources and communities. The new draft indicates a disappointing capitulation to industry recommendations.

The draft rule affects oil and natural gas drilling operations on the 700 million acres of public land administered by BLM, plus 56 million acres of Indian lands. This includes national forests, which are the sources of drinking water for tens of millions of Americans, national wildlife refuges, and national parks, which are widely used for recreation.

The Department of the Interior estimates that 90 percent of the 3,400 wells drilled each year on public and Indian lands use natural gas fracking, a process that pumps large amounts of water, sand, and toxic chemicals into gas wells at very high pressure to cause fissures in shale rock that contains methane gas. Fracking fluid is known to contain benzene (which causes cancer), toluene, and other harmful chemicals. Studies link fracking-related activities to contaminated groundwater, air pollution, and health problems in animals and humans.

If the leaked draft is finalized, the changes in chemical disclosure requirements would represent a major concession to the oil and gas industry. The rule would allow drilling companies to report the chemicals used in fracking to an industry-funded website, called FracFocus.org. Though the move by the federal government to require online disclosure is encouraging, the choice of FracFocus as the vehicle is problematic for many reasons.

First, the site is not subject to federal laws or oversight. The site is managed by the Ground Water Protection Council (GWPC) and the Interstate Oil and Gas Compact Commission (IOGCC), nonprofit intergovernmental organizations comprised of state agencies that promote oil and gas development. However, the site is paid for by the American Petroleum Institute and America’s Natural Gas Alliance, industry associations that represent the interests of member companies. BLM would have little to no authority to ensure the quality and accuracy of the data reported directly to such a third-party website. Additionally, the data will not be accessible through the Freedom of Information Act since BLM is not collecting the information. The IOGCC has already declared that it is not subject to federal or state open records laws, despite its role in collecting government-mandated data.

Second, FracFocus.org makes it difficult for the public to use the data on wells and chemicals. The leaked BLM proposal fails to include any provisions to ensure minimum functionality on searching, sorting, downloading, or other mechanisms to make complex data more usable. Currently, the site only allows users to download PDF files of reports on fracked wells, which makes it very difficult to analyze data in a region or track chemical use.

Despite some plans to improve searching on FracFocus.org, the oil and gas industry opposes making chemical data easier to download or evaluate for fear that the public “might misinterpret it or use it for political purposes.”

Don’t you feel safer? Knowing the oil and gas industry is working so hard to protect you from misinterpreting data?

Why the government is helping the oil and gas industry protect us from data I cannot say.

I mention this as an example of testing for “transparency.” Anything the government freely makes available with spreadsheet capabilities isn’t transparency. It’s distraction. Any data that the government tries to hide, that data has potential value.

The Center for Effective Government points out these are draft rules and, when published, you need to comment. Not a bad plan but not very reassuring given the current record of President Obama, the Opaque.

Alternatives? Suggestions for how data mining could expose those who own floors of the BLM, who drill the wells, etc.?

### EU Commission – Open Data Portal Open

Tuesday, February 26th, 2013

EU Commission – Open Data Portal Open

From the post:

The European Union Commission has unveiled a new Open Data Portal, with over 5,580 data sets – the majority of which comes from Eurostat (the statistical office of the European Union). The portal is the result of the Commission’s ‘Open Data Strategy for Europe’, and will publish data from the European Commission and other bodies of the European Union; it already holds data from the European Environment Agency.

The portal has a SPARQL endpoint to provide linked data, and will also feature applications that use this data. The published data can be downloaded by everyone interested to facilitate reuse, linking and the creation of innovative services. This shows the commitment of the Commission to the principles of openness and transparency.

For more information https://ec.europa.eu/digital-agenda/en/blog/eu-open-data-portal-here.
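A SPARQL endpoint means the portal can be queried programmatically rather than browsed. A minimal sketch of building such a query request; the endpoint URL and the DCAT vocabulary are assumptions on my part, so check the portal’s own documentation before relying on either:

```python
from urllib.parse import urlencode

# Hypothetical SPARQL endpoint address; consult the EU Open Data Portal
# documentation for the real one.
ENDPOINT = "https://data.europa.eu/sparql"

# List dataset titles. DCAT is a plausible vocabulary for an open-data
# catalog, but the portal's actual schema may differ.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
} LIMIT 10
"""

def sparql_request_url(endpoint: str, query: str) -> str:
    """Build the GET URL a SPARQL client would fetch, asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)

url = sparql_request_url(ENDPOINT, QUERY)
```

Fetching that URL with any HTTP client returns machine readable results, which is the difference between a data portal and a pile of PDFs.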
If the Commission is committed to “principles of openness and transparency,” when can we expect to see:

1. Rosters of the institutions and individual participants in EU funded research from 1980 to present?
2. Economic analysis of the results of EU funded projects, on a project by project basis, from 1980 to present?

Noting that from 1984 – 2013, the total research funding exceeds EUR 118 billion.

To be fair, CORDIS: Community Research and Development Information Service has report summaries and project reports for FP5, FP6 and FP7. And CORDIS Search Service provides coverage back to the early 1980s. About Projects on Cordis has a wealth of information to guide searching into EU funded research.

While a valuable resource, CORDIS requires the extraction of detailed information on a project by project basis, making large scale analysis difficult if not prohibitively expensive.

PS: Of the 5855 datasets, some 5680 were previously published by EuroStat. European Environment Agency, 106 datasets. Perhaps a net increase of 69 datasets over those previously available.

### U.S. Statutes at Large 1951-2009

Saturday, February 23rd, 2013

From the post:

The GPO’s recent electronic publication of all legislation enacted by Congress from 1951-2009 is noteworthy for several reasons. It makes available nearly 40 years of lawmaking that wasn’t previously available online from any official source, narrowing part of a much larger information gap. It meets one of three long-standing directives from Congress’s Joint Committee on Printing regarding public access to important legislative information. And it has published the information in a way that provides a platform for third-party providers to cleverly make use of the information. While more work is still needed to make important legislative information available to the public, this online release is a useful step in the right direction.

Narrowing the Gap

In mid-January 2013, GPO published approximately 32,000 individual documents, along with descriptive metadata, including all bills enacted into law, joint concurrent resolutions that passed both chambers of Congress, and presidential proclamations from 1951-2009. The documents have traditionally been published in print in volumes known as the “Statutes at Large,” which commonly contain all the materials issued during a calendar year.

The Statutes at Large are literally an official source for federal laws and concurrent resolutions passed by Congress. The Statutes at Large are compilations of “slip laws,” bills enacted by both chambers of Congress and signed by the President. By contrast, while many people look to the US Code to find the law, many sections of the Code in actuality are not the “official” law. A special office within the House of Representatives reorganizes the contents of the slip laws thematically into the 50 titles that make up the US Code, but unless that reorganized document (the US Code) is itself passed by Congress and signed into law by the President, it remains an incredibly helpful but ultimately unofficial source for US law. (Only half of the titles of the US Code have been enacted by Congress, and thus have become law themselves.) Moreover, if you want to see the intact text of the legislation as originally passed by Congress — before it’s broken up and scattered throughout the US Code — the place to look is the Statutes at Large.

Policy wonks and trivia experts will have a field day but the value of the Statutes at Large isn’t apparent to me. I assume there are cases where errors can be found between the U.S.C. (United States Code) and the Statutes at Large. The significance of those errors is unknown.

Like my comments on the SEC Midas program, knowing a law was passed isn’t the same as knowing who benefits from it. Or who paid for its passage.

Knowing which laws were passed is useful. Knowing who benefited or who paid, priceless.

### Failure By Design

Saturday, February 23rd, 2013

Did you know the Securities and Exchange Commission (SEC) is now collecting 400 gigabytes of market data daily?

Midas [Market Information Data Analytics System], which is costing the SEC $2.5 million a year, captures data such as time, price, trade type and order number on every order posted on national stock exchanges, every cancellation and modification, and every trade execution, including some off-exchange trades. Combined it adds up to billions of daily records.

So, what’s my complaint?

Midas won’t be able to fill in all of the current holes in SEC’s vision. For example, the SEC won’t be able to see the identities of entities involved in trades and Midas doesn’t look at, for example, futures trades and trades executed outside the system in what are known as “dark pools.” (emphasis added)

What?

The one piece of information that could reveal patterns of insider trading, churning, and a whole host of other securities crimes, is simply not collected.

I wonder who would benefit from the SEC not being able to track insider trading, churning, etc.?

People engaged in insider trading, churning, etc. would be my guess.

You?

Maybe someone should ask SEC chairman Elisse Walter or Gregg Berman (who oversees MIDAS) if tracking entities would help with SEC enforcement?

If they agree, then ask why not now?

For that matter, why not open up the data + entities so others can help the SEC with analysis of the data?

Obvious questions J. Nicholas Hoover should have asked in SEC Makes Big Data Push To Analyze Markets.
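If entity identifiers were collected, even a crude screen for churning would take only a few lines of code. A minimal sketch over hypothetical trade records (real Midas-style records would carry timestamps, prices, and order numbers as well):

```python
from collections import Counter

# Hypothetical trade records: (entity_id, symbol, side).
trades = [
    ("E1", "ACME", "buy"), ("E1", "ACME", "sell"),
    ("E1", "ACME", "buy"), ("E1", "ACME", "sell"),
    ("E2", "ACME", "buy"),
]

def round_trips(trades):
    """Count completed buy/sell round trips per (entity, symbol).

    A high round-trip count in a short window is one crude churning signal;
    a real screen would also weigh time between trades and position size."""
    buys, trips = Counter(), Counter()
    for entity, symbol, side in trades:
        key = (entity, symbol)
        if side == "buy":
            buys[key] += 1
        elif side == "sell" and buys[key] > 0:
            buys[key] -= 1
            trips[key] += 1
    return trips

print(round_trips(trades)[("E1", "ACME")])  # 2
```

The point is not the algorithm, which is trivial. The point is that without entity identifiers in the feed, no version of this analysis is possible at all.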

### G-8 International Conference on Open Data for Agriculture

Tuesday, February 19th, 2013

G-8 International Conference on Open Data for Agriculture

April 29-30, 2013 Washington, D.C.

Deadline for proposals: Midnight, February 28, 2013.

From the call for ideas:

Are you interested in addressing global challenges, such as food security, by providing open access to information? Would you like the opportunity to present to leaders from around the world?

We are seeking innovative products and ideas that demonstrate the potential of using open data to increase food security. This April 29-30th in Washington, D.C., the G-8 International Conference on Open Data for Agriculture will host policy makers, thought leaders, food security stakeholders, and data experts to build a strategy to share agriculture data and make innovation more accessible. As part of the conference, we are giving innovators a chance to showcase innovative uses of open data for food security in a lightning presentation or in the exhibit hall. This call for ideas is a chance to demonstrate the potential that open data can have in ensuring food security, and can inform an unprecedented global collaboration. Visit data.gov to see what agricultural data is already available and connect to other G-8 open data sites!

We are seeking top innovators to show the world what can be done with open data through:

• Lightning Presentations: brief (3-5 minute), image rich presentations intended to convey an idea
• Exhibit Hall: an opportunity to convey an idea through an image-rich exhibit.

Presentations should inspire others to share their data or imagine how open data could be used to increase food security. Presentations may include existing, new, or proposed applications of open data and should meet one or more of the following criteria:

• Demonstrate the impact of open data on food security.
• Demonstrate the impact of access to agriculturally-relevant data on developed and/or developing countries.
• Demonstrate the impact of bringing multiple sources of agriculturally-relevant public and/or private open data together (think about the creation of an agriculture equivalent of weather.com).

For those with a new idea, we invite you to submit your proposal to present it to leading experts in food security, technology and data innovation. Proposals should identify which data is needed that is publicly available, for free, on the internet. Proposals must also include a design of the application including relevance to the target audience and plans for beta testing. A successful prototype will be mobile, interactive, and scalable. Proposals to showcase existing products or pitch new ideas will be reviewed by a global panel of technical experts from the G-8 countries.

Short notice but from the submission form on the website, you only get 75-100 words to summarize your proposal.

Hell, I have trouble identifying myself in 75-100 words.

Still, if you are in D.C. and interested, it could be a good way to meet people in this area.

The nine flags for the G-8 are confusing at first, but they aren’t an example of government committee counting: the EU also has a representative at G-8 meetings.

### “Improving Critical Infrastructure Cybersecurity” Executive Order

Thursday, February 14th, 2013

Unless you have been asleep for the last couple of days, you have heard about President Obama’s “Improving Critical Infrastructure Cybersecurity” Executive Order.

Wanted to point you to one of the lesser discussed provisions of the order:

In order to maximize the utility of cyber threat information sharing with the private sector, the Secretary shall expand the use of programs that bring private sector subject-matter experts into Federal service on a temporary basis. These subject matter experts should provide advice regarding the content, structure, and types of information most useful to critical infrastructure owners and operators in reducing and mitigating cyber risks.

I didn’t know which “…programs that bring private sector subject-matter experts into Federal Service…” he meant.

So, I wrote to the GSA (General Services Administration) and they said to look at schedules 70 and 874 at www.gsaelibrary.gsa.gov.

I won’t try to advise you on the steps to register for government contract work.

But this is an opportunity for building bridges across the semantic divides in any inter-agency effort.

Do remember where you heard the news!

### datacatalogs.org [San Francisco, for example]

Wednesday, February 13th, 2013

datacatalogs.org

From the homepage:

a comprehensive list of open data catalogs curated by experts from around the world.

Cited in Simon Rogers’ post: Competition: visualise open government data and win $2,000.

As of today, 288 registered data catalogs.

The reservation I have about “open” government data is that when it is “open,” it’s not terribly useful. I am sure there is useful “open” government data but let me give you an example of non-useful “open” government data.

Consider San Francisco, CA and cases of police misconduct against its citizens.

A really interesting data visualization would be to plot those incidents against the neighborhoods of San Francisco. Where the neighborhoods are colored by economic status.

The maps of San Francisco are available at DataSF, specifically, Planning Neighborhoods.

What about the police data? I found summaries like: OCC Caseload/Disposition Summary – 1993-2009

Which listed:

• Opened
• Closed
• Pending
• Sustained

Not exactly what is needed for neighborhood by neighborhood mapping.

Note: No police misconduct since 2009 according to these data sets. (I find that rather hard to credit.)

How would you vote on this data set from San Francisco? Open, Opaque, Semi-Transparent?

### Need to Pad Your Resume? Innovation Fellows Round 2

Wednesday, February 6th, 2013

White House Seeks Tech Innovation Fellows by Elena Malykhina.

The White House death march farce I covered in A Competent CTO Can Say No continues.

The next group of six to twelve month projects are:

• Disaster Response and Recovery: The project will “pre-position” tech tools for disaster readiness in order to diminish economic damage and save lives.
• Cyber-Physical Systems: A new generation of cyber-physical “smart systems” will be developed to help the economy and job creation. These systems will combine distributed sensing, control and data analytics.
• 21st Century Financial Systems: The 21st Century Financial Systems initiative will transition agency-specific federal financial accounting systems to a more modular, scalable and cost-effective model.
• Innovation Toolkit: A suite of tools will be created for federal workers, allowing them to become more responsive and efficient in their jobs.
• Development Innovation Ventures: The Development Innovation Ventures project will address tough global problems by allowing the U.S. government to identify, test and scale new technologies.

Sound like six to twelve month projects? Yes?

I know, I know, I should be lining up to participate in this fraud on the public and be paid for doing it. Looks nice on the resume.

Successful solutions will not be developed on fixed timelines before problems are defined or understood.

Some will say, “So what? So long as you are paid for time, travel, etc., why would you care if the solution is successful?”

That must be why there are no links in the Round 2 announcement to “successes” of the first round of innovation.

Take the first one on the list from round one:

Open Data Initiatives have unleashed data from the vaults of the government as fuel for entrepreneurs and innovators to create new apps, products, and services that benefit the American people in myriad ways and contribute to job growth.

Can you name one? Just one.

Sequestration data (except for my releases) continues to be dead PDF files. And the data in those files is too incomplete for useful analysis.

Is that “…unleash[ing] data from the vaults of the government…?” Or did the sequestration debate escape their attention?

The number of people willing to defraud the public even in these hard economic times was encouragingly low. Only 700 people applied for round one. Out of hundreds of thousands of highly qualified IT people who could have applied.

Is defrauding the public becoming unfashionable? Perhaps there is hope.

Lest there be some misunderstanding, government at all levels is filled with public servants. But you have to get away from elected/appointed positions to find them.
They mostly don’t appear on Sunday talk shows but tirelessly do the public’s business out of the limelight.

Public servants I would gladly help; public parasites, not so much.

### Green Book – Semantic and Governmental Failure

Tuesday, February 5th, 2013

Full Text Reports carried a report today about the House Ways and Means Committee — 2012 Green Book (released November 2012).

I am always looking for data that might be of interest for topic maps, and the quoted blurb:

Since 1981, the Committee on Ways and Means has published the Green Book, which presents background material and statistical data on the major entitlement programs and other activities within the Committee’s jurisdiction. Over the decades, the Green Book has become a valuable resource and standard reference on American social policy. It is widely used by Members of Congress and their staffs, analysts in congressional and administrative agencies, members of the media, scholars, and citizens interested in the Nation’s social policy.

Seemed to fill the bill.

I sh*t you not. That is really the title. No wonder they call it the “Green Book.”

When I got to the book itself, stop laughing!, you are ahead of me, all the tables are in PDF files.

No, I’m not going to convert them this time.

Why don’t they share machine-readable files? That is a question you should ask your representative.

Thinking there may be a machine-readable copy elsewhere, I searched for the “Green Book.”

Did you know the Department of Defense has a “Green Book?”

Or that Financial Management Services (Treasury) has a Green Book?

Or that the Treasury has another Green Book?

Or the U.S. Army Green Books? (apparently there are later ones than cited here)

Or that Obama has a Green Book.

Counting the one from Congress, that’s six, and I suspect there are many more that any search will turn up.

Don’t suppose it ever occurred to anyone in government that distinguishing any of these for search purposes would be useful?
### Alpha.data.gov: From Open Data Provider to Open Data Hub

Saturday, February 2nd, 2013

Alpha.data.gov: From Open Data Provider to Open Data Hub by Andrea Di Maio.

From the post:

Those who happen to read my blog know that I am rather cynical about many enthusiastic pronouncements around open data. One of the points I keep banging on is that the most common perspective is that open data is just something that governments ought to publish for businesses and citizens to use it. This perspective misses both the importance of open data created elsewhere – such as by businesses or by people in social networks – and the impact of its use inside government. Also, there is a basic confusion between open and public data: not all open data is public and not all public data may be open (although they should, in the long run).

In this respect the new experimental site alpha.data.gov is a breath of fresh air. Announced in a recent post on the White House blog, it does not contain data, but explains which categories of open data can be used for which sort of purposes.

A step in the right direction.

Simply gathering the relevant data sets for any given project is a project in and of itself. Followed by documenting the semantics of the relevant data sets.

Data hubs are a precursor to collections of semantic documentation for data found at data hubs.

You know what should follow from collections of semantic documentation. (Can you say topic maps?)

### Sunlight Congress API [Shifting the Work for Transparency?]

Friday, February 1st, 2013

Sunlight Congress API

From the webpage:

A live JSON API for the people and work of Congress, provided by the Sunlight Foundation.

Features

Lots of features and data for members of Congress:

• Look up legislators by location or by zip code.
• Official Twitter, YouTube, and Facebook accounts.
• Committees and subcommittees in Congress, including memberships and rankings.
We also provide Congress’ daily work:

• All introduced bills in the House and Senate, and what occurs to them (updated daily).
• Full text search over bills, with powerful Lucene-based query syntax.
• Real time notice of votes, floor activity, and committee hearings, and when bills are scheduled for debate.

All data is served in JSON, and requires a Sunlight API key. An API key is free to register and has no usage limits.

We have an API mailing list, and can be found on Twitter at @sunlightlabs. Bugs and feature requests can be made on Github Issues.

Important not to confuse this effort with transparency.

As the late Aaron Swartz remarked in the O’Reilly “Open Government” text:

…When you create a regulatory agency, you put together a group of people whose job is to solve some problem. They’re given the power to investigate who’s breaking the law and the authority to punish them. Transparency, on the other hand, simply shifts the work from the government to the average citizen, who has neither the time nor the ability to investigate these questions in any detail, let alone do anything about it. It’s a farce: a way for Congress to look like it has done something on some pressing issue without actually endangering its corporate sponsors.

Here is an interface that:

…shifts the work from the [Sunlight Foundation] to the average citizen, who has neither the time nor the ability to investigate these questions in any detail, let alone do anything about it. It’s a farce: a way for [Sunlight Foundation] to look like it has done something on some pressing issue without actually endangering its corporate sponsors. (O’Reilly’s Open Government book ["...more equal than others" pigs])

Suggestions for ending the farce?

I first saw this at the Legal Informatics Blog, Mill: Sunlight Foundation releases Congress API.
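For those who do want to shoulder the shifted work, here is a minimal sketch of querying the API. The base URL and `/legislators/locate` endpoint are as documented by the Sunlight Foundation; the sample JSON response below is a trimmed, hypothetical stand-in for what the API actually returns.

```python
import json
from urllib.parse import urlencode

# Base URL as documented for the Sunlight Congress API.
BASE = "https://congress.api.sunlightfoundation.com"

def locate_legislators_url(zip_code, api_key):
    """Build the query URL for looking up legislators by zip code."""
    return f"{BASE}/legislators/locate?{urlencode({'zip': zip_code, 'apikey': api_key})}"

# A trimmed, hypothetical response in the API's JSON envelope.
sample_response = '{"results": [{"bioguide_id": "X000000", "chamber": "senate"}], "count": 1}'

data = json.loads(sample_response)
print(locate_legislators_url("30060", "YOUR_API_KEY"))
print(data["count"])  # 1
```

Fetching the URL with any HTTP client and parsing the JSON is all the “transparency” infrastructure a citizen gets; the analysis remains their problem.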
### Docket Wrench: Exposing Trends in Regulatory Comments [Apparent Transparency]

Friday, February 1st, 2013

Docket Wrench: Exposing Trends in Regulatory Comments by Nicko Margolies.

From the post:

Today the Sunlight Foundation unveils Docket Wrench, an online research tool to dig into regulatory comments and uncover patterns among millions of documents. Docket Wrench offers a window into the rulemaking process where special interests and individuals can wield their influence without the level of scrutiny traditional lobbying activities receive.

Before an agency finalizes a proposed rule that Congress and the president have mandated that they enforce, there is a period of public commenting where the agency solicits feedback from those affected by the rule. The commenters can vary from company or industry representatives to citizens concerned about laws that impact their environment, schools, finances and much more. These comments and related documents are grouped into “dockets” where you can follow the actions related to each rule. Every rulemaking docket has its own page on Docket Wrench where you can get a graphical overview of the docket, drill down into the rules and notices it contains and read the comments on those rules. We’ve pulled all this information together into one spot so you can more easily research trends and extract interesting stories from the data. Sunlight’s Reporting Group has done just that, looking into regulatory comment trends and specific comments by the Chamber of Commerce and the NRA.

An “apparent” transparency offering from the Sunlight Foundation.

Imagine that you follow their advice and do discover, horror, “form letters” that have been submitted in a rulemaking process.

What are you going to do?

Whistle up the agency’s former assistant director who is on your staff to call his buds at the agency to complain?

Get yourself a cardboard sign and march around your town square?

Start a letter-writing campaign of your own?
Rules are drafted, debated and approved in the dark recesses of agencies, by former agency staff, lobbyists and law firms.

Want transparency? Real transparency?

That would require experts in law and policy who have equal access to the agency as its insiders and an obligation to report to the public who wins and who loses from particular rules. An office like the public editor of the New York Times.

Might offend donors if you did that.

Best just to expose the public to a tiny part of the quagmire so you can claim people had an opportunity to participate. Not a meaningful one, but an opportunity nonetheless.

I first saw this at the Legal Informatics Blog, Sunlight Foundation Releases Docket Wrench: Tool for Analyzing Comments to Proposed Regulations.

### No Joy in Vindication

Monday, January 21st, 2013

You may have seen the news about the latest GAO report on auditing the U.S. government: U.S. Government’s Fiscal Years 2012 and 2011 Consolidated Financial Statements, GAO-13-271R, Jan 17, 2013, http://www.gao.gov/products/GAO-13-271R.

The reasons why the GAO can’t audit the U.S. government:

(1) serious financial management problems at DOD that have prevented its financial statements from being auditable, (2) the federal government’s inability to adequately account for and reconcile intragovernmental activity and balances between federal agencies, and (3) the federal government’s ineffective process for preparing the consolidated financial statements.

Number 2 reminds me of: The 560+ $Billion Shell Game, where I provided data files based on the OMB Sequestration report, detailing that over 560 $billion in agency transfers could not be tracked.

That problem has now been confirmed by the GAO.

I am sure my analysis was not original and has been known to insiders at the GAO and others for years.

But did you know that I mailed that analysis to both of my U.S. Senators and got no response?
I did get a “bug letter” from my representative, Austin Scott:

Washington continues to spend at unsustainable levels. That is why I voted against H.R. 8, the American Taxpayer Relief Act when it passed Congress on January 1, 2013. This plan does not address the real driver of our debt – spending. President Obama’s unwillingness to address this continues to cripple our efforts to find a long-term solution. We cannot tax our way out of this fiscal situation. The President himself has said on multiple occasions that spending cuts must be part of the solution. In fact, on April 13, 2011 he remarked, “So any serious plan to tackle our deficit will require us to put everything on the table, and take on excess spending wherever it exists in the budget.” However, his words have seldom matched his actions.

We owe it to our children and grandchildren to make the tough choices and devise a long-term solution that gets our economy back on track and reduces our deficits. I remain hopeful that the President will join us in this effort.

Thank you for contacting me. It’s an honor to represent the Eighth Congressional District of Georgia.

Non-responsive would be a polite word for it.

My original point has been vindicated by the GAO, but that brings no joy.

My request to the officials I have contacted was simple:

All released government financial data must be available in standard spreadsheet formats (Excel, CSV, ODF).

There are a whole host of other issues that will arise from such data, but the first step is to get it in a crunchable format.

### O’Reilly’s Open Government book ["...more equal than others" pigs]

Monday, January 21st, 2013

From the post:

I’ve read many eloquent eulogies from people who knew Aaron Swartz better than I did, but he was also a Foo and contributor to Open Government. So, we’re doing our part at O’Reilly Media to honor Aaron by posting the Open Government book files for free for anyone to download, read and share.
The files are posted on the O’Reilly Media GitHub account as PDF, Mobi, and EPUB files for now. There is a movement on the Internet (#PDFtribute) to memorialize Aaron by posting research and other material for the world to access, and we’re glad to be able to do this.

You can find the book here: github.com/oreillymedia/open_government

Daniel Lathrop, my co-editor on Open Government, says “I think this is an important way to remember Aaron and everything he has done for the world.” We at O’Reilly echo Daniel’s sentiment.

Be sure to read Chapter 25, “When Is Transparency Useful?”, by the late Aaron Swartz. It includes this passage:

…When you create a regulatory agency, you put together a group of people whose job is to solve some problem. They’re given the power to investigate who’s breaking the law and the authority to punish them. Transparency, on the other hand, simply shifts the work from the government to the average citizen, who has neither the time nor the ability to investigate these questions in any detail, let alone do anything about it. It’s a farce: a way for Congress to look like it has done something on some pressing issue without actually endangering its corporate sponsors.

As a tribute to Aaron, are you going to dump data on the WWW or enable the calling of “more equal than others” pigs to account?

### Operation Asymptote – [PlainSite / Aaron Swartz]

Sunday, January 20th, 2013

Operation Asymptote

Operation Asymptote’s goal is to make U.S. federal court data freely available to everyone.

The data is available now, but free only up to $15 worth every quarter.

Serious legal research hits that limit pretty quickly.

The project does not cost you any money, only some of your time.

The result will be another source of data to hold the system accountable.

So, how real is your commitment to doing something effective in memory of Aaron Swartz?

### Freeing the Plum Book

Friday, January 18th, 2013

Freeing the Plum Book by Derek Willis.

From the post:

The federal government produces reams of publications, ranging from the useful to the esoteric. Pick a topic, and in most cases you’ll find a relevant government publication: for example, recent Times articles about presidential appointments draw on the Plum Book. Published annually by either the House or the Senate (the task alternates between committees), the Plum Book is a snapshot of appointments throughout the federal government.

The Plum Book is clearly a useful resource for reporters. But like many products of the Government Printing Office, its two main publication formats are print and PDF. That means the digital version isn’t particularly searchable, unless you count Ctrl-F as a legitimate search mechanism. And that’s a shame, because the Plum Book is basically a long list of names, positions and salary information. It’s data.

Derek describes freeing the Plum Book from less than useful formats.

It is now available in JSON and YAML formats at Github and in Excel.
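Once the Plum Book is JSON rather than PDF, questions that required Ctrl-F become one-liners. A minimal sketch, with the caveat that the record below is a hypothetical stand-in and the field names in the actual GitHub files may differ:

```python
import json

# A hypothetical record shape for the liberated Plum Book data;
# the real JSON on GitHub may use different field names.
sample = """[
  {"agency": "Department of State", "position": "Ambassador", "name": "Vacant", "pay_plan": "ES"},
  {"agency": "Department of State", "position": "Chief of Staff", "name": "Doe, Jane", "pay_plan": "ES"}
]"""

appointments = json.loads(sample)

# Count appointments per agency -- the kind of tally the PDF made tedious.
counts = {}
for appt in appointments:
    counts[appt["agency"]] = counts.get(appt["agency"], 0) + 1

print(counts)  # {'Department of State': 2}
```

The same loop could just as easily join on names against campaign contribution or lobbying datasets, which is where the matching gets interesting.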

Curious, what other public datasets would you want to match up to the Plum Book?

### U.S. GPO releases House bills in bulk XML

Sunday, January 13th, 2013

U.S. GPO releases House bills in bulk XML

Another bulk data source from the U.S. Congress.

Integration of the legislative sources will be non-trivial, but it has been done before, manually.

What will be more interesting will be tracking the more complex interpersonal relationships that underlie the surface of legislative sources.

### Lost: House Floor Record for 1 January 2013. If found please call…

Thursday, January 10th, 2013

U.S. House of Representatives floor proceedings for the 109th Congress, 1st Session (2005) to 113th Congress, 1st Session-to-Date (2013) are now available for download in XML. (House Floor Activities Download)

One obvious test of the data, the House vote on the “fiscal cliff” legislation.

In fact, the Clerk of the House for January 01, 2013, has posted a web version of that day.

Question: If you download 112th Congress, 2nd Session (2012), will you find the vote on the “fiscal cliff” legislation?

The entire legislative day in the House of Representatives is missing from the 112th Congress, 2nd Session (2012) file.

See for yourself: I have uploaded the 112th Congress, 2nd Session (2012) and the Clerk of the House of Representatives file for January 1, 2013, in the file: Missing1January2013.

Search for: “On motion that the House agree to the Senate amendments Agreed to by recorded vote: 257 – 167″ in HDoc-112-2-FloorProceedings.xml. (112th Congress, 2nd Session).
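Checking for the missing vote doesn't require anything fancier than scanning the element text of the XML. A sketch, with the caveat that the snippet below is a simplified stand-in for the actual HDoc floor-proceedings schema:

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the structure of HDoc-112-2-FloorProceedings.xml;
# the real schema's element names may differ.
sample = """<floor_proceedings>
  <floor_action>
    <action_description>On motion that the House agree to the Senate amendments Agreed to by recorded vote: 257 - 167</action_description>
  </floor_action>
</floor_proceedings>"""

def contains_action(xml_text, needle):
    """Return True if any element's text contains the search string."""
    root = ET.fromstring(xml_text)
    return any(needle in (el.text or "") for el in root.iter())

print(contains_action(sample, "recorded vote: 257 - 167"))  # True
```

Run the same check against the downloaded 112th Congress, 2nd Session file and the result is False: the vote simply is not there.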

Typos and errors happen all the time. To everyone. But missing an entire day is more than just a typo. It indicates a lack of concern for quality control.

### Center for Effective Government Announces Launch [Name Change]

Wednesday, January 9th, 2013

Center for Effective Government Announces Launch

The former OMB Watch is now the Center for Effective Government (www.foreffectivegov.org).

A change to reflect a broader expertise on government effectiveness in general.

From the post:

The Center for Effective Government will continue to offer expert analysis, in-depth reports, and news updates on the issues it has been known for in the past. Specifically, the organization will:

• Analyze federal tax and spending choices and advocate for progressive revenue options and transparency in federal spending;
• Defend and improve national standards and safeguards and the regulatory systems that produce and enforce them;
• Expose undue special interest influence in federal policymaking and advocate for open government reforms that ensure public officials put the public interest first; and
• Encourage more active citizen engagement in our democracy by ensuring people have access to easy-to-understand, contextualized, meaningful public information and understand how they can participate in public policy decision making processes.

If you have been running a topic map in this area, reflect the name change to the OMB Watch topic.

Beyond simple semantic impedance, which is always present, government is replete with examples of intentional impedance if not outright deception.

A fertile field for topic map practitioners!

### The 560+ $Billion Shell Game

Tuesday, January 1st, 2013

I have completed another round of analysis on the OMB Report Pursuant to the Sequestration Transparency Act of 2012 (P. L. 112–155), which I started with Fiscal Cliff + OMB or Fool Me Once/Twice (Appendix A) and Over The Fiscal Cliff – Blindfolded (Appendix B).

560,157 (in $millions) or 560.157 $billion are hidden in the O (Opaque) MB report.

How? “Hidden” in the sense that money is taken from an unknown government account and transferred to another government account (the one that says “Exempt, 255(g)(1)(A) — intragovernmental”). In other words, we know where the money went, but not where it came from.

Let’s walk through the first account in Appendix A to illustrate how “Exempt, 255(g)(1)(A) — intragovernmental” would be calculated:

Senate, 001-05-0110, Salaries, Officers and Employees

• Sequestrable BA: 176
• Sequester Percentage: 8.2
• Sequester Amount: 14

If 50 $million is paid out of the Senate account into another government account, it is “Exempt, 255(g)(1)(A) — intragovernmental” and listed at the “other” account as exempt (in Appendix A; Appendix B identifies it as exempt under the Balanced Budget and Emergency Deficit Control Act of 1985, as amended (BBEDCA)).
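The sequester amount in the walk-through is just the sequestrable budget authority times the sequester rate, rounded to whole $millions. A one-function sketch reproducing the Senate account's figures:

```python
# Sequester amount = sequestrable budget authority (BA) x sequester rate,
# rounded to whole $millions -- reproducing the Senate account figures:
# BA 176 at 8.2% yields 14.
def sequester_amount(budget_authority, rate_percent):
    return round(budget_authority * rate_percent / 100)

print(sequester_amount(176, 8.2))  # 14 (in $millions)
```

What the arithmetic cannot tell you is which account the exempt intragovernmental transfer came from; that is the shell game.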

In the O (Opaque) MB report, approximately 560.157 $billion was transferred from one government account to another, but it isn’t possible to trace the transfers.

Open the data file, Appendix-A-Exempt-Intragovernmental-With-Appendix-B-Pages. It only contains accounts with “intragovernmental” exemptions. Filter column J for amounts > 0 and sum the results. (I have included page numbers for Appendix A and Appendix B to assist in your verification of the data.)

Another 1.066 $billion is contained in negative exemptions under “Exempt, 255(g)(1)(A) — intragovernmental.”

Filter column J for amounts < 0 and sum the results.
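If you would rather script the verification than filter in a spreadsheet, the same sums fall out of a few lines of Python. The CSV below is a toy stand-in for the data file, with column “J” holding the exempt intragovernmental amounts (in $millions); the header names are illustrative, not the actual spreadsheet headers.

```python
import csv
import io

# Toy stand-in for Appendix-A-Exempt-Intragovernmental-With-Appendix-B-Pages:
# column "J" holds the exempt intragovernmental amounts (in $millions).
data = io.StringIO(
    "account,J\n"
    "001-05-0110,50\n"
    "001-15-0100,-3\n"
    "006-60-0551,120\n"
)

positives = negatives = 0.0
for row in csv.DictReader(data):
    amount = float(row["J"])
    if amount > 0:
        positives += amount   # the > 0 filter-and-sum
    else:
        negatives += amount   # the < 0 filter-and-sum

print(positives, negatives)  # 170.0 -3.0
```

On the real file, the positive sum should reproduce the 560.157 $billion figure and the negative sum the 1.066 $billion in negative exemptions.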

What does a negative exemption mean? Your guess is as good as mine.

There are twenty-one accounts where Appendix A says exempt (but gives no reason) and Appendix B says not exempt:

1. 001-15-0100
2. 001-15-4296
3. 001-25-0101
4. 001-25-4325
5. 001-25-4346
6. 005-65-1955
7. 006-48-5583
8. 006-48-5584
9. 006-60-0551
10. 006-60-0552
11. 006-60-5396
12. 019-20-0233
13. 009-38-0118
14. 024-70-0701
15. 025-09-0206
16. 010-22-1700
17. 010-95-1127
18. 015-25-4159
19. 015-45-0947
20. 026-00-0112
21. 028-00-4156

Filter on column R = x. I used “x” to denote that Appendix A says exempt but Appendix B disagrees.

Finally, there are four (4) accounts in Appendix A that don’t appear in Appendix B.

1. 202-00-3112
2. 026-00-0109
3. 027-00-0100
4. 028-00-0100

Filter on column S = x. I used “x” to denote Appendix A has an account number not found in Appendix B (apparent typos in B.)

Totals are for “Exempt, 255(g)(1)(A) — intragovernmental” accounts only. The actual count on missing accounts, etc., is higher on the full data set.

Text of 255(g)(1)(A) of the Balanced Budget and Emergency Deficit Control Act of 1985, as amended (BBEDCA):

Intragovernmental funds, including those from which the outlays are derived primarily from resources paid in from other government accounts, except to the extent such funds are augmented by direct appropriations for the fiscal year during which an order is in effect.

Budget “Sequestration” and Selected Program Exemptions and Special Rules (Congressional Research Service) by Karen Spar is heavy reading but very helpful.

Update:

Refried Numbers from the OMB

In its current attempt at sequester obfuscation, the OMB combined the approaches used in Appendices A and B of its earlier report and reduced the percentage of sequestration. See: OMB REPORT TO THE CONGRESS ON THE JOINT COMMITTEE SEQUESTRATION FOR FISCAL YEAR 2013.

### Over The Fiscal Cliff – Blindfolded

Saturday, December 29th, 2012

The United States government is about to go over the “fiscal cliff.”

The really sad part is that the people of the United States are going with it, but they are blindfolded.

Intentionally blindfolded by their own government.

The OMB (“o” stands for opaque) report: OMB Report Pursuant to the Sequestration Transparency Act of 2012 (P. L. 112–155), Appendix B. Preliminary Sequestrable / Exempt Classification, classifies accounts as sequestrable, exempt, etc.

One reason to be “exempt” is funds were already sequestered elsewhere in the budget. Makes sense on the face of it.

But 487 entries out of 2,126 in Appendix B, or 22.9%, are being sequestered from some unstated part of the government.

Totally opaque.

Unlike the OMB, I am willing to share an electronic version of the files: OMB-Sequestration-Data-Appendix-B.zip. Satisfy yourself if I am right or wrong.

You can make it the last time the US government puts a blindfold on the American people.

Contact the White House, your Senator or Representative.

### Political Data Yearbook interactive

Monday, December 24th, 2012

Political Data Yearbook interactive

From the webpage:

Political Data Yearbook captures election results, national referenda, changes in government, and institutional reforms for a range of countries, within and beyond the EU.

Particularly useful if your world consists of the EU + Australia, Canada, Iceland, Israel, Norway, Switzerland and the USA.

To put that into perspective, only the third-ranking country in terms of population, the USA, gets listed.

Omitted are (in population order): China, India, Indonesia, Brazil, Pakistan, Bangladesh, Nigeria, Russia and Japan. Or about 60% of the world’s population.

Africa, South America, the Middle East (except for Israel), Mexico and Latin America are omitted as well.

Suggestions of resources on these rapidly expanding markets?

### Crowdsourcing campaign spending: …

Thursday, December 13th, 2012

From the post:

This fall, ProPublica set out to Free the Files, enlisting our readers to help us review political ad files logged with Federal Communications Commission. Our goal was to take thousands of hard-to-parse documents and make them useful, helping to reveal hidden spending in the election.

Nearly 1,000 people pored over the files, logging detailed ad spending data to create a public database that otherwise wouldn’t exist. We logged as much as \$1 billion in political ad buys, and a month after the election, people are still reviewing documents. So what made Free the Files work?

A quick backstory: Free the Files actually began last spring as an effort to enlist volunteers to visit local TV stations and request access to the “public inspection file.” Stations had long been required to keep detailed records of political ad buys, but they were only available on paper and required actually traveling to the station.

In August, the FCC ordered stations in the top 50 markets to begin posting the documents online. Finally, we would be able to access a stream of political ad data based on the files. Right?

Wrong. It turns out the FCC didn’t require stations to submit the data in anything that approaches an organized, standardized format. The result was that stations sent in a jumble of difficult to search PDF files. So we decided if the FCC or stations wouldn’t organize the information, we would.

Enter Free the Files 2.0. Our intention was to build an app to help translate the mishmash of files into structured data about the ad buys, ultimately letting voters sort the files by market, contract amount and candidate or political group (which isn’t possible on the FCC’s web site), and to do it with the help of volunteers.

In the end, Free the Files succeeded in large part because it leveraged data and community tools toward a single goal. We’ve compiled a bit of what we’ve learned about crowdsourcing and a few ideas on how news organizations can adapt a Free the Files model for their own projects.

The team who worked on Free the Files included Amanda Zamora, engagement editor; Justin Elliott, reporter; Scott Klein, news applications editor; Al Shaw, news applications developer, and Jeremy Merrill, also a news applications developer. And thanks to Daniel Victor and Blair Hickman for helping create the building blocks of the Free the Files community.

The entire story is golden but a couple of parts shine brighter for me than the others.

Design consideration:

The success of Free the Files hinged in large part on the design of our app. The easier we made it for people to review and annotate documents, the higher the participation rate, the more data we could make available to everyone. Our maxim was to make the process of reviewing documents like eating a potato chip: “Once you start, you can’t stop.”

Let me re-say that: The easier it is for users to author topic maps, the more topic maps they will author.

Yes?

Semantic Diversity:

But despite all of this, we still can’t get an accurate count of the money spent. The FCC’s data is just too dirty. For example, TV stations can file multiple versions of a single contract with contradictory spending amounts — and multiple ad buys with the same contract number means radically different things to different stations. But the problem goes deeper. Different stations use wildly different contract page designs, structure deals in idiosyncratic ways, and even refer to candidates and groups differently.

All true but knowing the semantics vary ahead of time, station to station, why not map the semantics in the markets ahead of time?

Granted, I second their request to the FCC for standardized data, but having standardized blocks doesn’t mean the information has the same semantics.

The OMB can’t keep the same semantics for a handful of terms in one document.

What chance is there with dozens and dozens of players in multiple documents?