Archive for the ‘Government Data’ Category

New Data Sets Available in Census Bureau API

Monday, June 9th, 2014

New Data Sets Available in Census Bureau API

From the post:

Today the U.S. Census Bureau added several data sets to its application programming interface, including 2013 population estimates and 2012 nonemployer statistics.

The Census Bureau API allows developers to create a variety of apps and tools, such as ones that allow homebuyers to find detailed demographic information about a potential new neighborhood. By combining Census Bureau statistics with other data sets, developers can create tools for researchers to look at a variety of topics and how they impact a community.

Data sets now available in the API are:

  • July 1, 2013, national, state, county and Puerto Rico population estimates
  • 2012-2060 national population projections
  • 2007 Economic Census national, state, county, place and region economy-wide key statistics
  • 2012 Economic Census national economy-wide key statistics
  • 2011 County Business Patterns at the national, state and county level (2012 forthcoming)
  • 2012 national, state and county nonemployer statistics (businesses without paid employees)

The API also includes three decades (1990, 2000 and 2010) of census statistics and statistics from the American Community Survey covering one-, three- and five-year periods of data collection. Developers can access the API online and share ideas through the Census Bureau’s Developers Forum. Developers can use the Discovery Tool to examine the variables available in each dataset.

In case you are looking for census data to crunch!
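
If you want a sense of how little code it takes to get started, here is a minimal sketch in R. The dataset path and variable names are my assumptions based on the Discovery Tool; confirm them there before relying on the numbers, and substitute a (free) Census API key of your own.

```r
library(jsonlite)

# Minimal sketch: pull state population estimates from the Census API.
# The dataset path and variable names below are assumptions -- check the
# Discovery Tool for the exact ones for the vintage you want.
key <- "YOUR_API_KEY"   # request a free key from the Census Bureau
url <- paste0(
  "https://api.census.gov/data/2013/pep/natstprc",
  "?get=STNAME,POP&for=state:*&key=", key
)
raw <- fromJSON(url)                      # character matrix; first row is the header
pop <- as.data.frame(raw[-1, ], stringsAsFactors = FALSE)
names(pop) <- raw[1, ]
head(pop[order(-as.numeric(pop$POP)), ])  # the most populous states first
```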

Enjoy!

UK Houses of Parliament launches Open Data portal

Friday, June 6th, 2014

UK Houses of Parliament launches Open Data portal

From the webpage:

Datasets related to the UK Houses of Parliament are now available via data.parliament.uk – the institution’s new dedicated Open Data portal.

Site developers are currently seeking feedback on the portal ahead of the next release, details of how to get in touch can be found by clicking here.

From the alpha release of the portal:

Welcome to the first release of data.parliament.uk – the home of Open Data from the UK Houses of Parliament. This is an alpha release and contains a limited set of features and data. We are seeking feedback from users about the platform and the data on it so please contact us.

I would have to agree that the portal presently contains “limited data.” ;-)

What would be helpful for non-U.K. data miners, as well as those in the U.K., is some sense of what data is available.

A PDF file listing the data currently maintained on the UK Houses of Parliament, their members, records of proceedings, transcripts, etc., would be a good starting point.

Pointers anyone?

openFDA

Monday, June 2nd, 2014

openFDA

Not all the news out of government is bad.

Consider openFDA, which is putting

More than 3 million adverse drug event reports at your fingertips.

From the “about” page:

OpenFDA is an exciting new initiative in the Food and Drug Administration’s Office of Informatics and Technology Innovation spearheaded by FDA’s Chief Health Informatics Officer. OpenFDA offers easy access to FDA public data and highlight projects using these data in both the public and private sector to further regulatory or scientific missions, educate the public, and save lives.

What does it do?

OpenFDA provides API and raw download access to a number of high-value structured datasets. The platform is currently in public beta with one featured dataset, FDA’s publically available drug adverse event reports.

In the future, openFDA will provide a platform for public challenges issued by the FDA and a place for the community to interact with each other and FDA domain experts with the goal of spurring innovation around FDA data.

We’re currently focused on working on datasets in the following areas:

  • Adverse Events: FDA’s publically available drug adverse event reports, a database that contains millions of adverse event and medication error reports submitted to FDA covering all regulated drugs.
  • Recalls (coming soon): Enforcement Report and Product Recalls Data, containing information gathered from public notices about certain recalls of FDA-regulated products
  • Documentation (coming soon): Structured Product Labeling Data, containing detailed product label information on many FDA-regulated product

We’ll be releasing a number of updates and additional datasets throughout the upcoming months.
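
To get a feel for what “at your fingertips” means in practice, here is a hedged sketch in R against the adverse event endpoint. The search field name is taken from the openFDA documentation as I recall it, so treat it as an assumption and check the current docs (an API key is optional for light use).

```r
library(jsonlite)

# Sketch: count and peek at adverse event reports mentioning a drug name.
# The field name patient.drug.medicinalproduct is an assumption from the docs.
url <- paste0(
  "https://api.fda.gov/drug/event.json",
  "?search=patient.drug.medicinalproduct:aspirin&limit=5"
)
resp <- fromJSON(url)
resp$meta$results$total    # total number of matching reports
resp$results$receivedate   # receipt dates of the five reports returned
```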

OK, I’m Twitter follower #522 @openFDA.

What’s your @openFDA number?

A good experience, i.e., people making good use of released data, asking for more data, etc., is what will drive more open data. Make every useful government data project count.

A New Nation Votes

Thursday, May 15th, 2014

A New Nation Votes: American Election Returns 1787-1825

From the webpage:

A New Nation Votes is a searchable collection of election returns from the earliest years of American democracy. The data were compiled by Philip Lampi. The American Antiquarian Society and Tufts University Digital Collections and Archives have mounted it online for you with funding from the National Endowment for the Humanities.

Currently there are 18040 elections that have been digitized.

Interesting data set and certainly one that could be supplemented with all manner of other materials.

Among other things, the impact or lack thereof from extension of the voting franchise would make an interesting study.

Enjoy!

EnviroAtlas

Monday, May 12th, 2014

EnviroAtlas

From the homepage:

What is EnviroAtlas?

EnviroAtlas is a collection of interactive tools and resources that allows users to explore the many benefits people receive from nature, often referred to as ecosystem services. Key components of EnviroAtlas include the following:


Why is EnviroAtlas useful?

Though critically important to human well-being, ecosystem services are often overlooked. EnviroAtlas seeks to measure and communicate the type, quality, and extent of the goods and services that humans receive from nature so that their true value can be considered in decision-making processes.

Using EnviroAtlas, many types of users can access, view, and analyze diverse information to better understand how various decisions can affect an array of ecological and human health outcomes. EnviroAtlas is available to the public and houses a wealth of data and research.

EnviroAtlas integrates over 300 data layers listed in: Available EnviroAtlas data.

News about the cockroaches infesting the United States House/Senate makes me forget there are agencies laboring to provide benefits to citizens.

Whether this environmental goldmine will be enough to result in a saner environmental policy remains to be seen.

I first saw this in a tweet by Margaret Palmer.

R Client for the U.S. Federal Register API

Thursday, May 8th, 2014

R Client for the U.S. Federal Register API by Thomas Leeper.

From the webpage:

This package provides access to the API for the United States Federal Register. The API provides access to all Federal Register contents since 1994, including Executive Orders by Presidents Clinton, Bush, and Obama and all “Public Inspection” Documents made available prior to publication in the Register. The API returns basic details about each entry in the Register and provides URLs for HTML, PDF, and plain text versions of the contents thereof, and the data are fully searchable. The federalregister package provides access to all version 1 API endpoints.

If you are interested in law, policy development, or just general awareness of government activity, this is an important client for you!

More than 30 years ago I had a hard-copy subscription to the Federal Register. Even then it was a mind-numbing amount of detail. Today it is even worse.

This API enables any number of business models based upon quick access to current and historical Federal Register data.
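
Here is a sketch, in R but using plain httr rather than the package, of the kind of call the federalregister package wraps. The endpoint and parameter names are my assumptions from the public v1 API documentation, so verify them against the current docs.

```r
library(httr)
library(jsonlite)

# Sketch: search Federal Register documents for a term and list basic details.
# Endpoint and parameter names are assumptions from the v1 API documentation.
resp <- GET(
  "https://www.federalregister.gov/api/v1/documents.json",
  query = list(`conditions[term]` = "open data", per_page = 5)
)
stop_for_status(resp)
docs <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
docs$results[, c("document_number", "publication_date", "title")]
```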

Enjoy!

IRS Data?

Wednesday, April 9th, 2014

New, Improved IRS Data Available on OpenSecrets.org by Robert Maguire.

From the post:

Among the more than 160,000 comments the IRS received recently on its proposed rule dealing with candidate-related political activity by 501(c)(4) organizations, the Center for Responsive Politics was the only organization to point to deficiencies in a critical data set the IRS makes available to the public.

This month, the IRS released the newest version of that data, known as 990 extracts, which have been improved considerably. Now, the data is searchable and browseable on OpenSecrets.org.

“Abysmal” IRS data

Back in February, CRP had some tough words for the IRS concerning the information. In the closing pages of our comment on the agency’s proposed guidelines for candidate-related political activity, we wrote that “the data the IRS provides to the public — and the manner in which it provides it — is abysmal.”

While I am glad to see better access to 501(c) 990 data, in a very real sense this isn’t “IRS data,” is it?

This is data that the government collected under penalty of law from tax entities in the United States.

Granted, it was sent in “voluntarily,” but there is a lot of data that entities and individuals send to local, state and federal government “voluntarily.” Not all of it is data that most of us would want handed out because other people are curious.

As I said, I like better access to 990 data but we need to distinguish between:

  1. Government sharing data it collected from citizens or other entities, and
  2. Government sharing data about government meetings, discussions, contacts with citizens/contractors, policy making, processes and the like.

If I’m not seriously mistaken, most of the open data from government involves a great deal of #1 and very little of #2.

Is that your impression as well?

One quick example. The United States Congress, with some reluctance, seems poised to deliver near real-time information on legislative proposals before Congress. Which is a good thing.

But there has been no discussion of tracking the final editing of bills to trace the insertion or deletion of language, by whom and with whose agreement. Which is a bad thing.

It makes no difference how public the process is up to final edits, if the final version is voted upon before changes can be found and charged to those responsible.

USGS Maps!

Tuesday, March 25th, 2014

USGS Maps (Google Map Gallery)

Wicked cool!

Followed a link from this post:

Maps were made for public consumption, not for safekeeping under lock and key. From the dawn of society, people have used maps to learn what’s around us, where we are and where we can go.

Since 1879, the U.S. Geological Survey (USGS) has been dedicated to providing reliable scientific information to better understand the Earth and its ecosystems. Mapping is an integral part of what we do. From the early days of mapping on foot in the field to more modern methods of satellite photography and GPS receivers, our scientists have created over 193,000 maps to understand and document changes to our environment.

Government agencies and NGOs have long used our maps for everything from community planning to finding hiking trails. Farmers depend on our digital elevation data to help them produce our food. Historians look to our maps from years past to see how the terrain and built environment have changed over time.

While specific groups use USGS as a resource, we want the public at-large to find and use our maps, as well. The content of our maps—the information they convey about our land and its heritage—belongs to all Americans. Our maps are intended to serve as a public good. The more taxpayers use our maps and the more use they can find in the maps, the better.

We recognize that our expertise lies in mapping, so partnering with Google, which has expertise in Web design and delivery, is a natural fit. Google Maps Gallery helps us organize and showcase our maps in an efficient, mobile-friendly interface that’s easy for anyone to find what they’re looking for. Maps Gallery not only publishes USGS maps in high-quality detail, but makes it easy for anyone to search for and discover new maps.

My favorite line:

Maps were made for public consumption, not for safekeeping under lock and key.

Very true. Equally true for all the research and data that is produced at the behest of the government.

XDATA – DARPA

Sunday, March 23rd, 2014

XDATA – DARPA

From the about page:

The DARPA Open Catalog is a list of DARPA-sponsored open source software products and related publications. Each resource link shown on this site links back to information about each project, including links to the code repository and software license information.

This site reorganizes the resources of the Open Catalog (specifically the XDATA program) in a way that is easily sortable based on language, project or team. More information about XDATA’s open source software toolkits and peer-reviewed publications can be found on the DARPA Open Catalog, located at http://www.darpa.mil/OpenCatalog/.

For more information about this site, e-mail us at piim@newschool.edu.

A great public service for anyone interested in DARPA XDATA projects.

You could view this as encouragement to donate time to government hackathons.

I disagree.

Donating services to an organization that pays for IT and then accepts crap results encourages poor IT management.

Possible Elimination of FR and CFR indexes (Pls Read, Forward, Act)

Saturday, March 22nd, 2014

Possible Elimination of FR and CFR indexes

I don’t think I have ever posted with (Pls Read, Forward, Act) in the headline, but this merits it.

From the post:

Please see the following message from Emily Feltren, Director of Government Relations for AALL, and contact her if you have any examples to share.

Hi Advocates—

Last week, the House Oversight and Government Reform Committee reported out the Federal Register Modernization Act (HR 4195). The bill, introduced the night before the mark up, changes the requirement to print the Federal Register and Code of Federal Regulations to “publish” them, eliminates the statutory requirement that the CFR be printed and bound, and eliminates the requirement to produce an index to the Federal Register and CFR. The Administrative Committee of the Federal Register governs how the FR and CFR are published and distributed to the public, and will continue to do so.

While the entire bill is troubling, I most urgently need examples of why the Federal Register and CFR indexes are useful and how you use them. Stories in the next week would be of the most benefit, but later examples will help, too. I already have a few excellent examples from our Print Usage Resource Log – thanks to all of you who submitted entries! But the more cases I can point to, the better.

Interestingly, the Office of the Federal Register itself touted the usefulness of its index when it announced the retooled index last year: https://www.federalregister.gov/blog/2013/03/new-federal-register-index.

Thanks in advance for your help!

Emily Feltren
Director of Government Relations

American Association of Law Libraries

25 Massachusetts Avenue, NW, Suite 500

Washington, D.C. 20001

202/942-4233

efeltren@aall.org

This is seriously bad news so I decided to look up the details.

Federal Register

Title 44, Section 1504 Federal Register, currently reads in part:

Documents required or authorized to be published by section 1505 of this title shall be printed and distributed immediately by the Government Printing Office in a serial publication designated the ”Federal Register.” The Public Printer shall make available the facilities of the Government Printing Office for the prompt printing and distribution of the Federal Register in the manner and at the times required by this chapter and the regulations prescribed under it. The contents of the daily issues shall be indexed and shall comprise all documents, required or authorized to be published, filed with the Office of the Federal Register up to the time of the day immediately preceding the day of distribution fixed by regulations under this chapter. (emphasis added)

By comparison, H.R. 4195 — 113th Congress (2013-2014) reads in relevant part:

The Public Printer shall make available the facilities of the Government Printing Office for the prompt publication of the Federal Register in the manner and at the times required by this chapter and the regulations prescribed under it. (Missing index language here.) The contents of the daily issues shall constitute all documents, required or authorized to be published, filed with the Office of the Federal Register up to the time of the day immediately preceding the day of publication fixed by regulations under this chapter.

Code of Federal Regulations (CFRs)

Title 44, Section 1510 Code of Federal Regulations, currently reads in part:

(b) A codification published under subsection (a) of this section shall be printed and bound in permanent form and shall be designated as the ”Code of Federal Regulations.” The Administrative Committee shall regulate the binding of the printed codifications into separate books with a view to practical usefulness and economical manufacture. Each book shall contain an explanation of its coverage and other aids to users that the Administrative Committee may require. A general index to the entire Code of Federal Regulations shall be separately printed and bound. (emphasis added)

By comparison, H.R. 4195 — 113th Congress (2013-2014) reads in relevant part:

(b) Code of Federal Regulations.–A codification prepared under subsection (a) of this section shall be published and shall be designated as the `Code of Federal Regulations’. The Administrative Committee shall regulate the manner and forms of publishing this codification. (Missing index language here.)

I would say that indexes for the Federal Register and the Code of Federal Regulations are history should this bill pass as written.

Is this a problem?

Consider the task of tracking the number of pages in the Federal Register versus the pages in the Code of Federal Regulations that may be impacted:

Federal Register – more than 70,000 pages per year.

The page count for final general and permanent rules in the 50-title CFR seems less dramatic than that of the oft-cited Federal Register, which now tops 70,000 pages each year (it stood at 79,311 pages at year-end 2013, the fourth-highest level ever). The Federal Register contains lots of material besides final rules. (emphasis added) (New Data: Code of Federal Regulations Expanding, Faster Pace under Obama by Wayne Crews.)

Code of Federal Regulations – 175,496 pages (2013), plus a 1,170-page index.

Now, new data from the National Archives shows that the CFR stands at 175,496 at year-end 2013, including the 1,170-page index. (emphasis added) (New Data: Code of Federal Regulations Expanding, Faster Pace under Obama by Wayne Crews.)

The bottom line is that 175,496 pages of regulations are being impacted by more than 70,000 Federal Register pages per year, published in a weekday publication.

We don’t need indexes to access that material?

Congress, I don’t think “access” means what you think it means.

PS: As a research guide, you are unlikely to do better than: A Research Guide to the Federal Register and the Code of Federal Regulations by Richard J. McKinney at the Law Librarians’ Society of Washington, DC website.

I first saw this in a tweet by Aaron Kirschenfeld.

UK statistics and open data…

Tuesday, March 18th, 2014

UK statistics and open data: MPs’ inquiry report published by Owen Boswarva.

From the post:

This morning the Public Administration Select Committee (PASC), a cross-party group of MPs chaired by Bernard Jenkin, published its report on Statistics and Open Data.

This report is the product of an inquiry launched in July 2013. Witnesses gave oral evidence in three sessions; you can read the transcripts and written evidence as well.

Useful if you are looking for rhetoric and examples of use of government data.

Ironic that just last week the news broke that Google has given British security the power to censor “unsavory” (but legal) content from YouTube. UK gov wants to censor legal but “unsavoury” YouTube content by Lisa Vaas.

Lisa writes:

Last week, the Financial Times revealed that Google has given British security the power to quickly yank terrorist content offline.

The UK government doesn’t want to stop there, though – what it really wants is the power to pull “unsavoury” content, regardless of whether it’s actually illegal – in other words, it wants censorship power.

The news outlet quoted UK’s security and immigration minister, James Brokenshire, who said that the government must do more to deal with material “that may not be illegal but certainly is unsavoury and may not be the sort of material that people would want to see or receive.”

I’m not sure why the UK government wants to block content that people don’t want to see or receive. They simply won’t look at it. Yes?

But intellectual coherence has never been a strong point of most governments, and of the UK in particular of late.

Is this more evidence for my contention that “open data” for government means only the data government wants you to have?

The FIRST Act, Retro Legislation?

Tuesday, March 11th, 2014

Language in FIRST act puts United States at Severe Disadvantage Against International Competitors by Ranit Schmelzer.

From the press release:

The Scholarly Publishing and Academic Research Coalition (SPARC), an international alliance of nearly 800 academic and research libraries, today announced its opposition to Section 303 of H.R. 4186, the Frontiers in Innovation, Research, Science and Technology (FIRST) Act. This provision would impose significant barriers to the public’s ability to access the results of taxpayer-funded research.

Section 303 of the bill would undercut the ability of federal agencies to effectively implement the widely supported White House Directive on Public Access to the Results of Federally Funded Research and undermine the successful public access program pioneered by the National Institutes of Health (NIH) – recently expanded through the FY14 Omnibus Appropriations Act to include the Departments Labor, Education and Health and Human Services. Adoption of Section 303 would be a step backward from existing federal policy in the directive, and put the U.S. at a severe disadvantage among our global competitors.

“This provision is not in the best interests of the taxpayers who fund scientific research, the scientists who use it to accelerate scientific progress, the teachers and students who rely on it for a high-quality education, and the thousands of U.S. businesses who depend on public access to stay competitive in the global marketplace,” said Heather Joseph, SPARC Executive Director. “We will continue to work with the many bipartisan members of the Congress who support open access to publicly funded research to improve the bill.”

[the parade of horribles follows]

SPARC’s press release never quotes a word from H.R. 4186. Not one. Commentary, but nary a word of its object.

I searched at Thomas (the Congressional information service at the Library of Congress) for H.R. 4186 and came up empty by bill number. Switching to the Congressional Record for Monday, March 10, 2014, I did find the bill being introduced and the setting of a hearing on it. The GPO has not (as of today) posted the text of H.R. 4186, but when it does, follow this link: H.R. 4186.

Even more importantly, SPARC doesn’t point out who is responsible for the objectionable section appearing in the bill. Bills don’t write themselves and as far as I know, Congress doesn’t have a random bill generator.

The bottom line is that someone, an identifiable someone, asked for longer embargo wording to be included. If the SPARC press release is accurate, the most likely someones to have been asked are Chairman Lamar Smith (R-TX, 21st District) or Rep. Larry Bucshon (R-IN, 8th District).

The Wikipedia page on the 8th Congressional District of Indiana needs updating, but it does place the district in southwestern Indiana, around Evansville. You might want to check Bucshon’s page at Wikipedia and the links there to other resources.

Wikipedia on the 21st Congressional District of Texas places it north of San Antonio, the seventh-largest city in the United States. Lamar Smith’s page at Wikipedia has some interesting reading.

Odds are that in and around southwestern Indiana and San Antonio there are people interested in longer embargo periods on federally funded research.

Those are at least some starting points for effective opposition to this legislation, assuming it was reported accurately by SPARC. Let’s drop the pose of disinterested legislators trying valiantly to serve the public good. Not impossible, just highly unlikely. Let’s argue about who is getting paid and for what benefits.

Or as Captain Ahab advises:

All visible objects, man, are but as pasteboard masks. But in each event –in the living act, the undoubted deed –there, some unknown but still reasoning thing puts forth the mouldings of its features from behind the unreasoning mask. If man will strike, strike through the mask! [Melville, Moby Dick, Chapter XXXVI]

Legislation as a “pasteboard mask” is a useful image. There is not a contour, dimple, shade or expression that wasn’t bought and paid for by someone. You have to strike through the mask to discover who.

Are you game?

PS: Curious, where would you go next (data wise, I don’t have the energy to lurk in garages) in terms of searching for the buyers of longer embargoes in H.R. 4186?

Visualising UK Ministerial Lobbying…

Thursday, March 6th, 2014

Visualising UK Ministerial Lobbying & “Buddying” Over Eight Months by Roland Dunn.

From the post:


[This is a companion piece to our visualisation of ministerial lobbying – open it up and take a look!].

Eight Months Worth of Lobbying Data

Turns out that James Ball, together with the folks at Who’s Lobbying had collected together all the data regarding ministerial meetings from all the different departments across the UK’s government (during May to December 2010), tidied the data up, and put them together in one spreadsheet: https://docs.google.com/spreadsheet/ccc?key=0AhHlFdx-QwoEdENhMjAwMGxpb2kyVnlBR2QyRXJVTFE.

It’s important to understand that despite the current UK government stating that it is the most open and transparent ever, each department publishes its ministerial meetings in ever so slightly different formats. On that page for example you can see Dept of Health Ministerial gifts, hospitality, travel and external meetings January to March 2013, and DWP ministers’ meetings with external organisations: January to March 2013. Two lists containing slightly different sets of data. So, the work that Who’s Lobbying and James Ball did in tallying this data up is considerable. But not many people have the time to tie such data-sets together, meaning the data contained in them is somewhat more opaque than you might at first be led to believe. What’s needed is one pan-governmental set of data.

An example to follow in making “open” data a bit more “transparent.”

Not entirely transparent, for as the author notes, minutes from the various meetings are not available.

Or I suppose when minutes are available, their completeness would be questionable.

I first saw this in a tweet by Steve Peters.

Data Science – Chicago

Monday, March 3rd, 2014

OK, I shortened the headline.

The full headline reads: Accenture and MIT Alliance in Business Analytics launches data science challenge in collaboration with Chicago: New annual contest for MIT students to recognize best data analytics and visualization ideas.: The Accenture and MIT Alliance in Business Analytics

Don’t try that without coffee in the morning.

From the post:

The Accenture and MIT Alliance in Business Analytics have launched an annual data science challenge for 2014 that is being conducted in collaboration with the city of Chicago.

The challenge invites MIT students to analyze Chicago’s publicly available data sets and develop data visualizations that will provide the city with insights that can help it better serve residents, visitors, and businesses. Through data visualization, or visual renderings of data sets, people with no background in data analysis can more easily understand insights from complex data sets.

The headline is longer than the first paragraph of the story.

I didn’t see an explanation for why the challenge is limited to:

The challenge is now open and ends April 30. Registration is free and open to active MIT students 18 and over (19 in Alabama and Nebraska). Register and see the full rule here: http://aba.mit.edu/challenge.

Find a sponsor and setup an annual data mining challenge for your school or organization.

Although I would suggest you take a pass on Bogotá, Mexico City, Rio de Janeiro, Moscow, Washington, D.C. and similar places where truthful auditing could be hazardous to your health.

Or as one of my favorite Dilbert cartoons had the pointy-haired boss observing:

When you find a big pot of crazy it’s best not to stir it.

One Thing Leads To Another (#NICAR2014)

Sunday, March 2nd, 2014

A tweet this morning read:

overviewproject ‏@overviewproject 1h
.@djournalismus talking about handling 2.5 million offshore leaks docs. Content equivalent to 50,000 bibles. #NICAR14

That sounds interesting! Can’t ever tell when a leaked document will prove useful. But where to find this discussion?

Following #NICAR14 leaves you with the impression this is a conference. (I didn’t recognize the hashtag immediately.)

Searching on the web, the hashtag led me to: 2014 Computer-Assisted Reporting Conference. (NICAR = National Institute for Computer-Assisted Reporting)

The handle @djournalismus offers the name Sebastian Mondial.

Checking the speakers list, I found this presentation:

Inside the global offshore money maze
Event: 2014 CAR Conference
Speakers: David Donald, Mar Cabra, Margot Williams, Sebastian Mondial
Date/Time: Saturday, March 1 at 2 p.m.
Location: Grand Ballroom West
Audio file: No audio file available.

The International Consortium of Investigative Journalists “Secrecy For Sale: Inside The Global Offshore Money Maze” is one of the largest and most complex cross-border investigative projects in journalism history. More than 110 journalists in about 60 countries analyzed a 260 GB leaked hard drive to expose the systematic use of tax havens. Learn how this multinational team mined 2.5 million files and cracked open the impenetrable offshore world by creating a web app that revealed the ownership behind more than 100,000 anonymous “shell companies” in 10 offshore jurisdictions.

Along the way I discovered the speakers list, whose members cover a wide range of subjects of interest to anyone mining data.

Another treasure is the Tip Sheets and Tutorial page. Here are six (6) selections out of sixty-one (61) items to pique your interest:

  • Follow the Fracking
  • Maps and charts in R: real newsroom examples
  • Wading through the sea of data on hospitals, doctors, medicine and more
  • Free the data: Getting government agencies to give up the goods
  • Campaign Finance I: Mining FEC data
  • Danger! Hazardous materials: Using data to uncover pollution

Not to mention that NICAR2012 and NICAR2013 are also accessible from the NICAR2014 page, with their own “tip” listings.

If you find this type of resource useful, be sure to check out Investigative Reporters and Editors (IRE).

About the IRE:

Investigative Reporters and Editors, Inc. is a grassroots nonprofit organization dedicated to improving the quality of investigative reporting. IRE was formed in 1975 to create a forum in which journalists throughout the world could help each other by sharing story ideas, newsgathering techniques and news sources.

IRE provides members access to thousands of reporting tip sheets and other materials through its resource center and hosts conferences and specialized training throughout the country. Programs of IRE include the National Institute for Computer Assisted Reporting, DocumentCloud and the Campus Coverage Project

Learn more about joining IRE and the benefits of membership.

Sounds like a win-win offer to me!

You?

SEC Filings for Humans

Tuesday, February 25th, 2014

SEC Filings for Humans by Meris Jensen.

After a long and sad history of the failure of the SEC to make EDGAR useful:

Rank and Filed gathers data from EDGAR, indexes it, and returns it in formats meant to help investors research, investigate and discover companies on their own. I started googling ‘How to build a website’ seven months ago. The SEC has web developers, software developers, database administrators, XBRL experts, legions of academics who specialize in SEC filings, and all this EDGAR data already cached in the cloud. The Commission’s mission is to protect investors, maintain fair, orderly and efficient markets, and facilitate capital formation. Why did I have to build this? (emphasis added)

I don’t know the answer to Meris’ question but I can tell you that Rank and Filed is an incredible resource for financial information.

And yet another demonstration that government should not manage open data. Make it available. (full stop)

I first saw this at Nathan Yau’s A human-readable explorer for SEC filings.

R Markdown:… [Open Analysis, successor to Open Data?]

Tuesday, February 25th, 2014

R Markdown: Integrating A Reproducible Analysis Tool into Introductory Statistics by Ben Baumer, et al.

Abstract:

Nolan and Temple Lang argue that “the ability to express statistical computations is an essential skill.” A key related capacity is the ability to conduct and present data analysis in a way that another person can understand and replicate. The copy-and-paste workflow that is an artifact of antiquated user-interface design makes reproducibility of statistical analysis more difficult, especially as data become increasingly complex and statistical methods become increasingly sophisticated. R Markdown is a new technology that makes creating fully-reproducible statistical analysis simple and painless. It provides a solution suitable not only for cutting edge research, but also for use in an introductory statistics course. We present evidence that R Markdown can be used effectively in introductory statistics courses, and discuss its role in the rapidly-changing world of statistical computation. (emphasis in original)

The authors’ third point for R Markdown I would have made the first:

Third, the separation of computing from presentation is not necessarily honest… More subtly and less perniciously, the copy-and-paste paradigm enables, and in many cases even encourages, selective reporting. That is, the tabular output from R is admittedly not of presentation quality. Thus the student may be tempted or even encouraged to prettify tabular output before submitting. But while one is fiddling with margins and headers, it is all too tempting to remove rows or columns that do not suit the student’s purpose. Since the commands used to generate the table are not present, the reader is none the wiser.

Although I have to admit that reproducibility has a lot going for it.

Can you imagine reproducible analysis from the OMB? Complete with machine-readable data sets? Or from any other agency report? Or, for that matter, from all publications by registered lobbyists? That could be real interesting.
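
For anyone who has not seen one, here is a minimal sketch of an R Markdown file; the file name, chunk label and input data are made up for illustration.

````
---
title: "A reproducible summary (sketch)"
output: html_document
---

The table below is recomputed from the raw data every time this document is
knit, so the numbers and the code that produced them cannot drift apart.

```{r summary-table}
# "survey.csv" is a hypothetical input file, used only for illustration
dat <- read.csv("survey.csv")
knitr::kable(summary(dat))
```
````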

Open Analysis (OA) as a natural successor to Open Data.

That works for me.

You?

PS: More resources:

Create Dynamic R Statistical Reports Using R Markdown

R Markdown

Using R Markdown with RStudio

Writing papers using R Markdown

If journals started requiring R Markdown as a condition for publication, some aspects of research would become more transparent.

Some will say that authors will resist.

Assume Science or Nature has accepted your article on the condition that you use R Markdown.

Honestly, are you really going to say no?

I first saw this in a tweet by Scott Chamberlain.

OpenRFPs:…

Saturday, February 22nd, 2014

OpenRFPs: Open RFP Data for All 50 States by Clay Johnson.

From the post:

Tomorrow at CodeAcross we’ll be launching our first community-based project, OpenRFPs. The goal is to liberate the data inside of every state RFP listing website in the country. We hope you’ll find your own state’s RFP site, and contribute a parser.

The Department of Better Technology’s goal is to improve the way government works by making it easier for small, innovative businesses to provide great technology to government. But those businesses can barely make it through the front door when the RFPs themselves are stored in archaic systems, with sloppy user interfaces and disparate data formats, or locked behind paywalls.

I have posted to the announcement suggesting they use UBL. But in any event, mapping the semantics of RFPs to enable wider participation would make an interesting project.

I first saw this in a tweet by Tim O’Reilly.

Fiscal Year 2015 Budget (US) Open Government?

Friday, February 21st, 2014

Fiscal Year 2015 Budget

From the description:

Each year, the Office of Management and Budget (OMB) prepares the President’s proposed Federal Government budget for the upcoming Federal fiscal year, which includes the Administration’s budget priorities and proposed funding.

For Fiscal Year (FY) 2015– which runs from October 1, 2014, through September 30, 2015– OMB has produced the FY 2015 Federal Budget in four print volumes plus an all-in-one CD-ROM:

  1. the main “Budget” document with the Budget Message of the President, information on the President’s priorities and budget overviews by agency, and summary tables;
  2. “Analytical Perspectives” that contains analyses that are designed to highlight specified subject areas;
  3. “Historical Tables” that provides data on budget receipts, outlays, surpluses or deficits, Federal debt over a time period
  4. an “Appendix” with detailed information on individual Federal agency programs and appropriation accounts that constitute the budget.
  5. A CD-ROM version of the Budget is also available which contains all the FY 2015 budget documents in PDF format along with some additional supporting material in spreadsheet format.

You will also want a “Green Book,” the 2014 version carried this description:

Each February when the President releases his proposed Federal Budget for the following year, Treasury releases the General Explanations of the Administration’s Revenue Proposals. Known as the “Green Book” (or Greenbook), the document provides a concise explanation of each of the Administration’s Fiscal Year 2014 tax proposals for raising revenue for the Government. This annual document clearly recaps each proposed change, reviewing the provisions in the Current Law, outlining the Administration’s Reasons for Change to the law, and explaining the Proposal for the new law. Ideal for anyone wanting a clear summary of the Administration’s policies and proposed tax law changes.

Did I mention that the four volumes for the budget in print with CD-ROM are $250? And last year the Green Book was $75?

For $325.00, you can have print and PDF versions of the Budget plus a print copy of the Green Book.

Questions:

  1. Would machine readable versions of the Budget + Green Book make it easier to explore and compare the information within?
  2. Are PDFs and print volumes what President Obama considers to be “open government?”
  3. Who has the advantage in policy debates, the OMB and Treasury with machine readable versions of these documents or the average citizen who has the PDFs and print?
  4. Do you think OMB and Treasury didn’t get the memo? Open Data Policy-Managing Information as an Asset

Public policy debates cannot be fairly conducted without meaningful access to data on public policy issues.

Islamic Finance: A Quest for Publically Available Bank-level Data

Wednesday, February 12th, 2014

Islamic Finance: A Quest for Publically Available Bank-level Data by Amin Mohseni-Cheraghlou.

From the post:

Attend a seminar or read a report on Islamic finance and chances are you will come across a figure between $1 trillion and $1.6 trillion, referring to the estimated size of the global Islamic assets. While these aggregate global figures are frequently mentioned, publically available bank-level data have been much harder to come by.

Considering the rapid growth of Islamic finance, its growing popularity in both Muslim and non-Muslim countries, and its emerging role in global financial industry, especially after the recent global financial crisis, it is imperative to have up-to-date and reliable bank-level data on Islamic financial institutions from around the globe.

To date, there is a surprising lack of publically available, consistent and up-to-date data on the size of Islamic assets on a bank-by-bank basis. In fairness, some subscription-based datasets, such as Bureau Van Dijk’s Bankscope, do include annual financial data on some of the world’s leading Islamic financial institutions. Bank-level data are also compiled by The Banker’s Top Islamic Financial Institutions Report and Ernst & Young’s World Islamic Banking Competitiveness Report, but these are not publically available and require subscription premiums, making it difficult for many researchers and experts to access. As a result, data on Islamic financial institutions are associated with some level of opaqueness, creating obstacles and challenges for empirical research on Islamic finance.

The recent opening of the Global Center for Islamic Finance by World Bank Group President Jim Yong Kim may lead to exciting venues and opportunities for standardization, data collection, and empirical research on Islamic finance. In the meantime, the Global Financial Development Report (GFDR) team at the World Bank has also started to take some initial steps towards this end.

I can think of two immediate benefits from publicly available data on Islamic financial institutions:

First, hopefully it will increase demands for meaningful transparency in Western financial institutions.

Second, it will blunt government hand-waving and propaganda about the purposes of Islamic financial institutions, which, on a par with financial institutions everywhere, want to remain solvent, serve the needs of their customers and play active roles in their communities. Nothing more sinister than that.

Perhaps the best way to vanquish suspicion is with transparency. Except for the fringe cases who treat lack of evidence as proof of secret evil doing.

…Desperately Seeking Data Integration

Tuesday, January 21st, 2014

Why the US Government is Desperately Seeking Data Integration by David Linthicum.

From the post:

“When it comes to data, the U.S. federal government is a bit of a glutton. Federal agencies manage on average 209 million records, or approximately 8.4 billion records for the entire federal government, according to Steve O’Keeffe, founder of the government IT network site, MeriTalk.”

Check out these stats, in a December 2013 MeriTalk survey of 100 federal records and information management professionals. Among the findings:

  • Only 18 percent said their agency had made significant progress toward managing records and email in electronic format, and are ready to report.
  • One in five federal records management professionals say they are “completely prepared” to handle the growing volume of government records.
  • 92 percent say their agency “has a lot of work to do to meet the direction.”
  • 46 percent say they do not believe or are unsure about whether the deadlines are realistic and obtainable.
  • Three out of four say the Presidential Directive on Managing Government Records will enable “modern, high-quality records and information management.”

I’ve been working with the US government for years, and I can tell that these facts are pretty accurate. Indeed, the paper glut is killing productivity. Even the way they manage digital data needs a great deal of improvement.

I don’t doubt a word of David’s post. Do you?

What I do doubt is the ability of the government to integrate its data. At least unless and until it makes some fundamental choices about the route it will take to data integration.

First, replacement of existing information systems is a non-goal. Unless that is an a priori assumption, the politics, both on Capitol Hill and internal to any agency, program, etc., will doom a data integration effort before it begins.

The first non-goal means that the ROI of data integration must be high enough to be evident even with current systems in place.

Second, integration of the most difficult cases is not the initial target for any data integration project. It would be offensive to cite all the “boil the ocean” projects that have failed in Washington, D.C. Let’s just agree that judicious picking of high-value, reasonable-effort integration cases is a good proving ground.

Third, the targets of data integration and the costs of meeting them, along with expected ROI, will be agreed upon by all parties before any work starts. Avoidance of mission creep is essential to success. Not to mention that public goals and metrics will enable everyone to decide whether the goals have been met.

Fourth, employment of traditional vendors, unemployed programmers, geographically dispersed staff, etc., is also a non-goal of the project. With the money that can be saved by robust data integration, departments can feather their staffs as much as they like.

If you need proof of the fourth requirement, consider the various Apache projects that are now the underpinnings for “big data” in its many forms.

It is possible to solve the government’s data integration issues. But not without some hard choices being made up front about the project.

Sorry, forgot one:

Fifth, the project leader should seek a consensus among the relevant parties but ultimately has the authority to make decisions for the project. If every dispute can have one or more parties running to their supervisor or congressional backer, the project is doomed before it starts. The buck stops with the project manager and nowhere else.

Extracting Insights – FBO.Gov

Tuesday, January 21st, 2014

Extracting Insights from FBO.Gov data – Part 1

Extracting Insights from FBO.Gov data – Part 2

Extracting Insights from FBO.Gov data – Part 3

Dave Fauth has written a great three-part series on extracting “insights” from large amounts of data.

From the third post in the series:

Earlier this year, Sunlight foundation filed a lawsuit under the Freedom of Information Act. The lawsuit requested solicitation and award notices from FBO.gov. In November, Sunlight received over a decade’s worth of information and posted the information on-line for public downloading. I want to say a big thanks to Ginger McCall and Kaitlin Devine for the work that went into making this data available.

In the first part of this series, I looked at the data and munged the data into a workable set. Once I had the data in a workable set, I created some heatmap charts of the data looking at agencies and who they awarded contracts to. In part two of this series, I created some bubble charts looking at awards by Agency and also the most popular Awardees.

In the third part of the series, I am going to look at awards by date and then displaying that information in a calendar view. Then we will look at the types of awards.

For the date analysis, we are going to use all of the data going back to 2000. We have six data files that we will join together, filter on the ‘Notice Type’ field, and then calculate the counts by date for the awards. The goal is to see when awards are being made.
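
As a rough illustration of that kind of aggregation (not Dave’s actual code), a dplyr sketch might look like the following; the directory, file layout and the exact ‘Notice Type’ value are assumptions.

```r
library(readr)
library(dplyr)

# Rough sketch: stack the yearly FBO extracts, keep award notices,
# and count awards by posting date for a calendar view.
# File names, column names and the filter value are assumptions.
files  <- list.files("fbo_data", pattern = "\\.csv$", full.names = TRUE)
awards <- files %>%
  lapply(read_csv) %>%
  bind_rows() %>%
  filter(`Notice Type` == "Award") %>%
  count(`Posted Date`, name = "awards_made")

head(awards)
```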

The most compelling lesson from this series is that data doesn’t always easily give up its secrets.

If you make it to the end of the series, you will find the government, on occasion, does the right thing. I’ll admit it, I was very surprised. ;-)

Medicare Spending Data…

Sunday, January 19th, 2014

Medicare Spending Data May Be Publicly Available Under New Policy by Gavin Baker.

From the post:

On Jan. 14, the Centers for Medicare & Medicaid Services (CMS) announced a new policy that could bring greater transparency to Medicare, one of the largest programs in the federal government. CMS revoked its long-standing policy not to release publicly any information about Medicare’s payments to doctors. Under the new policy, the agency will evaluate requests for such information on a case-by-case basis. Although the impact of the change is not yet clear, it creates an opportunity for a welcome step forward for data transparency and open government.

Medicare’s tremendous size and impact – expending an estimated $551 billion and covering roughly 50 million beneficiaries in 2012 – mean that increased transparency in the program could have big effects. Better access to Medicare spending data could permit consumers to evaluate doctor quality, allow journalists to identify waste or fraud, and encourage providers to improve health care delivery.

Until now, the public hasn’t been able to learn how much Medicare pays to particular medical businesses. In 1979, a court blocked Medicare from releasing such information after doctors fought to keep it secret. However, the court lifted the injunction in May 2013, freeing CMS to consider whether to release the data.

In turn, CMS asked for public comments about what it should do and received more than 130 responses. The Center for Effective Government was among the organizations that filed comments, calling for more transparency in Medicare spending and urging CMS to revoke its previous policy implementing the injunction. After considering those comments, CMS adopted its new policy.

The change may allow the public to examine the reimbursement amounts paid to medical providers under Medicare. Under the new approach, CMS will not release those records wholesale. Instead, the agency will wait for specific requests for the data and then evaluate each to consider if disclosure would invade personal privacy. While information about patients is clearly off-limits, it’s not clear what kind of information about doctors CMS will consider private, so it remains to be seen how much information is ultimately disclosed under the new policy. It should be noted, however, that the U.S. Supreme Court has held that businesses don’t have “personal privacy” under the Freedom of Information Act (FOIA), and the government already discloses the amounts it pays to other government contractors.

The announcement from CMS: Modified Policy on Freedom of Information Act Disclosure of Amounts Paid to Individual Physicians under the Medicare Program

The case-by-case determination of a physician’s privacy rights is an attempt to discourage requests for public information.

If all physician payment data, say by procedure, were available in state-by-state data sets, local residents in a town of 500 would know that 2,000 x-rays a year is on the high side. Without ever knowing any patient’s identity.

If you are a U.S. resident, take this opportunity to push for greater transparency in Medicare spending. Be polite and courteous but also be persistent. You need no more reason than an interest in how Medicare is being spent.

Let’s have an FOIA (Freedom of Information Act) request pending for every physician in the United States within 90 days of the CMS rule becoming final.

It’s not final yet, but when it is, let slip the leash on the dogs of FOIA.

Data Analytic Recidivism Tool (DART) [DAFT?]

Sunday, December 29th, 2013

Data Analytic Recidivism Tool (DART)

From the website:

The Data Analytic Recidivism Tool (DART) helps answer questions about recidivism in New York City.

  • Are people that commit a certain type of crime more likely to be re-arrested?
  • What about people in a certain age group or those with prior convictions?

DART lets users look at recidivism rates for selected groups defined by characteristics of defendants and their cases.

A direct link to the DART homepage.

After looking at the interface, which groups recidivists in groups of 250, I’m not sure DART is all that useful.

It did spark an idea that might help with the federal government’s acquisition problems.

Why not create the equivalent of DART but call it:

Data Analytic Failure Tool (DAFT).

And in DAFT track federal contractors, their principals, contracts, and the program officers who play any role in those contracts.

So that when contractors fail, as so many of them do, it will be easy to track the individuals involved on both sides of the failure.

And every contract will have a preamble that recites any prior history of failure and the people involved in that failure, on all sides.

Such that any subsequent supervisor has to sign off with full knowledge of the prior lack of performance.

If criminal recidivism is to be avoided, shouldn’t failure recidivism be avoided as well?

Discover Your Neighborhood with Census Explorer

Wednesday, December 25th, 2013

Discover Your Neighborhood with Census Explorer by Michael Ratcliffe.

From the post:

Our customers often want to explore neighborhood-level statistics and see how their communities have changed over time. Our new Census Explorer interactive mapping tool makes this easier than ever. It provides statistics on a variety of topics, such as percent of people who are foreign-born, educational attainment and homeownership rate. Statistics from the 2008 to 2012 American Community Survey power Census Explorer.

While you may be familiar with other ways to find neighborhood-level statistics, Census Explorer provides an interactive map for states, counties and census tracts. You can even look at how these neighborhoods have changed over time because the tool includes information from the 1990 and 2000 censuses in addition to the latest American Community Survey statistics. Seeing these changes is possible because the annual American Community Survey replaced the decennial census long form, giving communities throughout the nation more timely information than just once every 10 years.

Topics currently available in Census Explorer:

  • Total population
  • Percent 65 and older
  • Foreign-born population percentage
  • Percent of the population with a high school degree or higher
  • Percent with a bachelor’s degree or higher
  • Labor force participation rate
  • Home ownership rate
  • Median household income

Fairly coarse (census tract level) data but should be useful for any number of planning purposes.

For example, you could cross this data with traffic ticket and arrest data to derive “police presence” statistics.
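
Mechanically, that kind of cross is just a join on the tract identifier. A hedged sketch in R, with made-up file and column names:

```r
library(readr)
library(dplyr)

# Hypothetical inputs, for illustration only: a tract-level ACS extract and
# geocoded traffic-ticket counts. File and column names are made up.
acs     <- read_csv("acs_tracts.csv")        # tract_id, total_population, ...
tickets <- read_csv("tickets_by_tract.csv")  # tract_id, ticket_count

police_presence <- acs %>%
  inner_join(tickets, by = "tract_id") %>%
  mutate(tickets_per_1000 = 1000 * ticket_count / total_population) %>%
  arrange(desc(tickets_per_1000))
```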

Or add “citizen watcher” data from tweets about police car numbers and locations.

Different data sets often use different boundaries for areas.

Consider creating topic map-based filters so that when the boundaries change (a favorite activity of local governments), your summaries of that data change with them.

…2013 World Ocean Database…

Sunday, December 22nd, 2013

NOAA releases 2013 World Ocean Database: The largest collection of scientific information about the oceans

From the post:

NOAA has released the 2013 World Ocean Database, the largest, most comprehensive collection of scientific information about the oceans, with records dating as far back as 1772. The 2013 database updates the 2009 version and contains nearly 13 million temperature profiles, compared with 9.1 in the 2009 database, and just fewer than six million salinity measurements, compared with 3.5 in the previous database. It integrates ocean profile data from approximately 90 countries around the world, collected from buoys, ships, gliders, and other instruments used to measure the “pulse” of the ocean.

Profile data of the ocean are measurements taken at many depths, from the surface to the floor, at a single location, during the time it takes to lower and raise the measuring instruments through the water. “This product is a powerful tool being used by scientists around the globe to study how changes in the ocean can impact weather and climate,” said Tim Boyer, an oceanographer with NOAA’s National Oceanographic Data Center.

In addition to using the vast amount of temperature and salinity measurements to monitor changes in heat and salt content, the database captures other measurements, including: oxygen, nutrients, chlorofluorocarbons and chlorophyll, which all reveal the oceans’ biological structure.

For the details on this dataset see: WOD Introduction.

The introduction notes under 1.1.5 Data Fusion:

It is not uncommon in oceanography that measurements of different variables made from the same sea water samples are often maintained as separate databases by different principal investigators. In fact, data from the same oceanographic cast may be located at different institutions in different countries. From its inception, NODC recognized the importance of building oceanographic databases in which as much data from each station and each cruise as possible are placed into standard formats, accompanied by appropriate metadata that make the data useful to future generations of scientists. It was the existence of such databases that allowed the International Indian Ocean Expedition Atlas (Wyrtki, 1971) and Climatological Atlas of the World Ocean (Levitus, 1982) to be produced without the time-consuming, laborious task of gathering data from many different sources. Part of the development of WOD13 has been to expand this data fusion activity by increasing the number of variables that NODC/WDC makes available as part of standardized databases.

As the NODC (National Oceanographic Data Center) demonstrates, it is possible to curate data sources in order to present a uniform data collection.

But a curated data set remains inconsistent with data sets not curated by the same authority.

And combining curated data with non-curated data means repeating the curation effort, this time over the combined collection.

Hard to map towards a destination without knowing its location.

Topic maps can capture the basis for curation, which will enable faster and more accurate integration of foreign data sets in the future.
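
One way to act on that is to record the basis for curation as data in its own right, so the same rules can be re-applied to the next foreign data set instead of being reconstructed by hand. A minimal sketch follows; the variable names, units, and conversions are illustrative assumptions, not the actual WOD13 curation rules.

```python
# A minimal sketch of recording curation rules as data. The source column
# names, canonical names, and conversions below are illustrative assumptions.
crosswalk = {
    # source column     -> (canonical name, unit conversion)
    "TEMP_DEG_F":          ("sea_water_temperature_c", lambda f: (f - 32) * 5 / 9),
    "sal_psu":             ("salinity_psu",            lambda s: s),
    "oxygen_ml_per_l":     ("dissolved_oxygen_ml_l",   lambda o: o),
}

def curate(record: dict) -> dict:
    """Apply the recorded mapping to one foreign record."""
    out = {}
    for src_name, value in record.items():
        if src_name in crosswalk:
            canonical, convert = crosswalk[src_name]
            out[canonical] = convert(value)
        else:
            out[src_name] = value  # pass unknown fields through for later review
    return out

print(curate({"TEMP_DEG_F": 68.0, "sal_psu": 35.1}))
```

Because the mapping itself is data, it can be published alongside the merged collection as documentation of how the fusion was done, and re-used when the next uncurated data set arrives.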

UNESCO Open Access Publications [Update]

Thursday, December 19th, 2013

UNESCO Open Access Publications

From the webpage:

Building peaceful, democratic and inclusive knowledge societies across the world is at the heart of UNESCO’s mandate. Universal access to information is one of the fundamental conditions to achieve global knowledge societies. This condition is not a reality in all regions of the world.

In order to help reduce the gap between industrialized countries and those in the emerging economy, UNESCO has decided to adopt an Open Access Policy for its publications by making use of a new dimension of knowledge sharing – Open Access.

Open Access means free access to scientific information and unrestricted use of electronic data for everyone. With Open Access, expensive prices and copyrights will no longer be obstacles to the dissemination of knowledge. Everyone is free to add information, modify contents, translate texts into other languages, and disseminate an entire electronic publication.

For UNESCO, adopting an Open Access Policy means to make thousands of its publications freely available to the public. Furthermore, Open Access is also a way to provide the public with an insight into the work of the Organization so that everyone is able to discover and share what UNESCO is doing.

You can access and use our resources for free by clicking here.

In May of 2013 UNESCO announced its Open Access policy.

Many organizations profess a belief in “Open Access.”

The real test is whether they practice “Open Access.”

DataViva

Thursday, December 19th, 2013

DataViva

I don’t know enough about the Brazilian economy to say if the visualizations are helpful or not.

What I can tell you is the visualizations are impressive!

Thoughts on the site as an interface to open data?

PS: This appears to be a government-supported website, so not all government-sponsored websites are poor performers.

Aberdeen – 1398 to Present

Sunday, December 15th, 2013

A Text Analytic Approach to Rural and Urban Legal Histories

From the post:

Aberdeen has the earliest and most complete body of surviving records of any Scottish town, running in near-unbroken sequence from 1398 to the present day. Our central focus is on the ‘provincial town’, especially its articulations and interactions with surrounding rural communities, infrastructure and natural resources. In this multi-disciplinary project, we apply text analytical tools to digitised Aberdeen Burgh Records, which are a UNESCO listed cultural artifact. The meaningful content of the Records is linguistically obscured, so must be interpreted. Moreover, to extract and reuse the content with Semantic Web and linked data technologies, it must be machine readable and richly annotated. To accomplish this, we develop a text analytic tool that specifically relates to the language, content, and structure of the Records. The result is an accessible, flexible, and essential precursor to the development of Semantic Web and linked data applications related to the Records. The applications will exploit the artifact to promote Aberdeen Burgh and Shire cultural tourism, curriculum development, and scholarship.

The scholarly objective of this project is to develop the analytic framework, methods, and resource materials to apply a text analytic tool to annotate and access the content of the Burgh records. Amongst the text analytic issues to address in historical perspective are: the identification and analysis of legal entities, events, and roles; and the analysis of legal argumentation and reasoning. Amongst the legal historical issues are: the political and legal culture and authority in the Burgh and Shire, particularly pertaining to the management and use of natural resources. Having an understanding of these issues and being able to access them using Semantic Web/linked data technologies will then facilitate exploitation in applications.

This project complements a distinct, existing collaboration between the Aberdeen City & Aberdeenshire Archives (ACAA) and the University (Connecting and Projecting Aberdeen’s Burgh Records, jointly led by Andrew Mackillop and Jackson Armstrong) (the RIISS Project), which will both make a contribution to the project (see details on application form). This multi-disciplinary application seeks funding from Dot.Rural chiefly for the time of two specialist researchers: a Research Fellow to interpret the multiple languages, handwriting scripts, archaic conventions, and conceptual categories emerging from these records; and subcontracting the A-I to carry out the text analytic and linked data tasks on a given corpus of previously transcribed council records, taking the RF’s interpretation as input.

Now there’s a project for tracking changing semantics over the hills and valleys of time!

Will be interesting to see how they capture semantics that are alien to our own.

Or how they preserve relationships between ancient semantic concepts.
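
One approach is to keep period-scoped senses side by side, so a term is resolved against the date of the record it appears in rather than against present-day usage. A minimal sketch, in which the term, senses, and date ranges are illustrative assumptions only:

```python
# A minimal sketch of period-scoped senses. The term, its senses, and the
# date ranges are illustrative assumptions, not claims about the Burgh Records.
from dataclasses import dataclass

@dataclass
class Sense:
    meaning: str
    valid_from: int  # first year the sense applies
    valid_to: int    # last year the sense applies (inclusive)

senses = {
    "burgess": [
        Sense("enfranchised inhabitant of a burgh", 1398, 1832),
        Sense("honorific civic title", 1833, 2100),
    ],
}

def resolve(term: str, year: int) -> str:
    """Return the sense of `term` in force for a record dated `year`."""
    for s in senses.get(term, []):
        if s.valid_from <= year <= s.valid_to:
            return s.meaning
    return "unknown sense"

print(resolve("burgess", 1425))
```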

Requesting Datasets from the Federal Government

Friday, December 13th, 2013

Requesting Datasets from the Federal Government by Eruditio Loginquitas.

From the post:

Much has been made of “open government” of late, with the U.S.’s federal government releasing tens of thousands of data sets from pretty much all public-facing offices. Many of these sets are available off of their respective websites. Many are offered in a centralized way at DATA.gov. I finally spent some time on this site in search of datasets with location data to continue my learning of Tableau Public (with an eventual planned move to ArcMap).

I’ve been appreciating how much data are required to govern effectively but also how much data are created in the work of governance, particularly in an open and transparent society. There are literally billions of records and metrics required to run an efficient modern government. In a democracy, the tendency is to make information available—through sunshine laws and open meetings laws and data requests. The openness is particularly pronounced in cases of citizen participation, academic research, and journalistic requests. These are all aspects of a healthy interchange between citizens and their government…and further, digital government.

Public Requests for Data

One of the more charming aspects of the site involves a public thread which enables people to make requests for the creation of certain data sets by developers. People would make the case for the need for certain information. Some would offer “trades” by making promises about how they would use the data and what they would make available to the larger public. Others would simply make a request for the data. Still others would just post “requests,” which were actually just political or personal statements. (The requests site may be viewed here: https://explore.data.gov/nominate?&page=1 .)

What datasets would you like to see?

The rejected requests can be interesting, for example:

Properties Owned by Congressional Members Rejected

Congressional voting records Rejected

I don’t think the government has detailed information sufficient to answer the one about property owned by members of Congress.

On the other hand, there are only 535 members, so manual data mining in each state should turn up most of the public information fairly easily. The non-public information could be more difficult to find.

The voting records request is puzzling since those votes are already a matter of public record. And various rant groups print up their own analyses of voting records.
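
To underline how accessible those records already are, here is a minimal sketch that pulls a single House roll-call vote and tallies it. The URL pattern (clerk.house.gov/evs/<year>/roll<nnn>.xml) and the XML element names are assumptions based on how the House Clerk has published roll-call votes, so verify both before relying on them.

```python
# A minimal sketch: fetch one public roll-call vote and tally the positions.
# The URL pattern and element names are assumptions -- check clerk.house.gov.
from collections import Counter
from urllib.request import urlopen
import xml.etree.ElementTree as ET

url = "https://clerk.house.gov/evs/2013/roll001.xml"  # assumed URL pattern
with urlopen(url) as resp:
    tree = ET.parse(resp)

# Each <recorded-vote> element is assumed to carry a legislator and a vote value.
tally = Counter(rv.findtext("vote") for rv in tree.iter("recorded-vote"))
print(tally)  # e.g. Counter({'Yea': ..., 'Nay': ..., 'Not Voting': ...})
```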

Given the number of requests “Under Review,” I don’t know if it would be a good use of time, but requesting the data behind opaque reports might illuminate the areas being hidden from transparency.