Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 13, 2012

Crowdsourcing campaign spending: …

Filed under: Crowd Sourcing,Government Data,Journalism — Patrick Durusau @ 3:43 pm

Crowdsourcing campaign spending: What ProPublica learned from Free the Files by Amanda Zamora.

From the post:

This fall, ProPublica set out to Free the Files, enlisting our readers to help us review political ad files logged with Federal Communications Commission. Our goal was to take thousands of hard-to-parse documents and make them useful, helping to reveal hidden spending in the election.

Nearly 1,000 people pored over the files, logging detailed ad spending data to create a public database that otherwise wouldn’t exist. We logged as much as $1 billion in political ad buys, and a month after the election, people are still reviewing documents. So what made Free the Files work?

A quick backstory: Free the Files actually began last spring as an effort to enlist volunteers to visit local TV stations and request access to the “public inspection file.” Stations had long been required to keep detailed records of political ad buys, but they were only available on paper and required actually traveling to the station.

In August, the FCC ordered stations in the top 50 markets to begin posting the documents online. Finally, we would be able to access a stream of political ad data based on the files. Right?

Wrong. It turns out the FCC didn’t require stations to submit the data in anything that approaches an organized, standardized format. The result was that stations sent in a jumble of difficult to search PDF files. So we decided if the FCC or stations wouldn’t organize the information, we would.

Enter Free the Files 2.0. Our intention was to build an app to help translate the mishmash of files into structured data about the ad buys, ultimately letting voters sort the files by market, contract amount and candidate or political group (which isn’t possible on the FCC’s web site), and to do it with the help of volunteers.

In the end, Free the Files succeeded in large part because it leveraged data and community tools toward a single goal. We’ve compiled a bit of what we’ve learned about crowdsourcing and a few ideas on how news organizations can adapt a Free the Files model for their own projects.

The team who worked on Free the Files included Amanda Zamora, engagement editor; Justin Elliott, reporter; Scott Klein, news applications editor; Al Shaw, news applications developer, and Jeremy Merrill, also a news applications developer. And thanks to Daniel Victor and Blair Hickman for helping create the building blocks of the Free the Files community.

The entire story is golden, but a couple of parts shine brighter for me than the others.

Design consideration:

The success of Free the Files hinged in large part on the design of our app. The easier we made it for people to review and annotate documents, the higher the participation rate, the more data we could make available to everyone. Our maxim was to make the process of reviewing documents like eating a potato chip: “Once you start, you can’t stop.”

Let me restate that for topic maps: The easier it is for users to author topic maps, the more topic maps they will author.

Yes?

Semantic Diversity:

But despite all of this, we still can’t get an accurate count of the money spent. The FCC’s data is just too dirty. For example, TV stations can file multiple versions of a single contract with contradictory spending amounts — and multiple ad buys with the same contract number means radically different things to different stations. But the problem goes deeper. Different stations use wildly different contract page designs, structure deals in idiosyncratic ways, and even refer to candidates and groups differently.

All true. But if we know ahead of time that the semantics vary from station to station, why not map the semantics in each market before the files arrive?
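To make "map the semantics ahead of time" a little more concrete, here is a minimal Python sketch. Everything in it is an assumption of mine for illustration: the station call signs, the group names, the amounts, and the function names are hypothetical, and none of it reflects how the Free the Files app actually works. It shows one small piece of the idea: a per-station table that resolves each station’s local name for a group to a single subject identifier, so that totals can be merged across stations while unmapped names are set aside for review rather than guessed at.

```python
from collections import defaultdict

# Hypothetical mapping, built ahead of time per station:
# (station call sign, name as that station writes it) -> one subject identifier.
STATION_NAME_MAP = {
    ("WAAA", "CITIZENS FOR EXAMPLE"): "citizens-for-example",
    ("WBBB", "CFE"): "citizens-for-example",
    ("WBBB", "Example Victory Fund"): "example-victory-fund",
}


def canonical_subject(station, local_name):
    """Return the subject identifier for a station's local name, or None if unmapped."""
    return STATION_NAME_MAP.get((station, local_name.strip()))


def total_by_subject(records):
    """Sum amounts per subject; collect (station, name) pairs that have no mapping.

    `records` is an iterable of (station, local_name, amount) tuples.
    """
    totals = defaultdict(int)
    unmapped = []
    for station, local_name, amount in records:
        subject = canonical_subject(station, local_name)
        if subject is None:
            # Do not guess: leave unknown names for a human to map.
            unmapped.append((station, local_name))
        else:
            totals[subject] += amount
    return dict(totals), unmapped


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [
        ("WAAA", "CITIZENS FOR EXAMPLE", 1000),
        ("WBBB", "CFE", 2500),
        ("WBBB", "Unknown Group", 300),
    ]
    totals, unmapped = total_by_subject(records)
    print(totals)
    print(unmapped)
```

The key is the pair (station, name), not the name alone, because the same string can mean different things at different stations. The same approach would extend to contract numbers and page layouts, each with its own per-station table.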

Granted, I second their request to the FCC for standardized data, but having standardized blocks doesn’t mean the information in them has the same semantics.

The OMB can’t keep the same semantics for a handful of terms in one document.

What chance is there with dozens and dozens of players in multiple documents?
