Archive for the ‘Journalism’ Category

Computational Journalism

Monday, February 25th, 2013

Computational Journalism by Jonathan Stray.

From the webpage:

Maybe it’s not obvious that computer science and journalism go together, but they do!

Computational journalism combines classic journalistic values of storytelling and public accountability with techniques from computer science, statistics, the social sciences, and the digital humanities.

This course, given at the University of Hong Kong during January-February 2013, is an advanced look at how techniques from visualization, natural language processing, social network analysis, statistics, and cryptography apply to four different areas of journalism: finding stories through data mining, communicating what you’ve learned, filtering an overwhelming volume of information, and tracking the spread of information and effects.

The course assumes knowledge of computer science, including standard algorithms and linear algebra. The assignments are in Python and require programming experience. But this introductory video, which explains the topics covered, is for everyone.

For more, see the syllabus, or jump directly to a lecture:

  1. Basics. Feature vectors, clustering, projections.
  2. Text analysis. Tokenization, TF-IDF, topic modeling.
  3. Algorithmic filters. Information overload. Newsblaster and Google News.
  4. Hybrid filters. Social networks as filters. Collaborative Filtering.
  5. Social network analysis. Using it in journalism. Centrality algorithms.
  6. Knowledge representation. Structured data. Linked open data. General Q&A.
  7. Drawing conclusions. Randomness. Competing hypotheses. Causation.
  8. Security, surveillance, and privacy. Cryptography. Threat modeling.

CS knowledge and programming experience still required.

Interfaces will lessen that need over time, but that knowledge and experience will help you question interfaces when they give odd results.

I would settle for journalists who question reports, like the Mandiant advertisement on cybersecurity last week. (Crowdsourcing Cybersecurity: A Proposal (Part 1))

Even the talking heads on the PBS Sunday morning news treated it as serious content. It was poorly written/researched ad copy, nothing more.

Of course, you would have to read the first couple of pages to discover that, not just skim the press release.

I first saw this at Christophe Lalanne’s A bag of tweets / February 2013.

Finding tools vs. making tools:…

Sunday, February 17th, 2013

Finding tools vs. making tools: Discovering common ground between computer science and journalism by Nick Diakopoulos.

From the post:

The second Computation + Journalism Symposium convened recently at the Georgia Tech College of Computing to ask the broad question: What role does computation have in the practice of journalism today and in the near future? (I was one of its organizers.) The symposium attracted almost 150 participants, both technologists and journalists, to discuss and debate the issues and to forge a multi-disciplinary path forward around that question.

Topics for panels covered the gamut, from precision and data journalism, to verification of visual content, news dissemination on social media, sports and health beats, storytelling with data, longform interfaces, the new economic landscape of content, and the educational needs of aspiring journalists. But what made these sessions and topics really pop was that participants on both sides of the computation and journalism aisle met each other in a conversational format where intersections and differences in the ways they viewed these topics could be teased apart through dialogue. (Videos of the sessions are online.)

While the panelists were all too civilized for any brawls to break out, mixing two disciplines as different as computing and journalism nonetheless did lead to some interesting discussions, divergences, and opportunities that I’d like to explore further here. Keeping these issues top-of-mind should help as this field moves forward.

Tool foragers and tool forgers

The following metaphor is not meant to be incendiary, but rather to illuminate two different approaches to tool innovation that seemed apparent at the symposium.

Imagine you live about 10,000 years ago, on the cusp of the Neolithic Revolution. The invention of agriculture is just around the corner. It’s spring and you’re hungry after the long winter. You can start scrounging around for berries and other tasty roots to feed you and your family — or you can stop and try to invent some agricultural implements, tools adapted to your own local crops and soil that could lead to an era of prosperity. If you take the inventive approach, you might fail, and there’s a real chance you’ll starve trying — while foraging will likely guarantee you another year of subsistence life.

What role does computation have in your field of practice?

Simon Rogers

Wednesday, February 6th, 2013

Simon Rogers

From the “about” page:

Simon Rogers is editor of guardian.co.uk/data, an online data resource which publishes hundreds of raw datasets and encourages its users to visualise and analyse them – and probably the world’s most popular data journalism website.

He is also a news editor on the Guardian, working with the graphics team to visualise and interpret huge datasets.

He was closely involved in the Guardian’s exercise to crowdsource 450,000 MP expenses records and the organisation’s coverage of the Afghanistan and Iraq Wikileaks war logs. He was also a key part of the Reading the Riots team which investigated the causes of the 2011 England disturbances.

Previously he was the launch editor of the Guardian’s online news service and has edited the paper’s science section. He has edited three Guardian books, including How Slow Can You Waterski and The Hutton Inquiry and its impact.

If you are interested in “data journalism,” data mining or visualization, Simon’s site is one of the first to bookmark.

Crowdsourcing campaign spending: …

Thursday, December 13th, 2012

Crowdsourcing campaign spending: What ProPublica learned from Free the Files by Amanda Zamora.

From the post:

This fall, ProPublica set out to Free the Files, enlisting our readers to help us review political ad files logged with the Federal Communications Commission. Our goal was to take thousands of hard-to-parse documents and make them useful, helping to reveal hidden spending in the election.

Nearly 1,000 people pored over the files, logging detailed ad spending data to create a public database that otherwise wouldn’t exist. We logged as much as $1 billion in political ad buys, and a month after the election, people are still reviewing documents. So what made Free the Files work?

A quick backstory: Free the Files actually began last spring as an effort to enlist volunteers to visit local TV stations and request access to the “public inspection file.” Stations had long been required to keep detailed records of political ad buys, but they were only available on paper and required actually traveling to the station.

In August, the FCC ordered stations in the top 50 markets to begin posting the documents online. Finally, we would be able to access a stream of political ad data based on the files. Right?

Wrong. It turns out the FCC didn’t require stations to submit the data in anything that approaches an organized, standardized format. The result was that stations sent in a jumble of difficult-to-search PDF files. So we decided if the FCC or stations wouldn’t organize the information, we would.

Enter Free the Files 2.0. Our intention was to build an app to help translate the mishmash of files into structured data about the ad buys, ultimately letting voters sort the files by market, contract amount and candidate or political group (which isn’t possible on the FCC’s web site), and to do it with the help of volunteers.

In the end, Free the Files succeeded in large part because it leveraged data and community tools toward a single goal. We’ve compiled a bit of what we’ve learned about crowdsourcing and a few ideas on how news organizations can adapt a Free the Files model for their own projects.

The team who worked on Free the Files included Amanda Zamora, engagement editor; Justin Elliott, reporter; Scott Klein, news applications editor; Al Shaw, news applications developer, and Jeremy Merrill, also a news applications developer. And thanks to Daniel Victor and Blair Hickman for helping create the building blocks of the Free the Files community.

The entire story is golden but a couple of parts shine brighter for me than the others.

Design consideration:

The success of Free the Files hinged in large part on the design of our app. The easier we made it for people to review and annotate documents, the higher the participation rate, the more data we could make available to everyone. Our maxim was to make the process of reviewing documents like eating a potato chip: “Once you start, you can’t stop.”

Let me re-say that: The easier it is for users to author topic maps, the more topic maps they will author.

Yes?

Semantic Diversity:

But despite all of this, we still can’t get an accurate count of the money spent. The FCC’s data is just too dirty. For example, TV stations can file multiple versions of a single contract with contradictory spending amounts — and multiple ad buys with the same contract number means radically different things to different stations. But the problem goes deeper. Different stations use wildly different contract page designs, structure deals in idiosyncratic ways, and even refer to candidates and groups differently.

All true, but knowing ahead of time that the semantics vary from station to station, why not map the semantics in each market in advance?

Granted, I second their request that the FCC require standardized data, but having standardized blocks doesn’t mean the information has the same semantics.

The OMB can’t keep the same semantics for a handful of terms in one document.

What chance is there with dozens and dozens of players in multiple documents?
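To make the "map the semantics ahead of time" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the station names, buyer labels, canonical identifiers, record fields, and the helpers canonical_buyer and summarize are all made up, and none of it reflects how ProPublica's app or the FCC's files actually work. It shows per-station alias tables resolving different labels to one subject, contract numbers being scoped to the station that issued them, and contradictory filings being flagged instead of summed.

```python
from collections import defaultdict

# Hypothetical alias table, compiled per station before any files are reviewed.
# Key: (station, buyer label as that station writes it) -> canonical buyer id.
ALIASES = {
    ("STATION_A", "SMITH FOR SENATE"): "smith-for-senate",
    ("STATION_A", "SFS"): "smith-for-senate",
    ("STATION_B", "Smith, J. (Senate)"): "smith-for-senate",
    ("STATION_B", "Jones for Gov."): "jones-for-governor",
}


def canonical_buyer(station, label):
    """Return the canonical buyer id, or None if this station's label is unmapped."""
    return ALIASES.get((station, label.strip()))


def summarize(records):
    """Group filed amounts by (station, contract number, canonical buyer).

    The station is part of the key because the same contract number can
    mean different things at different stations.

    Returns (totals, conflicts, unmapped):
      totals    - key -> amount, only where every filed version agrees
      conflicts - key -> sorted list of contradictory amounts
      unmapped  - records whose buyer label has no mapping yet
    """
    amounts = defaultdict(set)
    unmapped = []
    for rec in records:
        buyer = canonical_buyer(rec["station"], rec["buyer"])
        if buyer is None:
            unmapped.append(rec)  # needs a human to extend the alias table
            continue
        amounts[(rec["station"], rec["contract"], buyer)].add(rec["amount"])
    totals = {k: next(iter(v)) for k, v in amounts.items() if len(v) == 1}
    conflicts = {k: sorted(v) for k, v in amounts.items() if len(v) > 1}
    return totals, conflicts, unmapped


# Made-up sample records, not real FCC data.
records = [
    {"station": "STATION_A", "contract": "1001", "buyer": "SFS", "amount": 5000},
    {"station": "STATION_A", "contract": "1001", "buyer": "SMITH FOR SENATE", "amount": 7500},
    {"station": "STATION_B", "contract": "1001", "buyer": "Jones for Gov.", "amount": 3000},
    {"station": "STATION_B", "contract": "2002", "buyer": "Unknown PAC", "amount": 9000},
]

totals, conflicts, unmapped = summarize(records)
print("agreed:", totals)
print("contradictory:", conflicts)
print("unmapped:", unmapped)
```

The design choice worth noticing is that nothing is forced into agreement: contradictory versions of a contract and unmapped labels are reported separately, so a person can decide what they mean for that particular station.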

A new framework for innovation in journalism: How a computer scientist would do it

Tuesday, April 10th, 2012

A new framework for innovation in journalism: How a computer scientist would do it

Andrew Phelps writes:

What if journalism were invented today? How would a computer scientist go about building it, improving it, iterating it?

He might start by mapping out some fundamental questions: What are the project’s values and goals? What consumer needs would it satisfy? How much should be automated, how much human-powered? How could it be designed to be as efficient as possible?

Computer science Ph.D. Nick Diakopoulos has attempted to create a new framework for innovation in journalism. His new white paper, commissioned by CUNY’s Tow-Knight Center for Entrepreneurial Journalism, does not provide answers so much as a different way to come up with questions.

Diakopoulos identified 27 computing concepts that could apply to journalism — think natural language processing, machine learning, game engines, virtual reality, information visualization — and pored over thousands of research papers to determine which topics get the most (and least) attention. (There are untapped opportunities in robotics, augmented reality, and motion capture, it turns out.)

He thinks computer science and journalism have a lot in common, actually. They are both fundamentally concerned with information. Acquiring it, storing it, modifying it, presenting it.

Suggest you read his paper in full: Cultivating the Landscape of Innovation in Computational Journalism.

Intrigued by the idea of gauging the opportunities along a continuum of activities. Could be a stunning visual of how subject identity is handled across activities and/or technologies.

Interested?