Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 14, 2015

The Forgotten V of Data – Verification

Filed under: Journalism,News,Reporting — Patrick Durusau @ 6:40 am

Tools for verifying and assessing the validity of social media and user-generated content by Josh Stearns.

From the post:

“Interesting if true” is the old line about some tidbit of unverified news. Recast as “Whoa, if true” for the Twitter age, it allows people to pass on rumors without having to perform even the most basic fact-checking — the equivalent of a whisper over a quick lunch. Working journalists don’t have such luxuries, however, even with the continuous deadlines of a much larger and more competitive media landscape. A cautionary tale was the February 2015 report of the death of billionaire Martin Bouygues, head of a French media conglomerate. The news was instantly echoed across the Web, only to be swiftly retracted: The mayor of the village next to Bouygues’s hometown said that “Martin” had died. Alas, it was the wrong one.

The issue has become even knottier in the era of collaborative journalism, when nonprofessional reporting and images can be included in mainstream coverage. The information can be crucial — but it also can be wrong, and even intentionally faked. For example, two European publications, Bild and Paris Match, said they had seen a video purportedly shot within the Germanwings flight that crashed in March 2015, but doubts about such a video’s authenticity have grown. (Of course, there is a long history of image tampering, and news organizations have been culpable year after year of running — and even producing — manipulated images.)

The speed of social media and the sheer volume of user-generated content make fact-checking by reporters even more important now. Thankfully, a wide variety of digital tools have been developed to help journalists check facts quickly. This post was adapted from VerificationJunkie, a directory of tools for assessing the validity of social-media and user-generated content. The author is Josh Stearns, director of the journalism sustainability project at the Geraldine R. Dodge Foundation.

The big data crowd added veracity as a fourth V some time ago, but veracity isn’t the same thing as verification. Veracity is a question of how much credit you give the data. Verification is the process of determining the veracity of the data. Different activities with different tools.

Josh also maintains Verification Junkie, of which this post is a quick summary.

Don’t limit verification to social media only. Whatever the source, check the “facts” that it claims. You may be surprised.

March 17, 2015

…unique radicalization process now taking place in the digital era…

Filed under: News,Politics,Reporting — Patrick Durusau @ 12:22 pm

Social and news media, violent extremism, ISIS and online speech: Research review

The Journalist’s Resource is produced by Harvard’s Shorenstein Center on Media, Politics and Public Policy and is a must read site if you are a journalist or if you are interested in high quality background on current news stories. Having said that, you need to read the background materials themselves, in addition to the summary given by these reports.

It is often claimed, but always absent evidence, that the social media campaigns of ISIS and other terrorist groups are “successful” and “attracting supporters,” so much so that governments pressure social media companies to censor their content.

From the review:

A March 2015 report from the Brookings Institution estimates that there at least 46,000 Twitter accounts run by supporters of the Islamic State (also known as ISIS or ISIL), a group of violent extremists that currently occupies parts of Syria and Iraq. This group has also taken to posting violent videos and recruiting materials on digital platforms, posing a dilemma for Silicon Valley companies — YouTube, Google, Twitter, Facebook and the like — as well as traditional news publishers. Facebook, for example, has grappled with whether or not to allow videos of beheadings to be viewed on its platform, and on March 16, 2015, again modified its “community standards.”

Although rising connectivity has helped make these problems more acute in the past few years, terrorism analysts have long been theorizing about an international media war and a globalized insurgency. The RAND Corporation has documented the unique radicalization process now taking place in the digital era. The dilemmas are personal for many organizations: ISIS has not only executed journalists but has even threatened employees of Twitter who seek to block accounts threatening violence.

For news media, there are hard questions about when exactly propaganda is itself newsworthy and when reporting on it serves a larger public purpose that justifies allowing access to a mass audience and amplifying a violent message, however well contextualized. This has led to questions about whether the slick production and deft use of media by ISIS is indeed just a form of “gaming” journalists. Reporting on terrorism in a globalized media environment has been the subject of much debate and research since the Sept. 11, 2001, attacks; the press has faced steady criticism for focusing too much on relatively rare violent acts while neglecting other aspects of the Muslim world, and for hyping threats and helping to sow fear.

I applaud the resources that the review assembles, including the RAND report which is cited for the proposition:

The RAND Corporation has documented the unique radicalization process now taking place in the digital era.

There’s only one problem with using that report as a source. See if you can spot it from the abstract:

This paper presents the results from exploratory primary research into the role of the internet in the radicalisation of 15 terrorists and extremists in the UK. In recent years, policymakers, practitioners and the academic community have begun to examine how the internet influences the process of radicalisation: how a person comes to support terrorism and forms of extremism associated with terrorism. This study advances the evidence base in the field by drawing on primary data from a variety of sources: evidence presented at trial, computer registries of convicted terrorists, interviews with convicted terrorists and extremists, as well as police senior investigative officers responsible for terrorist investigations. The 15 cases were identified by the research team together with the UK Association of Chief Police Officers (ACPO) and UK Counter Terrorism Units (CTU). The research team gathered primary data relating to five extremist cases (the individuals were part of the Channel programme, a UK government intervention aimed at individuals identified by the police as vulnerable to violent extremism), and ten terrorist cases (convicted in the UK), all of which were anonymised. Our research supports the suggestion that the internet may enhance opportunities to become radicalised and provide a greater opportunity than offline interactions to confirm existing beliefs. However, our evidence does not necessarily support the suggestion that the internet accelerates radicalisation or replaces the need for individuals to meet in person during their radicalisation process. Finally, we didn’t find any supporting evidence for the concept of self-radicalisation through the internet. (emphasis added)

Oops! “…didn’t find any supporting evidence for the concept of self-radicalisation through the internet.”

Unlike some experts and reporters, the RAND researchers themselves call the fifteen subjects a “convenience sample” and caution against drawing conclusions based on so small a sample. But that is fifteen (15) more than were spoken to in the Brookings Institution study, which is now all the rage on ISIS and Twitter.

The authors go on to point out:

The consensus is that self-radicalisation is extremely rare, if possible at all (Bermingham et al., 2009; Change Institute, 2008; Precht, 2008; Saddiq, 2010; Stevens and Neumann, 2009; Yasin, 2011 (Rand report, page 20))

Of course, the authors did base their study on primary evidence and not what might play well on the evening news. That is the most likely explanation of the difference between their conclusions and those of governments and social media companies who are goading each other towards more censorship.

I don’t doubt for a minute that media, social and otherwise, plays some role in the political positions people adopt. The footage of air strikes taking the lives of women and children is as painful for some as the videos of family members being beheaded are for others.

The callous indifference of Western governments to human suffering, in pursuit of their goals and policies, is a more effective recruitment tool for terrorists than any ISIS could invent. Not to mention it has the advantage of being true.

Or to put it another way, the answer to the suffering of Palestinians, Syrians, Iraqis, etc., isn’t “Yes, but….” Conditioning a solution to human suffering on political ends is enough of an answer for everyone to choose sides.

March 4, 2015

Take two steps back from journalism:… [Your six degrees to victims/perps]

Filed under: Journalism,News,Reporting — Patrick Durusau @ 11:36 am

Take two steps back from journalism: What are the editorial products we’re not building? by Jonathan Stray.

From the post:

The traditional goal of news is to say what just happened. That’s sort of what “news” means. But there are many more types of nonfiction information services, and many possibilities that few have yet explored.

I want to take two steps back from journalism, to see where it fits in the broader information landscape and try to imagine new things. First is the shift from content to product. A news source is more than the stories it produces; it’s also the process of deciding what to cover, the delivery system, and the user experience. Second, we need to include algorithms. Every time programmers write code to handle information, they are making editorial choices.

Imagine all the wildly different services you could deliver with a building full of writers and developers. It’s a category I’ve started calling editorial products.

In this frame, journalism is just one part of a broader information ecosystem that includes everything from wire services to Wikipedia to search engines. All of these products serve needs for factual information, and they all use some combination of professionals, participants, and software to produce and deliver it to users — the reporter plus the crowd and the algorithm. Here are six editorial products that journalists and others already produce, and six more that they could.

Jonathan’s existing editorial products list (with examples):

  • Record what just happened.
  • Locate pre-existing information.
  • Filter the information tsunami.
  • Give me background on this topic.
  • Expose wrongdoing.
  • Debunk rumors and lies.

A useful starting point to decide if a market is already saturated (or thought to be so) and how you could differentiate a new product in one of these areas. I’m not as certain as Jonathan that existing products perform well on locating pre-existing information or filtering the information tsunami. On the other hand, the low value of most queries may preclude a viable economic model for more accurate answers.

Jonathan’s potential editorial products list (with observations, VCs take note):

  • What can I do about it?
  • A moderated place for difficult discussions.
  • Personalized news that isn’t sort of terrible. [Terrible here refers to the algorithms that personalize the news.]
  • The online town hall.
  • Systematic government coverage.
  • Choose-your-own-adventure reporting.

A great starting point for discussing new editorial products. I suppose it is a refinement of “What can I do about it?” but I have a suggestion for a new editorial product: My Six Degrees.

Based on the idea of six degrees of separation (think Kevin Bacon), what if, for any news report, you could enter your identification and, based on the various social media sources and other data, your separation from the persons in the report could be calculated and returned to you with contact information for each step of the separation?

That has the potential to make the news you hear from other products a good deal more personal. It wouldn’t be “…too bad somebody got robbed…”; it would be someone who was only two degrees of separation from you. You would have the same revelation when someone is arrested for the crime.

The same should be true for the faceless bureaucrats that run much of the world. You would not hear “…the parole board denied clemency for someone on death row…” but rather that X, Y, and Z, who are so many degrees from you, denied clemency.

Could be a way to “personalize” the news in such a way as to motivate readers to take action.

Currently it would not work for everyone but there is enough data in the larger cities to “personalize” the news in a very meaningful way.
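The core calculation behind a "My Six Degrees" product could be sketched as a breadth-first search over a contact graph. This is only a minimal illustration: the `contacts` graph, the names in it, and the `separation_path` function are hypothetical, and a real service would have to build the graph from social media and other data.

```python
from collections import deque

def separation_path(graph, start, target):
    """Return the shortest chain of people from start to target, or None."""
    if start == target:
        return [start]
    previous = {start: None}          # who led us to each person
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for contact in graph.get(person, ()):
            if contact in previous:   # already reached by a path at least as short
                continue
            previous[contact] = person
            if contact == target:
                # Walk back from the target to the start, then reverse.
                path = [contact]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(contact)
    return None

# Hypothetical contact graph: each person maps to the people they know.
contacts = {
    "reader": ["alice", "bob"],
    "alice": ["reader", "carol"],
    "bob": ["reader"],
    "carol": ["alice", "parole_board_member"],
    "parole_board_member": ["carol"],
}

path = separation_path(contacts, "reader", "parole_board_member")
degrees = len(path) - 1               # number of steps between reader and target
print(" -> ".join(path), f"({degrees} degrees)")
```

Breadth-first search is used here because it visits people in order of distance, so the first chain found is a shortest one.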

I first saw this in a tweet by Journalism Tools.

March 2, 2015

Drilling Down: A Quick Guide to Free and Inexpensive Data Tools

Filed under: Data Mining,Journalism,News,Reporting — Patrick Durusau @ 7:35 pm

Drilling Down: A Quick Guide to Free and Inexpensive Data Tools by Nils Mulvad.

From the post:

Newsrooms don’t need large budgets for analyzing data–they can easily access basic data tools that are free or inexpensive. The summary below is based on a five-day training session at Delo, the leading daily newspaper in Slovenia. Anuška Delić, journalist and project leader of DeloData at the paper, initiated the training with the aim of getting her team to work on data stories with easily available tools and a lot of new data.

“At first it seemed that not all of the 11 participants, who had no or almost no prior knowledge of this exciting field of journalism, would ‘catch the bug’ of data-driven thinking about stories, but soon it became obvious” once the training commenced, said Delić.

Encouraging story about data journalism as well as a source for inexpensive tools.

Even knowing the most basic tools will make you stand out from people who repeat the government or party line (depending on where you are located).

February 27, 2015

300 Data journalism blogs [1 Feedly OPML File]

Filed under: Journalism,News,Reporting — Patrick Durusau @ 5:44 pm

Data journalism blogs by Winny De Jong.

From the post:

At the News Impact Summit in Brussels I presented my workflow for getting ideas. Elsewhere on the blog a recap including interesting links. The RSS reader Feedly is a big part of my setup: together with Pocket its my most used app. Both are true lifesavers when reading is your default.

Since a lot op people of the News Summit audience use Feedly as well, I made this page to share my Feedly OPML file. If you’re not sure what an OPML file is read this page at Feedly.com.

Download my Feedly OPML export containing 300+ data journalism related sites here

Now that is a great way to start the weekend!

With a file of three hundred (300) data blogs!

Enjoy!
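If you would rather inspect an OPML export before importing it into a reader, a few lines of Python will list the feeds it contains. This is a sketch under assumptions: the sample document below is invented for illustration (it is not taken from the actual export), and it relies on the common OPML convention that each feed is an `outline` element carrying an `xmlUrl` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a downloaded OPML export.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Hypothetical export</title></head>
  <body>
    <outline text="Data journalism" title="Data journalism">
      <outline type="rss" text="Example Blog A"
               xmlUrl="http://example.com/a/feed" htmlUrl="http://example.com/a"/>
      <outline type="rss" text="Example Blog B"
               xmlUrl="http://example.com/b/feed" htmlUrl="http://example.com/b"/>
    </outline>
  </body>
</opml>"""

def list_feeds(opml_text):
    """Return (title, feed URL) pairs for every outline that has an xmlUrl."""
    root = ET.fromstring(opml_text)
    return [
        (node.get("text"), node.get("xmlUrl"))
        for node in root.iter("outline")
        if node.get("xmlUrl")          # category outlines have no feed URL
    ]

feeds = list_feeds(SAMPLE_OPML)
print(len(feeds), "feeds")
for title, url in feeds:
    print(title, url)
```

To use it on a real export, read the file's contents into a string and pass that to `list_feeds` instead of `SAMPLE_OPML`.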

Data journalism: How to find stories in numbers

Filed under: Journalism,News,Reporting — Patrick Durusau @ 12:29 pm

Data journalism: How to find stories in numbers by Sandra Crucianelli.

From the post:

Colleagues often ask me what data journalism is. They’re confused by why it needs its own name — don’t all journalists use data?

The term is shorthand for ‘database journalism’ or ‘data-driven journalism’, where journalists find stories, or angles for stories, within large volumes of data.

It overlaps with investigative journalism in requiring lots of research, sometimes against people’s wishes. It can also overlap with data visualisation, as it requires close collaboration between journalists and digital specialists to find the best ways of presenting data.

So why get involved with spreadsheets and visualisation tools? At its most basic, adding data can give a story a new, factual dimension. But delving into datasets can also reveal new stories, or new aspects to them, that may not have otherwise surfaced.

Data journalism can also sometimes tell complicated stories more easily or clearly than relying on words alone — so it’s particularly useful for science journalists.

It can seem daunting if you’re trained in print or broadcast media. But I’ll introduce you to some new skills, and show you some excellent digital tools, so you too can soon find your feet as a data journalist.

Sandra gives as good an introduction to data journalism as you are likely to find. Her post covers everything from finding story ideas and researching relevant data to data processing and, of course, presenting your findings in a persuasive way.

A must read for starting journalists but also for anyone needing an introduction to looking at data that supports a story (or not).

February 26, 2015

Gregor Aisch – Information Visualization, Data Journalism and Interactive Graphics

Filed under: Journalism,News,Reporting,Visualization — Patrick Durusau @ 8:04 pm

Gregor has two sites that I wanted to bring to your attention on information visualization, data journalism and interactive graphics.

The first one, driven-by-data.net, collects graphics from New York Times stories created by Gregor and others. Impressive graphics. If you are looking for visualization ideas, not a bad place to stop.

The second one, Vis4.net, is a blog that features Gregor’s work. But it is more than a blog if you choose the navigation links at the top of the page:

Color – Posts on color.

Code – Posts focused on code.

Cartography – Posts on cartography.

Advice – Advice (not for the lovelorn).

Archive – Archive of his posts.

Rather than a long list of categories (ahem), Gregor has divided his material into divisions that are easy to recognize and use.

Always nice when you see a professional at work!

Enjoy!

February 23, 2015

The Spy Cables: A glimpse into the world of espionage

Filed under: Government,News,Politics,Reporting — Patrick Durusau @ 4:13 pm

The Spy Cables: A glimpse into the world of espionage by Al Jazeera Investigative Unit.

From the post:

A digital leak to Al Jazeera of hundreds of secret intelligence documents from the world’s spy agencies has offered an unprecedented insight into operational dealings of the shadowy and highly politicised realm of global espionage.

Over the coming days, Al Jazeera’s Investigative Unit is publishing The Spy Cables, in collaboration with The Guardian newspaper.

Spanning a period from 2006 until December 2014, they include detailed briefings and internal analyses written by operatives of South Africa’s State Security Agency (SSA). They also reveal the South Africans’ secret correspondence with the US intelligence agency, the CIA, Britain’s MI6, Israel’s Mossad, Russia’s FSB and Iran’s operatives, as well as dozens of other services from Asia to the Middle East and Africa.

You need to start hitting the Al Jazeera site on a regular basis.

Kudos to Al Jazeera for the ongoing release of these documents!

On the other hand, I am deeply disappointed by the editing of the documents to be released:

It has not been easy to decide which Spy Cables to publish, and hundreds will not be revealed.

After verifying the cables, we had to consider whether the publication of each document served the public interest, in consultation with industry experts, lawyers, and our partners at The Guardian. Regardless of any advice received, the decision to publish has been Al Jazeera’s alone.

We believe it is important to achieve greater transparency in the field of intelligence. The events of the last decade have shown that there has been inadequate scrutiny on the activities of agencies around the world. That has allowed some to act outside their own laws and, in some cases international law.

Publishing these documents, including operational and tradecraft details, is a necessary contribution to a greater public scrutiny of their activities.

The Spy Cables also reveal that in many cases, intelligence agencies are over-classifying information and hiding behind an unnecessary veil of secrecy. This harms the ability of a democratic society to either consent to the activities of their intelligence agencies or provide adequate checks and balances to their powers.

The Spy Cables are filled with the names, personal details, and pseudonyms of active foreign intelligence operatives who work undercover for the dozens of global spy agencies referenced in the files.

We confronted the possibility that publishing identities revealed in the cables could result in harm to potentially innocent people. We agreed that publishing the names of undercover agents would pose a substantial risk to potentially unwitting individuals from around the world who had associated with these agents.

We believe we can most responsibly accomplish our goal of achieving greater transparency without revealing the identities of undercover operatives.

For these reasons, we have redacted their names. We have also redacted sections that could pose a threat to the public, such as specific chemical formulae to build explosive devices.

Finally, some of the Spy Cables have been saved for future broadcast – ones that needed further contextualisation. Regardless of when we publish, the same considerations will inform our decisions over what to redact.

The line: “…we had to consider whether the publication of each document served the public interest…” captures the source of my disappointment.

The governments who sent the cables in question could and do argue in good faith that they “consider …. the public interest” in deciding which documents should be public and which should be private.

As the cables and prior leaks make clear, the judgement of governments about “the public interest” is deeply suspect and in the aftermath of major leaks, has been shown to be completely false. The world of diplomacy has not reached a fiery end nor have nations entered wars against every other nation. Everyone blushes for a bit and then moves on.

Although I like Al Jazeera and The Guardian better than most governments, why should I trust their judgement about what secrets the public is entitled to know more than the government’s? At least some governments are in theory answerable to their populations. Whereas news organizations are entirely self-anointed.

Having said that, I am sure that Al Jazeera and The Guardian will do the best they can, but why not trust the public with the information that, after all, affects them? I don’t think of the public as ill-mannered children who need to be protected from ugly truths. As far as “innocent lives,” I find it contradictory to speak of intelligence operatives and “innocent lives” in the same conversation.

Having chosen to betray others in the service of the goals of individuals in various governments, intelligence operatives cannot claim innocence.

Release the cables as obtained by Al Jazeera. Give the public an opportunity to make its own judgements based on all the evidence.

The Value of Leaks

Filed under: Government,News,Politics,Reporting — Patrick Durusau @ 3:43 pm

The value of leaks of “secret” information cannot be overestimated.

The leaks by Edward Snowden haven’t changed the current practices of the U.S. government but they have sparked a lively debate over issues only a few suspected existed.

One specific advantage of the Snowden leaks is that, hopefully, IT companies now realize that the government will betray them at a moment’s notice, should it be advantageous to do so. IT companies are far better off being loyal to their customer bases, as are their customers.

Another advantage to the Snowden leaks is an increased impetus for open source software. Not necessarily free software but open source so that a buyer can inspect the software for backdoors and other malware.

The most recent batch of leaks, the “Spy Cables,” appears to be of similar importance. Consider this current headline:

Mossad contradicted Netanyahu on Iran nuclear programme by Will Jordan, Rahul Radhakrishnan.

From the report:

Less than a month after Prime Minister Benjamin Netanyahu’s 2012 warning to the UN General Assembly that Iran was 70 per cent of the way to completing its “plans to build a nuclear weapon”, Israel’s intelligence service believed that Iran was “not performing the activity necessary to produce weapons”.

A secret cable obtained by Al Jazeera’s Investigative Unit reveals that Mossad sent a top-secret cable to South Africa on October 22, 2012 that laid out a “bottom line” assessment of Iran’s nuclear work.

It appears to contradict the picture painted by Netanyahu of Tehran racing towards acquisition of a nuclear bomb.

Writing that Iran had not begun the work needed to build any kind of nuclear weapon, the Mossad cable said the Islamic Republic’s scientists are “working to close gaps in areas that appear legitimate such as enrichment reactors”.

Such activities, however, “will reduce the time required to produce weapons from the time the instruction is actually given”.

The leaked information should (no guarantees) make it harder for Netanyahu to sell the U.S. Congress on something very foolish with regard to Iran and its nuclear energy program.

Just imagine how all the “scary” news would read if the public had full and free access to all the secret information circulated by governments and distorted for public consumption.

If you want a saner, better informed and safer world, leaking secret corporate and/or government documents is a step in that direction.

PS: Have you seen Snowden’s A Manifesto for the Truth?

February 22, 2015

Losing Your Right To Decide, Needlessly

Filed under: Censorship,News,Reporting,Security — Patrick Durusau @ 4:12 pm

France asks US internet giants to ‘help fight terror’

From the post:

Twitter and Facebook spokespeople said they do everything they can to stop material that incites violence but didn’t say whether they would heed the minister’s request for direct cooperation with French authorities.

“We regularly host ministers and other governmental officials from across the world at Facebook, and were happy to welcome Mr Cazeneuve today,” a Facebook spokesperson said.

“We work aggressively to ensure that we do not have terrorists or terror groups using the site, and we also remove any content that praises or supports terrorism.”

Cazeneuve [interior minister, France] said he called on the tech companies to join in the fight against extremist propaganda disseminated on the internet and to block extremists’ ability to use websites and videos to recruit and indoctrinate new followers.

The pace of foreign fighters joining the Islamic State of Iraq and the Levant and other armed groups has not slowed and at least 3,400 come from Western nations among 20,000 from around the world, US intelligence officials say.

As regular readers, you have already spotted what is missing in the social media = terrorist recruitment narrative.

One obvious missing part is evidence of even a correlation between social media and terrorist recruitment. None, nada, nil, zip.

There are statements about social media by Brookings Institution expert J.M. Berger, who used his testimony before Congress to flog his forthcoming book with Jessica Stern, “ISIS: The State of Terror,” and a Brookings report to be released in March 2015. His testimony is reported in: The Evolution of Terrorist Propaganda: The Paris Attack and Social Media, where he claims IS propaganda is present on Twitter but fails to claim any correlation, much less causation, for IS recruitment. It is just assumed.

Your right to hear IS “propaganda,” if indeed it is “propaganda,” is being curtailed by the U.S. government, France, Twitter, Facebook, etc. Shouldn’t you be the one who gets to use the “off” switch, as it is known, to decide what you will or won’t read? As an informed citizen, shouldn’t you make your own judgements about the threat, if any, that IS poses to your country?

The other, perhaps not so obvious missing point is the significance of people traveling to support IS. Taking the reported numbers at face value:

at least 3,400 come from Western nations among 20,000 from around the world

Let’s put that into perspective. As of late Sunday afternoon on the East Coast of the United States, the world population stood at: 7,226,147,500.

That’s seven billion (with a “b”), two hundred and twenty-six million, one hundred and forty-seven thousand, five hundred people.

Subtract from that the alleged 20,000 who have joined IS and you get:

Seven billion (with a “b”), two hundred and twenty-six million, one hundred and twenty-seven thousand, five hundred people (7,226,127,500).

Really? Twitter, Facebook, the United States, France and others are going to take the freedom to speak and to be informed away from seven billion (with a “b”), two hundred and twenty-six million, one hundred and twenty-seven thousand, five hundred people (7,226,127,500) because social media may have affected some, but we don’t know how many, of 20,000 people?

Sometimes when you run the numbers, absurd policy choices show up to be just that, absurd.
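Running the numbers takes only a few lines. The sketch below uses the figures quoted in this post (the world population reading and the reported 20,000 fighters); the variable names are mine, and the "share" is simply fighters divided by world population, not a claim about how many were influenced by social media.

```python
# Figures as quoted in the post; variable names are illustrative.
world_population = 7_226_147_500
reported_fighters = 20_000

everyone_else = world_population - reported_fighters
share = reported_fighters / world_population      # fraction of the world population
one_in = world_population / reported_fighters     # "one person in every N"

print(f"Everyone else: {everyone_else:,}")
print(f"Reported fighters as a share of world population: {share:.6%}")
print(f"Roughly one person in every {one_in:,.0f}")
```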

PS: A more disturbing aspect of this story is that I have seen none of the major news outlets, The New York Times, CNN, Wall Street Journal, or even the Guardian, question the causal connection between social media and recruitment for IS. If that were true, shouldn’t there be evidence to support such a claim?


Update

Kathy Gilsinan’s Is ISIS’s Social-Media Power Exaggerated? (The Atlantic) confirms the social media impact of ISIS is on the minds of Western decision makers.

February 20, 2015

A massive database now translates news in 65 languages in real time [GDELT 2.0]

Filed under: GDELT,News,Reporting — Patrick Durusau @ 7:52 pm

A massive database now translates news in 65 languages in real time by Derrick Harris.

From the post:

I have written quite a bit about GDELT (the Global Database of Events, Languages and Tone) over the past year, because I think it’s a great example of the type of ambitious project only made possible by the advent of cloud computing and big data systems. In a nutshell, it’s database of more than 250 million socioeconomic and geopolitical events and their metadata dating back to 1979, all stored (now) in Google’s cloud and available to analyze for free via Google BigQuery or custom-built applications.

On Thursday, version 2.0 of GDELT was unveiled, complete with a slew of new features — faster updates, sentiment analysis, images, a more-expansive knowledge graph and, most importantly, real-time translation across 65 different languages. That’s 98.4 percent of the non-English content GDELT monitors. Because you can’t really have a global database, or expect to get a full picture of what’s happening around the world, if you’re limited to English language sources or exceedingly long turnaround times for translated content.

The GDELT homepage reports:

We’ll be releasing a new “Getting Started With GDELT” user guide in the next few days to walk you through the incredibly vast array of new capabilities in GDELT 2.0,…

Awesome, simply awesome!

Bear in mind that the data presented here isn’t “cooked.” That is, it hasn’t been trimmed and merged with your client’s internal knowledge of “…socioeconomic and geopolitical events…” and how they impact the client’s interests.

For example, labor strikes in a shipping port on one continent may delay on-time shipments from a manufacturer on another for delivery to still a third continent. The information that ties all those items together is held by your client, not any public source.

There is a vast sea of client data, relationships and interests to be mapped to a resource like GDELT, and the 2.0 version simply ups the possible rewards.

Just in case you are curious:

Terms of Use

What can I do with GDELT and how can I use it in my projects?

Using GDELT

The GDELT Project is an open platform for research and analysis of global society and thus all datasets released by the GDELT Project are available for unlimited and unrestricted use for any academic, commercial, or governmental use of any kind without fee.

Redistributing GDELT

You may redistribute, rehost, republish, and mirror any of the GDELT datasets in any form. However, any use or redistribution of the data must include a citation to the GDELT Project and a link to this website (http://gdeltproject.org/).

It is hard to imagine a data resource getting any better than this!

PS: By late Spring 2015, the backfiles to 1979 will be available in GDELT 2.0 format. Maybe it can get better. 😉

PPS: See the GDELT Blog for posts on using GDELT.

February 19, 2015

Reporting Context for News Stories (Hate Crimes)

Filed under: News,Reporting — Patrick Durusau @ 11:11 am

AJ+ tweeted two graphics on 17 February 2015:

hate-muslims

hate-crimes

Unless my math is off, that is 1,031 religion based hate crimes in 2013.

We would all prefer that number to be 0 but it’s not.

The problem with those graphics is they give no sense of context for how those crimes compare to the incidence of crime in general.

Assuming that hate crimes can be violent or property crimes, the total of those two categories of crime in the United States for 2013 was:

9,795,658 (1,163,146 violent crimes + 8,632,512 property crimes)

Or if you want to know the percentage of religious hate crimes against all violent and property crimes:

1,031 / 9,795,658 ≈ 0.0105%, the share of religious hate crime in all violent and property crime.

Or, let’s assume every religious hate crime was committed by different individuals, giving us a total of 1,031 offenders.

To put that in context, the estimated U.S. population was 316,497,531 in 2013, with 23.3% of the population being under 18 years of age. That leaves a population over 18 of 242,753,606.

If you want to know the percentage of religious hate crime offenders to the U.S. population over 18 years of age:

1,031 / 242,753,606 ≈ 0.000425%, the share of religious hate crime offenders in the U.S. population over 18.

Or the number of people in the U.S. who didn’t commit religious hate crimes in 2013: 242,753,606 – 1,031 = 242,752,575
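The arithmetic above is easy to check. A minimal sketch, using only the figures quoted in this post (the variable names are my own, and the one-offender-per-crime assumption is carried over from the text):

```python
# Figures quoted in the post for 2013.
violent_crimes = 1_163_146
property_crimes = 8_632_512
religious_hate_crimes = 1_031
us_population = 316_497_531
share_under_18 = 0.233

total_crimes = violent_crimes + property_crimes
adults = round(us_population * (1 - share_under_18))

# Assumption carried over from the post: one distinct offender per crime.
offenders = religious_hate_crimes

print(f"total violent + property crimes: {total_crimes:,}")
print(f"share of those crimes: {religious_hate_crimes / total_crimes:.4%}")
print(f"population over 18: {adults:,}")
print(f"offenders as share of adults: {offenders / adults:.6%}")
print(f"adults who were not offenders: {adults - offenders:,}")
```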

Including context in those graphics would be extremely difficult because the context is so large that the acts in question would not show up on the graphics at all.

What should our response to religious hate crime be? At a minimum the offenders should be caught and punished and the local community should rally around the victims to assure them the aberrant offenders do not represent the local community and to help the victims and their community heal.

At the same time, we should recognize, as should religious communities, that religious hate crimes are aberrant behavior that represents views not shared by the general population or the government.

Take this as an illustration that: News without context isn’t news. It is noise.

Update: I omitted my source for U.S. population statistics: USA QuickFacts

February 10, 2015

Avoiding Civilian Casualties – Don’t Look, Don’t Tell

Filed under: Government,Journalism,News,Reporting — Patrick Durusau @ 4:35 pm

Statistics are a bloodless way to tell the story of a war but in the absence of territory to claim (or claim/reclaim as was the case in Vietnam) and lacking an independent press to document the war, there isn’t much else to report. But even the simple statistics are twisted.

In the “war” against ISIS (unauthorized, Obama soon to ask Congress for ISIS war authority), Nancy A. Youssef reports in U.S. Won’t Admit to Killing a Single Civilian in the ISIS War:

Five months and 1,800-plus strikes into the U.S. air campaign against ISIS, and not a single civilian has been killed, officially. But Pentagon officials concede that they really have no way of telling for sure who has died in their attacks—and admit that no one will ever know how many have been slain.

A free and independent press reported the My Lai Massacre, which was only one of the untold number of atrocities against civilians in Vietnam. The current generation of “journalists” drinks the military’s Kool-Aid with no effort to document the impact of its airstrikes. Instead of bemoaning the lack of independent reports, the media should be the origin of independent reports.

How do you square the sheepish admission from the Pentagon that they don’t know who has died in their attacks with statements by the U.S. Ambassador to Iraq, Stuart Jones, claiming that 6,000 ISIS fighters and half their leadership have been killed?

That sounds administration top-heavy if one out of every two fighters is a leader. Inside the beltway in D.C. is the only known location with that ratio of “leaders” to “followers.” But realistically, if the Pentagon had that sort of numbers, they would be repeating them in every daily briefing. Yes? Unless you think Ambassador Jones has a crystal ball, the most likely case is those numbers are fictional.

I don’t doubt there have been civilian casualties. In war there are always civilian casualties. What troubles me is the don’t look, don’t tell position of the U.S. military command in order to make war palatable to a media-savvy public.

War should be unpalatable. It should be presented in its full gory details, videos of troops bleeding out on the battlefield, burials, families torn apart, women, children and men killed in ways we don’t want to imagine, all of the aspects that make it unpalatable.

If nothing else, it will sharpen the debate on war powers in Congress because then the issue won’t be model towns and cars but people who are dying before our very eyes on social media. How many more lives will we take to save the Arab world from Arabs?

February 1, 2015

DJA Newsletter [If you can’t see the data, it’s not news, just rumor.]

Filed under: Journalism,News,Reporting — Patrick Durusau @ 11:04 am

DJA Newsletter: The best of Data Journalism every month

From the about page:

The Global Editors Network is a cross-platform community of editors-in-chief and media innovators committed to sustainable, high-quality journalism, empowering newsrooms through a variety of programmes designed to inspire, connect and share. Our online community allows media innovators from around the world to interact and collaborate on projects created through our programmes. The GEN Board Members support this mission and have signed the GEN Manifesto.

We are driven by a journalistic imperative and a common goal: Content and Engagement First. To that end, we support all kinds of organizations and media outlets, to define a vision for the future of journalism and enhance its quality through innovation and cooperation. Freedom of information and independence of the news media are, and will remain, the main credo of the Global Editors Network and we will back all efforts to enhance press freedom worldwide.

The links in this month’s newsletter:

  1. Every active satellite orbiting earth
  2. Islam in Europe – the gap between perceptions and reality
  3. What news sources does China block?
  4. What happens when you scrape AirBnB data?
  5. RiseUp revolutions

Looking forward to seeing more issues of the DJA newsletter!

January 8, 2015

PANDA Project (News Reporters)

Filed under: News,Reporting — Patrick Durusau @ 2:44 pm

PANDA Project (News Reporters)

From the webpage:

Information on a deadline The newsroom’s data at your fingertips, available at the speed of breaking news.

Smarter, not harder Subscribe to your favorite searches to get an email when news happens.

Institutional memory People are going to leave, but your data shouldn’t. Make it faster for new reporters to find stories in data.

Newsroom born & raised PANDA was built by newsroom developers, with the support of The Knight Foundation. It is sustained by Investigative Reporters and Editors.

PANDA is …

You can install PANDA on Amazon EC2 or on your own hardware, assuming you are using Ubuntu 12.04.

I haven’t set this up (yet) but it looks promising. I don’t see an obvious way to store observations about data for discovery by others or how to create links (associations) between data. To say nothing of annotating subjects I find in the data.

Capturing that level of institutional knowledge might or might not be socially acceptable. I recall reading about an automatic collaborative bookmarks application developed by a newsroom that faced opposition from reporters not wanting to share their links. Sounded odd to me but I pass it along for your consideration.

December 15, 2014

Sony Pictures Demands That News Agencies Delete ‘Stolen’ Data

Filed under: News,Reporting,Text Analytics,Text Mining — Patrick Durusau @ 10:31 am

Sony Pictures Demands That News Agencies Delete ‘Stolen’ Data by Michael Cieply and Brooks Barnes.

From the article:

Sony Pictures Entertainment warned media outlets on Sunday against using the mountains of corporate data revealed by hackers who raided the studio’s computer systems in an attack that became public last month.

In a sharply worded letter sent to news organizations, including The New York Times, David Boies, a prominent lawyer hired by Sony, characterized the documents as “stolen information” and demanded that they be avoided, and destroyed if they had already been downloaded or otherwise acquired.

The studio “does not consent to your possession, review, copying, dissemination, publication, uploading, downloading or making any use” of the information, Mr. Boies wrote in the three-page letter, which was distributed Sunday morning.

Since I wrote about the foolish accusations against North Korea by Sony, I thought it only fair to warn you that the idlers at Sony have decided to threaten everyone else.

A rather big leap from trash talking about North Korea to accusing the rest of the world of being interested in their incestuous bickering.

I certainly don’t want a copy of their movies, released or unreleased. Too much noise and too little signal for the space they would take. But, since Sony has gotten on its “let’s threaten everybody” hobby-horse, I do hope the location of the Sony documents suddenly appears in many more inboxes. patrick@durusau.net. 😉

How would you display choice snippets and those who uttered them when a webpage loads?

The bitching and catching by Sony are sure signs that something went terribly wrong internally. The current circus is an attempt to distract the public from that failure. Probably a member of management with highly inappropriate security clearance because “…they are important!”

Inappropriate security clearances for management to networks are a sign of poor systems administration. I wonder when that shoe is going to drop?

December 7, 2014

Google News: The biggest missed opportunity in media right now

Filed under: News,Reporting — Patrick Durusau @ 8:14 pm

Google News: The biggest missed opportunity in media right now by Mathew Ingram.

From the post:

Almost every time I talk to a journalist who spends a lot of time online and the subject of Google News comes up, there is a shared sense of frustration: namely, frustration over how little the site has changed over the years since it launched, and how much more it could do if Google really wanted it to — what a powerful tool it could be. I was reminded of this again when I came across a presentation that a German designer came up with that involved a wholesale redesign and re-thinking of what Google News is and does.

I found George Kvasnikov’s presentation because of a post at the design and culture site PSFK — the original was posted on the design community Behance a couple of months ago, after what Kvasnikov said was a lot of brainstorming followed by about five weeks worth of wireframing and other mockup-related work. What he came up with isn’t perfect by any means, but it has some interesting elements — and at least it is an attempt to bring Google News kicking and screaming into the future, instead of looking like it was embalmed not long after it launched.

A number of great suggestions but the one I didn’t see was offering more information about individuals, locations or other subjects in a story. Multiple perspectives are essential but when pressed for time, I would much prefer to not search for URLs for projects and basic information about the same. Ditto for people and locations.

What I don’t know is the best method for delivery of such snippets of data. Suggestions?

November 9, 2014

Almost Everything in “Dr. Strangelove” Was True

Filed under: Cybersecurity,Government,News,Reporting,Security — Patrick Durusau @ 8:00 pm

Almost Everything in “Dr. Strangelove” Was True by Eric Schlosser. (New Yorker, January 17, 2014)

From the post:

This month marks the fiftieth anniversary of Stanley Kubrick’s black comedy about nuclear weapons, “Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb.” Released on January 29, 1964, the film caused a good deal of controversy. Its plot suggested that a mentally deranged American general could order a nuclear attack on the Soviet Union, without consulting the President. One reviewer described the film as “dangerous … an evil thing about an evil thing.” Another compared it to Soviet propaganda. Although “Strangelove” was clearly a farce, with the comedian Peter Sellers playing three roles, it was criticized for being implausible. An expert at the Institute for Strategic Studies called the events in the film “impossible on a dozen counts.” A former Deputy Secretary of Defense dismissed the idea that someone could authorize the use of a nuclear weapon without the President’s approval: “Nothing, in fact, could be further from the truth.” (See a compendium of clips from the film.) When “Fail-Safe”—a Hollywood thriller with a similar plot, directed by Sidney Lumet—opened, later that year, it was criticized in much the same way. “The incidents in ‘Fail-Safe’ are deliberate lies!” General Curtis LeMay, the Air Force chief of staff, said. “Nothing like that could happen.” The first casualty of every war is the truth—and the Cold War was no exception to that dictum. Half a century after Kubrick’s mad general, Jack D. Ripper, launched a nuclear strike on the Soviets to defend the purity of “our precious bodily fluids” from Communist subversion, we now know that American officers did indeed have the ability to start a Third World War on their own. And despite the introduction of rigorous safeguards in the years since then, the risk of an accidental or unauthorized nuclear detonation hasn’t been completely eliminated.

Grim reading for good password advocates when they learn that all the Minuteman launch sites shared a common launch code: 00000000.

The types of command and control issues discussed for nuclear weapons are the same issues that should be debated now for government surveillance. They aren’t being debated, I should say, because of the curtain of secrecy that surrounds government surveillance operations.

A curtain of secrecy that has the same justifications, “we are defending the public interest,” “there is an implacable foe at the ramparts,” etc.

The question you have to ask for many government offices and agencies isn’t are they lying but why?

The government of my youth lied, the government for every year thereafter has lied.

On what basis should I trust the government to not be lying today and in the future?

PS: How do you draft privacy controls with a known liar as the enforcing party?

November 3, 2014

Suppressing Authentic Information

Filed under: News,Reporting,Skepticism — Patrick Durusau @ 8:14 pm

In my continuing search for information on the authenticity of Dabiq (see: Dabiq, ISIS and Data Skepticism) I encountered Slick, agile and modern – the IS media machine by Mina Al-Lami.

Mina makes it clear that IS (ISIL/ISIS) has been the target of a campaign to shut down all authentic outlets for news from the group:


IS has always relied heavily on hordes of online supporters to amplify its message. But their role has become increasingly important in recent months as the group’s official presence on a variety of social media platforms has been shut down and moved underground.

The group’s ability to keep getting its message out in the face of intensive counter-measures is due to the agility, resilience and adaptability of this largely decentralized force.

Until July this year, IS, like most jihadist groups, had a very strong presence on Twitter, with all its central and regional media outlets officially active on the platform. However, its military successes on the ground in Iraq and Syria in June triggered a concerted and sustained clampdown on the group’s accounts.

IS was initially quick to replace these accounts, in what became a game of whack-a-mole between IS and the Twitter administration. But by July the group appeared to have abandoned any attempt to maintain an official open presence there.

Instead, IS began experimenting with a string of less known social media platforms. These included the obscure Friendica, Quitter and Diaspora – all of which promise better privacy and data-protection than Twitter – as well as the popular Russian VKontakte.

Underground channels

While accounts on Friendica and Quitter were shut down within days, the official IS presence on Diaspora and VKontakte lasted several weeks before their involvement in the distribution of high profile beheading videos caused them too to be shut down.

Since the accounts on VKontakte were closed in September, IS appears to have resorted to underground channels to surface its material, making no attempt to advertise an official social media presence. Perhaps surprisingly, this has not yet caused any problems for the group in terms of authenticating its output.

Once a message has surfaced – via channels that are currently difficult to pin down – it is disseminated by loosely affiliated media groups who are capable of mobilizing a vast network of individual supporters on social media to target specific audiences.

Unfortunately, Mina misses the irony of reporting in one breath that IS has no authentic outlets and relying in the next on non-authentic materials (such as Dabiq) to talk about the group’s social media prowess.

Suppression of authentic content outlets for IS leaves an interested reader at the mercy of governments, news organizations and others who have a variety of motives for attributing content to IS.

As I mentioned in my last post:

Debates about international and national policy should not be based on faked evidence (such as “yellow cake uranium“) or faked publications.

I have heard the argument that IS content recruits support for terrorism. I have read propaganda attributed to IS, the Khmer Rouge, the KKK and terrorists sponsored by Western governments. I can report not the slightest interest in supporting or participating with any of them.

The recruitment argument is a variation of the fear that allowing gays, drug use, drinking, etc., on television would result in children growing up to be gay drug addicts with drinking problems. I can report that no sane person credits that fear today. (If you have that fear, contact your local mental health service for an appointment.)

Why is IS attractive? Hard to say given the lack of authentic information on its goals and platform, perhaps its reported opposition to corrupt governments in the Middle East?

If I weren’t concerned with corrupt Western governments I might be more concerned with governments in the Middle East. But, as they say, best to start cleaning your own house before complaining about the state of another’s.

October 25, 2014

Overview App API

Filed under: News,Reporting — Patrick Durusau @ 3:30 pm

Overview App API

From the webpage:

An Overview App is a program that uses Overview.

You can make one. You know you want to.

Using Overview’s App API you can drive Overview’s document handling engine from your own code, create new visualizations that replace Overview’s default Topic Tree, or write interactive document handling or data extraction apps.

If you don’t remember the Overview Project:

Overview is just what you need to search, analyze and cull huge volumes of text or documents. It was built for investigative journalists who go through thousands of pages of material, but it’s also used by researchers facing huge archives and social media analysts with millions of posts. With advanced search and interactive topic modeling, you can:

  • find what you didn’t even know to look for
  • quickly tag or code documents
  • let the computer organize your documents by topic, automatically

Leveraging the capabilities in Overview is a better use of resources than re-inventing basic file and search capabilities.

October 22, 2014

Dabiq, ISIS and Data Skepticism

Filed under: News,Reporting,Skepticism — Patrick Durusau @ 2:54 pm

If you are following the Middle East, no doubt you have heard that ISIS/ISIL publishes Dabiq, a magazine that promotes its views. It isn’t hard to find articles quoting from Dabiq, but I wanted to find copies of Dabiq itself.

Clarion Project (Secondary Source for Dabiq)

After a bit of searching, I found that the Clarion Project is posting every issue of Dabiq as it appears.

The hosting site, Clarion Project, is a well-known anti-Muslim hate group. The founders of the Clarion Project just happened to be full-time employees of Aish Hatorah, a pro-Israel organization.

Coverage of Dabiq by Mother Jones (who should know better), ISIS Magazine Promotes Slavery, Rape, and Murder of Civilians in God’s Name relies on The Clarion Project “reprint” of Dabiq.

Internet Archive (Secondary Source for Dabiq)

The Islamic State Al-Hayat Media Centre (HMC) presents Dabiq Issue #1 (July 5, 2014).

All the issues at the Internet Archive claim to be from “The Islamic State Al-Hayat Media Centre (HMC).” I say “claim to be from” because uploading to the Internet Archive only requires an account with a verified email address. Anyone could have uploaded the documents.

Robert Mackey writes for the New York Times: Islamic State Propagandists Boast of Sexual Enslavement of Women and Girls and references Dabiq. I asked Robert for his source for Dabiq and he responded that it was the Internet Archive version.

Wall Street Journal

In Why the Islamic State Represents a Dangerous Turn in the Terror Threat, Gerald F. Seib writes:

It isn’t necessary to guess at what ISIS is up to. It declares its aims, tactics and religious rationales boldly, in multiple languages, for all the world to see. If you want to know, simply call up the first two editions of the organization’s remarkably sophisticated magazine, Dabiq, published this summer and conveniently offered in English online.

Gerald implies, at least to me, that Dabiq has an “official” website where it appears in multiple languages. But if you read Gerald’s article, there is no link to such a website.

I wrote to Gerald today to ask what site he meant when referring to Dabiq. I have not heard back from Gerald as of posting but will insert his response when it arrives.

The Jamestown Foundation

The Jamestown Foundation website featured: Hot Issue: Dabiq: What Islamic State’s New Magazine Tells Us about Their Strategic Direction, Recruitment Patterns and Guerrilla Doctrine by Michael W. S. Ryan, saying:

On the first day of Ramadan (June 28), the Islamic State in Iraq and Syria (ISIS) declared itself the new Islamic State and the new Caliphate (Khilafah). For the occasion, Abu Bakr al-Baghdadi, calling himself Caliph Ibrahim, broke with his customary secrecy to give a surprise khutbah (sermon) in Mosul before being rushed back into hiding. Al-Baghdadi’s khutbah addressed what to expect from the Islamic State. The publication of the first issue of the Islamic State’s official magazine, Dabiq, went into further detail about the Islamic State’s strategic direction, recruitment methods, political-military strategy, tribal alliances and why Saudi Arabia’s concerns that the Kingdom may be the Islamic State’s next target are well-founded.

Which featured a thumbnail of the cover of the first issue of Dabiq, with the following legend:

Dabiq Magazine (Source: Twitter user @umOmar246)

Well, that’s a problem because the Twitter user “@umOmar246” doesn’t exist.

Don’t take my word for it, go to Twitter, search for “umOmar246,” limit search results to people and you will see:

twitter results

I took the screen shot today just in case the results change at some point in time.

Other Media

Other media carry the same stories but without even attempting to cite a source. For example:

Jerusalem Post: ISIS threatens to conquer the Vatican, ‘break the crosses of the infidels’. Source? None.

Global News: The twisted view of ISIS in latest issue of propaganda magazine Dabiq by Nick Logan.

I don’t think that Nick appreciates the irony of the title of his post. Yes, this is a twisted view of ISIS. The question is who is responsible for it?

General Comments

Pick any issue of Dabiq and skim through it. What impressed me was the “over the top” presentation of cruelty. The hate literature I am familiar with (I grew up in the Deep South in the 1960’s) usually portrays the atrocities of others, not the group in question. Hate literature places its emphasis on the “other” group, the one to be targeted, not itself.

Analysis

First and foremost, the lack of any “official” site of origin for Dabiq makes me highly suspicious of the authenticity of the materials that claim to originate with ISIS.

Second, why would ISIS rely upon the Clarion Project as a distributor for its English language version of Dabiq, along with the Internet Archive?

Third, what are we to make of missing @umOmar246 from Twitter? Before you say that the account has closed, http://twittercounter.com/ doesn’t know that user either:

twitter counter results

This is a different aspect of consistency in distributed data: getting “caught” because distributed data is difficult to make consistent.

Fourth, the media coverage examined relies upon sites with questionable authenticity but cites the material found there as though authoritative. Is this a new practice in journalism? Some of the media outlets examined are hardly new, up-and-coming Internet news sites.

Finally, the content of the magazines themselves doesn’t ring true for hate literature.

Conclusion

Debates about international and national policy should not be based on faked evidence (such as “yellow cake uranium“) or faked publications.

Based on what I have uncovered so far, attributing Dabiq to ISIS is highly questionable.

It appears to be an attempt to discredit ISIS and to provide a basis for whipping up support for military action by the United States and its allies.

The United States destroyed the legitimate government of Iraq on the basis of lies and fabrications. If only for nationalistic reasons, not spending American funds and lives based on a tissue of lies, let’s not make the same mistake again.

Disclaimer: I am not a supporter of ISIS nor would I choose to live in their state should they establish one. However, it will be their state and I lack the arrogance to demand that others follow social, religious or political norms that I prefer.

PS: If you have suggestions for other factors that either confirm a link between ISIS and Dabiq or cast further doubt on such a link, please post them in comments. Thanks!

October 16, 2014

Storyline Ontology

Filed under: News,Ontology,Reporting — Patrick Durusau @ 4:18 pm

Storyline Ontology

From the post:

The News Storyline Ontology is a generic model for describing and organising the stories news organisations tell. The ontology is intended to be flexible to support any given news or media publisher’s approach to handling news stories. At the heart of the ontology, is the concept of Storyline. As a nuance of the English language the word ‘story’ has multiple meanings. In news organisations, a story can be an individual piece of content, such as an article or news report. It can also be the editorial view on events occurring in the world.

The journalist pulls together information, facts, opinion, quotes, and data to explain the significance of world events and their context to create a narrative. The event is an award being received; the story is the triumph over adversity and personal tragedy of the victor leading up to receiving the reward (and the inevitable fall from grace due to drugs and sexual peccadillos). Or, the event is a bombing outside a building; the story is an escalating civil war or a gas mains fault due to cost cutting. To avoid this confusion, the term Storyline has been used to remove the ambiguity between the piece of creative work (the written article) and the editorial perspective on events.

Storyline ontology

I know, it’s RDF. Well, but the ontology itself, aside from the RDF cruft, represents a thought out and shared view of story development by major news producers. It is important for that reason if no other.

And you can use it as the basis for developing or integrating other story development ontologies.

Just as the post acknowledges:

As news stories are typically of a subjective nature (one news publisher’s interpretation of any given news story may be different from another’s), Storylines can be attributed to some agent to provide this provenance.

the same is true for ontologies. Ready to claim credit/blame for yours?

August 6, 2014

Israel, Gaza, War & Data…

Filed under: News,Personalization,Reporting — Patrick Durusau @ 10:05 am

Israel, Gaza, War & Data – social networks and the art of personalizing propaganda by Gilad Lotan.

From the post:

It’s hard to shake away the utterly depressing feeling that comes with news coverage these days. IDF and Hamas are at it again, a vicious cycle of violence, but this time it feels much more intense. While war rages on the ground in Gaza and across Israeli skies, there’s an all-out information war unraveling in social networked spaces.

Not only is there much more media produced, but it is coming at us at a faster pace, from many more sources. As we construct our online profiles based on what we already know, what we’re interested in, and what we’re recommended, social networks are perfectly designed to reinforce our existing beliefs. Personalized spaces, optimized for engagement, prioritize content that is likely to generate more traffic; the more we click, share, like, the higher engagement tracked on the service. Content that makes us uncomfortable, is filtered out.
….

You are familiar with the “oooh” and “aaah” social network graphs. Interesting but too dense in most cases to be useful.

The first thing you will notice about Gilad’s post is that he is making effective use of fairly dense social network graphs. The second thing you will notice is the post is one of the relatively few that can be considered sane on the topic of Israel and Gaza. It is worth reading for its sanity if nothing else.

Gilad argues algorithms are creating information cocoons about us “…where never is heard a discouraging word…” or at least any that we would find disagreeable.

Social network graphs are used to demonstrate such information cocoons for the IDF and Hamas and to show possible nodes that may be shared by those cocoons.

I encourage you to read Gilad’s post as an illustration of good use of social network graphics, an interesting analysis of bridging information cocoons and a demonstration that relatively even-handed reporting remains possible.

I first saw this in a tweet by Wandora which read: “Thinking of #topicmaps and #LOD.”

July 30, 2014

Senator John Walsh plagiarism, color-coded

Filed under: News,Plagiarism,Reporting,Visualization — Patrick Durusau @ 4:43 pm

Senator John Walsh plagiarism, color-coded by Nathan Yau.

Nathan points to a New York Times’ visualization that makes a telling case for plagiarism against Senator John Walsh.

Best if you see it at Nathan’s site; his blog formats better than mine does.

Senator Walsh was rather obvious about it but I often wonder how much news copy, print or electronic, is really original?

Some is, I am sure, but when a story goes out over AP or UPI, how much of it is repeated verbatim in other outlets?

It’s not plagiarism because someone purchased a license to repeat the stories but it certainly isn’t original.

If an AP/UPI story is distributed and re-played in 500 news outlets, it remains one story. With no more credibility than it had at the outset.

Would color coding be as effective against faceless news sources as it has been against Sen. Walsh?
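As a rough sketch of what such color coding would need underneath, the following Python finds runs of words that two texts share verbatim. It uses the standard library's `difflib.SequenceMatcher`; the function name `shared_passages`, the `min_words` threshold and the two sample sentences are my own inventions for illustration, not anything taken from the New York Times graphic.

```python
from difflib import SequenceMatcher


def shared_passages(source: str, copy: str, min_words: int = 5) -> list[str]:
    """Return runs of at least min_words consecutive words found in both texts."""
    a = source.split()
    b = copy.split()
    # Compare word lists rather than characters so matches align on word boundaries.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[m.a : m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]


# Made-up sample texts standing in for a wire story and an outlet's version.
wire = "The senator said on Tuesday that the committee would review the report next month"
outlet = "Local officials noted that the committee would review the report next month as planned"

for passage in shared_passages(wire, outlet):
    print(passage)
```

Each passage returned is a candidate for highlighting. Matching here is exact and case-sensitive, so light rewording will break a run into shorter pieces.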

BTW, if you are interested in the sordid details: Pentagon Watchdog to review plagiarism probe of Sen. John Walsh. Incumbents need not worry, Sen. Walsh is an appointed senator and therefore is an easy throw-away in order to look tough on corruption.

July 10, 2014

Stupid Tag Tricks [Overview]

Filed under: News,Reporting — Patrick Durusau @ 7:15 pm

Stupid Tag Tricks by Jonathan Stray.

From the post:

Overview’s tags are very powerful, but it may not be obvious how to use them best. Here’s a collection of tagging tricks that have been helpful to our users, from Overview developer Jonas Karlsson.

The “tricks” include:

  • Tracking documents for review
  • Grouping Tags
  • Create a visualization from your tags
  • Tag all documents that do not contain tag “abc”
  • Tag all documents that have tags “a” OR “b” OR “c”
  • Tag all documents that have tags “a” AND “b” AND “c”

But, there is no “trick” for discovering when two or more different tags mean the same thing.

If we are annotating a collection of documents separately, we might use different tags to mean the same thing.

The last “tag trick” can collect all those documents together, but how do we find out that different tags meant the same thing?

If tags had properties, that is key/value pairs that identify the subject they represent, we could search those properties and discover different tags that meant the same thing.

In fact, we could write rules for when different tags represent the same subject.

That would lead to better sharing of tagged documents.

And enhanced results from tagged documents.
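A minimal sketch of the idea, assuming tags carry key/value properties and that a shared value for an identifying key is enough to say two tags represent the same subject. None of this is Overview's API; the `Tag` class, the `homepage` key, the example.org addresses and the rule itself are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    label: str
    properties: frozenset  # (key, value) pairs that identify the subject


def make_tag(label: str, **props: str) -> Tag:
    return Tag(label, frozenset(props.items()))


def same_subject(t1: Tag, t2: Tag, keys: tuple) -> bool:
    """Hypothetical rule: same subject if any identifying key has the same value."""
    p1, p2 = dict(t1.properties), dict(t2.properties)
    return any(k in p1 and p1[k] == p2.get(k) for k in keys)


def group_tags(tags: list, keys: tuple) -> list:
    """Greedily place each tag into the first group containing a matching tag."""
    groups: list = []
    for tag in tags:
        for group in groups:
            if any(same_subject(tag, other, keys) for other in group):
                group.append(tag)
                break
        else:
            groups.append([tag])
    return groups


# Two annotators used different labels for one subject (illustrative values only).
tags = [
    make_tag("DHS", homepage="https://example.org/dhs"),
    make_tag("Homeland Security", homepage="https://example.org/dhs"),
    make_tag("Aurora", homepage="https://example.org/aurora"),
]
groups = group_tags(tags, keys=("homepage",))
labels = [[t.label for t in g] for g in groups]
print(labels)
```

The grouping is deliberately simple: a tag that matches members of two existing groups joins only the first, so a fuller version would merge those groups as well.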

Interested?

Data Journalism: Overpromised/Underdelivered?

Filed under: Journalism,News,Reporting — Patrick Durusau @ 3:02 pm

Alberto Cairo: Data journalism needs to up its own standards by Alberto Cairo.

From the post:

Did you know that wearing a helmet when riding a bike may be bad for you? Or that it’s possible to infer the rise of kidnappings in Nigeria from news reports? Or that we can predict the year when a majority of Americans will reject the death penalty (hint: 2044)? Or that it’s possible to see that healthcare prices in the U.S. are “insane” in 15 simple charts? Or that the 2015 El Niño event may increase the percentage of Americans who accept climate change as a reality?

But I have to confess my disappointment with the new wave of data journalism — at least for now. All the questions in the first paragraph are malarkey. Those stories may not be representative of everything that FiveThirtyEight, Vox, or The Upshot are publishing — I haven’t gathered a proper sample — but they suggest that, when you pay close attention at what they do, it’s possible to notice worrying cracks that may undermine their own core principles.

In my present interpretation of his examples, Alberto has good reason to complain.

But that doesn’t mean that re-casting any of the stories would bring them closer to some “truth.” Rather, they would be closer to my norms for such stories. Which isn’t the same thing.

Or as Nietzsche would say: There are no facts, only interpretations.

People from presidents on down lay claim to “facts.” Your opponents can be pilloried for ignoring “facts.” Current mis-adventures in domestic and foreign security of the United States are predicated on emotional insecurities packaged as “facts.”

Acknowledging Nietzsche puts all “facts” on an even footing.

Enough diverse “facts” and it is harder to agree to spend $Trillions pursuing a security that is pushed further away with every dollar spent.

Visual Journalism Training Resources

Filed under: Journalism,News,Reporting — Patrick Durusau @ 10:48 am

BBC Opens Up Internal Visual Journalism Training Resources to the Public by Gannon Burgett.

From the post:

Last week, the BBC College of Journalism opened up their training website to the public. Full of educational resources created by and for the internal BBC team, these professional videos and guides run through a number of circumstances and suggestions for approaching visual journalism.

Set to be open for a 12 month trial run, the videos and podcasts cover topics that range from safety when harmed in the field, to iPhone photojournalism, to basic three-point lighting techniques and even videos that show you how to properly use satellite phones when capturing stories in unconventional areas.

A rather extraordinary set of resources!

Should give you a window into how the BBC views news reporting as well as the tools for news reporting on your own.

To see all the resources, see the BBC Academy page.

I first saw this in a tweet by Michael Peter Edson.

July 9, 2014

MuckRock

Filed under: Government,Government Data,News,Reporting — Patrick Durusau @ 4:53 pm

MuckRock

From the webpage:

MuckRock is an open news tool powered by state and federal Freedom of Information laws and you: Requests are based on your questions, concerns and passions, and you are free to embed, share and write about any of the verified government documents hosted here. Want to learn more? Check out our about page. MuckRock has been funded in part by grants from the Sunlight Foundation, the Freedom of the Press Foundation and the Knight Foundation.

An amazing site.

I found MuckRock while looking for documents released by mistake by DHS: DHS Releases Trove of Documents Related to Wrong “Aurora” in Response to Freedom of Information Act (FOIA) Request. (Maybe the DHS needs a topic map?)

I’ve signed up for their mailing list. Thinking about what government lies I want to read. 😉

Looks like a great place to use your data mining/analysis skills.

Enjoy!

Overview can now read most file formats directly

Filed under: News,Reporting — Patrick Durusau @ 3:38 pm

Overview can now read most file formats directly by Jonathan Stray.

From the post:

Previously, Overview could only read PDF files. (You can also import all documents in a single CSV file, or import a project from DocumentCloud.)

Starting today, you can directly upload documents in a wide variety of file formats. Simply add the files — or entire folders — using the usual file upload page.

Overview will automatically detect the file type and extract the text. Your document will be displayed as a PDF in your browser when you view it. Overview supports a wide variety of formats, including:

  • PDF
  • HTML
  • Microsoft Word (.doc and .docx)
  • Microsoft PowerPoint (.ppt and .pptx)
  • plain text, and also rich text (.rtf)

For a full list, see the file formats that LibreOffice can read.

This is good news!

Pass it on!

June 16, 2014

Annotating the news

Filed under: Annotation,Authoring Topic Maps,News,Reporting — Patrick Durusau @ 4:56 pm

Annotating the news: Can online annotation tools help us become better news consumers? by Jihii Jolly.

From the post:

Last fall, Thomas Rochowicz, an economics teacher at Washington Heights Expeditionary Learning School in New York, asked his seniors to research news stories about steroids, drone strikes, and healthcare that could be applied to their class reading of Michael Sandel’s Justice. The students were to annotate their articles using Ponder, a tool that teachers can use to track what their students read and how they react to it. Ponder works as a browser extension that tracks how long a reader spends on a page, and it allows them to make inline annotations, which include highlights, text, and reaction buttons. These allow students to mark points in the article that relate to what they are learning in class—in this case, about economic theories. Responses are aggregated and sent back to the class feed, which the teacher controls.

Interesting piece on the use of annotation software with news stories.

I don’t know how configurable Ponder is in terms of annotation and reporting, but being able to annotate web and PDF documents would be a long step towards lay authoring of topic maps.

For example, the “type” of a subject could be selected from a pre-composed list and associations created to map this occurrence of the subject in a particular document, by a particular author, etc. I can’t think of any practical reason to bother the average author with such details. Can you?

Certainly an expert author should have the ability to be less productive and more precise than the average reader but then we are talking about news stories. 😉 How precise does it need to be?

The post also mentions News Genius, which was pointed out to me by Sam Hunting some time ago. Probably better known for its annotation of rap music at Rap Genius. The only downside I see to Rap/News Genius is that the text to be annotated is loaded onto the site.

That is a disadvantage because if I wanted to create a topic map from annotations of archive files from the New York Times, that would not be possible. Remote annotation and then re-display of those annotations when a text is viewed (by an authorized user) is the sine qua non of topic maps for data resources.
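To make “remote annotation” concrete, here is a small sketch of stand-off annotations: each note records the address of the text and a character range, and the text itself is never copied into the annotation store. The `Annotation` fields, the `annotations_for` function and the example.org address are assumptions of mine, not a description of Ponder or News Genius, and the authorization check is left out.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    url: str    # where the annotated text lives; the text stays on the remote site
    start: int  # character offset where the annotated span begins
    end: int    # character offset just past the end of the span
    quote: str  # the span as it read when annotated, used to detect changes
    note: str   # the annotation itself


def annotations_for(store: list, url: str, text: str) -> list:
    """Return annotations for url whose recorded span still matches the text."""
    return [
        a for a in store
        if a.url == url and text[a.start:a.end] == a.quote
    ]


# Hypothetical archived story and one annotation on the word "senator".
url = "https://example.org/archive/story-1"
text = "The senator reviewed the report."
start = text.index("senator")
store = [Annotation(url, start, start + len("senator"), "senator", "type: person")]

for a in annotations_for(store, url, text):
    print(a.quote, "->", a.note)
```

Keeping the quoted span alongside the offsets means an annotation silently drops out when the remote text changes, rather than attaching itself to the wrong words.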
