Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 22, 2016

The five-step fact-check (Africa Check)

Filed under: Journalism,News,Reporting — Patrick Durusau @ 1:38 pm

The five-step fact-check from AfricaCheck

From the post:

Print our useful flow-chart and stick it up in a place where you can quickly refer to it when a deadline is pressing.

[Image: africa-check-fact-check-460, the Africa Check fact-check flow-chart]

Click here to download the PDF for printing.

A great fact-checking guide for reporters, with useful insight for readers as well.

What’s missing from a story you are reading right now?

AfricaCheck offers to fact check claims about Africa tweeted with: #AfricaCheckIt.

There’s a useful service to the news community!

A quick example: eNCA (a South African news site) claimed Zimbabwe’s President Robert Mugabe announced his retirement.

Africa Check responded with Mugabe’s original words plus translation.

I don’t read Mugabe as announcing his retirement but see for yourself.

November 19, 2016

How to get superior text processing in Python with Pynini

Filed under: FSTs,Journalism,News,Python,Reporting,Text Mining — Patrick Durusau @ 9:35 pm

How to get superior text processing in Python with Pynini by Kyle Gorman and Richard Sproat.

From the post:

It’s hard to beat regular expressions for basic string processing. But for many problems, including some deceptively simple ones, we can get better performance with finite-state transducers (or FSTs). FSTs are simply state machines which, as the name suggests, have a finite number of states. But before we talk about all the things you can do with FSTs, from fast text annotation—with none of the catastrophic worst-case behavior of regular expressions—to simple natural language generation, or even speech recognition, let’s explore what a state machine is, what they have to do with regular expressions.

Reporters, researchers and others will face a 2017 where the rate of information has increased, along with noise from media spasms over the latest taunt from president-elect Trump.

Robust text mining/filtering will be daily necessities, if they aren’t already.

Tagging text is the first example. Think about auto-generating graphs from emails with “to:,” “from:,” “date:,” and key terms in the email. Tagging the key terms is essential to that process.

Once tagged, you can slice and dice the text as more information is uncovered.
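
To make the email example concrete, here is a minimal sketch of pulling “to:,” “from:,” and “date:” out of a message and tagging key terms in the body. It uses only Python's standard library, with a plain regular expression standing in for the finite-state transducers the Pynini post describes, so it is not Pynini code. The term list, the sample message and the function names are all hypothetical.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Hypothetical key terms; in practice these come from your own research.
KEY_TERMS = ["fracking", "pipeline"]

# Hypothetical sample message, for illustration only.
SAMPLE = (
    "From: alice@example.org\n"
    "To: bob@example.org, carol@example.org\n"
    "Date: Wed, 09 Sep 2015 10:00:00 -0400\n"
    "Subject: notes\n"
    "\n"
    "Fracking and the pipeline came up again.\n"
)


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all terms."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b({alternatives})\b", re.IGNORECASE)


def tag_terms(text, terms):
    """Wrap every key term found in text with a <term> marker."""
    return build_pattern(terms).sub(lambda m: f"<term>{m.group(0)}</term>", text)


def email_to_edges(raw_email, terms):
    """Return (sender, recipient, date, key terms) tuples for one email."""
    message = message_from_string(raw_email)
    senders = [addr for _, addr in getaddresses(message.get_all("From", []))]
    recipients = [addr for _, addr in getaddresses(message.get_all("To", []))]
    body = message.get_payload()  # a plain string for non-multipart mail
    found = tuple(sorted({m.lower() for m in build_pattern(terms).findall(body)}))
    return [(s, r, message["Date"], found) for s in senders for r in recipients]
```

Each tuple from `email_to_edges(SAMPLE, KEY_TERMS)` is one edge of the graph, sender to recipient, labeled with the date and the key terms found. Multipart messages and attachments would need more handling than this sketch provides.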

Interested?

Tracking Business Records Across Asia

Filed under: Journalism,News,Reporting — Patrick Durusau @ 9:10 pm

Tracking Business Records Across Asia by GIJN staff.

From the post:

The paper trail has changed — money now moves digitally and business registries are databases — and this lets journalists do more than ever before in tracking people and companies across borders.

Backgrounding an individual or a company? Following an organized crime ring? The key to uncovering corruption is to “follow the money” — to discover who owns what, who gets which contract, and how business are linked to each other.

Resources on tracking corporate records in China, the Philippines and India!

While you are sharpening your tracking skills, don’t forget to support GIJN.

Eight steps reporters should take … [every day]

Filed under: Journalism,News,Reporting — Patrick Durusau @ 2:38 pm

Eight steps reporters should take before Trump assumes office by Dana Priest.

Reporters should paste these eight steps to their bathroom mirrors for review every day, not just for the Trump presidency:

Rebuild sources: Call every source you’ve ever had who is either still in government or still connected to those who are. Touch base, renew old connections, and remind folks that you’re all ears.

Join forces: Triangulate tips and sources across the newsroom, like we did after 9/11, when reporting became more difficult.

Make outside partnerships: Reporting organizations outside your own newspaper, especially those abroad and with international reach, can help uncover the moves being considered and implemented in foreign countries.

Discover the first family: Now part of the White House team, Donald Trump’s children and son-in-law are an important target for deep-dive reporting into their own financial holdings and their professional and personal records.

Renew the hunt: Find those tax filings!

Out disinformation: Find a way to take on the many false news sites that now hold a destructive sway over some Americans.

Create a war chest: Donate and persuade your news organization to donate large sums to legal defense organizations preparing to jump in with legal challenges the moment Trump moves against access, or worse. The two groups that come to mind are the Reporters’ Committee for Freedom of the Press and the American Civil Liberties Union. Encourage your senior editors to get ready for the inevitable, quickly.

Be grateful: Celebrate your freedom to do hard-hitting, illuminating work by doing much more of it.

Don’t wait for reporters to carry all the load.

Many of these steps, “Renew the hunt” comes to mind, can be performed by non-reporters and then leaked.

A lack of transparency of government signals a lack of effort on the part of the press and public.

FOIA is great but it’s also being spoon fed what the government chooses to release.

I’m thinking of transparency that is less self-serving than FOIA releases.

November 18, 2016

Successful Hate Speech/Fake News Filters – 20 Facts About Facebook

Filed under: Facebook,Journalism,News,Reporting — Patrick Durusau @ 11:04 am

After penning Monetizing Hate Speech and False News yesterday, I remembered non-self-starters will be asking:

Where are examples of successful monetized filters for hate speech and false news?

Of The Top 20 Valuable Facebook Statistics – Updated November 2016, I need only two to make the case for monetized filters.

1. Worldwide, there are over 1.79 billion monthly active Facebook users (Facebook MAUs) which is a 16 percent increase year over year. (Source: Facebook as of 11/02/16)

15. Every 60 seconds on Facebook: 510 comments are posted, 293,000 statuses are updated, and 136,000 photos are uploaded. (Source: The Social Skinny)

(emphasis in the original)

By comparison, according to Newsonomics: 10 numbers on The New York Times’ 1 million digital-subscriber milestone [2015], the New York Times has 1 million digital subscribers.

If you think about it, the New York Times is a hate speech/fake news filter, although it has a much smaller audience than Facebook.

Moreover, the New York Times is spending money to generate content whereas on Facebook, content is there for the taking or filtering.

If the New York Times can make money as a filter for hate speech/fake news carrying its overhead, imagine the potential for profit from simply filtering content generated and posted by others. Across a market of 1.79 billion viewers. Where “hate” and “fake” vary from audience to audience.

Content filters at Facebook and the ability to “follow” those filters on timelines is all that is missing. (And Facebook monetizing the use of those filters.)

Petition Mark Zuckerberg and Facebook for content filters today!

November 17, 2016

Monetizing Hate Speech and False News

Filed under: Facebook,Journalism,News,Reporting — Patrick Durusau @ 5:48 pm

Eli Pariser has started If you were Facebook, how would you reduce the influence of fake news? on GoogleDocs.

Out of the now seventeen pages of suggestions, I haven’t noticed any that promise a revenue stream to Facebook.

I view ideas to filter “false news” and/or “hate speech” that don’t generate revenue for Facebook as non-starters. I suspect Facebook does as well.

Here is a broad sketch of how Facebook can monetize “false news” and “hate speech,” all while shaping Facebook timelines to diverse expectations.

Monetizing “false news” and “hate speech”

Facebook creates user-defined filters for users’ timelines. Filters can block other Facebook accounts (and any material from them), or content by origin, word and, I would suggest, regex.

User-defined filters apply only to that account and can be shared with twenty other Facebook users.

To share a filter with more than twenty other Facebook users, Facebook charges an annual fee, scaled on the number of shares.

Unlike the many posts on “false news” and “hate speech,” being a filter isn’t free beyond twenty other users.
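
The filter idea above can be sketched in a few lines of Python. This is purely illustrative: Facebook offers no such feature or API, and the class name, the post fields (`account`, `origin`, `text`) and the helper function are all hypothetical. Only the twenty-user free sharing limit comes from the proposal itself.

```python
import re
from dataclasses import dataclass, field

# From the proposal: sharing with more than twenty other users costs a fee.
FREE_SHARE_LIMIT = 20


@dataclass
class TimelineFilter:
    """A hypothetical user-defined timeline filter."""
    blocked_accounts: set = field(default_factory=set)
    blocked_origins: set = field(default_factory=set)
    blocked_words: set = field(default_factory=set)
    blocked_patterns: list = field(default_factory=list)  # regex strings

    def blocks(self, post):
        """Return True if the post (a dict) should be hidden."""
        if post["account"] in self.blocked_accounts:
            return True
        if post["origin"] in self.blocked_origins:
            return True
        words = set(re.findall(r"\w+", post["text"].lower()))
        if words & {word.lower() for word in self.blocked_words}:
            return True
        return any(re.search(p, post["text"]) for p in self.blocked_patterns)


def requires_fee(share_count):
    """Sharing a filter is free only up to the twenty-user limit."""
    return share_count > FREE_SHARE_LIMIT


def apply_filter(timeline_filter, posts):
    """Keep only the posts the filter does not block."""
    return [post for post in posts if not timeline_filter.blocks(post)]
```

Because the filter is plain data, sharing it with followers amounts to copying the four collections, which is what would make a subscription model practical.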

Selling Subscriptions to Facebook Filters

Organizations can sell subscriptions to their filters. Facebook, which controls the authorization of the filters, contracts for a percentage of the subscription fee.

Pro tip: I would not invoke Facebook filters from the Washington Post and New York Times at the same time. It is likely they exclude each other as news sources.

Advantages of Monetizing Hate Speech and False News

First and foremost for Facebook, it gets out of the satisfying every point of view game. Completely. Users are free to define as narrow or as broad a point of view as they desire.

If you see something you don’t like, disagree with, etc., don’t complain to Facebook, complain to your Facebook filter provider.

That alone will expose the hidden agenda behind most, perhaps not all, of the “false news” filtering advocates. They aren’t concerned with what they are seeing on Facebook but they are very concerned with deciding what you see on Facebook.

For wannabe filters of what other people see, beyond twenty other Facebook users, that privilege is not free. Unlike the many proposals with as many definitions of “false news” as appear in Eli’s document.

It is difficult to imagine a privilege people would be more ready to pay for than the right to attempt to filter what other people see. Churches, social organizations, local governments, corporations, you name them and they will be lining up to create filter lists.

The financial beneficiary of the “drive to filter for others” is of course Facebook but one could argue the filter owners profit by spreading their worldview and the unfortunates that follow them, well, they get what they get.

Commercialization of Facebook filters, that is, selling subscriptions to Facebook filters, creates a new genre of economic activity and yet another revenue stream for Facebook. (That’s two up to this point if you are keeping score.)

It isn’t hard to imagine the Economist, Forbes, professional clipping services, etc., creating a natural extension of their filtering activities onto Facebook.

Conclusion: Commercialization or Unfunded Work Assignments

Preventing/blocking “hate speech” and “false news,” for free has been, is and always will be a failure.

Changing Facebook infrastructure isn’t free, and creating revenue streams from preventing/blocking “hate speech” and “false news” creates incentives for Facebook to make the necessary changes and for people to build filters from which they can profit.

Not to mention that filtering enables everyone, including the alt-right, alt-left and the sane people in between, to create the Facebook of their dreams, without being subject to the Facebook desired by others.

Finally, it gets Facebook and Mark Zuckerberg out of the fantasy island approach where they are assigned unpaid work by others. See the New York Times piece Mark Zuckerberg Is in Denial. (It’s another “hit” piece by Zeynep Tufekci.)

If you know Mark Zuckerberg, please pass this along to him.

November 16, 2016

“…Fake News Is Not the Problem”

Filed under: Journalism,News,Reporting — Patrick Durusau @ 1:52 pm

According to Snopes, Fake News Is Not the Problem by Brooke Binkowski.

From the post:

Take it from the internet’s chief myth busters: The problem is the failing media.

This is the state of truth on the internet in 2016, now that it is as easy for a Macedonian teenager to create a website as it is for The New York Times, and now that the information most likely to find a large audience is that which is most alarming, not most correct. In the wake of the election, the spread of this kind of phony news on Facebook and other social media platforms has come under fire for stoking fears and influencing the election’s outcome. Both Facebook and Google have taken moves to bar fake news sites from their advertising platforms, aiming to cut off the sites’ sources of revenue.

But as managing editor of the fact-checking site Snopes, Brooke Binkowski believes Facebook’s perpetuation of phony news is not to blame for our epidemic of misinformation. “It’s not social media that’s the problem,” she says emphatically. “People are looking for somebody to pick on. The alt-rights have been empowered and that’s not going to go away anytime soon. But they also have always been around.”

The misinformation crisis, according to Binkowski, stems from something more pernicious. In the past, the sources of accurate information were recognizable enough that phony news was relatively easy for a discerning reader to identify and discredit. The problem, Binkowski believes, is that the public has lost faith in the media broadly — therefore no media outlet is considered credible any longer. The reasons are familiar: as the business of news has grown tougher, many outlets have been stripped of the resources they need for journalists to do their jobs correctly. “When you’re on your fifth story of the day and there’s no editor because the editor’s been fired and there’s no fact checker so you have to Google it yourself and you don’t have access to any academic journals or anything like that, you will screw stories up,” she says.

Sadly Binkowski’s debunking of the false/fake news meme doesn’t turn up on Snopes.com.

That might make it more convincing to mainstream media who have seized upon false/fake news to excuse their lack of credibility with readers.

Please share the Binkowski post with your friends, especially journalists.

November 15, 2016

False, Misleading, Clickbait-y, and Satirical “News” Sources (Another Useful Listicle)

Filed under: Journalism,News,Reporting — Patrick Durusau @ 5:39 pm

False, Misleading, Clickbait-y, and Satirical “News” Sources by Melissa Zimdars.

From the document:

Below is a list of fake, false, regularly misleading, and/or otherwise questionable “news” organizations, as well as organizations that regularly use clickbait-y headlines and descriptions, that are commonly shared on facebook and other social media sites. Some of these websites rely on “outrage” by using distorted headlines and decontextualized or dubious information in order to generate likes, shares, and profits.

Other sources on this list are purposefully fake with the intent of satire/comedy, which can offer important critical commentary on politics and society, but they are regularly shared as actual/literal news. I’m including them here, for now, because 1.) they have the potential to perpetuate misinformation based on different audience (mis)interpretations and 2.) to make sure anyone who reads a story by The Onion, for example, understands its purpose. If you think this is unnecessary, please see Literally Unbelievable.

This list is in the process of being updated and to her credit, Melissa explicitly says that no source should be given an automatic imprimatur.

Too many commentators who complain about “false news” and/or “bubbles”:

  • Want to separate true/false news for you
  • Want to sell you their bubble to replace your own

You will be less informed and less capable of evaluating news for yourself in either case.

As Melissa notes, read widely and with a critical eye.

November 12, 2016

Preventing Another Trump – Censor Facebook To Protect “Dumb” Voters

Filed under: Censorship,Free Speech,Government,Journalism,News,Politics,Reporting — Patrick Durusau @ 9:01 pm

Facebook can no longer be ‘I didn’t do it’ boy of global media by Emily Bell.


Barack Obama called out the fake news problem directly at a rally in Michigan on the eve of the election: “And people, if they just repeat attacks enough, and outright lies over and over again, as long as it’s on Facebook and people can see it, as long as it’s on social media, people start believing it….And it creates this dust cloud of nonsense.”

Yesterday, Zuckerberg disputed this, saying that “the idea that fake news on Facebook… influenced the election…is a pretty crazy idea” and defending the “diversity” of information Facebook users see. Adam Mosseri, the company’s VP of Product Development, said Facebook must work on “improving our ability to detect misinformation.” This line is part of Zuckerberg’s familiar but increasingly unconvincing narrative that Facebook is not a media company, but a tech company. Given the shock of Trump’s victory and the universal finger-pointing at Facebook as a key player in the election, it is clear that Zuckerberg is rapidly losing that argument.

In fact, Facebook, now the most influential and powerful publisher in the world, is becoming the “I didn’t do it” boy of global media. Clinton supporters and Trump detractors are searching for reasons why a candidate who lied so frequently and so flagrantly could have made it to the highest office in the land. News organizations, particularly cable news, are shouldering part of the blame for failing to report these lies for what they were. But a largely hidden sphere of propagandistic pages that target and populate the outer reaches of political Facebook are arguably even more responsible.

You can tell Bell has had several cups of the Obama kool-aid by her uncritical acceptance of Barack Obama’s groundless attacks on “…fake news problem….”

Does Bell examine the incidence of “fake news” in other elections?

No.

Does Bell specify which particular “fake news” stories should have been corrected?

No.

Does Bell explain why voters can’t distinguish “fake news” from truthful news?

No.

Does Bell explain why mainstream media is better than voters at detecting “fake news?”

No.

Does Bell explain why she should be the judge over reporting during the 2016 Presidential election?

No.

Does Bell explain why she and Obama consider voters to be dumber than themselves?

No.

Do I think Bell or anyone else should be censoring Facebook for “false news?”

No.

How about you?

November 10, 2016

Here’s to the return of the journalist as malcontent

Filed under: Journalism,News,Reporting — Patrick Durusau @ 6:39 pm

Here’s to the return of the journalist as malcontent by Kyle Pope.

From the post:

JOURNALISM’S MOMENT of reckoning has arrived.

Its inability to understand Donald Trump’s rise over the last year, ending in his victory Tuesday night, clearly stand among journalism’s great failures, certainly in a generation and probably in modern times.

Reporters’ eagerness first to ridicule Trump and his supporters, then dismiss them, and finally to actively lobby and argue for their defeat have led us to a moment when the entire journalistic enterprise needs to be rethought and rebuilt. In terms of bellwether moments, this is our anti-Watergate.

Already the finger-pointing deconstructions have begun. Yes, social media played a role, enclosing reporters in echo chambers that made it hard, if not impossible, for them to hear contrarian voices; yes, the brutal economics of the news business hurt all our efforts, decimating newsrooms around the country and leaving fewer people to grapple with what was a gargantuan story; and yes, reporters can be forgiven, at least initially, for laughing off a candidate whose views and personality seemed so outside the norm of a serious contender for the White House.

While all those things are true, journalism’s fundamental failure in this election, its original sin, is much more basic to who we are and what we are supposed to be. Simply put, it is rooted in a failure of reporting.

(emphasis in original)

You should read this essay at the start of every day. Even after opposition to and suspicion of every government, corporation or other statement is second nature.

It’s ironic that Pope points out:

[Trump] already has made clear that he is no friend of the press.

True enough but the press has made it clear it is the friend of government, for several administrations.

Regaining the trust of the public is going to be a long and hard slog.

Bursting Your News Bubble

Filed under: Journalism,News,Reporting — Patrick Durusau @ 11:55 am

It would not have helped the Clinton clones (a sense of entitlement makes people tone deaf and fact blind) but C.J. Adams and Izzie Zahorian explore a way to “see” news beyond your usual news bubble.

In If you are reading this, we might be in the same news bubble they write:

In Myanmar we met two journalists who, during a period of military rule, had smuggled newspapers in duffel bags to carry news between their country and the outside world. Their story stuck with us as a sort of personal challenge: these reporters had regularly risked their lives to read a just a few pages of news from outside their country; while we, with all our connectivity, rarely make the effort to do the same.

Even with the power of the internet, it can be surprisingly difficult to explore the diversity of global perspectives. Technology has made it easier for everyone share information, but it hasn’t made us better at finding viewpoints that are distant from our own. In some ways, a duffel bag full of newspapers would include a wider range of perspectives than many of us see on a daily basis.

Search engines, social media and news aggregators are great at surfacing information close to our interests, but they are limited by the set of topics and people we choose to follow. Even if we read multiple news sources every day, what we discover is defined by the languages we are able to read, and the topics that our sources decide to cover. Ultimately, these limitations create a “news bubble” that shapes our perspective and awareness of the world. We often miss out on the chance to connect and empathize with ideas beyond these boundaries.

How to “see” news without your usual filters?


We’ve just released a new experiment related to this idea: a data visualization called Unfiltered.News. The viz uses Google News data to show what the daily news topics are being published in every region. Headlines for these topics can be viewed from around the world, with translations provided in 40 languages. We hope the viz can become a useful tool to explore what shapes our different perspectives, and to help users discover topics and viewpoints they would have otherwise missed.

Push this one up to the top of your “sites/technology to explore” stack!

I’m having a mixed experience on Ubuntu 14.04. Chrome fails altogether, no support for WebGL. Mozilla displays the side bar of headlines but not the graph-like presentation of stories.

I also tried to load the site on Windows 7 with IE and got no joy.

Understandable (but disappointing) that the site may be optimized for Windows but to exclude Chrome?

It’s a great idea; I’m hopeful that during this beta shakedown it becomes more widely accessible.

November 2, 2016

Does Verification Matter? Clinton/Podesta Emails Update

Filed under: Data Mining,Hillary Clinton,News,Reporting — Patrick Durusau @ 12:51 pm

As of today, 10,357 DKIM Verified Clinton/Podesta Emails (of 43,526 total). That’s releases 1-26.

I ask “Does Verification Matter?” in the title to this post because of the seeming lack of interest in verification of emails in the media. Not that it would ever be a lead, but some mention of the verified/not status of an email seems warranted.

Every Clinton/Podesta story mentions Anthony Weiner’s interest in sharing his sexual insecurities and nary a peep about the false Clinton/Obama/Clapper claims that emails have been altered. Easy enough to check. But no specifics are given or requested by the press.

Thanks to the Clinton/Podesta drops by Michael Best, @NatSecGeek, I have now uploaded:

DKIM-verified-podesta-1-26.txt.gz is a sub-set of 10,357 emails that have been verified by their DKIM keys.

The statements in or data attached to those emails may still be false. DKIM verification only validates the email being the same as when it left the email server, nothing more.

DKIM-complete-podesta-1-26.txt.gz is the full set of Podesta emails to date, some 43,526, with their DKIM results of either True or False.

Both files have these fields:

ID – 1| Verified – 2| Date – 3| From – 4| To – 5| Subject – 6| Message-Id – 7
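
For anyone who wants to work with these files programmatically, here is a minimal sketch that reads one of the gzipped listings and counts the verified emails. It assumes one record per line with the seven fields separated by a pipe character, and a Verified value of exactly True or False. The delimiter is my reading of the field list above, so check it against the actual file. The function names are hypothetical.

```python
import csv
import gzip

# Field order as given in the post.
FIELDS = ["ID", "Verified", "Date", "From", "To", "Subject", "Message-Id"]


def read_listing(path):
    """Yield one dict per record from a gzipped, pipe-delimited listing.

    Assumption: fields are separated by "|" with no quoting.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace", newline="") as handle:
        for row in csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE):
            if len(row) >= 2:  # skip blank or malformed lines
                yield dict(zip(FIELDS, (cell.strip() for cell in row)))


def count_verified(path):
    """Return (number verified, total records) for a listing file."""
    verified = total = 0
    for record in read_listing(path):
        total += 1
        if record["Verified"] == "True":
            verified += 1
    return verified, total
```

Calling `count_verified("DKIM-complete-podesta-1-26.txt.gz")` on a local copy should give the verified and total counts, provided the layout assumption holds. A subject line that itself contains a pipe would shift the later fields, which this sketch does not try to repair.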

Enjoy!

PS: Perhaps verification doesn’t matter when the media repeats false and/or delusional statements of DNI Clapper in hopes of…, I don’t know what they are hoping for but I am hoping they are dishonest, not merely stupid.

October 31, 2016

9,477 DKIM Verified Clinton/Podesta Emails (of 39,878 total (today))

Filed under: Data Mining,Email,Hillary Clinton,News,Reporting — Patrick Durusau @ 3:19 pm

Still working on the email graph and at the same time, managed to catch up on the Clinton/Podesta drops by Michael Best, @NatSecGeek, at least for a few hours.

DKIM-verified-podesta-1-24.txt.gz is a sub-set of 9,477 emails that have been verified by their DKIM keys.

The statements in or data attached to those emails may still be false. DKIM verification only validates the email being the same as when it left the email server, nothing more.

DKIM-complete-podesta-1-24.txt.gz is the full set of Podesta emails to date, some 39,878, with their DKIM results of either True or False.

Both files have these fields:

ID – 1| Verified – 2| Date – 3| From – 4| To – 5| Subject – 6| Message-Id – 7

Question: Have you seen any news reports that mention emails being “verified” in their reporting?

Emails in the complete set may be as accurate as those in the verified set, but I would think verification is newsworthy in and of itself.

You?

Clinton/Podesta Emails 23 and 24, True or False? Cryptographically Speaking

Filed under: Data Mining,Hillary Clinton,News,Politics,Reporting — Patrick Durusau @ 10:21 am

Catching up on the Clinton/Podesta email releases from Wikileaks, via Michael Best, @NatSecGeek. Michael bundles the releases up and posts them at: Podesta emails (zipped).

For anyone coming late to the game, DKIM “verified” means that the DKIM signature on an email is valid for that email.

In lay person’s terms, that email has been proven by cryptography to have originated from a particular mail server and when it left that mail server, it read exactly as it does now, i.e., no changes by Russians or others.

What I have created are files that list the emails in the order they appear at Wikileaks, with the very next field being True or False on the verification issue.

Just because an email has “False” in the second column doesn’t mean it has been modified or falsified by the Russians.

DKIM signatures fail for all manner of reasons but when they pass, you have a guarantee the message is intact as sent.
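
To see why a signature can fail without anyone tampering with the content, consider the body hash, one of the two things a DKIM signature covers. The sketch below computes it for the “simple” body canonicalization with SHA-256 and compares it to the bh= tag of a DKIM-Signature header. It is a partial illustration under those assumptions, not a verifier: it does not check the header signature against the sender's public key in DNS, and many real messages use “relaxed” canonicalization instead. The function names are hypothetical.

```python
import base64
import hashlib
import re


def canonicalize_body_simple(body):
    """Apply DKIM "simple" body canonicalization to a bytes body.

    Empty lines at the end of the body are ignored and the result
    always ends with exactly one CRLF.
    """
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"


def body_hash(body):
    """Base64-encoded SHA-256 of the canonicalized body."""
    digest = hashlib.sha256(canonicalize_body_simple(body)).digest()
    return base64.b64encode(digest).decode("ascii")


def matches_bh(dkim_signature_value, body):
    """Compare the computed body hash with the header's bh= tag.

    Assumes a=rsa-sha256 and simple body canonicalization.
    """
    match = re.search(r"\bbh\s*=\s*([A-Za-z0-9+/=\s]+)", dkim_signature_value)
    if match is None:
        return False
    claimed = "".join(match.group(1).split())  # drop folding whitespace
    return claimed == body_hash(body)
```

Trailing blank lines do not change the hash, but a single altered character does, and so does converting CRLF line endings to bare LF. That last case is one way a “False” can appear on an email nobody doctored.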

For your research into these emails:

DKIM-complete-podesta-23.txt.gz

and

DKIM-complete-podesta-24.txt.gz.

For release 24, I did have to remove the DKIM signature on 39256 00010187.eml in order for the script to succeed. That is the only modification I made to either set of files.

October 22, 2016

Validating Wikileaks Emails [Just The Facts]

Filed under: Cybersecurity,Hillary Clinton,Journalism,News,Reporting,Wikileaks — Patrick Durusau @ 8:27 pm

A factual basis for reporting on alleged “doctored” or “falsified” emails from Wikileaks has emerged.

Now to see if the organizations and individuals responsible for repeating those allegations, some 260,000 times, will put their doubts to the test.

You know where my money is riding.

If you want to verify the Podesta emails or other email leaks from Wikileaks, consult the following resources.

Yes, we can validate the Wikileaks emails by Robert Graham.

From the post:

Recently, WikiLeaks has released emails from Democrats. Many have repeatedly claimed that some of these emails are fake or have been modified, that there’s no way to validate each and every one of them as being true. Actually, there is, using a mechanism called DKIM.

DKIM is a system designed to stop spam. It works by verifying the sender of the email. Moreover, as a side effect, it verifies that the email has not been altered.

Hillary’s team uses “hillaryclinton.com”, which as DKIM enabled. Thus, we can verify whether some of these emails are true.

Recently, in response to a leaked email suggesting Donna Brazile gave Hillary’s team early access to debate questions, she defended herself by suggesting the email had been “doctored” or “falsified”. That’s not true. We can use DKIM to verify it.

Bob walks you through validating a raw email from Wikileaks with the DKIM verifier plugin for Thunderbird. He also demonstrates that the same process can detect “doctored” or “falsified” emails.

Bob concludes:

I was just listening to ABC News about this story. It repeated Democrat talking points that the WikiLeaks emails weren’t validated. That’s a lie. This email in particular has been validated. I just did it, and shown you how you can validate it, too.

Btw, if you can forge an email that validates correctly as I’ve shown, I’ll give you 1-bitcoin. It’s the easiest way of solving arguments whether this really validates the email — if somebody tells you this blogpost is invalid, then tell them they can earn about $600 (current value of BTC) proving it. Otherwise, no.

BTW, Bob also points to:

Here’s Cryptographic Proof That Donna Brazile Is Wrong, WikiLeaks Emails Are Real by Luke Rosiak, which includes this Python code to verify the emails:

[Image: clinton-python-email-460, screenshot of the Python code]

and,

Verifying Wikileaks DKIM-Signatures by teknotus, offers this manual approach for testing the signatures:

[Image: clinton-sig-check-460, screenshot of the manual signature check]

But those are all one-off methods and there are thousands of emails.

But the post by teknotus goes on:

Preliminary results

I only got signature validation on some of the emails I tested initially but this doesn’t necessarily invalidate them as invisible changes to make them display correctly on different machines done automatically by browsers could be enough to break the signatures. Not all messages are signed. Etc. Many of the messages that failed were stuff like advertising where nobody would have incentive to break the signatures, so I think I can safely assume my test isn’t perfect. I decided at this point to try to validate as many messages as I could so that people researching these emails have any reference point to start from. Rather than download messages from wikileaks one at a time I found someone had already done that for the Podesta emails, and uploaded zip files to Archive.org.

Emails 1-4160
Emails 4161-5360
Emails 5361-7241
Emails 7242-9077
Emails 9078-11107

It only took me about 5 minutes to download all of them. Writing a script to test all of them was pretty straightforward. The program dkimverify just calls a python function to test a message. The tricky part is providing context, and making the results easy to search.

Automated testing of thousands of messages

It’s up on Github

It’s main output is a spreadsheet with test results, and some metadata from the message being tested. Results Spreadsheet 1.5 Megs

It has some significant bugs at the moment. For example Unicode isn’t properly converted, and spreadsheet programs think the Unicode bits are formulas. I also had to trap a bunch of exceptions to keep the program from crashing.
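
For a sense of what such a batch script involves, here is a minimal sketch that walks a directory of .eml files, verifies each one, and writes the results to a CSV file. It is not teknotus's program. The default verifier assumes the third-party dkimpy package is installed (imported as `dkim`, whose `dkim.verify` takes the raw message bytes and needs network access for the DNS key lookup). The function names, file layout and CSV columns are hypothetical.

```python
import csv
from pathlib import Path


def default_verify(raw_message):
    """Verify one raw message with dkimpy (third-party, needs DNS access)."""
    import dkim  # imported lazily so the rest of the sketch runs without it
    return dkim.verify(raw_message)


def verify_directory(directory, out_csv, verify=default_verify):
    """Verify every .eml file in directory and write a CSV of results.

    Exceptions from the verifier are trapped and recorded so that one
    bad message does not stop the whole run.
    """
    rows = []
    for path in sorted(Path(directory).glob("*.eml")):
        try:
            result, error = bool(verify(path.read_bytes())), ""
        except Exception as exc:  # broad on purpose: record it and move on
            result, error = False, type(exc).__name__
        rows.append((path.name, result, error))
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "verified", "error"])
        writer.writerows(rows)
    return rows
```

Passing the verifier in as a parameter keeps the loop testable offline. Recording the exception name alongside a False result also separates “the signature did not verify” from “the script could not process this message.”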

Warning: I have difficulty opening the verify.xlsx file in Calc, Excel and a CSV converter. Teknotus reports it opens in LibreOffice Calc, which just failed to install on an older Ubuntu distribution. Sing out if you can successfully open the file.

Journalists: Are you going to validate Podesta emails that you cite? Or that others claim are false/modified?

October 15, 2016

Why Journalists Should Not Rely On Wikileaks Indexing – Podesta Emails

Filed under: Journalism,News,Reporting,Wikileaks — Patrick Durusau @ 3:58 pm

Clinton on Fracking, or, Another Reason to Avoid Wikileaks Indexing

fracking-podesta-460

The quote in the tweet is false.

Politico supplies the correct quotation in its post:


“Bernie Sanders is getting lots of support from the most radical environmentalists because he’s out there every day bashing the Keystone pipeline. And, you know, I’m not into it for that,” Clinton told the unions, according to the transcript. “My view is, I want to defend natural gas. … I want to defend fracking under the right circumstances.”

I’m guessing that “…under the right circumstances.” must have pushed Wikileaks too close to the 140 character barrier.

Ditto for the Wikileaks mis-quote of: “Get a life.”

Which, reported as in the tweet, appears to refer to unbridled fracking.

Not so in the Politico post:


“I’m already at odds with the most organized and wildest” of the environmental movement, Clinton told building trades unions in September 2015, according to a transcript of the remarks apparently circulated by her aides. “They come to my rallies and they yell at me and, you know, all the rest of it. They say, ‘Will you promise never to take any fossil fuels out of the earth ever again?’ No. I won’t promise that. Get a life, you know.”

Doesn’t read quite the same way, does it?

I suppose once you start lying it’s really hard to stop. Clinton is a good example of that and Wikileaks should not follow her example.

It’s hard to spot these lies because Wikileaks isn’t indexing the attachments.

You can search all day for “defend fracking,” “get a life” (by Clinton) and you will come up empty (at least as of today).

So that you don’t have to search for: 20150909 Transcript | Building Trades Union (Keystone XL) at Wikileaks – Podesta Emails, I have produced a PDF version of that attachment, Building-Trades-Union-Clinton-Sept-09-2015.pdf (my naming), for your viewing pleasure.

October 13, 2016

George Carlin’s Seven Dirty Words in Podesta Emails – Discovered 981 Unindexed Documents

Filed under: Government,Hillary Clinton,Humor,Journalism,News,Reporting — Patrick Durusau @ 10:42 am

While taking a break from serious crunching of the Podesta emails I discovered 981 unindexed documents at Wikileaks!

https://www.youtube.com/watch?v=FMkNsMMvrqk

Try searching for Carlin’s seven dirty words at The Podesta Emails:

  • shit – 44
  • piss – 19
  • fuck – 13
  • cunt – 0
  • cocksucker – 0
  • motherfucker – 0 (?)
  • tits – 0

I have a ? after “motherfucker” because working with the raw files I show one (1) hit for “motherfucker” and one (1) hit for “motherfucking.” Separate emails.

For “motherfucker,” American Sniper–the movie, responded to by Chris Hedges – From: magazine@tikkun.org To: Podesta@Law.Georgetown.Edu

For “motherfucking,” H4A News Clips 5.31.15 – From/To: aphillips@hillaryclinton.com.

“Motherfucker” and “motherfucking” occur in text attachments to emails, which Wikileaks does not search.
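A rough sketch of searching the raw files, text attachments included, using only the Python standard library. The function names are hypothetical, and it only reads text/* parts. PDF, docx and other binary attachments would need a separate text-extraction step.

```python
import email
from email import policy


def text_parts(raw_bytes):
    """Yield (name, text) for every text/* part, attachments included."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        try:
            text = part.get_content()
        except (LookupError, UnicodeDecodeError):
            # Unknown or broken character set: skip rather than crash.
            continue
        yield part.get_filename() or "(body)", text


def find_term(raw_bytes, term):
    """Return the names of the parts whose text contains term (case-insensitive)."""
    needle = term.lower()
    return [name for name, text in text_parts(raw_bytes) if needle in text.lower()]
```

Run over every raw file, this reports which part of which email holds the hit, body or named attachment.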

If you do a blank search for file attachments, Wikileaks reports there are 2427 file attachments.

Searching the Podesta emails at Wikileaks excludes the contents of 2427 files from your search results.

How significant is that?

Hmmm, 302 pdf, 501 docx, 167 doc, 12 xls, 9 xlsx – 981 documents excluded from your searches at Wikileaks.

That is out of 9,011 emails, as of this morning in my local copy.

How comfortable are you with not searching those 981 documents? (Or additional documents that may follow?)
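For anyone who wants to reproduce the attachment counts, here is a small sketch that tallies attachment file extensions across a directory of raw emails. It assumes the emails are saved locally as .eml files; the directory layout and function names are hypothetical.

```python
import email
from collections import Counter
from email import policy
from pathlib import Path


def attachment_extensions(raw_bytes):
    """Return the lower-cased extension of every named part in one message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    extensions = []
    for part in msg.walk():
        name = part.get_filename()
        if name:
            extensions.append(Path(name).suffix.lower().lstrip(".") or "(none)")
    return extensions


def tally(mail_dir):
    """Count attachment extensions across all .eml files in mail_dir."""
    counts = Counter()
    for path in Path(mail_dir).glob("*.eml"):
        counts.update(attachment_extensions(path.read_bytes()))
    return counts
```

Note that the five counts as listed above add up to 991 rather than 981, so one of the figures is presumably off by ten. A tally like this would settle which.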

October 11, 2016

Parsing Foreign Law From News Reports (Warning For Journalists)

Filed under: Journalism,News,Reporting — Patrick Durusau @ 6:48 pm

Cory Doctorow’s headline: Scotland Yard charge: teaching people to use crypto is an act of terrorism red-lined my anti-government biases.

I tend towards “unsound” reactions when free speech is being infringed upon.

But my alarm, and perhaps yours as well, was needlessly provoked in this case.

Cory writes:


In other words, according to Scotland Yard, serving a site over HTTPS (as this one is) and teaching people to use crypto (as this site has done) and possessing a secure OS (as I do) are acts of terrorism or potential acts of terrorism. In some of the charges, the police have explicitly connected these charges with planning an act of terrorism, but in at least one of the charges (operating a site served over HTTPS and teaching people about crypto) the charge lacks this addendum — the mere act is considered worthy of terrorism charges.

The concern over:


but in at least one of the charges (operating a site served over HTTPS and teaching people about crypto) the charge lacks this addendum — the mere act is considered worthy of terrorism charges.

is mis-placed.

Cory points to the original report here: Man arrested on Cardiff street to face six terror charges by Vikram Dodd.

Cory’s alarm is not repeated by Dodd:


Ullah has been charged with directing terrorism, providing training in encryption programs knowing the purpose was for terrorism, and using his blog site to provide such training. His activities are alleged to have “the intention of assisting another or others to commit acts of terrorism”.

Beyond that (I haven’t seen the charging document), be aware that under English Criminal Procedure, the “charge” on which Cory places so much weight is defined as:

uk-charge-460

Pay particular attention to 7.3(1)(a)(i) (page 65):

…describes the offense in ordinary language, and…

A “charge” isn’t a technical specification of an offense under English criminal procedure. Which means you attach legal significance to charging language at your own peril. And to the detriment of your readers.

PS: I have contacted the Westminster Magistrates’ Court and requested a copy of the charging document. If and when that arrives, I will update this post with it.

October 6, 2016

Arabic/Russian Language Internet

Filed under: Journalism,News,Reporting — Patrick Durusau @ 1:01 pm

No matter the result of the 2016 US presidential election, mis-information on areas where Arabic and/or Russian are spoken will increase.

If you are creating topic maps and/or want to do useful reporting on such areas consider:

How to get started investigating the Arabic-language internet by Tom Trewinnard, or,

How to get started investigating the Russian-language internet by Aric Toler.

Any hack can quote releases from official sources and leave their readers uninformed.

A journalist takes monotone “facts” from an “official” release and weaves a story of compelling interest to their readers.

Any other guides to language/country specific advice for journalists?

September 27, 2016

How media coverage of terrorism endorses a legal [4-Ply or More] standard

Filed under: Journalism,News,Reporting — Patrick Durusau @ 8:45 pm

How media coverage of terrorism endorses a legal double standard by Rafia Zakaria.

From the post:

On June 17, 2015, Dylann Roof entered a predominantly black church in Charleston, South Carolina, and opened fire. When he was done, nine people lay dead around him. For a few days after Roof’s grisly act, a debate raged in the media over whether the committed white supremacist and mass murderer should be considered a terrorist. Many, including The Washington Post’s Philip Bump, vehemently opposed the label, insisting that even though the Justice Department had dubbed Roof’s killing spree “an act of domestic terrorism,” calling Roof a terrorist would confer upon him the very notoriety he sought.

Like other journalists and analysts, Bump analyzed the sociological and ethical dimensions of the terror label, concerns about whether all who terrify are terrorists, and whether the wider application of the label somehow lessens the potency of the evil it represents. However, like nearly all other journalists who write about terrorism, Bump missed the most crucial point concerning the media’s use of the term: that American law does not currently recognize “domestic terror” as a crime. For an act, however bloody and hateful, to be considered terrorism in the United States, it must be connected to a “foreign” terror organization.

Rafia makes an important point about the “pass” being given to white supremacists, while law abiding Muslims are viewed with suspicion if not being actively persecuted in the United States.

But Rafia misses the opportunity to point to the more than double standard in place for use of “terrorism” and “terrorist.”

What label other than “terrorist” would you apply to the unknown military personnel who attack a known hospital? It has been alleged those responsible have been punished, but then without transparency, how do we know?

Or even the garden variety cruise missile or drone attacks that end the lives of innocents with every strike. Aren’t those acts of terrorism?

Or does “terrorism” require a non-U.S. government actor?

Does that mean only the U.S. government?

How “terrorized” would you be by a phone call followed by a “knock” by a missile on your roof, ordering you to leave immediately?

The claim that this is “designed to minimize civilian casualties” sounds like a quote from a modern-day Marquis de Sade.

A little introspection by the media could explode the dishonest and manipulative use of the labels “terrorist” and “terrorism.”

Let’s hope that happens sooner rather than later.

Bulk Access to the Colin Powell Emails – Update

Filed under: Colin Powell Emails,Government,Journalism,News,Politics,Reporting — Patrick Durusau @ 7:31 pm

Still working on finding a host for the 2.5 GB tarred, gzipped archive of the Colin Powell emails.

As an alternative, working on splitting the attachments (the main source of bulk) from the emails themselves.

My thinking at this point is to produce a message-only version of the emails. Emails with attachments will have auto-generated links to the source emails at DCLeaks.com.

Other processing is planned for the message-only version of the emails.

Anyone interested in indexing the attachments? Generating lists of those with pointers shouldn’t be a problem.
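A sketch of what the message-only step could look like with the Python standard library. The function name is hypothetical, and the link back to the source email is passed in by the caller because I am not assuming any particular URL pattern at DCLeaks.com.

```python
import email
from email import policy
from email.message import EmailMessage


def message_only(raw_bytes, source_url):
    """Return (message without attachments, list of attachment names)."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    text = body.get_content() if body is not None else ""
    names = [part.get_filename() or "(unnamed)" for part in msg.iter_attachments()]

    slim = EmailMessage()
    for header in ("From", "To", "Date", "Subject"):
        if msg[header] is not None:
            slim[header] = str(msg[header])
    if names:
        # Replace the attachments with their names and a pointer to the source.
        text += "\n\n[Attachments removed: " + ", ".join(names) + "]"
        text += "\n[Source: " + source_url + "]\n"
    slim.set_content(text)
    return slim, names
```

The returned list of names doubles as raw material for a list of attachments with pointers.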

Hope to have more progress to report tomorrow!

September 26, 2016

Value-Add Of Wikileaks Hillary Clinton Email Archive?

Filed under: Government,Hillary Clinton,Journalism,News,Politics,Reporting — Patrick Durusau @ 12:20 pm

I was checking Wikileaks today for any new document drops on Hillary Clinton, but only found:

WikiLeaks offers award for #LabourLeaks

Trade in Services Agreement

Assange Medical and Psychological Records

The lesson from the last item is to always seek asylum in a large embassy, preferably one with a pool. You can search at Embassies by which country an embassy represents and which country it is located in. I did not see an easy way to search for size and accommodations.

Oh, not finding any new data on Hillary Clinton, I checked the Hillary Clinton Email Archive at Wikileaks:

wikileaks-hillary-460

Compare that to the State Department FOIA server for Clinton_Email:

state-dept-hillary-460

Do you see a value-add to Wikileaks re-posting the State Department’s posting of Hillary’s emails?

If yes, please report in comments below the value-add you see. (Thanks in advance.)

If not, what do you think would be a helpful value-add to the Hillary Clinton emails? (Suggestions deeply appreciated.)

September 25, 2016

Colin Powell Email Files

Filed under: Colin Powell Emails,Government,Journalism,News,Politics,Reporting — Patrick Durusau @ 8:43 pm

DCLeaks.com posted on September 14, 2016, a set of emails to and from Colin Luther Powell.

From the homepage for those leaked emails:

Colin Luther Powell is an American statesman and a retired four-star general in the United States Army. He was the 65th United States Secretary of State, serving under U.S. President George W. Bush from 2001 to 2005, the first African American to serve in that position. During his military career, Powell also served as National Security Advisor (1987–1989), as Commander of the U.S. Army Forces Command (1989) and as Chairman of the Joint Chiefs of Staff (1989–1993), holding the latter position during the Persian Gulf War. Born in Harlem as the son of Jamaican immigrants, Powell was the first, and so far the only, African American to serve on the Joint Chiefs of Staff, and the first of two consecutive black office-holders to serve as U.S. Secretary of State.

The leaked emails start in June of 2014 and end in August of 2016.

Access to the emails is by browsing and/or full text searching.

Try your luck at finding Powell’s comments on Hillary Clinton or former Vice-President Cheney, searching one chunk of emails at a time.

I appreciate and admire DCLeaks for taking the lead in posting this and similar materials. And I hope they continue to do so in the future.

However, the access offered reduces a good leak to a random trickle.

This series will use the Colin Powell emails to demonstrate better leaking practices.

Coming Monday, September 26, 2016 – Bulk Access to the Colin Powell Emails.

September 23, 2016

14 free digital tools that any newsroom can use

Filed under: Journalism,News,Reporting — Patrick Durusau @ 8:54 pm

14 free digital tools that any newsroom can use by Sara Olstad.

From the post:

ICFJ’s Knight Fellows are global media innovators who foster news innovation and experimentation to deepen coverage, expand news delivery and better engage citizens. As part of their work, they’ve created tools that they are eager to share with journalists worldwide.

Their projects range from Push, a mobile app for news organizations that don’t have the time, money or resources to build their own, to Salama, a tool that assesses a reporter’s risk and recommends ways to stay safe. These tools and others developed by Knight Fellows can help news organizations everywhere find stories in complex datasets, better distribute their content and keep their journalists safe from online and physical security threats.

As part of the 2016 Online News Association conference, try out these 14 digital tools that any newsroom can use. If you adopt any of these tools or lead any new projects inspired by them, tweet about it to @ICFJKnight.

I was mis-led by the presentation of the “14 free digital tools.”

The box where African Network of Centers for Investigative Reporting (ANCIR) and Aleph appear has a scroll marker on the right hand side.

I’m not sure why I missed it or why the embedding of a scrolling box is considered good page design.

But the tools themselves merit your attention.

Enjoy!

5 lessons on the craft of journalism from Longform podcast

Filed under: Journalism,News,Reporting — Patrick Durusau @ 4:37 pm

5 lessons on the craft of journalism from Longform podcast by Joe Freeman.

From the post:

AT FIRST I WAS RELUCTANT to dive into the Longform podcast, a series of interviews with nonfiction writers and journalists that recently produced its 200th episode. The reasons for my wariness were petty. What sane freelancer wants to listen to highly successful writers and editors droning on about their awards and awesome careers? Not this guy! But about a year ago, I succumbed, and quickly became a thankful convert. The more I listened, the more I realized that the show, started in 2012 on the website Longform.org and produced in collaboration with The Atavist, was a veritable goldmine of information. It’s almost as if the top baseball players in the country sat down every week and casually explained how to hit home runs.

Whether they meant to or not, the podcast’s creators and interviewers—Aaron Lammer, Max Linsky, and Evan Ratliff—have produced a free master class on narrative reporting, with practitioners sharing tips and advice about the craft and, crucially, the business. As a journalist, I’ve learned a lot listening to the podcast, but a few consistent themes emerge that I have distilled into five takeaways from specific interviews.

(emphasis in original)

I’m impressed with Joe’s five takeaways but as I sit here repackaging leaked data, there is one common characteristic I would emphasize:

They all involve writing!

That is the actual production of content.

Not plans for content.

Not models for content.

Not abstractions for content.

Content.

Not to worry, I intend to keep my tools/theory edge but in addition to adding Longform podcast to my listening list, I’m going to try to produce more data content as well.

I started off with that intention using XQuery at the start of this year, a theme that is likely to re-appear in the near future.

Enjoy!

September 20, 2016

Betraying Snowden:… [Cynical, but not odd]

Filed under: Journalism,News,NSA,Reporting — Patrick Durusau @ 6:30 pm

Betraying Snowden: There’s a special place in journalism hell for The Washington Post editorial board by Daniel Denvir.

From the post:

There is a special place in journalism hell reserved for The Washington Post editorial board now that it has called on President Barack Obama to not pardon National Security Agency whistleblower Edward Snowden.

As Glenn Greenwald wrote, it’s an odd move for a news publication, “which owes its sources duties of protection, and which — by virtue of accepting the source’s materials and then publishing them — implicitly declares the source’s information to be in the public interest.” Notably, the Post decided to “inexcusably omit . . . that it was not Edward Snowden, but the top editors of the Washington Post who decided to make these programs public,” as Greenwald added.

The Post’s peculiar justification is as follows: While the board grudgingly conceded that reporters, thanks to Snowden, revealed that the NSA’s collection of domestic telephone metadata — which “was a stretch, if not an outright violation, of federal surveillance law” — it condemns him for revealing “a separate overseas NSA Internet-monitoring program, PRISM, that was both clearly legal and not clearly threatening to privacy.”

Washington Post opposition to a pardon for Edward Snowden isn’t odd at all.

Which story generates more PR for the Washington Post:

  1. The Washington Post, having won a Pulitzer prize due to Edward Snowden, joins a crowd calling for his pardon?
  2. The Washington Post, having won a Pulitzer prize due to Edward Snowden, opposes his being pardoned?

It’s not hard to guess which one generates more ad-views and therefore the potential for click-throughs.

I have no problems with the disclosure of PRISM, save for Snowden having to break his word as a contractor to keep his client’s secrets, well, secret.

No one could be unaware that the NSA engages in illegal and immoral activity on a daily basis before agreeing to be employed by them.

Although Snowden has done no worse than his former NSA employers, it illustrates why I have no trust in government agencies.

If they are willing to lie to you for what they consider to be “good” reasons, then they are most certainly willing to lie to me.

Once it is established that an agency, take the NSA for example, has lied on multiple occasions, on what basis would you trust them to be telling the truth today?

Their assurance, “we’re not lying this time?” That seems rather tenuous.

Same rule should apply to contractors who lie to or betray their clients.

September 12, 2016

Inside the fight to reveal the CIA’s torture secrets [Support The Guardian]

Filed under: Government,Government Data,Journalism,News,Politics,Reporting,Transparency — Patrick Durusau @ 3:19 pm

Inside the fight to reveal the CIA’s torture secrets by Spencer Ackerman.

Part one: Crossing the bridge

Part two: A constitutional crisis

Part three: The aftermath

Ackerman captures the drama of a failed attempt by the United States Senate to exercise oversight on the Central Intelligence Agency (CIA) in this series.

I say “failed attempt” because even if the full 6,200+ page report is ever released, the lead Senate investigator, Daniel Jones, obscured the identities of all the responsible CIA personnel and sources of information in the report.

Even if the full report is serialized in your local newspaper, the CIA contractors and staff guilty of multiple felonies will not be one step closer to being brought to justice.

To that extent, the “full” report is itself a disservice to the American people, who elect their congressional leaders and expect them to oversee agencies such as the CIA.

From Ackerman’s account you will learn that the CIA can dictate to its overseers the location and conditions under which they can view documents, decide which documents they are allowed to see and, in cases of conflict, spy on the Senate Select Committee on Intelligence.

Does that sound like effective oversight to you?

BTW, you will also learn that members of the “most transparent administration in history” aided and abetted the CIA in preventing an effective investigation into the CIA and its torture program. I use “aided and abetted” deliberately and in its legal sense.

I mention in my header that you should support The Guardian.

This story by Spencer Ackerman is one reason.

Another reason is that given the plethora of names and transfers recited in Ackerman’s story, we need The Guardian to cover future breaks in this story.

Despite the tales of superhuman security, nobody is that good.

I leave you with the thought that if more than one person knows a secret, then it can be discovered.

Check Ackerman’s story for a starting list of those who know secrets about the CIA torture program.

Good hunting!

September 4, 2016

Plugins for Newsgathering and Verification

Filed under: Journalism,News,Reporting — Patrick Durusau @ 7:55 pm

7 vital browser plugins for newsgathering and verification by Alastair Reid.

From the post:

When breaking news can travel the world in seconds, it is important for journalists to have the tools at their disposal to get to work fast. When searching the web, what quicker way is there to have those tools available than directly in the browser window?

Most browsers have a catalogue of programs and software to make your browsing experience more powerful, like a smartphone app store. At First Draft we find Google’s Chrome browser is the most effective but there are obviously other options available.

Text says “five” but this has been updated to include “seven” plugins.

One of the updates is Frame by Frame for YouTube, which, like the name says, enables frame-by-frame viewing and is touted for verification.

I can think of a number of uses for frame-by-frame viewing. You?

See Alastair’s post for the rest and follow @firstdraftnews to stay current on digital tools for journalists.

September 2, 2016

Best and Worst Journalism of August 2016 [An Exercise]

Filed under: Journalism,News,Reporting — Patrick Durusau @ 9:18 pm

The best and worst journalism of August 2016 by David Uberti.

Before you read Uberti’s post:

Take a few minutes to find stories you recall from August and sort them into best and worst, along with your reasons.

It’s one thing to passively go along with the judgment of others; it takes real effort to form a judgment of your own.

Now, compare your stories to Uberti’s.

Same, different? Were your reasons different?

What stories did Uberti “miss?”

PS: The boosterism of the New York Times for Iraqi militias merits a “worst” place, at least to me.

Journalism Drone Operations Manual

Filed under: Journalism,News,Reporting — Patrick Durusau @ 3:12 pm

CoJMC’s Drone Journalism Lab launches drone operations manual

From the webpage:

To help newsrooms get started using drones for journalism, the Drone Journalism Lab at the University of Nebraska-Lincoln is releasing the “The Drone Journalism Lab Operations Manual,” a guide that covers everything from pre-flight checklists to ethical considerations.

A first of its kind, the manual is free, Creative Commons licensed and provided as an open source document online. The Drone Journalism Lab created it with support from the John S. and James L. Knight Foundation.

“As journalists look to become more relevant and responsive to community needs, this manual is an important step towards experimenting with new ways of gathering and presenting news and information. It is a resource for best practices and an exciting invitation to explore a fresh, emerging area of the field,” said Shazna Nessa, Knight Foundation director for journalism.

Dr. Maria Marron, dean of the College of Journalism and Mass Communications, praised Professor Matt Waite for producing the operations manual.

“Matt is a key innovator in journalism,” she said. “It was his prescience about the potential for drones in journalism that made UNL’s Drone Journalism Lab the leader in the field. The operations manual will be the go-to resource for anyone interested in using drones for journalistic purposes.”

Link for the manual: https://www.dropbox.com/sh/32pi2e2gv6huyzg/AAAwGq7b1mO5ekikCn-7JFiMa?dl=0.

What a great resource!

A great template for how to describe your use of drones for journalism.
