Archive for March, 2018

Phishing, The 43% Option

Sunday, March 11th, 2018

How’s that for a motivational poster?

You can, and some do, spend hours plumbing in the depths of code or chip design for vulnerabilities.

Or, you can look behind door #2, the phishing door, and find 43% of data breaches start with phishing.

Phishing doesn’t have the glamor or prestige of finding a Meltdown or Spectre bug.

But, on the other hand, do you want to breach a congressional email account for the 2018 mid-term election, or for the 2038 election?

Just so you know, no rumors of breached congressional email accounts have surfaced, at least not yet.

Ping me if you see any such news.

PS: The tweet points to:, an ad for AT&T.

Spreading “Fake News,” Science Says It Wasn’t Russian Bots

Sunday, March 11th, 2018

The spread of true and false news online by Soroush Vosoughi, Deb Roy, and Sinan Aral. (Science 09 Mar 2018: Vol. 359, Issue 6380, pp. 1146-1151 DOI: 10.1126/science.aap9559)


We investigated the differential diffusion of all of the verified true and false news stories distributed on Twitter from 2006 to 2017. The data comprise ~126,000 stories tweeted by ~3 million people more than 4.5 million times. We classified news as true or false using information from six independent fact-checking organizations that exhibited 95 to 98% agreement on the classifications. Falsehood diffused significantly farther, faster, deeper, and more broadly than the truth in all categories of information, and the effects were more pronounced for false political news than for false news about terrorism, natural disasters, science, urban legends, or financial information. We found that false news was more novel than true news, which suggests that people were more likely to share novel information. Whereas false stories inspired fear, disgust, and surprise in replies, true stories inspired anticipation, sadness, joy, and trust. Contrary to conventional wisdom, robots accelerated the spread of true and false news at the same rate, implying that false news spreads more than the truth because humans, not robots, are more likely to spread it.

Real data science. The team had access to all the Twitter data and not a cherry-picked selection, which of course can’t be shared due to Twitter rules, or so say ISIS propaganda scholars.

The paper merits a slow read but highlights for the impatient:

  1. Don’t invest in bots or high-profile Twitter users for the 2018 mid-term elections.
  2. Craft messages with a high novelty factor that disfavor your candidate’s opponents.
  3. Your messages should inspire fear, disgust and surprise.

Democrats working hard to lose the 2018 mid-terms will cry you a river about issues, true facts, engagement on the issues and a host of other ideas used to explain losses to losers.

There’s still time to elect a progressive Congress in 2018.

Are you game?

Contesting the Right to Deliver Disinformation

Thursday, March 8th, 2018

Eric Singerman reports on a recent conference titled: Understanding and Addressing the Disinformation Ecosystem.

He summarizes the conference saying:

The problem of mis- and disinformation is far more complex than the current obsession with Russian troll factories. It’s the product of the platforms that distribute this information, the audiences that consume it, the journalist and fact-checkers that try to correct it – and even the researchers who study it.

In mid-December, First Draft, the Annenberg School of Communication at the University of Pennsylvania and the Knight Foundation brought academics, journalists, fact-checkers, technologists and funders together in a two-day workshop to discuss the challenges produced by the current disinformation ecosystem. The convening was intended to highlight relevant research, share best-practices, identify key questions of scholarly and practical concern and outline a potential research agenda designed to answer these questions.

In preparation for the workshop, a number of attendees prepared short papers that could act as starting points for discussion. These papers covered a broad range of topics – from the ways that we define false and harmful content, to the dystopian future of computer-generated visual disinformation.

Download the papers here.

Singerman points out the very first essay concedes that “fake news” isn’t anything new, although I would read Schudson and Zelizer (authors of the first paper) with care. They contend:

Fake news lessened in centrality only in the late 1800s as printed news, particularly in Britain and the United States, came to center on what Jean Chalaby called “fact-centered discursive practices” and people realized that newspapers could compete with one another not simply on the basis of partisan affiliation or on the quality of philosophical and political essays but on the immediacy and accuracy of factual reports (Chalaby 1996).

I’m sorry, that’s just factually incorrect. The 1890s were the age of “yellow journalism,” a statement confirmed by the Digital Public Library of America’s resource collection: Fake News in the 1890s: Yellow Journalism:

Alternative facts, fake news, and post-truth have become common terms in the contemporary news industry. Today, social media platforms allow sensational news to “go viral,” crowdsourced news from ordinary people to compete with professional reporting, and public figures in offices as high as the US presidency to bypass established media outlets when sharing news. However, dramatic reporting in daily news coverage predates the smartphone and tablet by over a century. In the late nineteenth century, the news media war between Joseph Pulitzer’s New York World and William Randolph Hearst’s New York Journal resulted in the rise of yellow journalism, as each newspaper used sensationalism and manipulated facts to increase sales and attract readers.

Many trace the origin of yellow journalism to coverage of the sinking of the USS Maine in Havana Harbor on February 15, 1898, and America’s entry in the Spanish-American War. Both papers’ reporting on this event featured sensational headlines, jaw-dropping images, bold fonts, and aggrandizement of facts, which influenced public opinion and helped incite America’s involvement in what Hearst termed the “Journal’s War.”

The practice, and nomenclature, of yellow journalism actually predates the war, however. It originated with a popular comic strip character known as The Yellow Kid in Hogan’s Alley. Created by Richard F. Outcault in 1895, Hogan’s Alley was published in color by Pulitzer’s New York World. When circulation increased at the New York World, William Randolph Hearst lured Outcault to his newspaper, the New York Journal. Pulitzer fought back by hiring another artist to continue the comic strip in his newspaper.

The period of peak yellow journalism by the two New York papers ended in the late 1890s, and each shifted priorities, but still included investigative exposés, partisan political coverage, and other articles designed to attract readers. Yellow journalism, past and present, conflicts with the principles of journalistic integrity. Today, media consumers will still encounter sensational journalism in print, on television, and online, as media outlets use eye-catching headlines to compete for audiences. To distinguish truth from “fake news,” readers must seek multiple viewpoints, verify sources, and investigate evidence provided by journalists to support their claims.

You can see the evidence relied upon by the DPLA for its claims about yellow journalism here: Fake News in the 1890s: Yellow Journalism.

Why Schudson and Zelizer thought Chalaby, J. “Journalism as an Anglo-American Invention,” European Journal of Communication 11 (3), 1996, 303-326, supported their case isn’t clear.

If you read the Chalaby article, you find it is primarily concerned with contrasting the French press with Anglo-American practices, a comparison in which the French come off a distant second best.

More to the point, neither the New York World, the New York Journal, nor yellow journalism appears anywhere in the Chalaby article. Check for yourself: Journalism as an Anglo-American Invention.

Chalaby does claim the origin of “fact-centered discursive practices” in the 1890s, but the absence of any mention of the journalism that led to the Spanish-American War casts doubt on how much we should credit Chalaby’s knowledge of US journalism.

I haven’t checked the other footnotes of Schudson and Zelizer; I leave that as an exercise for interested readers.

I do think Schudson and Zelizer capture the main driver of concern over “fake news” when they say:

First, there is a great anxiety today about the border between professional journalists and others who through digital media have easy access to promoting their ideas, perspectives, factual reports, pranks, inanities, conspiracy theories, fakes and lies….

Despite being framed as a contest between factual reporting and disinformation, the dispute over disinformation/fake news is over the right to profit from disinformation/fake news.

If you need a modern example of yellow journalism, consider the ongoing media frenzy over Russian “interference” in US elections.

How often do you hear reports that include context, such as US-sponsored assassinations, funded and armed government overthrows, and active military interference with both elections and governments, all by the US?

What? Some Russians bought Facebook ads and used election hashtags on Twitter? That compares to overthrowing other governments? The long history of the U.S. interfering with elections elsewhere. (tip of the iceberg)

The constant hyperbole in the “Russian interference” story is a clue that journalists and social media are re-enacting the roles played by the New York World and the New York Journal, which led to the Spanish-American War.

Truth be told, we should thank social media for the free distribution of disinformation, previously available only by subscription.

Discerning what is or is not accurate information, as always, falls on the shoulders of readers. It has ever been thus.

Confluent: Mapping @apachekafka connect schema types – to usual suspects

Thursday, March 8th, 2018

Confluent has posted a handy mapping from Kafka Connect schema types to MySQL, Oracle, PostgreSQL, SQLite, SQL Server and Vertica.

The sort of information that I will waste 10 to 15 minutes hunting for every time I need it. Posting it here means I’ll cut the wasted time down to maybe 5 minutes if I remember I posted about it. 😉

Digital Public Library of America (DPLA) Has New Website!

Thursday, March 8th, 2018

Announcing the Launch of our New Website (the chest-beating announcement)

From the post:

The Digital Public Library of America (DPLA) is pleased to unveil its all-new redesigned website, now live at Created in collaboration with renowned design firm Postlight, DPLA’s new website is more user-centered than ever before, with a focus on the tools, resources, and information that matter most to DPLA researchers and learners of all kinds. In a shift from the former site structure, content that primarily serves DPLA’s network of partners and others interested in deeper involvement with DPLA can now be found on DPLA Pro.

You can boil the post down to two links: DPLA (DPLA Resources) and DPLA Pro (helping DPLA build and spread resources). What more needs to be said?

Oh, yeah, donate to support the DPLA!

Numba Versus C++ – On Wolfram CAs

Tuesday, March 6th, 2018

Numba Versus C++ by David Butts, Gautham Dharuman, Bill Punch and Michael S. Murillo.

Python is a programming language that first appeared in 1991; soon, it will have its 27th birthday. Python was created not as a fast scientific language, but rather as a general-purpose language. You can use Python as a simple scripting language or as an object-oriented language or as a functional language…and beyond; it is very flexible. Today, it is used across an extremely wide range of disciplines and is used by many companies. As such, it has an enormous number of libraries and conferences that attract thousands of people every year.

But, Python is an interpreted language, so it is very slow. Just how slow? It depends, but you can count on about 10-100 times as slow as, say, C/C++. If you want fast code, the general rule is: don’t use Python. However, a few more moments of thought lead to a more nuanced perspective. What if you spend most of the time coding, and little time actually running the code? Perhaps your familiarity with the (slow) language, or its vast set of libraries, actually saves you time overall? And, what if you learned a few tricks that made your Python code itself a bit faster? Maybe that is enough for your needs? In the end, for true high performance computing applications, you will want to explore fast languages like C++; but, not all of our needs fall into that category.

As another example, consider the fact that many applications use two languages, one for the core code and one for the wrapper code; this allows for a smoother interface between the user and the core code. A common use case is C or C++ wrapped by, of course, Python. As a user, you may not even know that the code you are using is in another language! Such a situation is referred to as the “two-language problem”. This situation is great provided you don’t need to work in the core code, or you don’t mind working in two languages – some people don’t mind, but some do. The question then arises: if you are one of those people who would like to work only in the wrapper language, because it was chosen for its user friendliness, what options are available to make that language (Python in this example) fast enough that it can also be used for the core code?

We wanted to explore these ideas a bit further by writing a code in both Python and C++. Our past experience suggested that while Python is very slow, it could be made about as fast as C using the crazily-simple-to-use library Numba. Our basic comparisons here are: basic Python, Numba and C++. Because we are not religious about Python, and you shouldn’t be either, we invited expert C++ programmers to have the chance to speed up the C++ as much as they could (and, boy could they!).

This webpage is highly annoying, in both Mozilla and Chrome. You’ll have to visit to get the full impact.

It is, however, also a great post on using Numba to obtain much faster results while still using Python. The use of Wolfram CAs (cellular automata) as examples is an added bonus.
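
To give a flavor of the approach, here is a minimal sketch of my own (not the authors' code) of a Wolfram elementary cellular automaton with the inner loop compiled by Numba. The function names `step` and `run` are mine, the periodic boundary is an assumption, and the fallback decorator is only there so the sketch still runs, slowly, where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # compiles the decorated function to machine code
except ImportError:
    # Assumption: fall back to plain Python if Numba is unavailable.
    def njit(func):
        return func


@njit
def step(cells, rule_bits):
    """Advance a 1-D elementary CA one generation (periodic boundary)."""
    n = cells.shape[0]
    new = np.empty_like(cells)
    for i in range(n):
        left = cells[(i - 1) % n]
        center = cells[i]
        right = cells[(i + 1) % n]
        # The three-cell neighborhood, read as a 3-bit number, indexes the rule.
        new[i] = rule_bits[4 * left + 2 * center + right]
    return new


def run(rule, width, steps):
    """Return a (steps + 1, width) array of generations for a Wolfram rule number."""
    # Bit k of the rule number is the output for neighborhood value k.
    rule_bits = np.array([(rule >> k) & 1 for k in range(8)], dtype=np.int64)
    cells = np.zeros(width, dtype=np.int64)
    cells[width // 2] = 1  # start from a single live cell in the middle
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule_bits)
        history.append(cells)
    return np.stack(history)


if __name__ == "__main__":
    for row in run(30, 31, 15):
        print("".join("#" if c else "." for c in row))
```

The only change from plain Python is the `@njit` decorator on the hot loop, which is the point of the post: you keep writing Python and let Numba compile the part that matters.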


An Interactive Timeline of the Most Iconic Infographics

Thursday, March 1st, 2018

Map of Firsts: An Interactive Timeline of the Most Iconic Infographics by R. J. Andrews.

Careful with this one!

You might learn some history as well as discovering an infographic for your next project!


MSDAT: Microsoft SQL Database Attacking Tool

Thursday, March 1st, 2018

MSDAT: Microsoft SQL Database Attacking Tool

From the webpage:

MSDAT (Microsoft SQL Database Attacking Tool) is an open source penetration testing tool that tests the security of Microsoft SQL Databases remotely.

Usage examples of MSDAT:

  • You have a Microsoft database listening remotely and you want to find valid credentials in order to connect to the database
  • You have a valid Microsoft SQL account on a database and you want to escalate your privileges
  • You have a valid Microsoft SQL account and you want to execute commands on the operating system hosting this DB (xp_cmdshell)

Tested on Microsoft SQL database 2005, 2008 and 2012.

As I mentioned yesterday, you may have to wait a few years until the Office of Personnel Management (OPM) upgrades to a supported version of Microsoft SQL database, but think of the experience you will have gained with MSDAT by that time.

And by the time the OPM upgrades, new critical security flaws will emerge in Microsoft SQL database 2005, 2008 and 2012. Under current management, the OPM is becoming less and less secure over time.

Would it help if I posted a street/aerial view of OPM headquarters in DC? Would that help focus your efforts at dropping infected USB sticks, malware-loaded DVDs and insecure sex toys for OPM management to find?

OPM headquarters is not marked on the standard tourist map for DC. The map does suggest a number of other fertile places for your wares.