Twitter As Investment Tool

May 21st, 2015

Social Media, Financial Algorithms and the Hack Crash by Tero Karppi and Kate Crawford.


‘@AP: Breaking: Two Explosions in the White House and Barack Obama is injured’. So read a tweet sent from a hacked Associated Press Twitter account @AP, which affected financial markets, wiping out $136.5 billion of the Standard & Poor’s 500 Index’s value. While the speed of the Associated Press hack crash event and the proprietary nature of the algorithms involved make it difficult to make causal claims about the relationship between social media and trading algorithms, we argue that it helps us to critically examine the volatile connections between social media, financial markets, and third parties offering human and algorithmic analysis. By analyzing the commentaries of this event, we highlight two particular currents: one formed by computational processes that mine and analyze Twitter data, and the other being financial algorithms that make automated trades and steer the stock market. We build on sociology of finance together with media theory and focus on the work of Christian Marazzi, Gabriel Tarde and Tony Sampson to analyze the relationship between social media and financial markets. We argue that Twitter and social media are becoming more powerful forces, not just because they connect people or generate new modes of participation, but because they are connecting human communicative spaces to automated computational spaces in ways that are affectively contagious and highly volatile.

The social sciences lag behind the computer sciences in making their publications publicly accessible. This article is published behind a paywall, so all I can report on is the abstract.

On the other hand, I’m not sure how much practical advice you could gain from the article as opposed to the volumes of commentary following the incident itself.

The research reminds me of Malcolm Gladwell, author of The Tipping Point and similar works.

While I have greatly enjoyed several of Gladwell’s books, including The Tipping Point, it is one thing to look back and say: “Look, there was a tipping point.” It is quite another to be in the present and successfully say: “Look, there is a tipping point and we can make it tip this way or that.”

In retrospect, we all credit ourselves with near omniscience when our plans succeed, and we invent fanciful explanations about what we knew or realized at the time. Others, equally skilled, dedicated and competent, who started at the same time, did not succeed. Of course, the conservative media (and we ourselves, if we are honest) invent narratives to explain those outcomes as well.

Deliberate manipulation of the market with false information, via Twitter or otherwise, is illegal. The best you can do is look for a pattern of news and/or tweets that results in a drop in a particular stock, followed by a recovery, and then apply that pattern more broadly. You won’t make $millions off any one transaction, but a single big win is exactly the sort of thing that draws regulatory attention.
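A minimal sketch of that kind of pattern search, in Python. The function `dip_and_recover`, its thresholds and the toy price series are all hypothetical; it only flags events that are followed by a drop and a rebound, and says nothing about whether trading on such a pattern would pay.

```python
from dataclasses import dataclass


# Hypothetical record of one matched event; not from any trading library.
@dataclass
class Match:
    event_index: int      # position of the news item or tweet in the price series
    dip_index: int        # where the lowest price after the event occurred
    drop_pct: float       # drop from the last pre-event price to that low, in percent
    recovered_index: int  # first later point back within tolerance of the pre-event price


def dip_and_recover(prices, event_indices, min_drop_pct=1.0, window=10, tolerance_pct=0.5):
    """Hypothetical helper: flag events followed, within `window` steps, by a drop of
    at least `min_drop_pct` and a recovery to within `tolerance_pct` of the pre-event price."""
    matches = []
    for e in event_indices:
        if e < 1 or e >= len(prices):
            continue  # need at least one price before the event
        base = prices[e - 1]
        segment = prices[e:e + window]
        low = min(segment)
        low_i = e + segment.index(low)
        drop = (base - low) / base * 100
        if drop < min_drop_pct:
            continue
        threshold = base * (1 - tolerance_pct / 100)
        for j in range(low_i + 1, min(e + window, len(prices))):
            if prices[j] >= threshold:
                matches.append(Match(e, low_i, round(drop, 2), j))
                break
    return matches


# Toy prices: the event at index 2 knocks the price down 4% and it bounces back;
# the event at index 7 barely moves it.
prices = [100, 100, 99.5, 96, 97, 99.8, 100.1, 100, 100, 100]
print(dip_and_recover(prices, event_indices=[2, 7]))
```

Only the first event qualifies: the price falls to 96 one step later and is back above 99.5 two steps after that.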

LogJam – Postel’s Law In Action

May 21st, 2015

The seriousness of the LogJam vulnerability was highlighted by John Leyden in Average enterprise ‘using 71 services vulnerable to LogJam’:

Based on analysis of 10,000 cloud applications and data from more than 17 million global cloud users, cloud visibility firm Skyhigh Networks reckons that 575 cloud services are potentially vulnerable to man-in-the-middle attacks. The average company uses 71 potentially vulnerable cloud services.

[Details from Skyhigh Networks]

The historical source of LogJam?

James Maude, security engineer at Avecto, said that the LogJam flaw shows how internet regulations and architecture decisions made more than 20 years ago are continuing to throw up problems.

“The LogJam issue highlights how far back the long tail of security stretches,” Maude commented. “As new technologies emerge and cryptography hardens, many simply add on new solutions without removing out-dated and vulnerable technologies. This effectively undermines the security model you are trying to build. Several recent vulnerabilities such as POODLE and FREAK have harnessed this type of weakness, tricking clients into using old, less secure forms of encryption,” he added.

Graham Cluley in Logjam vulnerability – what you need to know has better coverage of the history of weak encryption that resulted in the LogJam vulnerability.

What does that have to do with Postel’s Law?

TCP implementations should follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others. [RFC761]

As James Maude noted earlier:

As new technologies emerge and cryptography hardens, many simply add on new solutions without removing out-dated and vulnerable technologies.

Probably not what Postel intended at the time, but it is certainly more “robust” in one sense of the word: technologies remain compatible with other technologies that still use vulnerable technologies.

In other words, robustness is responsible for the maintenance of weak encryption and hence the current danger from LogJam.
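To make the point concrete, here is a toy model of a negotiation in Python, assuming both sides simply pick the largest Diffie-Hellman group size they have in common. The function names are hypothetical and this is not TLS: in the real LogJam attack, a man-in-the-middle rewrites the client’s handshake to request export-grade DHE, and servers that still support it answer with a 512-bit group. The toy collapses that into an edited list of offers.

```python
# Hypothetical toy model of a robustness-principle downgrade; not a TLS implementation.
EXPORT_GRADE_BITS = 512


def negotiate(client_offers, server_supported, min_bits=None):
    """Return the largest group size both sides support, or None.
    min_bits=None models a server that is 'liberal in what it accepts'."""
    common = [bits for bits in client_offers if bits in server_supported]
    if min_bits is not None:
        common = [bits for bits in common if bits >= min_bits]
    return max(common) if common else None


def mitm_strip(offers):
    """An attacker in the middle leaves only export-grade offers in place."""
    return [bits for bits in offers if bits <= EXPORT_GRADE_BITS]


client = [2048, 1024, 512]
server = {2048, 1024, 512}  # still keeps 512-bit export groups for compatibility

print(negotiate(client, server))                             # untouched handshake
print(negotiate(mitm_strip(client), server))                 # downgraded handshake
print(negotiate(mitm_strip(client), server, min_bits=2048))  # conservative server refuses
```

The first call settles on 2048 bits, the second on 512, and the third returns None: the connection fails instead of quietly succeeding with export-grade parameters.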

This isn’t an entirely new idea. Eric Allman (Sendmail) warns of security issues with Postel’s Law in The Robustness Principle Reconsidered: Seeking a middle ground:

In 1981, Jon Postel formulated the Robustness Principle, also known as Postel’s Law, as a fundamental implementation guideline for the then-new TCP. The intent of the Robustness Principle was to maximize interoperability between network service implementations, particularly in the face of ambiguous or incomplete specifications. If every implementation of some service that generates some piece of protocol did so using the most conservative interpretation of the specification and every implementation that accepted that piece of protocol interpreted it using the most generous interpretation, then the chance that the two services would be able to talk with each other would be maximized. Experience with the Arpanet had shown that getting independently developed implementations to interoperate was difficult, and since the Internet was expected to be much larger than the Arpanet, the old ad-hoc methods needed to be enhanced.

Although the Robustness Principle was specifically described for implementations of TCP, it was quickly accepted as a good proposition for implementing network protocols in general. Some have applied it to the design of APIs and even programming language design. It’s simple, easy to understand, and intuitively obvious. But is it correct?

For many years the Robustness Principle was accepted dogma, failing more when it was ignored rather than when practiced. In recent years, however, that principle has been challenged. This isn’t because implementers have gotten more stupid, but rather because the world has become more hostile. Two general problem areas are impacted by the Robustness Principle: orderly interoperability and security.

Eric doesn’t come to a definitive conclusion with regard to Postel’s Law, but the general case is always difficult to decide.

However, the specific case, supporting encryption known to be vulnerable, shouldn’t be.

If there were a programming principles liability checklist, one of the tick boxes should read:

___ Supports (list of encryption schemes), Date:_________

Lawyers doing discovery can compare lists of known vulnerabilities as of the date given for liability purposes.

Programmers would be on notice that supporting encryption with known vulnerabilities is opening the door to legal liability.
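The date comparison a lawyer would make is simple enough to sketch. A minimal version in Python, assuming a hand-maintained table mapping each scheme to the attack that broke it and its public disclosure date; the function name is hypothetical, and a real checklist would be driven by CVE or vendor advisory data rather than three entries typed in by hand.

```python
from datetime import date

# Illustrative table (hypothetical, hand-entered): scheme -> (attack, public disclosure date).
KNOWN_WEAK = {
    "SSLv3": ("POODLE", date(2014, 10, 14)),
    "RSA_EXPORT": ("FREAK", date(2015, 3, 3)),
    "DHE_EXPORT": ("LogJam", date(2015, 5, 20)),
}


def known_vulnerable_as_of(supported, checklist_date):
    """Hypothetical helper: return (scheme, attack) pairs that were already
    public on the date written on the checklist."""
    return sorted(
        (scheme, KNOWN_WEAK[scheme][0])
        for scheme in supported
        if scheme in KNOWN_WEAK and KNOWN_WEAK[scheme][1] <= checklist_date
    )


supported = ["TLSv1.2", "DHE_EXPORT", "SSLv3"]
print(known_vulnerable_as_of(supported, date(2015, 4, 1)))
print(known_vulnerable_as_of(supported, date(2015, 6, 1)))
```

A checklist dated before May 20, 2015 flags only SSLv3; one dated afterwards flags DHE_EXPORT as well.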

Format String Bug Exploration

May 20th, 2015

Format String Bug Exploration by AJ Kumar.

From the post:


The format string vulnerability came to prominence in 2000, when remote hackers gained root access on hosts running an FTP daemon that allowed anonymous authentication. This was an entirely new tactic for exploiting common programming glitches in software, and the threat is now everywhere because programmers inadvertently leave coding loopholes that format string attacks target. The format string vulnerability results from misinterpreting the stack in functions that handle variable arguments, especially printf. This article demonstrates this subtle bug in a C programming context on the Windows operating system. However, as with buffer overflow attacks, this class of bug is not operating-system specific; you can find vulnerable programs for Mac OS, Linux, and BSD. This article delves deeper into what format strings are, how they operate relative to the stack, and how they can be manipulated in the C programming language.


To understand the format string bug explained in this article, you will need a rudimentary knowledge of the C family of programming languages, as well as a basic knowledge of IA32 assembly on the Windows operating system, using the Visual Studio development environment. Knowledge of ‘buffer overflow’ exploitation will also be an advantage.

Format String Bug

The format string bug was first explained in June 2000 in a renowned journal. This notorious exploitation tactic enables a hacker to subvert memory stack protections and alter arbitrary memory segments by writing to them. The sole cause is failing to handle or properly validate user-supplied input: blindly trusting user-supplied arguments eventually leads to disaster. When a hacker controls the arguments of the printf function, the details in the variable argument list enable him to analyze or overwrite arbitrary data. Unlike a buffer overrun, the format string bug does not smash the memory stack or corrupt data to a large extent. Hackers often execute this attack to disclose or retrieve sensitive information from the stack, for instance passkeys, cryptographic private keys, etc.

Now the question is how exactly hackers perform this deadly attack. Consider a program where we try to print a string such as “kmaraj” to the screen using the simple C library printf function:
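The excerpt stops before the example code. As a rough stand-in, here is a toy model in Python (not C, and not how any real stack is laid out) of the mechanism the post describes: a printf-style function consumes one stack slot per directive, whether or not the caller actually passed an argument there. The names `toy_printf`, `call_printf` and the stack contents are hypothetical; only `%x` and `%s` are modelled, and `%n`, the directive that writes to memory, is left out.

```python
import re

# Hypothetical values already sitting in the caller's stack frame.
CALLER_LOCALS = [0x41414141, 0xDEADBEEF]


def toy_printf(fmt, stack):
    """Toy model: each %x or %s directive takes the next slot of `stack`.
    Real printf has no idea how many arguments were passed; neither does this.
    (Where the toy raises IndexError on running out of slots, real printf keeps reading.)"""
    out, pos = [], 0
    for piece in re.split(r"(%[xs])", fmt):
        if piece == "%x":
            out.append(format(stack[pos], "x"))
            pos += 1
        elif piece == "%s":
            out.append(str(stack[pos]))
            pos += 1
        else:
            out.append(piece)
    return "".join(out)


def call_printf(fmt, *pushed_args):
    # Arguments the caller pushes come first; beyond them lie the caller's locals.
    return toy_printf(fmt, list(pushed_args) + CALLER_LOCALS)


user_input = "kmaraj %x %x"
print(call_printf("%s", user_input))  # like printf("%s", input): printed literally
print(call_printf(user_input))        # like printf(input): leaks caller data
```

The first call prints the user’s text unchanged; the second turns each `%x` into a read of whatever the caller left on the stack. In C, the safe form is the first one, `printf("%s", input)`, never `printf(input)`.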

A bit deeper than most of my posts on bugs, but the lesson isn’t just the bug; it’s that the bug has persisted for going on fifteen (15) years now.

As a matter of fact, Karl Chen and David Wagner in Large-Scale Analysis of Format String Vulnerabilities in Debian Linux (2007) found:

We successfully analyze 66% of C/C++ source packages in the Debian 3.1 Linux distribution. Our system finds 1,533 format string taint warnings. We estimate that 85% of these are true positives, i.e., real bugs; ignoring duplicates from libraries, about 75% are real bugs.

We suggest that the technology exists to render format string vulnerabilities extinct in the near future. (emphasis added)

“…[N]ear future?” Maybe not, because Mathias Payer and Thomas R. Gross report in String Oriented Programming: When ASLR is not Enough (2013):

One different class of bugs has not yet received adequate attention in the context of DEP, stack canaries, and ASLR: format string vulnerabilities. If an attacker controls the first parameter to a function of the printf family, the string is parsed as a format string. Using such a bug and special format markers result in arbitrary memory writes. Existing exploits use format string vulnerabilities to mount stack or heap-based code injection attacks or to set up return oriented programming. Format string vulnerabilities are not a vulnerability of the past but still pose a significant threat (e.g., CVE-2012-0809 reports a format string bug in sudo and allows local privilege escalation; CVE-2012-1152 reports multiple format string bugs in perl-YAML and allows remote exploitation, CVE-2012-2369 reports a format string bug in pidgin-otr and allows remote exploitation) and usually result in full code execution for the attacker.
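Both papers turn on one question: does an attacker control the format argument? The crudest possible check is a lint that flags printf-family calls whose format argument is not a string literal, similar in spirit to GCC’s `-Wformat-security` warning. The Python sketch below is hypothetical and far weaker than the taint analysis Chen and Wagner describe: it works line by line, only knows the five functions listed, and a semicolon inside a string literal will confuse its regular expression.

```python
import re

# Hypothetical table: zero-based position of the format argument for a few printf-family functions.
FORMAT_ARG = {"printf": 0, "fprintf": 1, "sprintf": 1, "snprintf": 2, "syslog": 1}
CALL = re.compile(r"\b(" + "|".join(FORMAT_ARG) + r")\s*\(([^;]*)\)\s*;")


def split_args(text):
    """Split argument text on top-level commas, skipping commas inside
    string literals and nested parentheses."""
    args, current, depth, in_string, i = [], [], 0, False, 0
    while i < len(text):
        ch = text[i]
        current.append(ch)
        if in_string:
            if ch == "\\" and i + 1 < len(text):
                current.append(text[i + 1])  # keep the escaped character, e.g. \"
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            current.pop()  # drop the separating comma
            args.append("".join(current).strip())
            current = []
        i += 1
    args.append("".join(current).strip())
    return args


def suspicious_calls(source):
    """Return (line number, function) for calls whose format argument is not a literal."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in CALL.finditer(line):
            name, args = m.group(1), split_args(m.group(2))
            position = FORMAT_ARG[name]
            if position < len(args) and not args[position].startswith('"'):
                hits.append((lineno, name))
    return hits


sample = '''printf("%s\\n", name);
printf(name);
fprintf(stderr, buf);
snprintf(out, sizeof(out), "%d", n);'''
print(suspicious_calls(sample))
```

Lines 2 and 3 are flagged; the literal formats on lines 1 and 4 pass.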

Should I assume that, in the computer literature, six (6) years doesn’t qualify as the “…near future”?

Would liability for format string bugs result in greater effort to avoid them?

Hard to say in the abstract but the results could hardly be worse than fifteen (15) years of format string bugs.

Not to mention that liability would put the burden of avoiding the bug squarely on the shoulders of those best able to avoid it.

Math for Journalists Made Easy:…

May 20th, 2015

Math for Journalists Made Easy: Understanding and Using Numbers and Statistics – Sign up now for new MOOC

From the post:

Journalists who squirm at the thought of data calculation, analysis and statistics can arm themselves with new reporting tools during the new Massive Open Online Course (MOOC) from the Knight Center for Journalism in the Americas: “Math for Journalists Made Easy: Understanding and Using Numbers and Statistics” will be taught from June 1 to 28, 2015.

Click here to sign up and to learn more about this free online course.

“Math is crucial to things we do every day. From covering budgets to covering crime, we need to understand numbers and statistics,” said course instructor Jennifer LaFleur, senior editor for data journalism for the Center for Investigative Reporting, one of the instructors of the MOOC.

Two other instructors will be teaching this MOOC: Brant Houston, a veteran investigative journalist who is a professor and the Knight Chair in Journalism at the University of Illinois; and freelance journalist Greg Ferenstein, who specializes in the use of numbers and statistics in news stories.

The three instructors will teach journalists “how to be critical about numbers, statistics and research and to avoid being improperly swayed by biased researchers.” The course will also prepare journalists to relay numbers and statistics in ways that are easy for the average reader to understand.

“It is true that many of us became journalists because sometime in our lives we wanted to escape from mathematics, but it is also true that it has never been so important for journalists to overcome any fear or intimidation to learn about numbers and statistics,” said professor Rosental Alves, founder and director of the Knight Center. “There is no way to escape from math anymore, as we are nowadays surrounded by data and we need at least some basic knowledge and tools to understand the numbers.”

The MOOC will be taught over a period of four weeks, from June 1 to 28. Each week focuses on a particular topic taught by a different instructor. The lessons feature video lectures and are accompanied by readings, quizzes and discussion forums.

This looks excellent.

I look forward to very tough questions about government and corporate statistical reports from anyone who takes this course.

A Call for Collaboration: Data Mining in Cross-Border Investigations

May 20th, 2015

A Call for Collaboration: Data Mining in Cross-Border Investigations by Jonathan Stray and Drew Sullivan.

From the post:

Over the past few years we have seen the huge potential of data and document mining in investigative journalism. Tech savvy networks of journalists such as the Organized Crime and Corruption Reporting Project (OCCRP) and the International Consortium of Investigative Journalists (ICIJ) have teamed together for astounding cross-border investigations, such as OCCRP’s work on money laundering or ICIJ’s offshore leak projects. OCCRP has even incubated its own tools, such as VIS, Investigative Dashboard and Overview.

But we need to do better. There is enormous duplication and missed opportunity in investigative journalism software. Many small grants for technology development have led to many new tools, but very few have become widely used. For example, there are now over 70 tools just for social network analysis. There are other tools for other types of analysis, document handling, data cleaning, and on and on. Most of these are open source, and in various states of completeness, usability, and adoption. Developer teams lack critical capacities such as usability testing, agile processes, and business development for sustainability. Many of these tools are beautiful solutions in search of a problem.

The fragmentation of software development for investigative journalism has consequences: Most newsrooms still lack capacity for very basic knowledge management tasks, such as digitally filing new documents where they can be searched and found later. Tools do not work or do not inter-operate. Ultimately the reporting work is slower, or more expensive, or doesn’t get done. Meanwhile, the commercial software world has so far ignored investigative journalism because it is a small, specialized user-base. Tools like Nuix and Palantir are expensive, not networked, and not extensible for the inevitable story-specific needs.

But investigative journalists have learned how to work in cross-border networks, and investigative journalism developers can too. The experience gained from collaborative data-driven journalism has led OCCRP and other interested organizations to focus on the following issues:

The issues:

  • Usability
  • Delivery
  • Networked Investigation
  • Sustainability
  • Interoperability and extensibility

The next step is reported to be:

The next step for us is a small meeting: the very first conference on Knowledge Management in Investigative Journalism. This event will bring together key developers and journalists to refine the problem definition and plan a way forward. OCCRP and the Influence Mappers project have already pledged support. Stay tuned…

Jonathan Stray (jonathanstray@gmail.com) and Drew Sullivan want to know: are you interested too?

See the original post and email Jonathan and Drew if you are interested. It sounds like a very good idea to me.

PS: You already know one of the technologies that I think is important for knowledge management: topic maps!

H2O 3.0

May 20th, 2015

H2O 3.0

From the webpage:

Why H2O?

H2O is for data scientists and business analysts who need scalable and fast machine learning. H2O is an open source predictive analytics platform. Unlike traditional analytics tools, H2O provides a combination of extraordinary math and high performance parallel processing with unrivaled ease of use. H2O speaks the language of data science with support for R, Python, Scala, Java and a robust REST API. Smart business applications are powered by H2O’s NanoFast™ Scoring Engine.


What is H2O?

H2O makes it possible for anyone to easily apply math and predictive analytics to solve today’s most challenging business problems. It intelligently combines unique features not currently found in other machine learning platforms including:

  • Best of Breed Open Source Technology – Enjoy the freedom that comes with big data science powered by OpenSource technology. H2O leverages the most popular OpenSource products like Apache™ Hadoop® and Spark™ to give customers the flexibility to solve their most challenging data problems.
  • Easy-to-use WebUI and Familiar Interfaces – Set up and get started quickly using either H2O’s intuitive Web-based user interface or familiar programming environments like R, Java, Scala, Python, JSON, and through our powerful APIs.
  • Data Agnostic Support for all Common Database and File Types – Easily explore and model big data from within Microsoft Excel, R Studio, Tableau and more. Connect to data from HDFS, S3, SQL and NoSQL data sources. Install and deploy anywhere.
  • Massively Scalable Big Data Analysis – Train a model on complete data sets, not just small samples, and iterate and develop models in real-time with H2O’s rapid in-memory distributed parallel processing.
  • Real-time Data Scoring – Use the Nanofast Scoring Engine to score data against models for accurate predictions in just nanoseconds in any environment. Enjoy 10X faster scoring and predictions than the next nearest technology in the market.

Note the caveat near the bottom of the page:

With H2O, you can:

  • Make better predictions. Harness sophisticated, ready-to-use algorithms and the processing power you need to analyze bigger data sets, more models, and more variables.
  • Get started with minimal effort and investment. H2O is an extensible open source platform that offers the most pragmatic way to put big data to work for your business. With H2O, you can work with your existing languages and tools. Further, you can extend the platform seamlessly into your Hadoop environments.

The operative word being “can.” Your results with H2O depend upon your knowledge of machine learning, knowledge of your data and the effort you put into using H2O, among other things.

Bin Laden’s Bookshelf

May 20th, 2015

Bin Laden’s Bookshelf

From the webpage:

On May 20, 2015, the ODNI released a sizeable tranche of documents recovered during the raid on the compound used to hide Usama bin Ladin. The release, which followed a rigorous interagency review, aligns with the President’s call for increased transparency–consistent with national security prerogatives–and the 2014 Intelligence Authorization Act, which required the ODNI to conduct a review of the documents for release.

The release contains two sections. The first is a list of non-classified, English-language material found in and around the compound. The second is a selection of now-declassified documents.

The Intelligence Community will be reviewing hundreds more documents in the near future for possible declassification and release. An interagency taskforce under the auspices of the White House and with the agreement of the DNI is reviewing all documents which supported disseminated intelligence cables, as well as other relevant material found around the compound. All documents whose publication will not hurt ongoing operations against al-Qa‘ida or their affiliates will be released.



The one expected work missing from Bin Laden’s library?

The Anarchist Cookbook!

Possession of the same books as Bin Laden will be taken as a sign of terrorist sympathies. Weed your collection responsibly.

Political Futures Tracker

May 20th, 2015

Political Futures Tracker.

From the webpage:

The Political Futures Tracker tells us the top political themes, how positive or negative people feel about them, and how far parties and politicians are looking to the future.

This software will use groundbreaking language analysis methods to examine data from Twitter, party websites and speeches. We will also be conducting live analysis on the TV debates running over the next month, seeing how the public respond to what politicians are saying in real time. Leading up to the 2015 UK General Election we will be looking across the political spectrum for emerging trends and innovation insights.

If that sounds interesting, consider the following from Introducing… the Political Futures Tracker:

We are exploring new ways to analyse a large amount of data from various sources. It is expected that both the amount of data and the speed that it is produced will increase dramatically the closer we get to election date. Using a semi-automatic approach, text analytics technology will sift through content and extract the relevant information. This will then be examined and analysed by the team at Nesta to enable delivery of key insights into hotly debated issues and the polarisation of political opinion around them.

The team at the University of Sheffield has extensive experience in the area of social media analytics and Natural Language Processing (NLP). Technical implementation has started already, firstly with data collection which includes following the Twitter accounts of existing MPs and political parties. Once party candidate lists become available, data harvesting will be expanded accordingly.

In parallel, we are customising the University of Sheffield’s General Architecture for Text Engineering (GATE), an open source text analytics tool, in order to identify sentiment-bearing and future thinking tweets, as well as key target topics within these.

One thing we’re particularly interested in is future thinking. We describe this as making statements concerning events or issues in the future. Given these measures and the views expressed by a certain person, we can model how forward thinking that person is in general, and on particular issues, also comparing this with other people. Sentiment, topics, and opinions will then be aggregated and tracked over time.
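The tracker’s method isn’t spelled out beyond customising GATE, so the following is only a guess at the simplest possible version, in Python: flag a post as future thinking if it contains one of a handful of forward-looking cue phrases, then score each person by the share of their posts so flagged. The cue list, function names and sample posts are hypothetical, and nothing here reflects how the Sheffield team actually customised GATE.

```python
import re
from collections import defaultdict

# Hypothetical cue phrases for statements about the future.
FUTURE_CUES = re.compile(
    r"\b(will|won't|shall|going to|plan to|next year|in the future|by 20\d\d)\b",
    re.IGNORECASE,
)


def is_future_thinking(text):
    return FUTURE_CUES.search(text) is not None


def forward_thinking_scores(posts):
    """posts: iterable of (author, text). Returns author -> share of future-thinking posts."""
    totals, future = defaultdict(int), defaultdict(int)
    for author, text in posts:
        totals[author] += 1
        if is_future_thinking(text):
            future[author] += 1
    return {author: future[author] / totals[author] for author in totals}


posts = [
    ("Candidate A", "We will cut waiting times by 2020"),
    ("Candidate A", "Great turnout in the high street today"),
    ("Candidate B", "Next year we are going to build more homes"),
]
print(forward_thinking_scores(posts))
```

Candidate A scores 0.5 and Candidate B scores 1.0; a keyword count like this measures how a person talks, not what they will do.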

Personally, I suspect that “future thinking” is used in different senses by the general population and by political candidates. For a political candidate, however the rhetoric is worded, the “future” consists of reaching election day with 50% plus 1 vote. For the general population, the “future” probably includes a longer time span.

I mention this in case you can sell someone on the notion that what political candidates say today has some relevance to what they will do after the election. President Obama has been in office for six (6) years, yet the Guantanamo Bay detention camp remains open, no one has been held accountable for years of illegal spying on U.S. citizens, and banks and other corporate interests have all but been granted keys to the U.S. Treasury, to name a few items inconsistent with his previous “future thinking.”

Unless you accept my suggestion that “future thinking” for a politician means election day and no further.

Analysis of named entity recognition and linking for tweets

May 20th, 2015

Analysis of named entity recognition and linking for tweets by Leon Derczynski, et al.


Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
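The first stages of that pipeline, and one reason they struggle, fit in a few lines. The Python sketch below is hypothetical and far cruder than any of the systems the paper evaluates: a tokeniser that keeps URLs, @mentions and #hashtags whole, followed by a capitalisation heuristic for named entities. Capitalisation is exactly the kind of cue that noisy, lowercase tweets take away.

```python
import re

# Hypothetical tokeniser: URLs, @mentions/#hashtags, words (optional apostrophe part), other symbols.
TOKEN = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)?|[^\w\s]")


def tokenize(tweet):
    return TOKEN.findall(tweet)


def naive_entities(tokens):
    """Hypothetical heuristic: @mentions, plus capitalised tokens that do not open the tweet."""
    return [
        tok for i, tok in enumerate(tokens)
        if tok.startswith("@") or (i > 0 and tok[:1].isupper())
    ]


for tweet in ["Heading to London w/ @AP tonight http://t.co/x",
              "heading to london tonight #election"]:
    tokens = tokenize(tweet)
    print(tokens, naive_entities(tokens))
```

The first tweet yields London and @AP; the second mentions the same city in lowercase and yields nothing, the kind of false negative the paper’s second research question is about.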

The questions addressed by the paper are:

RQ1 How robust are state-of-the-art named entity recognition and linking methods on short and noisy microblog texts?

RQ2 What problem areas are there in recognising named entities in microblog posts, and what are the major causes of false negatives and false positives?

RQ3 Which problems need to be solved in order to further the state-of-the-art in NER and NEL on this difficult text genre?

The ultimate conclusion is that entity recognition in microblog posts falls short of what has been achieved for newswire text, but if you need results now, or at least by tomorrow, this is a good guide to what is possible and where improvements can be made.

Detecting Deception Strategies [Godsend for the 2016 Election Cycle]

May 20th, 2015

Discriminative Models for Predicting Deception Strategies by Scott Appling, Erica Briscoe, C.J. Hutto.


Although a large body of work has previously investigated various cues predicting deceptive communications, especially as demonstrated through written and spoken language (e.g., [30]), little has been done to explore predicting kinds of deception. We present novel work to evaluate the use of textual cues to discriminate between deception strategies (such as exaggeration or falsification), concentrating on intentionally untruthful statements meant to persuade in a social media context. We conduct human subjects experimentation wherein subjects were engaged in a conversational task and then asked to label the kind(s) of deception they employed for each deceptive statement made. We then develop discriminative models to understand the difficulty between choosing between one and several strategies. We evaluate the models using precision and recall for strategy prediction among 4 deception strategies based on the most relevant psycholinguistic, structural, and data-driven cues. Our single strategy model results demonstrate as much as a 58% increase over baseline (random chance) accuracy and we also find that it is more difficult to predict certain kinds of deception than others.

The deception strategies studied in this paper:

  • Falsification
  • Exaggeration
  • Omission
  • Misleading

All four, and especially omission, will form the bulk of the content in the 2016 election cycle in the United States. Only deceptive statements were included in the test data, so the models were tested on correctly recognizing the deception strategy in a known deceptive statement.
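For readers unfamiliar with the evaluation, per-strategy precision and recall can be computed from paired gold and predicted labels. A minimal Python sketch, assuming one strategy label per statement (the paper also considers statements using several strategies, which this ignores); the function name and the labels below are invented for illustration, not data from the paper.

```python
from collections import Counter

STRATEGIES = ["Falsification", "Exaggeration", "Omission", "Misleading"]


def per_strategy_scores(gold, predicted):
    """Hypothetical helper: gold, predicted are parallel lists of labels.
    Returns {strategy: (precision, recall)}."""
    true_pos, predicted_n, gold_n = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        gold_n[g] += 1
        predicted_n[p] += 1
        if g == p:
            true_pos[g] += 1
    return {
        s: (
            true_pos[s] / predicted_n[s] if predicted_n[s] else 0.0,  # precision
            true_pos[s] / gold_n[s] if gold_n[s] else 0.0,            # recall
        )
        for s in STRATEGIES
    }


gold = ["Omission", "Omission", "Exaggeration", "Falsification"]
predicted = ["Omission", "Misleading", "Exaggeration", "Omission"]
for strategy, (p, r) in per_strategy_scores(gold, predicted).items():
    print(f"{strategy:13} precision={p:.2f} recall={r:.2f}")
```

Here Exaggeration is perfect, Omission is right half the time in both directions, and Falsification and Misleading score zero.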

The test data is remarkably similar to political content, which, aside from the names of the candidates and their opponents (mostly), is composed entirely of deceptive statements, albeit not marked for the strategy used in each one.

A web interface for loading pointers to video, audio or text with political content that emits tagged deception with pointers to additional information would be a real hit for the next U.S. election cycle. Monetize with ads, the sources of additional information, etc.

I first saw this in a tweet by Leon Derczynski.

New Computer Bug Exposes Broad Security Flaws [Trust but Verify]

May 20th, 2015

New Computer Bug Exposes Broad Security Flaws by Jennifer Valentino-Devries.

From the post:

A dilemma this spring for engineers at big tech companies, including Google Inc., Apple Inc. and Microsoft Corp., shows the difficulty of protecting Internet users from hackers.

Internet-security experts crafted a fix for a previously undisclosed bug in security tools used by all modern Web browsers. But deploying the fix could break the Internet for thousands of websites.

“It’s a twitchy business, and we try to be careful,” said Richard Barnes, who worked on the problem as the security lead for Mozilla Corp., maker of the Firefox Web browser. “The question is: How do you come up with a solution that gets as much security as you can without causing a lot of disruption to the Internet?”

Engineers at browser makers traded messages for two months, ultimately choosing a fix that could make more than 20,000 websites unreachable. All of the browser makers have released updates including the fix or will soon, company representatives said.

No links or pointers to further resources.

The name of this new bug is “Logjam.”

I saw Jennifer’s story on Monday evening, about 19:45 EDT, and tried to verify the story with some of the standard bug reporting services.

No “hits” at CERT, IBM’s X-Force, or the Internet Storm Center as of 20:46 EDT on May 19, 2015.

The problem being that Jennifer did not include any links to any source that would verify the existence of this new bug. Not one.

The only story that kept popping up in searches was Jennifer’s.

So, I put this post to one side, returning to it this morning.

As of this morning, now about 6:55 EDT, the Internet Storm Center returns:

Logjam – vulnerabilities in Diffie-Hellman key exchange affect browsers and servers using TLS by Brad Duncan, ISC Handler and Security Researcher at Rackspace, with a pointer to: The Logjam Attack, which reads in part:

We have published a technical report, Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice, which has specifics on these attacks, details on how we broke the most common 512-bit Diffie-Hellman Group, and measurements of who is affected. We have also published several proof of concept demos and a Guide to Deploying Diffie-Hellman for TLS.

This study was performed by computer scientists at Inria Nancy-Grand Est, Inria Paris-Rocquencourt, Microsoft Research, Johns Hopkins University, University of Michigan, and the University of Pennsylvania: David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, J. Alex Halderman, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, Benjamin VanderSloot, Eric Wustrow, Santiago Zanella-Beguelin, and Paul Zimmermann. The team can be contacted at
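Why does group size matter? A toy Diffie-Hellman exchange in Python, using the textbook prime 23 and generator 5 (illustrative values, absurdly small on purpose), shows both sides arriving at the same secret and an eavesdropper recovering a private exponent by trying every exponent. That brute force is not how the researchers broke 512-bit groups, which relied on number field sieve precomputation against specific primes; the helper names are hypothetical and the sketch only illustrates why a small enough group gives no protection.

```python
def dh_public(g, private, p):
    """Public value g**private mod p."""
    return pow(g, private, p)


def brute_force_log(g, y, p):
    """Hypothetical helper: find x with g**x % p == y by trying every exponent.
    Feasible only for toy-sized p."""
    value = 1
    for x in range(p - 1):
        if value == y:
            return x
        value = value * g % p
    return None


p, g = 23, 5   # textbook toy group; the groups discussed above are 512 bits and up
a, b = 6, 15   # Alice's and Bob's private exponents
A, B = dh_public(g, a, p), dh_public(g, b, p)

shared = pow(B, a, p)           # Alice's view of the shared secret
assert shared == pow(A, b, p)   # Bob computes the same value

x = brute_force_log(g, A, p)    # eavesdropper sees only p, g, A and B
print(shared, x, pow(B, x, p))  # the recovered exponent yields the same secret
```

With a prime this small, the eavesdropper’s loop finishes after seven steps; the Logjam work showed that, with enough precomputation, 512 bits is not much better.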

As of 7:06 ESDT on May 20, 2015, neither CERT nor IBM’s X-Force returns any “hits” on “Logjam.”

It is one thing to “trust” a report of a bug, but please verify it before replicating a story based upon insider gossip. Links to third party materials, for example, would help.
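Logjam targets the Diffie-Hellman exchange that TLS uses to agree on session keys. A toy sketch may help show why group size matters: with tiny textbook parameters (p = 23, g = 5, chosen purely for illustration) an eavesdropper can recover a private exponent by exhaustive search and rebuild the shared secret. The function names are hypothetical, and the Logjam researchers attacked 512-bit groups with number field sieve precomputation, not brute force, so this illustrates the principle rather than their method.

```python
import secrets

# Textbook toy group: p = 23, g = 5 (5 generates every nonzero residue mod 23).
# Real groups use primes hundreds to thousands of bits long.
P, G = 23, 5


def keypair(p: int = P, g: int = G) -> tuple[int, int]:
    """Return (private exponent, public value g^x mod p)."""
    x = secrets.randbelow(p - 2) + 1  # private exponent in [1, p - 2]
    return x, pow(g, x, p)


def shared_secret(their_public: int, my_private: int, p: int = P) -> int:
    """Raise the other side's public value to our own private exponent."""
    return pow(their_public, my_private, p)


def brute_force_dlog(public: int, p: int = P, g: int = G) -> int:
    """Find some x with g^x == public (mod p) by trying every exponent.

    Only feasible because p is tiny.
    """
    value = 1  # g^0
    for x in range(p - 1):
        if value == public:
            return x
        value = value * g % p
    raise ValueError("public value is not a power of g")


a, A = keypair()
b, B = keypair()
assert shared_secret(B, a) == shared_secret(A, b)  # both sides agree

# An eavesdropper sees only A and B, but in a group this small can solve for
# an exponent and rebuild the same secret.
x = brute_force_dlog(A)
print(shared_secret(B, x) == shared_secret(A, b))  # prints True
```

Any exponent the search returns works, because g^x = A implies B^x = A^b (mod p).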

Fighting Cybercrime at IBM

May 19th, 2015

15/05/15 – More than 1,000 Organizations Join IBM to Battle Cybercrime

From the post:

ARMONK, NY – 14 May 2015: IBM (NYSE: IBM) today announced that more than 1,000 organizations across 16 industries are participating in its X-Force Exchange threat intelligence network, just one month after its launch. IBM X-Force Exchange provides open access to historical and real-time data feeds of threat intelligence, including reports of live attacks from IBM’s global threat monitoring network, enabling enterprises to defend against cybercrime.

IBM’s new cloud-based cyberthreat network, powered by IBM Cloud, is designed to foster broader industry collaboration by sharing actionable data to defend against these very real threats to businesses and governments. The company provided free access last month, via the X-Force Exchange, to its 700 terabyte threat database – a volume equivalent to all data that flows across the internet in two days. This includes two decades of malicious cyberattack data from IBM, as well as anonymous threat data from the thousands of organizations for which IBM manages security operations. Participants have created more than 300 new collections of threat data in the last month alone.

“Cybercrime has become the equivalent of a pandemic — no company or country can battle it alone,” said Brendan Hannigan, General Manager, IBM Security. “We have to take a collective and collaborative approach across the public and private sectors to defend against cybercrime. Sharing and innovating around threat data is central to battling highly organized cybercriminals; the industry can no longer afford to keep this critical resource locked up in proprietary databases. With X-Force Exchange, IBM has opened access to our extensive threat data to advance collaboration and help public and private enterprises safeguard themselves.”

Think about the numbers for a moment, 1,000 organizations and 300 new collections of threat data in a month. Not bad by anyone’s yardstick.

As I titled my first post on the X-Force Exchange: Being Thankful IBM is IBM.

Civil War Navies Bookworm

May 19th, 2015

Civil War Navies Bookworm by Abby Mullen.

From the post:

If you read my last post, you know that this semester I engaged in building a Bookworm using a government document collection. My professor challenged me to try my system for parsing the documents on a different, larger collection of government documents. The collection I chose to work with is the Official Records of the Union and Confederate Navies. My Barbary Bookworm took me all semester to build; this Civil War navies Bookworm took me less than a day. I learned things from making the first one!

This collection is significantly larger than the Barbary Wars collection—26 volumes, as opposed to 6. It encompasses roughly the same time span, but 13 times as many words. Though it is still technically feasible to read through all 26 volumes, this collection is perhaps a better candidate for distant reading than my first corpus.

The document collection is broken into geographical sections, the Atlantic Squadron, the West Gulf Blockading Squadron, and so on. Using the Bookworm allows us to look at the words in these documents sequentially by date instead of having to go back and forth between different volumes to get a sense of what was going on in the whole navy at any given time.

Before you ask:

The earlier post: Text Analysis on the Documents of the Barbary Wars

More details on Bookworm.

As with all ngram viewers, exercise caution in assuming a text string has uniform semantics across historical, ethnic, or cultural fault lines.
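Bookworm's internals aren't described in the post, but the basic operation behind reading words sequentially by date is counting a term's relative frequency per time bucket. A minimal sketch, assuming the documents are already available as hypothetical (year, text) pairs and that whitespace tokenization is good enough:

```python
from collections import Counter


def term_frequency_by_year(records, term):
    """Relative frequency of `term` per year from (year, text) pairs.

    Whitespace tokenization and lowercasing only; punctuation stays
    attached to words, so "blockade," would not match "blockade".
    """
    totals = Counter()  # total words seen per year
    hits = Counter()    # occurrences of the term per year
    target = term.lower()
    for year, text in records:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical sample records, not taken from the Official Records.
records = [
    (1861, "Blockade runner sighted off the bar"),
    (1861, "The blockade holds"),
    (1862, "Ironclad engaged at dawn"),
]
print(term_frequency_by_year(records, "blockade"))
```

Dividing by the total words per year matters because different years of a collection contain very different amounts of text; raw counts would mostly track volume size.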

Mile High Club

May 19th, 2015

Mile High Club by Chris Rouland.

From the post:

A very elite club was just created by Chris Roberts, if his allegations of commandeering an airplane are true. Modern day transportation relies heavily on remote access to the outside world…and consumer trust. These two things have been at odds recently, ever since the world read a tweet from Chris Roberts, in which he jokingly suggested releasing oxygen masks while aboard a commercial flight. Whether or not Roberts was actually joking about hacking the aircraft is up for debate, but the move led the Government Accountability Office to issue a warning about potential vulnerabilities to aircraft systems via in-flight Wi-Fi.

Chris has a great suggestion:

While I agree that we don’t want every 16-year-old script kiddie trying to tamper with people’s lives at 35,000 feet, we do wonder if United or any of the other major carriers would be willing to park a plane at Black Hat. Surely if they were certain that there is no way to exploit the pilot’s aviation systems, they would be willing to allow expert researchers to have a look while the plane is on the ground? Tremendous insight and overall global information security could only improve if a major carrier or manufacturer hosted a hack week on a Dreamliner on the tarmac at McCarran international.

A couple of candidate Black Hat conferences:

BLACK HAT | USA August 1-6, 2015 | Mandalay Bay | Las Vegas, NV

BLACK HAT | EUROPE November 10-13, 2015 | Amsterdam RAI | The Netherlands

Do you think the conference organizers would comp registration for the people who come with the plane?

As for airlines, here are the top ten (10) US airlines by income for 2014:

  1. Delta
  2. United
  3. American
  4. Southwest
  5. US Airways
  6. JetBlue
  7. Alaska
  8. Hawaiian
  9. Spirit
  10. SkyWest

When you register for a major Black Hat conference, ask the organizers to stage an airline hacking event, especially one with big Black Hat Conference signs on the tarmac showing the make/model of the plane.

Would make an interesting press event to have a photo of the conference sign with no plane.

Sorta speaks for itself. Yes?

The Back-to-Basics Readings of 2012

May 19th, 2015

The Back-to-Basics Readings of 2012 by Werner Vogels (CTO – Amazon.com).

From the post:

After the AWS re: Invent conference I spent two weeks in Europe for the last customer visits of the year. I have since returned and am now in New York City enjoying a few days of winding down the last activities of the year before spending the holidays here with family. Do not expect too many blog posts or twitter updates. Although there are still a few very exciting AWS news updates to happen this year.

I thought this was a good moment to collect all the readings I suggested this year in one summary post. It was not until later in the year that I started to recording the readings here on the blog, so I hope this is indeed the complete list. I am pretty sure some if not all of these papers deserved to be elected to the hall of fame of best papers in distributed systems.

My count is twenty-four (24) papers. More than enough for a weekend at the beach! ;-)

I first saw this in a tweet by Computer Science.

NY police commissioner wants 450 more cops to hunt Unicorns

May 19th, 2015

NY police commissioner wants 450 more cops to fight against jihadis by Robert Spencer.

OK, not exactly what Police Commissioner Bratton said but it may as well be.

Bratton is quoted in the post as saying:

As the fear over the threat of terrorism continues to swell around the world, New York City becomes increasingly on edge that it’s time to take extra security precautions.

Although we have not experienced the caliber such as the attacks in Paris, we have a history of being a major target and ISIS has already begun to infiltrate young minds through the use of video games and social media.

Since the New Year there have been numerous arrests in Brooklyn and Queens for people attempting to assist ISIS from afar, building homemade bombs and laying out plans of attack.

This is called no-evidence police policy.

Particularly when you examine the “facts” behind:

…numerous arrests in Brooklyn and Queens for people attempting to assist ISIS from afar, building homemade bombs and laying out plans of attack.

Queens? Oh, yes, the two women from Queens whom an FBI informant befriended, finding a copy of the Anarchist Cookbook online and printing it out for them. Not to mention taking one of them shopping for bomb components. The alleged terrorists were going to educate themselves on bomb making. More a threat to themselves than anyone else. See: e451 and Senator Dianne Feinstein. The focus of that post is on the Anarchist Cookbook, but you can get the drift of how silly the complaint was in fact.

As far as Brooklyn, you can read the complaint for yourself, but the gist of it was that one of the three defendants could not travel because his mother would not give him his passport. Serious terrorist people we are dealing with here. The other two were terrorist wannabes who were long on boasting skills, but there isn’t a shortage of boasters in ISIS. Had they been able to connect with ISIS by some happenstance, it would have degraded the operational capabilities of ISIS, not assisted it.

A recent estimate puts the Muslim population of New York City at 600,000, with 175 mosques in the city and counting (Muslims in New York City, Part II [2010]).

The FBI was able to assist and prompt five (5) people out of an estimated 600,000 into making boasts about assisting ISIS and/or traveling to join ISIS.

I won’t even bother doing the math. Anyone who is looking for facts will know that five (5) arrests in a city of 12 million people doesn’t qualify as “numerous.” Especially when those arrests were for thought crimes and amateurish boasting more than any attempt at an actual crime.

Support more cops to hunt Unicorns. Unicorn hunting doesn’t require military tactics or weapons, thereby making the civilian population safer.

Fast parallel computing with Intel Phi coprocessors

May 19th, 2015

Fast parallel computing with Intel Phi coprocessors by Andrew Ekstrom.

Andrew tells a tale of going from more than a week to raise a 10,000×10,000 matrix to the power 10^17 down to 6-8 hours, and then to substantially shorter times. Sigh, using Windows, but still an impressive feat! As you might expect, the tools included Revolution Analytics RRO, Intel’s Math Kernel Library (MKL), Intel Phi coprocessors, etc.

There’s enough detail (I suspect) for you to duplicate this feat on your own Windows box, or perhaps more easily on Linux.
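Whatever the exact routines in Andrew's setup, an exponent as large as 10^17 is normally computed by repeated squaring, which needs roughly 2 × log2(n) matrix multiplications rather than n − 1. A minimal pure-Python sketch on small integer matrices (the names `mat_mul`, `mat_pow` and `multiplications_needed` are illustrative, not from the post). MKL and the Phi coprocessors speed up each individual multiplication, which at 10,000×10,000 is where the time goes.

```python
def mat_mul(a, b):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)] for row in a]


def mat_pow(m, exponent):
    """Return (m ** exponent, multiplications used) via repeated squaring."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    base = m
    mults = 0
    while exponent > 0:
        if exponent & 1:          # this bit is set: fold the current power in
            result = mat_mul(result, base)
            mults += 1
        exponent >>= 1
        if exponent:              # square only if more bits remain
            base = mat_mul(base, base)
            mults += 1
    return result, mults


def multiplications_needed(exponent):
    """Squarings (bit length - 1) plus one multiply per set bit."""
    return (exponent.bit_length() - 1) + bin(exponent).count("1")


fib = [[1, 1], [1, 0]]
print(mat_pow(fib, 10))                  # prints ([[89, 55], [55, 34]], 5)
print(multiplications_needed(10**17))    # at most 56 + 57 = 113
```

The toy uses exact Python integers; floating-point libraries perform the same sequence of multiplications at fixed precision.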


I first saw this in a tweet by David Smith.

The Applications of Probability to Cryptography

May 19th, 2015

The Applications of Probability to Cryptography by Alan M. Turing.

From the copyright page:

The underlying manuscript is held by the National Archives in the UK and can be accessed using reference number HW 25/37. Readers are encouraged to obtain a copy.

The original work was under Crown copyright, which has now expired, and the work is now in the public domain.

You can go directly to the record page:

To get a useful image, you need to add the item to your basket for £3.30.

The manuscript is a mixture of typed text with inserted mathematical expressions added by hand (along with other notes and corrections). This is a typeset version that attempts to capture the original manuscript.

Another recently declassified Turing paper (typeset): The Statistics of Repetition.

Important reads. Turing would appreciate the need to exclude government from our day to day lives.

Apache Cassandra 2.2.0-beta1 released

May 19th, 2015

Apache Cassandra 2.2.0-beta1 released

From the post:

The Cassandra team is pleased to announce the release of Apache Cassandra version 2.2.0-beta1.

This release is *not* production ready. We are looking for testing of existing and new features. If you encounter any problem please let us know [1].

Cassandra 2.2 features major enhancements such as:

* Resume-able Bootstrapping
* JSON Support [4]
* User Defined Functions [5]
* Server-side Aggregation [6]
* Role based access control

Read [2] and [3] to learn about all the new features.

Downloads of source and binary distributions are listed in our download section:


-The Cassandra Team

[2]: (NEWS.txt)
[3]: (CHANGES.txt)

I was wondering what I would be reading this week! ;-)


You Are Safer Than You Think

May 19th, 2015

Despite the budget-sustaining efforts of Thomas Fuentes, former associate director of the FBI (see Why Does the FBI Have To Manufacture Its Own Plots If Terrorism And ISIS Are Such Grave Threats? by Glenn Greenwald), you really are safer than you think.

Not because security systems are that great, but from a lack of people who want to do you harm.

Think about airport security for a minute. There are the child-fondling TSA agents and the ones who like to paw through used underwear, but how serious is the protection those methods provide? What about all those highly trained and well-paid baggage handlers and other folks at the airport?

The answer to that question can be found in: Airport Baggage Handlers Charged in Wide-Ranging Conspiracy to Transport Drugs Across the Country. Certainly would not want airport security to interfere with the flow of illegal drugs across the country.

From the post:

Fourteen persons have been charged in connection with an alleged wide-ranging criminal conspiracy to violate airport security requirements and transport drugs throughout the country announced U.S. Attorney Melinda Haag of the Northern District of California, Special Agent in Charge José M. Martinez of the Internal Revenue Service-Criminal Investigation’s (IRS-CI) for the Northern District of California and Special Agent in Charge David J. Johnson Federal Bureau of Investigation (FBI). The case highlights the government’s determination to address security concerns in and around the nation’s airports.

In a criminal complaint partially unsealed today, the co-conspirators were described as a drug trafficking organization determined to use the special access some of them had been granted as baggage handlers at the Oakland International Airport to circumvent the security measures in place at the airport. As alleged in the complaint, the baggage handlers entered the Air Operations Area (AOA) of the Oakland Airport while in possession of baggage containing marijuana. The AOA is an area of the airport that is accessible to employees but not to passengers who have completed security screening through a Transportation Security Administration (TSA) checkpoint. The baggage handlers were not required to pass through a TSA security screening checkpoint to enter the AOA. The baggage handlers then used their security badges to open a secure door that separates the AOA from the sterile passenger terminal where outbound passengers, who have already passed through the TSA security and screening checkpoint, wait to board their flights. The baggage handlers then gave the baggage containing drugs to passengers who then transported the drugs in carry-on luggage on their outbound flights. After arriving in a destination city, the drugs were distributed and sold.

I take the line:

The case highlights the government’s determination to address security concerns in and around the nation’s airports.

to mean the government admits that drugs, weapons or bombs are easy to get on board aircraft in the United States.

That should not worry you, because since 9/11, despite airline security being as solid as a bucket without a bottom, no one has blown a plane out of the sky and no one has taken over an airplane with a weapon.

That’s 14 years and more than 10.3 billion passengers later, not one bomb, not one take over by weapon, despite laughable airport “security.”

The only conclusion I can draw from the data is the near total lack of anyone in the United States who wishes to bomb an airplane or take it over with weapons.

I say “near total lack” because there could be someone, but it is less than one person per 10.3 billion. Those are fairly good odds.

FreeSearch

May 18th, 2015


From the “about” page:

The FreeSearch project is a search system on top of DBLP data provided by Michael Ley. FreeSearch is a joint project of the L3S Research Center and iSearch IT Solutions GmbH.

In this project we develop new methods for simple literature search that works on any catalogs, without requiring in-depth knowledge of the metadata schema. The system helps users proactively and unobtrusively by guessing at each step what the user’s real information need is and providing precise suggestions.

A more detailed description of the system can be found in this publication: FreeSearch – Literature Search in a Natural Way.

You can choose to search across:

DBLP (4,552,889 documents)

TIBKat (2,079,012 documents)

CiteSeer (1,910,493 documents)

BibSonomy (448,166 documents)


Cell Stores

May 18th, 2015

Cell Stores by Ghislain Fourny.


Cell stores provide a relational-like, tabular level of abstraction to business users while leveraging recent database technologies, such as key-value stores and document stores. This allows to scale up and out the efficient storage and retrieval of highly dimensional data. Cells are the primary citizens and exist in different forms, which can be explained with an analogy to the state of matter: as a gas for efficient storage, as a solid for efficient retrieval, and as a liquid for efficient interaction with the business users. Cell stores were abstracted from, and are compatible with the XBRL standard for importing and exporting data. The first cell store repository contains roughly 200GB of SEC filings data, and proves that retrieving data cubes can be performed in real time (the threshold acceptable by a human user being at most a few seconds).


Demonstration with 200 GB of SEC data.

Tutorial: An Introduction To The Cell Store REST API.

From the tutorial:

Cell stores are a new paradigm of databases. It is decoupled from XBRL and has a data model of its own, yet it natively support XBRL as a file format to exchange data between cell stores.

Traditional relational databases are focused on tables. Document stores are focused on trees. Triple stores are focused on graphs. Well, cell stores are focused on cells. Cells are units of data and also called facts, measures, etc. Think of taking an Excel spreadsheet and a pair of scissors, and of splitting the sheet into its cells. Put these cells in a bag. Pour some more cells that come from other spreadsheets. Many. Millions of cells. Billions of cells. Trillions of cells. You have a cell store.

Why is it so important to store all these cell in a single, big bag? That’s because the main use case for cell stores is the ability to query data across filings. Cell stores are very good at this. They were designed from day one to do this.

Cell stores are very good at reconstructing tables in the presence of highly dimensional data. The idea behind this is based on hypercubes and is called NoLAP (NoSQL Online Analytical Processing). NoLAP extends the OLAP paradigm by removing hypercube rigidity and letting users generate their own hypercubes on the fly on the same pool of cells.

For business users, all of this is completely transparent and hidden. The look and feel of a cell store, in the end, is that of a spreadsheet like Excel. If you are familiar with the pivot table functionality of Excel, cell stores will be straightforward to understand. Also the underlying XBRL is hidden.

XBRL is to cell store what the inside format of .xsls files are to Excel. How many of us have tried to unzip and open an Excel file with a text editor for any other reason than mere curiosity? The same goes for cell stores.

Forget about the complexity of XBRL. Get things done with your data.
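The tutorial's bag of cells, queried through ad hoc hypercubes, can be sketched in a few lines. This is an illustration under assumptions, not the CellStore REST API: each cell is a hypothetical dict of dimension values plus a `Value`, a hypercube is just a choice of fixed dimensions, and a pivot turns the selected cells back into a table.

```python
# Hypothetical cells: each is a bag of dimension values plus a Value.
CELLS = [
    {"Entity": "ACME", "Period": "2013", "Concept": "Revenue", "Value": 90},
    {"Entity": "ACME", "Period": "2014", "Concept": "Revenue", "Value": 100},
    {"Entity": "Globex", "Period": "2014", "Concept": "Revenue", "Value": 75},
    {"Entity": "Globex", "Period": "2014", "Concept": "Assets", "Value": 300},
]


def select(cells, **fixed):
    """Keep cells whose dimensions match every fixed value (the hypercube slice)."""
    return [c for c in cells if all(c.get(dim) == val for dim, val in fixed.items())]


def pivot(cells, row_dim, col_dim):
    """Arrange selected cells as a nested dict table: rows x columns -> Value."""
    table = {}
    for c in cells:
        table.setdefault(c[row_dim], {})[c[col_dim]] = c["Value"]
    return table


# Two different "hypercubes" generated on the fly over the same pool of cells.
print(pivot(select(CELLS, Concept="Revenue"), "Entity", "Period"))
print(pivot(select(CELLS, Period="2014"), "Entity", "Concept"))
```

Nothing about the pool fixes which dimensions become rows or columns; that choice is made per query, which is the flexibility the tutorial contrasts with rigid OLAP cubes.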

The promise of a better user interface alone should be enough to attract attention to this proposal. Yet, so far as I can find, there hasn’t been a lot of use/discussion of it.

I do wonder about this statement in the paper:

When many people define their own taxonomy, this often ends up in redundant terminology. For example, someone might use the term Equity and somebody else Capital. When either querying cells with a hypercube, or loading cells into a spreadsheet, a mapping can be applied so that this redundant terminology is transparent. This way, when a user asks for Equity, (i) she will also get the cells having the concept Capital, (ii) and it will be transparent to her because the Capital concept is overridden with the expected value Equity.

In part because it omits the obvious case of conflicting terminology, that is, where we both want to use “German” as a term, but I mean the language and you mean the nationality. In one well-known graph database, the answer depends on which one of us gets there first. Poor form in my opinion.

Mapping can handle different terms for the same subject but how do we maintain that? Where do I look to discover the reason(s) underlying the mapping? Moreover, in the conflicting case, how do I distinguish otherwise opaque terms that are letter for letter identical?

There may be answers as I delve deeper into the documentation but those are some topic map issues that stood out for me on a first read.
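The mapping the paper describes, where a request for Equity also returns cells filed under Capital, might look like the following sketch. `CONCEPT_MAP` and `query_concept` are hypothetical names, not part of any cell store API.

```python
# Hypothetical synonym table: term used in a filing -> term the user expects.
CONCEPT_MAP = {"Capital": "Equity"}

CELLS = [
    {"Entity": "ACME", "Concept": "Equity", "Value": 500},
    {"Entity": "Globex", "Concept": "Capital", "Value": 250},
    {"Entity": "Globex", "Concept": "Revenue", "Value": 75},
]


def query_concept(cells, concept, mapping=CONCEPT_MAP):
    """Return cells for `concept`, including mapped synonyms, relabelled."""
    results = []
    for cell in cells:
        label = mapping.get(cell["Concept"], cell["Concept"])
        if label == concept:
            # Copy so the stored cell keeps its original concept.
            results.append({**cell, "Concept": label})
    return results


for cell in query_concept(CELLS, "Equity"):
    print(cell)
```

Because the table is a plain dict keyed on strings, it has no room to record why Capital maps to Equity, and two identically spelled terms would collapse into a single key.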


A Virtual Database between MongoDB, ElasticSearch, and MarkLogic

May 18th, 2015

A Virtual Database between MongoDB, ElasticSearch, and MarkLogic by William Candillon.

From the post:

Virtual Databases enable developers to write applications regardless of the underlying database technologies. We recently updated a database infrastructure from MongoDB and ElasticSearch to MarkLogic without touching the codebase.

We just flipped a switch. We updated the database infrastructure of an application (20k LOC) from MongoDB and Elasticsearch to MarkLogic without changing a single line of code.

Earlier this year, we published a tutorial that shows how the 28msec query technology can enable developers to write applications regardless of the underlying database technology. Recently, we had the opportunity to put it to the test on both a real world use case and a substantial codebase.

At 28msec, we have designed and implemented an open source modern data warehouse called CellStore. Whereas traditional data warehousing solutions can only support hundreds of fixed dimensions and thus need to ETL the data to analyze, cell stores support an unbounded number of dimensions. Our implementation of the cell store paradigm is around 20k lines of JSONiq queries. Originally the implementation was running on top of MongoDB and Elasticsearch.


Impressive work and it merits a separate post on the underlying technology, CellStore.
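The idea of flipping a switch between databases can be sketched with an interface that application code depends on. 28msec does this at the level of JSONiq queries; the Python below only mimics the principle with two hypothetical in-memory backends (`ListStore`, `IndexedStore`) behind one `DocumentStore` protocol.

```python
from typing import Protocol


class DocumentStore(Protocol):
    """The only surface the application is allowed to depend on."""

    def insert(self, collection: str, doc: dict) -> None: ...

    def find(self, collection: str, **criteria) -> list[dict]: ...


class ListStore:
    """Backend A: scans a list per collection."""

    def __init__(self) -> None:
        self._data: dict[str, list[dict]] = {}

    def insert(self, collection: str, doc: dict) -> None:
        self._data.setdefault(collection, []).append(dict(doc))

    def find(self, collection: str, **criteria) -> list[dict]:
        return [d for d in self._data.get(collection, [])
                if all(d.get(k) == v for k, v in criteria.items())]


class IndexedStore:
    """Backend B: keeps an inverted index of (collection, field, value) -> positions."""

    def __init__(self) -> None:
        self._docs: dict[str, list[dict]] = {}
        self._index: dict[tuple, list[int]] = {}

    def insert(self, collection: str, doc: dict) -> None:
        docs = self._docs.setdefault(collection, [])
        docs.append(dict(doc))
        for k, v in doc.items():  # values must be hashable in this sketch
            self._index.setdefault((collection, k, v), []).append(len(docs) - 1)

    def find(self, collection: str, **criteria) -> list[dict]:
        docs = self._docs.get(collection, [])
        positions = set(range(len(docs)))
        for k, v in criteria.items():
            positions &= set(self._index.get((collection, k, v), []))
        return [docs[i] for i in sorted(positions)]


def application(store: DocumentStore) -> list[str]:
    """Application code: identical no matter which backend it is handed."""
    store.insert("filings", {"entity": "ACME", "year": 2014})
    store.insert("filings", {"entity": "Globex", "year": 2014})
    store.insert("filings", {"entity": "ACME", "year": 2013})
    return sorted(d["entity"] for d in store.find("filings", year=2014))


# "Flipping the switch" means changing only which backend is constructed.
for backend in (ListStore, IndexedStore):
    print(backend.__name__, application(backend()))
```

The application function never mentions a backend by name, so swapping storage engines touches only the line that constructs the store.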

Summer DIY: Combination Lock Cracker

May 18th, 2015

Former virus writer open-sources his DIY combination lock-picking robot by Paul Ducklin.

Amusing account of Samy Kamkar and his hacking history up to and including:

…an open-source 3D-printed robot that can crack a combination lock in just 30 seconds by twiddling the dial all by itself.

Paul includes some insights into opening combination locks.

Good opportunity to learn about 3D printing and fundamentals of combination locks.

Advanced: Safe Cracker

If that seems too simple, try safe locks with the 3D-printed robot (adjusted for the size/torque required to turn the dial). The robot will turn the dial more consistently than any human hand. Use very sensitive vibration detectors to pick up the mechanical movement of the lock and capture that vibration as a digital file. From knowledge of the lock, you know the turns, directions, etc.

Then use deep learning over several passes on the lock to discover the opening sequence. You will need a stand for the robot, both to isolate its vibrations from the safe housing and to let it reach the combination dial.

Or you can call a locksmith and pay big bucks to open a safe.

The DIY way has you learning some mechanics, a little physics and deep learning.

If you are up for a real challenge, consider the X-09™ lock (NSN #5340-01-498-2758), which is certified to meet FF-L-2740A, “the US Government’s highest security standard for container locks and doors.”

The factory default combination is 50-25-50, so try that first. ;-)

Pump Up The Noise! Real Time Video

May 18th, 2015

Why Meerkat and Periscope Are the Biggest Things Since, Well, Twitter by Ryan Holmes.

From the post:

Finally, there are the global and political implications. If every single person on earth with a phone is able to broadcast anything in real time, we’re going to see a democratization of sharing information in ways we’ve never seen before. Take for example the crucial role that Twitter played in the Egyptian revolution of 2011. In many cases, social media became a new type of lifeline for people on the ground to share accounts of what was happening with the world. Now, imagine a similar world event in which live updates from citizens are in real-time video. These types of updates will transport viewers to events and places in ways we have never seen before.

Live video streaming is valuable for some use cases but the thought of “…every single person on earth is able to broadcast anything in real time…” fills me with despair.

Seriously. Think about the bandwidth you lose from your real time circumstances to watch a partial view of someone else’s real time circumstance.

Every displaced person in every conflict around the world could broadcast a live feed of their plight, but how many of those can you fit into a day? (Assume you aren’t being tube fed and have some real time interaction in your own environment.)

Live video is an image of a social context, a context that isn’t possible to display as part of the real time video. Every real time video feed has such a context, which requires even more effort to acquire, separately from the video feed.

As an example, take the “…the crucial role that Twitter played…” claim from the quote. Really? According to some accounts (The myth of the ‘social media revolution’, It’s Time to Debunk the Many Myths of the Egyptian Revolution), work on the issues and organization that resulted in the Arab Spring had been building for a decade, something Twitter-centric accounts pass over in silence.

Moreover, as of September 2011, Egypt had only 129,711 Twitter users, so as of the Arab Spring, it was even lower. Not to mention that the poor who provided the backbone of the revolution did not have Western style phones with Twitter accounts.

A tweeted revolution is one viewed through a 140 character lens with no social context.

Now imagine real time imagery of “riots by hooligans” or “revolts by the oppressed” or “historical reenactments.” Despite its high bandwidth, real time video can’t reliably provide you with the context necessary to distinguish any of those cases from the others. No doubt real time video can advocate for one case or the other, but that isn’t the same as giving you the facts necessary to reach your own conclusions.

Real time video is a market opportunity for editorial/summary services that mine live video and provide a synopsis of its content. Five thousand live video accounts about displaced persons suffering from cold temperatures and lack of food isn’t actionable. Knowing what is required and where to deliver it is.

Hijacking Planes and the Forgotten Network on ≤ 5,577 Planes

May 17th, 2015

The recent flare of discussion about hijacking airliners with nothing more than a laptop was due in part to: AIR TRAFFIC CONTROL: FAA Needs a More Comprehensive Approach to Address Cybersecurity As Agency Transitions to NextGen GAO-15-370: Published: Apr 14, 2015. Publicly Released: Apr 14, 2015.

The executive summary reads in part:

Modern aircraft are increasingly connected to the Internet. This interconnectedness can potentially provide unauthorized remote access to aircraft avionics systems. As part of the aircraft certification process, FAA’s Office of Safety (AVS) currently certifies new interconnected systems through rules for specific aircraft and has started reviewing rules for certifying the cybersecurity of all new aircraft systems.

Security expert Bruce Schneier comments on this report, saying in part:

The report doesn’t explain how someone could do this, and there are currently no known vulnerabilities that a hacker could exploit. But all systems are vulnerable–we simply don’t have the engineering expertise to design and build perfectly secure computers and networks–so of course we believe this kind of attack is theoretically possible. (emphasis added)

Bruce may be right about wireless networks, but what about someone plugging directly into an existing network on a:

(In service statistics from:

An FBI search warrant obtained on April 17, 2015 reads in part:

18. A Special Agent with the FBI interviewed Chris Roberts on February 13, 2015 and March 5, 2015 to obtain information about vulnerabilities with In Flight Entertainment (IFE) systems on airplanes. Chris Roberts advised that he had identified vulnerabilities with IFE systems on Boeing 737-800, 737-900, 757-200 and Airbus A-320 aircraft. Chris Roberts furnished the information because he would like the vulnerabilities to be fixed.

19. During these conversations, Mr. Roberts stated the following:

A. That he had exploited vulnerabilities with IFE systems on aircraft while in flight. He compromised the IFE systems approximately 15 to 20 times during the time period 2011 through 2014. He last exploited an IFE system during the middle of 2014. Each of the compromises occurred on airplanes equipped with IFE systems with monitors installed in the passenger seatbacks.

B. That the IFE systems he compromised were Thales and Panasonic systems. The IFE systems had video monitors installed in the passenger seatbacks.

C. That he was able to exploit/gain acccess to, or “hack” the IFE system after he would get physical access to the IFE system through the Seat Electronic Box (SEB) installed under the passenger seat on airplanes. He said he was able to remove the cover for the SEB under the seat in front of him by wiggling and squeezing the box.

D. After removing the cover to the SEB that was installed under the passenger seat in front of his seat, he would use a Cat6 ethernet cable with a modified connector to connect his laptop computer to the IFE system while in flight.

E. He then connected to other systems on the airplane network after he exploited/gained access to, or “hacked” the IFE system. He stated that he then overwrote code on the airplane’s Thrust Management Computer while aboard a flight. He stated that he successfully commanded the system he accessed to issue the “CLB” or climb command. He stated that he thereby caused one of the airplane engines to climb resulting in a lateral or sideways movement of the plane during one of these flights. He also stated that he used Vortex software after compromising/exploiting or “hacking” the airplane’s network. He used the software to monitor traffic from the cockpit system.

F. Roberts said he use Kali Linux to perform penetration testing of the IFE system. He used the default IDs and passwords to compromise the IFE systems. He said that he used VBox which is a virtualized environment to build his own version of the airplane network. The virtual environment would replicate airplane network, and he used virtual machine’s on his laptop while compromising the airplane network.
… (emphasis added)

The FBI search warrant wasn’t based on hacking wireless networks, but an old fashioned hardwire connection to the network.

Assuming Roberts wasn’t trying to impress the FBI agents (never a good idea), there are approximately 5,577 planes that may be susceptible to hardwire hacking into the avionics system. (Models change over production and maintenance so the susceptibility of any particular airplane is a question of physical examination.)

If I were still flying, I would be voting with my feet on airline safety from hardwire hacking.

PS: I first saw the search warrant in: Feds Say That Banned Researcher Commandeered a Plane by Kim Zetter.

The tensor renaissance in data science

May 16th, 2015

The tensor renaissance in data science by Ben Lorica.

From the post:

After sitting in on UC Irvine Professor Anima Anandkumar’s Strata + Hadoop World 2015 in San Jose presentation, I wrote a post urging the data community to build tensor decomposition libraries for data science. The feedback I’ve gotten from readers has been extremely positive. During the latest episode of the O’Reilly Data Show Podcast, I sat down with Anandkumar to talk about tensor decomposition, machine learning, and the data science program at UC Irvine.

Modeling higher-order relationships

The natural question is: why use tensors when (large) matrices can already be challenging to work with? Proponents are quick to point out that tensors can model more complex relationships. Anandkumar explains:

Tensors are higher order generalizations of matrices. While matrices are two-dimensional arrays consisting of rows and columns, tensors are now multi-dimensional arrays. … For instance, you can picture tensors as a three-dimensional cube. In fact, I have here on my desk a Rubik’s Cube, and sometimes I use it to get a better understanding when I think about tensors. … One of the biggest use of tensors is for representing higher order relationships. … If you want to only represent pair-wise relationships, say co-occurrence of every pair of words in a set of documents, then a matrix suffices. On the other hand, if you want to learn the probability of a range of triplets of words, then we need a tensor to record such relationships. These kinds of higher order relationships are not only important for text, but also, say, for social network analysis. You want to learn not only about who is immediate friends with whom, but, say, who is friends of friends of friends of someone, and so on. Tensors, as a whole, can represent much richer data structures than matrices.
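To make Anandkumar’s word example concrete, here is a minimal Python sketch that counts pairwise co-occurrences into a matrix and triplet co-occurrences into a third-order tensor. The toy corpus, the vocabulary, and whitespace tokenization are all assumptions for illustration; the sketch shows the data structures only, not the tensor decomposition methods she discusses.

```python
from itertools import combinations, permutations

def cooccurrence(docs, vocab, order):
    """Count co-occurrences of `order` distinct vocabulary words per document.

    order=2 returns an n x n matrix; order=3 returns an n x n x n tensor.
    Every permutation of a word group gets the same count, so the result
    is symmetric in all of its indices.
    """
    n = len(vocab)
    index = {w: i for i, w in enumerate(vocab)}

    def zeros(depth):
        # Nested lists of zeros with `depth` dimensions, each of length n.
        return [zeros(depth - 1) for _ in range(n)] if depth > 1 else [0] * n

    counts = zeros(order)
    for doc in docs:
        present = sorted({index[w] for w in doc.split() if w in index})
        for group in combinations(present, order):
            for idx in permutations(group):
                cell = counts
                for i in idx[:-1]:
                    cell = cell[i]
                cell[idx[-1]] += 1
    return counts

# Hypothetical toy corpus and vocabulary.
docs = ["graph tensor data", "graph tensor", "tensor data model", "graph data"]
vocab = ["data", "graph", "model", "tensor"]  # indices 0, 1, 2, 3

pairs = cooccurrence(docs, vocab, order=2)    # matrix
triples = cooccurrence(docs, vocab, order=3)  # third-order tensor

print(pairs[1][3], pairs[0][1], pairs[0][3])  # each pair co-occurs twice: 2 2 2
print(triples[0][1][3])                       # all three together only once: 1
```

Every pair drawn from “graph,” “tensor” and “data” co-occurs twice, yet the three words appear together in only one document. The pair counts alone cannot tell you that, which is the kind of higher-order relationship a tensor records. Dense nested lists grow as n to the power of the order, so anything beyond a toy would need a sparse representation.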

The passage:

…who is friends of friends of friends of someone, and so on. Tensors, as a whole, can represent much richer data structures than matrices.

caught my attention.

The same could be said about other data structures, such as graphs.

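I mention graphs because data representations carry assumptions and limitations that aren’t labeled for casual users. For example, a directed acyclic graph cannot represent a symmetric relationship such as husband-wife: each spouse stands in the relationship to the other, and a pair of edges pointing at each other is a cycle.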
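A minimal Python sketch of that limitation, using Kahn’s algorithm to test whether a set of directed edges is still acyclic. The people and relationships are hypothetical, and modeling “spouse of” as one edge in each direction is an assumption of the example.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (source, target) pairs has a cycle.

    Kahn's algorithm: repeatedly remove nodes with no incoming edges. If some
    nodes can never be removed, they lie on a cycle or downstream of one.
    """
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)
        indegree[target] += 1

    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for target in successors[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    return removed < len(nodes)

# Hypothetical examples: "parent of" fits in a DAG, "spouse of" does not.
parent_of = [("alice", "carol"), ("bob", "carol")]
spouse_of = [("alice", "bob"), ("bob", "alice")]

print(has_cycle(parent_of))  # False
print(has_cycle(spouse_of))  # True
```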

BTW, the Wikipedia entry on tensors has this introduction to defining tensors:

There are several approaches to defining tensors. Although seemingly different, the approaches just describe the same geometric concept using different languages and at different levels of abstraction.

I wonder whether there is a mapping between the components of the different approaches.

Suggestions of other tensor resources appreciated!

Microsoft Security Intelligence Report (Volume 18: July 2014 – December 2014)

May 15th, 2015

Microsoft Security Intelligence Report (Volume 18: July 2014 – December 2014)

Pay particular attention to the featured report, “The life and times of an exploit,” which follows an exploit that was used successfully even though a patch was available.

A good illustration that once buggy software is “in the wild,” patching those bugs only protects users bright enough to apply the patches.

For example, 77% of PCs are running unpatched Java JREs.

The lesson here is that patch maintenance is a necessary evil, but to avoid evil altogether, less buggy software should be the goal.

Dynamical Systems on Networks: A Tutorial

May 14th, 2015

Dynamical Systems on Networks: A Tutorial by Mason A. Porter and James P. Gleeson.


We give a tutorial for the study of dynamical systems on networks. We focus especially on “simple” situations that are tractable analytically, because they can be very insightful and provide useful springboards for the study of more complicated scenarios. We briefly motivate why examining dynamical systems on networks is interesting and important, and we then give several fascinating examples and discuss some theoretical results. We also briefly discuss dynamical systems on dynamical (i.e., time-dependent) networks, overview software implementations, and give an outlook on the field.

At thirty-nine (39) pages and two hundred and sixty-three (263) references, the tutorial leaves the reader with an overview of the field and the tools to go further.
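For a feel of the “simple,” tractable situations the tutorial starts from, here is a minimal Python sketch of a threshold contagion model of the kind it discusses. The small path network, the thresholds, and the synchronous update rule are assumptions made for this illustration, not the authors’ code.

```python
def threshold_cascade(neighbors, thresholds, seeds):
    """Watts-style threshold model, updated synchronously (a sketch).

    An inactive node becomes active once the fraction of its neighbors that
    are active is at least its threshold. Activation is permanent, so the
    process stops as soon as a round activates no new nodes.
    """
    active = set(seeds)
    while True:
        newly_active = {
            node
            for node, nbrs in neighbors.items()
            if node not in active
            and nbrs
            and sum(n in active for n in nbrs) / len(nbrs) >= thresholds[node]
        }
        if not newly_active:
            return active
        active |= newly_active

# Hypothetical path network a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

print(sorted(threshold_cascade(path, dict.fromkeys(path, 0.5), {"a"})))  # ['a', 'b', 'c', 'd']
print(sorted(threshold_cascade(path, dict.fromkeys(path, 0.6), {"a"})))  # ['a']
```

Raising every threshold from 0.5 to 0.6 stops the cascade at the seed, because node b would then need more than half of its two neighbors active. That sensitivity of the global outcome to a small parameter change is the sort of behavior analysis of these models tries to explain.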

I am intrigued by the authors’ closing paragraph:

Finally, many networks are multiplex (i.e., include multiple types of edges) or have other multilayer features [16, 136]. The existence of multiple layers over which dynamics can occur and the possibility of both structural and dynamical correlations between layers offers another rich set of opportunities to study dynamical systems on networks. The investigation of dynamical systems on multilayer networks is only in its infancy, and this area is also loaded with a rich set of problems [16, 136, 144, 205].

Topic maps can have multiple types of edges and multiple layers.
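A minimal Python sketch of what is lost without edge types, assuming a toy multiplex network stored as one edge set per edge type (the layer names and people are hypothetical). Per-layer measures keep the types apart, while flattening everything into one simple graph, as many single-layer analyses do, discards them.

```python
from collections import defaultdict

# Hypothetical multiplex network: one set of people, two edge types (layers).
multiplex = {
    "friend": {("ann", "bob"), ("bob", "cy")},
    "coworker": {("ann", "bob"), ("ann", "cy")},
}

def layer_degrees(layers):
    """Degree of each node within each layer, treating edges as undirected."""
    degrees = {}
    for name, edges in layers.items():
        counts = defaultdict(int)
        for u, v in edges:
            counts[u] += 1
            counts[v] += 1
        degrees[name] = dict(counts)
    return degrees

def aggregate(layers):
    """Flatten all layers into one simple undirected graph.

    Edge types are discarded and an edge present in several layers
    collapses into a single edge.
    """
    return {frozenset(edge) for edges in layers.values() for edge in edges}

print(layer_degrees(multiplex))
print(len(aggregate(multiplex)))  # 3 edges, down from 4 typed edges
```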

For further reading on those topics see:

The structure and dynamics of multilayer networks by S. Boccaletti, G. Bianconi, R. Criado, C.I. del Genio, J. Gómez-Gardeñes, M. Romance, I. Sendiña-Nadal, Z. Wang, M. Zanin.


In the past years, network theory has successfully characterized the interaction among the constituents of a variety of complex systems, ranging from biological to technological, and social systems. However, up until recently, attention was almost exclusively given to networks in which all components were treated on equivalent footing, while neglecting all the extra information about the temporal- or context-related properties of the interactions under study. Only in the last years, taking advantage of the enhanced resolution in real data sets, network scientists have directed their interest to the multiplex character of real-world systems, and explicitly considered the time-varying and multilayer nature of networks. We offer here a comprehensive review on both structural and dynamical organization of graphs made of diverse relationships (layers) between its constituents, and cover several relevant issues, from a full redefinition of the basic structural measures, to understanding how the multilayer nature of the network affects processes and dynamics.

Multilayer Networks by Mikko Kivelä, Alexandre Arenas, Marc Barthelemy, James P. Gleeson, Yamir Moreno, Mason A. Porter.


In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can encompass multiple types of relationships, change in time, and include other types of complications. Such systems include multiple subsystems and layers of connectivity, and it is important to take such “multilayer” features into account to try to improve our understanding of complex systems. Consequently, it is necessary to generalize “traditional” network theory by developing (and validating) a framework and associated tools to study multilayer systems in a comprehensive fashion. The origins of such efforts date back several decades and arose in multiple disciplines, and now the study of multilayer networks has become one of the most important directions in network science. In this paper, we discuss the history of multilayer networks (and related concepts) and review the exploding body of work on such networks. To unify the disparate terminology in the large body of recent work, we discuss a general framework for multilayer networks, construct a dictionary of terminology to relate the numerous existing concepts to each other, and provide a thorough discussion that compares, contrasts, and translates between related notions such as multilayer networks, multiplex networks, interdependent networks, networks of networks, and many others. We also survey and discuss existing data sets that can be represented as multilayer networks. We review attempts to generalize single-layer-network diagnostics to multilayer networks. We also discuss the rapidly expanding research on multilayer-network models and notions like community structure, connected components, tensor decompositions, and various types of dynamical processes on multilayer networks. We conclude with a summary and an outlook.
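One common device for reusing single-layer tools on multilayer networks is the supra-adjacency matrix, which lays out a multiplex network as one larger matrix so ordinary matrix methods apply. A minimal Python sketch, assuming every layer shares the same ordered node set and that each node is tied to its own copies in all other layers with a single weight `omega`; both are simplifying assumptions, and general multilayer networks allow other interlayer edges.

```python
def supra_adjacency(layers, omega):
    """Build the supra-adjacency matrix of a multiplex network (a sketch).

    `layers` is a list of N x N adjacency matrices (nested lists), one per
    layer, over the same ordered node set. Diagonal blocks hold each layer's
    adjacency; off-diagonal blocks link each node to its own copy in every
    other layer with weight `omega`.
    """
    num_layers = len(layers)
    n = len(layers[0])
    size = num_layers * n
    supra = [[0] * size for _ in range(size)]
    for a, layer in enumerate(layers):
        for i in range(n):
            # Intralayer edges: copy layer a into diagonal block (a, a).
            for j in range(n):
                supra[a * n + i][a * n + j] = layer[i][j]
            # Interlayer edges: node i in layer a to node i in layer b.
            for b in range(num_layers):
                if b != a:
                    supra[a * n + i][b * n + i] = omega
    return supra

# Hypothetical two-layer multiplex on the same two nodes:
# nodes 0 and 1 are linked in layer 0 but not in layer 1.
layer0 = [[0, 1], [1, 0]]
layer1 = [[0, 0], [0, 0]]

for row in supra_adjacency([layer0, layer1], omega=1):
    print(row)
# [0, 1, 1, 0]
# [1, 0, 0, 1]
# [1, 0, 0, 0]
# [0, 1, 0, 0]
```

Rows and columns are ordered by (layer, node), so index a*N + i is node i in layer a. The weight `omega` sets how strongly a node’s copies are tied together; with `omega = 0` the matrix is just the layers side by side, with no interaction between them.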

This may have been where we collectively went wrong in marketing topic maps. Yes, yes, it is true that topic maps could do multilayer networks, but network theory has made $billions with an overly simplistic model that bears little resemblance to reality.

As computational resources improve and models closer to reality (at least somewhat closer) become popular, something between simplistic networks and the full generality of topic maps could be successful.

Where Big Data Projects Fail

May 14th, 2015

Where Big Data Projects Fail by Bernard Marr.

From the post:

Over the past 6 months I have seen the number of big data projects go up significantly and most of the companies I work with are planning to increase their Big Data activities even further over the next 12 months. Many of these initiatives come with high expectations but big data projects are far from fool-proof. In fact, I predict that half of all big data projects will fail to deliver against their expectations.

Failure can happen for many reasons, however there are a few glaring dangers that will cause any big data project to crash and burn. Based on my experience working with companies and organizations of all shapes and sizes, I know these errors are all too frequent. One thing they have in common is they are all caused by a lack of adequate planning.

(emphasis added)

To whet your appetite for the examples Marr uses, here are the main problems he identifies:

  • Not starting with clear business objectives
  • Not making a good business case
  • Management Failure
  • Poor communication
  • Not having the right skills for the job

Marr’s post should be mandatory reading at the start of every proposed big data project. After reading it, the project team should prepare a detailed statement of the business objectives and the business case, along with how progress toward those objectives will be measured.

Or to put it differently, no big data project should start without the ability to judge its success or failure.