Hijacking a Plane with Excel

April 26th, 2015

Wait! That’s not the right title! Hacking Airplanes by Bruce Schneier.

I was thinking about the Dilbert cartoon where the pointy-haired boss tries to land a plane using Excel. ;-)

There are two points where I disagree with Bruce’s post, at least a little.

From the post:

Governments only have a fleeting advantage over everyone else, though. Today’s top-secret National Security Agency programs become tomorrow’s Ph.D. theses and the next day’s hacker’s tools. So while remotely hacking the 787 Dreamliner’s avionics might be well beyond the capabilities of anyone except Boeing engineers today, that’s not going to be true forever.

What this all means is that we have to start thinking about the security of the Internet of Things–whether the issue in question is today’s airplanes or tomorrow’s smart clothing. We can’t repeat the mistakes of the early days of the PC and then the Internet, where we initially ignored security and then spent years playing catch-up. We have to build security into everything that is going to be connected to the Internet.

First, I’m not so sure that only current Boeing engineers would be capable of hacking a 787 Dreamliner’s avionics. I don’t have a copy of the avionics code, but I assume there are plenty of ex-Boeing engineers who do, and other people who could obtain one. A lack of interest, more than a lack of access to the avionics code, probably explains why it hasn’t been hacked so far. If you want to crash an airliner, there are many easier methods than hacking its avionics code.

Second, I am far from convinced by Bruce’s argument:

We can’t repeat the mistakes of the early days of the PC and then the Internet, where we initially ignored security and then spent years playing catch-up.

Unless a rule against human stupidity was passed quite recently, I don’t know of any reason why we won’t duplicate the mistakes of the early days of the PC and then of the Internet. Credit cards have been around far longer than both the PC and the Internet, yet fraud abounds in the credit card industry.

Do you remember: The reason companies don’t fix cybersecurity?

The reason why credit card companies don’t stop credit card fraud is that stopping it would cost more than the fraud does. It isn’t a moral issue for them; it is a question of profit and loss. There is a point at which fraud becomes too costly and the higher cost of security becomes worth paying, but not before.

For example, did you know that at some banks no check under $5,000.00 is ever inspected by anyone? Not even for signatures. It isn’t worth the cost of checking every item.

Security in the Internet of Things, at least for vendors, will work the same way: security if and only if the cost of not having it is justified against their bottom lines.
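The profit-and-loss calculus here is simple arithmetic. A toy break-even sketch, with all numbers invented for illustration:

```python
# Toy break-even check: is fraud prevention worth buying?
# All numbers below are invented for illustration.
annual_fraud_losses = 3_000_000   # expected fraud losses if nothing changes
cost_of_prevention = 5_000_000    # annual cost to deploy and run controls
fraud_reduction = 0.80            # fraction of fraud the controls would stop

savings = annual_fraud_losses * fraud_reduction
if savings > cost_of_prevention:
    print("prevention pays for itself")
else:
    print("prevention is a net loss; expect the fraud to continue")
```

With these numbers the controls are a net loss, which is exactly the vendor logic described above: insecurity persists whenever it is the cheaper option.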

That plus human stupidity makes me think that cyber insecurity is here to stay.

PS: You should not attempt to hijack a plane with Excel. I don’t think your chances are all that good, and the FBI and TSA (never having caught a hijacker yet) are warning airlines to be on the lookout for you. The FBI and TSA should be focusing on more likely threats, like hijacking a plane using telepathy.

New York Times Gets Stellarwind IG Report Under FOIA

April 26th, 2015

New York Times Gets Stellarwind IG Report Under FOIA by Benjamin Wittes.

A big thank you! to Benjamin Wittes and the New York Times.

Theirs are the only two (2) stories on the Stellarwind IG report, released Friday evening, that give a link to the document!

The NYT story with the document: Government Releases Once-Secret Report on Post-9/11 Surveillance by Charlie Savage.

The document does not appear at:

Office of the Director of National Intelligence (as of Sunday, 25 April 2015, 17:45 EST).

US unveils 6-year-old report on NSA surveillance by Nedra Pickler (Associated Press or any news feed that parrots the Associated Press).

Suggestion: Don’t patronize news feeds that refer to documents but don’t include links to them.

NOAA weather data – Valuing Open Data – Guessing – History Repeats

April 26th, 2015

Tech titans ready their clouds for NOAA weather data by Greg Otto.

From the post:

It’s fitting that the 20 terabytes of data the National Oceanic and Atmospheric Administration produces every day will now live in the cloud.

The Commerce Department took a step Tuesday to make NOAA data more accessible as Commerce Secretary Penny Pritzker announced a collaboration among some of the country’s top tech companies to give the public a range of environmental, weather and climate data to access and explore.

Amazon Web Services, Google, IBM, Microsoft and the Open Cloud Consortium have entered into a cooperative research and development agreement with the Commerce Department that will push NOAA data into the companies’ respective cloud platforms to increase the quantity of and speed at which the data becomes publicly available.

“The Commerce Department’s data collection literally reaches from the depths of the ocean to the surface of the sun,” Pritzker said during a Monday keynote address at the American Meteorological Society’s Washington Forum. “This announcement is another example of our ongoing commitment to providing a broad foundation for economic growth and opportunity to America’s businesses by transforming the department’s data capabilities and supporting a data-enabled economy.”

According to Commerce, the data used could come from a variety of sources: Doppler radar, weather satellites, buoy networks, tide gauges, and ships and aircraft. Commerce expects this data to launch new products and services that could benefit consumer goods, transportation, health care and energy utilities.

The original press release has this cheery note on the likely economic impact of this data:

So what does this mean to the economy? According to a 2013 McKinsey Global Institute Report, open data could add more than $3 trillion in total value annually to the education, transportation, consumer products, electricity, oil and gas, healthcare, and consumer finance sectors worldwide. If more of this data could be efficiently released, organizations will be able to develop new and innovative products and services to help us better understand our planet and keep communities resilient from extreme events.

Ah, yes, that would be the Open data: Unlocking innovation and performance with liquid information, on which the summary page says:

Open data can help unlock $3 trillion to $5 trillion in economic value annually across seven sectors.

But you need to read the full report (PDF) in order to find footnote 3 on “economic value:”

3. Throughout this report we express value in terms of annual economic surplus in 2013 US dollars, not the discounted value of future cash flows; this valuation represents estimates based on initiatives where open data are necessary but not sufficient for realizing value. Often, value is achieved by combining analysis of open and proprietary information to identify ways to improve business or government practices. Given the interdependence of these factors, we did not attempt to estimate open data’s relative contribution; rather, our estimates represent the total value created.

That is a disclosure that the estimate of $3 to $5 trillion is a guess and/or speculation.

Odd how the guess/speculation disclosure drops out of the Commerce Department press release, and by the time it reaches Greg’s story it reads:

open data could add more than $3 trillion in total value annually to the education, transportation, consumer products, electricity, oil and gas, healthcare, and consumer finance sectors worldwide.

From guess/speculation to no mention to fact, all in the short space of three publications.

Does the valuing of open data remind you of:


(Image from: http://civics.sites.unc.edu/files/2012/06/EarlyAmericanSettlements1.pdf)

The date of 1609 is important. Wikipedia has an article on Virginia, 1609-1610, titled, Starving Time. That year, only sixty (60) out of five hundred (500) colonists survived.

Does “Excellent Fruites by Planting” sound a lot like “new and innovative products and services?”

It does to me.

I first saw this in a tweet by Kirk Borne.

Getting Started with Spark (in Python)

April 26th, 2015

Getting Started with Spark (in Python) by Benjamin Bengfort.

From the post:

Hadoop is the standard tool for distributed computing across really large data sets and is the reason why you see "Big Data" on advertisements as you walk through the airport. It has become an operating system for Big Data, providing a rich ecosystem of tools and techniques that allow you to use a large cluster of relatively cheap commodity hardware to do computing at supercomputer scale. Two ideas from Google in 2003 and 2004 made Hadoop possible: a framework for distributed storage (The Google File System), which is implemented as HDFS in Hadoop, and a framework for distributed computing (MapReduce).

These two ideas have been the prime drivers for the advent of scaling analytics, large scale machine learning, and other big data appliances for the last ten years! However, in technology terms, ten years is an incredibly long time, and there are some well-known limitations that exist, with MapReduce in particular. Notably, programming MapReduce is difficult. You have to chain Map and Reduce tasks together in multiple steps for most analytics. This has resulted in specialized systems for performing SQL-like computations or machine learning. Worse, MapReduce requires data to be serialized to disk between each step, which means that the I/O cost of a MapReduce job is high, making interactive analysis and iterative algorithms very expensive; and the thing is, almost all optimization and machine learning is iterative.

To address these problems, Hadoop has been moving to a more general resource management framework for computation, YARN (Yet Another Resource Negotiator). YARN implements the next generation of MapReduce, but also allows applications to leverage distributed resources without having to compute with MapReduce. By generalizing the management of the cluster, research has moved toward generalizations of distributed computation, expanding the ideas first imagined in MapReduce.

Spark is the first fast, general purpose distributed computing paradigm resulting from this shift and is gaining popularity rapidly. Spark extends the MapReduce model to support more types of computations using a functional programming paradigm, and it can cover a wide range of workflows that previously were implemented as specialized systems built on top of Hadoop. Spark uses in-memory caching to improve performance and, therefore, is fast enough to allow for interactive analysis (as though you were sitting on the Python interpreter, interacting with the cluster). Caching also improves the performance of iterative algorithms, which makes it great for data theoretic tasks, especially machine learning.

In this post we will first discuss how to set up Spark to start easily performing analytics, either simply on your local machine or in a cluster on EC2. We then will explore Spark at an introductory level, moving towards an understanding of what Spark is and how it works (hopefully motivating further exploration). In the last two sections we will start to interact with Spark on the command line and then demo how to write a Spark application in Python and submit it to the cluster as a Spark job.

Be forewarned, this post uses the “F” word (functional) to describe the programming paradigm of Spark. Just so you know. ;-)

If you aren’t already using Spark, this is about as easy a learning curve as can be expected.


I first saw this in a tweet by DataMining.

pandas: powerful Python data analysis toolkit Release 0.16

April 25th, 2015

pandas: powerful Python data analysis toolkit Release 0.16 by Wes McKinney and PyData Development Team.

I mentioned Wes’s 2011 paper on pandas back in 2011, and a lot has changed since then.

From the homepage:

pandas: powerful Python data analysis toolkit

PDF Version

Zipped HTML

Date: March 24, 2015 Version: 0.16.0

Binary Installers: http://pypi.python.org/pypi/pandas

Source Repository: http://github.com/pydata/pandas

Issues & Ideas: https://github.com/pydata/pandas/issues

Q&A Support: http://stackoverflow.com/questions/tagged/pandas

Developer Mailing List: http://groups.google.com/group/pydata

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

pandas is well suited for many different kinds of data:

  • Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
  • Ordered and unordered (not necessarily fixed-frequency) time series data.
  • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
  • Any other form of observational / statistical data sets. The data actually need not be labeled at all to be placed into a pandas data structure

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything that R’s data.frame provides and much more. pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other 3rd party libraries.

Here are just a few of the things that pandas does well:

  • Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
  • Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
  • Automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can simply ignore the labels and let Series, DataFrame, etc. automatically align the data for you in computations
  • Powerful, flexible group by functionality to perform split-apply-combine operations on data sets, for both aggregating and transforming data
  • Make it easy to convert ragged, differently-indexed data in other Python and NumPy data structures into DataFrame objects
  • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
  • Intuitive merging and joining data sets
  • Flexible reshaping and pivoting of data sets
  • Hierarchical labeling of axes (possible to have multiple labels per tick)
  • Robust IO tools for loading data from flat files (CSV and delimited), Excel files, databases, and saving / loading data from the ultrafast HDF5 format
  • Time series-specific functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.

Many of these principles are here to address the shortcomings frequently experienced using other languages / scientific research environments. For data scientists, working with data is typically divided into multiple stages: munging and cleaning data, analyzing / modeling it, then organizing the results of the analysis into a form suitable for plotting or tabular display. pandas is the ideal tool for all of these tasks.
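Two of the strengths listed above, NaN handling and split-apply-combine, fit in a few lines. A quick sketch with toy data of my own:

```python
import numpy as np
import pandas as pd

# Toy data with one missing observation
df = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", "LA"],
    "temp": [71.0, np.nan, 75.0, 77.0],
})

# Missing data: NaN is skipped by default in aggregations
overall = df["temp"].mean()   # mean of the three observed values

# Split-apply-combine: average temperature per city
by_city = df.groupby("city")["temp"].mean()
print(overall)
print(by_city)
```

No sentinel values, no manual filtering; the missing reading simply drops out of both aggregates.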

Some other notes

  • pandas is fast. Many of the low-level algorithmic bits have been extensively tweaked in Cython code. However, as with anything else generalization usually sacrifices performance. So if you focus on one feature for your application you may be able to create a faster specialized tool.
  • pandas is a dependency of statsmodels, making it an important part of the statistical computing ecosystem in Python.
  • pandas has been used extensively in production in financial applications.


This documentation assumes general familiarity with NumPy. If you haven’t used NumPy much or at all, do invest some time in learning about NumPy first.

Not that I’m one to make editorial suggestions, ;-), but with almost 200 pages of What’s New entries going back to September of 2011, and the document topping out at over 1600 pages, I would move all but the latest What’s New to the end. Yes?

BTW, at 1600 pages, you may already be behind in your reading. Are you sure you want to get further behind?

Not only will the reading be entertaining, it will have the side benefit of improving your data analysis skills as well.


I first saw this mentioned in a tweet by Kirk Borne.

Mathematicians Reduce Big Data Using Ideas from Quantum Theory

April 24th, 2015

Mathematicians Reduce Big Data Using Ideas from Quantum Theory by M. De Domenico, V. Nicosia, A. Arenas, V. Latora.

From the post:

A new technique of visualizing the complicated relationships between anything from Facebook users to proteins in a cell provides a simpler and cheaper method of making sense of large volumes of data.

Analyzing the large volumes of data gathered by modern businesses and public services is problematic. Traditionally, relationships between the different parts of a network have been represented as simple links, regardless of how many ways they can actually interact, potentially loosing precious information. Only recently a more general framework has been proposed to represent social, technological and biological systems as multilayer networks, piles of ‘layers’ with each one representing a different type of interaction. This approach allows a more comprehensive description of different real-world systems, from transportation networks to societies, but has the drawback of requiring more complex techniques for data analysis and representation.

A new method, developed by mathematicians at Queen Mary University of London (QMUL), and researchers at Universitat Rovira e Virgili in Tarragona (Spain), borrows from quantum mechanics’ well tested techniques for understanding the difference between two quantum states, and applies them to understanding which relationships in a system are similar enough to be considered redundant. This can drastically reduce the amount of information that has to be displayed and analyzed separately and make it easier to understand.

The new method also reduces computing power needed to process large amounts of multidimensional relational data by providing a simple technique of cutting down redundant layers of information, reducing the amount of data to be processed.

The researchers applied their method to several large publicly available data sets about the genetic interactions in a variety of animals, a terrorist network, scientific collaboration systems, worldwide food import-export networks, continental airline networks and the London Underground. It could also be used by businesses trying to more readily understand the interactions between their different locations or departments, by policymakers understanding how citizens use services or anywhere that there are large numbers of different interactions between things.

You can hop over to Nature, Structural reducibility of multilayer networks, where if you don’t have an institutional subscription:

ReadCube: $4.99 Rent, $9.99 to buy, or Purchase a PDF for $32.00.

Let me save you some money and suggest you look at:

Layer aggregation and reducibility of multilayer interconnected networks


Many complex systems can be represented as networks composed by distinct layers, interacting and depending on each others. For example, in biology, a good description of the full protein-protein interactome requires, for some organisms, up to seven distinct network layers, with thousands of protein-protein interactions each. A fundamental open question is then how much information is really necessary to accurately represent the structure of a multilayer complex system, and if and when some of the layers can indeed be aggregated. Here we introduce a method, based on information theory, to reduce the number of layers in multilayer networks, while minimizing information loss. We validate our approach on a set of synthetic benchmarks, and prove its applicability to an extended data set of protein-genetic interactions, showing cases where a strong reduction is possible and cases where it is not. Using this method we can describe complex systems with an optimal trade–off between accuracy and complexity.

Both articles have four (4) illustrations and the same four (4) authors. The difference is that the second one is at http://arxiv.org. Oh, and it is free for downloading.

I remain concerned by the focus on reducing the complexity of data to fit current algorithms and processing models. That said, there is no denying that such reduction methods have proven to be useful.

The authors neatly summarize my concerns with this outline of their procedure:

The whole procedure proposed here is sketched in Fig. 1 and can be summarised as follows: i) compute the quantum Jensen-Shannon distance matrix between all pairs of layers; ii) perform hierarchical clustering of layers using such a distance matrix and use the relative change of Von Neumann entropy as the quality function for the resulting partition; iii) finally, choose the partition which maximises the relative information gain.

With my corresponding concerns:

i) The quantum Jensen-Shannon distance matrix presumes a metric distance for its operations, which may or may not reflect the semantics of the layers (other than by simplifying assumption).

ii) The relative change of Von Neumann entropy is a difference measurement based upon an assumed metric, which may or may not represent the underlying semantics of the relationships between layers.

iii) The process concludes by maximizing a difference measurement based upon the metric assigned to the different layers.

Maximizing a difference, based on an entropy calculation, which is itself based on an assigned metric doesn’t fill me with confidence.

I don’t doubt that the technique “works,” but doesn’t that depend upon what you think is being measured?
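To see what is actually being measured, here is a minimal numpy sketch of the Von Neumann entropy and quantum Jensen-Shannon distance for two toy network layers. This is my own illustration built from the paper’s description, not the authors’ code:

```python
import numpy as np

def density_matrix(adj):
    # Rescaled graph Laplacian: rho = L / trace(L) is a valid density matrix
    lap = np.diag(adj.sum(axis=1)) - adj
    return lap / np.trace(lap)

def von_neumann_entropy(rho):
    # S(rho) = -sum(lambda * log2(lambda)) over nonzero eigenvalues
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return -np.sum(evals * np.log2(evals))

def quantum_jsd(rho1, rho2):
    # Jensen-Shannon divergence between density matrices;
    # its square root behaves as a metric distance
    mix = 0.5 * (rho1 + rho2)
    return von_neumann_entropy(mix) - 0.5 * (
        von_neumann_entropy(rho1) + von_neumann_entropy(rho2))

# Two toy 4-node layers over the same nodes: a path and a star
path = np.array([[0,1,0,0],[1,0,1,0],[0,1,0,1],[0,0,1,0]], float)
star = np.array([[0,1,1,1],[1,0,0,0],[1,0,0,0],[1,0,0,0]], float)

d = np.sqrt(quantum_jsd(density_matrix(path), density_matrix(star)))
print(f"quantum JS distance: {d:.4f}")
```

Everything downstream (the hierarchical clustering of layers, the entropy quality function) rests on this one distance, which is why the question of what the metric actually captures matters so much.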

A question for the weekend: Do you think this is similar to the questions about dividing continuous variables into discrete quantities?

How to secure your baby monitor [Keep Out Creeps, FBI, NSA, etc.]

April 24th, 2015

How to secure your baby monitor by Lisa Vaas.

From the post:

Two more nurseries have been invaded, with strangers apparently spying on parents and their babies via their baby monitors.

This is nuts. We’re hearing more and more about these kinds of crimes, but there’s nothing commonplace about the level of fear they’re causing as families’ privacy is invaded. It’s time we put some tools into parents’ hands to help.

First, the latest creep-out cyber nursery tales. Read on to the bottom for ways to help keep strangers out of your family’s business.

I don’t know for a fact the FBI or NSA have tapped into baby monitors. But anyone who engages in an orchestrated campaign of false testimony in court spanning decades, lies to Congress (and the public), as well as kidnaps, tortures and executes people, well, my expectations aren’t all that high.

You really are entitled to privacy in your own home, especially with such a joyous occasion as the birth of a child. But that privacy isn’t going to happen by default, and the government isn’t going to guarantee it either. Sorry.

You would not bathe or dress your child in the front yard, so don’t allow their room to become the front yard.

Teach your children good security habits along with looking both ways and holding hands to cross the street.

Almost all digitally recorded data is or can be compromised. That won’t change in the short run but we can create islands of privacy for our day to day lives. Starting with every child’s bedroom.

>30 Days From Patch – No Hacker Liability – Civil or Criminal

April 24th, 2015

Potent, in-the-wild exploits imperil customers of 100,000 e-commerce sites by Dan Goodin.

From the post:

Criminals are exploiting an extremely critical vulnerability found on almost 100,000 e-commerce websites in a wave of attacks that puts the personal information for millions of people at risk of theft.

The remote code-execution hole resides in the community and enterprise editions of Magento, the Internet’s No. 1 content management system for e-commerce sites. Engineers from eBay, which owns the e-commerce platform, released a patch in February that closes the vulnerability, but as of earlier this week, more than 98,000 online merchants still hadn’t installed it, according to researchers with Byte, a Netherlands-based company that hosts Magento-using websites. Now, the consequences of that inaction are beginning to be felt, as attackers from Russia and China launch exploits that allow them to gain complete control over vulnerable sites.

“The vulnerability is actually comprised of a chain of several vulnerabilities that ultimately allow an unauthenticated attacker to execute PHP code on the Web server,” Netanel Rubin, a malware and vulnerability researcher with security firm Checkpoint, wrote in a recent blog post. “The attacker bypasses all security mechanisms and gains control of the store and its complete database, allowing credit card theft or any other administrative access into the system.”

This flaw has been fixed but:

Engineers from eBay, which owns the e-commerce platform, released a patch in February that closes the vulnerability, but as of earlier this week, more than 98,000 online merchants still hadn’t installed it,…

The House of Representatives (U.S.) recently passed a cybersecurity bill giving companies liability protection while sharing threat data, as a step toward more sharing of cyberthreat information.

OK, but so far, have you heard of any incentives to encourage better security practices? Better security practices such as installing patches for known vulnerabilities.

Here’s an incentive idea for patch installation:

Exempt hackers from criminal and civil liability for vulnerabilities with patches more than thirty (30) days old.

Why not?

It will create a small army of hackers who pounce on every announced patch in hopes of catching someone over the thirty day deadline. It neatly solves the problem of how to monitor the installation of patches. (I am assuming the threat of being looted provides some incentive for patch maintenance.)

The second part should be a provision that insurance cannot be sold to cover losses from hacks occurring more than thirty days after patch release. As we have seen before, users rely on insurance to avoid spending money on cybersecurity. For hacks more than thirty days after a patch, users would have to eat the losses.

Let me know if you are interested in the >30-Day-From-Patch idea. I am willing to help draft the legislation.

For further information on this vulnerability:

Wikipedia on Magento, which has about 30% of the ecommerce market.

Magento homepage, etc.

Analyzing the Magento Vulnerability (Updated) by Netanel Rubin.

From Rubin’s post:

Check Point researchers recently discovered a critical RCE (remote code execution) vulnerability in the Magento web e-commerce platform that can lead to the complete compromise of any Magento-based store, including credit card information as well as other financial and personal data, affecting nearly two hundred thousand online shops.

Check Point privately disclosed the vulnerabilities together with a list of suggested fixes to eBay prior to public disclosure. A patch to address the flaws was released on February 9, 2015 (SUPEE-5344 available here). Store owners and administrators are urged to apply the patch immediately if they haven’t done so already.
For a visual demonstration of one way the vulnerability can be exploited, please see our video here.

What kind of attack is it?

The vulnerability is actually comprised of a chain of several vulnerabilities that ultimately allow an unauthenticated attacker to execute PHP code on the web server. The attacker bypasses all security mechanisms and gains control of the store and its complete database, allowing credit card theft or any other administrative access into the system.

This attack is not limited to any particular plugin or theme. All the vulnerabilities are present in the Magento core, and affects any default installation of both Community and Enterprise Editions. Check Point customers are already protected from exploitation attempts of this vulnerability through the IPS software blade.

Rubin’s post has lots of very nice PHP code.

I first saw this in a tweet by Ciuffy.

Ordinary Least Squares Regression: Explained Visually

April 24th, 2015

Ordinary Least Squares Regression: Explained Visually by Victor Powell and Lewis Lehe.

From the post:

Statistical regression is basically a way to predict unknown quantities from a batch of existing data. For example, suppose we start out knowing the height and hand size of a bunch of individuals in a “sample population,” and that we want to figure out a way to predict hand size from height for individuals not in the sample. By applying OLS, we’ll get an equation that takes hand size—the ‘independent’ variable—as an input, and gives height—the ‘dependent’ variable—as an output.

Below, OLS is done behind-the-scenes to produce the regression equation. The constants in the regression—called ‘betas’—are what OLS spits out. Here, beta_1 is an intercept; it tells what height would be even for a hand size of zero. And beta_2 is the coefficient on hand size; it tells how much taller we should expect someone to be for a given increment in their hand size. Drag the sample data to see the betas change.

[interactive graphic omitted]

At some point, you probably asked your parents, “Where do betas come from?” Let’s raise the curtain on how OLS finds its betas.

Error is the difference between prediction and reality: the vertical distance between a real data point and the regression line. OLS is concerned with the squares of the errors. It tries to find the line going through the sample data that minimizes the sum of the squared errors. Below, the squared errors are represented as squares, and your job is to choose betas (the slope and intercept of the regression line) so that the total area of all the squares (the sum of the squared errors) is as small as possible. That’s OLS!

The post includes a visual explanation of ordinary least squares regression up to 2 independent variables (3-D).
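To peek behind the interactive graphic, here is how the betas fall out of a least squares solve in numpy. The sample numbers are invented for illustration:

```python
import numpy as np

# Invented sample: hand size (cm) and height (cm) for six people
hand = np.array([17.0, 18.5, 19.0, 20.5, 21.0, 22.5])
height = np.array([160.0, 168.0, 171.0, 178.0, 181.0, 190.0])

# Design matrix: a column of ones (intercept beta_1) plus hand size (beta_2)
X = np.column_stack([np.ones_like(hand), hand])

# OLS picks the betas that minimize the sum of squared errors
betas, *_ = np.linalg.lstsq(X, height, rcond=None)
b1, b2 = betas
print(f"height ~ {b1:.1f} + {b2:.2f} * hand_size")

# The minimized quantity: the total area of the error squares
sse = np.sum((height - X @ betas) ** 2)
```

Dragging a sample point in the post’s graphic amounts to editing one entry of `hand` or `height` and re-running the solve.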

Height wasn’t the correlation I heard with hand size, but Explained Visually is a family friendly blog. And to be honest, I got my information from another teenager (at the time), so my information source is suspect.

jQAssistant 1.0.0 released

April 24th, 2015

jQAssistant 1.0.0 released by Dirk Mahler.

From the webpage:

We’re proud to announce the availability of jQAssistant 1.0.0 – lots of thanks go to all the people who made this possible with their ideas, criticism and code contributions!

Feature Overview

  • Static code analysis tool using the graph database Neo4j
  • Scanning of software related structures, e.g. Java artifacts (JAR, WAR, EAR files), Maven descriptors, XML files, relational database schemas, etc.
  • Allows definition of rules and automated verification during a build process
  • Rules are expressed as Cypher queries or scripts (e.g. JavaScript, Groovy or JRuby)
  • Available as Maven plugin or CLI (command line interface)
  • Highly extensible by plugins for scanners, rules and reports
  • Integration with SonarQube
  • It’s free and Open Source

Example Use Cases

  • Analysis of existing code structures and matching with proposed architecture and design concepts
  • Impact analysis, e.g. which test is affected by potential code changes
  • Visualization of architectural concepts, e.g. modules, layers and their dependencies
  • Continuous verification and reporting of constraint violations to provide fast feedback to developers
  • Individual gathering and filtering of metrics, e.g. complexity per component
  • Post-Processing of reports of other QA tools to enable refactorings in brown field projects
  • and much more…

Get it!

jQAssistant is available as a command line client from the downloadable distribution:

jqassistant.sh scan -f my-application.war
jqassistant.sh analyze
jqassistant.sh server

or as Maven plugin:
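
The pom snippet did not survive here; below is a sketch of a typical plugin configuration. The coordinates and goals are my assumptions — check the jQAssistant documentation for the actual release values:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>com.buschmais.jqassistant.scm</groupId>
      <artifactId>jqassistant-maven-plugin</artifactId>
      <version>1.0.0</version>
      <executions>
        <execution>
          <goals>
            <goal>scan</goal>
            <goal>analyze</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```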


For a list of the latest changes refer to the release notes; the documentation provides usage information.

Those who are impatient should go to the Get Started page, which provides information about the first steps of scanning applications and running analyses.

Your Feedback Matters

Every kind of feedback helps to improve jQAssistant: feature requests, bug reports and even questions about how to solve specific problems. You can choose between several channels – just pick your preferred one: the discussion group, stackoverflow, a Gitter channel, the issue tracker, e-mail or just leave a comment below.


You want to get started quickly for an inventory of an existing Java application architecture? Or you’re interested in setting up a continuous QA process that verifies your architectural concepts and provides graphical reports?
The team of buschmais GbR offers individual workshops for you! For getting more information and setting up an agenda refer to http://jqassistant.de (German) or just contact us via e-mail!

Short of widespread censorship, security breaches will fade from the news spotlight only if software quality/security improves.

jQAssistant 1.0.0 is one example of the type of tool required for software quality/security to improve.

Of particular interest is its use of Neo4j, which enables named relationships between materials and your code.

I don’t mean to foster “…everything is a graph…” any more than I would foster “…everything is a set of relational tables…” or “…everything is a key/value pair…,” etc. The question is: “What is the best way, given my requirements and constraints, to achieve objective X?” Whether relationships are explicit (and if so, what can I say about them?) or implicit depends on my requirements, not those of a vendor.

In the case of recording who wrote the most buffer overflows and where, plus other flaws, tracking named relationships and similar information should be part of your requirements and graphs are a good way to meet that requirement.
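
Since jQAssistant rules are Cypher queries, the flaw-tracking idea can be sketched as one. The node labels, relationship types, and properties below are hypothetical — my illustration, not jQAssistant’s scanner schema:

```cypher
// Hypothetical graph: (:Author)-[:WROTE]->(:Class)-[:HAS_FLAW]->(:Flaw)
MATCH (a:Author)-[:WROTE]->(c:Class)-[:HAS_FLAW]->(f:Flaw {type: 'buffer-overflow'})
RETURN a.name AS author, c.fqn AS class, count(f) AS overflows
ORDER BY overflows DESC
```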

Animation of Gerrymandering?

April 24th, 2015

United States Congressional District Shapefiles by Jeffrey B. Lewis, Brandon DeVine, and Lincoln Pitcher with Kenneth C. Martis.

From the description:

This site provides digital boundary definitions for every U.S. Congressional District in use between 1789 and 2012. These were produced as part of NSF grant SBE-SES-0241647 between 2009 and 2013.

The current release of these data is experimental. We have done a good deal of work to validate all of the shapes. However, it is quite likely that some irregularities remain. Please email jblewis@ucla.edu with questions or suggestions for improvement. We hope to have a ticketing system for bugs and a versioning system up soon. The district definitions currently available should be considered an initial-release version.

Many districts were formed by aggregating complete county shapes obtained from the National Historical Geographic Information System (NHGIS) project and the Newberry Library’s Atlas of Historical County Boundaries. Where Congressional district boundaries did not coincide with county boundaries, district shapes were constructed district-by-district using a wide variety of legal and cartographic resources. Detailed descriptions of how particular districts were constructed and the authorities upon which we relied are available (at the moment) by request and described below.

Every state districting plan can be viewed quickly at https://github.com/JeffreyBLewis/congressional-district-boundaries (clicking on any of the listed file names will create a map window that can be panned and zoomed). GeoJSON definitions of the districts can also be downloaded from the same URL. Congress-by-Congress district maps in ESRI Shapefile format can be downloaded below. Though providing somewhat lower resolution than the shapefiles, the GeoJSON files contain additional information about the members who served in each district that the shapefiles do not (Congress member information may be useful for creating web applications with, for example, Google Maps or Leaflet).

Project Team

The Principal Investigator on the project was Jeffrey B. Lewis. Brandon DeVine and Lincoln Pitcher researched district definitions and produced thousands of digital district boundaries. The project relied heavily on Kenneth C. Martis’ The Historical Atlas of United States Congressional Districts: 1789-1983. (New York: The Free Press, 1982). Martis also provided guidance, advice, and source materials used in the project.

How to cite

Jeffrey B. Lewis, Brandon DeVine, Lincoln Pitcher, and Kenneth C. Martis. (2013) Digital Boundary Definitions of United States Congressional Districts, 1789-2012. [Data file and code book]. Retrieved from http://cdmaps.polisci.ucla.edu on [date of download].

An impressive resource for anyone interested in the history of United States Congressional Districts and their development. An animation of gerrymandering of congressional districts was the first use case that jumped to mind. ;-)


I first saw this in a tweet by Larry Mullen.

Are Government Agencies Trustworthy? FBI? No!

April 23rd, 2015

Pseudoscience in the Witness Box: The FBI faked an entire field of forensic science by Dahlia Lithwick.

From the post:

The Washington Post published a story so horrifying this weekend that it would stop your breath: “The Justice Department and FBI have formally acknowledged that nearly every examiner in an elite FBI forensic unit gave flawed testimony in almost all trials in which they offered evidence against criminal defendants over more than a two-decade period before 2000.”

What went wrong? The Post continues: “Of 28 examiners with the FBI Laboratory’s microscopic hair comparison unit, 26 overstated forensic matches in ways that favored prosecutors in more than 95 percent of the 268 trials reviewed so far.” The shameful, horrifying errors were uncovered in a massive, three-year review by the National Association of Criminal Defense Lawyers and the Innocence Project. Following revelations published in recent years, the two groups are helping the government with the country’s largest ever post-conviction review of questioned forensic evidence.

Chillingly, as the Post continues, “the cases include those of 32 defendants sentenced to death.” Of these defendants, 14 have already been executed or died in prison.

You should read Dahlia’s post carefully and then write “untrustworthy” next to any reference to or material from the FBI.

This particular issue involved identifying hair samples as matching, which went beyond any known science.

But if 26 out of 28 experts were willing to go there, how far do you think the average agent on the street goes towards favoring the prosecution?

True, the FBI is working to find all the cases where this has happened, but questions about this type of evidence were raised long before now. Questioning the prosecution’s evidence, however, doesn’t work in favor of the FBI.

Defense teams need to start requesting judicial notice of the propensity of executive branch department employees to give false testimony, and a cautionary instruction to jurors in trials where such employees appear.

Unker Non-Linear Writing System

April 23rd, 2015

Unker Non-Linear Writing System by Alex Fink & Sai.

From the webpage:


“I understood from my parents, as they did from their parents, etc., that they became happier as they more fully grokked and were grokked by their cat.”[3]

Here is another snippet from the text:

Binding points, lines and relations

Every glyph includes a number of binding points, one for each of its arguments, the semantic roles involved in its meaning. For instance, the glyph glossed as eat has two binding points—one for the thing consumed and one for the consumer. The glyph glossed as (be) fish has only one, the fish. Often we give glosses more like “X eat Y”, so as to give names for the binding points (X is eater, Y is eaten).

A basic utterance in UNLWS is put together by writing out a number of glyphs (without overlaps) and joining up their binding points with lines. When two binding points are connected, this means the entities filling those semantic roles of the glyphs involved coincide. Thus when the ‘consumed’ binding point of eat is connected to the only binding point of fish, the connection refers to an eaten fish.

This is the main mechanism by which UNLWS clauses are assembled. To take a worked example, here are four glyphs:


If you are working on graphical representations for design or presentation, this may be of interest.
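
The binding-point mechanism is also a nice little data-structure exercise. A toy model (my own modeling, not the authors’): glyphs carry named binding points, and connecting two points asserts that the entities filling those roles coincide:

```python
# Toy model of UNLWS binding: each glyph exposes named binding
# points, one per semantic role in its meaning.
glyphs = {
    "eat": ["eater", "eaten"],   # "X eat Y"
    "fish": ["fish"],            # "(be) fish"
}

# Every binding point starts as its own entity; connecting two points
# merges the entities behind them.
entity = {
    ("eat", "eater"): 0,
    ("eat", "eaten"): 1,
    ("fish", "fish"): 2,
}

def connect(p, q):
    """Joining binding points p and q means their role-fillers coincide."""
    old, new = entity[q], entity[p]
    for point, e in list(entity.items()):
        if e == old:
            entity[point] = new

# Connect the 'consumed' point of eat to the only point of fish:
connect(("eat", "eaten"), ("fish", "fish"))
# The consumed thing and the fish are now one entity -- an eaten fish.
```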

Sam Hunting forwarded this while we were exploring TeX graphics.

PS: The “cat” people on Twitter may appreciate the first graphic. ;-)

Protecting Your Privacy From The NSA?

April 23rd, 2015

House passes cybersecurity bill by Cory Bennett and Cristina Marcos.

From the post:

The House on Wednesday passed the first major cybersecurity bill since the calamitous hacks on Sony Entertainment, Home Depot and JPMorgan Chase.

Passed 307-116, the Protecting Cyber Networks Act (PCNA), backed by House Intelligence Committee leaders, would give companies liability protections when sharing cyber threat data with government civilian agencies, such as the Treasury or Commerce Departments.

“This bill will strengthen our digital defenses so that American consumers and businesses will not be put at the mercy of cyber criminals,” said House Intelligence Committee Chairman Devin Nunes (R-Calif.).

Lawmakers, government officials and most industry groups argue more data will help both sides better understand their attackers and bolster network defenses that have been repeatedly compromised over the last year.

Privacy advocates and a group of mostly Democratic lawmakers worry the bill will simply shuttle more sensitive information to the National Security Agency (NSA), further empowering its surveillance authority. Many security experts agree, adding that they already have the data needed to study hackers’ tactics.

The connection between sharing threat data and loss of privacy to the NSA escapes me.

At present, the NSA:

  • Monitors all Web traffic
  • Monitors all Email traffic
  • Collects all Phone metadata
  • Collects all Credit Card information
  • Collects all Social Media data
  • Collects all Travel data
  • Collects all Banking data
  • Has spied on Congress and other agencies
  • Can demand production of other information and records from anyone
  • Probably has a copy of your income tax and Social Security info

You are concerned private information about you might be leaked to the NSA in the form of threat data?


Anything is possible, so something the NSA doesn’t already know could come to light, but I would not waste my energy opposing a bill that poses virtually no additional threat to privacy.

The NSA is the issue that needs to be addressed. Its very existence is incompatible with any notion of privacy.

NPR and The “American Bias”

April 23rd, 2015

Can you spot the “American bias” both in this story and the reporting by NPR?

U.S. Operations Killed Two Hostages Held By Al-Qaida, Including An American by Krishnadev Calamur:

President Obama offered his “grief and condolences” to the families of the American and Italian aid workers killed in a U.S. counterterrorism operation in January. Both men were held hostage by al-Qaida.

“I take full responsibility for a U.S. government counterterrorism operation that killed two innocent hostages held by al-Qaida,” Obama said.

The president said both Warren Weinstein, an American held by the group since 2011, and Giovanni Lo Porto, an Italian national held since 2012, were “devoted to improving the lives of the Pakistani people.”

Earlier Thursday, the White House in a statement announced the two deaths, along with the killings of two American al-Qaida members.

“Analysis of all available information has led the Intelligence Community to judge with high confidence that the operation accidentally killed both hostages,” the White House statement said. “The operation targeted an al-Qa’ida-associated compound, where we had no reason to believe either hostage was present, located in the border region of Afghanistan and Pakistan. No words can fully express our regret over this terrible tragedy.”

Exact numbers of casualties from American drone strikes are hard to come by, but current estimates suggest that more people have died from drone attacks than died in the 9/11 attacks. A large number of those people were not the intended targets but civilians, including hundreds of children. A Bureau of Investigative Journalism report has spreadsheets you can download to find the specifics about drone strikes in particular countries.

Let’s pause to hear the Obama Administration’s “grief and condolences” over the deaths of civilians and children in each of those strikes:


That’s right: the Obama Administration has trouble admitting that any civilians or children have died as a result of its drone war, perhaps to avoid criminal responsibility for its actions. It certainly has not expressed any “grief and condolences” over those deaths.

Jeff Bachman, of American University, estimates that between twenty-eight (28) and thirty-five (35) civilians die for every one (1) person killed on the Obama “kill” list in Pakistan alone. Drone Strikes: Are They Obama’s Enhanced Interrogation Techniques?

You will notice that the NPR reporting does not contrast Obama’s “grief and condolences” for the deaths of two hostages (one of whom was American) with his lack of any remorse over the deaths of civilians and children in other drone attacks.

Obama’s lack of remorse over the deaths of innocents in other drone attacks reportedly isn’t unusual for war criminals. War criminals see their crimes as justified by the pursuit of a goal worth more than innocent human lives. Or in this case, worth more than non-American innocent lives.

A Scary Earthquake Map – Oklahoma

April 22nd, 2015

Earthquakes in Oklahoma – Earthquake Map


Great example of how visualization can make the case that “standard” industry practices are in fact damaging the public.

The map is interactive and the screen shot above is only one example.

The main site is located at: http://earthquakes.ok.gov/.

From the homepage:

Oklahoma experienced 585 magnitude 3+ earthquakes in 2014 compared to 109 events recorded in 2013. This rise in seismic events has the attention of scientists, citizens, policymakers, media and industry. See what information and research state officials and regulators are relying on as the situation progresses.

The next stage of data mapping should be identifying the owners or those who profited from the waste water disposal wells and their relationships to existing oil and gas interests, as well as their connections to members of the Oklahoma legislature.

What is it that Republicans call it? Ah, accountability, as in holding teachers and public agencies “accountable.” Looks to me like it is time to hold some oil and gas interests and their owners, “accountable.”

PS: The earthquakes are said not to be a “direct” result of fracking but of the disposal of water used for fracking. Close enough for my money. You?

Gathering, Extracting, Analyzing Chemistry Datasets

April 22nd, 2015

Activities at the Royal Society of Chemistry to gather, extract and analyze big datasets in chemistry by Antony Williams.

If you are looking for a quick summary of efforts to combine existing knowledge resources in chemistry, you can do far worse than Antony’s 118 slides on the subject (2015).

I want to call special attention to Slide 107 in his slide deck:


True enough, extraction is problematic, expensive, inaccurate, etc., all the things Antony describes. And I would strongly second all of what he implies is the better practice.

However, extraction isn’t just a necessity for today or for a few years; extraction is going to be necessary as long as we keep records about chemistry or any other subject.

Think about all the legacy materials on chemistry that exist in hard copy format just from the past two centuries, to say nothing of all the still older materials. It is more than unfortunate to abandon all that information simply because “modern” digital formats are easier to manipulate.

That wasn’t what Antony meant to imply, but even after all materials have been extracted and exist in some digital format, that doesn’t mean the era of “extraction” will have ended.

You may not remember when atomic chemistry used “punch cards” to record isotopes:


An isotope file on punched cards. George M. Murphy J. Chem. Educ., 1947, 24 (11), p 556 DOI: 10.1021/ed024p556 Publication Date: November 1947.

Today we would represent that record in…NoSQL?

Are you confident that in another sixty-eight (68) years we will still be using NoSQL?

We have to choose from the choices available to us today, but we should not deceive ourselves into thinking our solution will be seen as the “best” solution in the future. New data will be discovered, new processes invented, new requirements will emerge, all of which will be clamoring for a “new” solution.

Extraction will persist as long as we keep recording information in the face of changing formats and requirements. We can improve that process but I don’t think we will ever completely avoid it.

QUANTUM-type packet injection attacks [From NSA to Homework]

April 22nd, 2015

QUANTUM-type packet injection attacks

From the homework assignment:

CSE508: Network Security (PhD Section), Spring 2015

Homework 4: Man-on-the-Side Attacks

Part 1:

The MotS injector you are going to develop, named ‘quantuminject’, will capture the traffic from a network interface in promiscuous mode, and attempt to inject spoofed responses to selected client requests towards TCP services, in a way similar to the Airpwn tool.

Part 2:

The MotS attack detector you are going to develop, named ‘quantumdetect’, will capture the traffic from a network interface in promiscuous mode, and detect MotS attack attempts. Detection will be based on identifying duplicate packets towards the same destination that contain different TCP payloads, i.e., the observation of the attacker’s spoofed response followed by the server’s actual response. You should make every effort to avoid false positives, e.g., due to TCP retransmissions.

See the homework details for further requirements and resources.
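
The detection logic in Part 2 can be sketched in a few lines, assuming packets have already been reduced to (src, dst, seq, payload) tuples; a real ‘quantumdetect’ would sniff in promiscuous mode via a capture library instead:

```python
# Sketch of the 'quantumdetect' idea: flag a TCP segment that repeats an
# already-seen (src, dst, seq) key but carries a different payload -- the
# signature of a spoofed response racing the server's real one.

def detect_mots(packets):
    """packets: iterable of (src, dst, seq, payload). Returns suspected injections."""
    seen = {}      # (src, dst, seq) -> first payload observed
    alerts = []
    for src, dst, seq, payload in packets:
        key = (src, dst, seq)
        if key in seen:
            if seen[key] != payload:
                # Same segment, different content: possible MotS injection.
                alerts.append(key)
            # Identical duplicate: an ordinary TCP retransmission, ignored.
        else:
            seen[key] = payload
    return alerts
```

A retransmission (same key, same payload) is deliberately not flagged, per the false-positive requirement in the assignment.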

If you need a starting point for “Man-on-the-Side Attacks,” I saw Bruce Schneier recommend: Our Government Has Weaponized the Internet. Here’s How They Did It by Nicholas Weaver.

You may also want to read: Attacking Tor: how the NSA targets users’ online anonymity by Bruce Schneier, but with caveats.

For example, Bruce says:

To trick targets into visiting a FoxAcid server, the NSA relies on its secret partnerships with US telecoms companies. As part of the Turmoil system, the NSA places secret servers, codenamed Quantum, at key places on the internet backbone. This placement ensures that they can react faster than other websites can. By exploiting that speed difference, these servers can impersonate a visited website to the target before the legitimate website can respond, thereby tricking the target’s browser to visit a Foxacid server.

In the academic literature, these are called “man-in-the-middle” attacks, and have been known to the commercial and academic security communities. More specifically, they are examples of “man-on-the-side” attacks.

They are hard for any organization other than the NSA to reliably execute, because they require the attacker to have a privileged position on the internet backbone, and exploit a “race condition” between the NSA server and the legitimate website. This top-secret NSA diagram, made public last month, shows a Quantum server impersonating Google in this type of attack.

Have you heard the story of the mountain hiker who explained he was wearing sneakers instead of boots in case he and his companion were chased by a bear? The companion pointed out that no one can outrun a bear, to which the mountain hiker replied, “I don’t have to outrun the bear, I just have to outrun you.”

A man-in-the-middle attack can be made from a privileged place on the Internet backbone, but that’s not a requirement. The only requirement is that my “FoxAcid” server has to respond more quickly than the website a user is attempting to contact. That hardly requires a presence on the Internet backbone. I just need to outrun the packets from the responding site.

Assume I want to initiate a man-on-the-side attack against a user or organization at a local university. All I need do is obtain access to the university’s Internet connection, on the university side of the connection; by definition I am going to be faster than any site remote to the university.
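
The race can be reduced to a toy model: the client accepts whichever well-formed response arrives first. The latencies below are illustrative numbers, not measurements:

```python
# Toy model of the man-on-the-side race: lowest latency wins.

def winner(responses):
    """responses: dict of name -> latency in ms. First to arrive wins."""
    return min(responses, key=responses.get)

# Attacker on the Internet backbone racing a distant legitimate server:
backbone = winner({"legitimate_site": 80.0, "nsa_quantum": 30.0})

# Attacker sitting on the university side of the campus uplink: even a
# modest box beats any remote site -- no backbone presence required.
campus = winner({"legitimate_site": 80.0, "local_foxacid": 2.0})
```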

So I would disagree with Bruce’s statement:

They are hard for any organization other than the NSA to reliably execute, because they require the attacker to have a privileged position on the internet backbone, and exploit a “race condition” between the NSA server and the legitimate website.

Anyone can do man-on-the-side attacks, the only requirement is being faster than the responding computer.

The NSA wanted to screw everyone on the Internet, hence the need to be on the backbone. If you are less ambitious, you can make do with far less expensive and rare resources.

TikZ & PGF
April 22nd, 2015

TikZ & PGF by Till Tantau.


From the introduction:

Welcome to the documentation of TikZ and the underlying pgf system. What began as a small LaTeX style for creating the graphics in my (Till Tantau’s) PhD thesis directly with pdfLaTeX has now grown to become a full-flung graphics language with a manual of over a thousand pages. The wealth of options offered by TikZ is often daunting to beginners; but fortunately this documentation comes with a number of slowly-paced tutorials that will teach you almost all you should know about TikZ without your having to read the rest….

The examples will make you want to install the package just to see if you can duplicate them. Some of the graphics I am unlikely to ever use. On the other hand, going over this manual in detail will enable you to recognize what is possible, graphically speaking.

This is truly going to be a lot of fun!


Liability as an Incentive for Secure Software?

April 21st, 2015

Calls Arise to Make Developers Liable for Insecure Software by Sean Doherty.

The usual suspects show up in Sean’s post:

Dan Geer, chief information security officer at the CIA’s venture capital arm, In-Q-Tel, is often in the news arguing for legal measures to make companies accountable for developing vulnerable code. In his keynote address at the Black Hat USA conference in Las Vegas in August 2014, Geer said he would place the onus of security onto software developers.

In a recent Financial Times story, Dave Merkel, chief technology officer at IT security vendor FireEye, said, “Attackers are specifically looking for the things that code was not designed to do. As a software creator, you can test definitively for all the things that your software should do. But testing it for all things it shouldn’t do is an infinite, impossible challenge.”

But Sean adds an alternative to liability versus no-liability:

In today’s software development environment, there is no effective legal framework for liability. But perhaps lawyers are looking for the wrong framework.

The FT story also quoted Wolfgang Kandek, CTO at IT security vendor Qualys: “Building software isn’t like building a house or a bridge or a ship, where accepted engineering principles apply across whole industries.”

Like Geer, there are people in the software industry saying code development should become like the building industry, with standards. The IEEE Computer Society, an organization of computing professionals, founded a working group to address the lack of software design standards: the Center for Secure Design (CSD).

Liability is coming; it’s up to the software community to decide how to take that “hit.”

Relying on the courts to work out what “negligence” means for software development will take decades and lead to a minefield of mixed results. States will vary from each other, and the federal courts will no doubt have different standards by circuit, at least for a while.

Standards for software development? Self-imposed standards that set a high but attainable bar and demonstrate improved results to users are definitely preferable to erratic and costly litigation.

Your call.

Imagery Processing Pipeline Launches!

April 21st, 2015

Imagery Processing Pipeline Launches!

From the post:

Our imagery processing pipeline is live! You can search the Landsat 8 imagery catalog, filter by date and cloud coverage, then select any image. The image is instantly processed, assembling bands and correcting colors, and loaded into our API. Within minutes you will have an email with a link to the API end point that can be loaded into any web or mobile application.

Our goal is to make it fast for anyone to find imagery for a news story after a disaster, easy for any planner to get the most recent view of their city, and any developer to pull in thousands of square KM of processed imagery for their precision agriculture app. All directly using our API.

There are two ways to get started: via the imagery browser fetch.astrodigital.com, or directly via the Search and Publish APIs. All API documentation is on astrodigital.com/api. You can either use the API to programmatically pull imagery through the pipeline or build your own UI on top of the API, just like we did.

The API provides direct access to more than 300TB of satellite imagery from Landsat 8. Early next year we’ll make our own imagery available once our own Landmapper constellation is fully commissioned.

Hit us up @astrodigitalgeo or sign up at astrodigital.com to follow as we build. Huge thanks to our partners at Development Seed, who are leading our development, and for the infinitely scalable API from Mapbox.

If you are interested in Earth images, you really need to check this out!

I haven’t tried the API but did get a link to an image of my city and surrounding area.

Definitely worth a long look!

Why nobody knows what’s really going into your food

April 21st, 2015

Why nobody knows what’s really going into your food by Phillip Allen, et al.

From the webpage:

Why doesn’t the government know what’s in your food? Because industry can declare on their own that added ingredients are safe. It’s all thanks to a loophole in a 57-year-old law that allows food manufacturers to circumvent the approval process by regulators. This means companies can add substances to their food without ever consulting the Food and Drug Administration about potential health risks.

The animation is quite good and worth your time to watch.

If you think the animation is disheartening, you could spend some time at the Generally Recognized as Safe (GRAS) page over at the FDA.

From the webpage:

“GRAS” is an acronym for the phrase Generally Recognized As Safe. Under sections 201(s) and 409 of the Federal Food, Drug, and Cosmetic Act (the Act), any substance that is intentionally added to food is a food additive, that is subject to premarket review and approval by FDA, unless the substance is generally recognized, among qualified experts, as having been adequately shown to be safe under the conditions of its intended use, or unless the use of the substance is otherwise excluded from the definition of a food additive.

Links to legislation, regulations, applications, and other sources of information.

Leaving the question of regulation to one side, every product should be required to list all of its ingredients on the package, and to post a full chemical analysis online.

Disclosure would not reach everyone but at least careful consumers would have a sporting chance to discover what they are eating.

IPew Attack Map

April 21st, 2015

IPew Attack Map

From the webpage:


(a collaborative effort by @alexcpsec & @hrbrmstr)

Why should security vendors be the only ones allowed to use silly, animated visualizations to “compensate”? Now, you can have your very own IP attack map that’s just as useful as everyone else’s.

IPew is a feature-rich, customizable D3 / javascript visualization, needing nothing more than a web server capable of serving static content and a sense of humor to operate. It’s got all the standard features that are expected including:

  • Scary dark background!
  • Source & destination country actor/victim attribution!
  • Inane attack names!

BUT, it has one critical element that is missing from the others: SOUND EFFECTS! What good is a global cyberbattle without some cool sounds.

In all seriousness, IPew provides a simple framework – based on Datamaps – for displaying cartographic attack data in a (mostly) responsive way and shows how to use dynamic data via javascript event timers and data queues (in case you’re here to learn vs have fun – or both!).

One important feature: if you work inside the Beltway in DC, you can set all attacks as originating from North Korea or China.

Instructive and fun!


The Vocabulary of Cyber War

April 21st, 2015

The Vocabulary of Cyber War

From the post:

At the 39th Joint Doctrine Planning Conference, a semiannual meeting on topics related to military doctrine and planning held in May 2007, a contractor for Booz Allan Hamilton named Paul Schuh gave a short presentation discussing doctrinal issues related to “cyberspace” and the military’s increasing effort to define its operations involving computer networks. Schuh, who would later become chief of the Doctrine Branch at U.S. Cyber Command, argued that military terminology related to cyberspace operations was inadequate and failed to address the expansive nature of cyberspace. According to Schuh, the existing definition of cyberspace as “the notional environment in which digitized information is communicated over computer networks” was imprecise. Instead, he proposed that cyberspace be defined as “a domain characterized by the use of electronics and the electromagnetic spectrum to store, modify, and exchange data via networked systems and associated physical infrastructures.”

Amid the disagreements about “notional environments” and “operational domains,” Schuh informed the conference that “experience gleaned from recent cyberspace operations” had revealed “the necessity for development of a lexicon to accommodate cyberspace operations, cyber warfare and various related terms” such as “weapons consequence” or “target vulnerability.” The lexicon needed to explain how the “‘four D’s (deny, degrade, disrupt, destroy)” and other core terms in military terminology could be applied to cyber weapons. The document that would later be produced to fill this void is The Cyber Warfare Lexicon, a relatively short compendium designed to “consolidate the core terminology of cyberspace operations.” Produced by the U.S. Strategic Command’s Joint Functional Command Component – Network Warfare, a predecessor to the current U.S. Cyber Command, the lexicon documents early attempts by the U.S. military to define its own cyber operations and place them within the larger context of traditional warfighting. A version of the lexicon from January 2009 obtained by Public Intelligence includes a complete listing of terms related to the process of creating, classifying and analyzing the effects of cyber weapons. An attachment to the lexicon includes a series of discussions on the evolution of military commanders’ conceptual understanding of cyber warfare and its accompanying terminology, attempting to align the actions of software with the outcomes of traditional weaponry.

A bit dated (2009), particularly in its understanding of cyber war, but possibly useful for reading leaked documents from that time period and as a starting point for studying the evolution of terminology in the area.

To the extent this crosses over with cybersecurity, you may find A Glossary of Common Cybersecurity Terminology (NICCS) or the Glossary of Information Security Terms useful. There is overlap between the two.

There are several information sharing efforts under development or in place, which will no doubt lead to the creation of more terminology.

Syrian Travel Guide, Courtesy of the FBI

April 21st, 2015

More Arrests of Americans Attempting to Fight for ISIL in Syria by Bobby Chesney.

From the post:

Six Somali-American men from the Minneapolis area have been arrested on material support charges, based on allegations that they were attempting to travel to Syria to join ISIL. The complaint and corresponding FBI affidavit are posted here. Note that the complaint is a handy case study in the variety of investigative techniques that FBI might employ in a case of this kind, with examples including open-source review of a suspect’s Twitter and Facebook accounts, use of a CHS (“Confidential Human Source”) who previously had been part of this same material support conspiracy, review of call records to establish connections among the defendants, review of bank records, use of video footage recorded in public places, and review of instant messages exchanged via Kik (a footnote on p. 9 of the affidavit notes that Kik “does not maintain records of user conversations”).

Take special note of:

Note that the complaint is a handy case study in the variety of investigative techniques that FBI might employ in a case of this kind, with examples including open-source review of a suspect’s Twitter and Facebook accounts, use of a CHS (“Confidential Human Source”) who previously had been part of this same material support conspiracy, review of call records to establish connections among the defendants, review of bank records, use of video footage recorded in public places, and review of instant messages exchanged via Kik (a footnote on p. 9 of the affidavit notes that Kik “does not maintain records of user conversations”).

If you seriously want to travel to Syria, for reasons that seem sufficient to you, print out the FBI complaint in this case and avoid each and every one of the activities and statements (or statements of that kind), detailed in the complaint.

If you engage in any of those activities or make statements of that sort, your legitimate travel plans to Syria may be disrupted.

Any aid these six defendants could have provided to ISIL would have been more accidental than intentional. If being nearly overwhelmed by the difficulty of traveling overseas isn’t enough of a clue to the defendants’ competence, their travel arrangements could have been made more bizarre only by wearing a full Ronald McDonald costume to the airport. One day in a foreign country before returning?

I understand. Idealistic young people have always wanted to join causes larger than themselves. Just taking recent history into account, there were the Freedom Riders in the 1960s, along with the Anti-War Movement of the same era. And they want to join those causes despite the orthodoxy being preached and enforced by secular governments.

Personally, I don’t see anything wrong with opposition to corrupt, U.S.-supported Arab governments. To the extent ISIL does exactly that, its designation as a “terrorist” organization is ill-founded. Terrorist designations are more political than moral.

Here’s a suggestion:

IS/ISIL seems to be short on governance expertise, however well it has been doing at acquiring territory. Territory is fine, but effective governance provides a stronger claim to a seat at the bargaining table.

Under 18 U.S.C. § 2339B(j), there is an exception:

No person may be prosecuted under this section in connection with the term “personnel”, “training”, or “expert advice or assistance” if the provision of that material support or resources to a foreign terrorist organization was approved by the Secretary of State with the concurrence of the Attorney General. The Secretary of State may not approve the provision of any material support that may be used to carry out terrorist activity (as defined in section 212(a)(3)(B)(iii) of the Immigration and Nationality Act).

I’m not saying approval is likely, but one could ask the State Department for permission to supply expertise in governance, medicine, civil engineering, and the like, all of which are aspects of governance that IS/ISIL needs just as much as fighters.

Yes, I know, doing the administrative work of government isn’t as romantic as riding into battle on a “technical” but it is just as necessary.

PS: If anyone is seriously interested, I can collate the FBI complaint with similar complaints and create a “So You Want to Travel to Syria?” document that lists all the statements and activities to avoid.

Aside to the FBI: Syria is going to need civil engineers, etc., no matter who “wins.” Putting people on productive paths is far more useful than feeding and feeding off of desires to make an immediate difference.

Security Mom (Violence In Your Own Backyard)

April 21st, 2015

Security Mom by Juliette Kayyem.

Juliette describes this new podcast series:

My goal with every guest on this podcast – whether it’s a sneak peek into the war room, a debate between friends, or a revealing conversation from the front lines of homeland security – is to bring it home for you. We’re going to unpack how this strange and secretive world works, and give you a new perspective on the challenges, successes, and failures we all confront to keep our nation and our families safe.

What do you want to hear from me? What security issues are on your mind? Email me at securitymom@wgbh.org, or find me on Twitter: @JulietteKayyem.

The first episode: Inside Command And Control During The Boston Marathon Bombings by WGBH News & Juliette Kayyem.

Former Boston Police Commissioner Ed Davis was in command and control during the week of the Boston Marathon bombings in April 2013. On the eve of the second anniversary of the bombing, he details incredible behind-the-scenes decisions during the 100 hours spent in pursuit of Tamerlan and Dzhokhar Tsarnaev.

Not deeply technical but promises to be an interesting window on how security advocates view the world.

Juliette’s reaction to violence in her “backyard” wasn’t unexpected but was still interesting.

Transpose her reaction to individuals and families who have experienced U.S. drone strikes in “their” backyards.

Do you think their reactions are any different?

“Explanations” of violence, including drone strikes, only “work” for the perpetrators of such violence. Something to keep in mind as every act of violence makes security more and more elusive.

I first saw this in a blog post by Jack Goldsmith.

Sony at Wikileaks! (MPAA Privacy versus Your Privacy)

April 20th, 2015

Sony at Wikileaks!

From the press release:

Today, 16 April 2015, WikiLeaks publishes an analysis and search system for The Sony Archives: 30,287 documents from Sony Pictures Entertainment (SPE) and 173,132 emails, to and from more than 2,200 SPE email addresses. SPE is a US subsidiary of the Japanese multinational technology and media corporation Sony, handling their film and TV production and distribution operations. It is a multi-billion dollar US business running many popular networks, TV shows and film franchises such as Spider-Man, Men in Black and Resident Evil.

In November 2014 the White House alleged that North Korea’s intelligence services had obtained and distributed a version of the archive in revenge for SPE’s pending release of The Interview, a film depicting a future overthrow of the North Korean government and the assassination of its leader, Kim Jong-un. Whilst some stories came out at the time, the original archives, which were not searchable, were removed before the public and journalists were able to do more than scratch the surface.

Now published in a fully searchable format The Sony Archives offer a rare insight into the inner workings of a large, secretive multinational corporation. The work publicly known from Sony is to produce entertainment; however, The Sony Archives show that behind the scenes this is an influential corporation, with ties to the White House (there are almost 100 US government email addresses in the archive), with an ability to impact laws and policies, and with connections to the US military-industrial complex.

WikiLeaks editor-in-chief Julian Assange said: “This archive shows the inner workings of an influential multinational corporation. It is newsworthy and at the centre of a geo-political conflict. It belongs in the public domain. WikiLeaks will ensure it stays there.”

Lee Munson writes in WikiLeaks publishes massive searchable archive of hacked Sony documents,

According to the Guardian, former senator Chris Dodd, chairman of the MPAA, wrote how the republication of this information signifies a further attack on the privacy of those involved:

This information was stolen from Sony Pictures as part of an illegal and unprecedented cyberattack. Wikileaks is not performing a public service by making this information easily searchable. Instead, with this despicable act, Wikileaks is further violating the privacy of every person involved.

Hacked Sony documents soon began appearing online and were available for download from a number of different sites but interested parties had to wade through vast volumes of data to find what they were looking for.

WikiLeaks’ new searchable archive will, sadly, make it far easier to discover the information they require.

I don’t see anything sad about the posting of the Sony documents in searchable form by Wikileaks.

If anything, I regret there aren’t more leaks, breaches, etc., of both corporate and governmental document archives. Leaks and breaches that should be posted “as is” with no deletions by Wikileaks, the Guardian or anyone else.

Chris Dodd’s privacy concerns aren’t your privacy concerns. Not even close.

Your privacy concerns (some of them):

  • personal finances
  • medical records
  • phone calls (sorry, already SOL on that one)
  • personal history and relationships
  • more normal sort of stuff

The MPAA, Sony and such, have much different privacy concerns:

  • concealment of meetings with and donations to members of government
  • concealment of hiring practices and work conditions
  • concealment of agreements with other businesses
  • concealment of offenses against the public
  • concealment of the exercise of privilege

Not really the same, are they?

Your privacy centers on you; the MPAA/Sony version of privacy centers on what they have done to others.

New terms? You have a privacy interest; MPAA/Sony has an interest in concealing information.

That sets a better tone for the discussion.

Same Sex Marriage Resources (Another Brown?)

April 20th, 2015

You may be aware that the right of same sex couples to marry is coming up for oral argument before the Supreme Court of the United States on 28 April 2015.

The case, Obergefell v. Hodges, has been consolidated by the Court with Tanco v. Haslam (Tennessee), DeBoer v. Snyder (Michigan), and Bourke v. Beshear (Kentucky). The Court has posed two questions:

  1. Does the Fourteenth Amendment require a state to license a marriage between two people of the same sex?
  2. Does the Fourteenth Amendment require a state to recognize a marriage between two people of the same sex when their marriage was lawfully licensed and performed out-of-state?

What you may not know is that SCOTUSblog has extensive commentary and primary documents collected at: Obergefell v. Hodges. In addition to blog commentary covering all the positions of the parties and others who have filed briefs in this proceeding, there are links to the briefs by the parties and one hundred and fifty-one (151) briefs filed by others.

There will be a lot of loose talk about a decision favoring gay marriage as another Brown v. Board of Education. A favorable decision would legally end another form of narrow-mindedness, as it should. However, I don’t think the two cases are comparable in terms of magnitude.

Perhaps because I was born the year Brown was decided and, due to the practice of “…all deliberate speed…” in the South, attended segregated schools until I was in the ninth grade. I won’t bore you with distorted recollections from so long ago, but suffice it to say that interest on the debt of Jim Crow and de jure segregation is still being paid by children of all races in the South.

Same sex couples have been discriminated against and that should end, but they are adults, not children. Brown recognized a sin against children and started the nation on a long road toward recognizing it as well.

Twitter cuts off ‘firehose’ access…

April 20th, 2015

Twitter cuts off ‘firehose’ access, eyes Big Data bonanza by Mike Wheatley.

From the post:

Twitter upset the applecart on Friday when it announced it would no longer license its stream of half a billion daily tweets to third-party resellers.

The social media site said it had decided to terminate all current agreements with third parties to resell its ‘firehose’ data – an unfiltered, full stream of tweets and all of the metadata that comes with them. For companies that still wish to access the firehose, they’ll still be able to do so, but only by licensing the data directly from Twitter itself.

Twitter’s new plan is to use its own Big Data analytics team, which came about as a result of its acquisition of Gnip in 2014, to build direct relationships with data companies and brands that rely on Twitter data to measure market trends, consumer sentiment and other metrics that can be best understood by keeping track of what people are saying online. The company hopes to complete the transition by August this year.

Not that I had any foreknowledge of Twitter’s plans but I can’t say this latest move is all that surprising.

What I hope also emerges from the “new plan” is a fixed pricing structure for smaller users of Twitter content. I’m really not interested in an airline pricing model where the price you pay has no rational relationship to the value of the product. If it’s the day before the end of a sales quarter, I get a very different price for a Twitter feed than mid-way through the quarter. That sort of thing.

I’d also like to be able to specify users to follow, searches, and tweet streams in daily increments of 250,000, 500,000, 750,000 or 1,000,000 tweets, spooled for daily pickup over high-speed connections (to put less stress on infrastructure).
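To make the idea concrete, here is a minimal sketch of what fixed daily tiers plus spooled pickup might look like on the client side. To be clear, this is entirely hypothetical: the tier sizes come from my wish list above, and the functions are illustrations, not any actual Twitter or Gnip API.

```python
# Hypothetical fixed daily tiers from the wish list above:
# flat increments rather than airline-style variable pricing.
TIERS = (250_000, 500_000, 750_000, 1_000_000)

def choose_tier(expected_daily_tweets):
    """Pick the smallest fixed tier that covers the expected daily volume."""
    for tier in TIERS:
        if expected_daily_tweets <= tier:
            return tier
    return TIERS[-1]  # above the top tier, you pay for the largest increment

def spool_chunks(tweets, chunk_size=50_000):
    """Split a day's worth of tweets into fixed-size chunks, as if they
    were being written out as numbered spool files for later pickup."""
    return [tweets[i:i + chunk_size] for i in range(0, len(tweets), chunk_size)]

# Example: an expected 600,000 tweets/day fits the 750,000 tier, and a
# day's worth of 120,000 tweets spools into three pickup files.
tier = choose_tier(600_000)
chunks = spool_chunks([{"id": n} for n in range(120_000)], chunk_size=50_000)
```

The point of the sketch is the pricing model: the tier you land in depends only on volume, not on when in the sales quarter you sign.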

I suppose renewable contracts would be too much to ask? ;-)

Unannotated Listicle of Public Data Sets

April 20th, 2015

Great Github list of public data sets by Mirko Krivanek.

A large list of public data sets, previously published on GitHub, but with no annotations to guide you to particular datasets.

Just in case you know of any legitimate aircraft wiring sites, i.e., ones that existed prior to the GAO report on hacking aircraft networks, ping me with the links. Thanks!