Archive for March, 2015

Machine Learning – Ng – Self-Paced

Tuesday, March 31st, 2015

Machine Learning – Ng – Self-Paced

If you need a self-paced machine learning course, consider your wish as granted!

From the description:

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as it pertains to machine learning and AI. This course provides a broad introduction to machine learning, datamining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.

Great if your schedule/commitments varies from week to week, take the classes at your own pace!

Same great content that has made this course such a winner for Coursera.

I first saw this in a tweet by Tryolabs.

Hacking Browsers: Are Browsers the Weakest Link of the Security Chain?

Tuesday, March 31st, 2015

Hacking Browsers: Are Browsers the Weakest Link of the Security Chain?

From the post:

The number of cyber attacks is constantly increasing, and according to security experts they grow even more sophisticated. The security firm Secunia has recently released its annual study of trends in software vulnerabilities, an interesting report that highlights the impact of the presence of flaws in common software and provides useful details on the way bad actors exploit them. According to data provided by Secunia, the number of Web browser vulnerabilities and zero-day exploited by hackers worldwide in 2014 has increased in a significant way.

You can download the Secunia 2015 report here.

The report runs twenty-four (24) pages including the covers and while interesting, a longer report would be more useful. For example, near the end of the report, there are the top 20 core products with the most bugs, followed by the top 50 software portfolios. Of course, the problem being that the lists are not deduped so that Google Chrome, for example, appears in both lists with the same number of vulnerabilities. A one line summary of the bugs in a deduped list with links to Secunia advisories would be quite grand.

Search the Secunia Advisory and Vulnerability Database for details. (Requires Secunia community registration (free) for access to details.)

The report is useful to illustrate the breath of the security problem beyond browsers:

The absolute number of vulnerabilities detected was 15,435, discovered in 3,870 applications from 500 vendors. The number shows a 55% increase in the five year trend, and a 18% increase from 2013 to 2014.

You can improve browser security but if I root the OS, your “security” is more of a wish than a reality.

Securing browsers is like installing deadbolts on the doors of a house with only framing and no walls. An improvement but doesn’t rise to the level of being “secure.”

Systematic incentives (positive ones) are needed to move towards greater security.

Plumbing the Depths of Abject Stupidity

Tuesday, March 31st, 2015

Congressional Rep. John Carter Discovers Encryption; Worries It May One Day Be Used On Computers To Protect Your Data by Mike Masnick.

I can’t match Mike’s setup to where Rep. John Carter discovers encryption but here is Rep. Carter’s reaction:

Carter: Then if that gets to be the wall, the stone wall, and even the law can’t penetrate it, then aren’t we creating an instrument [that] is the perfect tool for lawlessness. This is a very interesting conundrum that’s developing in the law. If they, at their own will at Microsoft can put something in a computer — or at Apple — can put something in that computer [points on a smartphone], which it is, to where nobody but that owner can open it, then why can’t they put it in the big giant super computers, that nobody but that owner can open it. And everything gets locked away secretly. And that sounds like a solution to this great cyber attack problem, but in turn it allows those who would do us harm [chuckles] to have a tool to do a great deal of harm where law enforcement can’t reach them. This is a problem that’s gotta be solved.

That is truly appalling.

That would be Rep. John Carter from Texas.

John has been elected to the House for six (6) terms. Suggestion: Contact every person you know who has ever donated to one of John’s campaigns. Urge them to 1) Demand a refund (not going to happen but it will get his attention), and, 2) Refuse to donate another dime to any future John Carter campaign. If you don’t know someone who has supported John Carter, email this post to anyone you know who may know someone in Texas. (The old six degrees of separation plot.)

I say that in lieu of suggestions for structural reform that may be amusing but not “practical.”

Does Your Visualization Frighten or Inform?

Tuesday, March 31st, 2015

Telling your data’s story: How storytelling can enhance the effectiveness of your visualizations by Michael Freeman.

From the post:

Visualizing complex relationships in big data often requires involved graphical displays that can be intimidating to users. As the volume and complexity of data collection and storage scale exponentially, creating clear, communicative, and approachable visual representations of that data is an increasing challenge. As a data visualization specialist, I frightened one of my first sets of collaborators when I suggested using this display:


What I had failed to communicate was that we would use a story structure to introduce audiences to the complex layout (you can see how I did it here).

Michael tackles big data visualizations that are unclear, present too much information and too many variables.

Most of us can produce visualizations that frighten and confuse, but how many of us can construct visualizations that inform and persuade?

There isn’t a cookie cutter solution to the problem of effectively visualizing data but this post will gently move you in the direction of better visualizations.


PS: Not that anyone has ever seen a topic map visualization that frightened rather than informed. 😉

Graph Resources for Neo4j

Tuesday, March 31st, 2015

GitHub – GraphAware

A small sampling of what you will find at GraphAware’s GitHub page:

GithubNeo4j – Demo Application importing User Github Public Events into Neo4j.

neo4j-timetree – Java and REST APIs for working with time-representing tree in Neo4j.

neo4j-changefeed – A GraphAware Framework Runtime Module allowing users to find out what were the latest changes performed on the graph.

Others await your discovery at: GitHub – GraphAware


I first saw this in a tweet by Jeremy Deane.

Google DeepMind Resources

Monday, March 30th, 2015

Google DeepMind Resources

A collection of all the Google DeepMind publications to date.

Twenty-two (22) papers so far!

A nice way to start your reading week!


The Field Guide to Data Science

Monday, March 30th, 2015

The Field Guide to Data Science by Booz Allen Hamilton.

From “The Story of the Field Guide:”

While there are countless industry and academic publications describing what Data Science is and why we should care, little information is available to explain how to make use of data as a resource. At Booz Allen, we built an industry-leading team of Data Scientists. Over the course of hundreds of analytic challenges for dozens of clients, we’ve unraveled the DNA of Data Science. We mapped the Data Science DNA to unravel the what, the why, the who and the how.

Many people have put forth their thoughts on single aspects of Data Science. We believe we can offer a broad perspective on the conceptual models, tradecraft, processes and culture of Data Science. Companies with strong Data Science teams often focus
on a single class of problems – graph algorithms for social network analysis and recommender models for online shopping are two notable examples. Booz Allen is different. In our role as consultants, we support a diverse set of clients across a variety of domains. This allows us to uniquely understand the DNA of Data Science. Our goal in creating The Field Guide to Data Science is to capture what we have learned and to share it broadly. We want this effort to help drive forward the science and art of Data Science.

This is a great example of what can be done with authors, professional editors and graphic artists putting together a publication.

While it is just a “field guide,” it has enough depth to use it as a starting point for exploring data science projects.

Imagine that you have senior staff who have read and have a grasp of the field guide. I can easily imagine taking the appropriate parts of the field guide to serve as “windows” onto further steps for a particular project. Which would enable senior staff to remain grounded in what they understand and how further steps related back to that understanding. Overall I think this is an excellent field guide/introduction to data science.

BTW, the “Analytic Connection in the Data Lake” graphic on page 28 is similar to topic maps pointing into an infoverse.

I first saw this in a tweet by Kirk Borne.


Hotel Wi-Fi Insecurity – Big Time

Monday, March 30th, 2015

Hotel Wi-Fi router security hole: will this be the Ultimate Pwnie Award Winning Bug for 2015? by Paul Ducklin.

Paul has a highly amusing account of the Pwnie awards and his choice for 2015: CVE-2015-0932, Vulnerability Note VU#930956.

The security hole at issue:

Multiple ANTlabs InnGate models allow unauthenticated read/write to filesystem.

Simply put, some versions of a popular hotel internet access server – those portals you interact with to get Wi-Fi access while you’re at a conference centre or staying in a hotel – can be completely drained of data, and then reprogrammed arbitrarily, via the outside (internet-facing) interface.

Without any authentication.

See Paul’s post for all the details, including a very lucid discussion of rsync that is guaranteed to hold you attention.

Paul also has suggestions for avoiding unpatched ANTlabs InnGate hotel internet access servers.

You can even help your local hotel community by finding unpatched servers. Say near law enforcement conferences. The Department of Homeland Security has helpfully made a list of law enforcement meetings for 2015. (I have a copy just in case it disappears.)

OpenTSDB 2.0.1

Sunday, March 29th, 2015

OpenTSDB 2.0.1

From the homepage:


  • Data is stored exactly as you give it
  • Write with millisecond precision
  • Keep raw data forever


  • Runs on Hadoop and HBase
  • Scales to millions of writes per second
  • Add capacity by adding nodes


  • Generate graphs from the GUI
  • Pull from the HTTP API
  • Choose an open source front-end

If that isn’t impressive enough, check out the features added for the 2.0 release:

OpenTSDB has a thriving community who contributed and requested a number of new features. 2.0 has the following new features:

  • Lock-less UID Assignment – Drastically improves write speed when storing new metrics, tag names, or values
  • Restful API – Provides access to all of OpenTSDB’s features as well as offering new options, defaulting to JSON
  • Cross Origin Resource Sharing – For the API so you can make AJAX calls easily
  • Store Data Via HTTP – Write data points over HTTP as an alternative to Telnet
  • Configuration File – A key/value file shared by the TSD and command line tools
  • Pluggable Serializers – Enable different inputs and outputs for the API
  • Annotations – Record meta data about specific time series or data points
  • Meta Data – Record meta data for each time series, metrics, tag names, or values
  • Trees – Flatten metric and tag combinations into a single name for navigation or usage with different tools
  • Search Plugins – Send meta data to search engines to delve into your data and figure out what’s in your database
  • Real-Time Publishing Plugin – Send data to external systems as they arrive to your TSD
  • Ingest Plugins – Accept data points in different formats
  • Millisecond Resolution – Optionally store data with millisecond precision
  • Variable Length Encoding – Use less storage space for smaller integer values
  • Non-Interpolating Aggregation Functions – For situations where you require raw data
  • Rate Counter Calculations – Handle roll-over and anomaly supression
  • Additional Statistics – Including the number of UIDs assigned and available

I suppose traffic patterns (license plates) are a form of time series data. Yes?

New York Times Confirms: Meaning Is Important!

Sunday, March 29th, 2015

Kate O’Neill tweeted:

There’s that pesky need for meaning again: “Learning to See Data” via @NYTimes #bigdata

, along with this image:


I’m flogging their content on my dime, despite their firewall. 😉 Go figure.

Will the trough of disillusionment with Big Data be mostly due to lack of meaning? Lack of integration (which depends on meaning)? Lack of return (which depends on integration and meaning)?

Or some other cause? Such as visualizing data as a replacement for the meaning of data. None of the better visualization experts would so advise but is your visualization vendor one of them?

The Theory of Relational Databases

Sunday, March 29th, 2015

The Theory of Relational Databases by David Maier.

From the webpage:

This text has been long out of print, but I still get requests for it. The copyright has reverted to me, and you have permission to reproduce it for personal or academic use, but not for-profit purposed. Please include “Copyright 1983 David Maier, used with permission” on anything you distribute.

Out of date, 1983, if you are looking for the latest work but not if you are interested in where we have been. Sometimes the later is more important than the former.


The Genetic Programming Bibliography (Hits 10K!)

Sunday, March 29th, 2015

The Genetic Programming Bibliography by William Langdon.

A truly awesome bibliography collection and tool!

The introduction (10 pages PDF) is a model of clarity and will enhance your use/enjoyment of this bibliography.

You will also find there:Ai/index.html

I first saw this in a tweet by Jason H. Moore, PhD.

I suppose Google may downgrade my search listing because I have included a list of URLs that may be useful to you.

I prefer to post useful data for my readers than I care about gaming Google. If more of us felt that way, search results might be less the products of SEO gaming.

Collected, Vetted, Forty Visualization Blogs

Sunday, March 29th, 2015

Blog Radar at VisuaLoop.

You can run a search with your favorite search engine on “visualization blogs” + “data-visualization blogs” and get about 6,500 “hits,” including duplicates. This is the weed yourself option.

Or you can choose Blog Radar and have forty (40) blogs without duplicates with three (3) posts for each one. This is the pre-weeded option.

Whether you want to expand your blog reading or are looking for a good starting point for a crawler on visualization, you will be hard pressed to find a better resource.


Big Data Leaves Money On The Table

Sunday, March 29th, 2015

Big data hype reminds me of the “He’s Large” song from Popeye.

The recurrent theme is that whatever his other qualities, Bluto is large.

I mention that because Anthony Smith illustrates in When it Comes to Data, Small is the New Big, big data is great, but it never tells the whole story.

The whole story includes how and why customers buy and use your product. Trivial things like that.

Don’t use big data like the NSA uses phone data:

There is no other way we know of to connect the dots NSA & Connecting the Dots

Big data can show a return on your investment but it will only show you some of the facts that are available.

Don’t allow a fixation on “big data” blind you to the value of small data, which isn’t available to big data approaches and tools.

PS: The NSA uses phone data as churn for the sake their budget. Churn of big data doesn’t add to your bottom line.

Income Inequality – News Analysis

Sunday, March 29th, 2015

Discussions of increasing income inequality are common but disconnected from facts after disparity is established. By disconnected from facts I mean, what facts should guide policies to reduce income inequality?

In general I favor reducing income inequality (substantially) so that gives you an idea of where I fall in the discussion.

I am not entirely sold on the ideas discussed in A 26-year-old MIT graduate is turning heads over his theory that income inequality is actually about housing (in 1 graph) by Greg Ferenstein, but the analysis by Matthew Rognlie is different enough from the usual positions to merit your attention.

From the post:

Wealthy tech founders and the automation of middle-class jobs are often blamed for increasing concentrations of wealth in fewer hands. But, a 26-year-old MIT graduate student, Matthew Rognlie, is making waves for an alternative theory of inequality: the problem is housing [PDF].

Rognlie is attacking the idea that rich capitalists have an unfair ability to turn their current wealth into a lazy dynasty of self-reinforcing investments. This theory, made famous by French economist Thomas Piketty, argues that wealth is concentrating in the 1% because more money can be made by investing in machines and land (capital) than paying people to perform work (wages). Because capital is worth more than wages, those with an advantage to invest now in capital become the source of long-term dynasties of wealth and inequality.

Rognlie’s blockbuster rebuttal to Piketty is that “recent trends in both capital wealth and income are driven almost entirely by housing.” Software, robots, and other modern investments all depreciate in price as fast as the iPod. Technology doesn’t hold value like it used to, so it’s misleading to believe that investments in capital now will give rich folks a long-term advantage.

Housing policy. Now there’s a minefield of conflicting interests. Still, if you want to follow where the “data leads” then it is worth a comparison to other policy alternatives and explanations.

To be sure, you can create topic maps that replicate the fanatical ravings of any economic theorist you care to follow but my prejudice is in favor of a critical examination of news reports and policy recommendations. Reasoning you can further your clients goals if your analysis has some resemblance to reality as experienced by others. (That may explain the long term and persistent failures of U.S. policy in the Middle East but I digress.)

New Spatial Aggregation Tutorial for GIS Tools for Hadoop

Sunday, March 29th, 2015

New Spatial Aggregation Tutorial for GIS Tools for Hadoop by Sarah Ambrose.

Sarah motivates you to learn about spatial aggregation, aka spatial binning, with two visualizations of New York taxi data:

No aggregation:




Now that I have your attention, ;-), from the post:

This spatial analysis ability is available using the GIS Tools for Hadoop. There is a new tutorial posted that takes you through the steps of aggregating taxi data. You can find the tutorial here.


A “confusion of logos”?

Sunday, March 29th, 2015

Stuart Wolpert reports that 84 out of 85 students could not draw the Apple logo from memory. 85 college students tried to draw the Apple logo from memory. 84 failed..

Could you draw the ubiquitous Apple computer logo from memory? Probably not, as it turns out.

In a new study published in the Quarterly Journal of Experimental Psychology, UCLA psychologists found that almost none of their subjects could draw the logo correctly from memory. Out of 85 UCLA undergraduate students, only one correctly reproduced the Apple logo when asked to draw it on a blank sheet of paper. Fewer than half the students correctly identified the actual logo when they were shown it among a number of similar logos with slightly altered features.

Among the participants were 52 Apple users, 10 PC users and 23 students who used both Apple and PC products — but the findings did not differ between Apple and PC users.

Just as we have a “murder of crows,” a “pride of lions,” etc., do we need a term for a collection of nearly alike logos?

I ask because of less than one half of those asked could recognize the Apple logo when shown among slightly altered logos.

Would a “confusion of logos” be the correct phrase?

Perhaps Apple will start embedding punch-outs in its ads so you can take an authentic Apple logo with you shopping to avoid confusion while shopping. 😉

PS: The failure of the 84 out of 85 may be explained by 84 out of 85 people not being able to draw an apple from memory, much less an apple based logo. That’s just speculation on my part.

NewsStand: A New View on News (+ Underwear Down Under)

Saturday, March 28th, 2015

NewsStand: A New View on News by Benjamin E. Teitler, et al.


News articles contain a wealth of implicit geographic content that if exposed to readers improves understanding of today’s news. However, most articles are not explicitly geotagged with their geographic content, and few news aggregation systems expose this content to users. A new system named NewsStand is presented that collects, analyzes, and displays news stories in a map interface, thus leveraging on their implicit geographic content. NewsStand monitors RSS feeds from thousands of online news sources and retrieves articles within minutes of publication. It then extracts geographic content from articles using a custom-built geotagger, and groups articles into story clusters using a fast online clustering algorithm. By panning and zooming in NewsStand’s map interface, users can retrieve stories based on both topical signifi cance and geographic region, and see substantially diff erent stories depending on position and zoom level.

Of particular interest to topic map fans:

NewsStand’s geotagger must deal with three problematic cases in disambiguating terms that could be interpreted as locations: geo/non-geo ambiguity, where a given phrase might refer to a geographic location, or some other kind of entity; aliasing, where multiple names refer to the same geographic location, such as “Los Angeles” and “LA”; and geographic name ambiguity or polysemy , where a given name might refer to any of several geographic locations. For example, “Springfield” is the name of many cities in the USA, and thus it is a challenge for disambiguation algorithms to associate with the correct location.

Unless you want to hand disambiguate all geographic references in your sources, this paper merits a close read!

BTW, the paper dates from 2008 and I saw it in a tweet by Kirk Borne, where Kirk pointed to a recent version of NewsStand. Well, sort of “recent.” The latest story I could find was 490 days ago, a tweet from CBS News about the 50th anniversary of the Kennedy assassination in Dallas.

Undaunted I checked out TwitterStand but it seems to suffer from the same staleness of content, albeit it is difficult to tell because links don’t lead to the tweets.

Finally I did try PhotoStand, which judging from the pop-up information on the images, is quite current.

I noticed for Perth, Australia, “A special section of the exhibition has been dedicated to famous dominatrix Madame Lash.”

Sadly this appears to be one the algorithm got incorrect, so members of Congress should not select purchase on their travel arrangements just yet.

Sarah Carty for Daily Mail Australia reports in From modest bloomers to racy corsets: New exhibition uncovers the secret history of women’s underwear… including a unique collection from dominatrix Madam Lash:

From the modesty of bloomers to the seductiveness of lacy corsets, a new exhibition gives us a rare glimpse into the most intimate and private parts of history.

The Powerhouse Museum in Sydney have unveiled their ‘Undressed: 350 Years of Underwear in Fashion’ collection, which features undergarments from the 17th-century to more modern garments worn by celebrities such as Emma Watson, Cindy Crawford and even Queen Victoria.

Apart from a brief stint in Bendigo and Perth, the collection has never been seen by any members of the public before and lead curator Edwina Ehrman believes people will be both shocked and intrigued by what’s on display.

So the collection was once shown in Perth, but for airline reservations you had best book for Sydney.

And no, I won’t leave you without the necessary details:

Undressed: 350 Years of Underwear in Fashion opens at the Powerhouse Museum on March 28 and runs until 12 July 2015. Tickets can be bought here.

Ticket prices do not include transportation expenses to Sydney.

Spoiler alert: The exhibition page says:

Please note that photography is not permitted in this exhibition.


Who You Gonna Call?

Saturday, March 28th, 2015

Confirmation you should join the NRA to protect privacy and the rights of hackers to defend themselves.

The Drug Enforcement Administration abandoned an internal proposal to use surveillance cameras for photographing vehicle license plates near gun shows in the United States to investigate gun-trafficking, the agency’s chief said Wednesday.

DEA Administrator Michelle Leonhart said in a statement that the proposal memorialized in an employee’s email was only a suggestion, never authorized by her agency and never put into action. The AP also learned that the federal Bureau of Alcohol, Tobacco, Firearms and Explosives did not authorize or approve the license plate surveillance plan.

A casual email suggestion warrants two separate high profile denials. That’s power. DEA chief: US abandoned plan to track cars near gun shows.

I’m not sure where else you would investigate gun-trafficking but the bare mention of guns frightens off entire federal agencies.

To protect privacy, who you gonna call?

Tracking NSA/CIA/FBI Agents Just Got Easier

Saturday, March 28th, 2015

I pointed out in The DEA is Stalking You! the widespread use of automobile license reading applications by the DEA. I also suggested that citizens start using their cellphones to take photos of people coming and going from DEA, CIA, FBI offices and posting them online.

The good news is that Big Data has come to the aid of citizens to track NSA/CIA/FBI, etc. agents.

Lisa Vaas writes in Entire Oakland Police license plate reader data set handed to journalist:

Howard Matis, a physicist who works at the Lawrence Berkeley National Laboratory in California, didn’t know that his local police department had license plate readers (LPRs).

But even if they did have LPRs (they do: they have 33 automated units), he wasn’t particularly worried about police capturing his movements.

Until, that is, he gave permission for Ars Technica’s Cyrus Farivar to get data about his own car and its movements around town.

The data is, after all, accessible via public records law.

Ars obtained the entire LPR dataset of the Oakland Police Department (OPD), including more than 4.6 million reads of over 1.1 million unique plates captured in just over 3 years.

Then, to make sense out of data originally provided in 18 Excel spreadsheets, each containing hundreds of thousands of lines, Ars hired a data visualization specialist who created a simple tool that allowed the publication to search any given plate and plot its locations on a map.

How cool is that!?

Of course, your mileage may vary as the original Ars article reports:

In August 2014, the American Civil Liberties Union and the Electronic Frontier Foundation lost a lawsuit to compel the Los Angeles Police Department and the Los Angeles Sheriff’s Department to hand over a mere week’s worth of all LPR data. That case is now on appeal.

The trick being that the state doesn’t mind invading your privacy but is very concerned with you not invading its privacy or knowing enough about its activities to be an informed member of the voting electorate.

If you believe that the government wants to keep information like license reading secret to protect the privacy of other citizens, you need to move to North Korea. I understand they have a very egalitarian society.

Of course these are license reading records collected by the state. Since automobiles are in public view, anyone could start collecting license plate numbers with locations. Now there’s a startup idea. Blanket the more important parts of D.C. inside the Beltway with private license readers. That would be a big data set with commercial value.

To give you an idea of the possibilities, visit Police License Plate Readers at You will find links to a wide variety of license plate reading solutions, including:


A fixed installation device from Vigilant Solutions.

You could wire something together but if you are serious about keeping track of the government keeping track on all of us, you should go with professional grade equipment. As well as adopt an activist response to government surveillance. Being concerned, frightened, “speaking truth to power,” etc. are as effective as doing nothing at all.

Think about a citizen group based license plate data collection. Possible discoveries could include government vehicles at local motels and massage parlors, explaining the donut gaze in some police officer eyes, meetings between regulators and the regulated, a whole range of governmental wrong doing is waiting to be discovered. Think about investing in a mobile license plate reader for your car today!

If you don’t like government surveillance, invite them into the fish bowl.

They have a lot more to hide that you do.

Using Spark DataFrames for large scale data science

Friday, March 27th, 2015

Using Spark DataFrames for large scale data science by Reynold Xin.

From the post:

When we first open sourced Spark, we aimed to provide a simple API for distributed data processing in general-purpose programming languages (Java, Python, Scala). Spark enabled distributed data processing through functional transformations on distributed collections of data (RDDs). This was an incredibly powerful API—tasks that used to take thousands of lines of code to express could be reduced to dozens.

As Spark continues to grow, we want to enable wider audiences beyond big data engineers to leverage the power of distributed processing. The new DataFrame API was created with this goal in mind. This API is inspired by data frames in R and Python (Pandas), but designed from the ground up to support modern big data and data science applications. As an extension to the existing RDD API, DataFrames feature:

  • Ability to scale from kilobytes of data on a single laptop to petabytes on a large cluster
  • Support for a wide array of data formats and storage systems
  • State-of-the-art optimization and code generation through the Spark SQL Catalyst optimizer
  • Seamless integration with all big data tooling and infrastructure via Spark
  • APIs for Python, Java, Scala, and R (in development via SparkR)

For new users familiar with data frames in other programming languages, this API should make them feel at home. For existing Spark users, this extended API will make Spark easier to program, and at the same time improve performance through intelligent optimizations and code-generation.

If you don’t know Spark DataFrames, you are missing out on important Spark capabilities! This post will have to well on the way to recovery.

Even though the reading of data from other sources is “easy” in many cases and support for more is growing, I am troubled by statements like:

DataFrames’ support for data sources enables applications to easily combine data from disparate sources (known as federated query processing in database systems). For example, the following code snippet joins a site’s textual traffic log stored in S3 with a PostgreSQL database to count the number of times each user has visited the site.

That goes well beyond reading data and introduces the concept of combining data, which isn’t the same thing.

For any two data sets that are trivially transparent to you (caveat what is transparent to you may/may not be transparent to others), that example works.

That example fails where data scientists spend 50 to 80 percent of their time: “collecting and preparing unruly digital data.” For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.

If your handlers are content to spend 50 to 80 percent of your time munging data, enjoy. Not that munging data will ever go away, but documenting the semantics of your data can enable you to spend less time munging and more time on enjoyable tasks.

United States Code (from Office of the Law Revision Counsel)

Friday, March 27th, 2015

United States Code (from Office of the Law Revision Counsel)

The download page says:

Current Release Point

Public Law 113-296 Except 113-287

Each update of the United States Code is a release point. This page provides downloadable files for the current release point. All files are current through Public Law 113-296 except for 113-287. Titles in bold have been changed since the last release point.

A User Guide and the USLM Schema and stylesheet are provided for the United States Code in XML. A stylesheet is provided for the XHTML. PCC files are text files containing GPO photocomposition codes (i.e., locators).

Information about the currency of United States Code titles is available on the Currency page. Files for prior release points are available on the Prior Release Points page. Older materials are available on the Annual Historical Archives page.

You can download as much or as little of the United States Code in XML, XHTML, PCC or PDF format.

Oh, yeah, the 113-287 reference does seem rather cryptic. What? You don’t keep up with Public Law numbers? 😉

The short story is that Congress passed a bill to move material on national parks to volume 54 and that hasn’t happened, yet. If you need more details, see: Title 54 of the U.S. Code: Background and Guidance by the National Park Service.

You can think of this as the outcome of the sausage making process. Interesting in its own right but not terribly helpful in divining the process that produced it.


PS: On Ubuntu, the site displays great on Chrome, don’t know about IE*, and poorly on FireFox.

Data and Goliath – Bruce Schneler – New Book!

Friday, March 27th, 2015

In a recent review of Data and Goliath: The Hidden Battles to Capture Your Data and Control Your World by Bruce Schneler, Steven Aftergood writes in Data and Goliath: Confronting the Surveillance Society that:

“More than just being ineffective, the NSA’s surveillance efforts have actually made us less secure,” he says. Indeed, the Privacy and Civil Liberties Oversight Board found the “Section 215″ program for bulk collection of telephone metadata to be nearly useless, as well as likely illegal and problematic in other ways. But by contrast, it also reported that the “Section 702″ collection program had made a valuable contribution to security. Schneier does not engage on this point.

I’m waiting on my copy of Data and Goliath to arrive but I don’t find it surprising that Bruce overlooked and/or chose to not comment on the Section 702 report.

Starting with the full text, Report on the Surveillance Program Operated Pursuant to Section 702 of the Foreign Intelligence Surveillance Act, at one hundred and ninety-six pages (196), you will be surprised at how few actual facts are recited.

In terms of the efficacy of the 702 program, this is fairly typical:

The Section 702 program has proven valuable in a number of ways to the government’s efforts to combat terrorism. It has helped the United States learn more about the membership, leadership structure, priorities, tactics, and plans of international terrorist organizations. It has enabled the discovery of previously unknown terrorist operatives as well as the locations and movements of suspects already known to the government. It has led to the discovery of previously unknown terrorist plots directed against the United States and foreign countries, enabling the disruption of those plots.

That seems rather short on facts and long on conclusions to me. Yes?

Here’s a case the report singles out as a success:

In one case, for example, the NSA was conducting surveillance under Section 702 of an email address used by an extremist based in Yemen. Through that surveillance, the agency discovered a connection between that extremist and an unknown person in Kansas City, Missouri. The NSA passed this information to the FBI, which identified the unknown person, Khalid Ouazzani, and subsequently discovered that he had connections to U.S.-based Al Qaeda associates, who had previously been part of an abandoned early stage plot to bomb the New York Stock Exchange. All of these individuals eventually pled guilty to providing and attempting to provide material support to Al Qaeda.

Recalling that “early stage plot” means a lot of hot talk with no plan for implementation, which accords with pleas to “attempting to provide material support to Al Qaeda.” That’s grotesque.

Oh, another case:

For instance, in September 2009, the NSA monitored under Section 702 the email address of an Al Qaeda courier based in Pakistan. Through that collection, the agency intercepted emails sent to that address from an unknown individual located in the United States. Despite using language designed to mask their true intent, the messages indicated that the sender was urgently seeking advice on the correct mixture of ingredients to use for making explosives. The NSA passed this information to the FBI, which used a national security letter to identify the unknown individual as Najibullah Zazi, located near Denver, Colorado. The FBI then began intense monitoring of Zazi, including physical surveillance and obtaining legal authority to monitor his Internet activity. The Bureau was able to track Zazi as he left Colorado a few days later to drive to New York City, where he and a group of confederates were planning to detonate explosives on subway lines in Manhattan within the week. Once Zazi became aware that law enforcement was tracking him, he returned to Colorado, where he was arrested soon after. Further investigative work identified Zazi’s co-conspirators and located bomb-making components related to the planned attack. Zazi and one of his confederates later pled guilty and cooperated with the government, while another confederate was convicted and sentenced to life imprisonment. Without the initial tip-off about Zazi and his plans, which came about by monitoring an overseas foreigner under Section 702, the subway-bombing plot might have succeeded.

Sorry, that went by rather fast. The unknown sender in the United States did not know how to make explosives? And despite that, the plot is described as “…planning to detonate explosives on subway lines in Manhattan within the week.” Huh? That’s quite a leap from getting advice on explosives to being ready to execute a complex operation.

What’s wrong with the “terrorists” being tracked by the NSA/FBI? Almost without exception, they lack the skills to make bombs. The FBI fills in, supplying bombs in many cases, Cleveland, 2012, Portland, 2010, and that’s two I remember right off hand. (I don’t have a complete list of terror plots where the FBI supplies the bomb or bomb making materials. Do you? It would save me the work of putting one together. Thanks!)

A more general claim rounds out the “facts” claimed by the report:

A rough count of these cases identifies well over one hundred arrests on terrorism-related offenses. In other cases that did not lead to disruption of a plot or apprehension of conspirators, Section 702 appears to have been used to provide warnings about a continuing threat or to assist in investigations that remain ongoing. Approximately fifteen of the cases we reviewed involved some connection to the United States, such as the site of a planned attack or the location of operatives, while approximately forty cases exclusively involved operatives and plots in foreign countries.

Well, we know that “terrorism-related offense” includes “…attempting to provide material support to Al Qaeda.” And that conspiracy to commit a terrorist act can consist of talking about wanting to commit a terrorist act with no ability to put such a plan in action. Like no knowing how to make a bomb. Fairly serious impediment there, at least for a would be terrorist.

Not to mention that detention has no real relationship to the commission of a crime, as we have stood witness to at Guantanamo Bay (directions).

In Bruce’s defense, like he needs my help!, ;-), no one has an obligation to refute every lie told in support of government surveillance or its highly fictionalized “war on terrorism.” To no small degree, repeating those lies ad nauseam gives them credibility in group think circles, such as inside the beltway in D.C. Especially among agencies whose budgets depend upon those lies and the contractors who profit from them.

Treat yourself to some truth about cybersecurity, order your copy of Data and Goliath: The Hidden Battles to Capture Your Data and Control Your World by Bruce Schneler.

Congressional Influence Model [How To Choose Allies 4 Hackers]

Thursday, March 26th, 2015

Congressional Influence Model by Westley Hennigh.

From the webpage:

This is a collection of data and code for investigating influence in Congress. Specifically, it uses data generated by MapLight and the Center for Responsive Politics to identify opposing interest groups and analyze their political contributions.

Unfortunately, due to size constraints, not all of the campaign finance data can be included in this repo. But if you’re curious you can download it using this scraper (see further instructions there).

I found this following the data for:

When interest groups disagreed on legislation, who did the 113th Congress vote with?

Sorted to show groups most frequently on opposite sides of legislation


To fully appreciate the graphic, see the original at: Congress is a Game, and We Have the Data to Show Who’s Winning by Westley Hennigh.

Where Westley also notes after the graphic:

Amongst more ideologically focused groups the situation is much the same. Conservative Republican interests were very often at odds with both health and welfare and human rights advocates, but Congress stood firmly with conservatives. They were almost twice as likely to vote against the interests of human rights advocates, and more than twice as likely to vote against health & welfare policy organizations.

The force driving this correlation between support by certain groups and favorable votes in Congress isn’t incalculable or hard to guess at. It’s money. The groups above that come out on top control massive amounts of political campaign spending relative to their opponents. The conservative Republican interests in conflict with health and welfare policy groups spent an average of 26 times as much on candidates that won seats in the 113th Congress. They outspent human rights advocates by even more — 300 times as much on average. The Chambers of Commerce, meanwhile, has spent more on lobbying than any other group every year since 1999.

As Westley points out, this is something we all “knew” at some level and now the data makes the correlation between money and policy undeniable.

My first reaction was Westley’s data is a good start towards: How much is that Representative/Senator in the window? The one with the waggly tail., a website where the minimum contribution for legislative votes, taking your calls, etc., is estimated for each member of the United States House and Senate. Interest groups could avoid overpaying for junior members and embarrassing themselves with paltry contributions to more senior members. Think of it as a public price list for legislation.

A How much is that Representative/Senator in the window? The one with the waggly tail. website would be very amusing, but it wouldn’t help me because I don’t have that sort of money. And it isn’t a straight out purchase, which is how they avoid the quid pro quo issue. Many of these interest groups have been greasing the palms of, sorry, contributing to, politicians for years.

In order to gain power by contributions, real power, requires a contribution/issue campaign that spans the political careers of multiple politicians, starting at the state and local level and following those careers into Congress. Which means, of course, getting upset about this or that outrage isn’t enough to sustain the required degree of organization and contributions. Contributions and reminders of contributions have to flow 7 x 365, in good years and lean years, perhaps even more so in lean (non-election) years.

Not to mention that you will need to make friends fast and enemies, permanent ones anyway, very slowly. Perhaps a member of Congress has too much local opposition to favor your side on a minor bill. They have simply be absent rather than vote. You have to learn to live with the reality that your representative/senator has other pressure points. Not unless you want to own one outright. They exist I have no doubt but the asking price would be very high. Easier to get one issue representatives elected than senators but I don’t know how useful that would be in the long term.

After thinking about it for a while, I concluded we know three things for sure:

  • Congress votes with conservatives twice as often as human rights advocates.
  • Conservatives outspend other groups and have for decades.
  • Outspending conservatives would require national/state/local contributions for decades.

Based on those facts, would you choose an ally that:

  • Loses twice as often on their issues as other groups?
  • Doesn’t regularly contributed to campaigns at state/local/federal levels?
  • That has no effective national/state/local organization that has persisted for decades?

How you frame your issues makes a difference in available allies.

Take for example the ACLU and its suit against the NSA to take back the Internet Backbone. The NSA Has Taken Over the Internet Backbone. We’re Suing to Get it Back.

The ACLU complaint against the NSA has issues such as:

48. Plaintiffs are educational, legal, human rights, and media organizations. Their work requires them to engage in sensitive and sometimes privileged communications, both international and domestic, with journalists, clients, experts, attorneys, civil society organizations, foreign government officials, and victims of human rights abuses, among others.

49. By intercepting, copying, and reviewing substantially all international text-based communications—and many domestic communications as well—as they transit telecommunications networks inside the United States, the government is seizing and searching Plaintiffs’ communications in violation of the FAA and the Constitution.

Really makes you feel like girding your loins and putting on body armor doesn’t it? Almost fifty (50) pages of such riveting prose.

Don’t get me wrong, I support the ACLU and deeply appreciate their suing the NSA. The NSA needs to be opposed in every venue by everyone who cares about having any semblance of freedom in the United States.

I hope the ACLU is victorious but at best, the NSA will be forced to obey existing laws, assuming you can trust known liars when they say “…now we are obeying the law, but we can’t let you see that we are obeying the law.” Somehow that doesn’t fill me with confidence, assuming the ACLU is successful.

What happens if we re-phrase the issue of NSA surveillance? So we can choose stronger allies to have on our side? Take the mass collection of credit card data for example. Sweeping NSA Surveillance Includes Credit-Card Transactions, Top Three Phone Companies’ Records by Ryan Gallagher.

What would credit card data enable? Hmmm, can you say a de facto national gun registry? With purchase records for guns and ammunition? What reason other than ownership would I have for buying .460 Weatherby Magnum ammunition?

By framing the issue of surveillance as a gun registration issue, we find the NRA joining with the ACLU and others in ACLU vs. Clapper, No. 13-cv-03994 (WHP), saying:

For more than 50 years since its decision in Nat’l Ass’n for Advancement of Colored People v. State of Ala. ex rel. Patterson, 357 U.S. 449 (1958), the Supreme Court has recognized that involuntary disclosure of the membership of advocacy groups inhibits the exercise of First Amendment rights by those groups. For nearly as long—since the debates leading up to enactment of the Gun Control Act of 1968—the Congress has recognized that government recordkeeping on gun owners inhibits the exercise of Second Amendment rights. The mass surveillance program raises both issues, potentially providing the government not only with the means of identifying members and others who communicate with the NRA and other advocacy groups, but also with the means of identifying gun owners without their knowledge or consent, contrary to longstanding congressional policy repeatedly reaffirmed and strengthened by Congresses that enacted and reauthorized the legislation at issue in this case. The potential effect on gun owners’ privacy is illustrative of the potential effect of the government’s interpretation of the statute on other statutorily protected privacy rights. The injunction should be issued.

That particular suit was unsuccessful at the district court level but that should give you an idea of how “framing” an issue can enable you to attract allies who are more successful than most.

With support of the ACLU, perhaps, just perhaps the NSA will be told to obey the law. Guesses for grabs on how successful that “telling” will be.

With the support of the NRA and similar groups, the very existence of the NSA data archives will come into question. Not beyond possibility that the NSA will be returned to its former, much smaller footprint of legitimate cryptography work.

And what of other NRA positions? (shrugs) I’m sure that any group you look closely enough at will stand for something you don’t like. As I put it to a theologically diverse group forming to create a Bible encoding, “I’m looking for allies, not soul mates. I already have one of those.”


PS: As of April, 2014, Overview of Constitutional Challenges to NSA Collection Activities and Recent Developments, is a summary of legal challenges to the NSA. Dated but I thought it might be helpful.

2nd Amendment-Summary-4-Hackers

Wednesday, March 25th, 2015

As promised, not a deeply technical (legal) analysis of District of Columbia vs. Heller but summary of the major themes in Scalia’s opinion for the majority.

District of Columbia v. Heller, 554 U.S. 570, 128 S. Ct. 2783, 171 L. Ed. 2d 637 (2008) [2008 BL 136680] has the following pagination markers:

* U.S. (official)
** S. Ct. (West Publishing)
*** L. Ed. 2d (Lawyers Editon 2nd)
**** BL (Bloomberg Law)

In the text you will see: [*577] for example which is the start of page 577 in the official version of the opinion. I use the official pagination herein.

Facts: Heller, a police officer applied for a handgun permit, which was denied. Without a permit, possession of a handgun was banned in the District of Columbia. Even if a permit were obtained, the handgun had to be disabled and unloaded. Heller sued the district saying that the Second Amendment protects an individual’s right to possess firearms and that the city’s ban on handguns and the non-functioning requirement, should the handgun be required for self-defense, infringed on that right.

[Observation: When challenging a law on constitutional grounds, get an appropriate plaintiff to bring the suit. I haven’t done the factual background but I rather doubt that Heller was just an ordinary police officer who decided on his own to sue the District of Columbia. Taking a case to the Supreme Court is an expensive proposition. In challenging laws that infringe on hackers, use security researchers, universities, people with clean reputations. Not saying you can’t win with others but on policy debates its better to wear your best clothes.]

Law: Second Amendment: “A well regulated Militia, being necessary to the security of a free State, the right of the people to keep and bear Arms, shall not be infringed.”

Scalia begins by observing:

“[t]he Constitution was written to be understood by the voters; its words and phrases were used in their normal and ordinary as distinguished from [****4] technical meaning.” United States v. Sprague, 282 U. S. 716, 731 (1931); see also Gibbons v. Ogden, 9 Wheat. 1, 188 (1824). [*576]

The crux of Scalia’s argument comes early and is stated quite simply:

The Second Amendment is naturally divided into two parts: its prefatory clause and its operative clause. The former does not limit the latter grammatically, but rather announces a purpose. The Amendment could be rephrased, “Because a well regulated Militia is necessary to the security of a free State, the right of the people to keep and bear Arms shall not be infringed.” [*577]

With that “obvious” construction, Scalia sweeps to one side all arguments that attempt to limit the right to bear arms to a militia context. Its just an observation, not binding in any way on the operative clause. He does retain it for later use to argue the interpretation of the operative clause is consistent with that purpose.

Scalia breaks his analysis of the operative clause into the following pieces:

a. “Right of the People.”

b. “Keep and Bear Arms”

c. Meaning of the Operative Clause.

“Right of the People.” In perhaps the strongest part of the opinion, Scalia observes that “right of the people” occurs in the unamended Constitution and Bill of Rights only two other times, First Amendment (assemby-and-petition clause) and Fourth Amendment (search-and-seizure) clause. The Fourth Amendment has fallen on hard times of late but the First Amendment is still attractive to many. He leaves little doubt that the right to “keep and bear arms” (the next question), is undoubtedly meant to be an individual right. [*579]

“Keep and Bear Arms” Before turning to “keep” and “bear,” Scalia makes two important points with regard to “arms:”

Before addressing the verbs “keep” and “bear,” we interpret their object: “Arms.” The 18th-century meaning is no different from the meaning today. The 1773 edition of Samuel Johnson’s dictionary defined “arms” as “[w]eapons of offence, or armour of defence.” 1 Dictionary of the English Language 106 (4th ed.) (reprinted 1978) (hereinafter Johnson). Timothy Cunningham'[****6] s important 1771 legal dictionary defined “arms” as “any thing that a man wears for his defence, or takes into his hands, or useth in wrath to cast at or strike another.” 1 A New and Complete Law Dictionary; see also N. Webster, American Dictionary of the English Language (1828) (reprinted 1989) (hereinafter Webster) (similar).

The term was applied, then as now, to weapons that were not specifically designed for military use and were not employed in a military capacity. For instance, Cunningham’s legal dictionary gave as an example of usage: “Servants and labourers shall use bows and arrows on Sundays, & c. and not bear other arms.” See also, e.g., An Act for the trial of Negroes, 1797 Del. Laws ch. XLIII, § 6, in 1 First Laws of the State of Delaware 102, 104 (J. Cushing ed. 1981 (pt. 1)); see generally State v. Duke, 42 Tex. 455, 458 (1874) (citing decisions of state courts construing “arms”). Although one founding-era thesaurus limited “arms” (as opposed to “weapons”) to “instruments of offence generally made use of in war,” even that source stated that all firearms constituted “arms.” 1 J. Trusler, The Distinction Between Words Esteemed [*582] Synonymous in the English Language 37 (3d ed. 1794) (emphasis added).

Some have made the argument, bordering on the frivolous, that only those arms in existence in the 18th century are protected by the Second Amendment. We do not interpret constitutional rights that way. Just as the First Amendment protects modern forms of communications, e. g., Reno v. American Civil Liberties Union, 521 U. S. 844, 849 (1997), and the Fourth Amendment applies to modern forms of search, e.g., Kyllo v. United States, 533 U. S. 27, 35-36 (2001), the Second Amendment extends, [**2792] prima facie, to all instruments that constitute bearable arms, even those that were not in existence at the time of the founding. [*581-*582]

Although he says “The 18th-century meaning is no different from the meaning today.” at the outset, the sources cited make it clear that it is the character of an item as a means of offense or defense, generally used in war, that makes it fall into the category “arms.” Which extends to bows and arrows as well as 18th century firearms as well as modern firearms.

“Arms” not limited to 18th Century “Arms”

The second point, particularly relevant to hackers, is that arms are not limited to those existing in the 18th century. Scalia specifically calls out both First and Fourth Amendment cases where rights have evolved along with modern technology. The adaptation to modern technology under those amendments is particularly relevant to making a hackers argument under the Second Amendment.

Posession/Bearing Arms

The meaning of “keep arms” requires only a paragraph or two:

Thus, the most natural reading of “keep Arms” in the Second Amendment is to “have weapons.” [*582]

Which settles the possession of arms question, but what about the right to carry such arms?

The notion of “bear arms” devolves into a lively contest of snipes between Scalia and Stevens. You can read both the majority opinion and the dissent if you are interested but the crucial text reads:

We think that JUSTICE GINSBURG accurately captured the natural meaning of “bear arms.” Although the phrase implies that the carrying of the weapon is for the purpose of “offensive or defensive action,” it in no way connotes participation in a structured military organization.

From our review of founding-era sources, we conclude that this natural meaning was also the meaning that “bear arms” had in the 18th century. In numerous instances, “bear arms” was unambiguously used to refer to the carrying of weapons outside of an organized militia. [*584]

I mention that point just in case some wag argues that cyber weapons should be limited to your local militia or that you don’t have the right to carry such weapons on your laptop, cellphone, USB drive, etc.

Meaning of the Operative Clause

c. Meaning of the Operative Clause. [4] Putting all of these textual elements together, we find that they guarantee the individual right to possess and carry weapons in case of confrontation. This meaning is strongly confirmed by the historical background of the Second Amendment. [5] We look to this because it has always been widely understood that the Second Amendment, like the First and Fourth Amendments, codified a pre-existing right. The very text of the Second Amendment implicitly recognizes the pre-existence of the right and declares only that it “shall not be infringed.” As we said in United [****11] States v. Cruikshank, 92 U. S. 542, 553 (1876), “[t]his is not a right granted by the Constitution. Neither is it in any manner dependent upon that instrument for its existence. The [**2798] second amendment declares [***658] that it shall not be infringed. . . .”[fn16] [*592]

You can’t get much better than a pre-existing right, at least not with the current Supreme Court composition. Certainly sounds like it would extent to defending your computer systems, which the government seems loathe to undertake.

Motivation for the Second Amendment

Skipping over the literalist interpretation of the prefactory clause, Scalia returns to the relationship between the prefatory and operative clause. The opinion goes on for twenty-one (21) pages at this point but an early paragraph captures the gist of the argument if not all of its details:

The debate with respect to the right to keep and bear arms, as with other guarantees in the Bill of Rights, was not over whether it was desirable (all agreed that it was) but over whether it needed to be codified in the Constitution. During the 1788 ratification debates, the fear that the Federal Government would disarm the people in order to impose rule through a standing army or select militia was pervasive in Anti-federalist rhetoric. See, e. g., Letters from The Federal Farmer III (Oct. 10, 1787), in 2 The Complete Anti-Federalist 234, 242 (H. Storing ed. 1981). John Smilie, for example, worried not only that Congress’s “command of the militia” could be used to create a “select militia,” or to have “no militia at all,” but also, as a separate concern, that “[w]hen a select militia is formed; the people in general may be disarmed.” 2 Documentary History of the Ratification of the Constitution 508-509 (M. Jensen ed. 1976) (hereinafter [*599] Documentary Hist.). Federalists responded that because Congress was given no power to abridge the ancient right of individuals to keep and bear arms, such a force could never oppress the people. See, e.g., A Pennsylvanian III (Feb. 20, 1788), in The Origin of the Second Amendment 275, [****15] 276 (D. Young ed., 2d ed. 2001) (hereinafter Young); White, To the Citizens of Virginia (Feb. 22, 1788), in id., at 280, 281; A Citizen of America (Oct. 10, 1787), in id., at 38, 40; Foreign Spectator, Remarks on the Amendments to the Federal Constitution, Nov. 7, 1788, in id., at 556. It was understood across the political spectrum that the right helped to secure the ideal of a citizen militia, which might be necessary to oppose an oppressive military force if the constitutional order broke down.[*598-*599]

Whether you choose to emphasize the disarming of the people by regulation of cyberweapons or the overreaching of the Federal government, the language here is clearly of interest in arguing for cyberweapons under the Second Amendment. The majority opinion on this point is found at pages [*598-*619].

Limitations on “Arms”

The right to possess arms, including cyberweapons, isn’t a slam dunk. The Federal and State governments can place some regulations on the possession of arms. One example that Scalia discusses is United States v. Miller, 307 U. S. 174, 179 (1939). Reading Miller:

…to say only that the Second Amendment [**2816] does not protect those weapons not typically possessed by law-abiding citizens for lawful purposes, such as short-barreled shotguns. That accords with the historical understanding of the scope of the right, see Part III, infra.[fn25] [*625]

So hackers will lose on blue boxes, if you know the reference but quite possibly win on software, code, etc. So far as I know, no one has challeged the right of computer users to protect themselves.

Is there a balancing test for cyber weapons?

The balance of the opinion is concerned with the case at hand and sparring with Justice Breyer but it does have this jewel when it is suggested that the Second Amendment should be subject to a balancing test (a likely argument about cyber weapons):

We know of no other enumerated constitutional right whose core protection has been subjected to a freestanding “interest-balancing” approach. The very enumeration of the right takes out of the hands of government — even the Third Branch of Government — the power to decide on a case-by-case basis whether the right is really worth insisting upon. A constitutional guarantee subject to future judges’ assessments of its usefulness is no constitutional guarantee at all. [15] Constitutional rights are enshrined with the scope they were understood to have when the people adopted [*635] them, whether or not future legislatures or (yes) even future judges think that scope too broad. We would not apply an “interest-balancing” approach to the prohibition of a peaceful neo-Nazi march through Skokie. See National Socialist Party of America v. Skokie, 432 U. S. 43 (1977) (per curiam). The First Amendment contains the freedom-of-speech guarantee that the people ratified, which included exceptions for obscenity, libel, and disclosure of state secrets, but not for the expression of extremely unpopular and wrongheaded views. The Second Amendment is no different. Like the First, it is the very product of an interest balancing by the people — which JUSTICE BREYER would now conduct for them anew. And whatever else it leaves to future evaluation, it surely elevates above all other interests the right of law-abiding, responsible citizens to use arms in defense of hearth and home. [*634-*635]

I rather like the lines:

The very enumeration of the right takes out of the hands of government — even the Third Branch of Government — the power to decide on a case-by-case basis whether the right is really worth insisting upon. A constitutional guarantee subject to future judges’ assessments of its usefulness is no constitutional guarantee at all.

Is the right to privacy no right at all because the intelligence community lapdog FISA court decides in secret when our right to privacy is unnecessary?

Open Issues

Forests have been depopulated to produce the paper required for all the commentaries on District of Columbia v. Heller. What I have penned above is a highly selective summary in hopes of creating interest in a Second Amendment argument for the possession and discussion of cyber weapons.

Open issues include:

  • Evolution of the notion of “arms” for the Second Amendment.
  • What does it mean to posses a cyber weapon? Is code required? Binary?
  • Defensive purposes of knowledge or cyber weapons.
  • Analogies to disarming the public.
  • Others?

As I suggested in A Well Regulated Militia, a Second Amendment argument to protect our rights to cyber weapons could prove to be more successful than other efforts to date.

Unless you like being disarmed while government funded hackers invade your privacy of course.

Let me know if you are interested in sponsoring research on Second Amendment protection for cyber weapons.

PS: Just so you know, I took my own advice and joined the NRA earlier this week. Fights like this can only be won with allies, strong allies.

Who’s Pissed Off at the United States?

Tuesday, March 24th, 2015

Instances of Use of United States Armed Forces Abroad, 1798-2015 by Barbara Salazar Torreon (Congressional Research Service).

From the summary:

This report lists hundreds of instances in which the United States has used its Armed Forces abroad in situations of military conflict or potential conflict or for other than normal peacetime purposes. It was compiled in part from various older lists and is intended primarily to provide a rough survey of past U.S. military ventures abroad, without reference to the magnitude of the given instance noted. The listing often contains references, especially from 1980 forward, to continuing military deployments, especially U.S. military participation in multinational operations associated with NATO or the United Nations. Most of these post-1980 instances are summaries based on presidential reports to Congress related to the War Powers Resolution. A comprehensive commentary regarding any of the instances listed is not undertaken here.

One of the first steps in security analysis is an evaluation of potential attackers. Who has a reason (in their eyes) to go to the time and trouble of attacking you?

Such a list for the United States doesn’t narrow the field by much but it may help avoid overlooking some of the less obvious candidates. To be sure the United States will keep China and North Korea as convenient whipping boys for any domestic cyber misadventures, but that’s just PR. Why would our largest creditor want to screw with our ability to pay them back? All of the antics about China are street theater, far away from where real decisions are made.

What amazes me is despite centuries of misbehavior by American administration after American administration, that places like Vietnam want to have peaceful relations with us. They aren’t carrying a grudge. Hard to say that for the engineers of one U.S. foreign policy disaster after another.

You could also think of the more recent incidents as the starting point of a list of people to hound from public office and/or public service. Either way, I think you will find it useful.

I first saw this in a tweet by the U.S. Dept. of Fear.

Bulk Collection of Signals Intelligence: Technical Options (2015)

Tuesday, March 24th, 2015

Bulk Collection of Signals Intelligence: Technical Options (2015)

From the webpage:

The Bulk Collection of Signals Intelligence: Technical Options study is a result of an activity called for in Presidential Policy Directive 28 (PPD-28), issued by President Obama in January 2014, to evaluate U.S. signals intelligence practices. The directive instructed the Office of the Director of National Intelligence (ODNI) to produce a report within one year “assessing the feasibility of creating software that would allow the intelligence community more easily to conduct targeted information acquisition rather than bulk collection.” ODNI asked the National Research Council (NRC) — the operating arm of the National Academy of Sciences and National Academy of Engineering — to conduct a study, which began in June 2014, to assist in preparing a response to the President. Over the ensuing months, a committee of experts appointed by the Research Council produced the report.

Useful background information for engaging on the policy side of collecting signals intelligence. Since I don’t share the starting assumption that bulk collection of signals intelligence is ever justified inside the United States, it is only of passing interest to me. I concede that in some limited cases surveillance can be authorized but only under the Fourth Amendment and then only by a constitutional court and not a FISA star chamber.

I would have posted a copy for your downloading so you could avoid registration but the work carries this restriction:

Distribution, posting, or copying of this PDF is strictly prohibited without written permission of the National Academies Press

Despite doubting I am on any list any where, still, it isn’t smart to give anyone a free shot. 😉

The main difficulty in challenging such reports is that fictions, invented by the intelligence agencies, are take as facts. Such as the oft reported fiction that bulk collection/retention helps when a new figure is identified. To enable the agencies to consider their past activities. Certainly a theoretical possibility to be sure but how many cases and what were the results of that backtracking are unknown. Quite possibly to the intelligence agencies themselves.

If you have identified someone as a current credible threat, perhaps even on their way to commit an illegal act, who is going to worry about their phone conversations several years ago? Of course, that’s where their “logic” for immediate action runs counter to the fact they are simply inventing work for themselves. The more data they collect, the larger their IT budget and the more people needed just in case they ever want to search it. Complete and total farce.

That’s the other reason I oppose build signals intelligence collection in the United States, it is an incompetent waste of funds. Funds that could be spent on non-manipulative aid to the people of the Middle East (not their governments), which would greatly reduce the odds of anyone being unhappy enough with the United States to commit a terrorist act on its soil. Despite the fact the United States has committed numerous terrorist attacks on theirs.

Sorting [Visualization]

Tuesday, March 24th, 2015

Carlo Zapponi created, a visualization of sorting resource that steps through different sorting algorithms. You can choose from four different initial states, six (6) different sizes (5, 10, 20, 50, 75, 100), and six (6) different colors.

The page defaults to Quick Sort and Heap Sort, but under add algorithms you will find:

I added Wikipedia links for the algorithms. For a larger list see:
Sorting algorithm.

I first saw this in a tweet by Eric Christensen.

Bearing Arms – 2nd Amendment and Hackers – The Constitution

Monday, March 23rd, 2015

All discussions of the right to bear arms in the United States start with the Second Amendment. But since words can’t interpret themselves for specific cases, our next stop is the United States Supreme Court.

One popular resource, The Constitution of the United States of America: Analysis and Interpretation (popularly known as the Constitution Annotated), covers the Second Amendment in a scant five (5) pages.

There is a vast sea of literature on the Second Amendment but there is one case that established the right to bear arms is an individual right and not limited to state militias.

In District of Columbia vs. Heller, 554 U.S. 570 (2008), Justice Scalia writing for the majority found that the right to bear arms was an individual right, for the first time in U.S. history.

The unofficial syllabus notes:

The prefatory clause comports with the Court’s interpretation of the operative clause. The “militia” comprised all males physically capable of acting in concert for the common defense. The Antifederalists feared that the Federal Government would disarm the people in order to disable this citizens’ militia, enabling a politicized standing army or a select militia to rule. The response was to deny Congress power to abridge the ancient right of individuals to keep and bear arms, so that the ideal of a citizens’ militia would be preserved. Pp. 22–28.

Interesting yes? Disarm the people in order to enable “…a politicized standing army (read NSA/CIA/FBI/DHS) or a select militia to rule.”

If citizens are prevented from owning hacking software and information, necessary for their own cybersecurity, have they not been disarmed?

Justice Scalia’s opinion is rich in historical detail and I will be teasing out the threads that seem most relevant to an argument that hacking tools and knowledge should fall under the right to bear arms under the Second Amendment.

In the mean time, some resources that you will find interesting/helpful:

District of Columbia v. Heller in Wikipedia is a quick read and a good way to get introduced to the case and the issues it raises. But only as an introduction, you would not perform surgery based on a newspaper report of a surgery. Yes?

A definite step up in analysis is SCOTUSblog, District of Columbia v. Heller. You will find twenty (20) blog posts on Heller, briefs and documents in the case, plus some twenty (20) briefs supporting the petitioner (District of Columbia) and forty-seven (47) briefs supporting the respondent (Heller). Noting that attorneys could be asked questions about any and all of the theories advanced in the various briefs.

Take this as an illustration of why I don’t visit SCOTUSblog as often as I should. I tend to get lost in the analysis and start chasing threads through the opinions and briefs. One of the many joys being that rarely you find anyone with a hand waving citation “over there, somewhere” as you do in CS literature. Citations are precise or not at all.

No, I don’t propose to drag you through all of the details even of Scalia’s majority opinion but just enough to frame the questions to be answered in making the claim that cyber weapons are the legitimate heirs of arms for purposes of the Second Amendment and entitled to the same protection as firearms.

Do some background reading today and tomorrow. I am re-reading Scalia’s opinion now and will let it soak in for a day or so before posting an outline of it relevant for our purposes. Look for it late on Wednesday, 25 March 2015.

PS: Columbia vs. Heller, 554 U.S. 570 (2008), the full opinion plus dissents. A little over one hundred and fifty (150) pages of very precise writing. Enjoy!

Association Rule Mining – Not Your Typical Data Science Algorithm

Monday, March 23rd, 2015

Association Rule Mining – Not Your Typical Data Science Algorithm by Dr. Kirk Borne.

From the post:

Many machine learning algorithms that are used for data mining and data science work with numeric data. And many algorithms tend to be very mathematical (such as Support Vector Machines, which we previously discussed). But, association rule mining is perfect for categorical (non-numeric) data and it involves little more than simple counting! That’s the kind of algorithm that MapReduce is really good at, and it can also lead to some really interesting discoveries.

Association rule mining is primarily focused on finding frequent co-occurring associations among a collection of items. It is sometimes referred to as “Market Basket Analysis”, since that was the original application area of association mining. The goal is to find associations of items that occur together more often than you would expect from a random sampling of all possibilities. The classic example of this is the famous Beer and Diapers association that is often mentioned in data mining books. The story goes like this: men who go to the store to buy diapers will also tend to buy beer at the same time. Let us illustrate this with a simple example. Suppose that a store’s retail transactions database includes the following information:

If you aren’t familiar with association rule mining, I think you will find Dr. Borne’s post an entertaining introduction.

I would not go quite as far as Dr. Borne with “explanations” for the pop-tart purchases before hurricanes. For retail purposes, so long as we spot the pattern, they could be building dikes out of them. The same is the case for other purchases. Take advantage of the patterns and try to avoid second guessing consumers. You can read more about testing patterns Selling Blue Elephants.