Google Sanctions on France

August 3rd, 2015

Google defies French global ‘right to be forgotten’ ruling by Lee Munson.

From the post:

Last month the French data protection authority – the Commission nationale de l’informatique et des libertés (CNIL) – told Google that successful right to be forgotten requests made by Europeans should be applied across all of the company’s search engines, not just those in Europe.

In response, Google yesterday gave its unequivocal answer to that request: “Non!”

Writing on the company’s Google Europe blog, Peter Fleischer, Global Privacy Counsel, explained how the search giant had complied with the original “right to delist” ruling – which gives EU citizens the right to ask internet search engines to remove embarrassing, sensitive or inaccurate results for search queries that include their name – made by the Court of Justice of the European Union in 2014.

Google does a great job of outlining the consequences of allowing global reach of right to be forgotten rulings:

While the right to be forgotten may now be the law in Europe, it is not the law globally. Moreover, there are innumerable examples around the world where content that is declared illegal under the laws of one country, would be deemed legal in others: Thailand criminalizes some speech that is critical of its King, Turkey criminalizes some speech that is critical of Ataturk, and Russia outlaws some speech that is deemed to be “gay propaganda.”

If the CNIL’s proposed approach were to be embraced as the standard for Internet regulation, we would find ourselves in a race to the bottom. In the end, the Internet would only be as free as the world’s least free place.

We believe that no one country should have the authority to control what content someone in a second country can access. We also believe this order is disproportionate and unnecessary, given that the overwhelming majority of French internet users—currently around 97%—access a European version of Google’s search engine like google.fr, rather than google.com or any other version of Google.

As a matter of principle, therefore, we respectfully disagree with the CNIL’s assertion of global authority on this issue and we have asked the CNIL to withdraw its Formal Notice.

The only part of the post where I diverge from Google is with its “we respectfully disagree…” language.

The longer Google delays, the less interest on any possible penalty but I rather doubt that French regulators are going to back off. France is no doubt encouraged by similar efforts in Canada and Russia as reported by Lee Munson.

Google needs to sanction France before a critical mass of nations take up the censorship banner.

What sanctions? Stop servers, along with cloud and other computing services.

See how the French economy and the people who depend on it react to a crippling loss of service.

The French people are responsible for the fools attempting to be global censors of the Internet. They can damned well turn them out as well.

Disclosure Disrupts the Zero-Day Market

August 3rd, 2015

Robert Lemos writes in Hacking Team Leak Could Lead to Policies Curtailing Security Research:

Within days, Netragard decided to exit the business of brokering exploit sales—a minor part of its overall business—until better regulations and laws could guarantee sold exploits went to legitimate authorities.

The decision underscores that the breach of Hacking Team’s network, and the resulting leak of sensitive business information, is continuing to have major impacts in the security industry.

The disclosure of seven zero-day vulnerabilities—four in Adobe Flash, two in Windows and one in Internet Explorer, according to vulnerability management firm Bugcrowd’s tally—has already enabled commodity attack software sold in underground malware markets to target otherwise protected systems.

“Those exploits were out there, but they were being used in a limited fashion,” Kymberlee Price, senior director of researcher operations at Bugcrowd, told eWEEK. “Now, they are being used extensively.”

Research has shown that a dramatic spike in usage, sometimes as much as a factor of 100,000, can occur following the public release of an exploit in popular software.

Imagine Rick’s reaction on Pawn Stars if you were trying to sell him a very rare gemstone and the local news reports that 100,000 of them have just been discovered outside of Las Vegas, Nevada.

Public disclosure of zero-day vulnerabilities effectively guts the zero-day market for those techniques.

Now I understand why some security experts and researchers have promoted a cult of secrecy around zero-day vulnerabilities and other exploits.

Public disclosure, which enables customers to avoid exploits and/or put pressure on vendors, guts the market for sale of those same exploits to “legitimate authorities.”

Netragard wants regulations to limit the sale of exploits, which keeps the exploit market small and the prices high.

I can understand its motivation from an economic point of view.

I am sure the staff at Netragard sincerely intend:

0-days’s are nothing more than useful tools that when placed in the right hands can benefit the greater good.

That 0-day regulations will maintain the market price for 0-days is just happenstance.

If anything, 0-days and other exploits need more immediate and widespread publicity. That will be unfortunate for 0-day exploit sellers but they will be casualties of openness.

Openness is what will eventually create a disparity between vendors who exercise due diligence on cybersecurity and those who don’t.

Without openness, users are left at the mercy of 0-day vendors and “legitimate authorities.”

PS: There has been some indirect empirical research done on the impact of disclosure on exploit markets. See: Before We Knew It – An Empirical Study of Zero-Day Attacks In The Real World by Leyla Bilge and Tudor Dumitras.

Targeting 950 Million Android Phones – Open Source Security Checks?

August 3rd, 2015

How to Hack Millions of Android Phones Using Stagefright Bug, Without Sending MMS by Swati Khandelwal.

From the post:

Earlier this week, security researchers at Zimperium revealed a high-severity vulnerability in Android platforms that allowed a single multimedia text message to hack 950 Million Android smartphones and tablets.

As explained in our previous article, the critical flaw resides in a core Android component called “Stagefright,” a native Android media playback library used by Android to process, record and play multimedia files.

To Exploit Stagefright vulnerability, which is actively being exploited in the wild, all an attacker needed is your phone number to send a malicious MMS message and compromise your Android device with no action, no indication required from your side.

Security researchers from Trend Micro have discovered two new attack scenarios that could trigger Stagefright vulnerability without sending malicious multimedia messages:

  • Trigger Exploit from Android Application
  • Crafted HTML exploit to Target visitors of a Webpage on the Internet

These two new Stagefright attack vectors carry more serious security implications than the previous one, as an attacker could exploit the bug remotely to:

  • Hack millions of Android devices, without knowing their phone numbers and spending a penny.
  • Steal Massive Amount of data.
  • Built a botnet network of Hacked Android Devices, etc.

“The specially crafted MP4 file will cause mediaserver’s heap to be destroyed or exploited,” researchers explained how an application could be used to trigger Stagefright attack.

Swati has video demonstrations of both of the new attack vectors and covers defensive measures for users.

Does the presence of such a bug in software from Google, which has access to almost unlimited programming talent and, to hear it tell the tale, the best programming talent in the business, make you curious about security for the Internet of Things (IoT)?

Or has Google been practicing “good enough” software development and cutting corners on testing for bugs and security flaws?

Now that I think about it, Android is an open source project and as we all know, given enough eyeballs, all bugs are shallow (Linus’s Law).

Hmmm, perhaps there aren’t enough eyes or eyes with a view towards security issues reviewing the Android codebase?

Is it the case that Google is implicitly relying on the community to discover subtle security issues in Android software?

Or to ask a more general question: Who is responsible for security checks on open source software? If everyone is responsible, I take that to mean no one is responsible.

Mapping the world of Mark Twain (subject confusion)

August 2nd, 2015

Mapping the world of Mark Twain by Andrew Hill.

From the post:

Mapping Mark Twain

This weekend I was looking through Project Gutenberg and found something even better than a single book, I found the complete works of Mark Twain. I remembered how geographic the stories of Twain are and so knew immediately I had found a treasure chest. For the last few days, I’ve been parsing the books line-by-line and trying to find the localities that make up the world of Mark Twain. In the end, the data has over 20,000 localities. Even counting the cases where sir names are mistaken for places, it is a really cool dataset. What I’ll show you here is only the tip of the iceberg. I put the results together as an interactive map that maybe will inspire you to take a journey with Twain on your own, extend your life a little.

Sounds great!
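The line-by-line search for localities can be sketched roughly as follows. This is not Andrew Hill's code (his post has the real details). The tiny gazetteer and the sample lines are invented for illustration; a real run would use a full place-name list and the Project Gutenberg text.

```python
import re
from collections import Counter

# Hypothetical mini-gazetteer; a real run would load a full place-name list.
GAZETTEER = {"Hannibal", "St. Louis", "Cairo", "New Orleans"}


def find_localities(lines, gazetteer=GAZETTEER):
    """Count how often each gazetteer name appears across lines of text."""
    # Longest names first so multi-word names win over shorter ones.
    names = sorted(gazetteer, key=len, reverse=True)
    # Lookarounds keep "Cairo" from matching inside a longer word.
    pattern = re.compile(
        "|".join(r"(?<!\w)" + re.escape(name) + r"(?!\w)" for name in names)
    )
    counts = Counter()
    for line in lines:
        for match in pattern.findall(line):
            counts[match] += 1
    return counts


# Invented sample lines, not text from Twain.
sample = [
    "We left Hannibal at dawn and made St. Louis by night.",
    "From Cairo the river runs on to New Orleans.",
    "Hannibal was a sleepy town in those days.",
]
print(find_localities(sample))
```

Plain string matching like this cannot tell a surname from a town, or one Cairo from another, which is exactly the sort of confusion the post admits to.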

Warning: Subject Confusion

Mapping the world of Mark Twain (the map)!

The blog entry has the same name as the map.

Both are excellent and the blog entry includes details on how you can construct similar maps.

Topic maps disambiguate names that would otherwise lead to confusion!

What names do you need to disambiguate?

Or do you need to avoid subject confusion with names used by others? (Unknown to you.)


August 1st, 2015

Lasp: A Language for Distributed, Eventually Consistent Computations by Christopher S. Meiklejohn and Peter Van Roy.

From the webpage:

Why Lasp?

Lasp is a new programming model designed to simplify large scale, fault-tolerant, distributed programming. Lasp is being developed as part of the SyncFree European research project. It leverages ideas from distributed dataflow extended with convergent replicated data types, or CRDTs. This supports computations where not all participants are online together at a given moment. The initial design supports synchronization-free programming by combining CRDTs together with primitives for composing them inspired by functional programming. This lets us write long-lived fault-tolerant distributed applications, including ones with nonmonotonic behavior, in a functional paradigm. The initial prototype is implemented as an Erlang library built on top of the Riak Core distributed systems infrastructure.
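If CRDTs are new to you, a small sketch may help. The following is not Lasp and not its API (Lasp is an Erlang library); it is a Python illustration, written for this post, of one of the simplest state-based CRDTs, a grow-only counter. The class and method names are my own.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        # A replica only ever increases its own slot.
        current = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = current + amount

    def merge(self, other):
        # Taking the max per slot is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment()      # the replicas were "offline" from each other until now
a.merge(b)
b.merge(a)
a.merge(b)         # merging again changes nothing
print(a.value(), b.value())
```

Neither replica needed to coordinate with the other before updating, which is the "synchronization-free" part of the description above.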


Other resources include:

Lasp-dev, the mailing list for Lasp developers.

Lasp at Github.

I was reminded to post about Lasp by this post from Christopher Meiklejohn:

This post is a continuation of my first post about leaving Basho Technologies after almost 4 years.

It has been quite a long time in the making, but I’m finally happy to announce that I am the recipient of a Erasmus Mundus fellowship in their Joint Doctorate in Distribute Computing program. I will be pursuing a full-time Ph.D., with my thesis devoted to developing the Lasp programming language for distributed computing with the goals of simplifying deterministic, distributed, edge computation.

Starting in February 2016, I will be moving to Belgium to begin my first year of studies at the Université catholique de Louvain supervised by Peter Van Roy followed by a second year in Lisbon at IST supervised by Luís Rodrigues.

If you like this article, please consider supporting my writing on gittip.

Looks like exciting developments are ahead for Lasp!

Congratulations to Christopher Meiklejohn!

POC – BIND9 TKEY CVE-2015-5477 DoS

August 1st, 2015

Rob Graham has posted a proof of concept (POC) for BIND9 TKEY CVE-2015-5477 DoS.

If you don’t memorize Common Vulnerabilities and Exposures (CVE) entries as they appear, CVE-2015-5477 gives the following description:

named in ISC BIND 9.x before 9.9.7-P2 and 9.10.x before 9.10.2-P3 allows remote attackers to cause a denial of service (REQUIRE assertion failure and daemon exit) via TKEY queries.

The POC may or may not be of interest to you but in these security conscious times, the main CVE page will be. The entire CVE database is available for download if you want to try your hand at creative indexing.

There are a number of other valuable resources at the CVE page so take the time to explore while you are there.
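As a starting point for "creative indexing," here is a minimal sketch of an inverted index over CVE descriptions. It makes no assumption about the format of the downloadable database; the single record below is simply the description quoted above, and the function name is my own.

```python
import re
from collections import defaultdict

# One record, using the description quoted above. A real run would parse
# the downloaded CVE data; its file format is not assumed here.
records = {
    "CVE-2015-5477": (
        "named in ISC BIND 9.x before 9.9.7-P2 and 9.10.x before 9.10.2-P3 "
        "allows remote attackers to cause a denial of service (REQUIRE "
        "assertion failure and daemon exit) via TKEY queries."
    ),
}


def build_index(records):
    """Map each lowercased token to the set of CVE ids that mention it."""
    index = defaultdict(set)
    for cve_id, text in records.items():
        # Keep dots and hyphens inside tokens so version strings survive.
        for token in re.findall(r"[a-z0-9][a-z0-9.\-]*", text.lower()):
            index[token.rstrip(".")].add(cve_id)
    return index


index = build_index(records)
print(sorted(index["tkey"]))
print(sorted(index["9.9.7-p2"]))
```

Keeping version strings such as 9.9.7-P2 as single tokens is one design choice among many; indexing by product, by attack type or by affected version range would each give a different view of the same data.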

Knowledge Map At The Washington Post (Rediscovery of HyperText)

August 1st, 2015

How The Washington Post built — and will be building on — its “Knowledge Map” feature by Shan Wang.

From the post:

The Post is looking to create a database of “supplements” — categorized pieces of text and graphics that help give context around complicated news topics — and add it as a contextual layer across lots of different Post stories.

The Washington Post’s Knowledge Map aims to diminish that frustration by embedding context and background directly in a story. (We wrote about it briefly when it debuted earlier this month.) Highlighted links and buttons within the story, allowing readers to click on and then read brief overviews — called “supplements” — on the right hand side of the same page, without having to leave the page (currently the text and supplements are not tethered, so if you scroll away in the main story, there’s no easy way to jump back to the phrase or name you clicked on initially).

Knowledge Map sprouted a few months ago out of a design sprint (based on a five-day brainstorming method outlined by Google Ventures) that included the Post’s New York-based design and development team WPNYC and members of the data science team in the D.C. office, as well as engineers, designers, and other product people. After narrowing down a list of other promising projects, the team presented to the Post newsroom and to its engineering team an idea for providing readers with better summaries and context for the most complicated, long-evolving stories.

That idea of having context built into a story “really resonated” with colleagues, Sampsel said, so her team quickly created a proof-of-concept using an existing Post story, recruiting their first round of testers for the prototype via Craigslist. Because they had no prior data on what sort of key phrases or figures readers might want explained for any given story, the team relied on trial and error to settle on the right level of detail.

Not to take anything away from the Washington Post but doesn’t that scenario sound a lot like HTML, <a> links with Javascript “hover” content? Perhaps the content is a bit long for hover, perhaps a pop-up window on mouseOver? Hold the context data locally for response time reasons.
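To make the point concrete, here is a rough sketch of the plain-hypertext version. It is not how the Post built Knowledge Map. It assumes a small local dictionary of "supplements" (the phrases and notes are lifted from the quoted article) and leans on the standard title attribute of the <a> element, which browsers typically show as a tooltip on hover.

```python
import html
import re

# Hypothetical local store of supplements: phrase -> short background note.
SUPPLEMENTS = {
    "Knowledge Map": "Context and background embedded directly in a story.",
    "design sprint": "A five-day brainstorming method outlined by Google Ventures.",
}


def annotate(text, supplements=SUPPLEMENTS):
    """Wrap each known phrase in an <a> whose title carries the supplement."""
    phrases = sorted(supplements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases))
    pieces, last = [], 0
    for m in pattern.finditer(text):
        # Escape the untouched text between matches.
        pieces.append(html.escape(text[last:m.start()]))
        note = html.escape(supplements[m.group(0)], quote=True)
        pieces.append(
            '<a href="#" title="{}">{}</a>'.format(note, html.escape(m.group(0)))
        )
        last = m.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


print(annotate("The design sprint produced Knowledge Map."))
```

A title tooltip is the crudest possible presentation. A side panel or pop-up would need some Javascript, but the data model is the same: phrases in the story keyed to locally held context.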

Has the potential of hypertext been so muted by advertising, graphics, interactivity and > 1 MB pages that it takes a “design sprint” to bring some of that potential back to the fore?

I’m very glad that:

That idea of having context built into a story “really resonated” with colleagues,

but it isn’t a new idea.

Perhaps the best way to move the Web forward at this point would be to re-read (or read) some of the early web conference proceedings.

Rediscover what the web was like before “Google-driven” became an accurate description of it.

Other suggestions?

Things That Are Clear In Hindsight

August 1st, 2015

Sean Gallagher recently tweeted:

Oh look, the Triumphalism Trilogy is now a boxed set.


In case you are unfamiliar with the series, The Tipping Point, Blink, Outliers.

Although entertaining reads, particularly The Tipping Point (IMHO), Gladwell does not describe how to recognize a tipping point in advance of it being a tipping point, nor how to make good decisions without thinking (Blink) or how to recognize human potential before success (Outliers).

Tipping points, good decisions and human potential can be recognized only when they are manifested.

As you can tell from Gladwell’s book sales, selling the hope of knowing the unknowable remains a viable market.

Robotic Article Tagging (OpenOffice Idea)

July 31st, 2015

The New York Times built a robot to help make article tagging easier by Justin Ellis.

From the post:

If you write online, you know that a final, tedious part of the process is adding tags to your story before sending it out to the wider world.

Tags and keywords in articles help readers dig deeper into related stories and topics, and give search audiences another way to discover stories. A Nieman Lab reader could go down a rabbit hole of tags, finding all our stories mentioning Snapchat, Nick Denton, or Mystery Science Theater 3000.

Those tags can also help newsrooms create new products and find inventive ways of collecting content. That’s one reason The New York Times Research and Development lab is experimenting with a new tool that automates the tagging process using machine learning — and does it in real time.

The Times R&D Editor tool analyzes text as it’s written and suggests tags along the way, in much the way that spell-check tools highlight misspelled words:

Great post but why not take the “…in much the way that spell-check tools highlight misspelled words” just a step further?

Apache OpenOffice already has spell-checking, so why not improve it to have automatic tagging?

You may or may not know that Open Document Format (ODF) 1.2 was just published as an ISO standard!

Which is the format used by Apache OpenOffice.

Open Document Format (ODF) 1.2 supports RDFa for inline metadata.

Now, imagine for a moment using standard office suite software (Apache OpenOffice) to choose a metadata dictionary and have your content automatically tagged as you type, or to insert a document and have tags automatically inserted into the text.

Does that sound like a killer application for your corner of the woods?

A universal dictionary of RDFa tags might be a real memory hog but how many different tags would you need day to day? That’s even an empirical question that could be answered by indexing your documents for the past six (6) months.
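The empirical question is easy enough to sketch. The following assumes a hypothetical tag dictionary (the phrases and tag identifiers are invented) and plain-text documents; it suggests tags the way a spell-checker flags words and then counts which tags actually get used. It does not write RDFa into ODF, which would be the job of the office suite.

```python
import re
from collections import Counter

# Hypothetical tag dictionary: surface phrase (lowercase) -> tag identifier.
TAG_DICTIONARY = {
    "open document format": "tag:odf",
    "odf": "tag:odf",
    "apache openoffice": "tag:openoffice",
    "rdfa": "tag:rdfa",
}


def suggest_tags(text, dictionary=TAG_DICTIONARY):
    """Return (matched phrase, tag) pairs for dictionary phrases in text."""
    phrases = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b", re.IGNORECASE
    )
    return [(m.group(0), dictionary[m.group(0).lower()])
            for m in pattern.finditer(text)]


def tag_usage(documents, dictionary=TAG_DICTIONARY):
    """Count how often each tag is suggested across a set of documents."""
    usage = Counter()
    for doc in documents:
        usage.update(tag for _, tag in suggest_tags(doc, dictionary))
    return usage


# Sample documents for illustration only.
docs = [
    "Open Document Format (ODF) 1.2 supports RDFa for inline metadata.",
    "Apache OpenOffice reads and writes ODF.",
]
print(tag_usage(docs))
```

Run over six months of your own documents, the size of the resulting counter is the number of tags you actually need day to day.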

With very little effort on the part of users, you can transform your documents from unstructured text to tagged (and proofed) text.

Assemble at the Apache OpenOffice (or LibreOffice) projects if an easy-to-use, easy-to-modify tagging system for office suite software appeals to you.

For other software projects supporting ODF, see: OpenDocument software.

PS: Work is currently underway at the ODF TC (OASIS) on robust change tracking support. All we are missing is you.

Windows 10: Steady as you go

July 31st, 2015

Windows 10: You might be wise to wait before upgrading by Graham Cluley.

If Windows 10 isn’t your first Windows rodeo, you know the reasons for Graham’s advice on waiting a while to upgrade to Windows 10.

For example, Microsoft delivers a massive Windows 10 patch to fix early bugs by Jamie Hinks.

Doesn’t hurt to let someone else debug the early version. ;-)

Who Is Tipping Scales to Cyber Attackers?

July 30th, 2015

You don’t have to read very far into Scott Gainey’s The Economics of Cybersecurity – Are Scales Tipped to the Attacker? to get the impression that Scott accepts cyberinsecurity as a default state of affairs.

From the post:

An argument can certainly be made that the economics of cybersecurity largely favor the attacker. While the takedown of Darkode was a win for the good guys, at least temporarily, the unfortunate reality is there remains a multitude of other underground forums where criminals can gain easy access to the tools and technical support needed to organize and execute an attack. A simple search can get you quick access to virtually any tool needed for the job. Our role as executives and security professionals is to make sure these adversaries roaming these virtual havens of nastiness have to spend an inordinate amount of resources to try and achieve their objectives.

Many organizations are working to tip the scales back in their favor through a more integrated approach to security that not only includes increased spending and coordination across technology use and deployment; but also are looking at how they can improve overall efficacy through improved people training and policy management. These changes obviously come at a cost.

Many organizations are asking the natural question – how much do I really need to spend on security in order to tip the scales in my favor? In order to answer that question you must first quantify the impact and risk of a cyber attack.

The current economics of software and hardware creation shift the burden of security defects to the end user. That’s why the questions posed in Scott’s post are by users trying to tip the security scales back into their favor.

That starts the discussion in the wrong place. Users are to address security issues with more software produced by the same processes that put them at risk? Can you see why that does not fill me with confidence?

Moreover, fixing cybersecurity issues with software, at the source of its creation, places the cost of that fix on the person/s best able to make the repair. Which in turn saves thousands of other users the cost of defending against that particular cyberrisk.

In the short run, we will all have to battle cyberinsecurity but let’s also take names and assign responsibility for the defects that we do encounter.

The Islamic State/Social Media – US Military/Popular Entertainment

July 29th, 2015

For all of the government-sponsored hysteria over the use of social media by the Islamic State, there has been no, repeat no, hard evidence of its being “successful.”

Or at least not by any rational definition of “successful.” OK, so one or two impressionable teens in the UK attempt to join the Islamic State. How is that level of threat even marketable?

The situation is even worse in the United States where the FBI badgers emotionally unstable people into saying they want to go help the Islamic State and then arrests them for attempting to provide material assistance. That’s a real stretch.

Why not compare the “success” of the US military in using popular entertainment to carry its message versus that of the Islamic State using social media?

Tom Secker has recently produced a treasure trove of documents on the influence of the US military on popular entertainment, especially television shows.

From his post:

In the biggest public release of documents from the DOD’s propaganda office I recently received over 1500 pages of new material. Just under 1400 pages come from the US Army’s Entertainment Liaison Office: regular activity reports covering January 2010 to April 2015. Another over 100 pages of reports come from the US Air Force’s office, covering 2013.

The request I filed asked for all such reports since the last release covering 2005-2006 (Army documents here, Air Force documents here) but due a variety of excuses the release was limited to these 1500 pages. The Air Force said that no documents were available from 2015 ‘due to ongoing computer outages in the Los Angeles office’. If you believe that then you’ll believe anything. While the Army documents are fully digitised and easily searchable, the Air Force ones are mid-resolution scans of printed out emails/online network files.

Meanwhile, the documents between 2006-2010 (Army) and 2006-2013 (Air Force) appear to have been destroyed in keeping with the file retention policy. We already knew that the DHS only retains documents from their entertainment office for six years before shredding them like a CREEP finance report. For the military it appears to be even less than that, though given the absurd excuse offered by the Air Force it is possible they have a lot more records than they are admitting to having.

Nonetheless, this is the largest and most up-to-date release of documents from the world of Entertainment Liaison Offices. It substantially increases our knowledge of the scale and type of involvement the US military has in popular entertainment, particularly TV shows. However, details of changes to films and TV shows requested by the DOD in exchange for their co-operation are conspicuously absent (no surprises there).

Given the disparity in the size and scope of the United States versus Islamic State media campaigns, it appears the United States cannot tolerate any challenge to its world-view.

As a lifelong US citizen, I don’t find that surprising (disappointing but not surprising) at all.

PS: Be sure to check out the documents that Tom has obtained!

Unix™ for Poets

July 29th, 2015

Unix™ for Poets by Kenneth Ward Church.

A very delightful take on using basic Unix tools for text processing.

Exercises cover:

1. Count words in a text

2. Sort a list of words in various ways

  • ascii order
  • dictionary order
  • ‘‘rhyming’’ order

3. Extract useful info from a dictionary

4. Compute ngram statistics

5. Make a Concordance

Fifty-three (53) pages of pure Unix joy!
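The paper does all of this with Unix tools, and you should read it for those. For readers who want to see the same ideas outside the shell, here is a rough Python sketch of four of the exercises. The function names and the sample sentence are mine, not Church's.

```python
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_counts(text):
    """Exercise 1: count words in a text."""
    return Counter(words(text))


def rhyming_order(vocab):
    """Exercise 2: sort by reversed spelling so shared endings line up."""
    return sorted(vocab, key=lambda w: w[::-1])


def bigram_counts(text):
    """Exercise 4: count adjacent word pairs (ngrams with n = 2)."""
    w = words(text)
    return Counter(zip(w, w[1:]))


def concordance(text, keyword, width=2):
    """Exercise 5: list each keyword hit with `width` words of context."""
    w = words(text)
    rows = []
    for i, token in enumerate(w):
        if token == keyword:
            left = " ".join(w[max(0, i - width):i])
            right = " ".join(w[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


text = "the cat sat on the mat and the cat ran"
print(word_counts(text).most_common(2))
print(rhyming_order(set(words(text))))
print(bigram_counts(text)[("the", "cat")])
print(concordance(text, "cat"))
```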


Text Processing in R

July 29th, 2015

Text Processing in R by Matthew James Denny.

From the webpage:

This tutorial goes over some basic concepts and commands for text processing in R. R is not the only way to process text, nor is it really the best way. Python is the de-facto programming language for processing text, with a lot of builtin functionality that makes it easy to use, and pretty fast, as well as a number of very mature and full featured packages such as NLTK and textblob. Basic shell scripting can also be many orders of magnitude faster for processing extremely large text corpora — for a classic reference see Unix for Poets. Yet there are good reasons to want to use R for text processing, namely that we can do it, and that we can fit it in with the rest of our analyses. I primarily make use of the stringr package for the following tutorial, so you will want to install it:

Perhaps not the best tool for text processing but if you are inside R and have text processing needs, this will get you started.

The Declining Half-Life of Secrets

July 29th, 2015

The Declining Half-Life of Secrets And the Future of Signals Intelligence by Peter Swire.

Peter Swire writes:

The nature of secrets is changing. The “half-life of secrets” is declining sharply for many signals intelligence and other intelligence activities as secrets that may have been kept successfully for 25 years or more are exposed well before.

For evidence, one need look no further than the 2015 breach at the Office of Personnel Management (OPM), of personnel records for 22 million U.S. government employees and family members. For spy agencies, theft of the security clearance records is uniquely painful – whoever gains access to the breached files will have an unparalleled ability to profile individuals in the intelligence community and subject them to identity theft.

OPM is just one instance in a long string of high-profile breaches, where hackers gain access to personal information, trade secrets, or classified government material. The focus of the discussion here, though, is on complementary trends in information technology, including the continuing effects of Moore’s Law, the sociology of the information technology community, and changed sources and methods for signals intelligence. This article is about those risks of discovery and how the intelligence community must respond.

My views on this subject were formed during my experience as one of five members of President Obama’s Review Group on Intelligence and Communications Technology in 2013. There is a crucial difference between learning about a wiretap on the German Chancellor from three decades ago and learning that a wiretap has targeted the Current German Chancellor, Angela Merkel, while she is still in office and able to object effectively. In government circles, this alertness to negative consequences is sometimes called “the front-page test,” which describes how our actions will look if they appear on the front page of the newspaper. The front-page test becomes far more important to decision-makers when secrets become known sooner. Even if the secret operation is initially successful, the expected costs of disclosure become higher as the average time to disclosure decreases.

Peter generously attributes secrecy in the intelligence community to fear of “mosaic theory,” that is that an opponent may be gathering any and all information in an effort to indirectly discover what it cannot discover directly.

While application of “mosaic theory” to an intelligence agency isn’t impossible, revelations from the Pentagon Papers to date have shown that criminal misconduct, concealment of incompetence, career protection, and a host of other unsavory motives are at least as likely as application of “mosaic theory.”

The intelligence community should recognize that secrecy for the sake of concealing criminal misconduct, incompetence, career protection, etc., weakens its claim for protection of legitimate secrets.

Only the intelligence community can clean its own house. The alternative is random disclosure of secrets of varying importance.

This is the first paper of the Cybersecurity Initiative.

From their about page:

There is perhaps no issue that has grown more important, more rapidly, on so many different levels, than cybersecurity. It affects personal privacy, business prosperity and the wider economy, as well as national security and international relations. It is a field that matters for everything from human rights and corporate profits to fundamental issues of war and peace. And with the rapid growth in both the number of people and devices coming online across the globe, the security of information systems is only going to grow in importance. Yet, while ever more amounts are spent each year, our collective understanding of the problem remains immature and both public policy and private sector efforts have failed to match the scale or complexity of this challenge for us all. The Internet has connected us, but the policies and debates that surround the security of our networks are too often disconnected, disjoint, and stuck in an unsuccessful status quo.

This is what New America’s Cybersecurity Initiative is designed to address. We believe that it takes a wider network to face the network of diverse security issues. Success in this endeavor will require collaboration – across organizations, issue areas, professional fields and business sectors, as well as local, state, and international borders. By highlighting bold new ideas, bringing in new voices with fresh perspectives, breaking down issue and organizational barriers while building up a new field of study, encouraging new research approaches to the next generation of cybersecurity issues, connecting and creating new constituencies, and providing vibrant media and policy platforms to support that creativity, we can aid in pushing forward the cyber policy needed right now and better set us up for success tomorrow.

Prior attempts at cybersecurity have failed. (full stop) Can you guess what the outcome will be from repeating old ideas?

Follow the Cybersecurity Initiative if you want new ideas.

The next Web standard could be music notation

July 28th, 2015

The next Web standard could be music notation by Peter Kirn.

From the post:

The role of the music score is an important one, as a lingua franca – it puts musical information in a format a lot of people can read. And it does that by adhering to standards.

Now with computers, phones, and tablets all over the planet, can music notation adapt?

A new group is working on bringing digital notation as a standard to the Web. The World Wide Web Consortium (W3C) – yes, the folks who bring you other Web standards – formed what they’re describing as a “community group” to work on notation.

That doesn’t mean your next Chrome build will give you lead sheets. W3C are hosting, not endorsing the project – not yet. And there’s a lot of work to be done. But many of the necessary players are onboard, which could mean some musically useful progress.

The news arrived in my inbox by way of Hamburg-based Steinberg. That’s no surprise; we knew back in 2013 that the core team behind Sibelius had arrived at Steinberg after a reorganization at Avid pushed them out of the company they originally started.

The other big player in the group is MakeMusic, developers of Finale. And they’re not mincing words: they’re transferring the ownership of the MusicXML interchange format to the new, open group:
MakeMusic Transfers MusicXML Development to W3C

The next step: make notation work on the Web. Sibelius were, while not the first to put notation on the Web, the first to popularize online sharing as a headline feature in a mainstream notation tool. Sibelius even had a platform for sharing and selling scores, complete with music playback. But that was dependent on a proprietary plug-in – now, the browser is finally catching up, and we can do all of the things Scorch does right in browser.

So, it’s time for an open standard. And the basic foundation already exists. The new W3C Music Notation Community Group promises to “maintain and update” two existing standards – MusicXML and the awkwardly-acronym’ed SMuFL (Standard Music Font Layout). Smuffle sounds like a Muppet from Sesame Street, but okay.

For the W3C group:

Music notation has a long history across cultures. It will be interesting to see what subset of music notation is captured by this effort.

Moving FASTR in the US Senate

July 28th, 2015

Moving FASTR in the US Senate by Peter Suber.

From the post:

FASTR will go to markup tomorrow in the Senate Homeland Security and Governmental Affairs Committee (HSGAC).

Here’s a recap of my recent call-to-action post on FASTR, with some new details and background.

FASTR is the strongest bill ever introduced in Congress requiring open access to federally-funded research.

We already have the 2008 NIH policy, but it only covers one agency. We already have the 2013 Obama directive requiring about two dozen federal agencies to adopt OA mandates, but the next President could rescind it.

FASTR would subsume and extend the NIH policy. FASTR would solidify the Obama directive by grounding these agency policies in legislation. Moreover, FASTR would strengthen the NIH policy and Obama directive by requiring reuse rights or open licensing. It has bipartisan support in both the House and the Senate.

FASTR has been introduced in two sessions of Congress (February 2013 and March 2015), and its predecessor, FRPAA (Federal Research Public Access Act), was introduced in three (May 2006, April 2010, February 2012). Neither FASTR nor FRPAA has gotten to the stage of markup and a committee vote. That’s why tomorrow’s markup is so big.

For the reasons why FASTR is stronger than the Obama directive, see my 2013 article comparing the two.

For steps you can take to support FASTR, see the action pages from the Electronic Frontier Foundation (EFF) and Scholarly Publishing and Academic Resources Coalition (SPARC).

Even though I will be in a day long teleconference tomorrow, I will be contacting my Senators to support FASTR.

How about you?

IoT Pinger (Wandora)

July 28th, 2015

IoT Pinger (Wandora)

From the webpage:

This is an upcoming feature and is not included yet in the public release.

The IoT (Internet of Things) pinger is a general purpose API consumer intended to aggregate data from several different sources providing data via HTTP. The IoT Panel is found in the Wandora menu bar and presents most of the pinger’s configuration options. The Pinger searches the current Topic Map for topics with an occurrence with Source Occurrence Type. Those topics are expected to correspond to an API endpoint defined by corresponding occurrence data. The pinger queries each endpoint every specified time interval and saves the response as an occurrence with Target Occurrence Type. The pinger process can be configured to stop at a set time using the Expires toggle. Save on tick saves the current Topic Map in the specified folder after each tick of the pinger in the form iot_yyyy_mm_dd_hh_mm_ss.jtm.

Now there’s an interesting idea!

Looking forward to the next release!
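
Since the feature is not in the public release yet, here is a rough Python sketch of the tick loop as I read the description above. It is not Wandora's code: the function names, the endpoint mapping, and the injected fetch callable are all my own assumptions, and only the file name pattern comes from the webpage.

```python
from datetime import datetime


def snapshot_name(tick_time: datetime) -> str:
    """Build a file name in the iot_yyyy_mm_dd_hh_mm_ss.jtm form."""
    return tick_time.strftime("iot_%Y_%m_%d_%H_%M_%S.jtm")


def run_ticks(endpoints, fetch, tick_times):
    """Query every endpoint once per tick and collect the responses.

    endpoints  -- mapping of topic name to endpoint URL (hypothetical)
    fetch      -- callable taking a URL and returning the response text
    tick_times -- iterable of datetimes standing in for the timer
    """
    snapshots = {}
    for tick_time in tick_times:
        # One response per source topic, keyed by the topic name.
        responses = {topic: fetch(url) for topic, url in endpoints.items()}
        # One snapshot per tick, keyed by the "save on tick" file name.
        snapshots[snapshot_name(tick_time)] = responses
    return snapshots
```

Passing the fetch function in keeps the sketch free of any real HTTP calls, so it can be tried with a stub before pointing it at a live API.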

Big Data to Knowledge (Biomedical)

July 28th, 2015

Big Data to Knowledge (BD2K) Development of Software Tools and Methods for Biomedical Big Data in Targeted Areas of High Need (U01).


Open Date (Earliest Submission Date) September 6, 2015

Letter of Intent Due Date(s) September 6, 2015

Application Due Date(s) October 6, 2015

Scientific Merit Review February 2016

Advisory Council Review May 2016

Earliest Start Date July 2016

From the webpage:

The purpose of this BD2K Funding Opportunity Announcement (FOA) is to solicit development of software tools and methods in the three topic areas of Data Privacy, Data Repurposing, and Applying Metadata, all as part of the overall BD2K initiative. While this FOA is intended to foster new development, submissions consisting of significant adaptations of existing methods and software are also invited.

The instructions say to submit early so that corrections to your application can be suggested. (Take the advice.)

Topic maps, particularly with customized subject identity rules, are a nice fit to the detailed requirements you will find at the grant site.

Ping me if you are interested in discussing why you should include topic maps in your application.

Android Phones: Precursor to the Internet of Things (IoT)

July 28th, 2015

Graham Cluley does a great job in: Gaping hole in Android lets hackers break in with just your phone number! summarizing how easily your Android phone can be breached. Requirement? Knowing your phone number.

There is a lot of talk about the need for security for the Internet of Things (IoT) but talk isn’t going to keep you secure on the IoT.

Think about it for a moment: are the same type of folks that brought you Three Mile Island, exploding Ford Pintos, Chernobyl, Bhopal, Fukushima Daiichi, and the myriad product recalls that litter the daily news going to provide for your security on the IoT?

The time has come and passed for liability for software/hardware that allows breaches of security. That is the one remedy that has not been imposed on the computer industry to force the production of more secure code.

If you want your car, SUV, motorcycle, computer, TV, freezer, refrigerator, etc. to be as easy to breach as your Android Phone, say nothing.

If you want some minimal amount of privacy/security in the IoT, call for liability for software/hardware security holes now!

Free Access to Law = ‘Terrorism’

July 27th, 2015

Georgia claims that publishing its state laws for free online is ‘terrorism’ by Michael Hiltzik.

There isn’t much in the way of government stupidity that surprises me but this caught my eye.

The gist of the case is that the State of Georgia is suing Carl Malamud for making the text of Georgia laws, annotated at the State’s expense, available free to the public.

If you want to fight this type of government idiocy, donate to Public.Resource.Org.

Learning Data Science Using Functional Python

July 26th, 2015

Learning Data Science Using Functional Python by Joel Grus.

Something fun to start the week off!

Apologies for the “lite” posting of late. I am munging some small but very ugly data for a report this coming week. The data sources range from spreadsheets to forms delivered in PDF, in no particular order and some without the original numbering. What fun!

Complaints about updating URLs that were redirects were met with replies that “private redirects” weren’t of interest and they would continue to use the original URLs. Something tells me the responsible parties didn’t quite get what URL redirects are about.

Another day or so and I will be back at full force with more background on the Balisage presentation and more useful posts every day.

Black Hat USA 2015

July 25th, 2015

I know you have already registered, etc., but there is one presentation I hope you catch and blog about:


TrackingPoint is an Austin startup known for making precision-guided firearms. These firearms ship with a tightly integrated system coupling a rifle, an ARM-powered scope running a modified version of Linux, and a linked trigger mechanism. The scope can follow targets, calculate ballistics and drastically increase its user’s first shot accuracy. The scope can also record video and audio, as well as stream video to other devices using its own wireless network and mobile applications.

In this talk, we will demonstrate how the TrackingPoint long range tactical rifle works. We will discuss how we reverse engineered the scope, the firmware, and three of TrackingPoint’s mobile applications. We will discuss different use cases and attack surfaces. We will also discuss the security and privacy implications of network-connected firearms.

TrackingPoint should get security points for not basing their product on Windows XP.


Lessons learned on Linux-based systems should be applicable to weaker operating systems as well.

Enjoy the conference!

Sora high performance software radio is now open source

July 25th, 2015

Sora high performance software radio is now open source by Jane Ma.

From the post:

Microsoft researchers today announced that their high-performance software radio project is now open sourced through GitHub. The goal for Microsoft Research Software Radio (Sora) is to develop the most advanced software radio possible, capable of implementing the latest wireless communication technology easily and efficiently.

"We believe that a fully open source Sora will better support the research community on more scientific innovation," said Kun Tan, a senior researcher on the software radio project team.

Conventionally, the critical lower layer processing in wireless communication systems, i.e., the physical layer (PHY) and medium access control (MAC), are typically implemented in hardware (ASIC chips), due to high-computational and real-time requirements. However, designing ASIC is very costly and inflexible since ASIC chips are fixed. Once delivered, it cannot be changed or upgraded. The lack of flexibility and programmability makes experimental research in wireless communication very difficult. Software Radio (or SDR), on the contrary, proposes implementing all these low-level PHY and MAC processes through software, which is practical for development, debugging and updating. The challenge, however, is how the software can stay up to date with hardware in terms of performance.

See also: Microsoft's Wireless and Networking research group

Sora was developed to solve this significant challenge. Sora is a fully programmable high-performance software radio that is capable of implementing state-of-the-art wireless technologies (Wi-Fi, LTE, MIMO, etc.). Sora is based on software running on a low-cost, commodity multi-core PC with a general purpose OS, i.e., Windows. A multi-core PC, plugged in to a PCIe radio control board, connecting to a third-party radio front-end with antenna, becomes a powerful software radio platform. The PC interface board transfers the raw wireless (I/Q) signals between the RF front-end and the PC memory through fast DMA. All signals are processed in the software running in the PC.

An avalanche of wireless signals will accompany the Internet of Things (IoT). Intercepting all of them with custom hardware would be prohibitively expensive.

Thanks to Microsoft, you can skip the custom hardware step.

Remember: The question is who is listening?, not if?.

Programming Languages Used for Music

July 24th, 2015

Programming Languages Used for Music by Tim Thompson.

From the history page:

The PLUM (Programming Languages Used for Music) list is maintained as a service for those who, like me, are interested in programming languages that are used for a musical purpose. The initial content was based on a list that Carter Scholz compiled and posted to netnews in 1991.

There are a wide variety of languages used for music. Some are conventional programming languages that have been enhanced with libraries and environments for use in musical applications. Others are specialized languages developed explicitly for music.

The focus of entries in this list is on the languages, and not on particular applications. In other words, a musical application written in a particular programming language does not immediately qualify for inclusion in the list, unless the application is specifically intended to enhance the use of that language for musical development by other programmers.

Special thanks go to people who have provided significant comments and information for this list: Bill Schottstaedt, Dave Phillips, and Carter Scholz.

Corrections to existing entries and suggestions for improving the list should be mailed to the PLUM maintainer:

Tim Thompson
Home Page:

If you are experimenting with Clojure and music, these prior efforts may be inspirational.


How to Spot an Extremist

July 24th, 2015


Cameron unveils plan to tackle extremism in UK

How you look is a matter of heredity so I probably should not say that David Cameron “looks like” an extremist. (Even if he does.)

Why resort to appearances when Cameron easily convinces any rational listener that he is an extremist, both in print and on radio?

Consider what Cameron says awaits young people who join the Islamic State:

“If you are a boy, they will brainwash you, strap bombs to your body and blow you up. If you are a girl, they will enslave and abuse you. That is the sick and brutal reality of ISIL.”

What Cameron does not say is that if you stay in the UK:

If you are a boy, they will brainwash you and have you kill innocent women and children with highly sophisticated drones. You will inflict harm on people who wish your country would leave them alone. Your efforts will support Cameron and his trainers playing out the Game of Thrones in the Middle East.

If you are a girl, you will be enslaved by a consumer culture built on making you feel insecure and frightened, and you will be sexually harassed both in school and at work. You will eventually appreciate being a second-class citizen in a vassal state with poor cooking.

That is the sick and brutal reality of the UK.

No, I’m still not a supporter of the Islamic State but I do see David Cameron as a deluded extremist.

Left to their own devices, the people of the Middle East are capable of choosing or deposing whatever governments they want. Something that the West is unwilling to allow to happen. Do you wonder why?

Exploring the Enron Spreadsheet/Email Archive

July 23rd, 2015

I forgot to say yesterday that if you cite the Enron archive work of Felienne Hermans and Emerson Murphy-Hill, use this citation:

  author    = {Felienne Hermans and
               Emerson Murphy-Hill},
  title     = {Enron's Spreadsheets and Related Emails: A Dataset and Analysis},
  booktitle = {37th International Conference on Software Engineering, {ICSE} '15},
  note     =  {to appear}

A couple of interesting tidbits from this morning.

Non-Matching Spreadsheet Names

If you look at:

(local)/84_JUDY_TOWNSEND_000_1_1.PST/townsend-j/JTOWNSE (Non-Privileged)/Inbox/_1687004514.eml

You will find that (sender) sent an email with Tport Max Rates Calculations 10-27-01.xls attached to fletcjv@NU.COM and cc:ed “Concannon” and “Townsend”. (Potential subjects in bold.)

I selected this completely at random, save for finding an email using the word “spreadsheet.”

If you look in the spreadsheet archive, you will not find “Tport Max Rates Calculations 10-27-01.xls,” at least not by that name. You will find: “judy_townsend__17745__Tport Max Rates Calculations 10-27-01.xlsx.”
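
To match an attachment name in an email against the archive, something like the following Python sketch may help. It assumes every archive name follows the owner__id__original-name pattern seen in this one example, which I have not verified across the archive; the function names are mine.

```python
from pathlib import PurePosixPath


def attachment_key(archive_name: str) -> str:
    """Return the lower-cased attachment stem from an archive file name.

    Assumes the archive name looks like owner__id__original-name.xlsx,
    as in the single example above; other files may not follow it.
    """
    # Split off at most two prefixes so a "__" inside the original name survives.
    original = archive_name.split("__", 2)[-1]
    return PurePosixPath(original).stem.lower()


def same_spreadsheet(email_attachment: str, archive_name: str) -> bool:
    """Compare an email attachment name with an archive name, ignoring extension."""
    return PurePosixPath(email_attachment).stem.lower() == attachment_key(archive_name)
```

Comparing stems rather than full names sidesteps the .xls versus .xlsx difference noted above.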

I don’t know when that conversion took place but thought it was worth noting. BTW, the spreadsheet archive has 15,871 .xlsx files and 58 .xls files. Michelle Lokay has thirty-two of the fifty-eight (58) .xls files but they all appear to be duplicated by files with the .xlsx extension.

Given the small number, I suspect an anomaly in a bulk conversion process. When I do group operations on the spreadsheets I will be using the .xlsx extension only to avoid duplicates.
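
A quick way to check the duplication before dropping the .xls files is sketched below in Python. It only compares file names, not contents, so "duplicated" here means "has an .xlsx file with the same stem"; the function name is hypothetical.

```python
from pathlib import PurePosixPath


def split_duplicates(file_names):
    """Separate .xlsx files from .xls files, flagging .xls files with an .xlsx twin."""
    xlsx = [n for n in file_names if n.lower().endswith(".xlsx")]
    xlsx_stems = {PurePosixPath(n).stem.lower() for n in xlsx}
    xls = [n for n in file_names if n.lower().endswith(".xls")]
    # An .xls file counts as duplicated when an .xlsx file shares its stem.
    duplicated = [n for n in xls if PurePosixPath(n).stem.lower() in xlsx_stems]
    unmatched = [n for n in xls if PurePosixPath(n).stem.lower() not in xlsx_stems]
    return xlsx, duplicated, unmatched
```

If the unmatched list comes back empty, working from the .xlsx files alone loses nothing, at least by name.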

Dirty, Very Dirty Data

I was just randomly opening spreadsheets when I encountered this jewel:


Using rows to format column headers. There are worse examples, try:


No column headers at all! (On this tab.)

I am beginning to suspect that the conversion to .xlsx format was to enable the use of better tooling to explore the original .xls files.

Be sure to register for Balisage 2015 if you want to see the outcome of all this running around!

Tomorrow I think we are going to have a conversation about indexing email with Solr. Having all 15K spreadsheets doesn’t tell me which ones were spoken of the most often in email.

Enron, Spreadsheets and 7z

July 22nd, 2015

Sam Hunting and I are working on a presentation for Balisage that involves a subset of the Enron dataset focused on spreadsheets.

You will have to attend Balisage to see the floor show but I will be posting notes about our preparations for the demo under the category Enron and/or Spreadsheets.

Origin of the Enron dataset on Spreadsheets

First things first, the subset of the Enron dataset focused on spreadsheets was announced by Felienne Hermans in A modern day Pompeii: Spreadsheets at Enron.

The data set: Hermans, Felienne (2014): Enron Spreadsheets and Emails. figshare.

Felienne has numerous presentations and publications on spreadsheets and issues with spreadsheets.

I have always thought of spreadsheets as duller versions of tables.

Felienne, on the other hand, has found intrigue, fraud, error, misunderstanding, opacity, and the usual chicanery of modern business practice.

Whether you want to “understand” a spreadsheet depends on whether you need plausible deniability or if you are trying to detect efforts at plausible deniability. Auditors for example.

Felienne’s Enron spreadsheet data set is a great starting point for investigating spreadsheets and their issues.

Unpacking the Archives with 7z

The email archive comes in thirteen separate files, eml.7z.001 – eml.7z.013.

At first I tried to use 7z to assemble the archive, decompress it and grep the results without writing it out. No go.

On a subsequent attempt, just unpacking the multi-part file, a message appeared announcing a name conflict and asking what to do with the conflict.

IMPORTANT POINT: Thinking I don’t want to lose any data, I foolishly said to rename files to avoid naming conflicts.

You are probably laughing at this point because you can see where this is going.

The command I used to first extract the files reads: 7z e eml.7z.001 (remembering that in the case of name conflicts I said to rename the conflicting file).

But if you use 7z e, all the files are written to a single directory. Which of course means for every single file write, it has to check for conflicting file names. Oops!

After more than twenty-four (24) hours of ever slowing output (# of files was at 528,000, approximately), I killed the process and took another path.

I used 7z x eml.7z.001 (correct command), which restores all of the original directories and therefore there are no file name conflicts. File writing I/O jumped up to 20MB/sec+, etc.
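
To see why flattening an archive into one directory produces conflicts, here is a small Python sketch that counts the base names shared by more than one path. The paths in the test data are made up for illustration, not taken from the archive.

```python
from collections import Counter
from pathlib import PurePosixPath


def flattened_collisions(archive_paths):
    """Count base names that would collide if every file landed in one directory."""
    counts = Counter(PurePosixPath(p).name for p in archive_paths)
    return {name: n for name, n in counts.items() if n > 1}
```

Any name that shows up in the result would have triggered the rename prompt under 7z e, while keeping the directories intact avoids the question entirely.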

Still took seventy-eight (78) minutes to extract but there were other heavy processes going on at the same time.

Like deleting the 528K+ files in the original unpacked directory. Did you know that rm has an argument limit? I’m sure you won’t encounter it often but it can be a real pain when you do. I was deleting all the now unwanted files from the first run when I encountered it.

A shell limitation according to: Argument List Too Long. The 128K limit cited there should give you an idea of the number of files you need to encounter before hitting this issue.
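
One way around the limit is to never build the long argument list in the first place. The Python sketch below is not what I ran at the time; it walks a directory and deletes regular files one call at a time, leaving subdirectories alone. The function name is my own.

```python
import os


def remove_files(directory: str) -> int:
    """Delete every regular file directly inside directory, one call per file."""
    # Collect the paths first so nothing is deleted while the directory is being read.
    with os.scandir(directory) as entries:
        paths = [e.path for e in entries if e.is_file(follow_symlinks=False)]
    for path in paths:
        os.unlink(path)
    return len(paths)
```

Because each file is removed with its own call, the number of files in the directory never ends up on a command line.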

The Lesson

Unpack the Enron email archive with: 7z x eml.7z.001.

Tomorrow I will be posting about using Unix shell tools to explore the email data.

PS: Register for Balisage today!

Road Rage with Flair!

July 22nd, 2015

Zero-day in Fiat Chrysler feature allows remote control of vehicles by Robert Abel.

From the post:

Fiat Chrysler owners should update their vehicles’ software after a pair of security researchers were able to exploit a zero-day vulnerability to remotely control the vehicle’s engine, transmission, wheels and brakes among other systems.

Chris Valasek, director of vehicle security at IOActive, and security researcher Charlie Miller, a member of the company’s advisory board, said the vulnerability was found in late 2013 to 2015 models that have the Uconnect feature, according to Wired.

Anyone who knows the car’s IP address may gain access to a vulnerable vehicle through its cellular connection. Attackers can then target a chip in the vehicle’s entertainment hardware unit to rewrite its firmware to send commands to internal computer networks controlling physical components.

If that sounds bad, you really need to read the Wired article Hackers Remotely Kill a Jeep on the Highway—With Me in It by Andy Greenberg.

Here’s a paragraph from the Wired article to get you hooked:

Though I hadn’t touched the dashboard, the vents in the Jeep Cherokee started blasting cold air at the maximum setting, chilling the sweat on my back through the in-seat climate control system. Next the radio switched to the local hip hop station and began blaring Skee-lo at full volume. I spun the control knob left and hit the power button, to no avail. Then the windshield wipers turned on, and wiper fluid blurred the glass.

You won’t be disappointed because the hack continues onto the transmission, brakes, steering (not perfected, yet) and other systems.

Hard to say when this will appear as a routine download with a nice GUI. Perhaps with automatic display of prospective targets within visual range.

The upside is a resurgence of interest in classic cars.

Your security status will be reflected in the lack of remotely controllable devices.

For the truly security conscious, secretaries may replace voice dictation systems on vulnerable networks.

Ibis on Impala: Python at Scale for Data Science

July 21st, 2015

Ibis on Impala: Python at Scale for Data Science by Marcel Kornacker and Wes McKinney.

From the post:

Ibis: Same Great Python Ecosystem at Hadoop Scale

Co-founded by the respective architects of the Python pandas toolkit and Impala and now incubating in Cloudera Labs, Ibis is a new data analysis framework with the goal of enabling advanced data analysis on a 100% Python stack with full-fidelity data. With Ibis, for the first time, developers and data scientists will be able to utilize the last 15 years of advances in high-performance Python tools and infrastructure in a Hadoop-scale environment—without compromising user experience for performance. It’s exactly the same Python you know and love, only at scale!

In this initial (unsupported) Cloudera Labs release, Ibis offers comprehensive support for the analytical capabilities presently provided by Impala, enabling Python users to run Big Data workloads in a manner similar to that of “small data” tools like pandas. Next, we’ll extend Impala and Ibis in several ways to make the Python ecosystem a seamless part of the stack:

  • First, Ibis will enable more natural data modeling by leveraging Impala’s upcoming support for nested types (expected by end of 2015).
  • Second, we’ll add support for Python user-defined logic so that Ibis will integrate with the existing Python data ecosystem—enabling custom Python functions at scale.
  • Finally, we’ll accelerate performance further through low-level integrations between Ibis and Impala with a new Python-friendly, in-memory columnar format and Python-to-LLVM code generation. These updates will accelerate Python to run at native hardware speed.

See: Getting Started with Ibis and How to Contribute (same authors, opposite order) in order to cut to the chase and get started.