## Archive for the ‘Open Source’ Category

### Unpatched Windows Vulnerability – Cost of Closed Source Software

Friday, September 8th, 2017

From the post:

Malware developers can abuse a programming error in the Windows kernel to prevent security software from identifying if, and when, malicious modules have been loaded at runtime.

Continue on with Cimpanu for a good overview or catch Windows’ PsSetLoadImageNotifyRoutine Callbacks: the Good, the Bad and the Unclear (Part 1).

Symantec says proactive security includes:

• Inventory of Authorized and Unauthorized Devices
• Inventory of Authorized and Unauthorized Software
• Secure Configurations for Hardware & Software
• Constant Vulnerability Assessment and Remediation
• Malware Defense

But since Windows is closed source software, you can’t remedy the vulnerability. Whatever your cyberdefenses, closed source MS Windows leaves you vulnerable.

Eternal (possibly) vulnerability – the cost of closed source software.

It’s hard to think of a better argument for open source software.

Open source software need not be free, just open source so you can fix it if broken.

PS: Open source enables detection of government malware.

### Continue Flash? As what? Example of insecure coding?

Wednesday, August 2nd, 2017

From the post:

Last week we heard the good news that Adobe is officially killing Flash in 2020.

This news was well received by developers and end users alike. Well, at least most people liked the demise of Adobe Flash. But it seems that Adobe Flash has still some fans left.

A group of developers at GitHub have come up with a petition to “save Adobe Flash”. Just a few days after the announcement by Adobe, Juha Linstedt, a web developer with username “Pakastin” on GitHub started a petition calling on Adobe to allow for open source Flash, which he thinks is part of Internet history.

Losing Flash altogether will impair access to resources developed using Flash but even as open source, preserving Flash strikes me as the equivalent of preserving small pox for later study.

If Adobe does open source the necessary components, it could have value as examples of how not to code an application. Or for testing of code auditing tools.

### Kaspersky: Is Source Code Disclosure Meaningful?

Thursday, July 6th, 2017

Responding to a proposed ban of Kaspersky Labs software, Eugene Kaspersky, chief executive of Kaspersky, is quoted in Russia’s Kaspersky Lab offers up source code for US government scrutiny, as saying:

The chief executive of Russia’s Kaspersky Lab says he’s ready to have his company’s source code examined by U.S. government officials to help dispel long-lingering suspicions about his company’s ties to the Kremlin.

In an interview with The Associated Press at his Moscow headquarters, Eugene Kaspersky said Saturday that he’s also ready to move part of his research work to the U.S. to help counter rumors that he said were first started more than two decades ago out of professional jealousy.

“If the United States needs, we can disclose the source code,” he said, adding that he was ready to testify before U.S. lawmakers as well. “Anything I can do to prove that we don’t behave maliciously I will do it.”

Personally I think Kaspersky is about to be victimized by anti-Russia hysteria, where repetition of rumors, not facts, are the coin of the realm.

Is source code disclosure is meaningful? A question applicable to Kasperky disclosures to U.S. government officials, or Microsoft or Oracle disclosures of source code to foreign governments.

My answer is no, at least if you mean source code disclosure limited to governments or other clients.

Here’s why:

• Limited competence: For the FBI in particular, source code disclosure is meaningless. Recall the FBI blew away $170 million in the Virtual Case File project with nothing to show and no prospect of a timeline, after four years of effort. • Limited resources: Guido Vranken‘s The OpenVPN post-audit bug bonanza demonstrates that after two (2) manual audits, vulnerabilities remain to be found in OpenVPN. Unlike OpenVPN, any source code given to a government will be reviewed at most once and then only by a limited number of individuals. Contrast that with OpenVPN, which has been reviewed for years by a large number of people and yets flaws remain to be discovered. • Limited staff: Closely related to my point about limited resources, the people in government who are competent to undertake a software review are already busy with other tasks. Most governments don’t have a corps of idle but competent programmers waiting for source code disclosures to evaluate. Whatever source code review takes place, it will be the minimum required and that only as other priorities allow. If Kaspersky Labs were to open source but retain copyright on their software, then their source code could be reviewed by: • As many competent programmers as are interested • On an ongoing basis • By people with varying skills and approaches to software auditing Setting a new standard, that is open source but copyrighted for security software, would be to the advantage of leaders in Gartner’s Magic Quadrant, others, not so much. It’s entirely possible for someone to compile source code and avoid paying a license fee but seriously, is anyone going to pursue pennies on the ground when there are$100 bills blowing overhead? Auditing, code review, transparency, trust. (I know, the RIAA chases pennies but it’s run by delusional paranoids.)

• Angst among its more poorly managed competitors will soar.
• Example for government mandated open source but copyright for domestic sales. (Think China, EU, Russia.)
• Front page news featuring Kaspersky Labs as breaking away from the pack.

Entirely possible for Kaspersky to take advantage of the narrow-minded nationalism now so popular in some circles of the U.S. government. Not to mention changing the landscape of security software to its advantage.

### China Draws Wrong Lesson from WannaCry Ransomware

Tuesday, May 23rd, 2017

Chinese state media says US should take some blame for cyberattack

From the post:

China’s cyber authorities have repeatedly pushed for what they call a more “equitable” balance in global cyber governance, criticizing U.S. dominance.

The China Daily pointed to the U.S. ban on Chinese telecommunication provider Huawei Technologies Co Ltd, saying the curbs were hypocritical given the NSA leak.

Beijing has previously said the proliferation of fake news on U.S. social media sites, which are largely banned in China, is a reason to tighten global cyber governance.

The newspaper said that the role of the U.S. security apparatus in the attack should “instill greater urgency” in China’s mission to replace foreign technology with its own.

The state-run People’s Daily compared the cyber attack to the terrorist hacking depicted in the U.S. film “Die Hard 4”, warning that China’s role in global trade and internet connectivity opened it to increased risks from overseas.

China is certainly correct to demand a place at the table for China and other world powers in global cyber governance.

But China is drawing the wrong lesson from the WannaCry ransomeware attacks if that is used as a motivation for closed source Chinese software to replace “foreign” technology.

NSA staffers may well be working for Microsoft and/or Oracle, embedding NSA produced code in their products. With closed source code, it isn’t possible to verify the absence of such code or to prevent its introduction.

Sadly, the same is true if closed source code is written by Chinese programmers, some of who may have agendas, domestic or foreign, of their own.

The only defense to rogue code is to invest in open source projects. Not everyone will read every line of code but being available for being read, is a deterrent to obvious subversion of an applications security.

China should have “greater urgency” to abandon closed source software, but investing in domestic closed source only replicates the mistake of investing in foreign closed source software.

Opensource projects cover every office, business and scientific need.

Chinese government support for Chinese participation in existing and new opensource projects can make these projects competitors to closed and potential spyware products.

The U.S. made the closed source mistake for critical cyber infrastructure. China should not make the same mistake.

### Getting Started in Open Source: A Primer for Data Scientists

Saturday, December 31st, 2016

From the post:

The phrase "open source” evokes an egalitarian, welcoming niche where programmers can work together towards a common purpose — creating software to be freely available to the public in a community that sees contribution as its own reward. But for data scientists who are just entering into the open source milieu, it can sometimes feel like an intimidating place. Even experienced, established open source developers like Jon Schlinkert have found the community to be less than welcoming at times. If the author of more than a thousand projects, someone whose scripts are downloaded millions of times every month, has to remind himself to stay positive, you might question whether the open source community is really the developer Shangri-la it would appear to be!

And yet, open source development does have a lot going for it:

• Users have access to both the functionality and the methodology of the software (as opposed to just the functionality, as with proprietary software).
• Contributors are also users, meaning that contributions track closely with user stories, and are intrinsically (rather than extrinsically) motivated.
• Everyone has equal access to the code, and no one is excluded from making changes (at least locally).
• Contributor identities are open to the extent that a contributor wants to take credit for her work.
• Changes to the code are documented over time.

So why start a blog post for open source noobs with a quotation from an expert like Jon, especially one that paints such a dreary picture? It's because I want to show that the bar for contributing is… pretty low.

Ask yourself these questions: Do you like programming? Enjoy collaborating? Like learning? Appreciate feedback? Do you want to help make a great open source project even better? If your answer is 'yes' to one or more of these, you're probably a good fit for open source. Not a professional programmer? Just getting started with a new programming language? Don't know everything yet? Trust me, you're in good company.

Becoming a contributor to an open source project is a great way to support your own learning, to get more deeply involved in the community, and to share your own unique thoughts and ideas with the world. In this post, we'll provide a walkthrough for data scientists who are interested in getting started in open source — including everything from version control basics to advanced GitHub etiquette.

Two of Rebecca’s points are more important than the rest:

• the bar for contributing is low
• contributing builds community and a sense of ownership

Will 2017 be the year you move from the sidelines of open source and into the game?

### Open Source Software & The Department of Defense

Monday, August 29th, 2016

Open Source Software & The Department of Defense by Ben FitzGerald, Peter L. Levin, and Jacqueline Parziale.

A great resource for sharing with Department of Defense (DoD) staff who may be in positions to influence software development, acquisition policies.

In particular you may want to point to the “myths” about security and open source software:

Discussion of open source software in national security is often dismissed out of hand because of technical security
concerns. These are unfounded.

To debunk a few myths:

• Using open source licensing does not mean that changes to the source code must be shared publicly.
• The ability to see source code is not the same as the ability to modify deployed software in production.
• Using open source components is not equivalent to creating an entire system that is itself open sourced.

As In-Q-Tel’s Chief Information Security Officer Dan Geer explains, security is “the absence of unmitigatable surprise.”23 It is particularly difficult to mitigate surprise with closed proprietary software, because the source code, and therefore the ability to identify and address its vulnerabilities, is hidden. “Security through obscurity” is not an effective defense against today’s cybersecurity threats.

In this context, open source software can generate better security outcomes than proprietary alternatives. Conventional anti-malware scanning and intrusion detection are inadequate for many reasons, including their “focus on known vulnerabilities” that miss unknown threats, such as zero-day exploits. As an example, a DARPA-funded team built a flight controller for small quadcopter drones based on an open source autopilot readily downloaded from the Internet. A red team “found no security flaws in six weeks with full access [to the] source code,” making their UAV the most secure on the planet.24

Except that “security” to a DoD contractor has little to do with software security.

No, for a DoD contractor, “security” means change orders, which trigger additional software development cycles, which are largely unauditable, software testing, changes to documentation, all of which could be negatively impacted by “…an open source autopilot.”

If open source is used, there are fewer billing opportunities and that threatens the “security” of DoD contractors.

The paper makes a great case for why the DoD should make greater use of open source software and development practices, but the DoD will have to break the strangle hold of a number of current DoD contractors to do so.

### U.S. Government Open Source Pilot – Hidden Costs? (Vulnerabilities?)

Monday, August 8th, 2016

Federal Source Code Policy: Achieving Efficiency, Transparency, and Innovation through Reusable and Open Source Software by Tony Scott and Anne E. Rung.

From the post:

The U.S. Government is committed to improving the way Federal agencies buy, build, and deliver information technology (IT) and software solutions to better support cost efficiency, mission effectiveness, and the consumer experience with Government programs. Each year, the Federal Government spends more than $6 billion on software through more than 42,000 transactions.1 A significant proportion of software used by the Government is comprised of either preexisting Federal solutions or commercial solutions. These solutions include proprietary, open source, and mixed source2 code and often do not require additional custom code development. When Federal agencies are unable to identify an existing Federal or commercial software solution that satisfies their specific needs, they may choose to develop a custom software solution on their own or pay for its development. When agencies procure custom-developed source code, however, they do not necessarily make their new code (source code or code) broadly available for Federal Government-wide reuse. Even when agencies are in a position to make their source code available on a Government-wide basis, they do not make such code available to other agencies in a consistent manner. In some cases, agencies may even have difficulty establishing that the software was produced in the performance of a Federal Government contract. These challenges may result in duplicative acquisitions for substantially similar code and an inefficient use of taxpayer dollars. This policy seeks to address these challenges by ensuring that new custom-developed Federal source code be made broadly available for reuse across the Federal Government.3 This is consistent with the Digital Government Strategy’s “Shared Platform” approach, which enables Federal employees to work together—both within and across agencies—to reduce costs, streamline development, apply uniform standards, and ensure consistency in creating and delivering information.4 Enhanced reuse of custom-developed code across the Federal Government can have significant benefits for American taxpayers, including decreasing duplicative costs for the same code and reducing Federal vendor lock-in.5 This policy also establishes a pilot program that requires agencies, when commissioning new custom software, to release at least 20 percent of new custom-developed code as Open Source Software (OSS) for three years, and collect additional data concerning new custom software to inform metrics to gauge the performance of this pilot.6 (footnotes omitted) This open source pilot is a good example of government leadership. After open source has become the virtual default of private industry, the government decided to conduct a three-year pilot project to assess the concept. Not a bad idea but someone needs to ramp up to track every open source release from the federal government. Such releases need to be evaluated for the costs of new security bugs introduced into the software ecosystem and poor programming practices on software development. Otherwise, a rosy picture of reduced duplicative costs for the same code may conceal higher software costs due to widespread security vulnerabilities. Trust is ok, verification is better. ### Danger! Danger! Oracle Attorney Defends GPL Saturday, May 28th, 2016 From the header: Annette Hurst is an attorney at Orrick, Herrington & Sutcliffe who represented Oracle in the recent Oracle v. Google trial. This op-ed represents her own views and is not intended to represent those of her client or Ars Technica. The Oracle v. Google trial concluded yesterday when a jury returned a verdict in Google’s favor. The litigation began in 2010, when Oracle sued Google, saying that the use of Java APIs in Android violated copyright law. After a 2012 trial, a judge held that APIs can’t be copyrighted at all, but that ruling was overturned on appeal. In the trial this month, Google successfully argued that its use of Java APIs, about 11,500 lines of code in all, was protected by “fair use.” I won’t propogate Annette’s rant but you can read it for yourself at: http://arstechnica.com/tech-policy/2016/05/op-ed-oracle-attorney-says-googles-court-victory-might-kill-the-gpl/. What are free software supporters to make of their long time deranged, drooling critic expressing support for GPL? Should they flee as pursued by wraiths on wings? Should they stuff their cloaks in their ears? Are these like the lies of Suraman? Or perhaps better, Wormtongue? My suggestion? Point to Annette’s rant to alert others but don’t repeat it, don’t engage it, just pass over it in silence. Repeating evil counsel gives it legitimacy. Yours. ### Vulnerable 7-Zip As Poster Child For Open Source Friday, May 13th, 2016 From the post: But users be warned. Cisco Talos recently discovered multiple vulnerabilities in 7-Zip that are more serious than regular security flaws. As explained in a blog post by Marcin Noga and Jaeson Schultz, two members of the Cisco Talos Security Intelligence & Research Group: “These type of vulnerabilities are especially concerning since vendors may not be aware they are using the affected libraries. This can be of particular concern, for example, when it comes to security devices or antivirus products. 7-Zip is supported on all major platforms, and is one of the most popular archive utilities in-use today. Users may be surprised to discover just how many products and appliances are affected.” Cisco Talos has identified two flaws in particular. The first (CVE-2016-2335) is an out-of-bounds read vulnerability that exists in the way 7-Zip handles Universal Disk Format (UDF) files. An attacker could potentially exploit this vulnerability to achieve arbitrary code execution. The “many products and appliances” link results in: If you use the suggested search string: Every instance of software running a vulnerable 7-Zip library is subject to this hack. A number likely larger than the total 2,490,000 shown by these two searches. For open source software, you can check to see if it has been upgraded to 7-Zip, version 16.0. If you have non-open source software, how are you going to check for the upgrade? Given the lack of liability under the usual EULA, are you really going to take a vendor’s word for the upgrade? The vulnerable 7-Zip library is a great poster child for open source software. Not only for the discovery of flaws but to verify vendors have properly patched those flaws. ### EU Too Obvious With Wannabe A Monopoly Antics Wednesday, April 20th, 2016 If you ever had any doubts (I didn’t) that the EU is as immoral as any other government, recent moves by the EU in the area of software will cure those. EU antitrust regulators said that by requiring mobile phone manufacturers to pre-install Google Search and the Google Chrome browser to get access to other Google apps, the U.S. company was harming consumers by stifling competition. Show of hands. How many of you think the EU gives a sh*t about consumers? Yeah, that’s what I thought as well. Or as Chee quotes European Competition Commissioner Margrethe Vestager: “We believe that Google’s behavior denies consumers a wider choice of mobile apps and services and stands in the way of innovation by other players,” she said. Hmmm, “other players.” Those don’t sound like consumers, those sound like people who will be charging consumers. If you need confirmation of that reading, consider Anti-innovation: EU excludes open source from new tech standards by Glyn Moody. From the post: “Open” is generally used in the documents to denote “open standards,” as in the quotation above. But the European Commission is surprisingly coy about what exactly that phrase means in this context. It is only on the penultimate page of the ICT Standardisation Priorities document that we finally read the following key piece of information: “ICT standardisation requires a balanced IPR [intellectual property rights] policy, based on FRAND licensing terms.” It’s no surprise that the Commission was trying to keep that particular detail quiet, because FRAND licensing—the acronym stands for “fair, reasonable, and non-discriminatory”—is incompatible with open source, which will therefore find itself excluded from much of the EU’s grand new Digital Single Market strategy. That’s hardly a “balanced IPR policy.” Glyn goes on to say that FRAND licensing is the result of lobbying by American technical giants but seems unlikely. The EU has attempted to favor EU-origin “allegedly” competitive software for years. I say “allegedly” because the EU never points to competitive software in its antitrust proceedings that was excluded, only to the speculation that but for those evil American monopolists, there would be this garden of commercial and innovative European software. You bet. There is a lot of innovative European software, but it hasn’t been produced in the same mindset that afflicts officials at the EU. They are fixated on an out-dated software sales/licensing model. Consider the rising number of companies based on nothing but open source if you want a sneak peek at the market of the future. Being mired in market models from the past, the EU sees only protectionism (the Google complaint) and out-dated notions of software licensing (FRAND) as foundations for promoting a software industry in Europe. Not to mention the provincialism of the EU makes it the enemy of a growing software industry in Europe. Did you know that EU funded startups are limited to hiring EU residents? (Or so I have been told, by EU startups.) That certainly works that way with EU awards. There is nothing inconsistent with promoting open source and a vibrant EU software industry, so long as you know something about both. Knowing nothing about either has led the EU astray. ### Open Source Clojure Projects Monday, March 14th, 2016 Daniel Higginbotham of Clojure for the Brave and True, has posted this listing of open source Clojure projects with the blurb: Looking to improve your skills and work with real code? These projects are under active development and welcome new contributors. You can see the source at: https://github.com/braveclojure/open-source, where it says: Pull requests welcome! Do you know of any other open source Clojure projects that welcome new contributors? Like yours? Just by way of example, marked as “beginner friendly,” you will find: alda – A general purpose music programming language Avi – A lively vi (a spec & implementation of vim) clj-rethinkdb – An idomatic RethinkDB client for Clojure For the more sure-footed: ClojureCL – Parallel computations on the GPU with OpenCL 2.0 in Clojure Enjoy! ### 2015 Open Source Yearbook (without email conscription) Sunday, March 13th, 2016 Publication of the 2015 Open Source Yearbook is good news! Five or six “clicks” and having my email conscripted to obtain a copy, not so much. For your reading pleasure with one-click access: Impressive work, but marred by convoluted access and email conscription. If you want to make a resource “freely” available, do so. Don’t extort contact information for “free” information. I’m leading conference calls tomorrow or else I would be reading the 2015 Open Source Yearbook during my calls! ### Government Source Code Policy Thursday, March 10th, 2016 Government Source Code Policy From the webpage: The White House committed to adopting a Government-wide Open Source Software policy in its Second Open Government National Action Plan that “will support improved access to custom software code developed for the Federal Government,” emphasizing that using and contributing back to open source software can fuel innovation, lower costs, and benefit the public.[1] In support of that commitment, today the White House Office of Management and Budget (OMB) is releasing a draft policy to improve the way custom-developed Government code is acquired and distributed moving forward. This policy is consistent with the Federal Government’s long-standing policy of ensuring that “Federal investments in IT (information technology) are merit-based, improve the performance of our Government, and create value for the American people.”[2] This policy requires that, among other things: (1) new custom code whose development is paid for by the Federal Government be made available for reuse across Federal agencies; and (2) a portion of that new custom code be released to the public as Open Source Software (OSS). We welcome your input on this innovative draft policy. We are especially interested in your comments on considerations regarding the release of custom code as OSS. The draft policy proposes a pilot program requiring covered agencies to release at least 20 percent of their newly-developed custom code, in addition to the release of all custom code developed by Federal employees at covered agencies as part of their official duties, subject to certain exceptions as noted in the main body of the policy.[3] In some absolute sense this is a step forward from the present practices of the government with regard to source code that it develops or pays to have developed. On the other hand, what’s difficult about saying that all code (not 20%) developed by or at the direction of the federal government is deposited under an Apache license within 90 days of its posting to any source code repository. Subject to national security exceptions and then notice has to be given with the decision to be reviewed in the local DC federal court. Short, simple, clear time constraints and a defined venue for review. Anytime someone dodges the easy, obvious solution, there is a reason for that dodging. Not a reason or desire to benefit you. Unless you are the person orchestrating the dodge. ### Bumping into Stallman, again [Stallmanism] Tuesday, February 2nd, 2016 From the post: This is the second time I’m talking at the same conference as Richard Stallman, after the Ind.ie Tech Summit in Brighton, this time was at the Fri Software Days in Fribourg, Switzerland. One day before my presentation, I got an email from the organizers, letting me know that Stallman would like me to rename the title of my talk to remove any mentions of “Open Source Software” and replace them with “Free Software”. The email read like this: Is it feasible to remove the terms “Open-Source” from the title of your presentation and replace them by “Free-libre software”? It’s the wish of M. Stallman, that will probably attend your talk. Frederick didn’t change his title or presentation, while at the same time handling the issue much better than I would have. Well, after I got through laughing my ass off that Stallman would presume to dictate word usage to anyone. Word usage, for any stallmanists in the crowd, is an empirical question of how many people use a word with a common meaning. At least if you want to be understood by others. ### The Semasiology of Open Source [How Do You Define Source?] Wednesday, January 20th, 2016 The Semasiology of Open Source by Robert Lefkowitz (Then, VP Enterprise Systems & Architecture, AT&T Wireless) 2004. Audio file. Robert’s keynote from the Open Source Convention (OSCON) 2004 in Portland, Oregon. From the description: Semasiology, n. The science of meanings or sense development (of words); the explanation of the development and changes of the meanings of words. Source: Webster’s Revised Unabridged Dictionary, 1996, 1998 MICRA, Inc. “Open source doesn’t just mean access to the source code.” So begins the Open Source Definition. What then, does access to the source code mean? Seen through the lens of an Enterprise user, what does open source mean? When is (or isn’t) it significant? And a catalogue of open source related arbitrage opportunities. If you haven’t heard this keynote, I hadn’t, do yourself a favor and make time to listen to it. I do have one complaint: It’s not long enough. 😉 Enjoy! ### History of Apache Storm and lessons learned Thursday, December 31st, 2015 From the post: Apache Storm recently became a top-level project, marking a huge milestone for the project and for me personally. It’s crazy to think that four years ago Storm was nothing more than an idea in my head, and now it’s a thriving project with a large community used by a ton of companies. In this post I want to look back at how Storm got to this point and the lessons I learned along the way. The topics I will cover through Storm’s history naturally follow whatever key challenges I had to deal with at those points in time. The first 25% of this post is about how Storm was conceived and initially created, so the main topics covered there are the technical issues I had to figure out to enable the project to exist. The rest of the post is about releasing Storm and establishing it as a widely used project with active user and developer communities. The main topics discussed there are marketing, communication, and community development. Any successful project requires two things: 1. It solves a useful problem 2. You are able to convince a significant number of people that your project is the best solution to their problem What I think many developers fail to understand is that achieving that second condition is as hard and as interesting as building the project itself. I hope this becomes apparent as you read through Storm’s history. All projects are different but the requirements for success: 1. It solves a useful problem 2. You are able to convince a significant number of people that your project is the best solution to their problem sound universal to me! To clarify point #2, “people” means “other people.” Preaching to a mirror or choir isn’t going to lead to success. Nor will focusing on “your problem” as opposed to “their problem.” PS: New Year’s Eve advice – Don’t download large files. 😉 Slower than you want to think. Suspect people on my subnet are streaming football games and/or porno videos, perhaps both (screen within screen). I first saw this in a tweet by Bob DuCharme. ### Building Software, Building Community: Lessons from the rOpenSci Project Tuesday, November 17th, 2015 Building Software, Building Community: Lessons from the rOpenSci Project by Carl Boettiger, Scott Chamberlain, Edmund Hart, Karthik Ram. Abstract: rOpenSci is a developer collective originally formed in 2011 by graduate students and post-docs from ecology and evolutionary biology to collaborate on building software tools to facilitate a more open and synthetic approach in the face of transformative rise of large and heterogeneous data. Born on the internet (the collective only began through chance discussions over social media), we have grown into a widely recognized effort that supports an ecosystem of some 45 software packages, engages scores of collaborators, has taught dozens of workshops around the world, and has secured over$480,000 in grant support. As young scientists working in an academic context largely without direct support for our efforts, we have first hand experience with most of the the technical and social challenges WSSSPE seeks to address. In this paper we provide an experience report which describes our approach and success in building an effective and diverse community.

Given the state of world affairs, I can’t think of a better time for the publication of this article.

The key lesson that I urge you to draw from this paper is the proactive stance of the project in involving and reaching out to build a community around this project.

Too many projects (and academic organizations for that matter) take the approach that others know they exist and so they sit waiting for volunteers and members to queue up.

Very often they are surprised and bitter that the queue of volunteers and members is so sparse. If anyone dares to venture that more outreach might be helpful, the response is nearly always, sure, you go do that and let us know when it is successful.

How proactive are you in promoting your favorite project?

PS: The rOpenSci website.

### DegDB (Open Source Distributed Graph Database) [Tackling Who Pays For This Data?]

Tuesday, November 17th, 2015

The Design Doc/Ramble reads in part:

Problems With Existing Graph Databases

• Owned by private companies with no incentive to share.
• Public databases are used by few people with no incentive to contribute.
• Large databases can’t fit on one machine and are expensive to traverse.
• Existing distributed graph databases require all nodes to be trusted.

Incentivizing Hosting of Data

Every request will have either a debit (with attached bitcoin) or credit (with bitcoin promised on delivery) payment system. The server nodes will attempt to estimate how much it will cost to serve the data and if there isn’t enough bitcoin attached, will drop the request. This makes large nodes want to serve as much popular data as possible, because it allows for faster responses as well as not having to pay other nodes for their data. At the same time, little used data will cost more to access due to requiring more hops to find the data and “cold storage” servers can inflate the prices thus making it profitable for them.

Incentivizing Creation of Data

Data Creation on Demand

A system for requesting certain data to be curated can be employed. The requestor would place a bid for a certain piece of data to be curated, and after n-sources add the data to the graph and verify its correctness the money would be split between them.
This system could be ripe for abuse by having bots automatically fulfilling every request with random data.

Creators Paid on Usage

This method involves the consumers of the data keeping track of their data sources and upon usage paying them. This is a trust based model and may end up not paying creators anything.

The one “wow” factor of this project is the forethought to put the discussion of “who pays for this data?” up front and center.

We have all seen the failing model that starts with:

For only $35.00 (U.S.) you can view this article for 24 hours. That makes you feel like you are almost robbing the publisher at that price. (NOT!) Right. I’m tracking down a citation to make sure a quote or data is correct and I am going to pay$35.00 (U.S.) to have access for 24 hours. Considering that the publishers with those pricing models have already made back their costs of production and publication plus a profit from institutional subscribers (challenge them for the evidence if they deny), a very low micro-payment would be more suitable. Say $00.01 per paragraph or something on that order. Payable out of a deposit with the publisher. I would amend the Creators Paid on Usage section to have created content unlocked only upon payment (set by the creator). Over time, creators would develop reputations for the value of their data and if you choose to buy from a private seller with no online history, that’s just your bad. Imagine that for the Paris incident (hypothetical, none of the following is true), I had the school records for half of the people carrying out that attack. Not only do I have the originals but I also have them translated into English, assuming some or all of them are in some other language. I could cast that data (I’m not fond of the poverty of triples) into a graph format and make it know as part of a distributed graph system. Some of the data, such as the identities of the people for who I had records, would appear in the graphs of others as “new” data. Up to the readers of the graph to decide if the data and the conditions for seeing it are acceptable to them. Data could even carry a public price tag. That is if you want to pay a large enough sum, then the data in question will be opened up for everyone to have access to it. I don’t know of any micropayment systems that are eating at the foundations of traditional publishers now but there will be many attempts before one eviscerates them one and all. The choices we face now of “free” (read unpaid for research, writing and publication, which excludes many) versus the “pay-per-view” model that supports early 20th century models of sloth, cronyism and gate-keeping, aren’t the only ones. We need to actively seek out better and more nuanced choices. ### Microsoft open sources Distributed Machine Learning Toolkit… Friday, November 13th, 2015 From the post: Researchers at the Microsoft Asia research lab this week made the Microsoft Distributed Machine Learning Toolkit openly available to the developer community. The toolkit, available now on GitHub, is designed for distributed machine learning — using multiple computers in parallel to solve a complex problem. It contains a parameter server-based programing framework, which makes machine learning tasks on big data highly scalable, efficient and flexible. It also contains two distributed machine learning algorithms, which can be used to train the fastest and largest topic model and the largest word-embedding model in the world. The toolkit offers rich and easy-to-use APIs to reduce the barrier of distributed machine learning, so researchers and developers can focus on core machine learning tasks like data, model and training. The toolkit is unique because its features transcend system innovations by also offering machine learning advances, the researchers said. With the toolkit, the researchers said developers can tackle big-data, big-model machine learning problems much faster and with smaller clusters of computers than previously required. For example, using the toolkit one can train a topic model with one million topics and a 20-million word vocabulary, or a word-embedding model with 1000 dimensions and a 20-million word vocabulary, on a web document collection with 200 billion tokens utilizing a cluster of just 24 machines. That workload would previously have required thousands of machines. This has been a banner week for machine learning! On November 9th, Google open sourced TensorFlow. On November 12th, Single Artificial Neuron Taught to Recognize Hundreds of Patterns (why neurons have thousands of synapses) is published. On November 12th, Microsoft open sources its Distributed Machine Learning Toolkit. Not every week is like that for machine learning but it is impressive when that many major stories drop in a week! I do like the line from the Microsoft announcement: For example, using the toolkit one can train a topic model with one million topics and a 20-million word vocabulary, or a word-embedding model with 1000 dimensions and a 20-million word vocabulary, on a web document collection with 200 billion tokens utilizing a cluster of just 24 machines. (emphasis added) Prices are falling all the time and a 24 machine cluster should be within the reach of most startups if not most individuals now. Next year? Possibly within the reach of a large number of individuals. What are your machine learning plans for 2016? ### Quartz to open source two mapping tools Thursday, November 12th, 2015 From the post: News outlet Quartz is developing a searchable database of compiled map data from all over the world, and a tool to help journalists visualise this data. The database, called Mapquery, received$35,000 (£22,900) from the Knight Foundation Prototype Fund on 3 November.

Keith Collins, project lead, said Mapquery will aim to make the research stage in the creation of maps easier and more accessible, by creating a system for finding, merging and refining geographic data.

Mapquery will not be able to produce visual maps itself, as it simply provides a database of information from which maps can be created – so Quartz will also open source Mapbuilder as the “front end” that will enable journalists to visualise the data.

Quartz aims to have a prototype of Mapquery by April, and will continue to develop Mapbuilder afterwards.

That’s news to look forward to in 2016!

I’m real curious where Quartz is going to draw the boundary around “map data?” The post mentions Mapquery including “historical boundary data,” which would be very useful for some stories, but is traditional “map data.”

What if Mapquery could integrate people who have posted images with geographic locations? So a reporter could quickly access a list of potential witnesses for events the Western media doesn’t cover?

Live feeds of the results of US bombing raids against ISIS for example. (Doesn’t cover out of deference to the US military propaganda machine or for other reasons I can’t say.)

Looking forward to more news on Mapquery and Mapbuilder!

I first saw this in a tweet by Journalism Tools.

### Treasure Trove of R Scripts…

Wednesday, October 7th, 2015

From the post:

In this repository hosted at github, the datadolph.in team is sharing all of the R codebase that it developed to analyze large quantities of data.

datadolph.in team has benefited tremendously from fellow R bloggers and other open source communities and is proud to contribute all of its codebase into the community.

The codebase includes ETL and integration scripts on –

• R-Solr Integration
• R-Mongo Interaction
• R-MySQL Interaction
• Fetching, cleansing and transforming data
• Classification (identify column types)
• Default chart generation (based on simple heuristics and matching a dimension with a measure)

I count twenty-two (22) R scripts in this generous donation back to the R community!

Enjoy!

### Getting started with open source machine learning

Monday, September 14th, 2015

From the post:

Despite all the flashy headlines from Musk and Hawking on the impending doom to be visited on us mere mortals by killer robots from the skies, machine learning and artificial intelligence are here to stay. More importantly, machine learning (ML) is quickly becoming a critical skill for developers to enhance their applications and their careers, better understand data, and to help users be more effective.

What is machine learning? It is the use of both historical and current data to make predictions, organize content, and learn patterns about data without being explicitly programmed to do so. This is typically done using statistical techniques that look for significant events like co-occurrences and anomalies in the data and then factoring in their likelihood into a model that is queried at a later time to provide a prediction for some new piece of data.

Common machine learning tasks include classification (applying labels to items), clustering (grouping items automatically), and topic detection. It is also commonly used in natural language processing. Machine learning is increasingly being used in a wide variety of use cases, including content recommendation, fraud detection, image analysis and ecommerce. It is useful across many industries and most popular programming languages have at least one open source library implementing common ML techniques.

Reflecting the broader push in software towards open source, there are now many vibrant machine learning projects available to experiment with as well as a plethora of books, articles, tutorials, and videos to get you up to speed. Let’s look at a few projects leading the way in open source machine learning and a few primers on related ML terminology and techniques.

Grant rounds up a starting list of primers and projects if you need an introduction to machine learning.

Enjoy!

### Leaping the chasm from proprietary to open: …

Tuesday, September 8th, 2015

by Bryan Cantrill.

Full illumos history mentioned in talk: https://www.youtube.com/watch?v=-zRN7XLCRhc

Very high energy presentation starting with the early history of software. Great coverage of the history of Solaris.

My favorite quip:

Every thing you think about Oracle is true, is actually truer than you think it could be.

You will greatly enjoy the disclaimer story.

Natural law wrong. – “…assertion that APIs can be copyrighted!”

Opensource projects by Joyent:

SmartDataCenter: https://github.com/joyent/sdc

My take away? Despite all the amusing stories and tales, I would have to pick “use a weak copy-left license.”

### Who is in Charge of Android Security?

Wednesday, August 5th, 2015

Just the other day I posted Targeting 950 Million Android Phones – Open Source Security Checks?. Today my email had a link to: Nearly 90 percent of Android devices vulnerable to endless reboot bug by Allen Greenberg.

Allen points to: Android MediaServer Bug Traps Phones in Endless Reboots by Wish Wu, which reads in part:

We have discovered a new vulnerability that allows attackers to perform denial of service (DoS) attacks on Android’s mediaserver program. This causes a device’s system to reboot and drain all its battery life. In more a severe case, where a related malicious app is set to auto-start, the device can be trapped in an endless reboot and rendered unusable.

The vulnerability, CVE-2015-3823, affects Android versions 4.0.1 Jelly Bean to 5.1.1 Lollipop. Around 89% of the Android users (roughly 9 in 10 Android devices active as of June 2015) are affected. However, we have yet to discover active attacks in the wild that exploit this vulnerability.

This discovery comes hot on the heels of two other major vulnerabilities in Android’s media server component that surfaced last week. One can render devices silent while the other, Stagefright, can be used to install malware through a multimedia message.

Wow! Three critical security bugs in Android in a matter of weeks.

Which makes me ask the question: Who (the hell) is in Charge of Android Security?

Let’s drop the usual open source answer to complaints about the software: “…well, if you have an issue with the software you should contribute a patch…” and wise up that commercial entities are making money off the Android “open source” project.

People can and should contribute to open source projects but at the same time, commercial vendors should not foist avoidance of security bugs off onto the public.

Commercial vendors are already foisting security bugs off on the public because so far, not for very much longer, they have avoided liability for the same. They simply don’t invest in the coding practices that would avoid the security bugs that are so damaging to enterprises and individuals alike.

The same was true in the history of products liability. It is a very complex area of law that is developing rapidly and someday soon the standard EULA will fall and there will be no safety net under software vendors.

There are obvious damages from security bugs and there are vendors who could have avoided the security bugs in the first place. It is only a matter of time before courts discover that the same bugs (usually unchecked input) is causing damages over and over again and that checking input avoids the bug in the majority of cases.

Who can choose to check input or not? That’s right, the defendant with the deep pockets, the software vendor.

Who is in charge of security for your software?

PS: I mentioned the other day that the CVE database is available for download. That would be the starting point for developing a factual basis for known/avoidable bug analysis for software liability. I suspect that has been done and I am unaware of it. Suggestions?

### Targeting 950 Million Android Phones – Open Source Security Checks?

Monday, August 3rd, 2015

From the post:

Earlier this week, security researchers at Zimperium revealed a high-severity vulnerability in Android platforms that allowed a single multimedia text message to hack 950 Million Android smartphones and tablets.

As explained in our previous article, the critical flaw resides in a core Android component called “Stagefright,” a native Android media playback library used by Android to process, record and play multimedia files.

To Exploit Stagefright vulnerability, which is actively being exploited in the wild, all an attacker needed is your phone number to send a malicious MMS message and compromise your Android device with no action, no indication required from your side.

Security researchers from Trend Micro have discovered two new attack scenarios that could trigger Stagefright vulnerability without sending malicious multimedia messages:

• Trigger Exploit from Android Application
• Crafted HTML exploit to Target visitors of a Webpage on the Internet

These two new Stagefright attack vectors carry more serious security implications than the previous one, as an attacker could exploit the bug remotely to:

• Hack millions of Android devices, without knowing their phone numbers and spending a penny.
• Steal Massive Amount of data.
• Built a botnet network of Hacked Android Devices, etc.

The specially crafted MP4 file will cause mediaserver‘s heap to be destroyed or exploited,” researchers explained how an application could be used to trigger Stagefright attack.

Swati has video demonstrations of both of the new attack vectors and covers defensive measures for users.

Does the presence of such a bug in software from Google, which has access to almost unlimited programming talent and to hear its tale, the best programming talent in the business, make you curious about security for the Internet of Things (IoT)?

Or has Google been practicing “good enough” software development and cutting corners on testing for bugs and security flaws?

Now that I think about it, Android is an open source project and as we all know, given enough eyeballs, all bugs are shallow (Linus’s Law).

Hmmm, perhaps there aren’t enough eyes or eyes with a view towards security issues reviewing the Android codebase?

Is it the case the Google is implicitly relying on the community to discover subtle security issues in Android software?

Or to ask a more general question: Who is responsible for security checks on open source software? If everyone is responsible, I take that to mean no one is responsible.

### Blue Light Special: Windows Server 2003

Wednesday, July 15th, 2015

“Blue light special” is nearly a synonym for KMart. If you search for “blue light special” at Wikipedia, you will be redirected to the entry for Kmart.

A “blue light special” consisted of a blue police light being turned on and a KMart employee announcing the special to all shoppers in the store.

As of Tuesday, July 14, 2015, there are now blue light specials on Windows Server 2003. Well, sans the blue police light and the KMart employee. But hackers will learn of vulnerabilities in Windows Server 2003 and there will be no patches to close off those opportunities.

The last patches for Windows Server 2003 were issued on Tuesday and are described at: Microsoft releases 14 bulletins on Patch Tuesday, ends Windows Server 2003 support.

You can purchase, from Microsoft, special support contracts but as the experience of the US Navy has shown, that can be an expensive proposition ($9.1 million per year). That may sound like a lot of income, and it is to a small to medium company, but remember that$9.1 million is 0.00010% of Microsoft’s revenue as shown in its 2014 Annual Report.

I don’t know who to ask at Microsoft but they could should making Windows XP, Windows Server 2003, etc. into open source projects.

Some 61% of businesses are reported to still be using Windows Server 2003. Support beyond the end of life for Windows Server 2003 will be \$600 per server, for the first year with higher fees to follow.

Although open sourcing Windows Server 2003 might cut into some of the maintenance contract income, it would greatly increase the pressure on businesses to migrate off of Windows Server 2003 as hackers get first hand access to this now ancient code base.

In some ways, open sourcing Windows XP, Windows Server 2003 could be a blue light special that benefits all shoppers.

Microsoft obtains the obvious benefits of greater demand, initially, for formal support contracts and in the long run, the decreasing costs of maintaining ancient code bases, plus new income from migrations.

People concerned with the security, or lack thereof in ancient systems gain first hand knowledge of those systems and bugs to avoid in the future.

IT departments benefit from having stronger grounds to argue that long delayed migrations must be undertaken or face the coming tide of zero-day vulnerabilities based on source code access.

Users benefit in the long run from the migration to modern computing architectures and their features. A jump comparable to going from a transistor radio to a smart phone.

### Open Source Tensor Libraries For Data Science

Wednesday, March 18th, 2015

From the post:

Data scientists frequently find themselves dealing with high-dimensional feature spaces. As an example, text mining usually involves vocabularies comprised of 10,000+ different words. Many analytic problems involve linear algebra, particularly 2D matrix factorization techniques, for which several open source implementations are available. Anyone working on implementing machine learning algorithms ends up needing a good library for matrix analysis and operations.

But why stop at 2D representations? In a recent Strata + Hadoop World San Jose presentation, UC Irvine professor Anima Anandkumar described how techniques developed for higher-dimensional arrays can be applied to machine learning. Tensors are generalizations of matrices that let you look beyond pairwise relationships to higher-dimensional models (a matrix is a second-order tensor). For instance, one can examine patterns between any three (or more) dimensions in data sets. In a text mining application, this leads to models that incorporate the co-occurrence of three or more words, and in social networks, you can use tensors to encode arbitrary degrees of influence (e.g., “friend of friend of friend” of a user).

In case you are interested, Wikipedia has a list of software packages for tensor analaysis.

Not mentioned by Wikipedia: Facebook open sourcing TH++ last year, a library for tensor analysis. Along with fblualibz, which includes a bridge between Python and Lua (for running tensor analysis).

Uni10 wasn’t mentioned by Wikipedia either.

Good starting place: Big Tensor Mining, Carnegie Mellon Database Group.

Suggest you join an existing effort before you start duplicating existing work.

### Thank Snowden: Internet Industry Now Considers The Intelligence Community An Adversary, Not A Partner

Saturday, February 14th, 2015

From the post:

We already wrote about the information sharing efforts coming out of the White House cybersecurity summit at Stanford today. That’s supposedly the focus of the event. However, there’s a much bigger issue happening as well: and it’s the growing distrust between the tech industry and the intelligence community. As Bloomberg notes, the CEOs of Google, Yahoo and Facebook were all invited to join President Obama at the summit and all three declined. Apple’s CEO Tim Cook will be there, but he appears to be delivering a message to the intelligence and law enforcement communities, if they think they’re going to get him to drop the plan to encrypt iOS devices by default:

In an interview last month, Timothy D. Cook, Apple’s chief executive, said the N.S.A. “would have to cart us out in a box” before the company would provide the government a back door to its products. Apple recently began encrypting phones and tablets using a scheme that would force the government to go directly to the user for their information. And intelligence agencies are bracing for another wave of encryption.

Disclosure: I have been guilty of what I am about to criticize Mike Masnick about and will almost certainly be guilty of it in the future. That, however, does not make it right.

What would you say is being assumed in the Mike’s title?

Guesses anyone?

What if it read: U.S. Internet Industry Now Considers The U.S. Intelligence Community An Adversary, Not A Partner?

Does that help?

The trivial point is that the “Internet Industry” isn’t limited to the U.S. and Mike’s readership isn’t either.

More disturbing though is that the “U.S. (meant here descriptively) Internet Industry” at one point did consider the “U.S. (again descriptively) Intelligence Community” as a partner at one point.

That being the case and seeing how Mike duplicates that assumption in his title, how should countries besides the U.S. view the reliability (in terms of government access) of U.S. produced software?

That’s a simple enough question.

The assumption of partnership between the “U.S. Internet Industry” and the “U.S. Intelligence Community” would have me running to back an alternative to China’s recent proposal for source code being delivered to the government (in that case China).

Rather than every country having different import requirements for software sales, why not require the public posting of commercial software source for software sales anywhere?

Posting of source code doesn’t lessen your rights to the code (see copyright statutes) and it makes detection of software piracy trivially easy since all commercial software has to post its source code.

Oh, some teenager might compile a copy but do you really think major corporations in any country are going to take that sort of risk? It just makes no sense.

As far as the “U.S. Intelligence Community” concerns, remember “The treacherous are ever distrustful…” The ill-intent of the world they see is a reflection of their own malice towards others. Or after years of systematic abuse, the smoldering anger of the abused.

### WorldWide Telescope (MS) Goes Open Source!

Thursday, January 8th, 2015

Microsoft is Open‐Sourcing WorldWide Telescope in 2015

From the post:

Why is this great news?

Millions of people rely on WorldWide Telescope (WWT) as their unified astronomical image and data environment for exploratory research, teaching, and public outreach. With OpenWWT, any individual or organization will be able to adapt and extend the functionality of WorldWide Telescope to meet any research or educational need. Extensions to the software will continuously enhance astronomical research, formal and informal learning, and public outreach.

What is WWT, and where did it come from?

WorldWide Telescope began in 2007 as a research project, led from within Microsoft Research. Early partners included astronomers and educators from Caltech, Harvard, Johns Hopkins, Northwestern, the University of Chicago, and several NASA facilities. Thanks to these collaborations and Microsoft’s leadership, WWT has reached its goal of creating a free unified contextual visualization of the Universe with global reach that lets users explore multispectral imagery, all of which is deeply connected to scholarly publications and online research databases.

The WWT software was designed with rich interactivity in mind. Guided tours which can be created within the program, offer scripted paths through the 3D environment, allowing media-rich interactive stories to be told, about anything from star formation to the discovery of the large scale structure of the Universe. On the web, WWT is used as both as a standalone program and as an API, in teaching and in research—where it offers unparalleled options for sharing and contextualizing data sets, on the “2D” multispectral sky and/or within the “3D” Universe.

How can you help?

Open-sourcing WWT will allow the people who can best imagine how WWT should evolve to meet the expanding research and teaching challenges in astronomy to guide and foster future development. The OpenWWT Consortium’s members are institutions who will guide WWT’s transition from Microsoft Research to a new host organization. The Consortium and hosting organization will work with the broader astronomical community on a three-part mission of: 1) advancing astronomical research, 2) improving formal and informal astronomy education; and 3) enhancing public outreach.

Join us. If you and your institution want to help shape the future of WWT to support your needs, and the future of open-source software development in Astronomy, then ask us about joining the OpenWWT Consortium.

To contact the WWT team, or inquire about joining the OpenWWT Consortium, contact Doug Roberts at doug-roberts@northwestern.edu.

What a nice way to start the day!

I’m Twitter follower #30 for OpenWWT. What Twitter follower are you going to be?

If you are interested in astronomy, teaching, interfaces, coding great interfaces, etc., there is something of interest for you here.

Enjoy!

### Seldon

Friday, December 26th, 2014

From the post:

It feels that these days we live our whole digital lives according mysterious algorithms that predict what we’ll want from apps and websites. A new open-source product could help those building the products we use worry less about writing those algorithms in the first place.

As increasing numbers of companies hire in-house data science teams, there’s a growing need for tools they can work with so they don’t need to build new software from scratch. That’s the gambit behind the launch of Seldon, a new open-source predictions API launching early in the new year.

Seldon is designed to make it easy to plug in the algorithms needed for predictions that can recommend content to customers, offer app personalization features and the like. Aimed primarily at media and e-commerce companies, it will be available both as a free-to-use self-hosted product and a fully hosted, cloud-based version.

If you think Inadvertent Algorithmic Cruelty is a problem, just wait until people who don’t understand the data or the algorithms start using them in prepackaged form.

Packaged predictive analytics are about as safe as arming school crossing guards with .600 Nitro Express rifles to ward off speeders. As attractive as the second suggestion sounds, there would be numerous safety concerns.

Different but no less pressing safety concerns abound with packaged predictive analytics. Being disconnected from the actual algorithms, can enterprises claim immunity for race, gender or sexual orientation based discrimination? Hard to prove “intent” when the answers in question were generated in complete ignorance of the algorithmic choices that drove the results.

At least Seldon is open source and so the algorithms can be examined, should you be interested in how results are calculated. But open source algorithms are but one aspect of the problem. What of the data? Blind application of algorithms, even neutral ones, can lead to any number of results. If you let me supply the data, I can give you a guarantee of the results from any known algorithm. “Untouched by human hands” as they say.

When you are given recommendations based on predictive analytics do you ask for the data and/or algorithms? Who in your enterprise can do due diligence to verify the results? Who is on the line for bad decisions based on poor predictive analytics?

I first saw this in a tweet by Gregory Piatetsky.