## Archive for August, 2015

### Rendering big geodata on the fly with GeoJSON-VT

Monday, August 31st, 2015

From the post:

Despite the amazing advancements of computing technologies in recent years, processing and displaying large amounts of data dynamically is still a daunting, complex task. However, a smart approach with a good algorithmic foundation can enable things that were considered impossible before.

Let’s see if Mapbox GL JS can handle loading a 106 MB GeoJSON dataset of US ZIP code areas with 33,000+ features shaped by 5.4+ million points directly in the browser (without server support):

An observation from the post:

It isn’t possible to render such a crazy amount of data in its entirety at 60 frames per second, but luckily, we don’t have to:

• at lower zoom levels, shapes don’t need to be as detailed
• at higher zoom levels, a lot of data is off-screen

The best way to optimize the data for all zoom levels and screens is to cut it into vector tiles. Traditionally, this is done on the server, using tools like Mapnik and PostGIS.
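
To make the tiling idea concrete, here is a minimal sketch of the standard XYZ ("slippy map") Web Mercator tile grid. It is my own illustration, not code from the post and not how geojson-vt works internally; the function names and the sample viewport are assumptions. It shows why "a lot of data is off-screen" at higher zoom levels: a fixed viewport touches only a handful of the tiles that exist at that zoom.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_view(west, south, east, north, zoom):
    """List the (zoom, x, y) tiles that intersect a bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # tile y grows southward
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return [(zoom, x, y)
            for x in range(x_min, x_max + 1)
            for y in range(y_min, y_max + 1)]


# Hypothetical viewport, roughly Manhattan; the coordinates are illustrative.
view = tiles_for_view(-74.02, 40.70, -73.93, 40.80, zoom=12)
print(len(view), "tiles needed out of", 4 ** 12)
```

Only the tiles returned by `tiles_for_view` need geometry at full detail for that screen; everything else can be skipped or served from a coarser zoom.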

Could we create vector tiles on the fly, in the browser? Specifically for this purpose, I wrote a new JavaScript library — geojson-vt.

It turned out to be crazy fast, with its usefulness going way beyond the browser:

In addition to being a great demonstration of the visualization of geodata, I mention this post because it offers insights into the visualization of topic maps.

• at lower zoom levels, shapes don’t need to be as detailed
• at higher zoom levels, a lot of data is off-screen

What do you think the equivalents would be for topic map navigation?

If we think of “shapes don’t need to be as detailed” for a crime topic map, could it be that all offenders, men, women, various ages, races and religions are lumped into an “offender” topic?

And if we think of “a lot of data is off-screen,” is that when we have narrowed a suspect pool down by gender, age, race, etc.?

Those dimensions would vary by the subject of the topic map and would require considering “merging” as a function of the “zoom” into a set of subjects.
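
One way to picture "merging as a function of zoom" is the rough sketch below. Everything in it is hypothetical: the offender records, the property names and the `ZOOM_KEYS` table are invented for illustration and come from no topic map standard or engine. At zoom 0 every offender collapses into a single merged topic; each deeper zoom level lets one more property keep topics apart, and `narrow` plays the part of pushing data "off-screen."

```python
from collections import Counter

# Hypothetical offender topics; every property name and value is invented.
OFFENDERS = [
    {"id": "t1", "gender": "F", "age_band": "18-29"},
    {"id": "t2", "gender": "M", "age_band": "18-29"},
    {"id": "t3", "gender": "M", "age_band": "30-44"},
    {"id": "t4", "gender": "M", "age_band": "30-44"},
]

# Assumption: each zoom level names the properties that keep topics distinct.
ZOOM_KEYS = {0: (), 1: ("gender",), 2: ("gender", "age_band")}


def merge_at_zoom(topics, zoom):
    """Merge topics that agree on every property visible at this zoom level."""
    keys = ZOOM_KEYS[zoom]
    return Counter(tuple(t[k] for k in keys) for t in topics)


def narrow(topics, **criteria):
    """Push topics that fail the criteria 'off-screen'."""
    return [t for t in topics if all(t[k] == v for k, v in criteria.items())]


for zoom in sorted(ZOOM_KEYS):
    print(zoom, dict(merge_at_zoom(OFFENDERS, zoom)))
print([t["id"] for t in narrow(OFFENDERS, gender="M", age_band="30-44")])
```

Which properties belong at which zoom level is exactly the part that would vary by the subject of the topic map.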

Suggestions?

PS: BTW, do work through the post. For geodata this looks very good.

### How IP Stifles Innovation

Monday, August 31st, 2015

What Will Become of the World’s First Open Source GPU? by Nicole Hemsoth.

From the post:

Open source hardware and microprocessor projects are certainly nothing new, and while there has been great momentum on the CPU front, there have not been efforts to release an open source GPU into the wild. However, at the Hot Chips conference this week, a team of researchers revealed their plans for MIAOW, a unique take on open source hardware that leverages a subset of AMD’s Southern Islands ISA that is used for AMD’s own GPU and can run OpenCL codes at what appears to be an impressive performance point that is comparable to existing single-precision GPU results.

As an open source project, it is reasonable to think that once it is further refined, some clever startup might decide to take the chip into full production. However, as one might imagine, there are likely going to be some serious IP infringement issues to address. Since the entire scope of the project is based on a pared-down variant of the AMD ISA for its own GPUs, the team will either need to work within AMD’s confines to continue pushing such a project or the effort, no matter how well proven it is in FPGA prototyping or actual silicon, could be a series of lawsuits waiting to happen.

Dr. Karu Sankaralingam, who led the team’s effort at the University of Wisconsin, where the project is based, says that building an open source or any other hardware project is bound to incur legal wrangling, in part because the IP almost has to be reused in one form or another. Generally, he says that for open source hardware projects like this one, the best defense is to use anything existing as a base but focus innovation on building on top of that. He says that to date, AMD has not been involved in the project beyond a few individuals offering some insight on various architectural elements. In other words, if the team is able to roll this beyond research and into any kind of volume, AMD will likely have words.

See Nicole’s post for technical details.

Sad that the future of such a project must depend on the largess of an IP holder.

If you were a venture capitalist, would you invest in an IP minefield waiting to explode? Likely not.

Patents on chips should require a production version of the chip. And then a patent only for three (3) years. If you haven’t captured the market in three (3) years, you are doing something wrong or there is not much IP in your patent.

IP protection should apply to new ideas and inventions, not traps laid for the unwary by the non-productive.

### unglue.it

Monday, August 31st, 2015

From the webpage:

unglue (v. t.) 2. To make a digital book free to read and use, worldwide.

New to me, possibly old to you.

I “discovered” this site while looking at Intermediate Python.

From the general FAQ:

### Basics

#### How It Works

What is Unglue.it?

Unglue.it is a place for individuals and institutions to join together to make ebooks free to the world. We work together with authors, publishers, or other rights holders who want their ebooks to be free but also want to be able to earn a living doing so. We use Creative Commons licensing as an enabling tool to “unglue” the ebooks.

What are Ungluing Campaigns?

We have three types of Ungluing Campaigns: Pledge Campaigns, Buy-to-Unglue Campaigns and Thanks-for-Ungluing campaigns.

• In a Pledge Campaign, book lovers pledge their support for ungluing a book. If enough support is found to reach the goal (and only then), the supporters’ credit cards are charged, and an unglued ebook is released.
• In a Buy-to-Unglue Campaign, every ebook copy sold moves the book’s ungluing date closer to the present. And you can donate ebooks to your local library- that’s something you can’t do in the Kindle or Apple Stores!
• In a Thanks-for-Ungluing Campaign, the ebook is already released with a Creative Commons license. Supporters can express their thanks by paying what they wish for the license and the ebook.

What is Crowdfunding?

Crowdfunding is collectively pooling contributions (or pledges) to support some cause. Using the internet for coordination means that complete strangers can work together, drawn by a common cause. This also means the number of supporters can be vast, so individual contributions can be as large or as small as people are comfortable with, and still add up to enough to do something amazing.

Want to see some examples? Kickstarter lets artists and inventors solicit funds to make their projects a reality. For instance, webcomic artist Rich Burlew sought $57,750 to reprint his comics in paper form, and raised close to a million.

In other words, crowdfunding is working together to support something you love. By pooling resources, big and small, from all over the world, we can make huge things happen.

What will supplement and then replace contemporary publishing models remains to be seen.

In terms of experiments, this one looks quite promising.

If you use unglue.it, please ping me with your experience.

Thanks!

### Intermediate Python

Monday, August 31st, 2015

Intermediate Python by Muhammad Yasoob Ullah Khalid.

Description:

Python is an amazing language with a strong and friendly community of programmers. However, there is a lack of documentation on what to learn after getting the basics of Python down your throat. Through this book I aim to solve this problem. I would give you bits of information about some interesting topics which you can further explore.

The topics which are discussed in this book open up your mind towards some nice corners of Python language. This book is an outcome of my desire to have something like this when I was beginning to learn Python.

If you are a beginner, intermediate or even an advanced programmer there is something for you in this book.

Read online at Python Tips or get the donation version at Gumroad.

I first saw this in a tweet by Christophe Lalanne.

### Self-Censorship and Terrorism (Hosting Taliban material)

Monday, August 31st, 2015

British Library declines Taliban archive over terror law fears

From the BBC:

The British Library has declined to store a large collection of Taliban-related documents because of concerns regarding terrorism laws.

The collection, related to the Afghan Taliban, includes official newspapers, maps and radio broadcasts.
Academics have criticised the decision saying it would be a valuable resource to understand the ongoing insurgency in Afghanistan.

The library said it feared it could be in breach of counter-terrorism laws.

It said it had been legally advised not to make the material accessible.

The Terrorism Acts of 2000 and 2006 make it an offence to “collect material which could be used by a person committing or preparing for an act of terrorism” and criminalise the “circulation of terrorist publications”.

The Home Office declined to comment saying it was a matter for the library.

Of course the Home Office has no comment. The more it can bully people and institutions into self-censorship the better.

A number of academics have pointed out the absurdity of the decision. But there is some risk and most institutions are “risk averse,” which also explains why governments tremble at the thought of “terrorist publications.”

While governments and some libraries try to outdo each other in terms of timidity, the rest of us should be willing to take that risk.

Take that risk for freedom of inquiry and the sharing of knowledge. Putting a finger in the eye of timid governments and institutions strikes me as a good reason as well.

No promises but perhaps individuals offering to host, and then hosting, parts of the Taliban collection will shame timid institutions into hosting it and similar collections (like the alleged torrents of pro-Islamic State tweets).

I am willing to host some material from the Taliban archive. It doesn’t have to be the interesting parts (which everyone will want).

Are you?

PS: No, I’m not a Taliban sympathizer, at least in so far as I understand what the Taliban represents. I am deeply committed to enabling others to reach their own conclusions based on evidence about the Taliban and others. We might agree and we might not. That is one of the exciting (government drones read “dangerous”) aspects of intellectual freedom.
### The 27 Worst Charts Of All Time

Sunday, August 30th, 2015

The 27 Worst Charts Of All Time by Walter Hickey.

Walter starts his post with:

Impressively bad. Yes?

See Walter’s post for twenty-six (26) other examples of what not to do.

### Network motif discovery: A GPU approach [subgraph isomorphism and topics]

Sunday, August 30th, 2015

Network motif discovery: A GPU approach by Wenqing Lin; Xiaokui Xiao; Xing Xie; Xiao-Li Li.

Abstract:

The identification of network motifs has important applications in numerous domains, such as pattern detection in biological networks and graph analysis in digital circuits. However, mining network motifs is computationally challenging, as it requires enumerating subgraphs from a real-life graph, and computing the frequency of each subgraph in a large number of random graphs. In particular, existing solutions often require days to derive network motifs from biological networks with only a few thousand vertices. To address this problem, this paper presents a novel study on network motif discovery using Graphical Processing Units (GPUs). The basic idea is to employ GPUs to parallelize a large number of subgraph matching tasks in computing subgraph frequencies from random graphs, so as to reduce the overall computation time of network motif discovery. We explore the design space of GPU-based subgraph matching algorithms, with careful analysis of several crucial factors that affect the performance of GPU programs. Based on our analysis, we develop a GPU-based solution that (i) considerably differs from existing CPU-based methods, and (ii) exploits the strengths of GPUs in terms of parallelism while mitigating their limitations in terms of the computation power per GPU core.
With extensive experiments on a variety of biological networks, we show that our solution is up to two orders of magnitude faster than the best CPU-based approach, and is around 20 times more cost-effective than the latter, when taking into account the monetary costs of the CPU and GPUs used.

For those of you who need to dodge the paywall: Network motif discovery: A GPU approach

From the introduction (topic map comments interspersed):

Given a graph $G$, a network motif in $G$ is a subgraph $g$ of $G$, such that $g$ appears much more frequently in $G$ than in random graphs whose degree distributions are similar to that of $G$ [1]. The identification of network motifs finds important applications in numerous domains. For example, network motifs are used (i) in system biology to predict protein interactions in biological networks and discover functional sub-units [2], (ii) in electronic engineering to understand the characteristics of circuits [3], and (iii) in brain science to study the functionalities of brain networks [4].

Unlike a topic map considered as graph $G$, the network motif problem is to discover subgraphs of $G$. In a topic map, the subgraph constituting a topic, and its defined isomorphism with other topics, is declarable.

…Roughly speaking, all existing techniques adopt a common two-phase framework as follows:

Subgraph Enumeration: Given a graph $G$ and a parameter $k$, enumerate the subgraphs $g$ of $G$ with $k$ vertices each;

Frequency Estimation:

Rather than enumerating subgraph $g$ of $G$ (a topic map), we need only collect those subgraphs for the isomorphism test.

…To compute the frequency of $g$ in a random graph $G$, however, we need to derive the number of subgraphs of $G$ that are isomorphic to $g$ – this requires a large number of subgraph isomorphism tests [14], which are known to be computationally expensive.

Footnote 14 is a reference to: Practical graph isomorphism, II by Brendan D. McKay, Adolfo Piperno.
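
For readers who want to see the shape of the problem, here is a toy sketch of the enumeration step: list every connected $k$-vertex induced subgraph and group the isomorphic ones. It is emphatically not the authors' GPU algorithm and not what nauty does; the brute-force canonical form (try every vertex ordering) is my own stand-in and only works for tiny $k$. The function names and the sample graph are assumptions, and generating the degree-matched random graphs for the frequency comparison is left out.

```python
from collections import Counter
from itertools import combinations, permutations


def canonical_form(vertices, edges):
    """Smallest relabelled edge list over all vertex orderings (brute force)."""
    best = None
    for order in permutations(vertices):
        index = {v: i for i, v in enumerate(order)}
        form = tuple(sorted(tuple(sorted((index[a], index[b]))) for a, b in edges))
        if best is None or form < best:
            best = form
    return best


def is_connected(vertices, edges):
    """Depth-first search over the given edges, starting from the first vertex."""
    seen, frontier = {vertices[0]}, [vertices[0]]
    while frontier:
        v = frontier.pop()
        for a, b in edges:
            if v in (a, b):
                other = b if a == v else a
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return len(seen) == len(vertices)


def subgraph_counts(edges, k):
    """Count connected k-vertex induced subgraphs, grouped by isomorphism class."""
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted({v for e in edges for v in e})
    counts = Counter()
    for subset in combinations(nodes, k):
        sub = [p for p in combinations(subset, 2) if frozenset(p) in edge_set]
        if is_connected(subset, sub):
            counts[canonical_form(subset, sub)] += 1
    return counts


# Hypothetical graph: a triangle (1, 2, 3) with a tail (3, 4).
print(subgraph_counts([(1, 2), (2, 3), (1, 3), (3, 4)], k=3))
```

Running the same count over many random graphs and comparing frequencies is the expensive second phase the paper moves to GPUs. In a topic map the grouping key is declared rather than computed, which is why the canonical form step is where the two problems part ways.
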
See also: nauty and Traces by Brendan McKay and Adolfo Piperno.

nauty and Traces are programs for computing automorphism groups of graphs and digraphs [*]. They can also produce a canonical label. They are written in a portable subset of C, and run on a considerable number of different systems.

This is where the topic map subgraph $g$ isomorphism issue intersects with the more general subgraph isomorphism case.

Any topic can have properties (nee internal occurrences) in addition to those thought to make it “isomorphic” to another topic. Or play roles in associations that are not played by other topics representing the same subject.

Motivated by the deficiency of existing work, we present an in-depth study on efficient solutions for network motif discovery. Instead of focusing on the efficiency of individual subgraph isomorphism tests, we propose to utilize Graphics Processing Units (GPUs) to parallelize a large number of isomorphism tests, in order to reduce the computation time of the frequency estimation phase. This idea is intuitive, and yet, it presents a research challenge since there is no existing algorithm for testing subgraph isomorphisms on GPUs. Furthermore, as shown in Section III, existing CPU-based algorithms for subgraph isomorphism tests cannot be translated into efficient solutions on GPUs, as the characteristics of GPUs make them inherently unsuitable for several key procedures used in CPU-based algorithms.

The punch line is that the authors present a solution that:

…is up to two orders of magnitude faster than the best CPU-based approach, and is around 20 times more cost-effective than the latter, when taking into account the monetary costs of the CPU and GPUs used.

Since topic isomorphism is defined as opposed to discovered, this looks like a fruitful avenue to explore for topic map engine performance.

### DataPyR

Saturday, August 29th, 2015

Twenty (20) lists of programming resources on data science, Python and R.
A much easier collection of resources to scan than attempting to search for resources on any of these topics.

At the same time, you have to visit each resource and mine it for an answer to any particular problem.

For example, there is a list of Python Packages for Datamining, which is useful, but even more useful would be a list of common datamining tasks with pointers to particular data mining libraries. That would enable users to search across multiple libraries by task, as opposed to exploring each library.

Expand that across a set of resources on data science, Python and R and you’re talking about saving time and resources across the entire community.

I first saw this in a tweet by Kirk Borne.

### Mass Shootings [Don’t] Fudg[e] The Numbers

Friday, August 28th, 2015

Mass Shootings Are Horrifying Enough Without Fudging The Numbers by Bob Rudis (@hrbrmstr).

From the post:

Business Insider has a piece titled “We are now averaging more than one mass shooting per day in 2015”, with a lead paragraph of:

As of August 26th, the US has had 247 mass shootings in the 238 days of 2015.

They go on to say that the data they used in their analysis comes from the Mass Shootings Tracker. That site lists 249 incidents of mass shootings from January 1st to August 28th.

The problem is you can’t just use simple, inflammatory math to make the point about the shootings. A shooting did not occur every day. In fact, there were only 149 days with shootings.

Let’s take a look at the data. We’ll first verify that we are working with the same data that’s on the web site by actually grabbing the data from the web site:

Complete with R code and graphs showing the days with multiple mass shootings.

Be mindful that the Mass Shooting Tracker counts four (4) or more people being shot as a mass shooting. Under earlier definitions four (4) or more people had to be murdered for it to be a mass shooting.

BTW, there were ninety-one (91) days with no mass shootings this year.
(so far)

### Conspiring with Non-Indicted Co-Conspirators

Friday, August 28th, 2015

It is now dangerous to share social media contact information with others.

In a recent padding of the FBI statistics on terrorism:

In August 2014, a 24-year-old New York City resident (CC-1) learned via social media that El Gammal had posted social media comments that supported ISIL. Minutes later, CC-1 contacted El Gammal. Over the next several months, CC-1 and El Gammal continued corresponding over the Internet, although CC-1 deleted many of these exchanges.

In the midst of these communications, in October 2014, El Gammal traveled to Manhattan, New York, where CC-1 was enrolled in college, and contacted and met with CC-1. While in New York City, El Gammal also contacted another co-conspirator (CC-2), who lived in Turkey, about CC-1’s plans to travel to the Middle East. El Gammal later provided CC-1 with social media contact information for CC-2. Thereafter, El Gammal and CC-2 had multiple social media exchanges about CC-1 traveling to the Middle East. In addition, CC-1 began communicating with CC-2, introducing himself as a friend of “Gammal’s.”

In late January 2015, CC-1 abruptly left New York City for Istanbul. After CC-1 arrived in Turkey, El Gammal continued to communicate with him over the Internet, providing advice on traveling toward Syria and on meeting with CC-2. After CC-1 arrived in Syria, he received military-type training from ISIL between early February and at least early May 2015. On May 7, 2015, CC-1 reported to El Gammal that “everything [was] going according to plan.”

There is a big jump between sharing social media contact information with someone going overseas, even to Turkey, and what the lay summary calls:

…assisting a New York college student to travel to Syria to obtain military training from ISIL…

Hardly.

You and I might talk about the government of the United States, among other things, but sharing social media contacts of people living in the D.C.
area doesn’t make me a co-conspirator in some future unlawful act you commit.

In the FBI’s view:

“As alleged, Gammal helped a college student in New York receive terrorist training in Syria through a contact in Turkey, in order to support ISIL,” said Assistant Director in Charge Rodriguez. “These relationships were allegedly made and solidified through the internet while Gammal was in Arizona. This is another example of how social media is utilized for nefarious and criminal purposes around the world.”

That’s an absurdity wrapped in a paranoid imagination.

Gammal could have just as well provided a cellphone number for CC-2. Would that make cellphones the origin of “nefarious and criminal purposes around the world?”

The Islamic State is and has been a distasteful organization with questionable tactics. However, much of its stature in the world is due to the hyping of the organization by the FBI and others.

Want to see ISIS diminished? Stop treating it as a serious adversary, which it’s not. People will fairly quickly lose interest when it is no longer front page news.

### eXist: XQuery FLWOR Expression Updates

Friday, August 28th, 2015

Fixes for FLWOR expressions #763

From the webpage:

This is a major update to FLWOR processing in eXist to align it with the 3.0/3.1 specification. The “for”, “let”, “where”, “order by” and “group by” clauses are now implemented as separate classes (instead of being attached to a main “for” or “let”), and may thus appear in any place the spec allows. A FLWOR expression may have multiple “order by”, “group by” or “where” clauses. The previous implementation was too restrictive, allowing those clauses only in specific positions.

The “group by” clause, which was added back in 2007 and has been updated a few times since then, was completely rewritten and does now also support collations.

The “allowing empty” spec within a “for” is now respected.

You need the nightly build version to benefit from these latest changes. (As of August 28, 2015.
Eventually the changes will make it into a release candidate and then into a stable version.)

Currently available versions include:

• Stable Release, version 2.2
• Release Candidate, version 3.0 RC 1 (not recommended for production)
• Nightly Builds, use at your own risk

I first saw this in a tweet by Jonathan Robie.

### Blocking Flashers

Friday, August 28th, 2015

From the post:

Last month, Firefox blocked all Flash content by default – as it waited for Adobe to patch a critical security hole that was being actively exploited in malicious attacks.

The news came hot on the heels of Facebook’s security chief calling for Flash to be put out of its misery permanently.

And from next Tuesday, September 1st, Google’s Chrome browser will be blocking Flash ads by default. In a notice posted on Google Plus, the company says that the change is being made to improve performance for users.

Be aware that Graham has previously said that simply disabling Flash in your browsers may not be enough to protect yourself from Flash vulnerabilities.

Considering the security issues known to exist with Adobe Flash, I see no reason to place much confidence in any patching that Adobe produces for Flash. If they were capable of doing it correctly, it would already be done.

The best strategy is that if a webpage or PDF or Word document requires Flash, don’t look. In the case of documents, return to sender requesting they use less insecure software for communication purposes.

(Government offices take note! Flash should be banned from all submissions to government agencies.)

### Twitter Doubles Down on Censorship

Thursday, August 27th, 2015

From the post:

When Twitter killed embarrassing-political-tweet archive Politwoops in June, the site’s founders probably looked to the 30 other countries where it was running and said, well, it might just be a matter of time before those are strangled in the crib.

Consider them strangled.
Twitter told the Open State Foundation on Friday that it had suspended API access to Diplotwoops and all remaining Politwoops sites in those 30 countries.

Part of Twitter’s explanation reads as follows:

Imagine how nerve-racking – terrifying, even – tweeting would be if it was immutable and irrevocable? No one user is more deserving of that ability than another. Indeed, deleting a tweet is an expression of the user’s voice.

Do you wonder if Twitter will use that justification when the NSA comes knocking?

I have to imagine that Twitter comes down on the side of the my-edited-history folks of the EU and recently the UK.

I find the idea that digital records will shift under our feet far more terrifying than tweets being “immutable and irrevocable.”

You?

### Abandon All Hope Prior To IE 11

Wednesday, August 26th, 2015

Stay up-to-date with Internet Explorer

From the post:

As we shared in May, Microsoft is prioritizing helping users stay up-to-date with the latest version of Internet Explorer. Today we would like to share important information on migration resources, upgrade guidance, and details on support timelines to help you plan for moving to the latest Internet Explorer browser for your operating system.

Microsoft offers innovative and transformational services for a mobile-first and cloud-first world, so you can do more and achieve more; Internet Explorer is core to this vision. In today’s digital world, billions of people use Internet-connected devices, powered by cloud service-based applications, spanning both work and life experiences. Running a modern browser is more important than ever for the fastest, most secure experience on the latest Web sites and services, connecting anytime, anywhere, on any device.

Microsoft recommends enabling automatic updates to ensure an up-to-date computing experience (including the latest version of Internet Explorer), and most consumers use automatic updates today.
Commercial customers are encouraged to test and accept updates quickly, especially security updates. Regular updates provide significant benefits, such as decreased security risk and increased reliability, and Windows Update can automatically install updates for Internet Explorer and Windows.

For customers not yet running the latest browser available for your operating system, we encourage you to upgrade and stay up-to-date for a faster, more secure browsing experience. Beginning January 12, 2016, the following operating systems and browser version combinations will be supported:

| Windows Platform | Internet Explorer Version |
| --- | --- |
| Windows Vista SP2 | Internet Explorer 9 |
| Windows Server 2008 SP2 | Internet Explorer 9 |
| Windows 7 SP1 | Internet Explorer 11 |
| Windows Server 2008 R2 SP1 | Internet Explorer 11 |
| Windows 8.1 | Internet Explorer 11 |
| Windows Server 2012 | Internet Explorer 10 |
| Windows Server 2012 R2 | Internet Explorer 11 |

After January 12, 2016, only the most recent version of Internet Explorer available for a supported operating system will receive technical support and security updates. For example, customers using Internet Explorer 8, Internet Explorer 9, or Internet Explorer 10 on Windows 7 SP1 should migrate to Internet Explorer 11 to continue receiving security updates and technical support. For more details regarding support timelines on Windows and Windows Embedded, see the Microsoft Support Lifecycle site.

I can’t comment on the security of IE 11 but it will create a smaller footprint for support. Perhaps some hackers will be drawn away for easier pickings on earlier versions.

You are already late planning your migration path to IE 11.

What IE version are you going to be running on January 12, 2016?

### Spreadsheets are graphs too!

Wednesday, August 26th, 2015

Spreadsheets are graphs too! by Felienne Hermans.

Presentation with transcript.

Felienne starts with a great spreadsheet story:

When I was in grad school, I worked with an investment bank doing spreadsheet research.
On my first day, I went to the head of the Excel team.

I said, ‘Hello, can I have a list of all your spreadsheets?’

There was no such thing.

‘We don’t have a list of all the spreadsheets,’ he said. ‘You could ask Frank in Accounting or maybe Harry over at Finance. He’s always talking about spreadsheets. I don’t really know, but I think we might have 10,000 spreadsheets.’

10,000 spreadsheets was a gold mine of research, so I went to the IT department and conducted my first spreadsheet scan with root access in Windows Explorer.

Within one second, it had already found 10,000 spreadsheets. Within an hour, it was still finding more, with over one million Excel files located. Eventually, we found 2.5 million spreadsheets.

In short, spreadsheets run the world.

She continues to outline spreadsheet horror stories and then demonstrates how complex relationships between cells can be captured by Neo4j.

Which are much easier to query with Cypher than SQL!

While I applaud:

I realized that spreadsheet information is actually very graphy. All the cells are connected to references to each other and they happen to be in a worksheet or on the spreadsheet, but that’s not really what matters. What matters is the connections.

I would be more concerned with the identity of the subjects between which connections have been made.

Think of it as documenting the column headers from a five-year-old spreadsheet that you are now using by rote.

Knowing the connections between cells is a big step forward. Knowing what the cells are supposed to represent is an even bigger one.

### Trademark Litigation Attorney Needed – Contingency Fee Case

Wednesday, August 26th, 2015

No, not for me but for Grsecurity.

Important Notice Regarding Public Availability of Stable Patches by Brad Spengler & The PaX Team.

From the webpage:

Grsecurity has existed for over 14 years now.
During this time it has been the premier solution for hardening Linux against security exploits and served as a role model for many mainstream commercial applications elsewhere. All modern OSes took our lead and implemented to varying degrees a number of security defenses we pioneered; some have even been burned into silicon in newer processors. Over the past decade, these defenses (a small portion of those we’ve created and have yet to release) have single-handedly caused the greatest increase in security for users worldwide.

….

A multi-billion dollar corporation had made grsecurity a critical component of their embedded platform. This in itself isn’t a problem, nor is it necessarily (albeit extremely unwise) that they’re using an old, unsupported kernel and a several year old, unsupported version of grsecurity that they’ve modified. This seems to be the norm for the embedded Linux industry, seemingly driven by a need to mark a security checkbox at the lowest cost possible. So it’s no surprise that they didn’t bother to hire us to perform the port properly for them or to actively maintain the security of the kernel they’re providing to their paid customers.

They are publishing a “grsecurity” for a kernel version we never released a patch for. We provided evidence to their lawyers of one of their employees registering on our forums and asking for free help with backporting an EFI fix to their modified version of grsecurity based off a very old patch of ours (a test patch that wasn’t even the last one released for that major kernel version).

The company’s lawyers repeatedly claimed the company had not modified the grsecurity code in any way and that therefore all the references to “grsecurity” in their product were therefore only nominative use of the trademark to refer to our external work. They would therefore not cease using our trademark and would continue to do so despite our objections.
This final assertion occurred three months after our initial cease and desist letter. They also threatened to request “all available sanctions and attorneys’ fees” were we to proceed with a lawsuit against them.

This announcement is our public statement that we’ve had enough. Companies in the embedded industry not playing by the same rules as every other company using our software violates users’ rights, misleads users and developers, and harms our ability to continue our work. Though I’ve only gone into depth in this announcement on the latest trademark violation against us, our experience with two GPL violations over the previous year have caused an incredible amount of frustration. These concerns are echoed by the complaints of many others about the treatment of the GPL by the embedded Linux industry in particular over many years.

With that in mind, today’s announcement is concerned with the future availability of our stable series of patches. We decided that it is unfair to our sponsors that the above mentioned unlawful players can get away with their activity. Therefore, two weeks from now, we will cease the public dissemination of the stable series and will make it available to sponsors only. The test series, unfit in our view for production use, will however continue to be available to the public to avoid impact to the Gentoo Hardened and Arch Linux communities. If this does not resolve the issue, despite strong indications that it will have a large impact, we may need to resort to a policy similar to Red Hat’s, described here or eventually stop the stable series entirely as it will be an unsustainable development model.

If you know a trademark attorney or would like to donate the services of one (large corporations have them by the bag full), consider contacting Grsecurity.
Bottom feeders, as described in Brad’s post, remind me of the “pigs in their sties with all their backing, what they need is a damned good whacking.”

Please re-post, forward, distribute, etc.

### Looking for Big Data? Look Up!

Tuesday, August 25th, 2015

Gaia’s first year of scientific observations

From the post:

After launch on 19 December 2013 and a six-month long in-orbit commissioning period, the satellite started routine scientific operations on 25 July 2014.

Located at the Lagrange point L2, 1.5 million km from Earth, Gaia surveys stars and many other astronomical objects as it spins, observing circular swathes of the sky. By repeatedly measuring the positions of the stars with extraordinary accuracy, Gaia can tease out their distances and motions through the Milky Way galaxy.

For the first 28 days, Gaia operated in a special scanning mode that sampled great circles on the sky, but always including the ecliptic poles. This meant that the satellite observed the stars in those regions many times, providing an invaluable database for Gaia’s initial calibration. At the end of that phase, on 21 August, Gaia commenced its main survey operation, employing a scanning law designed to achieve the best possible coverage of the whole sky.

Since the start of its routine phase, the satellite recorded 272 billion positional or astrometric measurements, 54.4 billion brightness or photometric data points, and 5.4 billion spectra.

The Gaia team have spent a busy year processing and analysing these data, en route towards the development of Gaia’s main scientific products, consisting of enormous public catalogues of the positions, distances, motions and other properties of more than a billion stars. Because of the immense volumes of data and their complex nature, this requires a huge effort from expert scientists and software developers distributed across Europe, combined in Gaia’s Data Processing and Analysis Consortium (DPAC).
In case you missed it:

Since the start of its routine phase, the satellite recorded 272 billion positional or astrometric measurements, 54.4 billion brightness or photometric data points, and 5.4 billion spectra.

It sounds like big data. Yes? 😉

Public release of the data is pending. Check back at the Gaia homepage for the latest news.

### Your Fridge Joined Ashley Madison?

Monday, August 24th, 2015

From the post:

Security researchers have discovered a potential way to steal users’ Gmail credentials from a Samsung smart fridge. Pen Test Partners discovered the MiTM (man-in-the-middle) vulnerability that facilitated the exploit during an IoT hacking challenge run by Samsung at the recent DEF CON hacking conference.

The hack was pulled off against the RF28HMELBSR smart fridge, part of Samsung’s line-up of Smart Home appliances which can be controlled via their Smart Home app. While the fridge implements SSL, it fails to validate SSL certificates, thereby enabling man-in-the-middle attacks against most connections.

The internet-connected device is designed to download Gmail Calendar information to an on-screen display. Security shortcomings mean that hackers who manage to jump on to the same network can potentially steal Google login credentials from their neighbours.

The certainty of online transactions diminishes with the spread of the internet-of-things (IoT).

Think about it. My email, packets with my router address, etc. may appear in massive NSA data vacuum bags. What defense do I have other than “I didn’t send, receive, etc.?”

I’m not logging or at least not preserving logs of every bit of every keystroke on my computer. Are you? And if you did, how would you authenticate it to the NSA?

Of course, the authenticity, or subject identity in topic map terms, of an email in Ashley Madison data, should depend on a number of related factors to establish identity. From the user profile associated with an email for example. Are sexual profiles as unique as fingerprints?
Authenticity hasn’t been raised in the NSA phone surveillance debate but if you think “phone call tracking is connecting dots,” it isn’t likely to come up.

I used to call the time-of-day service after every client call. It wasn’t a secretive messaging technique, it was to clear the last called buffer on the phone. Sometimes a lack of a pattern is just that, lack of a pattern.

### Linux on the Mainframe

Monday, August 24th, 2015

Linux Foundation Launches Open Mainframe Project to Advance Linux on the Mainframe

From the post:

The Linux Foundation, the nonprofit organization dedicated to accelerating the growth of Linux and collaborative development, announced the Open Mainframe Project. This initiative brings together industry experts to drive innovation and development of Linux on the mainframe.

Founding Platinum members of the Open Mainframe Project include ADP, CA Technologies, IBM and SUSE. Founding Silver members include BMC, Compuware, LC3, RSM Partners and Vicom Infinity. The first academic institutions participating in the effort include Marist College, University of Bedfordshire and The Center for Information Assurance and Cybersecurity at University of Washington. The announcement comes as the industry marks 15 years of Linux on the mainframe.

In just the last few years, demand for mainframe capabilities have drastically increased due to Big Data, mobile processing, cloud computing and virtualization. Linux excels in all these areas, often being recognized as the operating system of the cloud and for advancing the most complex technologies across data, mobile and virtualized environments. Linux on the mainframe today has reached a critical mass such that vendors, users and academia need a neutral forum to work together to advance Linux tools and technologies and increase enterprise innovation.

“Linux today is the fastest growing operating system in the world.
As mobile and cloud computing become globally pervasive, new levels of speed and efficiency are required in the enterprise and Linux on the mainframe is poised to deliver,” said Jim Zemlin executive director at The Linux Foundation. “The Open Mainframe Project will bring the best technology leaders together to work on Linux and advanced technologies from across the IT industry and academia to advance the most complex enterprise operations of our time.”

Linux Foundation Collaborative Projects, visit: http://collabprojects.linuxfoundation.org/

Open Mainframe Project, visit: https://www.openmainframeproject.org/

In terms of ancient topic map history, recall that both topic maps and DocBook arose out of what became the X-Windows series by O’Reilly. If you are familiar with the series, you can imagine the difficulty of adapting it to the nuances of different vendor releases and vocabularies.

Several of the volumes from the X-Windows series are available in the O’Reilly OpenBook Project.

I mention that item of topic map history because documenting mainframe Linux isn’t going to be a trivial task. A useful index across documentation from multiple authors is going to require topic maps or something very close to it.

One last bit of trivia, the X-Windows project can be found at www.x.org. How’s that for cool? A single letter name.

### Popcorn Time Information Banned in Denmark

Monday, August 24th, 2015

Not content to prosecute actual copyright violators, Denmark is prosecuting people who spread information about software that can violate copyrights.

That’s right! Information, not links to pirated content, not the software, just information about the software.

From the post:

While arrests of file-sharers and those running sites that closely facilitate infringement are nothing new, this week’s arrests appear to go way beyond anything seen before. The two men are not connected to the development of Popcorn Time and have not been offering copyrighted content for download.
Both sites were information resources, offering recent news on Popcorn Time related developments, guides, FAQ sections and tips on how to use the software. Both men stand accused of distributing knowledge and guides on how to obtain illegal content online and are reported to have confessed.

I wonder what “confessed” means under these circumstances? Confessed to providing up-to-date and useful information on Popcorn Time? That’s hardly a crime by any stretch of the imagination.

I realize there is a real shortage of crime in Denmark, http://www.nationmaster.com/country-info/profiles/Denmark/Crime:

but that’s no excuse to get overly inventive with regard to intellectual property crimes.

Before I forget:

Those looking for a clearer (and live) idea of what the site looked like before it was taken down should check out getpopcorntime.co.uk, which was previously promoted by PopcornTime.dk as an English language version of their site.

Whenever you encounter banned sites or information, be sure to pass the banned information along.

Censorship has no legitimate role on the Internet. If you don’t want to see particular content, don’t look. What other people choose to look at is their business and none of yours.

Child porn is the oft-cited example for censorship on the Internet. I agree it is evil, etc., but why concentrate on people sharing child porn? Shouldn’t the police be seeking the people making child porn?

Makes you wonder doesn’t it? Are the police ineffectually swatting (sorry) at the distribution of child porn and ignoring the real crimes of making child porn?

With modern day image recognition, you have to wonder why the police aren’t identifying more children in child porn? Or are they so wedded to ineffectual but budget supporting techniques that they haven’t considered the alternatives?
I am far more sympathetic to the use of technology to catch the producers of child porn than to state functionaries attempting to suppress the free interchange of information on the Internet.

### Lisp for the Modern Web

Sunday, August 23rd, 2015

Lisp for the Modern Web by Vito Van.

From the post:

What to Expect

This piece is about how to build a modern web application with Common Lisp in the backend, from scratch.

You may need to have some knowledge about Front End Development, cause we won’t explain the steps for building the client.

Why Lisp? Again

It is awesome.

I don’t think we need another reason for using Lisp, do we?

Life is short, let’s be awesome!

It’s been more than half a century since Lisp first appeared, she’s like the The One Ring in the Middle-earth. The one who mastered the spell of Lisp, will rule the world, once again.

Other reasons.

If you need some other reasons beside awesome, here is some articles about Lisp, enjoy them.

I have never been fond of triumphalism so let me give you a more pragmatic reason for using Lisp:

Ericka Chickowski reports in Angler Climbing To Top Of Exploit Heap that Angler makes up 82% of the exploit kits in use.

Angler targets? Adobe Flash and Java.

Any questions?

You can write vulnerable code in Lisp just as you can any other language. But then it will be your mistake and not something broken at the outset.

I first saw this in a tweet by Christophe Lalanne.

### 1962 United States Tourist Map

Sunday, August 23rd, 2015

Part of the joy of this map comes from being old enough to remember maps similar to this one.

Critics can scan the map for what isn’t represented as tourist draws. Consider it to be a snapshot of the styles and interests.

Most notable absence? Cape Canaveral. I suspect its absence reflects the lead time involved in the drafting and publishing of a map at the time. Explorer 1 (1958) and the first American in space, Alan Shepard (1961), both preceded this map.

Enjoy!
### Decoding Satellite-Based Text Messages… [Mini-CIA]

Sunday, August 23rd, 2015

From the post:

[Carl] just found a yet another use for the RTL-SDR. He’s been decoding Inmarsat STD-C EGC messages with it. Inmarsat is a British satellite telecommunications company. They provide communications all over the world to places that do not have a reliable terrestrial communications network. STD-C is a text message communications channel used mostly by maritime operators. This channel contains Enhanced Group Call (EGC) messages which include information such as search and rescue, coast guard, weather, and more.

Not much equipment is required for this, just the RTL-SDR dongle, an antenna, a computer, and the cables to hook them all up together. Once all of the gear was collected, [Carl] used an Android app called Satellite AR to locate his nearest Inmarsat satellite. Since these satellites are geostationary, he won’t have to move his antenna once it’s pointed in the right direction.

You may have to ally with a neighbor who is good with a soldering iron but considering the amount of RF in the air, you should be able to become the mini-CIA for your area.

Not that the data itself may be all that interesting, but munging cellphone data with video surveillance of street traffic, news and other feeds, plus other RF sources, will hone your data handling skills.

For example, have you ever wondered how many of your neighbors obey watering restrictions during droughts? One way to find out is to create a baseline set of data for water usage (meters now report digitally) and check periodically when drought restrictions are in effect.

Nothing enlivens a town or county meeting like a color-coded chart of water cheats. (That will exercise your mapping skills as well.)

Using topic maps will facilitate merging your water surveillance data with other data, such as high traffic patterns for some locations of different cars. Or the periods of cars arriving and departing from some location.
### Cisco 2015 Midyear Security Report

Sunday, August 23rd, 2015

Cisco 2015 Midyear Security Report

A must read for this graphic if nothing else:

The top three?

1. Buffer Errors – 471
2. Input Validation – 244
3. Resource Management Errors – 238

If we assume that #4, Permissions, Privileges and Access Control – 155 and Information Leak/Disclosure – 138, are not within a vendor’s control, the remaining 295 are.

Added to the top three vulnerabilities, vendor preventable vulnerabilities total 1248 out of 1541 or 81% of the vulnerabilities in the graphic.

Cisco has an answer for why this pattern repeats year after year:

The problem lies in insufficient attention being paid to the secure development lifecycle. Security safeguards and vulnerability tests should be built in as a product is being developed. Instead, vendors wait until the product reaches the market and then address its vulnerabilities.

You can’t say that Cisco is anti-vendor, being a software vendor itself.

Under current law, it is cheaper for software vendors to fix only the vulnerabilities that are discovered (for free) by others.

Yes, the key phrase is “under current law.”

Strict liability and minimum (say $5K) damages for

1. Buffer Errors
2. Input Validation
3. Resource Management Errors

would be a large step towards eliminating 62% of the vulnerabilities each year.
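The percentages in this post can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted from the Cisco graphic; the variable and function names are mine, chosen for illustration.

```python
# Counts as quoted above from the Cisco graphic.
top_three = {
    "Buffer Errors": 471,
    "Input Validation": 244,
    "Resource Management Errors": 238,
}
# Categories assumed (for the sake of argument) to be outside a vendor's control.
outside_vendor_control = {
    "Permissions, Privileges and Access Control": 155,
    "Information Leak/Disclosure": 138,
}
total = 1541  # all vulnerabilities shown in the graphic


def share(count, total):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


top_three_count = sum(top_three.values())
remaining = total - top_three_count - sum(outside_vendor_control.values())
vendor_preventable = top_three_count + remaining

print(f"top three: {top_three_count} ({share(top_three_count, total)}%)")
print(f"remaining: {remaining}")
print(f"vendor preventable: {vendor_preventable} ({share(vendor_preventable, total)}%)")
```

With these counts the top three categories come to about 62% of the total and the vendor-preventable share to about 81%.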

Vendors would not have to hunt for every possible vulnerability, just those that fall into those three categories. (To blunt the argument that hunting vulnerabilities is sooo difficult: perhaps, but I propose to eliminate only three categories of them.)

Strict liability would eliminate all the tiresome EULA issues for all plaintiffs.

Mandatory minimum damages would make finding lawyers to bring the suits easy.

Setting specific vulnerability criteria limits the cry that “perfect” software isn’t possible. True, but techniques to avoid buffer overflows existed in the 1960’s.

Users aren’t asking for perfection but that they and their files have a higher status than digital litter.

### TinkerPop3 Promo in One Paragraph

Saturday, August 22nd, 2015

Marko A. Rodriguez tweeted https://news.ycombinator.com/item?id=10104282 as a “single paragraph” explanation of why you should prefer TinkerPop3 over TinkerPop2.

Of course, I didn’t believe the advantages could be contained in a single paragraph but you be the judge:

Check out http://tinkerpop.com. Apache TinkerPop 3.0.0 was released in June 2015 and it is a quantum leap forward. Not only is it now apart of the Apache Software Foundation, but the Gremlin3 query language has advanced significantly since Gremlin2. The language is much cleaner, provides declarative graph pattern matching constructs, and it supports both OLTP graph databases (e.g. Titan, Neo4j, OrientDB) and OLAP graph processors (e.g. Spark, Giraph). With most every graph vendor providing TinkerPop-connectivity, this should make it easier for developers as they don’t have to learn a new query language for each graph system and developers are less prone to experience vendor lock-in as their code (like JDBC/SQL) can just move to another underlying graph system.

Are my choices developer lock-in versus vendor lock-in? That’s a tough call. 😉

Do check out TinkerPop3!

### 100 open source Big Data architecture papers for data professionals

Saturday, August 22nd, 2015

From the post:

Big Data technology has been extremely disruptive with open source playing a dominant role in shaping its evolution. While on one hand it has been disruptive, on the other it has led to a complex ecosystem where new frameworks, libraries and tools are being released pretty much every day, creating confusion as technologists struggle and grapple with the deluge.

If you are a Big Data enthusiast or a technologist ramping up (or scratching your head), it is important to spend some serious time deeply understanding the architecture of key systems to appreciate its evolution. Understanding the architectural components and subtleties would also help you choose and apply the appropriate technology for your use case. In my journey over the last few years, some literature has helped me become a better educated data professional. My goal here is to not only share the literature but consequently also use the opportunity to put some sanity into the labyrinth of open source systems.

One caution, most of the reference literature included is hugely skewed towards deep architecture overview (in most cases original research papers) than simply provide you with basic overview. I firmly believe that deep dive will fundamentally help you understand the nuances, though would not provide you with any shortcuts, if you want to get a quick basic overview.

Jumping right in…

You will have a great background in Big Data if you read all one hundred (100) papers.

What you will be missing is an overview that ties the many concepts and terms together into a coherent narrative.

Perhaps after reading all 100 papers, you will start over to map the terms and concepts one to the other.

That would be both useful and controversial within the field of Big Data!

Enjoy!

I first saw this in a tweet by Kirk Borne.

### Images for Social Media

Friday, August 21st, 2015

23 Tools and Resources to Create Images for Social Media

From the post:

Through experimentation and iteration, we’ve found that including images when sharing to social media increases engagement across the board — more clicks, reshares, replies, and favorites.

Using images in social media posts is well worth trying with your profiles.

As a small business owner or a one-man marketing team, is this something you can pull off by yourself?

At Buffer, we create all the images for our blogposts and social media sharing without any outside design help. We rely on a handful of amazing tools and resources to get the job done, and I’ll be happy to share with you the ones we use and the extras that we’ve found helpful or interesting.

If you tend to scroll down numbered lists (like I do), you will be left thinking the creators of the post don’t know how to count, because the end of the numbered list isn’t 23.

If you look closely, there are several lists of unnumbered resources. So, you’re thinking that they do know how to count, but some of the items are unnumbered.

That should settle it, but it doesn’t. There are thirteen (13) unnumbered items, which, added to the fifteen (15) numbered ones, make twenty-eight (28).

So, I suspect the title should read: 28 Tools and Resources to Create Images for Social Media.

In any event, it’s a fair collection of tools that, with some effort on your part, can increase your social media presence.

Enjoy!

Friday, August 21st, 2015

Parens of the Dead: A screencast series of zombie-themed games written with Clojure and ClojureScript.

Three episodes posted thus far:

Episode 1: Lying in the Ground

Starting with an empty folder, we’ll lay the technical groundwork for our game. We’ll get a Clojure web server up and running, compiling and sending ClojureScript to the browser.

Episode 2: Frontal Assault

In this one, we create most of the front-end code. We take a look at the data structure that describes the game, using that to build up our UI.

Episode 3: What Lies Beneath

The player has only one action available; revealing a tile. We’ll start implementing the central ‘reveal-tile’ function on the backend, writing tests along the way.

Another innovative instruction technique!

Suggestions:

1) Have your volume control available because I found the sound in the screencasts to be very soft.

2) Be prepared to move very quickly as episode one, for example, is only eleven minutes long.

3) Download the code and walk through it at a slower pace.

Enjoy!

### Pandering for Complaints

Friday, August 21st, 2015

Yesterday I mentioned that the UK has joined the ranks of censors of Google and is attempting to fine tune search results for a given name. Censorship of Google Spreads to the UK.

Today, Simon Rice of the Information Commissioner’s Office posted: Personal data in leaked datasets is still personal data.

Simon starts off by mentioning the Ashley Madison data dumps and then says:

Anyone in the UK who might download, collect or otherwise process the leaked data needs to be aware they could be taking on data protection responsibilities defined in the UK’s Data Protection Act.

Similarly, seeking to identify an individual from a leaked dataset will be an intrusion into their private life and could also lead to a breach of the DPA.

Individuals will have a range of personal reasons for having created an account with particular online services (or even had an account created without their knowledge) and any publication of further personal data without their consent can cause them significant damage or distress.

It’s worth noting too that any individual or organisation seeking to rely on the journalism exemption should be reminded that this is not a blanket exemption to the DPA and be encouraged to read our detailed guide on how the DPA applies to journalism.

Talk about chilling free speech. You shouldn’t even look to see if the data is genuine. Just don’t look!

You could let your “betters” in the professional press tell you what they want you to know, but I suspect you are brighter than that. What are the press motives behind what you see and what you don’t?

To make matters even worse, Simon closes with a solicitation for complaints:

If you find your personal data being published online then you have a right to go to that publisher and request that the information is removed. This applies equally to information being shared on social media. If the publisher is based in the UK and fails to remove your information you can complain to the ICO.

I don’t have a lot of extra webspace but if you get a complaint from the ICO, I’m willing to host whatever data I can. It won’t be much so don’t get too excited about free space.

We all need to step up and offer storage space for content censored by the UK and others.

### Disclosing Government Contracts

Friday, August 21st, 2015

From the post:

A huge bunch of flowers to Rick Messick for his excellent post asking two key questions about open contracting. And some luxury cars, expensive seafood and a vat or two of cognac.

Our lavish offerings all come from Slovakia, where in 2013 the Government Public Procurement Office launched a new portal publishing all its government contracts. All these items were part of the excessive government contracting uncovered by journalists, civil society and activists. In the case of the flowers, teachers investigating spending at the Department of Education uncovered florists’ bills for thousands of euros. Spending on all of these has subsequently declined: a small victory for fiscal probity.

The flowers, cars, and cognac help to answer the first of two important questions that Rick posed: Will anyone look at contracting information? In the case of Slovakia, it is clear that lowering the barriers to access information did stimulate some form of response and oversight.

The second question was equally important: “How much contracting information should be disclosed?”, especially in commercially sensitive circumstances.

These are two of key questions that we have been grappling with in our strategy at the Open Contracting Partnership. We thought that we would share our latest thinking below, in a post that is a bit longer than usual. So grab a cup of tea and have a read. We’ll be definitely looking forward to your continued thoughts on these issues.

Not a short read so do grab some coffee (outside of Europe) and settle in for a good read.

Disclosure: I’m financially interested in government disclosure in general and contracts in particular. With openness comes more effort to conceal semantics, which increases the need for topic maps to pierce the darkness.

I don’t think openness reduces the amount of fraud and misconduct in government; it only gives an alignment between citizens and the career interests of a prosecutor a sporting chance to catch someone out.

Disclosure should be as open as possible and what isn’t disclosed voluntarily, well, one hopes for brave souls who will leak the remainder.

Support disclosure of government contracts and leakers of the same.

If you need help “connecting the dots,” consider topic maps.