Archive for June, 2014


10 Tips to Immediately Improve User Onboarding

Tuesday, June 24th, 2014

10 Tips to Immediately Improve User Onboarding by Pieter Walraven.

From the post:

User onboarding is an art. It can be deceivingly simple, but anyone that has ever designed a new user journey knows it’s incredibly hard.

For starters, there’s some tough questions to answer. What main value does my product offer? Who am I talking to? What is the one most important thing new users need to see? What does my product do? Why do we even exist?!

Luckily, there’s many great products out there with tested and optimized onboarding flows to get inspiration from (read: steal).

To make your life easier, I’ve analyzed some of the web’s most popular onboarding flows. I’ve also included some gaming-inspired learnings from my time as Product Manager at social games developer Playfish as well as insights from the onboarding design of Pie, the smart team chat app I’m currently working on.

Let’s dive in!

See Pieter’s post for the details but the highlights are:

  1. Don’t have a tutorial
  2. Let the user do it
  3. Don’t teach me all at once
  4. Let me experience the ‘wow!’
  5. Repeat to create a habit
  6. Use fewer words
  7. Don’t break flow
  8. Be adaptive
  9. Remove noise
  10. Use conventions

In terms of training/education, very little of this is new. For #6 “Use fewer words,” remember Strunk & White’s #13: “Omit needless words.” Or compare #9 “Remove noise” with Strunk & White #14: “Avoid a succession of loose sentences.”

Any decent UI/UX guide is going to give these rules in one form or another.

But it is important that they are repeated by different people and in various forms. Why? Open five applications at random on your phone or computer. How many out of those five have an interface that is immediately usable by a new user?

The message of what is required for good UI design is well known. Where that message fails is in the application of those principles.

At least to judge from current UIs. Yes?

Any “intuitive” UIs you would like to suggest as examples?

Towards building a Crowd-Sourced Sky Map

Monday, June 23rd, 2014

Towards building a Crowd-Sourced Sky Map by Dustin Lang, David W. Hogg, and Bernhard Scholkopf.


We describe a system that builds a high dynamic-range and wide-angle image of the night sky by combining a large set of input images. The method makes use of pixel-rank information in the individual input images to improve a “consensus” pixel rank in the combined image. Because it only makes use of ranks and the complexity of the algorithm is linear in the number of images, the method is useful for large sets of uncalibrated images that might have undergone unknown non-linear tone mapping transformations for visualization or aesthetic reasons. We apply the method to images of the night sky (of unknown provenance) discovered on the Web. The method permits discovery of astronomical objects or features that are not visible in any of the input images taken individually. More importantly, however, it permits scientific exploitation of a huge source of astronomical images that would not be available to astronomical research without our automatic system.

If you have any astronomical photographs, you can contribute to a more complete knowledge of the night sky.

Scientific instruments moved beyond the reach of the citizen scientist in the late 19th/early 20th century and now data from instruments large and small are returning to the citizen scientist, whose laboratory is a local or cloud-based computer.
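The pixel-rank trick at the heart of the method is easy to sketch. A minimal version in Python, assuming a set of already-aligned grayscale images as NumPy arrays (this illustrates the rank idea only, not the authors’ actual pipeline):

```python
import numpy as np

def consensus_rank_image(images):
    """Combine aligned grayscale images by averaging per-image pixel ranks.

    Rank-based combination is invariant to any monotonic (unknown)
    tone mapping applied to the individual inputs, which is why it
    works on uncalibrated images of unknown provenance.
    """
    stacked = np.stack([img.ravel() for img in images])  # (n_images, n_pixels)
    # argsort applied twice converts pixel values to ranks within each image
    ranks = stacked.argsort(axis=1).argsort(axis=1).astype(float)
    mean_rank = ranks.mean(axis=0)
    # normalize the consensus ranks to [0, 1] for display
    consensus = mean_rank / (stacked.shape[1] - 1)
    return consensus.reshape(images[0].shape)
```

Because only ranks are used, an image and a gamma-distorted copy of it produce identical contributions to the consensus.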


How Mapbox Works

Monday, June 23rd, 2014

How Mapbox Works

From the post:

Mapbox is a platform for creating and using maps. That means that it’s a collection of tools and services that connect together in different ways: you can draw a map on the website, use it on an iPhone, and get raw data from an API.

Let’s look at the parts and how they connect.

Great post!

Just in time if you have been considering Iraq in 27 Maps and how some of the maps are just “wrong” from your point of view.

Using modern mapping technology, users are no longer relegated to passive acceptance of the maps of others.

The Lambda Calculus for Absolute Dummies (like myself)

Monday, June 23rd, 2014

The Lambda Calculus for Absolute Dummies (like myself) by Joscha Bach.

From the post:

If there is one highly underrated concept in philosophy today, it is computation. Why is it so important? Because computationalism is the new mechanism. For millennia, philosophers have struggled when they wanted to express or doubt that the universe can be explained in a mechanical way, because it is so difficult to explain what a machine is, and what it is not. The term computation does just this: it defines exactly what machines can do, and what not. If the universe/the mind/the brain/bunnies/God is explicable in a mechanical way, then it is a computer, and vice versa.

Unfortunately, most people outside of programming and computer science don’t know exactly what computation means. Many may have heard of Turing Machines, but these things tend to do more harm than good, because they leave strong intuitions of moving wheels and tapes, instead of what it really does: embodying the nature of computation.

If you have ever struggled with the Lambda Calculus entry at Wikipedia, you will appreciate this well-written introduction to the same subject.

I would re-title the post: Lambda Calculus by a Gifted Author.
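If you want to experiment as you read, Python’s lambda is enough to taste the calculus. A small sketch of Church numerals, where a number n is encoded as “apply f n times”:

```python
# Church numerals in Python lambdas: a number n is "apply f n times".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Decode a Church numeral by counting applications of f."""
    return n(lambda k: k + 1)(0)

one = succ(zero)
two = succ(one)
```

Here `add(two)(two)` really is four applications of `f`, so `to_int(add(two)(two))` yields 4.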

I first heard about this post from Kirk Lowery.

Wolfram Programming Cloud Is Live!

Monday, June 23rd, 2014

Wolfram Programming Cloud Is Live! by Stephen Wolfram.

From the post:

Twenty-six years ago today we launched Mathematica 1.0. And I am excited that today we have what I think is another historic moment: the launch of Wolfram Programming Cloud—the first in a sequence of products based on the new Wolfram Language.

Wolfram Programming Cloud

My goal with the Wolfram Language in general—and Wolfram Programming Cloud in particular—is to redefine the process of programming, and to automate as much as possible, so that once a human can express what they want to do with sufficient clarity, all the details of how it is done should be handled automatically.

I’ve been working toward this for nearly 30 years, gradually building up the technology stack that is needed—at first in Mathematica, later also in Wolfram|Alpha, and now in definitive form in the Wolfram Language. The Wolfram Language, as I have explained elsewhere, is a new type of programming language: a knowledge-based language, whose philosophy is to build in as much knowledge about computation and about the world as possible—so that, among other things, as much as possible can be automated.

The Wolfram Programming Cloud is an application of the Wolfram Language—specifically for programming, and for creating and deploying cloud-based programs.

How does it work? Well, you should try it out! It’s incredibly simple to get started. Just go to the Wolfram Programming Cloud in any web browser, log in, and press New. You’ll get what we call a notebook (yes, we invented those more than 25 years ago, for Mathematica). Then you just start typing code.

I am waiting to validate my email address to access the Wolfram portal.

It will take weeks to evaluate some of the claims made for the portal but I can attest that the Wolfram site in general remains very responsive, despite what must be snowballing load today.

That in and of itself is a good sign.


I first saw this in a tweet by Christophe Lalanne.

Iraq in 27 Maps

Monday, June 23rd, 2014

27 maps that explain the crisis in Iraq by Zack Beauchamp, Max Fisher and Dylan Matthews.

From the post:

The current Iraq crisis began in early June, when the extremist group Islamic State of Iraq and the Levant (ISIS), which already controls parts of Syria, seized much of northern Iraq, including the major city of Mosul. The conflict has roots in Iraq’s complicated history, its religious and ethnic divisions, and of course in the Iraq War that began with the 2003 US-led invasion. These 27 maps are a rough guide to today’s crisis and the deeper forces behind it.

I am not at all sure if “explain” is the right word to use for these maps relative to the crisis in Iraq. Perhaps “illuminate” the complexity of the crisis in Iraq would be more accurate.

Moreover, these maps have the potential, in digital form, to act as interfaces to the complex religious, ethnic and historical background to the current crisis.

Western governments, to say nothing of governments in the Middle East, should be cautious about waving the “extremist” label around. Labeling any group as “extremist” reduces the options on all sides.

Call me maybe: Elasticsearch

Sunday, June 22nd, 2014

Call me maybe: Elasticsearch by Kyle Kingsbury.

Kyle attempts to answer the question: How safe is data in Elasticsearch?

I say “attempts” because Kyle does a remarkable job of documenting unanswered questions and conditions that can lead to data loss with Elasticsearch. But you will find there is no final answer to the safety question, despite deep analysis and research.

Kyle is an Elasticsearch user and does provide some guidance on making your Elasticsearch installation safer: not safe, but safer.

Must-read material for all serious users of Elasticsearch.

I first saw this in a tweet by Andrew Purtell.

Wikipedia Usage Statistics

Sunday, June 22nd, 2014

Wikipedia Usage Statistics by Paul Houle.

From the post:

The Wikimedia Foundation publishes page view statistics for Wikimedia projects here; this server is rate-limited, so it took roughly a month to transfer this 4 TB data set into S3 storage in the AWS cloud. The photo on the left is of a hard drive containing a copy of the data that was produced with AWS Import/Export.

Once in S3, it is easy to process this data with Amazon Map/Reduce using the Open Source telepath software.

The first product developed from this is SubjectiveEye3D.

It’s your turn

Future projects require that this data be integrated with semantic data from :BaseKB and that has me working on tools such as RDFeasy. In the meantime, a mirror of the Wikipedia pagecounts from Jan 2008 to Feb 2014 is available in a requester-pays bucket in S3, which means you can use it in the Amazon Cloud for free and download data elsewhere for the cost of bulk network transfer.

Interesting isn’t it?

That “open” data can be so difficult to obtain and manipulate that it may as well not be “open” at all for the average user.
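To make the point concrete: even once you have the data in hand, each line of a pagecounts dump still has to be parsed. A minimal parser, assuming the standard space-separated pagecounts-raw line format (project, page title, view count, bytes transferred):

```python
def parse_pagecount_line(line):
    """Parse one line of a Wikimedia pagecounts file.

    Each line is: project page_title view_count bytes_transferred
    (space-separated, per the pagecounts-raw dump format).
    """
    project, title, views, size = line.rstrip("\n").split(" ")
    return {
        "project": project,
        "title": title,
        "views": int(views),
        "bytes": int(size),
    }
```

Trivial for one line, but multiplied by 4 TB of compressed files it is exactly the kind of grind that keeps “open” data out of the average user’s reach.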

Something to keep in mind when big players talk about privacy. Do they mean private from their prying eyes or yours?

I think you will find in most cases that “privacy” means private from you and not the big players.

If you want to do a good deed for this week, support this data set at Gittip.

I first saw this in a tweet by Gregory Piatetsky.

!!CON 2014 (Videos/Transcripts)

Sunday, June 22nd, 2014

!!CON 2014 (Videos/Transcripts)

From the homepage:

!!Con (pronounced bang bang con) 2014 was two days of ten-minute talks (with lots of breaks, of course!) about what excites us about programming — the amazing, the strange, and the heartwarming. From kernel exploits to teaching kids to program, !!Con was about bringing the NYC programming community together to celebrate the joy and excitement and the surprising moments in what we do. We defined “programming community” broadly — you didn’t need to be a professional programmer to be part of !!Con.

Let me give you just a few of the titles of the presentations:

  • The Art of Obsession
  • The Sound of Segfaults!!
  • Nantucket! Hacking at verse
  • Now you’re thinking with PCMPISTRI!
  • How I used my knowledge of code (and music!) to help fight fires!
  • and more!

I haven’t watched all the videos yet, but they have me thinking that ten minutes may be the ideal presentation length.

Long enough to present your core idea or what excites you yet not long enough to fill your slides with text. 😉 For details, the audience can read your paper.

I first saw this in a tweet by Julia Evans.

Online Bioinformatics / Computational Biology

Saturday, June 21st, 2014

An Annotated Online Bioinformatics / Computational Biology Curriculum by Stephen Turner.

From the post:

Two years ago David Searls published an article in PLoS Comp Bio describing a series of online courses in bioinformatics. Yesterday, the same author published an updated version, “A New Online Computational Biology Curriculum,” (PLoS Comput Biol 10(6): e1003662. doi: 10.1371/journal.pcbi.1003662).

This updated curriculum has a supplemental PDF describing hundreds of video courses that are foundational to a good understanding of computational biology and bioinformatics. The table of contents embedded into the PDF’s metadata (Adobe Reader: View>Navigation Panels>Bookmarks; Apple Preview: View>Table of Contents) breaks the curriculum down into 11 “departments” with links to online courses in each subject area:

  1. Mathematics Department
  2. Computer Science Department
  3. Data Science Department
  4. Chemistry Department
  5. Biology Department
  6. Computational Biology Department
  7. Evolutionary Biology Department
  8. Systems Biology Department
  9. Neurosciences Department
  10. Translational Sciences Department
  11. Humanities Department

The key term here is annotated. That is, the author isn’t just listing courses from someone else’s list but has some experience with each course.

Should be a great resource whether you are a CS person looking at bioinformatics/computational biology or if you are a bioinformatics person trying to communicate with the CS side.


Revision of Serializing RDF Data…

Saturday, June 21st, 2014

Revision of Serializing RDF Data as Clojure Code Specification by Frédérick Giasson.

From the post:

In my previous blog post RDF Code: Serializing RDF Data as Clojure Code I did outline a first version of what a RDF serialization could look like if it would be serialized using Clojure code. However, after working with this proposal for two weeks, I found a few issues with the initial assumptions that I made that turned out to be bad design decisions in terms of Clojure code.

This blog post will discuss these issues, and I will update the initial set of rules that I defined in my previous blog post. Going forward, I will use the current rules as the way to serialize RDF data as Clojure code.

An example of where heavy data use with a proposal leads to its refinement!

Looking forward to more posts in this series.

What’s On Your Desktop?

Saturday, June 21st, 2014

The Analyst’s Toolbox by Simon Raper.

From the post:

There are hundreds, maybe thousands, of open source/free/online tools out there that form part of the analyst’s toolbox. Here’s what I have on my mac for day to day work. Click on the leaf node labels to be redirected to the relevant sites. Visualisation in D3.

Tools in day to day use by a live data analyst. Nice presentation as well.

What’s on your desktop?

How To Design A Great User Interface

Saturday, June 21st, 2014

How To Design A Great User Interface

From the post:

The goal and only purpose of a user interface (UI), as the name implies, is to create an experience for the user.

Many automated solutions exist to make UI design simpler and faster; however, the designer must understand some basic rules of how to design a user interface. Because the focus is centered on the potential user, the user’s needs must primarily drive all design choices.

What are the needs of the user?

  • To accomplish the task with relative ease
  • To complete the task quickly
  • To enjoy the experience

The single most important characteristic of the UI is that it has to work well and work consistently. Secondly, the UI must carry out commands and respond quickly and intuitively. Lastly, but still very important, the user interface should be visually appealing to the user.

Projects like Egas may give you a boost in the right direction for a topic map authoring/navigation interface but you are going to be ultimately responsible for your own design.

This post and the related ones will give you an opportunity to understand some of the primary issues you will face in creating a great user interface.

If you have no other take away from this post, notice that “impressing the user with how you view the paradigm” isn’t one of the goals of a great user interface.


Egas: a collaborative and interactive document curation platform

Saturday, June 21st, 2014

Egas: a collaborative and interactive document curation platform by David Campos, el al.


With the overwhelming amount of biomedical textual information being produced, several manual curation efforts have been set up to extract and store concepts and their relationships into structured resources. As manual annotation is a demanding and expensive task, computerized solutions were developed to perform such tasks automatically. However, high-end information extraction techniques are still not widely used by biomedical research communities, mainly because of the lack of standards and limitations in usability. Interactive annotation tools intend to fill this gap, taking advantage of automatic techniques and existing knowledge bases to assist expert curators in their daily tasks. This article presents Egas, a web-based platform for biomedical text mining and assisted curation with highly usable interfaces for manual and automatic in-line annotation of concepts and relations. A comprehensive set of de facto standard knowledge bases are integrated and indexed to provide straightforward concept normalization features. Real-time collaboration and conversation functionalities allow discussing details of the annotation task as well as providing instant feedback of curator’s interactions. Egas also provides interfaces for on-demand management of the annotation task settings and guidelines, and supports standard formats and literature services to import and export documents. By taking advantage of Egas, we participated in the BioCreative IV interactive annotation task, targeting the assisted identification of protein–protein interactions described in PubMed abstracts related to neuropathological disorders. When evaluated by expert curators, it obtained positive scores in terms of usability, reliability and performance. These results, together with the provided innovative features, place Egas as a state-of-the-art solution for fast and accurate curation of information, facilitating the task of creating and updating knowledge bases and annotated resources.

Database URL:

Read this article and/or visit the webpage and tell me this doesn’t have topic map editor written all over it!

Domain specific to be sure but any decent interface for authoring topic maps is going to be domain specific.

Very, very impressive!

I am following up with the team to check on the availability of the software.

A controlled vocabulary for pathway entities and events

Saturday, June 21st, 2014

A controlled vocabulary for pathway entities and events by Steve Jupe, et al.


Entities involved in pathways and the events they participate in require descriptive and unambiguous names that are often not available in the literature or elsewhere. Reactome is a manually curated open-source resource of human pathways. It is accessible via a website, available as downloads in standard reusable formats and via Representational State Transfer (REST)-ful and Simple Object Access Protocol (SOAP) application programming interfaces (APIs). We have devised a controlled vocabulary (CV) that creates concise, unambiguous and unique names for reactions (pathway events) and all the molecular entities they involve. The CV could be reapplied in any situation where names are used for pathway entities and events. Adoption of this CV would significantly improve naming consistency and readability, with consequent benefits for searching and data mining within and between databases.

Database URL:

There is no doubt that “unambiguous and unique names for reactions (pathway events) and all the molecular entities they involve” would have all the benefits listed by the authors.

Unfortunately, the experience of the HUGO Gene Nomenclature Committee, for example, has been that “other” names for genes are already in use by the time the HUGO designation is created, making the HUGO designation only one of several names a gene may have.

Another phrase for “universal name” is “an additional name.”

It is an impressive effort and should be useful in disambiguating the additional names for pathway entities and events.
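The synonym problem is easy to sketch. A toy controlled-vocabulary lookup in Python, where each canonical identifier carries its “additional names” as synonyms (the identifier and names below are invented for illustration, not taken from Reactome):

```python
# A controlled vocabulary in practice: one canonical identifier per
# entity, with every "additional name" recorded as a synonym.
VOCAB = {
    "CV-0001": {
        "canonical": "phosphorylation of ExampleProtein",
        "synonyms": {"ExampleProtein phosphorylation", "EP phosphorylation"},
    },
}

def resolve(name, vocab):
    """Return the identifier whose canonical name or synonym set
    contains `name`, or None if the name is unknown."""
    for ident, entry in vocab.items():
        if name == entry["canonical"] or name in entry["synonyms"]:
            return ident
    return None
```

The CV gives you the canonical column; the synonym set is where the “additional names” accumulate, and keeping it current is the real curation cost.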

FYI, from the homepage of the database:

Reactome is a free, open-source, curated and peer reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education.

Structure and Interpretation of Computer Programs (SICP)

Saturday, June 21st, 2014

Structure and Interpretation of Computer Programs (SICP) by Abelson & Sussman.

Already available from MIT Press for free, SICP is available here with newly typeset mathematics and figures.

You will find Kindle versions as well.

A truly remarkable addition to e-texts in computer science would be TAOCP. Somehow I don’t think that is going to happen anytime soon.

I first saw this in a tweet by thattommyhall.

Mainstream Honeypots?

Saturday, June 21st, 2014

Open-Source Tool Aimed At Propelling Honeypots Into the Mainstream by Kelly Jackson Higgins.

From the post:

Researchers have built a free open-source honeypot software program aimed at propelling the hacker decoys into security weapons for everyday organizations.

The Modern Honey Network (MHN) software, created by the Google Ventures-backed startup ThreatStream, automates much of the process of setting up and monitoring honeypots, as well as gleaning threat intelligence from them. An API allows it to integrate with IDSes, IPSes, application-layer firewalls, SIEM, and other security tools to set up defenses against attacks it detects.

Honeypots — basically lures posing as machines that let organizations gather intelligence and study the behaviors of attackers — long have been a popular and valuable tool for security researchers. There are plenty of open-source honeypot tools available today, but the high maintenance and complexity of deploying and running these lures have made them unrealistic security options for most businesses.

“Honeypots have never truly taken off in the enterprise,” says Greg Martin, CEO of ThreatStream, which provides a software-as-a-service threat intelligence system for large organizations like Northrop Grumman and SAIC. The goal of MHN is to simplify honeypot deployment and ultimately to make these tools a mainstream, inherent part of the security arsenal for companies in various industries.

MHN, meanwhile, can be used with a little crowdsourcing, too. “We’ve created a public server that pulls together intelligence [the systems gather], and you have the option to crowdsource the information,” Martin says. ThreatStream ultimately plans to share attack trends publicly: which countries are hosting the attacks and where DDoS attacks are occurring, for instance. “You can create a huge cyber weather map.”

The free honeypot tool is available here for download.

Hackers have already learned the lesson that shared information floats all attackers higher. Perhaps cyberdefense is taking a step in that direction with Modern Honey Network (MHN).

Collecting data is the first step towards authoring a topic map. What additional information would you want to collect in connection with that from MHN?
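One answer, sketched in Python: key each event to a subject (say, the attacker’s IP) so that sightings from many sensors about the same source merge into one topic. The event fields here are illustrative, not MHN’s actual schema:

```python
def event_to_topic(event):
    """Map a honeypot event onto a subject keyed by attacker IP.

    Field names (src_ip, sensor, dest_port, timestamp) are
    illustrative placeholders, not MHN's real event schema.
    """
    return {
        "subject": ("attacker", event["src_ip"]),
        "occurrences": [{
            "sensor": event.get("sensor", "unknown"),
            "port": event.get("dest_port"),
            "timestamp": event["timestamp"],
        }],
    }

def merge_topics(topics):
    """Merge all topics that share the same subject identity."""
    merged = {}
    for t in topics:
        entry = merged.setdefault(
            t["subject"], {"subject": t["subject"], "occurrences": []})
        entry["occurrences"].extend(t["occurrences"])
    return list(merged.values())
```

The extra information worth collecting is whatever strengthens the subject identity: ASN, credentials tried, malware hashes, anything that lets two sensors agree they are describing the same attacker.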

Storing and visualizing LinkedIn…

Saturday, June 21st, 2014

Storing and visualizing LinkedIn with Neo4j and sigma.js by Bob Briody.

From the post:

In this post I am going to present a way to:

load a LinkedIn network via the LinkedIn developer API into Neo4j using Python
  • serve the network from neo4j using node.js, express.js, and cypher
  • display the network in the browser using sigma.js

Great post but it means one (1) down and two hundred and five (205) more to go, if you are a member of the social networks listed on List of social networking websites at Wikipedia, and that excludes dating sites and includes only “notable, well-known sites.”

I would be willing to bet that your social network of friends, members of your religious organization, people where you work, etc. would start to swell the number of other social networks that number you as a member.

Hmmm, so one-off social network visualizations are just that, one-off social network visualizations. You can be seen as part of one group and not, say, two or three intersecting groups.

Moreover, an update to one visualized network isn’t going to percolate into another visualized network.

There is the “normalize your graph” solution to integrate such resources but what if you aren’t the one to realize the need for “normalization?”

You have two separate actors in your graph visualization after doing the best you can. Another person encounters information indicating these “two” people are in fact one person. They update their data. But that updated knowledge has no impact on your visualization, unless you simply happen across it.
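The missing piece is identity resolution: when evidence says two actors are one person, the merge should be transitive across every network that carries either identifier. A minimal union-find sketch (the identifier strings are hypothetical):

```python
class IdentityResolver:
    """Union-find over actor identifiers.

    Merging is transitive: evidence that A == B and B == C
    collapses all three into one subject.
    """
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # path halving keeps lookups near-constant time
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def merge(self, a, b):
        self.parent[self.find(a)] = self.find(b)
```

The point of the sketch: once identifiers from different networks resolve to one representative, an update discovered in one visualization can propagate to every other view of the same person.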

Seems like a poor way to run intelligence gathering, doesn’t it?

Archive integration at Mattilsynet

Saturday, June 21st, 2014

Archive integration at Mattilsynet by Lars Marius Garshol (slides)

In addition to being on the path to becoming a prominent beer expert, Lars Marius has long been involved in integration technologies in general and topic maps in particular.

These slides give a quick overview of a current integration project.

There is one point Lars makes that merits special attention:

No hard bindings from code to data model

  • code should have no knowledge of the data model
  • all data model-specific logic should be configuration
  • makes data changes much easier to handle

(slide 4)

Keep that in mind when evaluating ETL solutions. What is being hard coded?
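The point about configuration over hard coding can be sketched in a few lines. Here the transform function never names a field itself; every source-to-target mapping lives in configuration (the field names are invented for illustration, not taken from the Mattilsynet project):

```python
# Field mappings live in configuration, not code: the transform
# function below has no knowledge of the data model.
CONFIG = {
    "mappings": [
        {"source": "archName", "target": "name"},
        {"source": "archDate", "target": "date"},
    ],
}

def transform(record, config):
    """Apply the configured source->target field mapping to one record."""
    return {
        m["target"]: record[m["source"]]
        for m in config["mappings"]
        if m["source"] in record
    }
```

When the archive schema changes, you edit `CONFIG`, not the code, which is exactly the “data changes much easier to handle” claim on slide 4.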

PS: I was amused that Lars describes RDF as “Essentially a graph database….” True but the W3C starting marketing that claim only after graph databases had a surge in popularity.

Markup editors are manipulating directed acyclic graphs so I suppose they are graph editors as well. 😉


Book-NLP: NLP Pipeline for Book-Length Documents

Friday, June 20th, 2014

Book-NLP: Natural language processing pipeline for book-length documents.

From the webpage:

BookNLP is a natural language processing pipeline that scales to books and other long documents (in English), including:

  • Part-of-speech tagging (Stanford)
  • Dependency parsing (MaltParser)
  • Named entity recognition (Stanford)
  • Character name clustering (e.g., “Tom”, “Tom Sawyer”, “Mr. Sawyer”, “Thomas Sawyer” -> TOM_SAWYER)
  • Quotation speaker identification
  • Pronominal coreference resolution

I can think of several classes of documents where this would be useful. Congressional hearing documents for example. Agency reports would be another.

Not the final word for mapping but certainly an assist to an author.

Processing satellite imagery

Friday, June 20th, 2014

Processing satellite imagery

From the post:

Need to add imagery to your map? This tutorial will teach you the basics of image processing for mapping, including an introduction to raster data, and how to acquire, publish and process raster imagery of our world.

Open-source and at your fingertips. Let’s dive in.

From Mapbox and very cool!

It’s not short so get a fresh cup of coffee and enjoy the tour!

Friends of the NSA

Friday, June 20th, 2014

Governments let NSA tap cables on their territory, latest Snowden revelations show by David Meyer.

David has a great summary of recent Snowden leaks that make it clear that multiple governments were cooperating with the NSA tapping efforts.

From the post:

Who’s in? Some of these “third-party”, non-Five Eyes partners, as listed in other Snowden documents: Algeria, Austria, Belgium, Croatia, the Czech Republic, Denmark, Ethiopia, Finland, France, Germany, Greece, Hungary, India, Israel, Italy, Japan, Jordan, Macedonia, the Netherlands, Norway, Pakistan, Poland, Romania, Saudi Arabia, Singapore, South Korea, Spain, Sweden, Taiwan, Thailand, Tunisia, Turkey and the United Arab Emirates.

Assuming that the storage issues could be solved, the NSA could at the very least support itself by selling copies of intercepted conversations, email, etc. On the open market with eBay style bidding.

How much would you pay to hear a conversation between Obama and Putin?

Would not solve the privacy issue but would make the NSA less of a drain on US taxpayers.

Poor Reasoning Travels Faster Than Bad News

Friday, June 20th, 2014

Google forced to e-forget a company worldwide by Lisa Vaas.

From the post:

Forcing Google to develop amnesia is turning out to be contagious.

Likely inspired by Europeans winning the right to be forgotten in Google search results last month, a Canadian court has ruled that Google has to remove search results for a Canadian company’s competitor, not just in Canada but around the world.

The Supreme Court of British Columbia ruled on 13 June that Google had two weeks to forget the websites of a handful of companies with “Datalink” in their names.

I didn’t know I was being prescient in Feathers, Gossip and the European Union Court of Justice (ECJ) when I said:

Even if Google, removes all of its references from a particular source, the information could be re-indexed in the future from new sources.

That is precisely the issue in the Canadian case. Google removes specific URLs only to have different URLs for the same company appear in their search results.

The tenor of the decision is best captured by:

The Court must adapt to the reality of e-commerce with its potential for abuse by those who would take the property of others and sell it through the borderless electronic web of the internet. I conclude that an interim injunction should be granted compelling Google to block the defendants’ websites from Google’s search results worldwide. That order is necessary to preserve the Court’s process and to ensure that the defendants cannot continue to flout the Court’s orders. [159]

What you won’t find in the decision is any mention of the plaintiffs tracking down the funds from e-commerce sites alleged to be selling the product in question. Odd isn’t it? The plaintiffs are concerned about enjoined sales but make no effort to recover funds from those sales?

Of course, allowing a Canadian court (or any court) to press-gang anyone at hand to help enforce its order is very attractive, to courts at least. Should not be so attractive to anyone concerned with robust e-commerce.

If enjoined sales were occurring, there may have been evidence of those sales, but the court fails to mention it; the plaintiffs had more than enough remedies to pursue those transactions.

Instead, with the aid of a local court, the plaintiffs are forcing Google to act as its unpaid worldwide e-commerce assassin.

More convenient for the local plaintiff but bad news for global e-commerce.

PS: I don’t suppose anyone will be registering new websites with the name “Datalink” in them just to highlight the absurdity of this decision.

Software Patent Earthquake!

Thursday, June 19th, 2014

The details are far from settled but in Alice v. CLS Bank, the US Supreme Court ruled 9-0 that a software patent is invalid.

From the opinion:

We hold that the claims at issue are drawn to the abstract idea of intermediated settlement, and that merely requiring generic computer implementation fails to transform that abstract idea into a patent-eligible invention.

If you want to buy a software portfolio, I would do it quickly, while patent holders are still in a panic. 😉

Improving GitHub for science

Thursday, June 19th, 2014

Improving GitHub for science

From the post:

GitHub is being used today to build scientific software that’s helping find Earth-like planets in other solar systems, analyze DNA, and build open source rockets.

Seeing these projects and all this momentum within academia has pushed us to think about how we can make GitHub a better tool for research. As scientific experiments become more complex and their datasets grow, researchers are spending more of their time writing tools and software to analyze the data they collect. Right now though, these efforts often happen in isolation.

Citable code for academic software

Sharing your work is good, but collaborating while also getting required academic credit is even better. Over the past couple of months we’ve been working with the Mozilla Science Lab and data archivers, Figshare and Zenodo, to make it possible to get a Digital Object Identifier (DOI) for any GitHub repository archive.

DOIs form the backbone of the academic reference and metrics system. With a DOI for your GitHub repository archive, your code becomes citable. Our newest Guide explains how to create a DOI for your repository.

A move in the right direction to be sure but how much of a move is open to question.

Think of a DOI as the equivalent of an International Standard Book Number (ISBN). Using an ISBN as an identifier, you are sure to find the book that I cite.

But if the book is several hundred pages long, you may find that my citing it by an ISBN alone isn’t quite good enough.

The same will be true for some citations using DOIs for GitHub repositories. Better than nothing, but a DOI alone falls short of a robust identifier for material within a GitHub archive.
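To make the granularity point concrete, here is a minimal sketch, in Python, of how a repository citation could combine a DOI with finer-grained pointers. The DOI, commit hash, and file path below are made-up placeholders, not real identifiers:

```python
def cite(doi, commit=None, path=None, lines=None):
    """Compose a citation string: the DOI names the whole archive;
    the optional commit, path, and line range pin down material inside it."""
    ref = f"doi:{doi}"
    if commit:
        ref += f", commit {commit[:7]}"  # a short SHA is enough to disambiguate
    if path:
        ref += f", {path}"
        if lines:
            ref += f" (lines {lines[0]}-{lines[1]})"
    return ref

# coarse: resolves to the whole archive, like citing a book by ISBN alone
print(cite("10.5281/zenodo.00000"))
# fine: pins the claim to a specific file and line range within the archive
print(cite("10.5281/zenodo.00000", "0a1b2c3d4e", "src/analysis.py", (10, 25)))
```

A DOI plus a commit hash names an exact, immutable snapshot; the path and line range then locate the cited material within it.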

I first saw this in a tweet by Peter Kraker.

Rth: a Flexible Parallel Computation Package for R

Thursday, June 19th, 2014

Rth: a Flexible Parallel Computation Package for R by Norm Matloff.

From the post:

The key feature of Rth is in the word flexible in the title of this post, which refers to the fact that Rth can be used on two different kinds of platforms for parallel computation: multicore systems and Graphics Processing Units (GPUs). You all know about the former–it’s hard to buy a PC these days that is not at least dual-core–and many of you know about the latter. If your PC or laptop has a somewhat high-end graphics card, this enables extremely fast computation on certain kinds of problems. So, whether you have, say, a quad-core PC or a good NVIDIA graphics card, you can run Rth for fast computation, again for certain types of applications. And both multicore and GPUs are available in the Amazon EC2 cloud service.

Rth Quick Start

Our Rth home page tells you the GitHub site at which you can obtain the package, and how to install it. (We plan to place it on CRAN later on.) Usage is simple.

Rth is an example of what I call Pretty Good Parallelism (an allusion to Pretty Good Privacy). For certain applications it can get you good speedup on two different kinds of common platforms (multicore, GPU). Like most parallel computation systems, it works best on very regular, “embarrassingly parallel” problems. For very irregular, complex apps, one may need to resort to very detailed C code to get a good speedup.
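Rth’s own API aside, the “embarrassingly parallel” style it targets can be sketched in plain Python with the standard multiprocessing module. This is an illustration of the idea, not Rth code; the function and worker count are arbitrary:

```python
from multiprocessing import Pool

def square(x):
    # each element is processed independently: no shared state, no ordering
    # constraints, which is what makes the problem "embarrassingly parallel"
    return x * x

def parallel_squares(xs, workers=4):
    # the same map could run serially; the parallel version just spreads
    # the independent pieces of work across cores
    with Pool(workers) as pool:
        return pool.map(square, list(xs))

if __name__ == "__main__":
    print(parallel_squares(range(8)))
```

Irregular problems, where work items depend on one another, do not decompose this cleanly; that is the limitation the post describes.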

Rth has not been tested on Windows, so I am sure the authors would appreciate reports on your use of Rth with Windows.

Contributions of new Rth functions are solicited. At least if you don’t mind making parallel processing easier for everyone. 😉

I first saw this in a tweet by Christopher Lalanne.

TeX Live 2014 released…

Thursday, June 19th, 2014

TeX Live 2014 released – what’s new by Stefan Kottwitz.

Just enough to get you interested:

  • TeX and MetaFont updates
  • pdfTeX with “fake spaces”
  • LuaTeX engine updates
  • numerous other changes and improvements

Stefan covers these and more, while pointing you to the documentation for more details.

Has anyone calculated how many decades TeX/LaTeX are ahead of the average word processor?

Just curious.

Time-Based Versioned Graphs

Wednesday, June 18th, 2014

Time-Based Versioned Graphs

From the post:

Many graph database applications need to version a graph so as to see what it looked like at a particular point in time. Neo4j doesn’t provide intrinsic support either at the level of its labelled property graph or in the Cypher query language for versioning. Therefore, to version a graph we need to make our application graph data model and queries version aware.

Separate Structure From State

The key to versioning a graph is separating structure from state. This allows us to version the graph’s structure independently of its state.

To help describe how to design a version-aware graph model, I’m going to introduce some new terminology: identity nodes, state nodes, structural relationships and state relationships.

Identity Nodes

Identity nodes are used to represent the positions of entities in a domain-meaningful graph structure. Each identity node contains one or more immutable properties, which together constitute an entity’s identity. In a version-free graph (the kind of graph we build normally) nodes tend to represent both an entity’s position and its state. Identity nodes in a version-aware graph, in contrast, serve only to identify and locate an entity in a network structure.

Structural Relationships

Identity nodes are connected to one another using timestamped structural relationships. These structural relationships are similar to the domain relationships we’d include in a version-free graph, except they have two additional properties, from and to, both of which are timestamps.

State Nodes and Relationships

Connected to each identity node are one or more state nodes. Each state node represents a snapshot of an entity’s state. State nodes are connected to identity nodes using timestamped state relationships.
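The identity/state separation described above is a Neo4j modeling pattern, but the idea is database-agnostic. Below is a minimal Python sketch; the names (IdentityNode, set_state, and so on) are mine for illustration, not taken from the post:

```python
END_OF_TIME = float("inf")

class IdentityNode:
    """Carries only immutable identity; mutable state lives in
    timestamped snapshots attached via 'state relationships'."""
    def __init__(self, ident):
        self.ident = ident
        self.states = []      # [from_ts, to_ts, snapshot] state relationships
        self.structure = []   # [from_ts, to_ts, other_node] structural relationships

    def set_state(self, ts, snapshot):
        if self.states:
            self.states[-1][1] = ts               # close the previous state
        self.states.append([ts, END_OF_TIME, snapshot])

    def state_at(self, ts):
        for frm, to, snapshot in self.states:
            if frm <= ts < to:                    # half-open validity interval
                return snapshot
        return None

    def connect(self, ts, other):
        self.structure.append([ts, END_OF_TIME, other])

    def neighbours_at(self, ts):
        return [n for frm, to, n in self.structure if frm <= ts < to]

shop = IdentityNode("shop-001")
shop.set_state(1, {"name": "Acme"})
shop.set_state(5, {"name": "Acme Ltd"})
print(shop.state_at(3))   # the snapshot valid at time 3: {'name': 'Acme'}
```

Querying “what did the graph look like at time t” then reduces to filtering every relationship by its from/to bounds, which is exactly the version-aware query rewriting the post calls for.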

A great modeling example, but you have to wonder about a graph implementation that doesn’t support versioning out of the box.

It can be convenient to treat data as though it were stable, but we all know that isn’t true.

Don’t we?

Lock Maker Proclaims Locks Best For Security

Wednesday, June 18th, 2014

The best defense against surveillance in the cloud is strong locks, says Amazon CTO Werner Vogels by Mathew Ingram.

From the post:

Although fear of government surveillance has made Amazon’s job more challenging when it comes to selling the benefits of cloud data storage, Amazon’s chief technology officer Werner Vogels told attendees at the Structure conference in San Francisco that the company continues to see strong growth in demand both inside and outside the United States, and it is responding to customers’ concerns about surveillance by stressing two things: strong encryption and the control that Amazon and its AWS infrastructure give to users.

Vogels described how Neelie Kroes, digital commissioner for the European Commission, said in a recent speech that no matter what regulations countries have around privacy or surveillance, hackers and spies will always try to get around them, and so the best defense isn’t a good lawyer, it’s a good lock — and Amazon “has the best locks,” Vogels said. “The point is that the customer needs to be in control of their data, and we give them full confidence that no one is going to access their data but themselves.”

You may enjoy the interview if you are looking for reassurance. If you are looking for security advice, drive on.

One example to rebut the “strong lock” argument. The NSA no doubt has pages of protocol about sysadmins changing their passwords, etc. So from a lock standpoint, the NSA had some rocking locks!

Except, some of the key holders to the locks decided to share their keys. Oh.

Locks are only one part, an important one but still just one, of a complex of measures that define “security” for an entity.

Anyone who says differently is selling you a partial solution to your security problems.

US Museums

Wednesday, June 18th, 2014

US Museums

From the webpage:

There are over 35,000 museums in the United States! Click on one below, browse around, or use the search form to find the next one you will visit.

Minimal information at present, but certainly a starting place for collaboration and enrichment!

Such as harvesting the catalogs of the museums that have them online.


I first saw this in a tweet by Lincoln Mullen.