Archive for July, 2014

Processing 3.0a1

Monday, July 28th, 2014

Processing 3.0a1

From the description:

3.0a1 (26 July 2014) Win 32 / Win 64 / Linux 32 / Linux 64 / Mac OS X.

The revisions cover incremental changes between releases, and are especially important to read for pre-releases.

From the revisions:

Kicking off the 3.0 release process. The focus for Processing 3 is improving the editor and the coding process, so we’ll be integrating what was formerly PDE X as the main editor.

This release also includes a number of bug fixes and changes, based on in-progress Google Summer of Code projects and a few helpful souls on Github.

Please contribute to the Processing 3 release by testing and reporting bugs. Or better yet, helping us fix them and submitting pull requests.

In case you are unfamiliar with Processing:

Processing is a programming language, development environment, and online community. Since 2001, Processing has promoted software literacy within the visual arts and visual literacy within technology. Initially created to serve as a software sketchbook and to teach computer programming fundamentals within a visual context, Processing evolved into a development tool for professionals. Today, there are tens of thousands of students, artists, designers, researchers, and hobbyists who use Processing for learning, prototyping, and production.


Street Slope Map

Sunday, July 27th, 2014

A neat idea for maps:

street slope map

See on Twitter.

I can think of a number of use cases for street slope information, along with surveillance camera coverage, average lighting conditions, average police patrols, and the like.

I first saw this in a tweet by Bob Lehman.


PHPTMAPI 3

Sunday, July 27th, 2014

PHPTMAPI 3 by Johannes Schmidt.

From the webpage:

PHPTMAPI 3 is the succession project of

PHPTMAPI is a PHP5 API for creating and manipulating topic maps, based on the project. This API enables PHP developers an easy and standardized implementation of ISO/IEC 13250 Topic Maps in their applications.

What is TMAPI?

TMAPI is a programming interface for accessing and manipulating data held in a topic map. The TMAPI specification defines a set of core interfaces which must be implemented by a compliant application as well as (eventually) a set of additional interfaces which may be implemented by a compliant application or which may be built upon the core interfaces.

Please spread the word to our PHP brethren.

Elementary Algorithms

Sunday, July 27th, 2014

Elementary Algorithms by Xinyu LIU.

From the github page:

AlgoXY is a free book about elementary algorithms and data structures. This book doesn’t only focus on an imperative (or procedural) approach, but also includes purely functional algorithms and data structures. It doesn’t require readers to master any programming languages, because all the algorithms are described using mathematical functions and pseudocode.

For reference and implementation purposes, source code in C, C++, Haskell, Python, Scheme/Lisp is available in addition to the book.

The contents of the book are provided under GNU FDL and the source code is under GNU GPLv3.

The PDF version can be downloaded from github:

This book is also available online at:

I was concerned when the HTML version of the trie chapter was only two pages long. You need to view the PDF version, where trie gets some forty (40) pages, to get an idea of the coverage of any particular algorithm.

I first saw this in a tweet by 0xAX.

Digital Humanities and Computer Science

Sunday, July 27th, 2014

Chicago Colloquium on Digital Humanities and Computer Science


1 August 2014, abstracts of ~ 750 words and a minimal bio sent to

31 August 2014, Deadline for Early Registration Discount.

19 September 2014, Deadline for group rate reservations at the Orrington Hotel.

23-24 October, 2014 Colloquium.

From the call for papers:

The ninth annual meeting of the Chicago Colloquium on Digital Humanities and Computer Science (DHCS) will be hosted by Northwestern University on October 23-24, 2014.

The DHCS Colloquium has been a lively regional conference (with non-trivial bi-coastal and overseas sprinkling), rotating since 2006 among the University of Chicago (where it began), DePaul, IIT, Loyola, and Northwestern. At the first Colloquium Greg Crane asked his memorable question “What to do with a million books?” Here are some highlights that I remember across the years:

  • An NLP programmer at Los Alamos talking about the ways security clearances prevented CIA analysts and technical folks from talking to each other.
  • A demonstration that if you replaced all content words in Arabic texts and focused just on stop words you could determine with a high degree of certainty the geographical origin of a given piece of writing.
  • A visualization of phrases like “the king’s daughter” in a sizable corpus, telling you much about who owned what.
  • A social network analysis of Alexander the Great and his entourage.
  • An amazingly successful extraction of verbal parallels from very noisy data.
  • Did you know that Jane Austen was a game theorist before her time and that her characters were either skillful or clueless practitioners of this art?

And so forth. Given my own interests, I tend to remember “Text as Data” stuff, but there was much else about archaeology, art, music, history, and social or political life. You can browse through some of the older programs at


One of the weather sites promises that October is between 42 F for the low and 62 F for the high (on average). Sounds like a nice time to visit Northwestern University!

To say nothing of an exciting conference!

I first saw this in a tweet by David Bamman.

Ten habits of highly effective data:…

Sunday, July 27th, 2014

Ten habits of highly effective data: Helping your dataset achieve its full potential by Anita de Waard.

Anita gives all the high-minded and very legitimate reasons for creating highly effective data, with examples.

Read her slides to pick up the rhetoric you need and leads on how to create highly effective data.

Let me add one concern to drive your interest in creating highly effective data:

Funders want researchers to create highly effective data.

Enough said?

Answers on how to create highly effective data continue to evolve, but not attempting to create it is a losing proposition.

Underspecifying Meaning

Sunday, July 27th, 2014

Word Meanings Evolve to Selectively Preserve Distinctions on Salient Dimensions by Catriona Silvey, Simon Kirby, and Kenny Smith.


Words refer to objects in the world, but this correspondence is not one-to-one: Each word has a range of referents that share features on some dimensions but differ on others. This property of language is called underspecification. Parts of the lexicon have characteristic patterns of underspecification; for example, artifact nouns tend to specify shape, but not color, whereas substance nouns specify material but not shape. These regularities in the lexicon enable learners to generalize new words appropriately. How does the lexicon come to have these helpful regularities? We test the hypothesis that systematic backgrounding of some dimensions during learning and use causes language to gradually change, over repeated episodes of transmission, to produce a lexicon with strong patterns of underspecification across these less salient dimensions. This offers a cultural evolutionary mechanism linking individual word learning and generalization to the origin of regularities in the lexicon that help learners generalize words appropriately.

I can’t seem to access the article today but the premise is intriguing.

Perhaps people can have different “…less salient dimensions…” and therefore are generalizing words “inappropriately” from the standpoint of another person.

I am curious whether a test can be devised to identify those “…less salient dimensions…” in a target population. It might lead to faster identification of terms likely to be misunderstood.
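The transmission mechanism the abstract describes can be caricatured in a few lines of code. This is a toy sketch under my own assumptions, not the authors' model: referents are (shape, color) pairs, shape is the salient dimension, and each learner generalizes a word to everything matching on the salient dimension while ignoring the rest.

```python
from itertools import product

SHAPES = ["cube", "ball", "cone"]
COLORS = ["red", "green", "blue"]
REFERENTS = list(product(SHAPES, COLORS))

def transmit(lexicon, salient=0):
    """One generation of transmission: the learner sees each
    (referent, word) pair but only attends to the salient
    dimension, so the word generalizes to every referent that
    matches on that dimension. Later pairs overwrite earlier
    ones; ties are resolved arbitrarily."""
    learned = {}
    for (shape, color), word in lexicon.items():
        cue = (shape, color)[salient]
        for ref in REFERENTS:
            if ref[salient] == cue:
                learned[ref] = word
    return learned

# Start with a fully specified lexicon: one word per referent.
lexicon = {ref: f"w{i}" for i, ref in enumerate(REFERENTS)}
for _ in range(3):
    lexicon = transmit(lexicon, salient=0)  # shape is salient

# After transmission, words still distinguish shape but no
# longer distinguish color -- underspecification emerges.
words_per_shape = {s: {lexicon[(s, c)] for c in COLORS} for s in SHAPES}
print(words_per_shape)  # each shape maps to a single word
```

Even this caricature shows the claimed direction of change: distinctions on the backgrounded (color) dimension are the ones that fail to survive transmission.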

Clojure Destructuring Tutorial….

Sunday, July 27th, 2014

Clojure Destructuring Tutorial and Cheat Sheet by John Louis Del Rosario.

From the post:

When I try to write or read some Clojure code, every now and then I get stuck on some destructuring forms. It’s like a broken record. One moment I’m in the zone, then this thing hits me and I have to stop what I’m doing to try and grok what I’m looking at.

So I decided I’d write a little tutorial/cheatsheet for Clojure destructuring, both as an attempt to really grok it (I absorb stuff more quickly if I write it down), and as future reference for myself and others.

Below is the whole thing, copied from the original gist. I’m planning on adding more (elaborate) examples and a section for compojure’s own destructuring forms. If you want to bookmark the cheat sheet, I recommend the gist since it has proper syntax highlighting and will be updated first.

John’s right, the gist version is easier to read.
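For readers who don't write Clojure, Python's unpacking gives a rough feel for what destructuring does, though the Clojure forms are considerably richer. A hedged comparison sketch (the bracketed comments name the approximate Clojure equivalents):

```python
# Sequential destructuring: bind positions, collect the rest.
head, *tail = [1, 2, 3, 4]          # like [head & tail]
(a, b), c = (1, 2), 3               # nested positional binding

# "Map destructuring": pull named values out of a dict.
# Python has no direct syntax for this, so a helper stands in
# for Clojure's {:keys [...]} form.
from operator import itemgetter

point = {"x": 1, "y": 2, "z": 3}
x, y = itemgetter("x", "y")(point)  # like {:keys [x y]}

print(head, tail, a, b, c, x, y)    # 1 [2, 3, 4] 1 2 3 1 2
```

Clojure adds defaults (`:or`), whole-value binding (`:as`), and destructuring in function parameters, which is exactly the territory the cheat sheet covers.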

As of 27 July 2014, the sections on “More Examples” and “Compojure” are blank if you feel like contributing.

I first saw this in a tweet by Daniel Higginbotham.

The Simplicity of Clojure

Sunday, July 27th, 2014

The Simplicity of Clojure by Bridget Hillyer and Clinton N. Dreisbach. OSCON 2014.

A great overview of Clojure that covers:

  • Clojure Overview
  • Collections
  • Sequences
  • Modeling
  • Functions
  • Flow Control
  • Polymorphism
  • State
  • Clojure Libraries

Granted, these are slides, so you need to fill in content from other sources, such as Clojure for the Brave and True, but they do provide an outline for learning more.

I first saw this in a tweet by Christophe Lalanne.

Stanford Large Network Dataset Collection

Saturday, July 26th, 2014

Stanford Large Network Dataset Collection by Jure Leskovec.

From the webpage:

SNAP networks are also available from the UF Sparse Matrix collection. Visualizations of SNAP networks by Tim Davis.

If you need software to go with these datasets, consider Stanford Network Analysis Platform (SNAP)

Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. Graphs consist of nodes and directed/undirected/multiple edges between the graph nodes. Networks are graphs with data on nodes and/or edges of the network.

The core SNAP library is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. Besides scalability to large graphs, an additional strength of SNAP is that nodes, edges and attributes in a graph or a network can be changed dynamically during the computation.

A Python interface is available for SNAP.
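The distinction the description draws, graphs as pure structure versus networks as graphs carrying attribute data, can be sketched in a few lines. This is an illustration of the model only, not the SNAP API:

```python
from collections import defaultdict

class Network:
    """Toy illustration of the graph-vs-network model described
    above: a directed graph whose nodes and edges can carry
    attribute data. Not SNAP code; a sketch of the concepts."""
    def __init__(self):
        self.succ = defaultdict(set)         # node -> out-neighbors
        self.node_attrs = defaultdict(dict)  # data on nodes
        self.edge_attrs = {}                 # data on edges

    def add_edge(self, u, v, **attrs):
        self.succ[u].add(v)
        self.succ.setdefault(v, set())       # ensure v exists
        self.edge_attrs[(u, v)] = attrs

    def out_degree(self, u):
        return len(self.succ[u])

g = Network()
g.add_edge("a", "b", weight=2.0)
g.add_edge("a", "c", weight=0.5)
g.node_attrs["a"]["label"] = "hub"
print(g.out_degree("a"))  # 2
```

SNAP's value, of course, is doing this at the scale of hundreds of millions of nodes with compact C++ representations; the sketch only shows the shape of the data model.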

I first saw this at: Stanford Releases Large Network Datasets by Ryan Swanstrom.

Digital Commonplace Book?

Saturday, July 26th, 2014

Rick Minerich reviews a precursor to a digital commonplace book in Sony Digital Paper DPT-S1 at Lambda Jam 2014.

It is limited to PDF files, in which you can highlight text and attach annotations (which can be exported), and you can use the DPT-S1 as a notepad.

To take the DPT-S1 a step further towards creating a commonplace book, it should:

  1. Export highlighted text with a reference to the text of origin
  2. Export annotated text with a reference to the text of origin
  3. Enable note pages in the DPT-S1 as an export target
  4. Enable pages that “roll” off the display (larger page sizes)
  5. Enable support of more formats

The first application (software or hardware) with reference-preserving cut-n-paste from a variety of formats to the user’s note-taking format will be a killer app.

And one step closer to being a digital commonplace book.

BTW, one authorized re-seller for the DPT-S1 has this notice on their website:

PLEASE NOTE: As of now we are only authorized to sell the Sony DPT-S1 within the Entertainment Industry. This is a pilot program and we are NO LONGER selling to the general public.

We understand that this is frustrating to many as this is a VERY popular product, however at this time we can provide NO INFORMATION regarding sales to the general public. This is a non-negotiable aspect of our agreement with Sony and regrettably, any inquiries by the general public will not be answered. Thank you for your understanding.
(Text color as it appears on the website.)

I can think of other words than “frustrating.”

Hopefully the popularity of the current version will encourage Sony to cure some of its limitations and make it more widely available.

The Sony Digital Paper site.

Resellers for legal and financial, law library, entertainment, and “all other professions.”

Or perhaps someone else will overcome the current limitations of the DPT-S1 and Sony will regret its overly restrictive marketing policies.

I first saw this in a tweet by Adam Foltzer.

Apple Backdoor Update – Not False

Friday, July 25th, 2014

Yesterday I posted: Do you want a backdoor with that iPhone/iPad? only to read today UPDATE: The Apple backdoor that wasn’t by Violet Blue.

Thinking I may have been taken in by a hoax, I read Violet’s post rather carefully:

From Violet’s post:

Since Mr. Zdziarski presented “Identifying back doors, attack points, and surveillance mechanisms in iOS devices“, his miscasting of Apple’s developer diagnostics as a “backdoor” was defeated on Twitter, debunked and saw SourceClear calling Zdziarski an attention seeker in Computerworld, and Apple issued a statement saying that no, this is false.

In fact, this allegedly “secret backdoor” was added to diagnostic information that has been as freely available as a page out of a phone book since 2002.

Interesting. So if you are called an “attention seeker” in Computerworld and a vendor denies your claim, the story is false?

Let’s read the sources before jumping to the conclusion that the story was false.

From the Computerworld account:

Apple swiftly rejected Zdziarski’s accusations, pointing out that end users are in complete control of the claimed hacking process — the person owning the device must have unlocked it and “agreed to trust another computer before the computer is able” to access the diagnostic data the claimed NerveGas attack focuses on.

The author of the article I quoted said:

For the backdoor to be exploited by a spy, your iDevice needs to be synced to another computer via a feature called iOS pairing.

Once your iDevice is paired to your PC or Mac, they exchange encryption keys and certificates to establish an encrypted SSL tunnel, and the keys are never deleted unless the iPhone or iPad is wiped with a factory reset.

That means a hacker could insert spyware on your computer to steal the pairing keys, which allows them to locate and connect to your device via Wi-Fi.

Sounds to me like Apple and Zorabedian agree on the necessary conditions for the exploit. You have to “unlock” and “trust another computer.”


Violet Blue ignores, or doesn’t bother to read, the technical agreement between Apple and Zorabedian, and instead takes Zdziarski to task with name calling and accusations of attention seeking.

The second accusation is a case of the pot calling the kettle black.

Zorabedian should have said that Apple has had this backdoor since 2002, which would have been a useful correction to the original story.

Advanced Data Analysis from an Elementary Point of View (update)

Friday, July 25th, 2014

Advanced Data Analysis from an Elementary Point of View by Cosma Rohilla Shalizi. (8 January 2014)

From the introduction:

These are the notes for 36-402, Advanced Data Analysis, at Carnegie Mellon. If you are not enrolled in the class, you should know that it’s the methodological capstone of the core statistics sequence taken by our undergraduate majors (usually in their third year), and by students from a range of other departments. By this point, they have taken classes in introductory statistics and data analysis, probability theory, mathematical statistics, and modern linear regression (“401”). This class does not presume that you have learned but forgotten the material from the pre-requisites; it presumes that you know that material and can go beyond it. The class also presumes a firm grasp on linear algebra and multivariable calculus, and that you can read and write simple functions in R. If you are lacking in any of these areas, now would be an excellent time to leave.

I last reported on this draft in 2012 at: Advanced Data Analysis from an Elementary Point of View

Looking forward to this work’s publication by Cambridge University Press.

I first saw this in a tweet by Mark Patterson.

Open-Data Data Dumps

Friday, July 25th, 2014

Government’s open-data portal at risk of becoming a data dump by Jj Worrall.

From the post:

The Government’s new open-data portal is not yet where it would like it to be, Minister Brendan Howlin said in a Department of Public Expenditure and Reform meeting room earlier this week.

In case expectations are too high, the word “pilot” is in italics when you visit the site in question –

Meanwhile the words “start” and “beginning” pepper the conversation with the Minister and a variety of data experts from the Insight Centre in NUI Galway who have helped create the site. The site allows those in the Government, as well as interested businesses and citizens, to examine data from a variety of public bodies, opening opportunities for Government efficiencies and commercial possibilities along the way.

The main problem is that there is not much of it, and a lot of what is there can’t be utilised in a particularly useful fashion.

As Waldo Jaquith, director of the US Open Data Institute, told The Irish Times, with “almost no data” available in a format that’s genuinely usable by app developers, businesses or interested parties, the site for the moment represents “a haphazard collection of data”.

It is important to realize that governments and their staffs have very little experience at being open and/or sharing data. Among the reasons for being reluctant to post open-data are:

  1. Less power over you since requests for data cannot be delayed or denied
  2. Less power in general because others will have the data
  3. Less power to confer on others by exclusive access to the data
  4. Less security since data may show poor results or performance
  5. Less security since data may show favoritism or fraud
  6. Less prestige as the source of answers on the data

Not an exhaustive list, but it is a reminder that changing attitudes about open-data is probably beyond your reach.

What you can do with such a site is to find a dataset of interest to you and make concrete suggestions for improvements.

There are a number of government staffers whom I didn’t capture in my list of reasons not to share data. Side with them and facilitate their work.

For example:

Met Éireann Climate Products. A polite note to Evelyn.O’ should point out that an order form and price list don’t really constitute “open-data” in the sense citizens and developers expect. The “resource” should come off the listing and be made available elsewhere, as “Data products to order” for example.


Weather Buoy Network Real Time Data, where if you dig long enough, you will find that you can download csv formatted data by blindly guessing at buoy names. A map of buoy locations would greatly assist at that point, not to mention an RSS feed for buoy data as it is received. Downloading a file tells me I am not getting “Real Time Data.” Yes?

Not major improvements, but they would improve those two items at any rate.

It will take time, but ultimately the staff who favor sharing will prevail. You can hasten that day’s arrival or you can retard it. Your choice.

I first saw this in a tweet by Deirdre Lee.

Tools and Resources Development Fund [bioscience ODF UK]

Friday, July 25th, 2014

Tools and Resources Development Fund

Application deadline: 17 September 2014, 4pm

From the summary:

Our Tools and Resources Development Fund (TRDF) aims to pump prime the next generation of tools, technologies and resources that will be required by bioscience researchers in scientific areas within our remit. It is anticipated that successful grants will not exceed £150k (£187k FEC) (ref 1) and a fast-track, light touch peer review process will operate to enable researchers to respond rapidly to emerging challenges and opportunities.

Projects are expected to have a maximum value of £150k (ref 1). The duration of projects should be between 6 and 18 months, although community networks to develop standards could be supported for up to 3 years.

A number of different types of proposal are eligible for consideration.

  • New approaches to the analysis, modelling and interpretation of research data in the biological sciences, including development of software tools and algorithms. Of particular interest will be proposals that address challenges arising from emerging new types of data and proposals that address known problems associated with data handling (e.g. next generation sequencing, high-throughput phenotyping, the extraction of data from challenging biological images, metagenomics).
  • New frameworks for the curation, sharing, and re-use/re-purposing of research data in the biological sciences, including embedding data citation mechanisms (e.g. persistent identifiers for datasets within research workflows) and novel data management planning (DMP) implementations (e.g. integration of DMP tools within research workflows)
  • Community approaches to the sharing of research data including the development of standards (this could include coordinating UK input into international standards development activities).
  • Approaches designed to exploit the latest computational technology to further biological research; for example, to facilitate the use of cloud computing approaches or high performance computing architectures.

Projects may extend existing software resources; however, the call is designed to support novel tools and methods. Incremental improvement and maintenance of existing software that does not provide new functionality or significant performance improvements (e.g. by migration to an advanced computing environment) does not fall within the scope of the call.

Very timely since the UK announcement that OpenDocument Format (ODF) is among the open standards:

The standards set out the document file formats that are expected to be used across all government bodies. Government will begin using open formats that will ensure that citizens and people working in government can use the applications that best meet their needs when they are viewing or working on documents together. (Open document formats selected to meet user needs)

ODF as a format supports RDFa as metadata but lacks an implementation that makes full use of that capability.

Imagine biocuration that:

  • Starts with authors writing a text and is delivered to
  • Publishers, who can proof or augment the author’s biocuration
  • Results are curated on publication (not months or years later)
  • Results are immediately available for collation with other results.

The only way to match the explosive growth of bioscience publications with equally explosive growth of bioscience curation, is to use tools the user already knows. Like word processing software.

Please pass this along and let me know of other grants or funding opportunities where adaptation of office standards or software could change the fundamentals of workflow.

Neo4j Index Confusion

Friday, July 25th, 2014

Neo4j Index Confusion by Nigel Small.

From the post:

Since the release of Neo4j 2.0 and the introduction of schema indexes, I have had to answer an increasing number of questions arising from confusion between the two types of index now available: schema indexes and legacy indexes. For clarification, these are two completely different concepts and are not interchangable or compatible in any way. It is important, therefore, to make sure you know which you are using.

Nigel forgets to mention that legacy indexes are based on Lucene; schema indexes are not.

If you are interested in the technical details of the schema indexes, start with On Creating a MapDB Schema Index Provider for Neo4j 2.0 by Michael Hunger.

Michael says in his tests that the new indexing solution is faster than Lucene. Or more accurately, faster than Lucene as used in prior Neo4j versions.
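One way to see the conceptual split, with Neo4j itself out of the picture: a schema index is declared once per label/property and then maintained automatically as nodes change, while a legacy index is a separately named structure the application must update by hand. A toy sketch of that difference (an illustration only, not Neo4j code or its API):

```python
class SchemaIndex:
    """Declared per (label, property); kept in sync automatically
    whenever a node is added. Loosely mirrors the behavior of
    Neo4j 2.0 schema indexes."""
    def __init__(self, label, prop):
        self.label, self.prop = label, prop
        self.entries = {}

    def on_node_added(self, node):
        if self.label in node["labels"] and self.prop in node:
            self.entries.setdefault(node[self.prop], []).append(node)

class LegacyIndex:
    """A free-standing named index: nothing updates it unless the
    application does so explicitly."""
    def __init__(self, name):
        self.name, self.entries = name, {}

    def add(self, key, value, node):
        self.entries.setdefault((key, value), []).append(node)

idx = SchemaIndex("Person", "name")
legacy = LegacyIndex("people")

node = {"labels": {"Person"}, "name": "Nigel"}
idx.on_node_added(node)  # happens automatically inside the database
# legacy.add("name", "Nigel", node) would be a separate, manual step

print(len(idx.entries.get("Nigel", [])),
      len(legacy.entries.get(("name", "Nigel"), [])))  # 1 0
```

The sketch also makes Nigel's point concrete: the two structures share no data, so a query that consults one will never see entries that live only in the other.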

How simple are your indexing needs?

Pussy Stalking [Geo-Location as merge point]

Friday, July 25th, 2014

Cat stalker knows where your kitty lives (and it’s your fault) by Lisa Vaas.

From the post:

Ever posted a picture of your cat online?

Unless your privacy settings avoid making APIs publicly available on sites like Flickr, Twitpic, Instagram or the like, there’s a cat stalker who knows where your liddl’ puddin’ lives, and he’s totally pwned your pussy by geolocating it.

That’s right, fat-faced grey one from Al Khobar in Saudi Arabia, Owen Mundy knows you live on Tabari Street.

cat stalker

Mundy, a data analyst, artist, and Associate Professor in the Department of Art at Florida State University, has been working on the data visualisation project, which is called I Know Where Your Cat Lives.

See Lisa’s post for the details about the “I Know Where Your Cat Lives” project.

The same data leakage is found in other types of photographs as well. Such as photographs by military personnel.

An enterprising collector could use geolocation as a merge point to collect all the photos made at a particular location. Or, using geolocation, ask “who?” for some location X.

Or perhaps a city map using geolocated images to ask “who?” Not everyone may know your name, but with a large enough base of users, someone will.
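The merge-point idea is mechanically simple once the EXIF GPS tags are in hand. A hedged sketch, assuming the EXIF-reading step is done by a library such as Pillow or exifread: convert the (degrees, minutes, seconds) rationals to signed decimal degrees, then round so photos taken at roughly the same spot collide on the same key.

```python
def dms_to_decimal(dms, ref):
    """Convert EXIF-style GPS (degrees, minutes, seconds) values
    to signed decimal degrees. South and West are negative."""
    deg, minutes, seconds = dms
    value = deg + minutes / 60.0 + seconds / 3600.0
    return -value if ref in ("S", "W") else value

def merge_key(lat_dms, lat_ref, lon_dms, lon_ref, places=3):
    """Round to ~100 m so nearby photos share one merge key."""
    lat = dms_to_decimal(lat_dms, lat_ref)
    lon = dms_to_decimal(lon_dms, lon_ref)
    return (round(lat, places), round(lon, places))

# Two photos a few metres apart share a key (coordinates are
# made-up values for illustration):
k1 = merge_key((26, 17, 2.0), "N", (50, 12, 30.0), "E")
k2 = merge_key((26, 17, 2.4), "N", (50, 12, 30.4), "E")
print(k1 == k2)  # True
```

Grouping photos by that key is exactly the “all photos made at location X” collection; joining it against any other geocoded dataset answers the “who?” question.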

PS: There is at least one app for facial recognition, NameTag. I don’t have a cellphone so you will have to comment on how well it works. I still like the idea of a “who?” site. Perhaps because I prefer human intell over data vacuuming.

Linked Data 2011 – 2014

Friday, July 25th, 2014

One of the better-known visualizations of the Linked Data Cloud has been updated for 2014. For comparison purposes, I have included the 2011 version as well.

LOD Cloud 2011

LOD Cloud 2011

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch.”

LOD Cloud 2014

LOD Cloud 2014

From Adoption of Linked Data Best Practices in Different Topical Domains by Max Schmachtenberg, Heiko Paulheim, and Christian Bizer.

How would you characterize the differences between the two?

Partially it is a question of how to use large graph displays. Even though the originals (in some versions) are interactive, how often is an overview of related linked data sets required?

I first saw this in a tweet by Juan Sequeda.

How to Make a Complete Map…

Thursday, July 24th, 2014

How to Make a Complete Map of Every Thought You Think by Lion Kimbro.

From the introduction:

This book is about how to make a complete map of everything you think for as long as you like.

Whether that’s good or not, I don’t know- keeping a map of all your thoughts has a “freezing” effect on the mind. It takes a lot of (albeit pleasurable) work, but produces nothing but SIGHT.

If you do the things described in this book, you will be IMMOBILIZED for the duration of your commitment. The immobilization will come on gradually, but steadily. In the end, you will be incapable of going somewhere without your cache of notes, and will always want a pen and paper w/ you. When you do not have pen and paper, you will rely on complex memory pegging devices, described in “The Memory Book”. You will NEVER BE WITHOUT RECORD, and you will ALWAYS RECORD.

YOU MAY ALSO ARTICULATE. Your thoughts will be clearer to you than they have ever been before. You will see things you have never seen before. When someone shows you one corner, you’ll have the other 3 in mind. This is both good and bad. It means you will have the right information at the right time in the right place. It also means you may have trouble shutting up. Your mileage may vary.

You will not only be immobilized in the arena of action, but you will also be immobilized in the arena of thought. This appears to be contradictory, but it’s not really. When you are writing down your thoughts, you are making them clear to yourself, but when you revise your thoughts, it requires a lot of work- you have to update old ideas to point to new ideas. This discourages a lot of new thinking. There is also a “structural integrity” to your old thoughts that will resist change. You may actively not-think certain things, because it would demand a lot of note keeping work. (Thus the notion that notebooks are best applied to things that are not changing.)

Sounds bizarre. Yes?

Here is how the BBC’s Giles Turnbull summarized the system:

The system breaks down into simple jottings made during the day – what he calls “speeds”. These can be made on sheets of paper set aside for multiple subjects, or added directly to sheets dedicated to a specific subject. Speeds are made on the fly, as they happen, and it’s up to the writer to transcribe these into another section of the notebook system later on.

Lion suggests using large binders full of loose sheets of paper so that individual sheets can be added, removed and moved from one place to another. Notes can be given subjects and context hints as they are made, to help the writer file them into larger, archived binders when the time comes to organise their thoughts.

Even so, the writer is expected to carry one binder around with them at all times, and add new notes as often as possible, augmented with diagrams, arrows and maps.

With that summary description, it becomes apparent that Lion has reinvented the commonplace book, this one limited to your own thoughts.

Have you thought any more about how to create a digital commonplace book interface?

Do you want a backdoor with that iPhone/iPad?

Thursday, July 24th, 2014

iPhone/iPad sales reps never have to ask that question. Every iPhone and iPad has a built-in backdoor, accessible over any Wi-Fi network. How convenient.

BTW, this is not a bug. According to John Zorabedian, Apple calls it a “diagnostic function.” In other words, this is no accident, it is a feature!

See John’s complete report at: iSpy? Researcher exposes backdoor in iPhones and iPads.

After you read John’s post, re-blog it or point to it on Apple/iPhone/iPad forums, lists, etc.

Perhaps the default screen on iPhones and iPads should read:

You are in a crowded shopping mall. You are naked.

Just to remind users of the security status of these Apple devices.

If you create a topic map on hardware/software security, iPhones and iPads are of type: insecure.

UPDATE: The Apple backdoor that wasn’t by Violet Blue.

From Violet’s post:

Since Mr. Zdziarski presented “Identifying back doors, attack points, and surveillance mechanisms in iOS devices“, his miscasting of Apple’s developer diagnostics as a “backdoor” was defeated on Twitter, debunked and saw SourceClear calling Zdziarski an attention seeker in Computerworld, and Apple issued a statement saying that no, this is false.

In fact, this allegedly “secret backdoor” was added to diagnostic information that has been as freely available as a page out of a phone book since 2002.

Interesting. So if you are called an “attention seeker” in Computerworld and a vendor denies your claim, the story is false?

In the Computerworld account:

Apple swiftly rejected Zdziarski’s accusations, pointing out that end users are in complete control of the claimed hacking process — the person owning the device must have unlocked it and “agreed to trust another computer before the computer is able” to access the diagnostic data the claimed NerveGas attack focuses on.

Isn’t that what Zorabedian said:

For the backdoor to be exploited by a spy, your iDevice needs to be synced to another computer via a feature called iOS pairing.

Once your iDevice is paired to your PC or Mac, they exchange encryption keys and certificates to establish an encrypted SSL tunnel, and the keys are never deleted unless the iPhone or iPad is wiped with a factory reset.

That means a hacker could insert spyware on your computer to steal the pairing keys, which allows them to locate and connect to your device via Wi-Fi.

Sounds to me like Apple and Zorabedian agree on the necessary conditions for the exploit.

Curious that Violet Blue jumps over the technical agreement between Apple and Zdziarski to take the latter to task for name calling and attention seeking. The latter accusation being too ironic for words.

New Testament Transcription

Thursday, July 24th, 2014

There is an excellent example of a transcription interface at: A screen shot won’t display well but I can sketch the general form of the interface:

transcription interface

A user selects a character in the papyrus by “clicking” on its center point. That point can be moved if need be. The character will be highlighted and you then select the matching character on the keyboard.

There are recorded instructions that can be played if you are uncertain at any point.

I can’t imagine a more intuitive transcription interface.

I have suggested crowd sourcing transcription of the New Testament (and Old Testament/Hebrew Bible as well) before to groups concerned with those texts. The response has always been that there are cases that require expertise to transcribe. Fair enough, that’s very true.

But, with crowd transcription, we would be able to use the results of hundreds if not thousands of transcribers to identify the characters or symbols that have no consistent transcription. Those particular cases could be “kicked upstairs” to the experts.
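A sketch of how that triage could work, in Python. The data shapes, character IDs, and agreement threshold below are my own assumptions for illustration, not taken from any existing transcription project:

```python
from collections import Counter

def triage(readings, threshold=0.8):
    """Split crowd readings into consensus transcriptions and expert cases.

    readings: {char_id: [transcriptions from individual volunteers]}
    Returns (consensus, needs_expert): characters where a single reading
    reaches `threshold` agreement, and characters to kick upstairs.
    """
    consensus, needs_expert = {}, []
    for char_id, votes in readings.items():
        top, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= threshold:
            consensus[char_id] = top
        else:
            needs_expert.append(char_id)
    return consensus, needs_expert
```

The threshold is the tunable part: set it high and more characters go to the experts, set it low and more rest on crowd agreement alone.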

The end result, assuming access to all the extant manuscripts, would be a traceable transcription of all the sources for the New Testament back to particular manuscripts or papyri. With all the witnesses to a particular character or word being at the reader’s fingertips. (Ditto for the Old Testament/Hebrew Bible.)

We have the technology to bring the witnesses to the biblical text to everyone who is interested. The only remaining question is whether funders can overcome the reluctance of the usual suspects to granting everyone that level of access.

Personally I have no fear of free and open access to the witnesses to the biblical text. As a text the Bible has resisted efforts to pervert its meaning for more than two thousand years. It can take care of itself.

Hadoop Summit Content Curation

Thursday, July 24th, 2014

Hadoop Summit Content Curation by Jules S. Damji.

From the post:

Although the Hadoop Summit San Jose 2014 has come and gone, the invaluable content—keynotes, sessions, and tracks—is available here. We’ve selected a few sessions for Hadoop developers, practitioners, and architects, curating them under Apache Hadoop YARN, the architectural center and the data operating system.

In most of the keynotes and tracks three themes resonated:

  1. Enterprises are transitioning from traditional Hadoop to modern Hadoop 2.
  2. YARN is an enabler, the central orchestrator that facilitates multiple workloads, runs multiple data engines, and supports multiple access patterns—batch, interactive, streaming, and real-time—in Apache Hadoop 2.
  3. Apache Hadoop 2, as part of Modern Data Architecture (MDA), is enterprise ready.

It doesn’t matter whether I have cable or DirecTV; there is never a shortage of material to watch. 😉

Choke-Point based Benchmark Design

Wednesday, July 23rd, 2014

Choke-Point based Benchmark Design by Peter Boncz.

From the post:

The Linked Data Benchmark Council (LDBC) mission is to design and maintain benchmarks for graph data management systems, and establish and enforce standards in running these benchmarks, and publish and arbitrate around the official benchmark results. The council and its website just launched, and in its first 1.5 year of existence, most effort at LDBC has gone into investigating the needs of the field through interaction with the LDBC Technical User Community (next TUC meeting will be on October 5 in Athens) and indeed in designing benchmarks.

So, what makes a good benchmark design? Many talented people have paved our way in addressing this question and for relational database systems specifically the benchmarks produced by TPC have been very helpful in maturing relational database technology, and making it successful. Good benchmarks are relevant and representative (address important challenges encountered in practice), understandable, economical (implementable on simple hardware), fair (such as not to favor a particular product or approach), scalable, accepted by the community and public (e.g. all of its software is available in open source). This list stems from Jim Gray’s Benchmark Handbook. In this blogpost, I will share some thoughts on each of these aspects of good benchmark design.

Just in case you want to start preparing for the Athens meeting:

The Social Network Benchmark 0.1 draft and supplemental materials.

The Semantic Publishing Benchmark 0.1 draft and supplemental materials.

Take the opportunity to download the benchmark materials edited by Jim Gray. They will be useful in evaluating the LDBC benchmarks.

Improving RRB-Tree Performance through Transience

Wednesday, July 23rd, 2014

Improving RRB-Tree Performance through Transience by Jean Niklas L’orange.

From the abstract:

The RRB-tree is a confluently persistent data structure based on the persistent vector, with efficient concatenation and slicing, and effectively constant time indexing, updates and iteration. Although efficient appends have been discussed, they have not been properly studied.

This thesis formally describes the persistent vector and the RRB-tree, and presents three optimisations for the RRB-tree which have been successfully used in the persistent vector. The differences between the implementations are discussed, and the performance is measured. To measure the performance, the C library librrb is implemented with the proposed optimisations.

Results show that the optimisations improve the append performance of the RRB-tree considerably, and suggest that its performance is comparable to mutable array lists in certain situations.

Jean’s thesis is available at:

Although immutable data structures are obviously better suited for parallel programming, years of hacks on mutable data structures have set a high bar for performance. Unreasonably, parallel programmers want the same level of performance from immutable data structures as from their current mutable ones. 😉

Research such as Jean’s moves functional languages one step closer to being the default for parallel processing.
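The core trick behind the persistent vector (which the RRB-tree builds on) is path copying in a wide trie: an update copies only the nodes on the path to the changed slot and shares everything else with the old version. A minimal sketch in Python — using a binary trie instead of the 32-way nodes real implementations use, purely for readability:

```python
BITS = 4  # trie depth; 2**BITS = 16 slots (real implementations use 32-way nodes)

def build(values, level=BITS):
    """Build a full binary trie of depth `level` from 2**level values."""
    if level == 0:
        return values[0]
    half = len(values) // 2
    return [build(values[:half], level - 1), build(values[half:], level - 1)]

def get(node, index, level=BITS - 1):
    """Index by consuming one bit of `index` per trie level, top bit first."""
    while level >= 0:
        node = node[(index >> level) & 1]
        level -= 1
    return node

def update(node, index, value, level=BITS - 1):
    """Return a new trie; only nodes on the path to `index` are copied."""
    if level < 0:
        return value
    new = list(node)              # copy this one node (two slots)
    bit = (index >> level) & 1
    new[bit] = update(node[bit], index, value, level - 1)
    return new
```

An update therefore copies O(log n) small nodes while the old version remains intact and shares the untouched subtrees — which is what makes layering cheap appends, concatenation and slicing on top worthwhile.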

Finger trees:…

Wednesday, July 23rd, 2014

Finger trees: a simple general-purpose data structure by Ralf Hinze and Ross Paterson.

From the abstract:

We introduce 2-3 finger trees, a functional representation of persistent sequences supporting access to the ends in amortized constant time, and concatenation and splitting in time logarithmic in the size of the smaller piece. Representations achieving these bounds have appeared previously, but 2-3 finger trees are much simpler, as are the operations on them. Further, by defining the split operation in a general form, we obtain a general purpose data structure that can serve as a sequence, priority queue, search tree, priority search queue and more.

Before the Hinze and Paterson article you may want to read: 2-3 finger trees in ASCII by Jens Nicolay.

Note 2-3 finger trees go unmentioned in Purely Functional Data Structures by Chris Okasaki.

Other omissions of note?

First World War Digital Resources

Wednesday, July 23rd, 2014

First World War Digital Resources by Christopher Phillips.

From the post:

The centenary of the First World War has acted as a catalyst for intense public and academic attention. One of the most prominent manifestations of this increasing interest in the conflict is in the proliferation of digital resources made available recently. Covering a range of national and internationally-focused websites, this review makes no pretence at comprehensiveness; indeed it will not cover the proliferation of locally-oriented sites such as the Tynemouth World War One Commemoration Project, or those on neutral territories like the Switzerland and the First World War. Instead, this review will offer an introduction to some of the major repositories of information for both public and academic audiences seeking further understanding of the history of the First World War.

The Imperial War Museum (IWM) in London has been designated by the British government as the focal point of British commemorations of the war. The museum itself has been the recipient of a £35million refurbishment, and the IWM’s Centenary Website acts as a collecting point for multiple regional, national and international cultural and educational organisations through the First World War Centenary Partnership. This aspect of the site is a real triumph, providing a huge, regularly updated events calendar which demonstrates both the geographical spread and the variety of the cultural and academic contributions scheduled to take place over the course of the centenary.

Built upon the stunning visual collections held by the museum, the website contains a number of introductory articles on a wide range of subjects. In addition to the relatively familiar subjects of trenches, weaponry and poets, the website also provides contributions on the less-traditional aspects of the conflict. The varied roles taken by women, the ‘sideshow’ theatres of war outside the Western Front, and the myriad animals used by the armed forces are also featured. Although the many beautiful photographs and images from the IWM itself are individually recorded, the lack of a ‘further reading’ section to supplement the brief written descriptions is a weakness, particularly as the site is clearly geared towards those at an early stage in their research into the conflict (the site contains a number of advertisements for interactive talks at IWM sites aimed at students at KS3 and above).

The keystone of the IWM’s contribution to the centenary, however, is the Lives of the First World War project. Lives aims to create a ‘permanent digital memorial to more than eight million men and women from across Britain and the Commonwealth’ before the end of the centenary. Built upon the foundation of official medal index cards, the site relies upon contributions from the public, inputting data, photographs and information to help construct the ‘memorial’. Launched in February 2014, the database is currently sparsely populated, with very little added to the life stories of the majority of soldiers. Concentration at the moment appears to be on the more ‘celebrity’ soldiers of the war, men such as Captain Noel Chavasse and Wilfred Owen, upon whom significant research has already been undertaken. Although a search option is available to find individual soldiers by name, unit, or service number, the limitations of the search engine render a comparison of soldiers from the same city or from a shared workplace impossible. Lives is undoubtedly an ambitious project; however at this time there is little available for genealogists or academic researchers on the myriad stories still locked in attics and archives across Britain.

If you are interested in World War I and its history, this is an excellent starting point. Unlike military histories, the projects covered here paint a broader picture of the war, a picture that includes a wider cast of characters.

Awesome Big Data

Wednesday, July 23rd, 2014

Awesome Big Data by Onur Akpolat.

From the webpage:

A curated list of awesome big data frameworks, resources and other awesomeness. Inspired by awesome-php, awesome-python, awesome-ruby, hadoopecosystemtable & big-data.

Your contributions are always welcome!

Great list of projects.

Curious to see if it develops enough community support to sustain the curation of the listing.

Finding resource collections like this one is so haphazard on the WWW that authors often duplicate the work of others. Not intentionally, just unaware of a similar resource.

Similar to the repeated questions that appear on newsgroups and email lists about basic commands or flaws in programs. The answer probably already exists in an archive or FAQ, but how is a new user to find it?

The social aspects of search and knowledge sharing are likely as important, if not more so, than the technologies we use to implement them.

Suggestions for reading on the social aspects of search and knowledge sharing?

Web Annotation Working Group Charter

Wednesday, July 23rd, 2014

Web Annotation Working Group Charter

From the webpage:

Annotating, which is the act of creating associations between distinct pieces of information, is a widespread activity online in many guises but currently lacks a structured approach. Web citizens make comments about online resources using either tools built into the hosting web site, external web services, or the functionality of an annotation client. Readers of ebooks make use of the tools provided by reading systems to add and share their thoughts or highlight portions of texts. Comments about photos on Flickr, videos on YouTube, audio tracks on SoundCloud, people’s posts on Facebook, or mentions of resources on Twitter could all be considered to be annotations associated with the resource being discussed.

The possibility of annotation is essential for many application areas. For example, it is standard practice for students to mark up their printed textbooks when familiarizing themselves with new materials; the ability to do the same with electronic materials (e.g., books, journal articles, or infographics) is crucial for the advancement of e-learning. Submissions of manuscripts for publication by trade publishers or scientific journals undergo review cycles involving authors and editors or peer reviewers; although the end result of this publishing process usually involves Web formats (HTML, XML, etc.), the lack of proper annotation facilities for the Web platform makes this process unnecessarily complex and time consuming. Communities developing specifications jointly, and published, eventually, on the Web, need to annotate the documents they produce to improve the efficiency of their communication.

There is a large number of closed and proprietary web-based “sticky note” and annotation systems offering annotation facilities on the Web or as part of ebook reading systems. A common complaint about these is that the user-created annotations cannot be shared, reused in another environment, archived, and so on, due to a proprietary nature of the environments where they were created. Security and privacy are also issues where annotation systems should meet user expectations.

Additionally, there are the related topics of comments and footnotes, which do not yet have standardized solutions, and which might benefit from some of the groundwork on annotations.

The goal of this Working Group is to provide an open approach for annotation, making it possible for browsers, reading systems, JavaScript libraries, and other tools, to develop an annotation ecosystem where users have access to their annotations from various environments, can share those annotations, can archive them, and use them how they wish.

Depending on how fine-grained you want your semantics, annotation is one way to convey them to others.

Unfortunately, looking at the starting point for this working group, “open” means RDF, OWL and other non-commercially adopted technologies from the W3C.

Defining the ability to point, using XQuery perhaps, and reserving to users the ability to create standards for annotation payloads, would be a much more “open” approach. That is an approach you are unlikely to see from the W3C.

I would be more than happy to be proven wrong on that point.
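For contrast, a sketch of what a format-neutral annotation record might look like. The field names and selector shape below are my own invention for illustration, not taken from the charter or any W3C draft — the point is that a plain, serializable record with an open-ended selector and payload needs no prescribed vocabulary:

```python
import json

# A minimal annotation: a pointer into a target resource plus a free-form body.
# The selector and body are deliberately open-ended; communities could
# standardize their own payloads without a mandated vocabulary.
annotation = {
    "target": "http://example.org/papyrus/p52",
    "selector": {"type": "text-position", "start": 120, "end": 134},
    "body": {"format": "text/plain", "value": "Reading uncertain; compare line 3."},
    "creator": "mailto:reader@example.org",
}

# Round-trip through JSON: shareable, archivable, tool-independent.
serialized = json.dumps(annotation, sort_keys=True)
restored = json.loads(serialized)
```

Anything that can parse JSON can consume, archive, or re-serve such a record, which is the portability the charter says proprietary sticky-note systems lack.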

Supplying Missing Semantics? (IKWIM)

Wednesday, July 23rd, 2014

Chris Ford, in Creating music with Clojure and Overtone, uses an example of a harmonic sound missing its first two harmonic components, and yet when heard, our ears supply the missing components. Quite spooky when you first hear it, but there is no doubt that the components are quite literally “missing” from the result.
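The effect is easy to reproduce numerically, without Overtone. The sketch below (plain Python, my own parameters) builds a tone from harmonics 3 through 8 of a 100 Hz fundamental: the waveform still repeats every 1/100 s, which is the periodicity the ear assigns a pitch to, even though a DFT shows no energy at 100 Hz at all:

```python
import math

SR, F0, N = 8000, 100, 800  # sample rate, absent fundamental (Hz), 0.1 s of samples

# Sum harmonics 3..8 of F0; harmonics 1 and 2 are deliberately missing.
signal = [sum(math.sin(2 * math.pi * k * F0 * t / SR) for k in range(3, 9))
          for t in range(N)]

def dft_magnitude(x, freq, sr):
    """Magnitude of a single DFT coefficient at `freq`, normalized by length."""
    n = len(x)
    re = sum(x[t] * math.cos(2 * math.pi * freq * t / sr) for t in range(n))
    im = sum(x[t] * math.sin(2 * math.pi * freq * t / sr) for t in range(n))
    return math.hypot(re, im) / n
```

The period of the sum is set by the spacing of the harmonics, not by the lowest one present; the auditory system latches onto that periodicity and reports the fundamental that "should" be there.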

Which makes me wonder, do we generally supply semantics, appropriately or inappropriately, to data?

Unless it is written in an unknown script, we “know” what data must mean, based on what we would mean by such data.

Using “data” in the broadest sense to include all recorded information.

Even unknown scripts don’t stop us from assigning our “meanings” to texts. I will have to run down some of the 17th century works on Egyptian Hieroglyphics at some point.

Entertaining, and according to current work on historical Egyptian, not even close to what we now understand the texts to mean.

The “I know what it means” (IKWIM) syndrome may be the biggest single barrier to all semantic technologies. Capturing the semantics of texts is always an expensive proposition and if I already IKWIM, then why bother?

If you capture something I already know, it can be shared with others. Another disincentive for capturing semantics.

To paraphrase a tweet I saw today by no-hacker-news

Why take 1 minute to document when others can waste a day guessing?