Archive for the ‘JSON’ Category

Parsing JSON is a Minefield

Wednesday, October 26th, 2016

Parsing JSON is a Minefield by Nicolas Seriot.


JSON is the de facto standard when it comes to (un)serialising and exchanging data in web and mobile programming. But how well do you really know JSON? We’ll read the specifications and write test cases together. We’ll test common JSON libraries against our test cases. I’ll show that JSON is not the easy, idealised format as many do believe. Indeed, I did not find two libraries that exhibit the very same behaviour. Moreover, I found that edge cases and maliciously crafted payloads can cause bugs, crashes and denial of services, mainly because JSON libraries rely on specifications that have evolved over time and that let many details loosely specified or not specified at all.
(emphasis in original)

Or the summary (tweet) that caught my attention:

I published: Parsing JSON is a Minefield … in which I could not find two parsers that exhibited the same behaviour

Or consider this graphic, which in truth needs a larger format than even the original:


Don’t worry, you can’t read the original at its default resolution. I had to enlarge the view several times to get a legible display.

More suitable for a poster sized print.

Perhaps something to consider for Balisage 2017 as swag?

Excellent work and a warning against the current vogue of half-ass standardization in some circles.

“We know what we meant” is a sure sign of poor standards work.
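
A small, hedged illustration of the kind of behaviour Seriot catalogues, using only Python’s standard json module. Python and the helper names strict_loads, reject_constant and reject_duplicates are assumptions made for this sketch; none of it comes from his test suite. By default the module keeps the last of two duplicate keys and accepts NaN, and another parser is free to do neither:

```python
import json
import math

# Duplicate keys: the JSON RFCs say names SHOULD be unique but do not say
# what a parser must do when they are not. CPython's json keeps the last one.
dup = json.loads('{"a": 1, "a": 2}')

# NaN is not part of the JSON grammar, yet CPython's json accepts it by default.
nan_value = json.loads('NaN')


def reject_constant(name):
    """Called by the decoder for NaN, Infinity and -Infinity."""
    raise ValueError(f"non-standard JSON constant: {name}")


def reject_duplicates(pairs):
    """Build an object from (key, value) pairs, refusing repeated keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def strict_loads(text):
    """A stricter parse built from hooks the json module already offers."""
    return json.loads(
        text,
        parse_constant=reject_constant,
        object_pairs_hook=reject_duplicates,
    )
```

Whether last-key-wins or an error is the “right” answer is exactly the sort of detail the specifications leave open.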

A Conflict-Free Replicated JSON Datatype

Tuesday, August 16th, 2016

A Conflict-Free Replicated JSON Datatype by Martin Kleppmann, Alastair R. Beresford.


Many applications model their data in a general-purpose storage format such as JSON. This data structure is modified by the application as a result of user input. Such modifications are well understood if performed sequentially on a single copy of the data, but if the data is replicated and modified concurrently on multiple devices, it is unclear what the semantics should be. In this paper we present an algorithm and formal semantics for a JSON data structure that automatically resolves concurrent modifications such that no updates are lost, and such that all replicas converge towards the same state. It supports arbitrarily nested list and map types, which can be modified by insertion, deletion and assignment. The algorithm performs all merging client-side and does not depend on ordering guarantees from the network, making it suitable for deployment on mobile devices with poor network connectivity, in peer-to-peer networks, and in messaging systems with end-to-end encryption.

Not a fast read and I need to think about its claim that JSON supports more complexity than XML. 😉
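
For a feel of what “all replicas converge” means, here is a deliberately crude sketch in Python. It is NOT the algorithm from the paper: it is a flat last-writer-wins map, which does discard one of two concurrent writes to the same key, something the paper’s algorithm is designed to avoid. The entry layout (timestamp, replica_id, value) and the name merge are assumptions for illustration only.

```python
def merge(a, b):
    """Merge two replicas that map key -> (timestamp, replica_id, value).

    For each key the entry with the larger (timestamp, replica_id) pair wins,
    which makes the merge commutative and idempotent.
    """
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged


# Two devices edited the same document while offline (made-up data).
phone = {"title": (1, "phone", "Shopping"), "done": (3, "phone", True)}
laptop = {"title": (2, "laptop", "Groceries")}

# Either merge order yields the same state on both devices.
state_on_phone = merge(phone, laptop)
state_on_laptop = merge(laptop, phone)
```

The order of merging does not matter here, which is the convergence property; handling nested lists and maps without losing updates is where the paper does the real work.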


The Symptom of Many Formats

Monday, June 13th, 2016

Distro.Mic: An Open Source Service for Creating Instant Articles, Google AMP and Apple News Articles

From the post:

Mic is always on the lookout for new ways to reach our audience. When Facebook, Google and Apple announced their own native news experiences, we jumped at the opportunity to publish there.

While setting Mic up on these services, David Björklund realized we needed a common article format that we could use for generating content on any platform. We call this format article-json, and we open-sourced parsers for it.

Article-json got a lot of support from Google and Apple, so we decided to take it a step further. Enter DistroMic. Distro lets anyone transform an HTML article into the format mandated by one of the various platforms.


While I applaud the DistroMic work, I am saddened that it was necessary.

From the DistroMic page, here is the same article in three formats:


“article”: [
“text”: “Astronomers just announced the universe might be expanding up to 9% faster than we thought.\n”,
“additions”: [
“type”: “link”,
“rangeStart”: 59,
“rangeLength”: 8,
“URL”: “”
“inlineTextStyles”: [
“rangeStart”: 59,
“rangeLength”: 8,
“textStyle”: “bodyLinkTextStyle”
“role”: “body”,
“layout”: “bodyLayout”
“text”: “It’s a surprising insight that could put us one step closer to finally figuring out what the hell dark energy and dark matter are. Or it could mean that we’ve gotten something fundamentally wrong in our understanding of physics, perhaps even poking a hole in Einstein’s theory of gravity.\n”,
“additions”: [
“type”: “link”,
“rangeStart”: 98,
“rangeLength”: 28,
“URL”: “”
“inlineTextStyles”: [
“rangeStart”: 98,
“rangeLength”: 28,
“textStyle”: “bodyLinkTextStyle”
“role”: “body”,
“layout”: “bodyLayout”
“role”: “container”,
“components”: [
“role”: “photo”,
“URL”: “bundle://image-0.jpg”,
“style”: “embedMediaStyle”,
“layout”: “embedMediaLayout”,
“caption”: {
“text”: “Source: \n NASA\n \n”,
“additions”: [
“type”: “link”,
“rangeStart”: 13,
“rangeLength”: 4,
“URL”: “”
“inlineTextStyles”: [
“rangeStart”: 13,
“rangeLength”: 4,
“textStyle”: “embedCaptionTextStyle”
“textStyle”: “embedCaptionTextStyle”
“layout”: “embedLayout”,
“style”: “embedStyle”
“bundlesToUrls”: {
“image-0.jpg”: “”


<p>Astronomers just announced the universe might be expanding
<a href=””>up to 9%</a> faster than we thought.</p>
<p>It’s a surprising insight that could put us one step closer to finally figuring out what the hell
<a href=””>
dark energy and dark matter</a> are. Or it could mean that we’ve gotten something fundamentally wrong in our understanding of physics, perhaps even poking a hole in Einstein’s theory of gravity.</p>
<figure data-feedback=”fb:likes,fb:comments”>
<img src=””></img>
Source: <a href=”


<p>Astronomers just announced the universe might be expanding
<a href=””>up to 9%</a> faster than we thought.</p> <p>It’s a surprising insight that could put us one step closer to finally figuring out what the hell
<a href=””> dark energy and dark matter</a> are. Or it could mean that we’ve gotten something fundamentally wrong in our understanding of physics, perhaps even poking a hole in Einstein’s theory of gravity.</p>
<amp-img width=”900″ height=”445″ layout=”responsive” src=””></amp-img>
<a href=”

All starting from the same HTML source:

<p>Astronomers just announced the universe might be expanding
<a href=””>up to 9%</a> faster than we thought.</p><p>It’s a surprising insight that could put us one step closer to finally figuring out what the hell
<a href=””>
dark energy and dark matter</a> are. Or it could mean that we’ve gotten something fundamentally wrong in our understanding of physics, perhaps even poking a hole in Einstein’s theory of gravity.</p>
<img width=”900″ height=”445″ src=””>
<a href=”

Three workflows based on what started life in one common format.

Three workflows that have their own bugs and vulnerabilities.

Three workflows that duplicate the capabilities of each other.

Three formats that require different indexing/searching.

This is not the reason we can’t have nice things in software, but it certainly is a symptom.

The next time someone proposes a new format for a project, challenge them to demonstrate a value-add over existing formats.

Stop Comparing JSON and XML

Thursday, November 19th, 2015

Stop Comparing JSON and XML by Yegor Bugayenko.

From the post:

JSON or XML? Which one is better? Which one is faster? Which one should I use in my next project? Stop it! These things are not comparable. It’s similar to comparing a bicycle and an AMG S65. Seriously, which one is better? They both can take you from home to the office, right? In some cases, a bicycle will do it better. But does that mean they can be compared to each other? The same applies here with JSON and XML. They are very different things with their own areas of applicability.

Yegor follows that time-honored Web tradition of telling people, who aren’t listening, why they should follow his advice.


If nothing else, circulate this around the office to get everyone’s blood pumping this late in the week.

I would amend Yegor’s headline to read: Stop Comparing JSON and XML Online!

As long as your discussions don’t gum up email lists, news feeds, Twitter, have at it.


Streaming Data IO in R

Monday, June 29th, 2015

Streaming Data IO in R – curl, jsonlite, mongolite by Jeroen Ooms.


The jsonlite package provides a powerful JSON parser and generator that has become one of standard methods for getting data in and out of R. We discuss some recent additions to the package, in particular support streaming (large) data over http(s) connections. We then introduce the new mongolite package: a high-performance MongoDB client based on jsonlite. MongoDB (from “humongous”) is a popular open-source document database for storing and manipulating very big JSON structures. It includes a JSON query language and an embedded V8 engine for in-database aggregation and map-reduce. We show how mongolite makes inserting and retrieving R data to/from a database as easy as converting it to/from JSON, without the bureaucracy that comes with traditional databases. Users that are already familiar with the JSON format might find MongoDB a great companion to the R language and will enjoy the benefits of using a single format for both serialization and persistency of data.

R, JSON, MongoDB, what’s there not to like? 😉

From UseR! 2015.
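
The streaming idea, as I understand it, rests on newline-delimited JSON: one record per line, so a consumer never has to hold the whole dataset in memory. A minimal sketch of that pattern, in Python rather than R, with made-up records and a hypothetical stream_records helper:

```python
import io
import json


def stream_records(lines):
    """Yield one parsed record per non-empty line (newline-delimited JSON)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an http(s) connection or a large file; the data is made up.
sample = io.StringIO(
    '{"id": 1, "mpg": 21.0}\n'
    '{"id": 2, "mpg": 22.8}\n'
    '\n'
    '{"id": 3, "mpg": 18.7}\n'
)

# Records are consumed one at a time; nothing is accumulated in a list.
total = sum(record["mpg"] for record in stream_records(sample))
```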


Tooling Up For JSON

Saturday, January 24th, 2015

I needed to explore a large (5.7MB) JSON file and my usual command line tools weren’t a good fit.

Casting about I discovered Jshon: Twice as fast, 1/6th the memory. From the home page for Jshon:

Jshon parses, reads and creates JSON. It is designed to be as usable as possible from within the shell and replaces fragile adhoc parsers made from grep/sed/awk as well as heavyweight one-line parsers made from perl/python. Requires Jansson

Jshon loads json text from stdin, performs actions, then displays the last action on stdout. Some of the options output json, others output plain text meta information. Because Bash has very poor nested datastructures, Jshon does not try to return a native bash datastructure as a typical library would. Instead, Jshon provides a history stack containing all the manipulations.

The big change in the latest release is switching everything from pass-by-value to pass-by-reference. In a typical use case (processing AUR search results for ‘python’) by-ref is twice as fast and uses one sixth the memory. If you are editing json, by-ref also makes your life a lot easier as modifications do not need to be manually inserted through the entire stack.

Jansson is described as: “…a C library for encoding, decoding and manipulating JSON data.” Usual ./configure, make, make install. Jshon has no configure or install script so just make and toss it somewhere that is in your path.

Under Bugs you will read: “Documentation is brief.”

That’s for sure!

Still, it has enough examples that with some practice you will find this a handy way to explore JSON files.
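
If Jshon is not at hand, the same first look at an unfamiliar file (what are the keys, how big are the arrays) can be sketched in a few lines of Python. This is not Jshon and does not mimic its options; the function name outline, the path notation and the sample document are all made up for illustration.

```python
import json


def outline(value, path="$", depth=2):
    """Return lines describing the type and size of each node, down to `depth`."""
    if isinstance(value, dict):
        lines = [f"{path}: object, {len(value)} keys"]
        children = [(f"{path}.{key}", child) for key, child in value.items()]
    elif isinstance(value, list):
        lines = [f"{path}: array, {len(value)} items"]
        # Look at the first item only: large arrays tend to repeat one shape.
        children = [(f"{path}[0]", value[0])] if value else []
    else:
        return [f"{path}: {type(value).__name__}"]
    if depth > 0:
        for child_path, child in children:
            lines.extend(outline(child, child_path, depth - 1))
    return lines


# For a real file: doc = json.load(open("big.json"))  (hypothetical file name)
doc = json.loads('{"results": [{"Name": "python", "Votes": 12}], "type": "search"}')
report = outline(doc)
```

Printing the report one line at a time gives a quick map of the file before you decide which parts deserve a closer look.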


CSV on the Web:… [ .csv 5,250,000, .rdf 72,700]

Thursday, January 8th, 2015

CSV on the Web: Metadata Vocabulary for Tabular Data, and Their Conversion to JSON and RDF

From the post:

The CSV on the Web Working Group has published First Public Working Drafts of the Generating JSON from Tabular Data on the Web and the Generating RDF from Tabular Data on the Web documents, and has also issued new releases of the Metadata Vocabulary for Tabular Data and the Model for Tabular Data and Metadata on the Web Working Drafts. A large percentage of the data published on the Web is tabular data, commonly published as comma separated values (CSV) files. Validation, conversion, display, and search of that tabular data requires additional information on that data. The “Metadata vocabulary” document defines a vocabulary for metadata that annotates tabular data, providing such information as datatypes, linkage among different tables, license information, or human readable description of columns. The standard conversion of the tabular data to JSON and/or RDF makes use of that metadata to provide representations of the data for various applications. All these technologies rely on a basic data model for tabular data described in the “Model” document. The Working Group welcomes comments on these documents and on their motivating use cases. Learn more about the Data Activity.

These are working drafts and as such have a number of issues noted in the text of each one. Excellent opportunity to participate in the W3C process.

There aren’t any reliable numbers but searching for “.csv” returns 5,250,000 “hits” and searching on “.rdf” returns 72,700 “hits.”

That sounds really low for CSV and doesn’t include all the CSV files on local systems.

Still, I would say that CSV files continue to be important and that this work merits your attention.
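
To make the role of the metadata concrete, here is a minimal sketch of a CSV to JSON conversion that consults a tiny, hypothetical metadata mapping for datatypes. The mapping format, the datatype names and the function csv_to_records are assumptions for illustration; the Working Drafts define a far richer vocabulary and this does not follow their syntax.

```python
import csv
import io
import json

# Hypothetical metadata: column name -> datatype.
metadata = {"name": "string", "count": "integer", "ratio": "number"}

# How each (assumed) datatype name turns a CSV cell into a JSON value.
casts = {"string": str, "integer": int, "number": float}


def csv_to_records(text, metadata):
    """Convert CSV text to a list of dicts, typing each cell from the metadata."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            column: casts[metadata.get(column, "string")](cell)
            for column, cell in row.items()
        })
    return records


# Made-up table for illustration.
table = "name,count,ratio\nalpha,3,0.5\nbeta,10,2\n"
records = csv_to_records(table, metadata)
as_json = json.dumps(records)
```

Without the metadata every cell would come out as a string, which is the gap the “Metadata vocabulary” document is meant to close.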

Last Call: XQuery 3.1 and XQueryX 3.1; and additional supporting documents

Friday, October 10th, 2014

Last Call: XQuery 3.1 and XQueryX 3.1; and additional supporting documents

From the post:

Today the XQuery Working Group published a Last Call Working Draft of XQuery 3.1 and XQueryX 3.1. Additional supporting documents were published jointly with the XSLT Working Group: a Last Call Working Draft of XPath 3.1, together with XPath Functions and Operators, XQuery and XPath Data Model, and XSLT and XQuery Serialization. XQuery 3.1 and XPath 3.1 introduce improved support for working with JSON data with map and array data structures as well as loading and serializing JSON; additional support for HTML class attributes, HTTP dates, scientific notation, cross-scaling between XSLT and XQuery and more. Comments are welcome through 7 November 2014. Learn more about the XML Activity.

How closely do you read?

To answer that question, read all the mentioned documents by 7 November 2014, keeping a list of errors you spot.

Submit your list to the XQuery Working Group by 7 November 2014 and score your reading based on the number of “errors” accepted by the working group.

What is your W3C Proofing Number? (Total number of accepted “errors” divided by the number of W3C drafts where “errors” were submitted.)

6,482 Datasets Available

Tuesday, August 26th, 2014

6,482 Datasets Available Across 22 Federal Agencies In Data.json Files by Kin Lane.

From the post:

It has been a few months since I ran any of my federal government data.json harvesting, so I picked back up my work, and will be doing more work around datasets that federal agencies have been making available, and telling the stories across my network.

I’m still surprised at how many people are unaware that 22 of the top federal agencies have data inventories of their public data assets, available in the root of their domain as a data.json file. This means you can go to many and there is a machine readable list of that agencies current inventory of public datasets.

See Kin’s post for links to the agency data.json files.

You may also want to read: What Happened With Federal Agencies And Their Data.json Files, which details Kin’s earlier efforts with tracking agency data.json files.

Kin points out that these data.json files are governed by: OMB M-13-13 Open Data Policy—Managing Information as an Asset. It’s pretty joyless reading but if you are interested in the policy details or the requirements agencies must meet, it’s required reading.

If you are looking for datasets to clean up or combine together, it would be hard to imagine a more diverse set to choose from.
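
Once the files are downloaded, tallying datasets per agency takes very little code. A minimal sketch, assuming the two layouts I am aware of for data.json (a bare array of datasets, or an object with a "dataset" array); the agency names and titles are placeholders, and Kin’s own harvesting code may work quite differently.

```python
import json


def datasets(catalog):
    """Return the list of datasets from a parsed data.json document."""
    if isinstance(catalog, list):        # assumed older layout: a bare array
        return catalog
    return catalog.get("dataset", [])    # assumed newer layout: {"dataset": [...]}


def summarize(catalogs):
    """Map agency -> number of datasets, and return the grand total as well."""
    counts = {agency: len(datasets(doc)) for agency, doc in catalogs.items()}
    return counts, sum(counts.values())


# Placeholder documents standing in for downloaded data.json files.
catalogs = {
    "agency-a": json.loads('{"dataset": [{"title": "One"}, {"title": "Two"}]}'),
    "agency-b": json.loads('[{"title": "Three"}]'),
}
counts, total = summarize(catalogs)
```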

JSON-LD for software discovery…

Monday, June 16th, 2014

JSON-LD for software discovery, reuse and credit by Arfon Smith.

From the post:

JSON-LD is a way of describing data with additional context (or semantics if you like) so that for a JSON record like this:

{ "name" : "Arfon" }

when there’s an entity called name you know that it means the name of a person and not a place.

If you haven’t heard of JSON-LD then there are some great resources here and an excellent short screencast on YouTube here.

One of the reasons JSON-LD is particularly exciting is that it’s a lightweight way of organising JSON-formatted data and giving semantic meaning without having to care about things like RDF data models, XML and the (note the capitals) Semantic Web. Being much more succinct than XML and JavaScript native, JSON has over the past few years become the way to expose data through a web-based API. JSON-LD offers a way for API provides (and consumers) to share data more easily with little or no ambiguity about what the data they’re describing.

The YouTube video “What is JSON-LD?” by Manu Sporny makes an interesting point about the “ambiguity problem,” that is do you mean by “name” what I mean by “name” as a property?

At about time mark 5:36, Manu addresses the “ambiguity problem.”

The resolution of the ambiguity is to use a hyperlink as an identifier, the implication being that if we use the same identifier, we are talking about the same thing. (That isn’t true in real life, cf. the many meanings of owl:sameAs, but for simplicity’s sake, let’s leave that to one side.)

OK, what is the difference in both of us using the string “name” and both of us using the string “”? Both of them are opaque strings that either match or don’t. This just kicks the semantic can a little bit further down the road.

Let me use a better example from

{
  "@context": "",
  "@id": "",
  "name": "John Lennon",
  "born": "1940-10-09",
  "spouse": ""
}

If you follow you will obtain a 2.4k JSON-LD file that contains (in part):

“Person”: “

Following that link results in a webpage that reads in part:

The Person class represents people. Something is a Person if it is a person. We don’t nitpic about whether they’re alive, dead, real, or imaginary. The Person class is a sub-class of the Agent class, since all people are considered ‘agents’ in FOAF.

and it is said to be:

Disjoint With: Project Organization

Ambiguity jumps back to the fore with: Something is a Person if it is a person.

What is that? Solipsism? Tautology?

There is no opportunity to say what properties are necessary to qualify as a “person” in the sense defined by FOAF.

You may think that is nit-picking but without the ability to designate properties required to be a “person,” it isn’t possible to talk about 42 U.S.C. § 1983 civil rights actions, where municipalities are held to be “persons” within the meaning of this law. That’s just one example. There are numerous variations on “person” for legal purposes.

You could argue that JSON-LD is for superficial or bubble-gum semantics but it is too useful a syntax for that fate.

Rather, I would like to see JSON-LD make ambiguity “manageable” by its users. True, you could define a “you know what I mean” document like FOAF, if that suits your purposes. On the other hand, you should be able to define required key/value pairs for any subject, and any key or value should be able to extend an existing definition.
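
What “required key/value pairs” plus extension might look like can be sketched in a few lines of Python. Nothing here is JSON-LD syntax or an existing API: define, qualifies, the "kind" key and both definitions are hypothetical, invented only to show a loose definition being extended into a stricter one.

```python
def define(required, base=None):
    """Create a definition: required keys mapped to a check on the value.

    Passing `base` extends an existing definition with further requirements.
    """
    merged = dict(base or {})
    merged.update(required)
    return merged


def qualifies(subject, definition):
    """True if the subject has every required key and each value passes its check."""
    return all(
        key in subject and check(subject[key])
        for key, check in definition.items()
    )


# A loose, "you know what I mean" definition: anything with a non-empty name.
loose_person = define({"name": lambda v: isinstance(v, str) and v != ""})

# A stricter definition that extends the loose one with a required value.
legal_person = define(
    {"kind": lambda v: v in {"individual", "municipality"}},
    base=loose_person,
)

town = {"name": "Springfield", "kind": "municipality"}
```

The same subject can then be tested against whichever definition a given task calls for, instead of everyone sharing one opaque label.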

How far you need to go is on a case by case basis. For apps that display “AI” by tracking you and pushing more ads your way, FOAF may well be sufficient. For those of us with non-advertising driven interests, other diversions may await.

Announcing Actions

Thursday, April 17th, 2014

Announcing Actions

From the post:

When we launched almost 3 years ago, our main focus was on providing vocabularies for describing entities — people, places, movies, restaurants, … But the Web is not just about static descriptions of entities. It is about taking action on these entities — from making a reservation to watching a movie to commenting on a post.

Today, we are excited to start the next chapter of and structured data on the Web by introducing vocabulary that enables websites to describe the actions they enable and how these actions can be invoked.

The new actions vocabulary is the result of over two years of intense collaboration and debate amongst the partners and the larger Web community. Many thanks to all those who participated in these discussions, in particular to members of the Web Schemas and Hydra groups at W3C. We are hopeful that these additions to will help unleash new categories of applications.


Thing > Action

An action performed by a direct agent and indirect participants upon a direct object. Optionally happens at a location with the help of an inanimate instrument. The execution of the action may produce a result. Specific action sub-type documentation specifies the exact expectation of each argument/role.

Fairly coarse but I can see how it would be useful.

BTW, the examples are only available in JSON-LD. Just in case you were wondering.

Given the coarseness of and its success, due consideration should be given to semantics of “appropriate” coarseness for any particular task.

JSON-LD and Why I Hate the Semantic Web

Tuesday, January 28th, 2014

JSON-LD and Why I Hate the Semantic Web by Manu Sporny.

From the post:

JSON-LD became an official Web Standard last week. This is after exactly 100 teleconferences typically lasting an hour and a half, fully transparent with text minutes and recorded audio for every call. There were 218+ issues addressed, 2,000+ source code commits, and 3,102+ emails that went through the JSON-LD Community Group. The journey was a fairly smooth one with only a few jarring bumps along the road. The specification is already deployed in production by companies like Google, the BBC,, Yandex, Yahoo!, and Microsoft. There is a quickly growing list of other companies that are incorporating JSON-LD. We’re off to a good start.

In the previous blog post, I detailed the key people that brought JSON-LD to where it is today and gave a rough timeline of the creation of JSON-LD. In this post I’m going to outline the key decisions we made that made JSON-LD stand out from the rest of the technologies in this space.

I’ve heard many people say that JSON-LD is primarily about the Semantic Web, but I disagree, it’s not about that at all. JSON-LD was created for Web Developers that are working with data that is important to other people and must interoperate across the Web. The Semantic Web was near the bottom of my list of “things to care about” when working on JSON-LD, and anyone that tells you otherwise is wrong. :P

TL;DR: The desire for better Web APIs is what motivated the creation of JSON-LD, not the Semantic Web. If you want to make the Semantic Web a reality, stop making the case for it and spend your time doing something more useful, like actually making machines smarter or helping people publish data in a way that’s useful to them.


Something to get your blood pumping early in the week.

Although, I don’t think it is healthy for Manu to hold back so much. 😉

Read the comments to the post as well.

JSON-LD Is A W3C Recommendation

Thursday, January 16th, 2014

JSON-LD Is A W3C Recommendation

From the post:

The RDF Working Group has published two Recommendations today:

  • JSON-LD 1.0. JSON is a useful data serialization and messaging format. This specification defines JSON-LD, a JSON-based format to serialize Linked Data. The syntax is designed to easily integrate into deployed systems that already use JSON, and provides a smooth upgrade path from JSON to JSON-LD. It is primarily intended to be a way to use Linked Data in Web-based programming environments, to build interoperable Web services, and to store Linked Data in JSON-based storage engines.
  • JSON-LD 1.0 Processing Algorithms and API. This specification defines a set of algorithms for programmatic transformations of JSON-LD documents. Restructuring data according to the defined transformations often dramatically simplifies its usage. Furthermore, this document proposes an Application Programming Interface (API) for developers implementing the specified algorithms.

It would make a great question on a markup exam to ask whether JSON reminded you more of the “Multicode Basic Concrete Syntax” or a “Variant Concrete Syntax?” For either answer, explain.

In any event, you will be encountering JSON-LD so these recommendations will be helpful.

Topotime gallery & sandbox

Thursday, December 26th, 2013

Topotime gallery & sandbox

From the website:

A pragmatic JSON data format, D3 timeline layout, and functions for representing and computing over complex temporal phenomena. It is under active development by its instigators, Elijah Meeks (emeeks) and Karl Grossner (kgeographer), who welcome forks, comments, suggestions, and reasonably polite brickbats.

Topotime currently permits the representation of:

  • Singular, multipart, cyclical, and duration-defined timespans in periods (tSpan in Period). A Period can be any discrete temporal thing, e.g. an historical period, an event, or a lifespan (of a person, group, country).
  • The tSpan elements start (s), latest start (ls), earliest end (ee), end (e) can be ISO-8601 (YYYY-MM-DD, YYYY-MM or YYYY), or pointers to other tSpans or their individual elements. For example, >23.s stands for ‘after the start of Period 23 in this collection.’
    • Uncertain temporal extents; operators for tSpan elements include: before (<), after (>), about (~), and equals (=).
  • Further articulated start and end ranges in sls and eee elements, respectively.
  • An estimated timespan when no tSpan is defined
  • Relations between events. So far, part-of, and participates-in. Further relations including has-location are in development.

Topotime currently permits the computation of:

  • Intersections (overlap) between a query timespan and a collection of Periods, answering questions like “what periods overlapped with the timespan [-433, -344] (Plato’s lifespan possibilities)?” with an ordered list.

To learn more, check out these and other pages in the Wiki and the Topotime web page
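
The intersection query described above boils down to an interval-overlap test. A minimal sketch in Python, assuming each Period has one certain start and end year; it ignores the uncertain extents (ls, ee and the operators) that make Topotime interesting, and the function name and sample periods are made up rather than taken from the Topotime code.

```python
def overlapping(periods, query):
    """Return names of periods whose [start, end] span intersects the query
    span, ordered by start year."""
    q_start, q_end = query
    hits = [
        (start, name)
        for name, (start, end) in periods.items()
        if start <= q_end and end >= q_start
    ]
    return [name for start, name in sorted(hits)]


# Made-up periods; negative numbers are years BCE.
periods = {
    "Period A": (-500, -450),
    "Period B": (-440, -400),
    "Period C": (-300, -250),
    "Period D": (-350, -320),
}
result = overlapping(periods, (-433, -344))
```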

I am currently reading A Song of Ice and Fire (first volume, A Game of Thrones) and the uncertain temporal extents of Topotime may be useful for modeling some aspects of the narrative.

What will be more difficult to model will be facts known to some parties but not to others, at any point in the narrative.

Unlike graph models where every vertex is connected to every other vertex.

As I type that, I wonder if the edge connecting a vertex (representing a person) to some fact or event (another vertex), could have a property that represents the time in the novel’s narrative when the person in question knows a fact or event?
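
A quick sketch of that idea in Python, with the edge carrying the point in the narrative at which the knowledge is acquired. The Knows tuple, the learned_in property and the sample facts are hypothetical (and spoiler-free); chapter numbers stand in for narrative time.

```python
from collections import namedtuple

# An edge from a person vertex to a fact vertex, with one property:
# the chapter in which the person learns the fact.
Knows = namedtuple("Knows", "person fact learned_in")

edges = [
    Knows("Alice", "heir is alive", 3),
    Knows("Bob", "heir is alive", 10),
    Knows("Alice", "letter was forged", 7),
]


def known_by(edges, person, chapter):
    """Facts the person knows at the given chapter of the narrative."""
    return sorted(
        edge.fact
        for edge in edges
        if edge.person == person and edge.learned_in <= chapter
    )
```

Asking known_by for two characters at the same chapter shows who is in the dark at that point in the story.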

I need to plot out knowledge of a lineage. If you know the novel you can guess which one. 😉

Mapping the open web using GeoJSON

Sunday, December 8th, 2013

Mapping the open web using GeoJSON by Sean Gillies.

From the post:

GeoJSON is an open format for encoding information about geographic features using JSON. It has much in common with older GIS formats, but also a few new twists: GeoJSON is a text format, has a flexible schema, and is specified in a single HTML page. The specification is informed by standards such as OGC Simple Features and Web Feature Service and streamlines them to suit the way web developers actually build software today.

Promoted by GitHub and used in the Twitter API, GeoJSON has become a big deal in the open web. We are huge fans of the little format that could. GeoJSON suits the web and suits us very well; it plays a major part in our libraries, services, and products.

A short but useful review of why GeoJSON is important to MapBox and why it should be important to you.

A must-read if you are interested in placing data of interest to your users on maps.
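
For anyone who has not seen the format, a GeoJSON point is just a small JSON object. A minimal sketch in Python that builds a FeatureCollection; the helper names, coordinates and properties are made up. Note that GeoJSON puts longitude before latitude.

```python
import json


def point_feature(lon, lat, **properties):
    """Build a GeoJSON Feature with a Point geometry (longitude first)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Made-up locations for illustration.
collection = feature_collection([
    point_feature(10.0, 20.0, name="Example A"),
    point_feature(-30.5, 45.25, name="Example B"),
])
text = json.dumps(collection)
```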

Sean mentions that GitHub promotes GeoJSON but I’m curious if the NSA uses/promotes it as well? 😉

Elasticsearch Workshop

Tuesday, October 8th, 2013

Elasticsearch Workshop by David Pilato.

Nothing startling or new but a good introduction to Elasticsearch that you can pass along to programmers who like JSON. 😉

Nothing against JSON but “efficient” syntaxes are like using 7-bit encodings because it saves disk space.

Norch – a search engine for node.js

Friday, August 2nd, 2013

Norch – a search engine for node.js by Fergus McDowall.

From the post:

Norch is a search engine written for Node.js. Norch uses the Node search-index module which is in turn written using the super fast levelDB library that Google open-sourced in 2011.

The aim of Norch is to make a simple, fast search server, that requires minimal configuration to set up. Norch sacrifices complex functionality for a limited robust feature set, that can be used to set up a freetext search engine for most enterprise scenarios.

Currently Norch features

  • Full text search
  • Stopword removal
  • Faceting
  • Filtering
  • Relevance weighting (tf-idf)
  • Field weighting
  • Paging (offset and resultset length)

Norch can index any data that is marked up in the appropriate JSON format

Download the first release of Norch (0.2.1) here

Not every feature possible but it looks like Norch covers the most popular ones.
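
For readers who have not met tf-idf, the relevance weighting in the feature list, here is a textbook version sketched in Python. It is not Norch’s or search-index’s implementation, whose exact formula I have not checked; the function name and sample documents are made up.

```python
import math
from collections import Counter


def tf_idf_scores(docs, query):
    """Score each document for the query terms with a basic tf-idf."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized.values() if term in other)
            if df == 0 or not tokens:
                continue
            tf = counts[term] / len(tokens)   # term frequency in this document
            idf = math.log(n_docs / df)       # rarer terms weigh more
            score += tf * idf
        scores[name] = score
    return scores


docs = {
    "a": "json search engine for node",
    "b": "xml search tools",
    "c": "json json json",
}
scores = tf_idf_scores(docs, "json")
ranked = sorted(scores, key=scores.get, reverse=True)
```

Document "c" ranks first because every one of its terms matches, "a" second, and "b" scores zero since it never mentions the query term.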

…Apache HBase REST Interface, Part 2

Friday, April 12th, 2013

How-to: Use the Apache HBase REST Interface, Part 2 by Jesse Anderson.

From the post:

This how-to is the second in a series that explores the use of the Apache HBase REST interface. Part 1 covered HBase REST fundamentals, some Python caveats, and table administration. Part 2 below will show you how to insert multiple rows at once using XML and JSON. The full code samples can be found on GitHub.

Only fair to cover both XML and TBL’s new favorite, JSON. (Tim Berners-Lee Renounces XML?)
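
As I understand the HBase REST interface, a multi-row insert in JSON wraps the rows in a "Row" array, with row keys, column names and values all base64-encoded. A minimal Python sketch of building such a body follows; treat the field layout as an assumption to check against Jesse’s post and the HBase documentation. The table contents and helper names are made up, and no request is actually sent.

```python
import base64
import json


def b64(text):
    """Base64-encode a string and return the result as ASCII text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def rows_payload(rows):
    """Build a multi-row JSON body from {row_key: {column: value}}.

    Assumed layout: {"Row": [{"key": ..., "Cell": [{"column": ..., "$": ...}]}]}
    """
    return json.dumps({
        "Row": [
            {
                "key": b64(key),
                "Cell": [
                    {"column": b64(column), "$": b64(value)}
                    for column, value in cells.items()
                ],
            }
            for key, cells in rows.items()
        ]
    })


body = rows_payload({
    "row1": {"cf:col": "value1"},
    "row2": {"cf:col": "value2"},
})
```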

Tim Berners-Lee Renounces XML?

Wednesday, April 10th, 2013

Draft TAG Teleconference Minutes 4th of April 2013

In a discussion of ISSUE-34: XML Transformation and composability (e.g., XSLT,XInclude, Encryption) the following exchange takes place:

Noah: Lets go through the issues and see which we can close. … Processing model of XML. Is there any interest in this?


Tim: I’m happy to do things with XML. This came from when we’re talking about XML was processed. The meaning from XML has to be taken outside-in. Otherwise you cannot create new XML specifications that interweave with what exist. … Not clear people noticed that.

I note that tracker has several status codes we can assign, including OPEN, PENDING, REVIEW, POSTPONED, and CLOSED.

Tim: Henry did a lot more work on that. I don’t feel we need to put a whole lot of energy into XML at all. JSON is the new way for me. It’s much more straightforward.

Suggestion: if we think this is now resolved or uninteresting, CLOSE it; if we think it’s interesting but not now, then POSTPONED?

Tim: We need another concept besides OPEN/CLOSED. Something like NOT WORKING ON IT.

Noah: It has POSTPONED.

Tim: POSTPONED expresses a feeling of guilt. But there’s no guilt.

Noah: It’s close enough and I’m not looking forward to changing Tracker.

ht, you wanted to add 0.02USD

Henry: I’m happy to move this to the backburner. I think there’s a genuine issue here and of interest to the community but I don’t have the bandwidth.

Noah: We need to tell ourselves a story as to what these codes mean. … Historically we used CLOSED for “it’s in pretty good shape”.

Henry: I’m happy with POSTPONED and it’s better than CLOSED.

+1 for postponing


RESOLUTION: We mark ISSUE-34 (xmlFunctions-34) POSTPONED

I think this is important, thanks for doing it noah

(emphasis added)

XML can be improved to be sure but the concept is not inherently flawed.

To JSON supporters, all I can say is XML wasn’t the bloated confusion you see now when it started.

The Pragmatic Haskeller – Episode 1

Sunday, April 7th, 2013

The Pragmatic Haskeller – Episode 1 by Alfredo Di Napoli.

The first episode of “The Pragmatic Haskeller” starts with:

In the beginning was XML, and then JSON.

When I read that sort of thing, it is hard to know whether to weep or pitch a fit.

Neither one is terribly productive but if you are interested in the rich heritage that XML relies upon drop me a line.

The first lesson is a flying start on Haskell data and moving it between JSON and XML formats.


Thursday, March 28th, 2013


From the webpage:

Elephant is an S3-backed key-value store with querying powered by Elastic Search. Your data is persisted on S3 as simple JSON documents, but you can instantly query it over HTTP.

Suddenly, your data becomes as durable as S3, as portable as JSON, and as queryable as HTTP. Enjoy!

I don’t recall seeing Elephant on the Database Landscape Map – February 2013. Do you?

Every database is thought, at least by its authors, to be different from all the others.

What dimensions would be the most useful ones for distinction/comparison?


I first saw this in Nat Torkington’s Four short links: 27 March 2013.

Pig, ToJson, and Redis to publish data with Flask

Saturday, February 16th, 2013

Pig, ToJson, and Redis to publish data with Flask by Russell Jurney.

From the post:

Pig can easily stuff Redis full of data. To do so, we’ll need to convert our data to JSON. We’ve previously talked about pig-to-json in JSONize anything in Pig with ToJson. Once we convert our data to json, we can use the pig-redis project to load redis.

What do you think?

Something “lite” to test a URI dictionary locally?

Core JSON: The Fat-Free Alternative to XML

Monday, February 4th, 2013

Core JSON: The Fat-Free Alternative to XML by Tom Marrs.

From the webpage:

JSON (JavaScript Object Notation) is a standard text-based data interchange format that enables applications to exchange data over a computer network. This Refcard covers JSON syntax, validation, modeling, and JSON Schema, and includes tips and tricks for using JSON with various tools and programming languages.

I prefer XML over JSON and SGML over XML.

Having said that, I have to agree that JSON is a demonstration that complex protocols for the interchange of data are unnecessary.

At least if you only care about validation and not about documenting the semantics of the data being interchanged.

Put another way, semantics are never self-evident or self-documenting. With JSON, some other carrier has to deliver the semantics, if they are delivered at all.

Topic maps are great carriers of semantics, particularly if you use JSON schemas or data files from multiple sources.

BTW, you will note that JSON is based on those pesky tuples that Robert Barta makes so much of. 😉

Open Data Protocol

Thursday, January 31st, 2013

Open Data Protocol

From the webpage:

There is a vast amount of data available today and data is now being collected and stored at a rate never seen before. Much, if not most, of this data however is locked into specific applications or formats and difficult to access or to integrate into new uses.

The Open Data Protocol (OData) is a Web protocol for querying and updating data that provides a way to unlock your data and free it from silos that exist in applications today. OData does this by applying and building upon Web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores. The protocol emerged from experiences implementing AtomPub clients and servers in a variety of products over the past several years. OData is being used to expose and access information from a variety of sources including, but not limited to, relational databases, file systems, content management systems and traditional Web sites.

OData is consistent with the way the Web works – it makes a deep commitment to URIs for resource identification and commits to an HTTP-based, uniform interface for interacting with those resources (just like the Web). This commitment to core Web principles allows OData to enable a new level of data integration and interoperability across a broad range of clients, servers, services, and tools.

I have mentioned this resource before but it was buried in a post and not a separate post.

The amount of documentation has grown and much improved since then.



Friday, November 9th, 2012


From the homepage:

An open-source distributed database built with love.

Enjoy an intuitive query language, automatically parallelized queries, and simple administration.

Table joins and batteries included.

and the overview:

RethinkDB is built to store JSON documents, and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and is easy to setup and learn.

Simple programming model:

  • JSON data model and immediate consistency.
  • Distributed joins, subqueries, aggregation, atomic updates.
  • Hadoop-style map/reduce.

Easy administration:

  • Friendly web and command-line administration tools.
  • Takes care of machine failures and network interrupts.
  • Multi-datacenter replication and failover.

Horizontal scalability:

  • Sharding and replication to multiple nodes.
  • Queries are automatically parallelized and distributed.
  • Lock-free operation via MVCC concurrency.

Just once I would like to see a software release where the feature list reads:

<humor>Job Security – Never mentioned by “easy to learn” software packages. Our software is a stone cold bitch to learn. The usual “hello world” takes the better part of a day. But, who wants to write “hello world?”

Once you do learn it, it has more power than native C code and is faster. Are you a top gun programmer or a script kiddie? We write software for the former, not the latter.

Probably not going to happen.

BTW, at this time RethinkDB does not support secondary indexes. But the way the documentation reads, that doesn’t sound like a permanent condition.

Could be useful for some cases and certainly will be.


Tuesday, October 2nd, 2012

JSONiq: The JSON Query Language

From the webpage:

JSONiq extends XQuery, a mature W3C standard, with native JSON support. Like XQuery and SQL, JSONiq is declarative: Expressions can nest with full composability.

Project, Filter, Join, Group… Like SQL, JSONiq can do all that. And it has many more features inherited from XQuery. JSONiq also inherits all XQuery builtin functions: date times, string manipulation, regular expressions, and more.

JSONiq is an expressive and highly optimizable language to query and update NoSQL stores. It enables developers to leverage the same productive high-level language across a variety of NoSQL products.

This came in over the nosql-discuss mailing list a day or so ago.

Sounds promising. Any early comments?

Got big JSON? BigQuery expands data import for large scale web apps

Tuesday, October 2nd, 2012

Got big JSON? BigQuery expands data import for large scale web apps by Ryan Boyd, Developer Advocate.

From the post:

JSON is the data format of the web. JSON is used to power most modern websites, is a native format for many NoSQL databases hosting top web applications, and provides the primary data format in many REST APIs. Google BigQuery, our cloud service for ad-hoc analytics on big data, has now added support for JSON and the nested/repeated structure inherent in the data format.

JSON opens the door to a more object-oriented view of your data compared to CSV, the original data format supported by BigQuery. It removes the need for duplication of data required when you flatten records into CSV. Here are some examples of data you might find a JSON format useful for:

  • Log files, with multiple headers and other name-value pairs.
  • User session activities, with information about each activity occurring nested beneath the session record.
  • Sensor data, with variable attributes collected in each measurement.

Nested/repeated data support is one of our most requested features. And while BigQuery’s underlying infrastructure supports it, we’d only enabled it in a limited fashion through M-Lab’s test data. Today, however, developers can use JSON to get any nested/repeated data into and out of BigQuery.
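The duplication the post mentions is easy to see in a small sketch. The session record and its field names below are hypothetical, chosen only to mirror the "user session activities" example; nothing here uses BigQuery itself.

```python
import csv
import io
import json

# Hypothetical session record; the field names are illustrative only.
session = {
    "session_id": "s-001",
    "user": "alice",
    "activities": [
        {"action": "login", "seconds": 0},
        {"action": "search", "seconds": 12},
        {"action": "logout", "seconds": 95},
    ],
}


def flatten(record):
    """Return one flat row per activity, repeating the session-level fields."""
    return [
        {"session_id": record["session_id"], "user": record["user"], **activity}
        for activity in record["activities"]
    ]


def to_csv(rows):
    """Serialise flat rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten(session)
csv_text = to_csv(rows)
json_line = json.dumps(session)  # the whole nested record as one JSON line

print(csv_text)
print(json_line)
```

In the flattened CSV the session-level values are repeated on every activity row, while the nested JSON record states them once.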

It had to happen. “Big Json” that is.

My question is when “Bigger Data” is going to catch on?

If you got far enough ahead, say six to nine months, you could copyright something like “Biggest Data” and start collecting fees when it comes into common usage.

JSONize Anything in Pig with ToJson

Thursday, September 27th, 2012

JSONize Anything in Pig with ToJson by Russell Jurney.

The critical bit reads:

That is precisely what the ToJson method of pig-to-json does. It takes a bag or tuple or nested combination thereof and returns a JSON string.

See Russell’s post for the details.

St. Laurent on Balisage

Sunday, August 12th, 2012

Applying markup to complexity: The blurry line between markup and programming by Simon St. Laurent.

Simon’s review of Balisage will make you want to attend next year, if you missed this year.

He misses an important issue with JSON (and XML) when he writes:

JSON gave programmers much of what they wanted: a simple format for shuttling (and sometimes storing) loosely structured data. Its simpler toolset, freed of a heritage of document formats and schemas, let programmers think less about information formats and more about the content of what they were sending.

XML and JSON look at data through different lenses. XML is a tree structure of elements, attributes, and content, while JSON is arrays, objects, and values. Element order matters by default in XML, while JSON is far less ordered and contains many more anonymous structures. (emphasis added)

The problem with JSON in a nutshell (apologies to O’Reilly): anonymous structures.

How is a subsequent programmer going to discover the semantics of “anonymous structures?”

Works great for job security, works less well for information integration several “generations” of programmers later.

XML can be poorly documented, just like JSON, but relationships between elements are explicit.

Anonymity, of all kinds, is the enemy of re-use of data, semantic integration and useful archiving of data.

If those aren’t your use cases, use anonymous JSON structures. (Or undocumented XML.)
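A minimal sketch of what "anonymous" means in practice. Both payloads are hypothetical and carry the same values; the only difference is whether the data itself names them.

```python
import json

# Hypothetical payloads: the same two readings sent two ways.
anonymous = json.loads('[["2012-08-12", 21.5, 3], ["2012-08-13", 19.0, 7]]')
named = json.loads(
    '[{"date": "2012-08-12", "temp_celsius": 21.5, "station_id": 3},'
    ' {"date": "2012-08-13", "temp_celsius": 19.0, "station_id": 7}]'
)


def describe(record):
    """Return whatever labels the data itself carries for each value."""
    if isinstance(record, dict):
        return sorted(record.keys())
    # A bare array offers positions only; the meaning lives elsewhere.
    return list(range(len(record)))


print(describe(anonymous[0]))
print(describe(named[0]))
```

A later programmer handed the first payload has to find out elsewhere that position 1 is a temperature and position 2 a station. Named keys help, but even they are only labels: what `temp_celsius` was measured with, or which registry `station_id` refers to, still has to be documented somewhere.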

From Solr to elasticsearch [Clarity as a Value?]

Monday, August 6th, 2012

From Solr to elasticsearch by Rob Young.

From the post:

Search is right at the center of GOV.UK. It’s the main focus of the homepage and it appears in the corner of every single page. Many of our recent and upcoming apps such as licence finder also rely heavily on search. So, making sure we have the right tool for the job is vital. Recently we decided to begin switching away from Solr to elasticsearch for our search server. Rob Young, a developer at GDS explains in some detail the basis for our decisions – the usual disclaimers about this being quite technical apply.

I am sure there are points to be made for both Solr and ElasticSearch. No doubt much religious debate will follow this decision.

What interested me was the claim that:

Just about the most important feature of any search engine is the ability to query it. Both Solr and elasticsearch expose their query APIs over HTTP but they do so in quite different ways. Solr queries are made up of two and three letter URL parameters, while elasticsearch queries are clear, self documenting JSON objects passed in the HTTP body.

It is possible, as the example in the post shows, to have “…clear, self documenting JSON objects….” in ElasticSearch but isn’t clarity in that case optional?

Or at least in the eyes of its user?

Not to downplay the importance of being “…clear and self-documenting…” but to make it clear that it is a design choice. A good one in my opinion, but a design choice nonetheless.

That clarity occurs in this case in JSON is an accident of expression.
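To make the point concrete, here is a rough sketch that builds the same search three ways. The Solr parameters (`q`, `rows`, `wt`) and the elasticsearch `match` query are the commonly documented forms; the third, terse JSON body is purely hypothetical and belongs to no real API. The field name `title` and the search term are also assumptions for illustration.

```python
import json
from urllib.parse import urlencode


def solr_query_string(text, field, rows):
    """Build a Solr-style query string from short URL parameters."""
    return urlencode({"q": f"{field}:{text}", "rows": rows, "wt": "json"})


def elasticsearch_body(text, field, size):
    """Build an elasticsearch-style JSON request body."""
    return json.dumps({"query": {"match": {field: text}}, "size": size})


def terse_json_body(text, field, size):
    """Hypothetical design: valid JSON, but no clearer than URL parameters."""
    return json.dumps({"q": {"m": {field: text}}, "n": size})


print(solr_query_string("licence", "title", 10))
print(elasticsearch_body("licence", "title", 10))
print(terse_json_body("licence", "title", 10))
```

The second and third bodies are equally valid JSON. Only the choice of key names makes one of them read as self-documenting, which is the sense in which the clarity is a design decision rather than a property of the format.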