Archive for the ‘JSON’ Category

JSONize Anything in Pig with ToJson

Thursday, September 27th, 2012

JSONize Anything in Pig with ToJson by Russell Jurney.

The critical bit reads:

That is precisely what the ToJson method of pig-to-json does. It takes a bag or tuple or nested combination thereof and returns a JSON string.
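The ToJson UDF itself is Pig-specific, but the core idea — serialize an arbitrarily nested structure into one JSON string — is easy to sketch. A minimal Python analogue (this is not the pig-to-json code, just the shape of the operation):

```python
import json

# A nested structure roughly analogous to a Pig relation:
# a "bag" becomes a list, each "tuple" a dict of named fields.
record = {
    "user": "russell",
    "tweets": [                        # bag
        {"id": 1, "text": "hello"},    # tuple
        {"id": 2, "text": "world"},
    ],
}

# json.dumps plays the role ToJson plays in Pig: any nested
# combination of lists/dicts/scalars comes back as one JSON string.
as_json = json.dumps(record, sort_keys=True)
print(as_json)
```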

See Russell’s post for the details.

St. Laurent on Balisage

Sunday, August 12th, 2012

Applying markup to complexity: The blurry line between markup and programming by Simon St. Laurent.

Simon’s review of Balisage will make you want to attend next year, if you missed this year.

He misses an important issue with JSON (and XML) when he writes:

JSON gave programmers much of what they wanted: a simple format for shuttling (and sometimes storing) loosely structured data. Its simpler toolset, freed of a heritage of document formats and schemas, let programmers think less about information formats and more about the content of what they were sending.

XML and JSON look at data through different lenses. XML is a tree structure of elements, attributes, and content, while JSON is arrays, objects, and values. Element order matters by default in XML, while JSON is far less ordered and contains many more anonymous structures. (emphasis added)

The problem with JSON in a nutshell (apologies to O’Reilly): anonymous structures.

How is a subsequent programmer going to discover the semantics of “anonymous structures?”
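A contrived example of the problem. Both documents below parse cleanly, but the anonymous JSON array gives a later programmer nothing to go on (the field names and values here are mine, purely for illustration):

```python
import json
import xml.etree.ElementTree as ET

# Anonymous JSON structure: what do these positions mean?
# Latitude? Longitude? A zoom level? Altitude in meters?
anonymous = json.loads('[38.89, -77.03, 18]')

# The same data in XML, with the relationships named:
doc = ET.fromstring(
    '<location><lat>38.89</lat><long>-77.03</long><zoom>18</zoom></location>'
)
named = {child.tag: child.text for child in doc}

print(anonymous)   # [38.89, -77.03, 18]
print(named)       # {'lat': '38.89', 'long': '-77.03', 'zoom': '18'}
```

Neither document is *documented*, but the XML at least names its parts; the JSON array's semantics live only in the head of whoever wrote it.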

Works great for job security, works less well for information integration several “generations” of programmers later.

XML can be poorly documented, just like JSON, but relationships between elements are explicit.

Anonymity, of all kinds, is the enemy of re-use of data, semantic integration and useful archiving of data.

If those aren’t your use cases, use anonymous JSON structures. (Or undocumented XML.)

From Solr to elasticsearch [Clarity as a Value?]

Monday, August 6th, 2012

From Solr to elasticsearch by Rob Young.

From the post:

Search is right at the center of GOV.UK. It’s the main focus of the homepage and it appears in the corner of every single page. Many of our recent and upcoming apps such as licence finder also rely heavily on search. So, making sure we have the right tool for the job is vital. Recently we decided to begin switching away from Solr to elasticsearch for our search server. Rob Young, a developer at GDS explains in some detail the basis for our decisions – the usual disclaimers about this being quite technical apply.

I am sure there are points to be made for both Solr and ElasticSearch. No doubt much religious debate will follow this decision.

What interested me was the claim that:

Just about the most important feature of any search engine is the ability to query it. Both Solr and elasticsearch expose their query APIs over HTTP but they do so in quite different ways. Solr queries are made up of two and three letter URL parameters, while elasticsearch queries are clear, self documenting JSON objects passed in the HTTP body.

It is possible, as the example in the post shows, to have “…clear, self documenting JSON objects…” in ElasticSearch, but isn’t clarity in that case optional?

Or at least in the eyes of its user?

Not to downplay the importance of being “…clear and self-documenting…” but to make it clear that is a design choice. A good one in my opinion, but a design choice nonetheless.

That the clarity in this case occurs in JSON is an accident of expression.
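For readers without the post at hand, the contrast Rob draws looks roughly like this. The queries are illustrative, not taken from GOV.UK:

```python
import json
from urllib.parse import urlencode

# Solr style: terse two- and three-letter URL parameters.
solr_query = urlencode({
    "q": "driving licence",
    "fq": "format:transaction",
    "fl": "title,link",
    "rows": 10,
})
# -> q=driving+licence&fq=format%3Atransaction&fl=title%2Clink&rows=10

# elasticsearch style: a JSON object in the HTTP request body.
es_query = json.dumps({
    "query": {"match": {"title": "driving licence"}},
    "size": 10,
}, indent=2)

print(solr_query)
print(es_query)
```

Note that the JSON body is only as self-documenting as its author chose to make it — nothing stops you from writing JSON every bit as cryptic as `fq` and `fl`.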

First steps in data visualisation using d3.js, by Mike Dewar

Friday, January 13th, 2012

First steps in data visualisation using d3.js, by Mike Dewar

Drew Conway writes:

Last night Mike Dewar presented a wonderful talk to the New York Open Statistical Programming Meetup titled, “First steps in data visualisation using d3.js.” Mike took the audience through an excellent review of d3.js fundamentals, as well as showed off some of the features of working with Chrome Web Developer Tools. This is one of the best talks we have ever had, and if you have had any interest in exploring d3.js, but were intimidated by the design concepts or syntax, this is exactly the talk for you.

Follow the link to Drew’s blog to see the video or link to Mike’s slides (a product of d3.js).

This is an impressive presentation but I hesitated before making this post since Mike refers to XML as “clunky.” 😉 Oh, well, the rest of the presentation made up for it. Sorta. The audio quality leaves something to be desired as Mike wanders away from the microphone.

BTW, presentation question: What’s wrong with the bar chart in Mike’s first example? I count at least two. How many do you see?


QL.IO

Thursday, December 8th, 2011

QL.IO – A declarative, data-retrieval and aggregation gateway for quickly consuming HTTP APIs.

From the about page:

A SQL and JSON inspired DSL

SQL is quite a powerful DSL to retrieve, filter, project, and join data — see efforts like A co-Relational Model of Data for Large Shared Data Banks, LINQ, YQL, or unQL for examples. ql.io combines SQL, JSON, and a few procedural style constructs into a compact language. Scripts written in this language can make HTTP requests to retrieve data, perform joins between API responses, project responses, or even make requests in a loop. But note that ql.io’s scripting language is not SQL – it is SQL inspired.


Most real-world client apps need to mashup data from multiple APIs in one go. Data mashup is often complicated as client apps need to worry about order of requests, inter-dependencies, error handling, and parallelization to reduce overall latency. ql.io’s scripts are procedural in appearance but are executed out of order based on dependencies. Some statements may be scheduled in parallel and some in series based on a dependency analysis done at script compile time. The compilation is an on-the-fly process.
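The “procedural in appearance, executed out of order” part is the interesting claim. A toy sketch of that kind of dependency analysis — my own illustration, nothing to do with ql.io’s actual implementation:

```python
# Toy model: each statement names the statements it depends on.
# Statements with no unmet dependencies can run in parallel;
# the rest wait, as the compile-time dependency analysis implies.
statements = {
    "user":   [],                   # GET /user          - no dependencies
    "orders": ["user"],             # GET /orders?u=...  - needs user
    "prices": [],                   # GET /prices        - no dependencies
    "report": ["orders", "prices"], # joins the two responses
}

def schedule(deps):
    """Group statements into waves; each wave could run in parallel."""
    done, waves = set(), []
    while len(done) < len(deps):
        wave = sorted(s for s, d in deps.items()
                      if s not in done and all(x in done for x in d))
        if not wave:
            raise ValueError("dependency cycle")
        waves.append(wave)
        done.update(wave)
    return waves

print(schedule(statements))
# [['prices', 'user'], ['orders'], ['report']]
```

The script reads top to bottom, but `user` and `prices` can be fetched concurrently; only `orders` and `report` genuinely have to wait.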

Consumer Centric Interfaces

APIs are designed for reuse, and hence they cater to the common denominator. Getting new fields added, optimizing responses, or combining multiple requests into one involve drawn out negotiations between API producing teams and API consuming teams. ql.io lets API consuming teams move fast by creating consumer-centric interfaces that are optimized for the client – such optimized interfaces can reduce bandwidth usage and number of HTTP requests.

I can believe the “SQL inspired” part since it looks like keys/column headers are opaque. That is, you can specify a key/column header but you can’t specify the identity of the subject it represents.

So, if you don’t know the correct term, you are SOL. Which isn’t the state of being inspired.

Still, it looks like an interesting effort that could develop to be non-opaque with regard to keys and possibly values. (The next stage: how do you learn what properties a subject representative has, for purposes of subject recognition?)

Processing json data with apache velocity

Sunday, November 20th, 2011

Processing json data with apache velocity from Pierre Lindenbaum.

From the post:

I’ve written a tool which parses json data and processes it with “Apache Velocity” (a template engine). The (javacc) source code is available here:

Just in case you run across some data in JSON format and could use an example of processing it with Apache Velocity. Just in case. 😉
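Velocity is a Java template engine, so Pierre’s tool is Java; but the template-over-JSON idea itself is small. A rough Python analogue, with `string.Template` standing in for Velocity (names and template are my own invention):

```python
import json
from string import Template

# JSON in, rendered text out - the shape of the tool, minus Velocity.
data = json.loads('{"name": "Pierre", "posts": 3}')

# $name / $posts stand in for Velocity's ${name} / ${posts} references.
tmpl = Template("$name has written $posts posts.")
print(tmpl.substitute(data))
# Pierre has written 3 posts.
```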


Friday, October 14th, 2011


From the website:

A Cloud NoSQL JSON Database

I don’t know that you will find this a useful entry into the Cloud/NoSQL race but it does come with comics. 😉

I haven’t signed up for the beta but did skim the blog.

In his design principles, the author complains about HTTP being slow. Maybe I should send him a pointer to: Optimizing HTTP: Keep-alive and Pipelining. What do you think?

If you join the beta, let me know what you think are strong/weak points from a topic map perspective. Thanks!

Querying ElasticSearch from VIM

Wednesday, October 12th, 2011

Querying ElasticSearch from VIM

From the post:

I’m using ElasticSearch quite a bit and finally decided to make it easy to debug. I now write JSON queries with a .es extension. And have this in my .vim/filetype.vim file:

Debugging ElasticSearch results with Perl.

I just know Robert (Barta) has a one-liner for this and thought this might tempt him into commenting. 😉


UnQL

Monday, August 1st, 2011


From the webpage:

UnQL means Unstructured Query Language. It’s an open query language for JSON, semi-structured and document databases.

Another query language. Thoughts?

JSON-LD – Expressing Linked Data in JSON

Thursday, July 7th, 2011

JSON-LD – Expressing Linked Data in JSON

I mentioned recently a mailing list on Linked Data in JSON.

From the webpage:

JSON-LD (JavaScript Object Notation for Linked Data) is a lightweight Linked Data format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on the already successful JSON format and provides a way to help JSON data interoperate at Web-scale. If you are already familiar with JSON, writing JSON-LD is very easy. There is a smooth migration path from the JSON you use today, to the JSON-LD you will use in the future. These properties make JSON-LD an ideal Linked Data interchange language for JavaScript environments, Web services, and unstructured databases such as CouchDB and MongoDB.

Short example or two plus links to other resources.
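The “smooth migration path” claim is easy to see in miniature: plain JSON becomes JSON-LD by adding a `@context` that maps your existing keys to URIs. A small hand-rolled example (the FOAF mappings are my choice, not taken from the JSON-LD site):

```python
import json

# Plain JSON a web service might already emit:
plain = {"name": "Manu Sporny", "homepage": "http://manu.sporny.org/"}

# The same document as JSON-LD: the data is untouched; a @context
# maps each key to a globally meaningful URI, and @type: @id marks
# homepage's value as a link rather than a string.
linked = dict(plain)
linked["@context"] = {
    "name": "http://xmlns.com/foaf/0.1/name",
    "homepage": {"@id": "http://xmlns.com/foaf/0.1/homepage",
                 "@type": "@id"},
}
print(json.dumps(linked, indent=2))
```

Existing consumers that ignore `@context` keep working on the original keys — which is the migration story in a nutshell.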

Linked Data in JSON

Sunday, May 8th, 2011

A mailing list has been created for Linked Data in JSON.

Manu Sporny has posted Updated JSON-LD Draft with a summary of changes and links for those already familiar with the draft.

You will be encountering it so it will be helpful to follow the discussion.