Archive for the ‘Data Types’ Category

The Algebra of Algebraic Data Types

Saturday, May 17th, 2014

Chris Taylor has a series of posts that correspond to a talk he gave in London (November 2012). The video is on YouTube and the slides are on GitHub.

Part 1.

Part 2.

Part 3.

I suggest you read the blog posts first and then follow the slides while listening to the video.

If you have been wondering about types in Haskell, this is a golden opportunity.
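As a taste of what the series covers, as I understand it: the "algebra" comes from counting the values of a type. Sum types add, product types multiply, and function types exponentiate. The sketch below is mine, not Chris Taylor's, and it is in Python rather than Haskell. It assumes each type can be modelled as a finite list of its values, and the helper names (`sum_type`, `product_type`, `function_type`) are made up for illustration.

```python
from itertools import product

# Assumption for illustration only: a "type" is the finite list of its values.
Void = []               # 0 inhabitants
Unit = [()]             # 1 inhabitant
Bool = [False, True]    # 2 inhabitants


def sum_type(a, b):
    """Like Either a b: a tagged value taken from a OR from b."""
    return [("Left", x) for x in a] + [("Right", y) for y in b]


def product_type(a, b):
    """Like (a, b): one value from a AND one value from b."""
    return list(product(a, b))


def function_type(a, b):
    """Like a -> b: every way to assign a value of b to each value of a."""
    return [dict(zip(a, outputs)) for outputs in product(b, repeat=len(a))]


# Counting inhabitants follows ordinary arithmetic.
assert len(sum_type(Bool, Unit)) == 2 + 1
assert len(product_type(Bool, Bool)) == 2 * 2
assert len(function_type(Bool, Bool)) == 2 ** 2
assert len(sum_type(Void, Bool)) == 0 + 2       # Void acts like zero
assert len(product_type(Unit, Bool)) == 1 * 2   # Unit acts like one
```

Real Haskell types also include non-terminating values, which this counting deliberately ignores.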

Data: Continuous vs. Categorical

Thursday, April 18th, 2013

Data: Continuous vs. Categorical by Robert Kosara.

From the post:

Data comes in a number of different types, which determine what kinds of mapping can be used for them. The most basic distinction is that between continuous (or quantitative) and categorical data, which has a profound impact on the types of visualizations that can be used.

The main distinction is quite simple, but it has a lot of important consequences. Quantitative data is data where the values can change continuously, and you cannot count the number of different values. Examples include weight, price, profits, counts, etc. Basically, anything you can measure or count is quantitative.

Categorical data, in contrast, is for those aspects of your data where you make a distinction between different groups, and where you typically can list a small number of categories. This includes product type, gender, age group, etc.

Both quantitative and categorical data have some finer distinctions, but I will ignore those for this posting. What is more important, is: why do those make a difference for visualization?

I like the use of visualization to reinforce the notion of difference between continuous and categorical data.

Makes me wonder about using visualization to explore the use of different data types for detecting subject sameness.

It may seem trivial to use the TMDM’s sameness of subject identifiers (simple string matching) to say two or more topics represent the same subject.

But what if subject identifiers match but other properties, say gender (modeled as an occurrence), do not?

That would illustrate a mistake in the use of a subject identifier, but also a weakness in relying on a subject identifier (data type URI) for subject identity.

That data type relies only on string matching for identification purposes, which may or may not agree with your subject sameness requirements.
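A minimal sketch of the problem, assuming a toy topic model of my own invention (the `Topic` class, the function names, and the example.org identifiers are all hypothetical and not part of the TMDM or any topic map engine). Two topics match on a subject identifier string, yet disagree on an occurrence:

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Toy stand-in for a topic: identifiers plus typed occurrences."""
    subject_identifiers: set
    occurrences: dict = field(default_factory=dict)


def same_by_identifier(a, b):
    """String-matching test: do the topics share any subject identifier?"""
    return bool(a.subject_identifiers & b.subject_identifiers)


def conflicting_occurrences(a, b):
    """Occurrence types present on both topics but with different values."""
    shared = a.occurrences.keys() & b.occurrences.keys()
    return {
        key: (a.occurrences[key], b.occurrences[key])
        for key in shared
        if a.occurrences[key] != b.occurrences[key]
    }


# Hypothetical data: same identifier string, different gender occurrence.
t1 = Topic({"http://example.org/person/pat"}, {"gender": "female"})
t2 = Topic({"http://example.org/person/pat"}, {"gender": "male"})

if same_by_identifier(t1, t2):
    conflicts = conflicting_occurrences(t1, t2)
    if conflicts:
        print("Identifiers match, but check before merging:", conflicts)
```

Whether a conflict like this should block a merge, or merely flag it for review, depends on your own subject sameness requirements.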

Redis Data Structure Cheatsheet

Tuesday, February 26th, 2013

Redis Data Cheatsheet by Brian P O’Rourke.

From the post:

Redis data structures are simple – none of them are likely to be a perfect match for the problem you’re trying to solve. But if you pick the right initial structure for your data, Redis commands can guide you toward efficient ways to get what you need.

Here’s our standard reference table for Redis datatypes, their most common uses, and their most common misuses. We’ll have follow-up posts with more details, specific use-cases (and code), but this is a handy reference:

I created a PDF version of the Redis Datatypes — Uses and Misuses.

I thought it would be easier to reference than a bookmarked post. Any errors introduced are solely my responsibility.

I first saw this at: Alex Popescu’s Redis – Pick the Right Data Structure.

Eventually-Consistent Data Structures

Tuesday, November 13th, 2012

Eventually-Consistent Data Structures by Sean Cribbs

Summary:

Sean Cribbs discusses Convergent Replicated Data Types, data structures that tolerate eventual consistency.

Covers a number of eventually consistent data types.

Materials you may want to cover before you watch the presentation:

Safety/Liveness – from Proving the Correctness of Multiprocess Programs – Leslie Lamport (March 1977) (As a bonus, a link to all of Leslie Lamport’s papers.)

Safety and liveness: Eventual consistency is not safe by Peter Bailis.

Logic and Lattices for Distributed Programming by Neil Conway, William Marczak, Peter Alvaro, Joseph M. Hellerstein, and David Maier.

A comprehensive study of Convergent and Commutative Replicated Data Types by Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski.

Strong Eventual Consistency and Conflict-free Replicated Data Types by Marc Shapiro (video).

I first saw this in a tweet by Sean T. Allen.

Convergent and Commutative Replicated Data Types [Warning: Heavy Sledding Ahead]

Thursday, October 11th, 2012

A comprehensive study of Convergent and Commutative Replicated Data Types (PDF file) Marc Shapiro, Nuno M. Preguiça, Carlos Baquero, Marek Zawirski.

Abstract:

Eventual consistency aims to ensure that replicas of some mutable shared object converge without foreground synchronisation. Previous approaches to eventual consistency are ad-hoc and error-prone. We study a principled approach: to base the design of shared data types on some simple formal conditions that are sufficient to guarantee eventual consistency. We call these types Convergent or Commutative Replicated Data Types (CRDTs). This paper formalises asynchronous object replication, either state based or operation based, and provides a sufficient condition appropriate for each case. It describes several useful CRDTs, including container data types supporting both add and remove operations with clean semantics, and more complex types such as graphs, monotonic DAGs, and sequences. It discusses some properties needed to implement non-trivial CRDTs.

I found this by following a link in the readme for riak_dt, which said:

WHAT?

Currently under initial development, riak_dt is a platform for convergent data types. It’s built on riak core and deployed with riak. All of our current work is around supporting fast, replicated, eventually consistent counters (though more data types are in the repo, and on the way.) This work is based on the paper – A Comprehensive study of Convergent and Commutative Replicated Data Types – which you may find an interesting read.

WHY?

Riak’s current model for handling concurrent writes is to store sibling values and present them to the client for resolution on read. The client must encode the logic to merge these into a single, meaningful value, and then inform Riak by doing a further write. Convergent data types remove this burden from the client, as their structure guarantees they will deterministically converge to a single value. The simplest of these data types is a counter.
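To make the counter concrete, here is a minimal sketch of a state-based, grow-only counter of the kind described in the Shapiro et al. paper. It is not riak_dt's implementation, and the class and method names are my own assumptions. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge no matter the order in which they exchange state.

```python
class GCounter:
    """Grow-only counter sketch: one non-decreasing count per replica."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, amount=1):
        """Record local increments; only this replica's slot is touched."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        """The counter's value is the sum over all replica slots."""
        return sum(self.counts.values())

    def merge(self, other):
        """Take the element-wise maximum of the two states.

        Max is commutative, associative, and idempotent, which is what
        lets replicas converge without client-side conflict resolution.
        """
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)


# Two replicas accept writes concurrently, then exchange state.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

Merging the same state a second time changes nothing, so no sibling values are left for the client to resolve.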

I hadn’t thought of the merging of subject representatives as a quest for “consistency,” but that is one way to think about it.

The paper is forty-seven pages long and has forty-four references, most of which I suspect are necessary to fully appreciate the work.

Having said that, I suspect it will be well worth the effort.

QUDT – Quantities, Units, Dimensions and Data Types in OWL and XML

Monday, September 12th, 2011

QUDT – Quantities, Units, Dimensions and Data Types in OWL and XML

From background:

The QUDT Ontologies, and derived XML Vocabularies, are being developed by TopQuadrant and NASA. Originally, they were developed for the NASA Exploration Initiatives Ontology Models (NExIOM) project, a Constellation Program initiative at the AMES Research Center (ARC). The goals of the QUDT ontology are twofold:

  • to provide a unified model of measurable quantities, units for measuring different kinds of quantities, the numerical values of quantities in different units of measure and the data structures and data types used to store and manipulate these objects in software;
  • to populate the model with the instance data (quantities, units, quantity values, etc.) required to meet the life-cycle needs of the Constellation Program engineering community.
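To illustrate the kind of model the first goal describes, here is a rough sketch in Python. The class and field names (`Unit`, `QuantityValue`, `to_base`) are my own assumptions and are not QUDT vocabulary terms. A unit carries a dimension and a multiplier to a base unit, and a quantity value can only be converted between units of the same dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    dimension: str   # e.g. "length" or "mass"
    to_base: float   # multiplier that converts this unit to the base unit


@dataclass(frozen=True)
class QuantityValue:
    number: float
    unit: Unit

    def convert(self, target):
        """Re-express this value in another unit of the same dimension."""
        if target.dimension != self.unit.dimension:
            raise ValueError(
                f"cannot convert {self.unit.dimension} to {target.dimension}"
            )
        return QuantityValue(self.number * self.unit.to_base / target.to_base, target)


# A tiny hand-populated set of units; the base units here are metre and kilogram.
METRE = Unit("metre", "length", 1.0)
FOOT = Unit("foot", "length", 0.3048)
KILOGRAM = Unit("kilogram", "mass", 1.0)

height = QuantityValue(10.0, FOOT).convert(METRE)
print(height.number, height.unit.name)
```

Asking for `QuantityValue(1.0, METRE).convert(KILOGRAM)` raises a `ValueError`, which is the sort of mistake a shared model of dimensions is meant to catch.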

If you are looking for measurements, this would be one place to start.