Archive for the ‘Datomic’ Category

Conversations With Datomic

Friday, August 21st, 2015

Conversations With Datomic by Carin Meier. (See Conversations With Datomic Part 2 as well.)

Perhaps not “new” but certainly an uncommon approach to introducing users to a database system.

Carin has a “conversation” with Datomic that starts from the very beginning of creating a database and goes forward.

Rewarding and a fun read!

Enjoy!

More Power for Datomic Datalog: Negation, Disjunction, and Range Optimizations

Monday, January 19th, 2015

More Power for Datomic Datalog: Negation, Disjunction, and Range Optimizations by Stuart Halloway.

From the post:

Today’s Datomic release includes a number of enhancements to Datomic’s Datalog query language:

  • Negation, via the new not and not-join clauses
  • Disjunction (or) without using rules, via the new or and or-join clauses
  • Required rule bindings
  • Improved optimization of range predicates

Each is described below, and you can follow the examples from the mbrainz data set in Java or in Clojure.
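
To give a flavor of the new clauses, here is a sketch of negation and disjunction queries against the mbrainz sample schema (attribute and ident names follow that data set; `db` is assumed to be a database value obtained from a Datomic connection):

```clojure
;; negation via not: artists who are not Canadian
(d/q '[:find ?name
       :where [?e :artist/name ?name]
              (not [?e :artist/country :country/CA])]
     db)

;; disjunction via or (with and for compound legs):
;; artists that are groups, or persons who are female
(d/q '[:find ?name
       :where [?e :artist/name ?name]
              (or [?e :artist/type :artist.type/group]
                  (and [?e :artist/type :artist.type/person]
                       [?e :artist/gender :artist.gender/female]))]
     db)
```

The `not-join` and `or-join` variants take an explicit variable list, for when inner clauses use variables that should not unify with the rest of the query.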

The four hours of video in Datomic Training Videos may not be enough for you to appreciate this post, but you will soon enough!

If you want a steep climb through queries, try Datomic Queries and Rules. Not for the faint of heart.

Enjoy!

Datomic Training Videos

Monday, January 19th, 2015

Datomic Training Videos by Stu Halloway.

Part I: What is Datomic?

Part II: The Datomic Information Model

Part III: The Datomic Transaction Model

Part IV: The Datomic Query Model

Part V: The Datomic Time Model

Part VI: The Datomic Operational Model

About four (4) hours of videos with classroom materials, slides, etc.

OK, it’s not Downton Abbey but if you missed it last night you have a week to kill before it comes on TV again. May as well learn something while you wait. 😉

Pay particular attention to the time model in Datomic. Then ask yourself (intelligence community): Why can’t I do that with my database? (Insert your answer as a comment, leaving out classified details.)

A bonus question: What role should Stu play on Downton Abbey?

Datomic 0.9.5078 now available

Tuesday, November 25th, 2014

Datomic 0.9.5078 now available by Ben Kamphaus.

From the post:

This message covers changes in this release. For a summary of critical release notices, see http://docs.datomic.com/release-notices.html.

The Datomic Team recommends that you always take a backup before adopting a new release.

Changed in 0.9.5078

  • New CloudWatch metrics: `WriterMemcachedPutMusec`, `WriterMemcachedPutFailedMusec`, `ReaderMemcachedPutMusec`, and `ReaderMemcachedPutFailedMusec` track writes to memcache. See http://docs.datomic.com/caching.html#memcached
  • Improvement: Better startup performance for databases using fulltext.
  • Improvement: Enhanced the Getting Started examples to include the Pull API and find specifications.
  • Improvement: Better scheduling of indexing jobs during bursty transaction volumes.
  • Fixed bug where the Pull API could incorrectly return renamed attributes.
  • Fixed bug that caused `db.fn/cas` to throw an exception when `false` was passed as the new value.
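
For context on that last fix: `db.fn/cas` is Datomic's built-in compare-and-swap transaction function. A sketch of the kind of call the fix concerns (the `conn`, entity id, and `:account/approved` attribute here are hypothetical):

```clojure
;; succeeds only if :account/approved currently holds true;
;; as of this release, false is accepted as the new value
@(d/transact conn
   [[:db.fn/cas account-id :account/approved true false]])
```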

http://www.datomic.com/get-datomic.html

In case you haven’t walked through Datomic, you really should.

Here is one example why:

Next download the subset of the mbrainz database covering the period 1968-1973 (which the Datomic team has scientifically determined as being the most important period in the history of recorded music): [From: https://github.com/Datomic/mbrainz-sample]

Truer words were never spoken! 😉

Enjoy!

Datomic Pull API

Tuesday, October 28th, 2014

Datomic Pull API by Stuart Halloway.

From the post:

Datomic’s new Pull API is a declarative way to make hierarchical selections of information about entities. You supply a pattern to specify which attributes of the entity (and nested entities) you want to pull, and db.pull returns a map for each entity.

Pull API vs. Entity API

The Pull API has two important advantages over the existing Entity API:

Pull uses a declarative, data-driven spec, whereas Entity encourages building results via code. Data-driven specs are easier to build, compose, transmit and store. Pull patterns are smaller than entity code that does the same job, and can be easier to understand and maintain.

Pull API results match standard collection interfaces (e.g. Java maps) in programming languages, where Entity results do not. This eliminates the need for an additional allocation/transformation step per entity.
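
A small sketch of the contrast, using mbrainz-flavored attribute names (treat them, and `artist-id`, as illustrative assumptions):

```clojure
;; Pull API: the pattern is the whole "program", result is a plain map
(d/pull db '[:artist/name {:artist/country [:country/name]}] artist-id)
;; => a standard map, e.g. {:artist/name "..." :artist/country {:country/name "..."}}

;; Entity API: navigation in code, yielding a lazy Entity rather than a standard map
(let [e (d/entity db artist-id)]
  {:name    (:artist/name e)
   :country (-> e :artist/country :country/name)})
```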

A sign that it is time to catch up on what has been happening with Datomic!

Exposing Resources in Datomic…

Wednesday, August 20th, 2014

Exposing Resources in Datomic Using Linked Data by Ratan Sebastian.

From the post:

Financial data feeds from various data providers tend to be closed off from most people due to high costs, licensing agreements, obscure documentation, and complicated business logic. The problem of understanding this data, and providing access to it for our application, is something that we (and many others) have had to solve over and over again. Recently at Pellucid we were faced with three concrete problems:

  1. Adding a new data set to make data visualizations with. This one was a high-dimensional data set, and we were certain that the queries needed to make the charts had to be very parameterizable.

  2. We were starting to come to terms with the difficulty of answering support questions about the data we use in our charts, given that we were serving up the data using a Finagle service that spoke a binary protocol over TCP. Support staff should not have to learn Datomic’s highly expressive query language, Datalog, or have to set up a Scala console to look at the raw data that was being served up.

  3. Different data sets that we use had semantically equivalent data that was being accessed in ways specific to that data set.

And as a long-term goal we wanted to be able to query across data sets instead of doing multiple queries and joining in memory.

These are very orthogonal goals to be sure. We embarked on a project which we thought might move us in those three directions simultaneously. We’d already ingested the data set from the raw file format into Datomic, which we love. Goal 2 was easily addressable by conveying data over a more accessible protocol. And what’s more accessible than REST? Goal 1 meant that we’d have to expose quite a bit of Datalog expressivity to be able to write all the queries we needed. And Goal 3 hinted at the need for some way to talk about things in different data silos using a common vocabulary. Enter the Linked Data Platform, a W3C project, the need for which is brilliantly covered in this talk. What’s the connection? Wait for it…

The RDF Datomic Mapping

If you are happy with Datomic and RDF primitives, for semantic purposes, this may be all you need.

You have to appreciate Ratan’s closing sentiments:

We believe that a shared ontology of financial data could be very beneficial to many and open up the normally closeted world of handling financial data.

Even though we know as a practical matter that no “shared ontology of financial data” is likely to emerge.

In the absence of such a shared ontology, there are always topic maps.

Using Datomic as a Graph Database

Thursday, April 10th, 2014

Using Datomic as a Graph Database by Joshua Davey.

From the post:

Datomic is a database that changes the way that you think about databases. It also happens to be effective at modeling graph data and was a great fit for performing graph traversal in a recent project I built.

I started out building kevinbacon.us using Neo4j, a popular open-source graph database. It worked very well for actors that were a few hops away, but finding paths between actors with more than 5 hops proved problematic. The Cypher query language gave me little visibility into the graph algorithms actually being executed. I wanted more.

Despite not being explicitly labeled as such, Datomic proved to be an effective graph database. Its ability to arbitrarily traverse datoms, when paired with the appropriate graph searching algorithm, solved my problem elegantly. This technique ended up being fast as well.

Quick aside: this post assumes a cursory understanding of Datomic. I won’t cover the basics, but the official tutorial will help you get started.
….
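
The general shape of the technique is easy to sketch: use Datalog to compute a node's neighbors, then hand that function to an ordinary search algorithm. The schema below is hypothetical (a `:movie/actors` many-ref), not Joshua's actual code:

```clojure
(defn co-stars
  "Actors who appeared in some movie with the given actor."
  [db actor]
  (d/q '[:find [?other ...]          ; collection find spec
         :in $ ?actor
         :where [?m :movie/actors ?actor]
                [?m :movie/actors ?other]
                [(not= ?actor ?other)]]
       db actor))

;; plug co-stars into a generic breadth-first search
;; to find shortest actor-to-actor paths
```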

If you are interested in Datomic, Clojure, functional programming, or graphs, this is a must read for you.

Not to spoil any surprises but Joshua ends up with excellent performance.

I first saw this in a tweet by Atabey Kaygun.

Datomic R-trees

Thursday, February 6th, 2014

Datomic R-trees by James Sofra.

From the description:

Slides for a talk given at Melbourne Functional Users Group on an R-tree based spatial indexer for Datomic.

The slides do a good job explaining the advantages of Datomic for spatial data and using R-trees with it.

References from the slides that you will find helpful:

R-trees: A Dynamic Index Structure for Spatial Searching, A. Guttman. (1984)

Sort-based query-adaptive loading of R-trees, Daniar Achakeev, Bernhard Seeger, Peter Widmayer. (2012)

Sort-based parallel loading of R-trees, Daniar Achakeev, Marc Seidemann, Markus Schmidt, Bernhard Seeger. (2012)

The R*-tree: An Efficient and Robust Access Method for Points and Rectangles, Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger. (1990)

OMT: Overlap Minimizing Top-down Bulk Loading Algorithm for R-tree, Taewon Lee, Sukho Lee. (2003)

The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree, Lars Arge. (2004)

Compact Hilbert indices, Christopher Hamilton. (2006)

R-Trees: Theory and Applications, Yannis Manolopoulos, Alexandros Nanopoulos, Apostolos N. Papadopoulos and Yannis Theodoridis. (2006)

See also: https://github.com/jsofra/datomic-rtree

Schema Alteration [You asked for it.]

Thursday, January 23rd, 2014

Schema Alteration by Christopher Redinger.

From the post:

Datomic is a database that has flexible, minimal schema. Starting with version 0.9.4470, available here, we have added the ability to alter existing schema attributes after they are first defined. You can alter schema to

  • rename attributes
  • rename your own programmatic identities (uses of :db/ident)
  • add or remove indexes
  • add or remove uniqueness constraints
  • change attribute cardinality
  • change whether history is retained for an attribute
  • change whether an attribute is treated as a component

Schema alterations use the same transaction API as all other transactions, just as schema installation does. All schema alterations can be performed while a database is online, without requiring database downtime. Most schema changes are effective immediately, at the end of the transaction. There is one exception: adding an index requires a background job to build the new index. You can use the new syncSchema API for detecting when a schema change is available.

When renaming an attribute or identity, you can continue to use the old name as long as you haven’t repurposed it. This allows for incremental application updating.

See the schema alteration docs for the details.

Schema alteration has been our most requested enhancement. We hope you find it useful and look forward to your feedback.

Well? Go alter some schemas (non-production would be my first choice) and see what happens. 😉
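
For reference, an alteration is just a transaction against the attribute's entity. A sketch assuming a hypothetical :person/email attribute (the release-era docs used :db.alter/_attribute to flag alterations such as uniqueness and index changes; renames need only the new :db/ident):

```clojure
;; rename an attribute: assert a new :db/ident on it
@(d/transact conn
   [{:db/id :person/email
     :db/ident :person/primary-email}])

;; add a uniqueness constraint
;; (kicks off a background job to build the index; see syncSchema)
@(d/transact conn
   [{:db/id :person/email
     :db/unique :db.unique/identity
     :db.alter/_attribute :db.part/db}])
```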

Erik Meijer and Rich Hickey – Clojure and Datomic

Sunday, November 10th, 2013

Expert to Expert: Erik Meijer and Rich Hickey – Clojure and Datomic

From the description:

At GOTO Chicago Functional Programming Night, Erik Meijer and Rich Hickey sat down for a chat about the latest in Rich’s programming language, Clojure, and also had a short discussion about one of Rich’s latest projects, Datomic, a database written in Clojure. Always a pleasure to get a few titans together for a random discussion. Thank you Erik and Rich!

A bit dated (2012) but very enjoyable!

Making Sense Out of Datomic,…

Friday, June 28th, 2013

Making Sense Out of Datomic, The Revolutionary Non-NoSQL Database by Jakub Holy.

From the post:

I have finally managed to understand one of the most unusual databases of today, Datomic, and would like to share it with you. Thanks to Stuart Halloway and his workshop!

Why? Why?!?

As we shall see shortly, Datomic is very different from the traditional RDBMS databases as well as the various NoSQL databases. It isn’t even a database – it is a database on top of a database. I couldn’t wrap my head around that until now. The key to understanding Datomic and its unique design and advantages is actually simple.

The mainstream databases (and languages) have been designed around the following constraints of the 1970s:

  • memory is expensive
  • storage is expensive
  • it is necessary to use dedicated, expensive machines

Datomic is essentially an exploration of what database we would have designed if we didn’t have these constraints. What design would we choose, having gigabytes of RAM, networks with bandwidth and speed matching or exceeding hard-disk access, and the ability to spin up and kill servers at a whim?

But Datomic isn’t an academic project. It is pragmatic; it wants to fit into our existing environments and make it easy for us to start using its futuristic capabilities now. And it is not as fresh and green as it might seem. Rich Hickey, the mastermind behind Clojure and Datomic, has reportedly thought about both of these projects for years, and the designs have been really well thought through.

(…)

Deeply interesting summary of Datomic.

The only point I would have added about traditional databases is the requirement for normalized data, which places the load on designers and users instead of the software.

Excision [Forgetting But Remembering You Forgot (Datomic)]

Tuesday, June 4th, 2013

Excision

From the post:

It is a key value proposition of Datomic that you can tell not only what you know, but how you came to know it. When you add a fact:

conn.transact(list(":db/add", 42, ":firstName", "John"));

Datomic does more than merely record that 42’s first name is “John”. Each datom is also associated with a transaction entity, which records the moment (:db/txInstant) the datom was recorded.

(…)

Given this information model, it is easy to see that Datomic can support queries that tell you:

  • what you know now
  • what you knew at some point in the past
  • how and when you came to know any particular datom

So far so good, but there is a fly in the ointment. In certain situations you may be forced to excise data, pulling it out root and branch and forgetting that you ever knew it. This may happen if you store data that must comply with privacy or IP laws, or you may have a regulatory requirement to keep records for seven years and then “shred” them. For these scenarios, Datomic provides excision.
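
Mechanically, excision is requested through an ordinary transaction: you assert a new entity whose :db/excise attribute points at the entity whose history must be destroyed. A minimal sketch (entity id and partition illustrative):

```clojure
;; excise everything ever asserted about entity 42
@(d/transact conn
   [{:db/id #db/id [:db.part/user]
     :db/excise 42}])

;; or narrow the excision to particular attributes
@(d/transact conn
   [{:db/id #db/id [:db.part/user]
     :db/excise 42
     :db.excise/attrs [:firstName]}])
```

The excision itself is recorded, so the database remembers that it forgot, without remembering what.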

One approach to the unanswered question: what does it mean to delete something from a topic map?

Especially interesting because you can play with the answer that Datomic provides.

Doesn’t address the issue of what it means to delete a topic that has caused other topics to merge.

I first saw this in Christophe Lalanne’s A bag of tweets / May 2013.

Should Business Data Have An Audit Trail?

Thursday, March 21st, 2013

The “second slide” I would lead with from Stuart Halloway’s Datomic, and How We Built It would be:

Should Business Data Have An Audit Trail?

Actually Stuart’s slide #65 but who’s counting? 😉

Stuart points out the irony of git, saying:

developer data is important enough to have an audit trail, but business data is not

Whether business data should always have an audit trail would attract shouts of yes and no, depending on the audience.

Regulators, prosecutors, good government types, etc., mostly shouting yes.

Regulated businesses, security brokers, elected officials, etc., mostly shouting no.

Some in between.

Datomic, which has some common characteristics with topic maps, gives you the ability to answer these questions:

  • Do you want auditable business data or not?
  • If yes to auditable business data, to what degree?

Rather different than just assuming it isn’t possible.

Abstract:

Datomic is a database of flexible, time-based facts, supporting queries and joins, with elastic scalability and ACID transactions. Datomic queries run in your application process, giving you both declarative and navigational access to your data. Datomic facts (“datoms”) are time-aware and distributed to all system peers, enabling OLTP, analytics, and detailed auditing in real time from a single system.

In this talk, I will begin with an overview of Datomic, covering the problems that it is intended to solve and how its data model, transaction model, query model, and deployment model work together to solve those problems. I will then use Datomic to illustrate more general points about designing and implementing production software, and where I believe our industry is headed. Key points include:

  • the pragmatic adoption of functional programming
  • how dynamic languages fare in mission- and performance-critical settings
  • the importance of data, and the perils of OO
  • the irony of git, or why developers give themselves better databases than they give their customers
  • perception, coordination, and reducing the barriers to scale

Resources

  • Video from CME Group Technology Conference 2012
  • Slides from CME Group Technology Conference 2012

Davy Suvee on FluxGraph – Towards a time aware graph built on Datomic

Saturday, February 2nd, 2013

Davy Suvee on FluxGraph – Towards a time aware graph built on Datomic by René Pickhardt.

From the post:

Davy really nicely introduced the problem of looking at a snapshot of a database. This problem obviously exists for any database technology. You have a lot of timestamped records, but running a query as if you had fired it a couple of months ago is always a difficult challenge.

With FluxGraph a solution to this is introduced.

As I understood from the talk, he introduces a new version of a vertex or an edge every time it gets updated, added or removed. So far I am wondering about scaling and runtime. This approach seems like a lot of overhead to me. Later during Q & A I began to have the feeling that he has a more efficient way of storing this information, so I really have to get in touch with Davy to discuss the internals.

FluxGraph anyway provides a very clean API to access this temporal information.

FluxGraph at GitHub.

Time is an obvious issue in any business or medical context.

But also important when the news hounds ask: “Who knew what when?”

And there you may have personal relationships, meetings, communications, etc.

Clojure/Datomic creator Rich Hickey on Deconstructing the Database

Saturday, August 25th, 2012

Clojure/Datomic creator Rich Hickey on Deconstructing the Database

From the description:

Rich Hickey, author of Clojure, and designer of Datomic, presents a new way to look at database architectures in this talk from JaxConf 2012. What happens when you deconstruct the traditional monolithic database – separating transaction processing, storage and query into independent cooperating services? Coupled with a data model based around atomic facts and awareness of time, you get a significantly different set of capabilities and tradeoffs. This talk will discuss how these ideas play out in the design and architecture of Datomic, a new database for the JVM.

I truly appreciate the description of database updates as “a miracle occurs.”

There is much to enjoy and consider here.

Linked Lists in Datomic [Herein of tolog and Neo4j]

Tuesday, August 21st, 2012

Linked Lists in Datomic by Joachim Hofer.

From the post:

As my last contact with Prolog was over ten years ago, I think it’s time for some fun with Datomic and Datalog. In order to learn to know Datomic better, I will attempt to implement linked lists as a Datomic data structure.

First, I need a database “schema”, which in Datomic means that I have to define a few attributes. I’ll define one :content/name (as a string) for naming my list items, and also the attributes for the list data structure itself, namely :linkedList/head and :linkedList/tail (both are refs):
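
In Datomic, "defining a schema" just means transacting attribute entities. A sketch of the three attributes Joachim describes, in the 2012-era syntax with explicit attribute installation:

```clojure
[{:db/id #db/id [:db.part/db]
  :db/ident :content/name
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db.install/_attribute :db.part/db}
 {:db/id #db/id [:db.part/db]
  :db/ident :linkedList/head          ; ref to the first list item
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/one
  :db.install/_attribute :db.part/db}
 {:db/id #db/id [:db.part/db]
  :db/ident :linkedList/tail          ; ref to the rest of the list
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/one
  :db.install/_attribute :db.part/db}]
```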

You may or may not know that tolog, a topic map query language, was inspired in part by Datalog. Understanding Datalog could lead to new insights into tolog.

The other reason to mention this post is that Neo4j uses linked lists as part of its internal data structure.

If I am reading slide 9 (Neo4J Internals (update)) correctly, relationships are hard coded to have start/end nodes (singletons).

Not going to squeeze hyperedges out of that data structure.

What if you replaced the start/end node values with key/value pairs as the membership criteria for the hyperedge?

Even if most edges had only start/end nodes meeting a membership criterion, it would free you up to have hyperedges when needed.

Will have to look at the implementation details on hyperedges/nodes to see. Suspect others have found better solutions.

Datomic Free Edition

Tuesday, July 24th, 2012

Datomic Free Edition

From the post:

We’re happy to announce today the release of Datomic Free Edition. This edition is oriented around making Datomic easier to get, and use, for open source and smaller production deployments.

  • Datomic Free Edition is … free!
  • The system supports transactor-local storage
  • The peer library includes a memory database and Datomic Datalog
  • The Free transactor and peers are freely redistributable
  • The transactor supports 2 simultaneous peers

Of particular note here is that Datomic Free Edition comes with a redistributable license, and does not require a personal/business-specific license from us. That means you can download Datomic Free, build e.g. an open source application with it, and ship/include Datomic Free binaries with your software. You can also put the Datomic Free bits into public repositories and package managers (as long as you retain the licenses and copyright notices).

There is a ton of capability included in the Free Edition, including the Datomic in-process memory database (great for testing), and the Datomic datalog engine, which works on both Datomic databases and in-memory collections. That’s right, free datalog for everyone.
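
That last point is worth underlining: the datalog engine will happily query plain in-memory collections, no transactor or storage required. A minimal sketch:

```clojure
(require '[datomic.api :as d])

;; query an ordinary vector of tuples as if it were a database
(d/q '[:find ?first
       :where [[?first ?last]]]
     [["John" "Doe"]
      ["Jane" "Roe"]])
;; => #{["John"] ["Jane"]}
```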

You can use Datomic Free Edition in production, and you can use it in commercial applications.

Get Datomic!

I first saw this at Alex Popescu’s myNoSQL.

Thinking in Datomic: Your data is not square

Tuesday, July 10th, 2012

Thinking in Datomic: Your data is not square by Pelle Braendgaard.

From the post:

Datomic is so different than regular databases that your average developer will probably choose to ignore it. But for the developer and startup who takes the time to understand it properly I think it can be a real unfair advantage as a choice for a data layer in your application.

In this article I will deal with the core fundamental definition of how data is stored in Datomic. This is very different from all other databases so before we even deal with querying and transactions I think it’s a good idea to look at it.

Yawn, “your data is not square.” 😉 Just teasing.

But we have all heard the criticism of relational tables. I think writers can assume that much, at least in technical forums.

The lasting value of the NoSQL movement (in addition to whichever software packages survive) will be its emphasis on analysis of your data. Your data may fit perfectly well into a square but you need to decide that after looking at your data, not before.

The same can be said about the various NoSQL offerings. Your data may or may not be suited for a particular NoSQL option. The data analysis “cat being out of the bag,” it should be applied to NoSQL options as well. True, almost any option will work; your question should be: why is option X the best option for my data/use case?

Distributed Temporal Graph Database Using Datomic

Saturday, April 21st, 2012

Distributed Temporal Graph Database Using Datomic

Post by Alex Popescu calling out construction of a “distributed temporal graph database.”

Temporal used in the sense of timestamping entries in the database.

Beyond such uses, beware, there be dragons.

Temporal modeling isn’t for the faint of heart.

Datomic

Wednesday, March 7th, 2012

Alex Popescu (myNoSQL) has a couple of posts on resources for Datomic.

Intro Videos to Datomic and Datomic Datalog

and,

Datomic: Distributed Database Designed to Enable Scalable, Flexible and Intelligent Applications, Running on Next-Generation Cloud Architectures

I commend all the materials you will find there, but the white paper in particular, which has the following section:

ATOMIC DATA – THE DATOM

Once you are storing facts, it becomes imperative to choose an appropriate granularity for facts. If you want to record the fact that Sally likes pizza, how best to do so? Most databases require you to update either the Sally record or document, or the set of foods liked by Sally, or the set of likers of pizza. These kinds of representational issues complicate and rigidify applications using relational and document models. This can be avoided by recording facts as independent atoms of information. Datomic calls such atomic facts ‘datoms’. A datom consists of an entity, attribute, value and transaction (time). In this way, any of those sets can be discovered via query, without embedding them into a structural storage model that must be known by applications.
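
In code, the Sally fact is a single assertion, and the datom recorded carries all four components (the `conn`, `sally-id`, and `:person/likes` attribute here are hypothetical):

```clojure
;; assert the fact
@(d/transact conn [[:db/add sally-id :person/likes "pizza"]])

;; the resulting datom: entity, attribute, value, transaction
;;   [sally-id  :person/likes  "pizza"  tx-id]

;; any of the sets mentioned above falls out of a query,
;; e.g. all likers of pizza:
(d/q '[:find ?e :where [?e :person/likes "pizza"]] db)
```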

In some views of granularity, the datom “atom” looks like a four-atom molecule to me. 😉 Not to mention that entities/attributes and values can have relationships that don’t involve each other.