Extending Data Beyond the Database – The Notion of “State” by David Loshin
From the post:
In my last post, I essentially suggested that there is a difference between merging two static data sets and merging static data sets with dynamic ones. It is worth providing a more concrete example to demonstrate what I really mean by this idea: let’s say you had a single analytical database containing customer profile information (we’ll call this data set “Profiles”), but at the same time had access to a stream of web page transactions performed by individuals identified as customers (we can refer to this one as “WebStream”).
The challenge is that the WebStream data set may contain information with different degrees of believability. If an event can be verified as the result of a sequence of web transactions within a limited time frame, the resulting data should lead to an update of the Profiles data set. On the other hand, if the sequence does not take place, or takes place over an extended time frame, there is not enough “support” for the update and therefore the potential modification is dropped. For example, if a visitor places a set of items into a shopping cart and completes a purchase, the customer’s preferences are updated based on the items selected and purchased. But if the cart is abandoned and not picked up within 2 hours, the customer’s preferences may not be updated.
Because the update is conditional on a number of different variables, the system must hold onto some data until it can be determined whether or not the preferences should be updated. We can refer to this as maintaining some temporary state that either resolves into a modification to the Profiles data set or is thrown out after 2 hours.
Are your data sets static or dynamic? And if dynamic, how do you delay merging until some other criterion is met?
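The temporary state described in the quoted passage can be sketched in a few lines of Python. This is not from David's post, only a minimal illustration under stated assumptions: the class `StateBuffer`, the event kinds `"add_to_cart"` and `"purchase"`, and the choice to measure the 2-hour window from the last cart activity are all hypothetical, and Profiles is reduced to a plain dictionary of preferred items per customer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumption: the 2-hour limit from the example, measured from last cart activity.
SUPPORT_WINDOW = timedelta(hours=2)


@dataclass
class PendingCart:
    """Temporary state held for one customer until it resolves or expires."""
    last_activity: datetime
    items: list = field(default_factory=list)


class StateBuffer:
    """Holds WebStream events until they can be merged into Profiles or dropped."""

    def __init__(self, profiles):
        self.profiles = profiles  # customer_id -> set of preferred items
        self.pending = {}         # customer_id -> PendingCart

    def handle(self, customer_id, kind, when, item=None):
        # Drop any state that has lost its "support" before processing the event.
        self.expire(when)
        if kind == "add_to_cart":
            cart = self.pending.setdefault(customer_id, PendingCart(last_activity=when))
            cart.items.append(item)
            cart.last_activity = when
        elif kind == "purchase":
            # A completed purchase resolves the pending state into a Profiles update.
            cart = self.pending.pop(customer_id, None)
            if cart is not None:
                self.profiles.setdefault(customer_id, set()).update(cart.items)

    def expire(self, now):
        # Abandoned carts older than the window are thrown out, never merged.
        stale = [cid for cid, cart in self.pending.items()
                 if now - cart.last_activity > SUPPORT_WINDOW]
        for cid in stale:
            del self.pending[cid]


t0 = datetime(2024, 1, 1, 12, 0)
profiles = {}
buffer = StateBuffer(profiles)

# Purchase completed within the window: the preference update is applied.
buffer.handle("alice", "add_to_cart", t0, "book")
buffer.handle("alice", "purchase", t0 + timedelta(minutes=30))

# Cart left alone for 3 hours: the pending state is discarded before the late event.
buffer.handle("bob", "add_to_cart", t0, "lamp")
buffer.handle("bob", "purchase", t0 + timedelta(hours=3))
```

In this sketch only the first customer ends up in `profiles`. The static data set is touched only when pending state resolves, so the merge is delayed until the stream supplies enough support for it.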
The first article David refers to is: Data Quality and State.
Interesting that as soon as we step away from static files and data, the world explodes in complexity. Add to that dynamic notions of identity and recognition, and complexity seems like an inadequate term for what we face.
Be mindful that those are just slices of what people automatically process all day long. Fix your requirements and build to spec. Leave the “real world” to wetware.