Polyglot Data Management – Big Data Everywhere Recap by Michele Nemschoff.
From the post:
At the Big Data Everywhere conference held in Atlanta, Senior Software Engineer Mike Davis and Senior Solution Architect Matt Anderson from Liaison Technologies gave an in-depth talk titled “Polyglot Data Management,” where they discussed how to build a polyglot data management platform that gives users the flexibility to choose the right tool for the job, instead of being forced into a solution that might not be optimal. They discussed the makeup of an enterprise data management platform and how it can be leveraged to meet a wide variety of business use cases in a scalable, supportable, and configurable way.
Matt began the talk by describing the three components that make up a data management system: structure, governance, and performance. "Person data" was presented as a good example when thinking about these different components, as it includes demographic information, sensitive information such as Social Security numbers and credit card information, as well as public information such as Facebook posts, tweets, and YouTube videos. The data management system components include structure, governance, and performance.
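To make those three components a bit more concrete, here is a minimal sketch (my own, not from the talk) of what "person data" might look like when governance metadata travels alongside the structure. Every class, field, and sensitivity tier below is an illustrative assumption:

```python
from dataclasses import dataclass, field
from enum import Enum

# Sensitivity tiers for governance decisions; the names are my own
# illustration, not taken from the talk.
class Sensitivity(Enum):
    PUBLIC = "public"            # e.g., tweets, Facebook posts
    DEMOGRAPHIC = "demographic"  # e.g., age, zip code
    RESTRICTED = "restricted"    # e.g., SSNs, credit card numbers

@dataclass
class Attribute:
    name: str
    value: str
    sensitivity: Sensitivity

@dataclass
class PersonRecord:
    person_id: str
    attributes: list = field(default_factory=list)

    def visible_to(self, clearance):
        """Governance as a filter: return only the attributes
        the caller's clearance covers."""
        return [a for a in self.attributes if a.sensitivity in clearance]

record = PersonRecord("p-001", [
    Attribute("zip_code", "30301", Sensitivity.DEMOGRAPHIC),
    Attribute("ssn", "###-##-####", Sensitivity.RESTRICTED),
    Attribute("tweet", "Greetings from Big Data Everywhere", Sensitivity.PUBLIC),
])

# A caller without RESTRICTED clearance never sees the SSN.
for attr in record.visible_to({Sensitivity.PUBLIC, Sensitivity.DEMOGRAPHIC}):
    print(attr.name, "=", attr.value)
```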
It’s a vendor pitch, so read with care, but it comes closer than any other pitch I have seen to capturing the dynamic nature of data. Data isn’t the same from every source, and you treat it the same at your peril.
If I had to give the pitch a theme, it would be: adapt your solutions to your data and goals, not the other way around.
The one place where I may depart from the pitch is on the meaning of “normalization.” True enough, we may want to normalize data a particular way this week or this month, but that should not preclude us from other “normalizations” should our data or requirements change.
The danger I see in “normalization” is that the cost of changing static ontologies, schemas, etc., leads to their continued use long after they have passed their discard dates. If you are as flexible with your information structures as you are with your data, then new data or requirements are easier to accommodate.
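As a sketch of that flexibility, assuming you keep the raw records as-is and treat each “normalization” as a disposable projection (all field names below are invented for illustration), a new requirement becomes a new view rather than a schema migration:

```python
# Raw records kept as-is; each "normalization" is just a projection
# we can add, change, or discard as requirements shift.
raw_records = [
    {"name": "Ada Lovelace", "ssn": "###-##-####", "tweets": ["hello"]},
    {"name": "Alan Turing", "zip_code": "30301", "tweets": []},
]

def billing_view(record):
    """This month's normalization: the fields billing cares about."""
    return {"name": record.get("name"), "ssn": record.get("ssn")}

def outreach_view(record):
    """A later requirement, added without migrating the raw data."""
    return {"name": record.get("name"),
            "tweet_count": len(record.get("tweets", []))}

for r in raw_records:
    print(billing_view(r), outreach_view(r))
```

The point is the posture, not the code: the raw data stays authoritative, and each projection is cheap enough to throw away when requirements move on.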
Or to put it differently, what is the use of being flexible with data if you intend to imprison it in a fixed labyrinth?