The Data Engineering Ecosystem: An Interactive Map by David Drummond and John Joo.
From the post:
Companies, non-profit organizations, and governments are all starting to realize the huge value that data can provide to customers, decision makers, and concerned citizens. What is often neglected is the amount of engineering required to make that data accessible. Simply using SQL is no longer an option for large, unstructured, or real-time data. Building a system that makes data usable becomes a monumental challenge for data engineers.
There is no plug and play solution that solves every use case. A data pipeline meant for serving ads will look very different from a data pipeline meant for retail analytics. Since there are unlimited permutations of open-source technologies that can be cobbled together, it can be overwhelming when you first encounter them. What do all these tools do and how do they fit into the ecosystem?
Insight Data Engineering Fellows face these same questions when they begin working on their data pipelines. Fortunately, after several iterations of the Insight Data Engineering Program, we have developed this framework for visualizing a typical pipeline and the various data engineering tools. Along with the framework, we have included a set of tools for each category in the interactive map.
This looks quite handy if you are studying for a certification test and need to know the components and a little about each one.
For engineering purposes, it would be even better if you could connect the pieces together and then map the data flows through the pipeline. That is, where did the data previously held in table X go at each step, and what operations were performed on it? Not to mention being able to track an individual datum through the whole process.
Is there a tool, one I haven’t seen or have overlooked, that provides that type of insight into a data pipeline? With subject identities, of course, for the various subjects along the way.
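To make the kind of record-level lineage asked about above a bit more concrete, here is a minimal Python sketch. It is not an existing tool, and the names TrackedRecord, run_step, and trace, as well as the "table_x:<id>" identity scheme, are hypothetical. Each datum carries a stable subject identity assigned at its source, plus a history of the steps and operations applied to it, so you can ask where a given row from table X went and what was done to it along the way.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class TrackedRecord:
    """A datum plus a stable subject identity and its lineage history."""
    subject_id: str                   # hypothetical identity scheme, e.g. "table_x:42"
    value: Dict[str, Any]
    lineage: List[Tuple[str, str]] = field(default_factory=list)  # (step, operation)


def run_step(records: List[TrackedRecord], step: str, operation: str,
             fn: Callable[[Dict[str, Any]], Dict[str, Any]]) -> List[TrackedRecord]:
    """Apply fn to each record's value and append (step, operation) to its lineage."""
    out = []
    for rec in records:
        new_value = fn(dict(rec.value))  # pass a copy so the input record is untouched
        out.append(TrackedRecord(rec.subject_id, new_value,
                                 rec.lineage + [(step, operation)]))
    return out


def trace(records: List[TrackedRecord], subject_id: str) -> List[Tuple[str, str]]:
    """Return the lineage of one subject, or an empty list if it is not present."""
    for rec in records:
        if rec.subject_id == subject_id:
            return rec.lineage
    return []


# Hypothetical rows from "table X", each given an identity at the source.
rows = [{"id": 1, "price": 10.0}, {"id": 2, "price": 25.0}]
source = [TrackedRecord(f"table_x:{row['id']}", row) for row in rows]

step1 = run_step(source, "ingest", "copy from table_x", lambda v: v)
step2 = run_step(step1, "enrich", "add price_doubled = price * 2",
                 lambda v: {**v, "price_doubled": v["price"] * 2})

print(trace(step2, "table_x:2"))
```

A real pipeline would need to store lineage outside the records themselves and handle joins, splits, and aggregations, where one output derives from many inputs; this sketch only covers one-to-one transformations.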