100 open source Big Data architecture papers for data professionals

100 open source Big Data architecture papers for data professionals by Anil Madan.

From the post:

Big Data technology has been extremely disruptive with open source playing a dominant role in shaping its evolution. While on one hand it has been disruptive, on the other it has led to a complex ecosystem where new frameworks, libraries and tools are being released pretty much every day, creating confusion as technologists struggle and grapple with the deluge.

If you are a Big Data enthusiast or a technologist ramping up (or scratching your head), it is important to spend some serious time deeply understanding the architecture of key systems to appreciate its evolution. Understanding the architectural components and subtleties would also help you choose and apply the appropriate technology for your use case. In my journey over the last few years, some literature has helped me become a better educated data professional. My goal here is to not only share the literature but consequently also use the opportunity to put some sanity into the labyrinth of open source systems.

One caution, most of the reference literature included is hugely skewed towards deep architecture overview (in most cases original research papers) than simply provide you with basic overview. I firmly believe that deep dive will fundamentally help you understand the nuances, though would not provide you with any shortcuts, if you want to get a quick basic overview.

Jumping right in…

You will have a great background in Big Data if you read all one hundred (100) papers.

What you will be missing is an overview that ties the many concepts and terms together into a coherent narrative.

Perhaps after reading all 100 papers, you will start over to map the terms and concepts one to the other.

That would both useful and controversial within the field of Big Data!


I first saw this in a tweet by Kirk Borne.

Comments are closed.