Big Data Defined by Russell Jurney.
From the post:
Specifically, a Big Data system has four properties:
- It uses local storage to be fast but inexpensive
- It uses clusters of commodity hardware to be inexpensive
- It uses free software to be inexpensive
- It is open source to avoid expensive vendor lock-in
It has been raining all day but I had to laugh when I saw Russell’s definition of “a Big Data system.”
Does it remind you of any particular player in the Big Data pack? 😉
That’s one way to build market share: you define yourself to be the measuring stick.
Let’s walk through the list and see what comments or alternatives suggest themselves:
- It uses local storage to be fast but inexpensive
[What? No cloud? Have you compared the full cost of local hardware against the cloud?]
- It uses clusters of commodity hardware to be inexpensive
[Wonder why NCSA built Blue Waters “from Cray hardware, operates at a sustained performance of more than 1 petaflop (1 quadrillion calculations per second) and is capable of peak performance of 11.61 petaflops (11.6 quadrillion calculations per second).” Must not be “big data.”]
- It uses free software to be inexpensive
[They say that so often. I wonder what they are using as a basis for comparison? LaTeX versus MS Word? Have you paid anyone to typeset a paper in LaTeX versus asking your staff to type it in MS Word?]
- It is open source to avoid expensive vendor lock-in
[Actually, it is open formats that avoid vendor lock-in, expensive or otherwise.]
I enjoy a bit of marketing fluff as much as the next person, but it should at least be plausible.