Study Stacks MySQL, MapReduce and Hive
From the post:
Many small and medium sized businesses would like to get in on the big data game but do not have the resources to implement parallel database management systems. That being the case, which relational database management system would provide small businesses the highest performance?
This question was asked and answered by Marissa Hollingsworth of Boise State University in a graduate case study that compared the performance rates of MySQL, Hadoop MapReduce, and Hive at scales no larger than nine gigabytes.
Hollingsworth also used only relational data, such as payment information, which stands to reason since anything more would require a parallel system. “This experiment,” said Hollingsworth, “involved a payment history analysis which considers customer, account, and transaction data for predictive analytics.”
The case study, the full text of which can be found here, concluded that MapReduce would beat out MySQL and Hive for datasets larger than one gigabyte. As Hollingsworth wrote, “The results show that the single server MySQL solution performs best for trial sizes ranging from 200MB to 1GB, but does not scale well beyond that. MapReduce outperforms MySQL on data sets larger than 1GB and Hive outperforms MySQL on sets larger than 2GB.”
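The crossover points in that quote can be written down as a small lookup. This is only a sketch of the reported results, not code from the case study: the function name `systems_beating_mysql` is made up for illustration, 1 GB is treated as 1,000 MB, and sizes the quote does not cover (below 200MB, or beyond the study's upper limit) get no special handling.

```python
def systems_beating_mysql(size_mb: float) -> list[str]:
    """Return the systems the quoted results report as outperforming MySQL.

    Hypothetical helper; thresholds come straight from the quoted summary.
    """
    winners = []
    if size_mb > 1000:  # "MapReduce outperforms MySQL on data sets larger than 1GB"
        winners.append("MapReduce")
    if size_mb > 2000:  # "Hive outperforms MySQL on sets larger than 2GB"
        winners.append("Hive")
    return winners


print(systems_beating_mysql(500))   # []
print(systems_beating_mysql(1500))  # ['MapReduce']
print(systems_beating_mysql(5000))  # ['MapReduce', 'Hive']
```

An empty list means the quoted results give MySQL the edge at that size, which is the "small data" case.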
Although your friends may not admit it, some of them have small data. Or interact with clients with small data.
You can print this post out and put it in their inbox. Anonymously. They will appreciate it even if they can’t acknowledge having seen it.
When thinking about data and data storage, you might want to keep in mind the comparisons you will find at: How much is 1 byte, kilobyte, megabyte, gigabyte, etc.?
Roughly speaking, 1 GB is the equivalent of 4,473 books.
Rounding the study's nine-gigabyte ceiling up to 10 GB, that is roughly 44,730 books.
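The book comparison is plain multiplication. A minimal sketch, taking the 4,473 books per gigabyte figure above as given (the constant and function names are made up for illustration):

```python
BOOKS_PER_GB = 4473  # rough figure quoted above, not an exact conversion


def books_equivalent(gigabytes: int) -> int:
    """Rough number of books that fit in the given number of gigabytes."""
    return gigabytes * BOOKS_PER_GB


print(books_equivalent(1))   # 4473
print(books_equivalent(10))  # 44730
```

At the nine gigabytes mentioned in the quoted post, the same calculation gives 40,257 books.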
Sometimes all you need is small data.