Big Data analytics with Hive and iReport
From the post:
Each episode of J.J. Abrams’ TV series Person of Interest starts with the following narration from Mr. Finch, one of the leading characters: “You are being watched. The government has a secret system–a machine that spies on you every hour of every day. I know because…I built it.” Of course, we technical people know better. It would take a huge team of electrical and software engineers many years to build such a high-performing machine, and the budget would be unimaginable… or would it? Wait a second, we have Hadoop! Now every one of us can be Mr. Finch on a modest budget, thanks to Hadoop.
In the JCG article “Hadoop Modes Explained – Standalone, Pseudo Distributed, Distributed”, JCG partner Rahul Patodi explained how to set up Hadoop. The Hadoop project has produced many tools for analyzing semi-structured data, but Hive is perhaps the most intuitive of them because it lets anyone with an SQL background submit MapReduce jobs written as SQL queries. Hive can be run from a command-line interface or in server mode, where Thrift clients, including the JDBC and ODBC drivers, give data analysis and reporting applications access to the data.
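To make the idea of an SQL query becoming a MapReduce job a little more concrete, here is a minimal in-memory sketch in plain Java. It is not Hive code and launches no Hadoop job; it only mimics the map, shuffle and reduce phases that a query such as SELECT page, COUNT(*) FROM visits GROUP BY page roughly corresponds to. The table "visits", the column "page" and all class and method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only: Hive compiles queries into real Hadoop MapReduce jobs,
// not into this code. It mirrors the phases behind a hypothetical query
//   SELECT page, COUNT(*) FROM visits GROUP BY page
class GroupByAsMapReduce {

    // Map phase: every input row emits a (key, 1) pair keyed by the GROUP BY column.
    static List<Map.Entry<String, Long>> map(List<String> rows) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String row : rows) {
            pairs.add(new AbstractMap.SimpleEntry<>(row, 1L));
        }
        return pairs;
    }

    // Shuffle phase: collect all values emitted for the same key (sorted by key here).
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: COUNT(*) becomes a sum of the 1s collected for each key.
    static Map<String, Long> reduce(Map<String, List<Long>> groups) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> g : groups.entrySet()) {
            long sum = 0;
            for (long v : g.getValue()) {
                sum += v;
            }
            result.put(g.getKey(), sum);
        }
        return result;
    }

    static Map<String, Long> countByKey(List<String> rows) {
        return reduce(shuffle(map(rows)));
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("/home", "/about", "/home", "/blog", "/home");
        System.out.println(countByKey(pages)); // {/about=1, /blog=1, /home=3}
    }
}
```

In Hadoop the shuffle is distributed: map output is partitioned across reducers and each reducer receives its keys in sorted order, which is the job the TreeMap stands in for here.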
In this article we will set up a Hive Server, create a table, load it with data from a text file, and then create a Jasper Report using iReport. The Jasper Report executes an SQL query on the Hive Server, which translates it into a MapReduce job executed by Hadoop.
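The table setup and report query can be sketched as a small JDBC client. This is a hypothetical example, not the article's code: the table name page_visits, its three columns, the file path and the report query are all assumptions. It uses only java.sql, so it compiles on its own, but actually executing it requires a Hive JDBC driver on the classpath and a running Hive server. For the original HiveServer the URL typically looks like jdbc:hive://localhost:10000/default, while HiveServer2 uses the jdbc:hive2:// scheme.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical client: table name, columns, file path and URL are illustrative only.
class HiveSetupSketch {

    // HiveQL for a tab-delimited text table, then a LOAD from a file that is
    // local to the machine running the Hive server.
    static List<String> setupStatements(String table, String localPath) {
        return Arrays.asList(
            "CREATE TABLE IF NOT EXISTS " + table
                + " (ts STRING, page STRING, visitor STRING)"
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'"
                + " STORED AS TEXTFILE",
            "LOAD DATA LOCAL INPATH '" + localPath + "' INTO TABLE " + table);
    }

    // The kind of aggregate a report might run; Hive compiles it into MapReduce.
    static String reportQuery(String table) {
        return "SELECT page, COUNT(*) AS hits FROM " + table + " GROUP BY page";
    }

    // Executes the statements over JDBC. Older Hive drivers may need to be
    // registered first, e.g. with Class.forName on the driver class name.
    static void run(String url) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement()) {
            for (String sql : setupStatements("page_visits", "/tmp/page_visits.tsv")) {
                stmt.execute(sql);
            }
            try (ResultSet rs = stmt.executeQuery(reportQuery("page_visits"))) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length > 0) {
            run(args[0]); // e.g. jdbc:hive://localhost:10000/default (assumed)
        } else {
            // Dry run: print the HiveQL instead of sending it anywhere.
            setupStatements("page_visits", "/tmp/page_visits.tsv").forEach(System.out::println);
            System.out.println(reportQuery("page_visits"));
        }
    }
}
```

Without arguments, main only prints the HiveQL it would send, which is a cheap way to check the statements; with a JDBC URL as its first argument it executes them. A report query in iReport would be a SELECT of the same kind.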
Just in case you have ever wanted to play the role of “Big Brother.” 😉
On the other hand, the old adage that the best defense is a good offense may well be true, whether you are competing with other governments, organizations, companies, or agencies, or even within them.