IT’s Next Hot Job: Hadoop Guru
By Doug Henschen, InformationWeek
“We’re hiring, and we’re paying 10% more than the other guys.”
Those were the first words from Larry Feinsmith, managing director, office of the CIO, at JPMorgan Chase, in his Tuesday keynote address at Hadoop World in New York. Who JPMorgan Chase is hiring, specifically, is people with Hadoop skills, so Feinsmith was in the right place. More than 1,400 people were in the audience, and attendee polls indicated that at least three-quarters of their organizations are already using Hadoop, the open source big data platform.
The “and we’re paying 10% more” bit was actually Feinsmith’s ad-libbed follow-on to the previous keynoter, Hugh Williams, VP of search, experience, and platforms at eBay. After explaining eBay’s Hadoop-based Cassini search engine project, Williams said his company is hiring Hadoop experts to help build out and run the tool.
Feinsmith’s core message was that Hadoop is hugely promising, maturing quickly, and might overlap the functionality of relational databases over the next three years. In fact, Hadoop World 2011 was a coming-out party of sorts, as it’s now clear that Hadoop will matter to more than just Web 2.0 companies like eBay, Facebook, Yahoo, AOL, and Twitter. A strait-laced financial giant with more than 245,000 employees, 24 million checking accounts, 5,500 branches, and 145 million credit cards in use, JPMorgan Chase lends huge credibility to that vision.
JPMorgan Chase has 25,000 IT employees, and it spends about $8 billion on IT each year: $4 billion on apps and $4 billion on infrastructure. The company has been working with Hadoop for more than three years, and it’s easy to see why. It has 150 petabytes (with a “p”) of data online, generated by trading operations, banking activities, credit card transactions, and some 3.5 billion logins each year to online banking and brokerage accounts.
The benefits of Hadoop? Massive scalability, schema-free flexibility to handle a variety of data types, and low cost. Hadoop systems built on commodity hardware now cost about $4,000 per node, according to Cloudera, the Hadoop enterprise support and management software provider (and the organizer and host of Hadoop World). With the latest nodes typically having 16 compute cores and twelve 1-terabyte or 2-terabyte drives, that’s massive storage and compute capacity at a very low cost. In comparison, aggressively priced relational data warehouse appliances cost about $10,000 to $12,000 per terabyte.
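The cost gap can be checked with a little back-of-the-envelope arithmetic. This sketch uses the article's own figures; the only added assumption is HDFS's default three-way replication, which is Hadoop's standard setting but is not a number quoted in the article:

```python
# Back-of-the-envelope cost comparison using the figures quoted above.
NODE_COST = 4_000            # dollars per commodity Hadoop node (Cloudera estimate)
DRIVES_PER_NODE = 12
TB_PER_DRIVE = 2             # taking the larger 2 TB drive configuration

raw_tb_per_node = DRIVES_PER_NODE * TB_PER_DRIVE          # 24 TB raw per node
hadoop_cost_per_raw_tb = NODE_COST / raw_tb_per_node      # ~$167 per raw TB

# HDFS stores 3 copies of each block by default (assumption: default replication),
# so usable capacity is roughly a third of raw capacity.
REPLICATION = 3
hadoop_cost_per_usable_tb = hadoop_cost_per_raw_tb * REPLICATION  # ~$500 per usable TB

warehouse_cost_per_tb = (10_000 + 12_000) / 2             # midpoint of the quoted range

print(f"Hadoop, raw:    ~${hadoop_cost_per_raw_tb:.0f}/TB")
print(f"Hadoop, usable: ~${hadoop_cost_per_usable_tb:.0f}/TB")
print(f"Warehouse appliance: ~${warehouse_cost_per_tb:,.0f}/TB")
print(f"Ratio (usable): ~{warehouse_cost_per_tb / hadoop_cost_per_usable_tb:.0f}x cheaper")
```

Even after paying the replication penalty, the per-terabyte cost comes out roughly twenty times lower than the appliance pricing quoted above, which is the economics driving the hiring demand in this story.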
OK, but what doesn’t Hadoop give you out of the box? Cross-domain subject and data semantics. Some expert (insert your name here) is going to have to supply the semantics. You’ll need to know the Hadoop ecosystem, but a firm background in mapping between semantic domains is what will make you a semantic “top gun.”