Measuring User Retention with Hadoop and Hive by Daniel Russo.
From the post:
The Hadoop ecosystem is comprised of numerous technologies that can work together to provide a powerful and scalable mechanism for analyzing and deriving insight from large quantities of data.
In an effort to showcase the flexibility and raw power of queries that can be performed over large datasets stored in Hadoop, this post is written to demonstrate an example use case. The specific goal is to produce data related to user retention, an important metric for all product companies to analyze and understand.
Motivation: Why User Retention?
Broadly speaking, when equipped with the appropriate tools and data, we can enable our team and our customers to better understand the factors that drive user engagement and to ultimately make decisions that deliver better products to market.
User retention measures speak to the core of product quality by answering a crucial question about how the product resonates with users. In the case of apps (mobile or otherwise), that question is: “how many days does it take for users to stop using (or uninstall) the app?”.
Pinch Media (now Flurry) delivered a formative presentation early in the AppStore’s history. Among numerous insights collected from their dataset was the following slide, which detailed patterns in user retention across all apps implementing their tracking SDK:
I mention this example because:
- User retention is the measure of an app’s success or failure.*
- Hadoop and Hive skill sets are good ones to pick up.
* I have a pronounced fondness for requirements and the documenting of the same. Others prefer unit/user/interface/final tests. Still others prefer formal proofs of “correctness.” All pale beside the test of “user retention.” If users keep using an application, what other measure would be meaningful?
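To make the quoted retention question concrete, here is a minimal Python sketch of a day-N retention curve: the fraction of users still active N days after they were first seen. It is not Russo's Hive query. The input shape, a list of `(user_id, date)` activity pairs, and the function name `retention_curve` are hypothetical. It also assumes a user's first active date stands in for the install date. In Hive, the same shape would plausibly be a `GROUP BY` on the user id to find the first date, a join back to the events, and a count of users per `datediff` offset. That translation is an assumption, not something taken from the post.

```python
from collections import defaultdict
from datetime import date


def retention_curve(events, max_day):
    """Fraction of users active on each day 0..max_day after their first activity.

    events: iterable of (user_id, date) pairs, one per day a user was active
            (hypothetical input shape; duplicates are ignored).
    Day 0 is each user's first active date, used here as a stand-in for install date.
    """
    # Collect the distinct active dates per user.
    active_days = defaultdict(set)
    for user_id, day in events:
        active_days[user_id].add(day)

    # Count users active at each offset (in days) from their first activity.
    counts = [0] * (max_day + 1)
    for days in active_days.values():
        first = min(days)
        for d in days:
            offset = (d - first).days
            if offset <= max_day:
                counts[offset] += 1

    total = len(active_days)
    return [c / total if total else 0.0 for c in counts]


# Toy data: four users, some returning on later days.
events = [
    ("a", date(2013, 1, 1)), ("a", date(2013, 1, 2)), ("a", date(2013, 1, 8)),
    ("b", date(2013, 1, 1)), ("b", date(2013, 1, 2)),
    ("c", date(2013, 1, 3)),
    ("d", date(2013, 1, 3)), ("d", date(2013, 1, 4)),
]
print(retention_curve(events, 7))
# prints [1.0, 0.75, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
```

Anchoring each user at their own first day is what lets cohorts that start on different dates share one curve. On Hadoop, the per-user grouping is the step that Hive would distribute across the cluster.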