How To Capitalize on Clickstream data with Hadoop by Cheryle Custer.
From the post:
In the last 60 seconds there were 1,300 new mobile users and there were 100,000 new tweets. As you contemplate what happens in an internet minute Amazon brought in $83,000 worth of sales. What would be the impact of you being able to identify:
- What is the most efficient path for a site visitor to research a product, and then buy it?
- What products do visitors tend to buy together, and what are they most likely to buy in the future?
- Where should I spend resources on fixing or enhancing the user experience on my website?
In the Hortonworks Sandbox, you can run a simulation of website Clickstream behavior to see where users are located and what they are doing on the website. This tutorial provides a dataset of a fictitious website and the behavior of the visitors on the site over a 5 day period. This is a 4 million line dataset that is easily ingested into the single node cluster of the Sandbox via HCatalog.
The first paragraph is what I would call an Economist lead-in. It captures your attention:
…60 seconds…1,300 new mobile users…100,000 new tweets…minute…Amazon…$83,000…sales.
If the Economist is your regular fare, your pulse rate went up at “1,300 new mobile users” and by the minute/$83,000 you started to tingle. 😉
How to translate that for semantic technologies in general and topic maps in particular?
Remember The Monstrous Cost of Work Failure graphic?
There we read that 58% of employees spend one-half of a workday “filing, deleting, or sorting information.”
Just to simplify the numbers: 58% of employees times one-half of a workday is about 29% of all hours worked, so call it one-quarter (1/4) of your total workforce hours spent on “filing, deleting, or sorting information.”
Divide your current payroll figure by four (4).
Does that result get your attention?
If not, call emergency services. You are dead or having a medical crisis.
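For anyone who wants to check the arithmetic before quoting it to a client, here is a minimal Python sketch. It assumes the 58% figure applies uniformly across the workforce, that the remaining 42% spend no time at all on such tasks, and that payroll is spread evenly over hours worked; none of those assumptions comes from the graphic. The function names and the $10 million payroll are hypothetical, chosen only for illustration.

```python
def unproductive_share(employee_fraction: float, workday_fraction: float) -> float:
    """Fraction of total workforce hours spent filing, deleting, or sorting.

    Assumes everyone outside employee_fraction spends zero time on such tasks.
    """
    return employee_fraction * workday_fraction


def payroll_at_stake(annual_payroll: float, share: float) -> float:
    """Payroll attributable to that share, assuming pay tracks hours evenly."""
    return annual_payroll * share


# Figures from the graphic: 58% of employees, one-half of a workday.
share = unproductive_share(0.58, 0.5)

# Hypothetical payroll, used only to make the numbers concrete.
example_payroll = 10_000_000

exact = payroll_at_stake(example_payroll, share)
shortcut = example_payroll / 4  # the "divide by four" simplification

print(f"Share of workforce hours: {share:.0%}")
print(f"Payroll at stake (exact share): ${exact:,.0f}")
print(f"Payroll at stake (divide by four): ${shortcut:,.0f}")
```

On a $10 million payroll the exact share works out to roughly $2.9 million and the divide-by-four shortcut to $2.5 million, so the shortcut understates the figure rather than inflating it.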
Use that payroll division as:
A positive: topic maps can help you recapture some of that 1/4 of your payroll, or
A negative: topic maps can help you stem the bleeding from non-productive activity,
depending on which will be more effective with a particular client.
BTW, do read Cheryle’s post.
Hadoop’s capabilities are more limited by your imagination than by any theoretical CS limit.