Archive for the ‘Web Analytics’ Category

Data Scientist Solution Kit

Friday, March 7th, 2014

Data Scientist Solution Kit

From the post:

The explosion of data is leading to new business opportunities that draw on advanced analytics and require a broader, more sophisticated skills set, including software development, data engineering, math and statistics, subject matter expertise, and fluency in a variety of analytics tools. Brought together by data scientists, these capabilities can lead to deeper market insights, more focused product innovation, faster anomaly detection, and more effective customer engagement for the business.

The Data Science Challenge Solution Kit is your best resource to get hands-on experience with a real-world data science challenge in a self-paced, learner-centric environment. The free solution kit includes a live data set, a step-by-step tutorial, and a detailed explanation of the processes required to arrive at the correct outcomes.

Data Science at Your Desk

The Web Analytics Challenge includes five sections that simulate the experience of exploring, then cleaning, and ultimately analyzing web log data. First, you will work through some of the common issues a data scientist encounters with log data and data in JSON format. Second, you will clean and prepare the data for modeling. Third, you will develop an alternate approach to building a classifier, with a focus on data structure and accuracy. Fourth, you will learn how to use tools like Cloudera ML to discover clusters within a data set. Finally, you will select an optimal recommender algorithm and extract ratings predictions using Apache Mahout.
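As a rough sketch of the first two steps, here is what cleaning JSON-formatted web log records might look like in Python. The field names (`url`, `timestamp`) are hypothetical, not taken from the challenge data:

```python
import json

def clean_log_records(lines):
    """Parse JSON web log lines, dropping malformed records and
    normalizing the fields a model might need."""
    records = []
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # common issue: truncated or corrupt log lines
        # Hypothetical required fields; the real challenge data will differ.
        if "url" not in rec or "timestamp" not in rec:
            continue
        rec["url"] = rec["url"].strip().lower()
        records.append(rec)
    return records

raw = [
    '{"url": " /Home ", "timestamp": 1393977600}',
    'not json at all',
    '{"timestamp": 1393977601}',
]
print(clean_log_records(raw))  # only the first record survives
```

Dropping bad lines silently is a simplification; in practice you would count and inspect them, since malformed records are often the most informative part of log data.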

With the ongoing confusion about what it means to be a “data scientist,” having a certification or two isn’t going to hurt your chances for employment.

And you may learn something in the bargain. 😉

Using Neo4J for Website Analytics

Saturday, January 25th, 2014

Using Neo4J for Website Analytics by Francesco Gallarotti.

From the post:

Working at the office customizing and installing different content management systems (CMS) for some of our clients, I have seen different ways of tracking users and then using the collected data to:

  1. generate analytics reports
  2. personalize content

I am not talking about simple Google Analytics data. I am referring to ways to map users into predefined personas and then modify the content of the site based on what that persona is interested in.

Interesting discussion of tracking users for web analytics with a graph database.

Not NSA grade tracking because users are collapsed into predefined personas. Personas limit the granularity of your tracking.

On the other hand, if that is all the granularity that is required, personas allow you to avoid a lot of “merge” statements that test for the prior existence of a user in the graph.

Depending on the circumstances, I would create a new node for each visit by a user, reasoning that it is quicker to stream the data in and combine nodes for specific users later, if desired, defining “personas” on the fly from the pages visited and ignoring individual users.
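The difference can be sketched in Cypher terms (node labels and properties here are hypothetical, shown as generated query strings): MERGE must test whether the user node already exists on every hit, while CREATE appends a fresh visit node unconditionally.

```python
def merge_user_visit(user_id, page):
    """Per-user tracking: MERGE checks for a prior User node on every hit."""
    return (
        f"MERGE (u:User {{id: '{user_id}'}}) "
        f"CREATE (u)-[:VISITED]->(:Page {{url: '{page}'}})"
    )

def create_visit(page):
    """Visit-per-node streaming: no existence check, just append a node."""
    return f"CREATE (:Visit {{url: '{page}'}})"

print(merge_user_visit("u42", "/pricing"))
print(create_visit("/pricing"))
```

The streaming variant trades query-time work for write-time simplicity: the existence checks are deferred to a later batch step, run only over the users you actually care about.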

My thinking: I can always ignore granularity I don’t need, but granularity, once lost, is lost forever.

…Creating Reliable Billion Page View Web Services

Thursday, August 9th, 2012

High Scalability reports in 3 Tips and Tools for Creating Reliable Billion Page View Web Services an article by Amir Salihefendic that suggests:

  • Realtime monitor everything
  • Be proactive
  • Be notified when crashes happen

These are three tips to follow in the hunt for a reliable billion-page-view web service.
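The third tip can start as simply as a wrapper that reports any unhandled exception before re-raising it. This is a minimal sketch; the `notify` hook is a stand-in for whatever paging or email service you actually use:

```python
import functools
import traceback

def notify(message):
    # Stand-in for a real alerting channel (email, pager, chat webhook).
    print(f"ALERT: {message}")

def report_crashes(func):
    """Push a notification with the traceback, then re-raise the exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            notify(f"{func.__name__} crashed:\n{traceback.format_exc()}")
            raise
    return wrapper

@report_crashes
def handle_request(path):
    if path == "/boom":
        raise ValueError("bad request")
    return "200 OK"
```

Note the wrapper re-raises rather than swallowing the error: you want to be told about crashes, not have them silently papered over.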

I’m a few short of that number but it was still an interesting post. 😉

And you can’t ever tell, might snag a client that is more likely to reach those numbers.

Probabilistic Data Structures for Web Analytics and Data Mining

Friday, July 27th, 2012

Probabilistic Data Structures for Web Analytics and Data Mining by Ilya Katsov.

Speaking of scalability, consider:

Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in the areas like web analytics and Internet advertising. Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce. This approach often leads to heavyweight high-latency analytical processes and poor applicability to realtime use cases. On the other hand, when one is interested only in simple additive metrics like total page views or average price of conversion, it is obvious that raw data can be efficiently summarized, for example, on a daily basis or using simple in-stream counters. Computation of more advanced metrics like the number of unique visitors or most frequent items is more challenging and requires a lot of resources if implemented straightforwardly. In this article, I provide an overview of probabilistic data structures that allow one to estimate these and many other metrics and trade precision of the estimations for the memory consumption. These data structures can be used both as temporary data accumulators in query processing procedures and, perhaps more important, as a compact – sometimes astonishingly compact – replacement of raw data in stream-based computing.

For some subjects, we have probabilistic identifications, based upon data that is too voluminous, or arriving too rapidly, to allow for a “definitive” identification.

The techniques introduced here will give you a grounding in data structures to deal with those situations. Interesting reading.
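For a taste of the approach, here is a minimal Count-Min sketch in Python, aimed at the “most frequent items” problem from the quote: fixed memory regardless of stream size, with estimates that can overcount slightly but never undercount. (This is an illustrative sketch, not Katsov's code.)

```python
import hashlib

class CountMinSketch:
    """Fixed-size frequency estimator. Estimates never undercount, and
    overcount by a bounded amount with high probability."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _positions(self, item):
        # One hash-derived column per row; sha256 keyed by row index.
        for row in range(self.depth):
            h = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, item):
        for row, col in self._positions(item):
            self.table[row][col] += 1

    def estimate(self, item):
        # The minimum across rows discards most collision noise.
        return min(self.table[row][col] for row, col in self._positions(item))

sketch = CountMinSketch()
for _ in range(1000):
    sketch.add("/home")
sketch.add("/contact")
print(sketch.estimate("/home"))  # at least 1000, regardless of collisions
```

The whole structure is `width × depth` counters, here 1,024 integers, whether the stream holds a thousand page views or a billion. That is the “astonishingly compact” trade Katsov describes.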

I saw this in Christophe Lalanne’s Bag of Tweets for July 2012.

All Presentation Software is Broken

Thursday, May 17th, 2012

All Presentation Software is Broken by Ilya Grigorik.

From the post:

Whenever the point I’m trying to make lacks clarity, I often find myself trying to dress it up: fade in the points, slide in the chart, make prettier graphics. It is a great tell when you catch yourself doing it. Conversely, I have yet to see a presentation or a slide that could not have been made better by stripping the unnecessary visual dressing. Simple slides require hard work and a higher level of clarity and confidence from the presenter.

All presentation software is broken. Instead of helping you become a better speaker, we are competing on the depth of transition libraries, text effects, and 3D animations. Prezi takes the trophy. As far as I can tell, it is optimized for precisely one thing: generating nausea.

Next Presentation Platform: Browser

If you want your message to travel, then the browser is your (future) presentation platform of choice. No proprietary formats, no conversion nightmares, instant access from billions of devices, easy sharing, and more. Granted, the frameworks and the authoring tools are still lacking, but that is only a matter of time.

Unfortunately, we are off to a false start. Instead of trying to make the presenter more effective, we are too busy trying to replicate the arsenal of useless visual transitions with the HTML5, CSS3 and WebGL stacks. Spinning WebGL cubes and CSS transitions make for a fun technology demo but add zero value – someone, please, stop the insanity. We have web connectivity, ability to build interactive slides, and get realtime feedback and analytics from the audience. There is nothing to prove by imitating the broken features of PowerPoint and Keynote, let’s leverage the strengths of the web platform instead. (emphasis added)

Imagine that. Testing your slides. Sounds like testing software before it is released to paying customers.

Test your slides on a real audience before a conference or meeting with your board or important client. What a novel concept.

By “real audience” I mean someone other than yourself or one of your office mates.

When you are tempted to say, “they just don’t understand….,” substitute, “I didn’t explain …. well.” (Depends on whether you want to feel smart or be an effective communicator. Your call.)

Presentation software isn’t fixable.

Presenters on the other hand, maybe.

But you have to fix yourself, no one can do it for you.

The 2015 Digital Marketing Rule Book. Change or Perish.

Monday, January 9th, 2012

The 2015 Digital Marketing Rule Book. Change or Perish.

Avinash Kaushik writes:

It is the season to be predicting the future, but that is almost always a career-limiting move. So I’m not going to do that.

It is a lot easier to predict the present. So I’m not going to do that either.

Rather, I’m going to share a clump of realities/rules garnered from the present to help ready you for the predictable near future. Now here is the great part… if you follow these rules and act on these insights I believe you’ll be significantly better prepared for the unpredictable future.

Awesome right?

Now here’s another surprise: These rules/insights/mind shifts are not about data!

He covers a lot of interesting ground to conclude:

Do you agree with my learning that our primary problem is not web analytics/data but, rather, it is unimaginative web strategies?

My “take away” was much earlier in his post:

All while constantly optimizing your portfolio via controlled experiments.
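“Controlled experiments” here means A/B tests, and evaluating one need not be elaborate. A minimal sketch using a two-proportion z-test, with made-up conversion numbers:

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates,
    using the pooled-proportion standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: variant B converts 120/1000 vs. A's 100/1000.
z = two_proportion_z(100, 1000, 120, 1000)
print(round(z, 2))  # |z| < 1.96, so not significant at the 5% level
```

The arithmetic is the easy part; the hard part, per Kaushik, is having an imaginative enough strategy to know which experiments are worth running.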

For me the primary problem is two-fold:

  • web analytics/data as understood by management (not the users they are trying to reach), and
  • unimaginative web strategies

How can you have an imaginative, or even intelligible, web strategy until you understand users’ behavior and their understanding of the data?

See my post on testing relevance tuning with the top ten actresses for 2011 as an example of questioning web analytics.