Archive for the ‘Exploratory Data Analysis’ Category

Facebook teaches you exploratory data analysis with R

Monday, May 12th, 2014

Facebook teaches you exploratory data analysis with R by David Smith.

From the post:

Facebook is a company that deals with a lot of data — more than 500 terabytes a day — and R is widely used at Facebook to visualize and analyze that data. Applications of R at Facebook include user behaviour, content trends, human resources and even graphics for the IPO prospectus. Now, four R users at Facebook (Moira Burke, Chris Saden, Dean Eckles and Solomon Messing) share their experiences using R at Facebook in a new Udacity on-line course, Exploratory Data Analysis.

The more data you explore, the better data explorer you will be!


I first saw this in a post by David Smith.

Findability and Exploration:…

Monday, February 24th, 2014

Findability and Exploration: the future of search by Stijn Debrouwere.

From the introduction:

The majority of people visiting a news website don’t care about the front page. They might have reached your site from Google while searching for a very specific topic. They might just be wandering around. Or they’re visiting your site because they’re interested in one specific event that you cover. This is big. It changes the way we should think about news websites.

We need ambient findability. We need smart ways of guiding people towards the content they’d like to see — with categorization and search playing complementary goals. And we need smart ways to keep readers on our site, especially if they’re just following a link from Google or Facebook, by prickling their sense of exploration.

Pete Bell recently opined that search is the enemy of information architecture. That’s too bad, because we’re really going to need great search if we’re to beat Wikipedia at its own game: providing readers with timely information about topics they care about.

First, we need to understand a bit more about search. What is search?

A classic (2010) statement of the requirements for a “killer” app. I didn’t say “search” app because search might not be a major aspect of its success, at least if you measure success in terms of user satisfaction after using an app.

That satisfaction comes from obtaining the content users want to see. How they got there isn’t important to them.

Practical tools for exploring data and models

Wednesday, April 17th, 2013

Practical tools for exploring data and models by Hadley Alexander Wickham. (PDF)

From the introduction:

This thesis describes three families of tools for exploring data and models. It is organised in roughly the same way that you perform a data analysis. First, you get the data in a form that you can work with; Section 1.1 introduces the reshape framework for restructuring data, described fully in Chapter 2. Second, you plot the data to get a feel for what is going on; Section 1.2 introduces the layered grammar of graphics, described in Chapter 3. Third, you iterate between graphics and models to build a succinct quantitative summary of the data; Section 1.3 introduces strategies for visualising models, discussed in Chapter 4. Finally, you look back at what you have done, and contemplate what tools you need to do better in the future; Chapter 5 summarises the impact of my work and my plans for the future.

The tools developed in this thesis are firmly based in the philosophy of exploratory data analysis (Tukey, 1977). With every view of the data, we strive to be both curious and sceptical. We keep an open mind towards alternative explanations, never believing we have found the best model. Due to space limitations, the following papers only give a glimpse at this philosophy of data analysis, but it underlies all of the tools and strategies that are developed. A fuller data analysis, using many of the tools developed in this thesis, is available in Hobbs et al. (To appear).

The thesis focuses on R tools, including reshape and ggplot2, the latter building on Wilkinson’s The Grammar of Graphics.

The “…never believing we have found the best model” approach works for me!
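The first step the thesis describes, getting data into a form you can work with, can be illustrated with a rough sketch of the melt-then-cast idea behind reshaping. This is plain Python, not the reshape package or its API; the function names `melt` and `cast`, their parameters, and the sample measurements are all assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def melt(rows, id_keys):
    """Turn wide records into long (ids..., variable, value) records."""
    long_rows = []
    for row in rows:
        ids = {k: row[k] for k in id_keys}
        for key, value in row.items():
            if key not in id_keys:
                long_rows.append({**ids, "variable": key, "value": value})
    return long_rows


def cast(long_rows, row_key, aggregate=mean):
    """Rebuild a wide table: one row per row_key value, one column per variable."""
    cells = defaultdict(list)
    for r in long_rows:
        cells[(r[row_key], r["variable"])].append(r["value"])
    table = defaultdict(dict)
    for (row_value, variable), values in cells.items():
        table[row_value][variable] = aggregate(values)
    return dict(table)


# Hypothetical measurements, invented for illustration only.
measurements = [
    {"site": "A", "day": 1, "temp": 20.0, "humidity": 40.0},
    {"site": "A", "day": 2, "temp": 22.0, "humidity": 50.0},
    {"site": "B", "day": 1, "temp": 18.0, "humidity": 60.0},
]

long_form = melt(measurements, id_keys=("site", "day"))
by_site = cast(long_form, row_key="site")
```

Here `long_form` holds one record per measured value, and `by_site` averages those values per site. Keeping the long form as the intermediate step means the same data can be recast by day, or by site and day, without touching the original records.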


I first saw this at Data Scholars.

“Verdict First, Then The Trial”

Sunday, April 15th, 2012

No, not the Trayvon Martin case but rather the lack of “exploratory data analysis” in business environments.

From Business Intelligence Ain’t Over Until Exploratory Data Analysis Sings, where Wayne Kernochan reviews the rise of statistical analysis in businesses and then says:

And yet there is a glaring gap in this picture – or at least a gap that should be glaring. This gap might be summed up as Alice in Wonderland’s “verdict first, then the trial.” Both the business and the researcher start with their own narrow picture of what the customer or research subject should look like, and the analytics and statistics that accompany such hypotheses are designed to narrow in on a solution rather than expand due to unexpected data. Thus, the business/researcher is likely to miss key customer insights, psychological and otherwise.

Pile on top of this the “not invented here” syndrome characteristic of most enterprises, and the “confirmation bias” that recent research has shown to be prevalent among individuals and organizations, and you have a real analytical problem on your hands. (emphasis added)

I don’t know if I would call it “a real analytical problem” so much as I would call it “business as usual.”

There may be a real coming shortage of people who can turn the crank to make the usual analysis come out the other end.

Can you imagine the shortage of people who possess the analytical skills and initiative to do more than the usual analysis?

The ability to recognize when two or more departments have different vocabularies for the same things is one indicator of possible analytical talent.

What are some others? (I am thinking you can also use these to find topic map authors for your business/organization.)