Archive for the ‘Personalization’ Category

The Power of Big Data and Psychographics [Fact Checking]

Thursday, February 2nd, 2017

From the description:

In a 10 minute presentation at the 2016 Concordia Summit, Mr. Alexander Nix discusses the power of big data in global elections. Cambridge Analytica’s revolutionary approach to audience targeting, data modeling, and psychographic profiling has made them a leader in behavioral microtargeting for election processes around the world.

A highly entertaining but deceptive presentation on the state of the art for marketing political candidates.

Nix claims that most marketing companies base their advertising on demographics and geographics, sending the same message to all women, all African-Americans, etc.

Worse than a “straw man,” that’s simply false. If you know the work Selling Blue Elephants by Howard Moskowitz and Alex Gofman, then you know that marketers tweak their pitches to very small market slices.

But you don’t need to find a copy of Selling Blue Elephants or take my word for that. On your next visit to the grocery store see for yourself how many variations of a popular shampoo or spaghetti sauce are offered. Each one is calculated to attract a particular niche of the overall market.

Nix goes on to describe advertising in the 1960s as “top down,” “hope messages resonate,” etc.

Not only is that another false claim, but the application described by Nix was pioneered for the 1960 presidential campaign.

Ithiel de Sola Pool, with others, developed the Simulmatics program for the computation of a great variety of factors thought to influence voting, for specific use in the 1960 presidential election. A multitude of influences can be introduced into the program, together with modifications of a strategic nature, and the results bear on both prediction and choice of strategy, much in the manner that elaborate market research influences business decisions on the manufacture and sale of a new product. The Simulmatics project assembled a basic matrix of voter types and “issue clusters” (480 of the former and 52 of the latter, making a total of 24,960 cells), consolidating as values the accumulated archives of polling on all kinds of questions. The records of the Roper Public Opinion Research Center at Williamstown were used as source material. With no data later than 1958, the simulation achieved a correlation by states of .82 with the actual Kennedy vote.

(“The Mathematical Approach to Political Science” by Oliver Benson, in Contemporary Political Analysis, edited by James C. Charlesworth, The Free Press, 1967, at pp. 129-130)
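Benson's description of the Simulmatics matrix can be sketched in a few lines of code. The structure is simply a 480 × 52 grid of cells, with a strategy expressed as weights over the issue clusters; the cell values and the scoring function below are hypothetical illustrations, not the project's actual data or method.

```python
# A minimal sketch of the Simulmatics-style matrix described above:
# 480 voter types x 52 issue clusters = 24,960 cells, each holding
# accumulated polling values. The random values and the scoring
# function are stand-ins for illustration only.
import random

N_VOTER_TYPES = 480
N_ISSUE_CLUSTERS = 52

random.seed(1960)

# Each cell: fraction of that voter type leaning toward the candidate
# on that issue cluster (a stand-in for the archived polling data).
matrix = [
    [random.random() for _ in range(N_ISSUE_CLUSTERS)]
    for _ in range(N_VOTER_TYPES)
]

n_cells = N_VOTER_TYPES * N_ISSUE_CLUSTERS  # 24,960 cells, as in the quote

# A campaign "strategy" weights the issue clusters it chooses to stress;
# the predicted share for a voter type is the weighted mean of its cells.
def predicted_share(voter_type, issue_weights):
    row = matrix[voter_type]
    total = sum(issue_weights)
    return sum(w * v for w, v in zip(issue_weights, row)) / total

uniform = [1.0] * N_ISSUE_CLUSTERS
share = predicted_share(0, uniform)
print(n_cells, round(share, 3))
```

Varying the weights is the 1960 analogue of testing alternative messaging strategies against the same voter-type matrix.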

I’ll grant that Nix has more data at his disposal and techniques have changed in the last fifty-seven (57) years, but there’s no legitimate reason to not credit prior researchers in the field.

PS: If you find a hard (or scanned) copy of The Simulmatics Project by Ithiel de Sola Pool, let me know.

Israel, Gaza, War & Data…

Wednesday, August 6th, 2014

Israel, Gaza, War & Data – social networks and the art of personalizing propaganda by Gilad Lotan.

From the post:

It’s hard to shake away the utterly depressing feeling that comes with news coverage these days. IDF and Hamas are at it again, a vicious cycle of violence, but this time it feels much more intense. While war rages on the ground in Gaza and across Israeli skies, there’s an all-out information war unraveling in social networked spaces.

Not only is there much more media produced, but it is coming at us at a faster pace, from many more sources. As we construct our online profiles based on what we already know, what we’re interested in, and what we’re recommended, social networks are perfectly designed to reinforce our existing beliefs. Personalized spaces, optimized for engagement, prioritize content that is likely to generate more traffic; the more we click, share, like, the higher engagement tracked on the service. Content that makes us uncomfortable is filtered out.
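The engagement loop Lotan describes can be reduced to a toy ranking function: score each item by its overlap with what the user has already clicked, so agreeable content rises and uncomfortable content sinks. All names and tags below are hypothetical illustrations, not anyone's actual feed algorithm.

```python
# A toy sketch of an engagement-optimized feed: items similar to the
# user's click history are ranked higher, reinforcing existing beliefs.
from collections import Counter

def score(item_tags, click_history):
    """Predicted engagement: overlap between item tags and past clicks."""
    return sum(click_history[t] for t in item_tags)

clicks = Counter()
items = [
    {"id": "a", "tags": {"side_x", "outrage"}},
    {"id": "b", "tags": {"side_y", "context"}},
]

# The user clicks items matching their existing view...
for _ in range(5):
    clicks.update({"side_x": 1})

# ...so the next ranking pushes the agreeable item to the top.
ranked = sorted(items, key=lambda it: score(it["tags"], clicks), reverse=True)
print([it["id"] for it in ranked])  # ['a', 'b']
```

The feedback is the point: every click strengthens the very signal that decides what gets shown next.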

You are familiar with the “oooh” and “aaah” social network graphs. Interesting but too dense in most cases to be useful.

The first thing you will notice about Gilad’s post is that he is making effective use of fairly dense social network graphs. The second thing you will notice is the post is one of the relatively few that can be considered sane on the topic of Israel and Gaza. It is worth reading for its sanity if nothing else.

Gilad argues algorithms are creating information cocoons about us “…where never is heard a discouraging word…” or at least any that we would find disagreeable.

Social network graphs are used to demonstrate such information cocoons for the IDF and Hamas and to show possible nodes that may be shared by those cocoons.
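A node "shared by those cocoons" is, in graph terms, one with edges into both communities. Here is a minimal sketch of finding such bridge nodes; the edge list and community labels are hypothetical illustrations, not Gilad's data.

```python
# Find "bridge" nodes between two information cocoons: nodes whose
# neighbors span more than one community.
edges = [
    ("a1", "a2"), ("a2", "a3"),   # cocoon A
    ("b1", "b2"), ("b2", "b3"),   # cocoon B
    ("m", "a1"), ("m", "b1"),     # "m" follows accounts in both
]
community = {"a1": "A", "a2": "A", "a3": "A",
             "b1": "B", "b2": "B", "b3": "B"}

def bridge_nodes(edges, community):
    reach = {}  # node -> set of communities its neighbors belong to
    for u, v in edges:
        for node, other in ((u, v), (v, u)):
            if other in community:
                reach.setdefault(node, set()).add(community[other])
    return sorted(n for n, cs in reach.items() if len(cs) > 1)

print(bridge_nodes(edges, community))  # ['m']
```

On a real graph like the one in Gilad's post, nodes of this kind are the candidates for bridging the two cocoons.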

I encourage you to read Gilad’s post as an illustration of good use of social network graphics, an interesting analysis of bridging information cocoons and a demonstration that relatively even-handed reporting remains possible.

I first saw this in a tweet by Wandora which read: “Thinking of #topicmaps and #LOD.”

Realtime Personalization/Recommendation

Friday, May 30th, 2014

Realtime personalization and recommendation with stream mining by Mikio L. Braun.

From the post:

Last Tuesday, I gave a talk at this year’s Berlin Buzzwords conference on using stream mining algorithms to efficiently store information extracted from user behavior, to perform personalization and recommendation effectively on a single computer, which is of course the key idea behind streamdrill.

If you’ve been following my talks, you’ll probably recognize a lot of stuff I’ve talked about before, but what is new in this talk is that I tried to take the next step from simply talking about Heavy Hitters and Count-Min Sketches to using these data structures as an approximate storage for all kinds of analytics related data like counts, profiles, or even sparse matrices, as they occur in recommendation algorithms.

I think reformulating our approach as basically an efficient approximate data structure also helped to steer the discussion away from comparing streamdrill to other big data frameworks (“Can’t you just do that in Storm?” — “define ‘just’”). As I said in the talk, the question is not whether you can do it in Big Data Framework X, because you probably could. I have started to look at it from the other direction: we did not use any Big Data framework and were still able to achieve some serious performance numbers.
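For readers who haven't met it, the Count-Min Sketch Braun mentions fits in a few lines: d hash rows of w counters each, where an item's estimated count is the minimum of its d counters, so the estimate can overstate a count (from hash collisions) but never understate it. The parameters below are illustrative, not streamdrill's.

```python
# A minimal Count-Min Sketch: approximate counts in fixed memory.
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One independent-ish hash per row, derived by salting with the
        # row index.
        for i in range(self.depth):
            h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8)
            yield i, int.from_bytes(h.digest(), "big") % self.width

    def add(self, item, count=1):
        for i, b in self._buckets(item):
            self.rows[i][b] += count

    def estimate(self, item):
        # Minimum over rows: collisions only inflate, never deflate.
        return min(self.rows[i][b] for i, b in self._buckets(item))

cms = CountMinSketch()
for _ in range(100):
    cms.add("popular")
cms.add("rare")
print(cms.estimate("popular"), cms.estimate("rare"))
```

Keeping heavy hitters is then just a matter of tracking the items whose estimates exceed a threshold, which is what makes the structure a cheap substitute for exact per-item counters in stream mining.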

Slides and video are available at this page.

Big Data: Main Research/Business Challenges Ahead?

Wednesday, November 20th, 2013

Big Data Analytics at Thomson Reuters. Interview with Jochen L. Leidner by Roberto V. Zicari.

In case you don’t know, Jochen L. Leidner has the title: “Lead Scientist of the London R&D at Thomson Reuters.”

Which goes a long way to explaining the importance of this Q&A exchange:

Q12 What are the main research challenges ahead? And what are the main business challenges ahead?

Jochen L. Leidner: Some of the main business challenges are the cost pressure that some of our customers face, and the increasing availability of low-cost or free-of-charge information sources, i.e. the commoditization of information. I would caution here that whereas the amount of information available for free is large, this in itself does not help you if you have a particular problem and cannot find the information that helps you solve it, either because the solution is not there despite the size, or because it is there but findability is low. Further challenges include information integration, making systems ever more adaptive, but only to the extent it is useful, or supporting better personalization. Having said this, sometimes systems need to be run in a non-personalized mode (e.g. in the field of e-discovery, you need to have a certain consistency, namely that the same legal search system retrieves the same things today and tomorrow, and to different parties).

How are you planning to address:

  1. The required information is not available in the system. A semantic 404, as it were, as distinguished from the case where the information is there but the wrong search terms are in use.
  2. Low findability.
  3. Information integration (not normalization)
  4. System adaptability/personalization, but to users and not developers.
  5. Search consistency, same result tomorrow as today.


The rest of the interview is more than worth your time.

I singled out the research/business challenges as a possible map forward.

We all know where we have been.

The Filter Bubble: Algorithm vs. Curator & the Value of Serendipity

Monday, May 16th, 2011

The Filter Bubble: Algorithm vs. Curator & the Value of Serendipity by Maria Popova.

Covers the same TED presentation that I mention at On the dangers of personalization but with the value-add that Maria both interviews Eli Pariser and talks about his new book, The Filter Bubble.

I remain untroubled by filtering.

We filter the information we give others around us.

Advertisers filter the information they present in commercials.

For example, I don’t recall any Toyota ads that end with: Buy a Toyota ****, your odds of being in a recall are 1 in ***. That’s filtering.

Two things would increase my appreciation for Google filtering:

First, much better filtering, where I can choose narrow-band filter(s) based on my interests.

Second, the ability to turn the filters off at my option.
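The two wishes above amount to a very small API: user-chosen narrow-band filters, plus a master switch that bypasses them entirely. The function and field names below are hypothetical illustrations, not anything Google offers.

```python
# A toy sketch of user-controlled filtering: narrow-band interest
# filters the user picks, and an off switch the user controls.
def filter_results(results, interest_filters, filtering_on=True):
    if not filtering_on:           # second wish: turn filtering off entirely
        return results
    return [r for r in results     # first wish: user-selected narrow bands
            if any(tag in r["tags"] for tag in interest_filters)]

results = [
    {"title": "topic maps intro", "tags": {"semantics"}},
    {"title": "celebrity news", "tags": {"gossip"}},
]

on = [r["title"] for r in filter_results(results, {"semantics"})]
off = [r["title"] for r in filter_results(results, set(), filtering_on=False)]
print(on, off)
```

The point of the sketch is who holds the parameters: here both the filter set and the on/off flag belong to the user, not the service.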

You see, I don’t agree that there is information I need to know as determined by someone else.

Here’s an interesting question: What information would you filter from:

On the dangers of personalization

Saturday, May 7th, 2011

On the dangers of personalization

From the post:

We’re getting our search results seriously edited and, I bet, most of us don’t even know it. I didn’t. One Google engineer says that their search engine uses 57 signals to personalize your search results, even when you’re logged out.

Do we really want to live in a web bubble?

What I find interesting about this piece is that it describes a data silo but from the perspective of an individual.

Think about it.

A data silo is based on data that is filtered and stored.

Personalization is based on data that is filtered and presented.

Do you see any difference?