Probabilistic User Modeling in the Presence of Drifting Concepts

Saturday, December 4th, 2010

Probabilistic User Modeling in the Presence of Drifting Concepts. Author(s): Vikas Bhardwaj, Ramaswamy Devarajan

Abstract:

We investigate supervised prediction tasks which involve multiple agents over time, in the presence of drifting concepts. The motivation behind choosing the topic is that such tasks arise in many domains which require predicting human actions. An example of such a task is recommender systems, where it is required to predict the future ratings, given features describing items and context along with the previous ratings assigned by the users. In such a system, the relationships among the features and the class values can vary over time. A common challenge to learners in such a setting is that this variation can occur both across time for a given agent, and also across different agents, (i.e. each agent behaves differently). Furthermore, the factors causing this variation are often hidden. We explore probabilistic models suitable for this setting, along with efficient algorithms to learn the model structure. Our experiments use the Netflix Prize dataset, a real world dataset which shows the presence of time variant concepts. The results show that the approaches we describe are more accurate than alternative approaches, especially when there is a large variation among agents. All the data and source code would be made open-source under the GNU GPL.

Interesting for two reasons: concepts vary not only over time but also from user to user, and modeling users as existing in neighborhoods of other users was more accurate than either purely homogeneous or purely heterogeneous models.
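To make the three modeling choices concrete, here is a minimal sketch of my own. It is not the authors' probabilistic model and it ignores drift over time entirely. It only contrasts a homogeneous predictor (one model for everyone), a heterogeneous predictor (one model per user), and a neighborhood predictor (pool the most similar users). The toy ratings, the function names, the distance measure, and the choice of k=1 are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical toy data (not from the Netflix Prize set): user -> {item: rating}
ratings = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 5, "b": 5, "c": 2},
    "cat": {"a": 1, "b": 2, "c": 5},
    "dan": {"a": 2, "b": 1, "c": 4},
}

def homogeneous(user, item):
    """One shared model: average the item's ratings over all other users."""
    return mean(r[item] for u, r in ratings.items() if u != user and item in r)

def heterogeneous(user, item):
    """One model per user: average only that user's own other ratings."""
    return mean(v for i, v in ratings[user].items() if i != item)

def distance(u, v, skip):
    """Mean absolute rating difference on co-rated items, excluding the target item.
    Assumes the two users share at least one other item (true for the toy data)."""
    shared = [i for i in ratings[u] if i in ratings[v] and i != skip]
    return mean(abs(ratings[u][i] - ratings[v][i]) for i in shared)

def neighborhood(user, item, k=1):
    """Average the item's ratings over the k most similar other users."""
    others = [v for v in ratings if v != user and item in ratings[v]]
    nearest = sorted(others, key=lambda v: distance(user, v, item))[:k]
    return mean(ratings[v][item] for v in nearest)

# Predict ann's rating of item "c" (her actual rating is 1) three ways.
for predict in (homogeneous, heterogeneous, neighborhood):
    print(predict.__name__, predict("ann", "c"))
```

On this hand-built data the neighborhood predictor lands closest to ann's actual rating, because bob rates the other items much as she does while cat and dan do not. That is an artifact of how the toy data was constructed, not evidence for the paper's result.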

Questions:

  1. If there is a “neighborhood” effect on users, what, if anything, does that imply for co-occurrence of terms? (3-5 pages, no citations)
  2. How would you determine “neighborhood” boundaries for terms? (3-5 pages, citations)
  3. Do “neighborhoods” for terms vary by semantic domains? (3-5 pages, citations)

*****
Be aware that the Netflix dataset is no longer available, possibly in response to privacy concerns. A demonstration of the utility of such concerns and their advocates.