Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 26, 2012

Simple tools for building a recommendation engine

Filed under: Dataset,R,Recommendation — Patrick Durusau @ 6:31 pm

Simple tools for building a recommendation engine by Joseph Rickert.

From the post:

Revolution’s resident economist, Saar Golde, is very fond of saying that “90% of what you might want from a recommendation engine can be achieved with simple techniques”. To illustrate this point (without doing a lot of work), we downloaded the million row movie dataset from www.grouplens.org with the idea of just taking the first obvious exploratory step: finding the good movies. Three zipped up .dat files comprise this data set. The first file, ratings.dat, contains 1,000,209 records of UserID, MovieID, Rating, and Timestamp for 6,040 users rating 3,952 movies. Ratings are whole numbers on a 1 to 5 scale. The second file, users.dat, contains the UserID, Gender, Age, Occupation and Zip-code for each user. The third file, movies.dat, contains the MovieID, Title and Genre associated with each movie.

Curious, if a topic map engine performed 90% of the possible merges in a topic map, would that be enough?

Would your answer differ if the topic map had fewer than 10,000 topics and associations versus a topic map with 100 million topics and associations?

Would your answer differ based on a timeline of the data? Say the older the data, the less reliable the merging: for recent medical data (up to ten years old), an error rate < 1%; for data ten to twenty years old, an error rate <= 10%; for data more than twenty years old, best efforts. Which of course raises the question of how you would test for conformance to such requirements?
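
One way to think about such a conformance test, sketched in Python under stated assumptions: suppose you can draw a sample of merge decisions, each labeled with the age of the data in years and with whether a human reviewer judged the merge correct. Everything here is hypothetical (the tier names, the `check_conformance` function, the sample data, and the choice to put data exactly ten or twenty years old in the younger tier). The sketch only computes the observed error rate per age tier and compares it with the limits above. It says nothing about how large a sample you would need to trust the result.

```python
import operator

# (tier name, maximum age in years, comparison, error-rate limit)
# A comparison of None means "best efforts": nothing to conform to.
TIERS = [
    ("0-10 years", 10, operator.lt, 0.01),   # error rate < 1%
    ("10-20 years", 20, operator.le, 0.10),  # error rate <= 10%
    ("20+ years", float("inf"), None, None),
]


def check_conformance(samples):
    """Report the observed error rate and pass/fail status per age tier.

    samples: iterable of (age_years, merge_was_correct) pairs.
    Returns {tier name: (error_rate, passed)}; passed is None when the
    tier has no limit or no samples.
    """
    stats = {name: [0, 0] for name, _max_age, _cmp, _limit in TIERS}
    for age_years, merge_was_correct in samples:
        for name, max_age, _cmp, _limit in TIERS:
            if age_years <= max_age:
                stats[name][1] += 1          # total merges seen in tier
                if not merge_was_correct:
                    stats[name][0] += 1      # incorrect merges in tier
                break
    report = {}
    for name, _max_age, compare, limit in TIERS:
        errors, total = stats[name]
        rate = errors / total if total else None
        if compare is None or rate is None:
            passed = None
        else:
            passed = compare(rate, limit)
        report[name] = (rate, passed)
    return report


# Made-up sample: 200 recent merges with 1 error, 10 mid-age merges
# with 2 errors, 5 old merges with 3 errors.
samples = (
    [(2, True)] * 199 + [(2, False)]
    + [(15, True)] * 8 + [(15, False)] * 2
    + [(30, True)] * 2 + [(30, False)] * 3
)
report = check_conformance(samples)
```

The hard part is not this arithmetic but obtaining trustworthy "was this merge correct?" labels in the first place.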
