jstepien | 14 years ago
I've used methods known as collaborative filtering, whose goal is to estimate how a given user would rate a given item based on the known preferences of other users with similar interests. The initial scope included a naïve Bayesian classifier and a technique called Slope One [1]. The latter is particularly interesting because, according to its authors, it makes very good estimates in a very short time using only a simple linear model. The preprocessing is expensive in both time and space, though, as it requires building a matrix of deviations between rated items.
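Slope One is simple enough to sketch in a few lines. This is a toy illustration of the scheme from the paper, not the code from my coursework; all names and the data shape (`{user: {item: rating}}`) are my own:

```python
from collections import defaultdict

def build_deviations(ratings):
    """ratings: {user: {item: rating}} -> (dev, freqs), where dev[i][j]
    is the average rating deviation of item i over item j. Building this
    item-by-item matrix is the expensive preprocessing step."""
    diffs = defaultdict(lambda: defaultdict(float))
    freqs = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, ri in user_ratings.items():
            for j, rj in user_ratings.items():
                if i != j:
                    diffs[i][j] += ri - rj
                    freqs[i][j] += 1
    dev = {i: {j: diffs[i][j] / freqs[i][j] for j in diffs[i]} for i in diffs}
    return dev, freqs

def predict(user_ratings, item, dev, freqs):
    """Weighted Slope One estimate of `item` for one user: each of the
    user's known ratings votes for (rating + deviation), weighted by
    how many users rated both items."""
    num = den = 0.0
    for j, rj in user_ratings.items():
        if item in dev and j in dev[item]:
            n = freqs[item][j]
            num += (rj + dev[item][j]) * n
            den += n
    return num / den if den else None
```

Prediction itself is just a weighted average over the precomputed deviations, which is why the authors can claim it's fast at query time.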
After reducing the data set to a single subreddit and filtering out users who weren't avid voters, I ran the algorithms, and after some tuning I was very pleased to see promising ROC curves and decent AUC values. Models built around NBC and Slope One achieved comparable results on metrics such as precision, recall and F-measure.
When I went to discuss the results with the professor teaching the class, I heard: "That's indeed promising, but how about comparing those results with a really naïve model which just takes the average of a given user's existing votes?" Guess what: the model built with a single call to the avg function was nearly as good as the NBC and Slope One models.
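For contrast, the professor's baseline really is a one-liner (a sketch, assuming the same hypothetical `{user: {item: rating}}` shape; not the actual coursework code):

```python
def user_mean_baseline(ratings):
    """The 'really naïve' model: every prediction for a user is just
    that user's mean vote. One pass over the data, no deviation matrix."""
    return {u: sum(r.values()) / len(r) for u, r in ratings.items()}
```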
Now I understand why the guys from Reddit are looking for external help with the recommender. It's a far less obvious task than it might seem.
[1] http://lemire.me/fr/documents/publications/lemiremaclachlan_...
Edit: s/machine learning/data mining/
_dps | 14 years ago
1) Say you estimate (as you propose) that a user will always give their average rating. This might get you good-ish error and ROC as a prediction task, but it will have zero recommendation value, because the prediction for a given user is constant across all possible recommendations.
2) Say you estimate that a user will give the average score that the item has received across all users. Again, possibly good-ish in terms of prediction ROC and RMS error, but this offers no personalization (all users get the same predictions, i.e. you're basically just showing the default Reddit ranking).
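To make the two points concrete, here's a toy sketch (hypothetical helpers, with ratings stored as `{user: {item: rating}}`):

```python
def user_mean(ratings, user):
    """Baseline 1: a user's mean rating. Note the result does not
    depend on the item at all, so it's identical for every candidate."""
    r = ratings[user]
    return sum(r.values()) / len(r)

def item_mean(ratings, item):
    """Baseline 2: an item's mean rating across users. Note the result
    does not depend on the user, so every user gets the same ranking."""
    votes = [r[item] for r in ratings.values() if item in r]
    return sum(votes) / len(votes)
```

Ranking candidate items by the first function is a tie across the board; ranking by the second gives all users the same list, i.e. roughly the default front page.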
Both of these baselines are vastly inferior, in terms of recommendation value, to even really stupid models like "how many times have I upvoted stories from this submitter", but the latter is (if I recall from my own experiments) much worse when evaluated on overall ROC.
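That "stupid" model could be as small as this (a hypothetical sketch; `submitter_of` maps story id to submitter, and all names are made up):

```python
from collections import Counter

def submitter_score(upvoted_stories, submitter_of, story):
    """Score a candidate story by how often this user has previously
    upvoted stories from the same submitter."""
    counts = Counter(submitter_of[s] for s in upvoted_stories)
    return counts[submitter_of[story]]
```

Unlike the two baselines, this at least varies across candidate items for a given user, so it can reorder a feed.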
I would strongly suspect that a correctly implemented NB or S1 would vastly outperform either of the two baselines in terms of actual recommendation utility (even though when you look at the baseline's ability to predict actual numbers, they might be comparably good in an RMS sense).
The moral of the story: one must be very careful when trying to quantify the performance of learning systems; actual utility is often difficult to evaluate merely by looking at standard statistical measures of accuracy.
greendestiny | 14 years ago
Turns out increasing the dimensionality of the input 17 thousand times just reduces the amount of training data available for each attribute. Duh :)