
Generating Recommendations

Certain web applications seem to know us better than we know ourselves. For example, have you ever wondered how Amazon generates its "Customers Who Bought This Item Also Bought" list, or how Google Play lists similar applications in its store? Online shops, blogs, and social networks are in the business of recommending similar items for us to purchase or read, and people for us to connect to. At the core of these applications lies a set of recommendation algorithms which find items that are similar in some sense. This article describes some of these algorithms and gives a practical example of movie recommendation in Python.

Problem Statement

To begin, we need to be clear about the problem recommendation seeks to solve. Let's use the example of movie recommendation, in which there are m users and n movies in the dataset. Each user has watched some movies and rated them from 1 to 5, with 5 being the best rating and 1 the worst. For a particular user, let's call her Alice, the problem of recommendation is to find a movie that Alice will like, given the movies she has previously seen. More generally, there are p movies that Alice has not seen, and we would like to predict her rating for each one and then select the best few to recommend to her. In this sense recommendation produces a ranking of items, of which typically only a few are presented to the user. Of course, the predicted ratings should be close to the actual ratings. The challenge is that a typical user has watched only a tiny fraction of the total number of movies, so the matrix of ratings is mostly empty. More generally, the items being rated can be anything, such as books, music, or news articles.
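
To make this concrete, here is a toy sketch (with made-up numbers) of the ratings-matrix view of the problem: rows are users, columns are movies, and zeros mark unseen movies whose ratings we would like to predict.

import numpy

# Toy ratings matrix: 4 users x 5 movies, 0 means "not seen" (made-up data)
R = numpy.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 0],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
])

# For user 0 ("Alice"), recommendation means predicting ratings for the
# zero entries of her row and suggesting the movies with the best predictions
unseen = numpy.where(R[0] == 0)[0]
print(unseen)  # indices of the candidate movies to rank for Alice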

Recommender systems can be split into three general categories:

  1. Content-based: the user will be recommended items similar to the ones he/she preferred in the past. The notion of similarity is based on the attributes of the items.
  2. Collaborative: the user will be recommended items that people with similar tastes have liked in the past. (A minimal sketch contrasting these two notions of similarity follows this list.)
  3. Hybrid: a combination of the above two strategies.
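
To make the distinction concrete, here is a minimal sketch (with made-up attribute and rating vectors) of the two notions of similarity, using cosine similarity in both cases:

import numpy

def cosine(a, b):
    # Cosine similarity between two vectors
    return a.dot(b) / (numpy.linalg.norm(a) * numpy.linalg.norm(b))

# Content-based: compare items by their attributes (hypothetical genre flags)
item_attrs = numpy.array([[1.0, 0.0, 1.0],   # movie A: action/comedy/drama flags
                          [1.0, 0.0, 0.0]])  # movie B
print(cosine(item_attrs[0], item_attrs[1]))

# Collaborative: compare items by the ratings users gave them
item_ratings = numpy.array([[5.0, 3.0, 0.0, 1.0],   # ratings of movie A by four users
                            [4.0, 3.0, 1.0, 1.0]])  # ratings of movie B by the same users
print(cosine(item_ratings[0], item_ratings[1]))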

Note that often there is "side information" pertaining to users and items, for example demographic information about users. Some recommendation algorithms use this information to improve results.

MovieLens Recommendations

Here we give some practical examples of recommendation using the MovieLens dataset. First, download the MovieLens 100K Dataset (ml-100k.zip). It contains 100,000 movie ratings (1-5) from 943 users on 1682 movies.

The dataset comes with predefined training/test splits; for example, the ua.base/ua.test split is sampled so that the test set contains exactly 10 ratings per user. We load the training set using the pandas read_csv function and put the ratings into a sparse matrix:

import numpy
import pandas
import scipy.sparse

data_shape = (943, 1682)

# data_dir points at the unzipped ml-100k directory
df = pandas.read_csv(data_dir + "ua.base", sep="\t", header=None)
values = df.values
values[:, 0:2] -= 1
X_train = scipy.sparse.csr_matrix((values[:, 2], (values[:, 0], values[:, 1])),
                                  dtype=numpy.float64, shape=data_shape)

Note that the user and item indices in the file are 1-indexed, so we adjust them for Python's zero-indexing. We then create a scipy.sparse matrix from the ratings. An almost identical process is used to read the test ratings in ua.test.
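
For completeness, here is a sketch of the analogous test-set load (assuming the same data_dir and data_shape as above); it produces the X_test matrix used below:

df_test = pandas.read_csv(data_dir + "ua.test", sep="\t", header=None)
values_test = df_test.values
values_test[:, 0:2] -= 1  # convert 1-indexed user/item ids to 0-indexing
X_test = scipy.sparse.csr_matrix((values_test[:, 2], (values_test[:, 0], values_test[:, 1])),
                                 dtype=numpy.float64, shape=data_shape)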

Before we generate recommendations, we first need to preprocess the data to improve results. A useful step is to compute the mean rating of each user and subtract it from that user's ratings. This removes any bias in how individual users rate movies (some rate everything highly, others harshly). Achieving this is slightly laborious, since sparse matrices generally do not have operations that act only on their nonzero elements; the mean() method, for example, acts on all elements. The following code performs this preprocessing step:

# Locate the nonzero (rated) entries and prepare per-user accumulators
train_rows, train_cols = X_train.nonzero()
X_row_mean = numpy.zeros(data_shape[0])
X_row_sum = numpy.zeros(data_shape[0])

# Iterate through nonzero elements to compute sums and counts of row elements
for i in range(train_rows.shape[0]):
    X_row_mean[train_rows[i]] += X_train[train_rows[i], train_cols[i]]
    X_row_sum[train_rows[i]] += 1

# Note that (X_row_sum == 0) is required to prevent division by zero
X_row_mean /= X_row_sum + (X_row_sum == 0)

# Subtract each user's mean rating from their training ratings
for i in range(train_rows.shape[0]):
    X_train[train_rows[i], train_cols[i]] -= X_row_mean[train_rows[i]]

# Apply the same per-user means to the test ratings
test_rows, test_cols = X_test.nonzero()
for i in range(test_rows.shape[0]):
    X_test[test_rows[i], test_cols[i]] -= X_row_mean[test_rows[i]]

Feel free to skip reading this code block, as it is not crucial to understanding the recommendation process.
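
As an aside, the per-user means can also be computed without an explicit Python loop by using sparse matrix operations; a sketch, assuming it is applied to the raw training matrix before the subtraction step:

# Vectorized alternative: sum each row and divide by its count of nonzeros
counts = numpy.diff(X_train.indptr)                # nonzeros per row (CSR layout)
sums = numpy.asarray(X_train.sum(axis=1)).ravel()  # row sums over all entries
means = sums / numpy.maximum(counts, 1)            # guard against empty rows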

Next comes the recommendation itself, for which we use the Singular Value Decomposition (SVD). The SVD is a matrix decomposition that writes a matrix X as the product U S V^T, where U and V have orthonormal columns and S is diagonal with non-negative entries (the singular values). We say that we have a rank-k SVD if U, S and V are truncated to the k largest singular values and their corresponding columns. Here is the code that computes the SVD of the training matrix:

# numpy's SVD operates on dense arrays, so densify the training matrix first
X_dense = X_train.toarray()
U, s, Vt = numpy.linalg.svd(X_dense, full_matrices=False)
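
As a quick sanity check, multiplying the factors back together should reproduce the densified training matrix up to floating-point error; a sketch:

# The full product U S Vt reconstructs the original matrix
reconstruction = U.dot(numpy.diag(s)).dot(Vt)
print(numpy.allclose(X_dense, reconstruction))  # True, up to numerical precision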

The matrix Vt here is simply the transpose of V (i.e. its rows and columns are swapped), and s holds the singular values in decreasing order. Next, we compute approximated rating matrices from this decomposition, with the ranks of the truncations given by the array ks:

from sklearn.metrics import mean_absolute_error

# Ranks to evaluate (this range is an illustrative choice) and the true scores
ks = numpy.arange(1, 51)
train_mae = numpy.zeros(ks.shape[0])
test_mae = numpy.zeros(ks.shape[0])
train_scores = numpy.asarray(X_train[train_rows, train_cols]).ravel()
test_scores = numpy.asarray(X_test[test_rows, test_cols]).ravel()

for j, k in enumerate(ks):
    # Rank-k approximation of the training matrix
    X_pred = U[:, 0:k].dot(numpy.diag(s[0:k])).dot(Vt[0:k, :])

    pred_train_scores = X_pred[(train_rows, train_cols)]
    pred_test_scores = X_pred[(test_rows, test_cols)]

    train_mae[j] = mean_absolute_error(train_scores, pred_train_scores)
    test_mae[j] = mean_absolute_error(test_scores, pred_test_scores)

    print(k, train_mae[j], test_mae[j])

The first line in the for loop computes the rank-k SVD approximation of the training matrix. The following two lines compute the predicted scores by reading off the entries of X_pred at the nonzero positions of the training and test data. Finally, the Mean Absolute Error (MAE) is computed between the real and predicted ratings; the MAE is simply the mean of the absolute differences between them.
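
In code, the MAE reduces to a one-liner; a sketch equivalent to the mean_absolute_error call used above:

def mae(actual, predicted):
    # Mean of the absolute differences between real and predicted ratings
    return numpy.mean(numpy.abs(numpy.asarray(actual) - numpy.asarray(predicted)))

The plot below shows how these errors vary with the rank k.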

Plot of rank vs MAE

We can see that the training error consistently drops as the rank increases, which is to be expected; with a large enough k the training MAE will drop to zero. The test error is lowest at rank 17, with an MAE of 0.792. This is not a bad result; however, there are many other methods that can improve upon it. For example, see the results with MyMediaLite, although they are not directly comparable with the figures above.
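
Given the error arrays from the loop above, picking the best rank is a one-liner; a sketch:

# Index of the rank with the lowest test error
best = numpy.argmin(test_mae)
print("best rank:", ks[best], "test MAE:", test_mae[best])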

The full code for this example is available here.

Summary

We introduced the problem of recommendation, which is ubiquitous in suggesting items, articles, and people in online stores, blogs, and social networks. Item ratings can be predicted by matrix factorisation using the SVD, and we gave a simple example of this with movie ratings from the MovieLens dataset.
