
Benchmarking the Singular Value Decomposition

The Singular Value Decomposition (SVD) is arguably the most useful matrix decomposition there is. It has applications in image compression, recommender systems, text analysis, feature extraction and graph clustering, to name just a few. The SVD is also computationally expensive, and there is increasing demand to compute it for large matrices. To help practitioners make the best choice, in this article we benchmark several approaches to computing the SVD of large sparse matrices.

What is the SVD?

First, let's define the SVD. For a matrix \(\textbf{A}\), it is the following decomposition:

$$ \textbf{A}= \textbf{P} \Sigma \textbf{Q}^T$$

where \(\textbf{P}\) and \(\textbf{Q}\) are matrices whose columns are called the left and right singular vectors, and \(\Sigma\) is a diagonal matrix of \(r\) singular values denoted by \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r\). The columns of \(\textbf{P}\) are mutually orthogonal, as are those of \(\textbf{Q}\), so that for example the \(i\)th and \(j\)th columns of \(\textbf{P}\) are perpendicular when \(i \neq j\). In essence the SVD is a way of expressing \(\textbf{A}\) as a set of unique factors, and these factors tell us something interesting and useful about \(\textbf{A}\).
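
To make this concrete, here is a minimal dense example using numpy, with variable names mirroring the notation above:

import numpy

A = numpy.random.rand(5, 3)
P, sigma, Qt = numpy.linalg.svd(A, full_matrices=False)  # Qt holds Q^T

# The factors reproduce A, and the columns of P are orthonormal
print(numpy.allclose(A, P.dot(numpy.diag(sigma)).dot(Qt)))
print(numpy.allclose(P.T.dot(P), numpy.eye(3)))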

Note that the singular values are ordered, and we get a good approximation of \(\textbf{A}\) using the largest \(k\) singular values, for some user-defined \(k\), setting the remaining ones to zero. We will call this approximation the rank-\(k\) SVD of \(\textbf{A}\), and many applications of the SVD use this truncated decomposition. For example, to generate \(k\) features using PCA we need the rank-\(k\) SVD of the matrix of examples \(\textbf{X}\). In some cases, using the first \(k\) singular vectors, with \(k\) much smaller than \(r\), gives us a very good approximation of \(\textbf{A}\).
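
As a quick sketch, the rank-\(k\) SVD keeps only the first \(k\) columns of \(\textbf{P}\) and \(\textbf{Q}\) and the largest \(k\) singular values:

import numpy

A = numpy.random.rand(100, 50)
P, sigma, Qt = numpy.linalg.svd(A, full_matrices=False)

k = 10
A_k = P[:, :k].dot(numpy.diag(sigma[:k])).dot(Qt[:k, :])  # rank-k SVD of A
print(numpy.linalg.norm(A - A_k))  # the error shrinks as k grows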

Benchmarking

Now we will introduce the benchmarking procedure. The general idea is to generate random sparse matrices and then compare the times of various approaches to computing the SVD. The SVD algorithms we compare are listed below, with a brief usage sketch after the list:

  • The scipy.sparse.linalg.svds function, which uses ARPACK.
  • PyPropack, a Python wrapper for PROPACK.
  • SparseSVD, which wraps SVDLIBC.
  • TruncatedSVD in scikit-learn, an approximate approach based on random projections [^n].
  • The SpPy function sppy.linalg.rsvd, which uses a parallel random projection algorithm in conjunction with sparse matrices based on Eigen. Disclaimer: I wrote this library.
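
As a rough illustration of the interfaces, here is how the first and fourth of these might be invoked on a small random matrix:

import scipy.sparse
from scipy.sparse.linalg import svds
from sklearn.decomposition import TruncatedSVD

# A small random sparse matrix for demonstration
A = scipy.sparse.rand(1000, 1000, density=0.01, format="csc")

U, s, Vt = svds(A, k=50)                     # exact SVD via ARPACK
tsvd = TruncatedSVD(n_components=50).fit(A)  # approximate, via random projections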

The results are given for my laptop, which runs Linux Mint 17.3 and has an Intel Core i7-2670QM with 8 logical cores and 8 GiB RAM. For the first benchmark we compute timings for random sparse matrices whilst varying the density of non-zero elements. The density is the proportion of non-zero elements, so a density of 0.1, for example, means that 10% of the elements are non-zero. Here is the corresponding code:

import numpy
import scipy.sparse
import sppy
import sppy.linalg
from scipy.sparse.linalg import svds
from pypropack import svdp
from sklearn.decomposition import TruncatedSVD
from sppy.linalg import GeneralLinearOperator

n = 10**4
k = 50                               # number of singular vectors to compute
reps = 5                             # timing repetitions (value assumed)
p, n_iter = 20, 2                    # rsvd oversampling/iterations (values assumed)
truncated_svd = TruncatedSVD(k)
var_range = numpy.arange(1, 11)      # density multipliers (range assumed)
densities = var_range * 2 * 10**-3
times = numpy.zeros((5, densities.shape[0]))

for i, density in enumerate(densities):
    # Generate a random sparse matrix with the given density of non-zeros
    nnz = int(n * n * density)
    inds = numpy.random.randint(n, size=(2, nnz))
    data = numpy.random.rand(nnz)
    A = scipy.sparse.csc_matrix((data, inds), shape=(n, n))
    A_sppy = sppy.csarray(A, storagetype="row")
    L = GeneralLinearOperator.asLinearOperator(A_sppy, parallel=True)

    # Record timings (time_reps averages reps calls; see the sketch below)
    times[0, i] = time_reps(svds, (A, k), reps)
    times[1, i] = time_reps(svdp, (A, k), reps)
    # times[2, i] = time_reps(sparsesvd, (A, k), reps)  # excluded, see below
    times[3, i] = time_reps(truncated_svd.fit, (A,), reps)
    times[4, i] = time_reps(sppy.linalg.rsvd, (L, k, p, n_iter), reps)

In the for loop we create a sparse matrix of size n by n with a given density of non-zeros. This is used to create a csc_matrix (a column-major sparse matrix) and also a GeneralLinearOperator, which is used by sppy. The second half of the loop computes timings for each of the SVD approaches, storing them in the times array; we compute k = 50 singular vectors with each algorithm. The time_reps function simply calls the input function with the given arguments reps times and returns the average time.
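
The time_reps helper is part of the full benchmark code; a minimal sketch of it might look like this:

import time

def time_reps(func, args, reps):
    # Call func(*args) reps times and return the average wall-clock time
    start = time.time()
    for _ in range(reps):
        func(*args)
    return (time.time() - start) / reps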

Here is the result of that experiment:

[Figure: timings versus matrix density]

It is fairly clear that ARPACK is the slowest algorithm, followed by PROPACK. The Randomised SVD (RSVD) methods are faster since they compute approximations. In this case the sppy RSVD is faster for denser matrices as it can make use of parallel processing. Note that the results for SparseSVD are missing since it took an extremely long time (165s versus 3.5s for ARPACK at a density of 0.002), so we excluded it from the later plots as well. PROPACK used all the CPU cores, which may in part explain its speed.

Next let us look at the timings as the number of singular vectors k varies.

[Figure: timings versus the number of singular vectors k]

Note that we fixed n = 10**4 and density = 10**-3. Again we see that ARPACK is the slowest approach, taking approximately 22.5s versus 6.5s for PROPACK when k = 200. The scikit-learn RSVD approach is faster than the sppy equivalent, probably due to the low density of non-zero elements.

Finally, let's fix density = 10**-3 and k = 50 and vary the size of the matrices n.
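
The corresponding loop is a small variation of the one above, reusing the same imports and the time_reps helper (the grid of sizes here is assumed):

ns = numpy.array([2500, 5000, 10000, 20000])  # matrix sizes (grid assumed)
density = 10**-3
k = 50
times = numpy.zeros((5, ns.shape[0]))

for i, n in enumerate(ns):
    nnz = int(n * n * density)
    inds = numpy.random.randint(n, size=(2, nnz))
    data = numpy.random.rand(nnz)
    A = scipy.sparse.csc_matrix((data, inds), shape=(n, n))
    times[0, i] = time_reps(svds, (A, k), reps)
    # ... the remaining solvers are timed as before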

[Figure: timings versus matrix size n]

Again we observe similar trends with the exact approaches: PROPACK has a large advantage over ARPACK. In addition, sppy is faster than the scikit-learn approach. It is worth noting that the parallel sppy approach doesn't give anywhere near the 8-times speedup corresponding to the number of logical cores. This is due to overheads in moving and storing data and in splitting the original matrix into chunks for computation on each core.

The complete code for these benchmarks is given here.

Improvements

How can these benchmarks be improved? One way is to tune the parameters of each algorithm for the particular matrices. One could also measure the memory consumption of the algorithms, which becomes important for very large matrices. Furthermore, we worked only with square matrices, and it would be interesting to compare results on rectangular ones. Finally, we could also have tested the distributed Apache Spark SVD algorithm, although there is no Python implementation at the time of writing.
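
For example, memory consumption could be measured with the memory_profiler package; here is a sketch (this package is an assumption, not part of the original benchmark):

import scipy.sparse
from memory_profiler import memory_usage
from scipy.sparse.linalg import svds

A = scipy.sparse.rand(10**4, 10**4, density=10**-3, format="csc")

# memory_usage samples the process's memory while running svds(A, k=50)
mem = memory_usage((svds, (A,), {"k": 50}))
print("peak memory (MiB):", max(mem))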

Summary

The Singular Value Decomposition is an incredibly important matrix factorisation algorithm. In this article we benchmarked five algorithms for computing the SVD of sparse matrices. The results show that PROPACK is significantly faster than the ARPACK approach used in scipy when varying the matrix size, density and number of singular vectors. Methods based on random projections are faster still, with sppy providing the best results on large matrices that are not too sparse.

Footnotes

Image credit: Nevit Dilmen (cropped and resized)