Tags: matrix, numpy, python

Understanding the logic behind numpy code for Moore-Penrose inverse

Posted on 2020-04-10 16:17:31

I was going through the book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, where the author explains how the pseudo-inverse (Moore-Penrose inverse) of a matrix is calculated in the context of Linear Regression. I'm quoting verbatim here:

The pseudoinverse itself is computed using a standard matrix factorization technique called Singular Value Decomposition (SVD) that can decompose the training set matrix X into the matrix multiplication of three matrices U Σ Vᵀ (see numpy.linalg.svd()). The pseudoinverse is calculated as X⁺ = V Σ⁺ Uᵀ. To compute the matrix Σ⁺, the algorithm takes Σ and sets to zero all values smaller than a tiny threshold value, then it replaces all nonzero values with their inverse, and finally it transposes the resulting matrix. This approach is more efficient than computing the Normal equation.
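
To make the quoted procedure concrete, here is a minimal sketch of it in numpy (my own illustration, not from the book; the matrix X and the names used here are arbitrary):

import numpy as np

# An arbitrary example training-set matrix (not from the book).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Decompose X into U, the singular values s (the diagonal of Σ), and Vᵀ.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Set singular values below a tiny threshold to zero, invert the rest.
threshold = 1e-15 * s.max()
s_inv = np.divide(1.0, s, out=np.zeros_like(s), where=s > threshold)

# Assemble X⁺ = V Σ⁺ Uᵀ and compare against numpy's own pinv.
X_pinv = Vt.T @ np.diag(s_inv) @ U.T
print(np.allclose(X_pinv, np.linalg.pinv(X)))  # True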

I understand how the pseudo-inverse and SVD are related from this post. But I can't grasp the rationale behind setting all values smaller than the threshold to zero. The inverse of a diagonal matrix is obtained by taking the reciprocals of the diagonal elements, so the small values would become very large values in the inverse matrix, right? Then why are we removing exactly the values that would become large?
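
To put a number on that concern: taking the reciprocal of a near-zero singular value does indeed produce an enormous entry. A tiny illustration of my own (assuming numpy; the values are made up):

import numpy as np

s = np.array([5.0, 2e-16])  # one genuine singular value, one that is essentially noise
print(1 / s)                # roughly [0.2, 5e+15] -- the near-zero value explodes in the inverse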

I went and looked into the numpy code, and it looks as follows, just for reference:

@array_function_dispatch(_pinv_dispatcher)
def pinv(a, rcond=1e-15, hermitian=False):
    a, wrap = _makearray(a)
    rcond = asarray(rcond)
    if _is_empty_2d(a):
        m, n = a.shape[-2:]
        res = empty(a.shape[:-2] + (n, m), dtype=a.dtype)
        return wrap(res)
    a = a.conjugate()
    u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)

    # discard small singular values
    cutoff = rcond[..., newaxis] * amax(s, axis=-1, keepdims=True)
    large = s > cutoff
    s = divide(1, s, where=large, out=s)
    s[~large] = 0

    res = matmul(transpose(vt), multiply(s[..., newaxis], transpose(u)))
    return wrap(res)
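
The effect of that rcond cutoff is easy to see with numpy.linalg.pinv directly. A small usage example of my own, with a matrix whose second singular value sits far below the default cutoff of rcond * max(s):

import numpy as np

M = np.diag([1.0, 1e-20])

print(np.linalg.svd(M, compute_uv=False))  # singular values: [1.e+00 1.e-20]
print(np.linalg.pinv(M))                   # the 1e-20 direction is zeroed rather than inverted
print(np.linalg.pinv(M, rcond=0))          # with no cutoff, a huge 1e+20 entry appears instead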
Asked by Bitswazsky

Answered by senderle on 2020-02-03 23:06:

It's almost certainly an adjustment for numerical error. To see why this might be necessary, look at what happens when you take the SVD of a rank-one 2x2 matrix. We can create a rank-one matrix by taking the outer product of a vector with itself, like so:

>>> import numpy
>>> a = numpy.arange(2) + 1
>>> A = a[:, None] * a[None, :]
>>> A
array([[1, 2],
       [2, 4]])

Although this is a 2x2 matrix, it only has one linearly independent column, and so its rank is one instead of two. So we should expect that when we pass it to svd, one of the singular values will be zero. But look what happens:

>>> U, s, V = numpy.linalg.svd(A)
>>> s
array([5.00000000e+00, 1.98602732e-16])

What we actually get is a singular value that is not quite zero. This result is inevitable in many cases, given that we are working with finite-precision floating-point numbers. So although the problem you have identified is real, in practice we cannot tell the difference between a matrix that genuinely has a very small singular value and a matrix that ought to have a zero singular value but doesn't. Setting small values to zero is the safest practical way to handle that problem.
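
To connect this back to pinv: continuing with the same rank-one A, here is a rough sketch of my own (not part of the original answer) comparing a naive inversion of both singular values with zeroing out the spurious one:

import numpy as np

a = np.arange(2) + 1
A = a[:, None] * a[None, :]                   # the same rank-one matrix as above
U, s, Vt = np.linalg.svd(A)

naive = Vt.T @ np.diag(1 / s) @ U.T           # invert everything, noise included
safe = Vt.T @ np.diag([1 / s[0], 0.0]) @ U.T  # zero out the spurious tiny value

print(np.abs(naive).max())                    # astronomically large, driven purely by rounding noise
print(np.allclose(safe, np.linalg.pinv(A)))   # True: zeroing it reproduces numpy's pinv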