Warm tip: This article is reproduced from stackoverflow.com, please click

发布于 2020-04-10 16:17:31

I was going through the book called *Hands-On Machine Learning with Scikit-Learn, Keras and Tensorflow* and the author was explaining how the pseudo-inverse (Moore-Penrose inverse) of a matrix is calculated in the context of Linear Regression. I'm quoting verbatim here:

The pseudoinverse itself is computed using a standard matrix factorization technique called Singular Value Decomposition (SVD) that can decompose the training set matrix

Xinto the matrix multiplication of three matricesUΣVT (see numpy.linalg.svd()). The pseudoinverse is calculated asX+ =V*Σ+ *UT. To compute the matrixΣ+, the algorithm takesΣand sets to zero all values smaller than a tiny threshold value, then it replaces all nonzero values with their inverse, and finally it transposes the resulting matrix. This approach is more efficient than computing the Normal equation.

I've got an understanding of how the pseudo-inverse and SVD are related from this post. But I'm not able to grasp the rationale behind setting all values less than the threshold to zero. The inverse of a diagonal matrix is obtained by taking the reciprocals of the diagonal elements. Then small values would be converted to large values in the inverse matrix, right? Then why are we removing the large values?

I went and looked into the numpy code, and it looks like follows, just for reference:

```
@array_function_dispatch(_pinv_dispatcher)
def pinv(a, rcond=1e-15, hermitian=False):
a, wrap = _makearray(a)
rcond = asarray(rcond)
if _is_empty_2d(a):
m, n = a.shape[-2:]
res = empty(a.shape[:-2] + (n, m), dtype=a.dtype)
return wrap(res)
a = a.conjugate()
u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)
# discard small singular values
cutoff = rcond[..., newaxis] * amax(s, axis=-1, keepdims=True)
large = s > cutoff
s = divide(1, s, where=large, out=s)
s[~large] = 0
res = matmul(transpose(vt), multiply(s[..., newaxis], transpose(u)))
return wrap(res)
```

Questioner

Bitswazsky

Viewed

74

It's almost certainly an adjustment for numerical error. To see why this might be necessary, look what happens when you take the `svd`

of a rank-one 2x2 matrix. We can create a rank-one matrix by taking the outer product of a vector like so:

```
>>> a = numpy.arange(2) + 1
>>> A = a[:, None] * a[None, :]
>>> A
array([[1, 2],
[2, 4]])
```

Although this is a 2x2 matrix, it only has one linearly independent column, and so its rank is one instead of two. So we should expect that when we pass it to `svd`

, one of the singular values will be zero. But look what happens:

```
>>> U, s, V = numpy.linalg.svd(A)
>>> s
array([5.00000000e+00, 1.98602732e-16])
```

What we actually get is a singular value that is *not quite* zero. This result is inevitable in many cases given that we are working with finite-precision floating point numbers. So although the problem you have identified is a real one, we will not be able to tell in practice the difference between a matrix that really has a very small singular value and a matrix that ought to have a zero singular value but doesn't. Setting small values to zero is the safest practical way to handle that problem.