I was going through the book called Hands-On Machine Learning with Scikit-Learn, Keras and Tensorflow and the author was explaining how the pseudo-inverse (Moore-Penrose inverse) of a matrix is calculated in the context of Linear Regression. I'm quoting verbatim here:
The pseudoinverse itself is computed using a standard matrix factorization technique called Singular Value Decomposition (SVD) that can decompose the training set matrix X into the matrix multiplication of three matrices U Σ VT (see numpy.linalg.svd()). The pseudoinverse is calculated as X+ = V * Σ+ * UT. To compute the matrix Σ+, the algorithm takes Σ and sets to zero all values smaller than a tiny threshold value, then it replaces all nonzero values with their inverse, and finally it transposes the resulting matrix. This approach is more efficient than computing the Normal equation.
I've got an understanding of how the pseudo-inverse and SVD are related from this post. But I'm not able to grasp the rationale behind setting all values less than the threshold to zero. The inverse of a diagonal matrix is obtained by taking the reciprocals of the diagonal elements. Then small values would be converted to large values in the inverse matrix, right? Then why are we removing the large values?
I went and looked into the numpy code, and it looks like follows, just for reference:
@array_function_dispatch(_pinv_dispatcher) def pinv(a, rcond=1e-15, hermitian=False): a, wrap = _makearray(a) rcond = asarray(rcond) if _is_empty_2d(a): m, n = a.shape[-2:] res = empty(a.shape[:-2] + (n, m), dtype=a.dtype) return wrap(res) a = a.conjugate() u, s, vt = svd(a, full_matrices=False, hermitian=hermitian) # discard small singular values cutoff = rcond[..., newaxis] * amax(s, axis=-1, keepdims=True) large = s > cutoff s = divide(1, s, where=large, out=s) s[~large] = 0 res = matmul(transpose(vt), multiply(s[..., newaxis], transpose(u))) return wrap(res)