Multiplying S and V is exactly what you need to do to perform dimensionality reduction with SVD/LSA.
>>> import numpy as np
>>> C = np.array([[1, 0, 1, 0, 0, 0],
...               [0, 1, 0, 0, 0, 0],
...               [1, 1, 0, 0, 0, 0],
...               [1, 0, 0, 1, 1, 0],
...               [0, 0, 0, 1, 0, 1]])
>>> from scipy.linalg import svd
>>> U, s, VT = svd(C, full_matrices=False)
>>> s[2:] = 0
>>> np.dot(np.diag(s), VT)
array([[ 1.61889806,  0.60487661,  0.44034748,  0.96569316,  0.70302032,
         0.26267284],
       [-0.45671719, -0.84256593, -0.29617436,  0.99731918,  0.35057241,
         0.64674677],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ]])
This leaves a matrix where all but the first few rows are zeros, so those rows can be removed; in practice, this is the matrix you would use in applications:
>>> np.dot(np.diag(s[:2]), VT[:2])
array([[ 1.61889806,  0.60487661,  0.44034748,  0.96569316,  0.70302032,
         0.26267284],
       [-0.45671719, -0.84256593, -0.29617436,  0.99731918,  0.35057241,
         0.64674677]])
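Note that this reduced matrix equals `U[:, :2].T @ C`: since the columns of `U` are orthonormal, Σ·Vᵀ = Uᵀ·C. The same identity is how you fold a *new* document (a raw term-count column vector) into the reduced space. A minimal sketch, where `new_doc` is a hypothetical document vector of my own invention, not from the question:

```python
import numpy as np
from scipy.linalg import svd

C = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1]])
U, s, VT = svd(C, full_matrices=False)

# Sigma @ VT equals U.T @ C because U's columns are orthonormal
reduced = np.dot(np.diag(s[:2]), VT[:2])
assert np.allclose(reduced, np.dot(U[:, :2].T, C))

# Fold a new document (raw term counts) into the same 2-d space
new_doc = np.array([1, 0, 1, 0, 0])       # hypothetical term-count vector
new_doc_2d = np.dot(U[:, :2].T, new_doc)  # 2-d representation
```

The folded-in vector is directly comparable (e.g. by cosine similarity) to the columns of the reduced matrix.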
What the PDF describes on page 10 is the recipe for getting a low-rank *reconstruction* of the input C. Rank != dimensionality, and the sheer size and density of the reconstruction matrix make it impractical to use in LSA; its purpose is mostly mathematical. One thing you can do with it is check how good the reconstruction is for various values of k:
>>> U, s, VT = svd(C, full_matrices=False)
>>> C2 = np.dot(U[:, :2], np.dot(np.diag(s[:2]), VT[:2]))
>>> from scipy.spatial.distance import euclidean
>>> euclidean(C2.ravel(), C.ravel())
1.6677932876555255
>>> C3 = np.dot(U[:, :3], np.dot(np.diag(s[:3]), VT[:3]))
>>> euclidean(C3.ravel(), C.ravel())
1.0747879905228703
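As expected, the error shrinks as k grows. By the Eckart–Young theorem, the Frobenius-norm error of the rank-k reconstruction is exactly the root-sum-square of the discarded singular values, which you can verify numerically (a quick sketch of my own, not part of the original recipe):

```python
import numpy as np
from scipy.linalg import svd

C = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1]])
U, s, VT = svd(C, full_matrices=False)

errors = []
for k in range(1, len(s) + 1):
    Ck = np.dot(U[:, :k], np.dot(np.diag(s[:k]), VT[:k]))
    err = np.linalg.norm(C - Ck)  # Frobenius norm of the residual
    # Eckart-Young: error = sqrt of the sum of squared dropped singular values
    assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
    errors.append(err)

# the error is non-increasing in k
assert all(a >= b - 1e-12 for a, b in zip(errors, errors[1:]))
```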
Sanity check against scikit-learn's TruncatedSVD (full disclosure: I wrote that):
>>> from sklearn.decomposition import TruncatedSVD
>>> TruncatedSVD(n_components=2).fit_transform(C.T)
array([[ 1.61889806, -0.45671719],
       [ 0.60487661, -0.84256593],
       [ 0.44034748, -0.29617436],
       [ 0.96569316,  0.99731918],
       [ 0.70302032,  0.35057241],
       [ 0.26267284,  0.64674677]])
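This is the transpose of the manually reduced matrix. One caveat: an SVD is only unique up to the sign of each singular-vector pair, so individual components may come out flipped depending on the implementation. A hedged comparison sketch that ignores the sign ambiguity by comparing absolute values:

```python
import numpy as np
from scipy.linalg import svd
from sklearn.decomposition import TruncatedSVD

C = np.array([[1, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1]])
U, s, VT = svd(C, full_matrices=False)
manual = np.dot(np.diag(s[:2]), VT[:2]).T            # 6 x 2, by hand

sk = TruncatedSVD(n_components=2).fit_transform(C.T)  # 6 x 2, scikit-learn

# components agree up to a per-column sign flip
assert np.allclose(np.abs(manual), np.abs(sk), atol=1e-6)
```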