I have a function that computes the squared Mahalanobis distance from a vector x to the mean:
import numpy as np

def mahalanobis_sqdist(x, mean, Sigma):
    '''
    Calculates squared Mahalanobis distance of vector x
    to the distribution's mean
    '''
    Sigma_inv = np.linalg.inv(Sigma)
    xdiff = x - mean
    sqmdist = np.dot(np.dot(xdiff, Sigma_inv), xdiff)
    return sqmdist
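I have a numpy array of shape (25, 4). I would like to apply this function to all 25 rows of the array without using a for loop. So, basically, how can I write the vectorized form of this loop: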
for r in d1:
    mahalanobis_sqdist(r[0:4], mean1, Sig1)
where mean1 and Sig1 are:
>>> mean1
array([ 5.028, 3.48 , 1.46 , 0.248])
>>> Sig1 = np.cov(d1[0:25, 0:4].T)
>>> Sig1
array([[ 0.16043333, 0.11808333, 0.02408333, 0.01943333],
       [ 0.11808333, 0.13583333, 0.00625 , 0.02225 ],
       [ 0.02408333, 0.00625 , 0.03916667, 0.00658333],
       [ 0.01943333, 0.02225 , 0.00658333, 0.01093333]])
I have tried the following, but it did not work:
>>> vecdist = np.vectorize(mahalanobis_sqdist)
>>> vecdist(d1, mean1, Sig1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1862, in __call__
    theout = self.thefunc(*newargs)
  File "<stdin>", line 6, in mahalanobis_sqdist
  File "/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 445, in inv
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
IndexError: tuple index out of range
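The traceback is consistent with how np.vectorize works: it broadcasts all arguments and calls the function on individual scalar elements, so Sigma most likely reaches np.linalg.inv as a 0-d value with no shape[0]. One possible vectorized form, offered as a minimal sketch rather than the definitive answer, subtracts the mean from all rows at once and contracts with np.einsum. The name mahalanobis_sqdist_rows and the random stand-in for d1 are assumptions made for illustration; they are not from the original post.

```python
import numpy as np

def mahalanobis_sqdist_rows(X, mean, Sigma):
    """Squared Mahalanobis distance of every row of X to mean (hypothetical helper)."""
    Sigma_inv = np.linalg.inv(Sigma)
    diff = X - mean  # mean is broadcast across the rows, giving shape (n, d)
    # For each row i this computes diff[i] . Sigma_inv . diff[i]
    return np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)

# Stand-in data with the same shape as d1 in the question (assumption)
rng = np.random.default_rng(0)
d1 = rng.normal(size=(25, 4))
mean1 = d1.mean(axis=0)
Sig1 = np.cov(d1.T)

sqdists = mahalanobis_sqdist_rows(d1, mean1, Sig1)  # shape (25,)
```

The einsum call avoids building the full 25 x 25 matrix that np.dot(np.dot(diff, Sigma_inv), diff.T) would produce, of which only the diagonal is needed.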
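The scipy.spatial.distance module can also do all of this for you. For example, the code would be cdist(d1, mean1[None], 'mahalanobis')**2. If mean1 is not the actual mean of the points, you should compute the covariance and its inverse separately and call cdist(d1, mean1[None], 'mahalanobis', VI=Sigma_inv)**2. - user2379410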
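The cdist suggestion can be sketched as below, passing the inverse covariance explicitly through VI so the result does not depend on how cdist estimates it by default. The random array standing in for d1 is an assumption for illustration only.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Stand-in data with the same shape as d1 in the question (assumption)
rng = np.random.default_rng(0)
d1 = rng.normal(size=(25, 4))
mean1 = d1.mean(axis=0)
Sigma_inv = np.linalg.inv(np.cov(d1.T))

# mean1[None] has shape (1, 4), so cdist returns a (25, 1) array of distances;
# take the single column and square it to get squared distances.
sq_cdist = cdist(d1, mean1[None], 'mahalanobis', VI=Sigma_inv)[:, 0] ** 2
```

Squaring is needed because cdist returns the Mahalanobis distance itself, not its square.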