如何在Python中找到两个一维数组之间的马氏距离?

5

我有两个一维数组,需要找出它们之间的马氏距离。

数组1

-0.125510275,0.067021735,0.140631825,-0.014300184,-0.122152582,0.002372072,-0.050777748,-0.106606245,0.149123222,-0.159149423,0.210138127,0.031959131,-0.068411253,-0.038253143,-0.024590122,0.101361006,-0.160774037,-0.183688596,-0.07163775,-0.096662685,-0.000117288,0.14251323,-0.030461289,-0.006710192,-0.217195332,-0.338565469,-0.030219197,-0.100772612,0.144092739,-0.092911556,-0.008420993,0.042907588,-0.212668449,-0.009366207,-7.01E-05,0.134508118,-0.015715659,-0.050884761,0.18804647,0.04946585,-0.242626131,0.099951334,0.053660966,0.275807977,0.216019884,-0.009127878,0.019819722,-0.043750495,0.12940146,-0.259942383,0.061821692,0.107142501,0.098196507,0.022301452,0.079412982,-0.131031215,-0.049483716,0.126781181,-0.195536733,0.077051811,0.061049294,-0.039563753,0.02573989,0.025330214,0.204785526,0.099218346,-0.050533134,-0.109173119,0.205652237,-0.168003649,-0.062734045,0.100320764,-0.063513778,-0.120843001,-0.223983109,0.075016715,0.481291831,0.107607022,-0.141365036,0.075003348,-0.042418435,-0.041501854,0.096700639,0.083469011,-0.033227846,-0.050748199,-0.045331556,0.065955319,0.26927036,0.082820699,-0.014033476,0.176714703,0.042264186,-0.011814327,0.041769091,-0.00132945,-0.114337325,-0.013483777,-0.111367472,-0.051828772,-0.022199111,0.030011443,0.015529033,0.171916366,-0.172722578,0.214662731,-0.0219073,-0.067695767,0.040487193,0.04814541,0.003313571,-0.01360167,0.115932293,-0.235844463,0.185181856,0.130868644,0.010789306,0.171733275,0.059378762,0.003508842,0.039326921,0.024174646,-0.195897669,-0.088932432,0.025385177,-0.134177506,0.08158315,0.049005955

还有,数组 2

-0.120652862,0.030241199,0.146165773,-0.044423241,-0.138606027,-0.048646796,-0.00780057,-0.101798892,0.185339138,-0.210505784,0.1637595,0.015000292,-0.10359703,0.102251172,-0.043159217,0.183324724,-0.171825036,-0.173819616,-0.112194099,-0.161590934,-0.002507193,0.163269699,-0.037766434,0.041060638,-0.178659558,-0.268946916,-0.055348843,-0.11808344,0.113775767,-0.073903576,-0.039505914,0.032382272,-0.159118786,0.007761603,0.057116233,0.043675732,-0.057895001,-0.104836114,0.22844176,0.055832602,-0.245030299,0.006276659,0.140012532,0.21449241,0.159539059,-0.049584024,0.016899824,-0.074179329,0.119686954,-0.242336214,-0.001390997,0.097442642,0.059720818,0.109706804,0.073196828,-0.16272822,0.022305552,0.102650747,-0.192103565,0.104134969,0.099571452,-0.101140082,-0.038911857,0.071292967,0.202927336,0.12729995,-0.047885433,-0.165100336,0.220239595,-0.19612211,-0.075948663,0.096906625,-0.07410948,-0.108219706,-0.155030385,-0.042231761,0.484629512,0.093194947,-0.105109185,0.072906494,-0.056871444,-0.057923764,0.101847053,0.092042476,-0.061295755,-0.031595342,-0.01854251,0.074671492,0.266587347,0.052284949,0.003548023,0.171518356,0.053180017,-0.022400264,0.061757766,0.038441688,-0.139473096,-0.05759665,-0.101672307,-0.074863717,-0.02349415,-0.011674869,0.010008151,0.141401738,-0.190440938,0.216421023,-0.028323224,-0.078021556,-0.011468113,0.100600921,-0.019697987,-0.014288296,0.114862509,-0.162037179,0.171686187,0.149788797,-0.01235011,0.136169329,0.008751356,0.024811052,0.003802934,0.00500867,-0.1840965,-0.086204343,0.018549766,-0.110649876,0.068768717,0.03012047

我发现Scipy已经实现了该函数。然而,我对IV的值感到困惑。我尝试了以下操作

V = np.cov(np.array([array_1, array_2]))
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))

然而,我遇到了以下错误:

文件 "C:\Users\XXXXXX\AppData\Local\Continuum\anaconda3\envs\face\lib\site-packages\scipy\spatial\distance.py" 的第 1043 行,mahalanobis 函数中, m = np.dot(np.dot(delta, VI), delta)

ValueError: 形状为 (128,) 和 (2,2) 的数组无法对齐:128 (dim 0) != 2 (dim 0)

编辑:

array_1 = [-0.10577646642923355, 0.09617947787046432, 0.029290344566106796, 0.02092641592025757, -0.021434104070067406, -0.13410840928554535, 0.028282659128308296, -0.12082239985466003, 0.21936850249767303, -0.06512433290481567, 0.16812698543071747, -0.03302834928035736, -0.18088334798812866, -0.04598559811711311, -0.014739632606506348, 0.06391328573226929, -0.15650317072868347, -0.13678401708602905, 0.01166679710149765, -0.13967938721179962, 0.14632365107536316, 0.025218486785888672, 0.046839646995067596, 0.09690812975168228, -0.13414686918258667, -0.2883925437927246, -0.1435326784849167, -0.17896348237991333, 0.10746842622756958, -0.09142691642045975, 0.04860316216945648, 0.031577128916978836, -0.17280976474285126, -0.059613555669784546, -0.05718057602643967, 0.0401446670293808, 0.026440180838108063, -0.017025159671902657, 0.22091664373874664, 0.024703698232769966, -0.15607595443725586, -0.0018572667613625526, -0.037675946950912476, 0.3210170865058899, 0.10884962230920792, 0.030370134860277176, 0.056784629821777344, -0.030112050473690033, 0.023124486207962036, -0.1449904441833496, 0.08885903656482697, 0.17527811229228973, 0.08804896473884583, 0.038310401141643524, -0.01704210229218006, -0.17355971038341522, -0.018237406387925148, 0.030551932752132416, -0.23085585236549377, 0.13475817441940308, 0.16338199377059937, -0.06968289613723755, -0.04330683499574661, 0.04434924200177193, 0.22637797892093658, 0.07463733851909637, -0.15070196986198425, -0.07500549405813217, 0.10863590240478516, -0.22288714349269867, 0.0010778247378766537, 0.057608842849731445, -0.12828609347343445, -0.17236559092998505, -0.23064571619033813, 0.09910193085670471, 0.46647992730140686, 0.0634111613035202, -0.13985536992549896, 0.052741192281246185, -0.1558966338634491, 0.022585246711969376, 0.10514408349990845, 0.11794176697731018, -0.06241249293088913, 0.06389056891202927, -0.14145469665527344, 0.060088545083999634, 0.09667345881462097, -0.004665130749344826, -0.07927791774272919, 0.21978208422660828, -0.0016187895089387894, 0.04876316711306572, 0.03137822449207306, 0.08962501585483551, -0.09108036011457443, -0.01795950159430504, -0.04094596579670906, 0.03533276170492172, 0.01394269522279501, -0.08244197070598602, -0.05095399543642998, 0.04305890575051308, -0.1195211187005043, 0.16731074452400208, 0.03894471749663353, -0.0222858227789402, -0.07944411784410477, 0.0614166259765625, -0.1481470763683319, -0.09113290905952454, 0.14758692681789398, -0.24051085114479065, 0.164126917719841, 0.1753545105457306, -0.003193420823663473, 0.20875433087348938, 0.03357946127653122, 0.1259773075580597, -0.00022807717323303223, -0.039092566817998886, -0.13582147657871246, -0.01937306858599186, 0.015938198193907738, 0.00787206832319498, 0.05792934447526932, 0.03294186294078827]
array_2 = [-0.1966051608324051, 0.0940953716635704, -0.0031937970779836178, -0.03691547363996506, -0.07240629941225052, -0.07114037871360779, -0.07133384048938751, -0.1283963918685913, 0.15377545356750488, -0.091400146484375, 0.10803385823965073, -0.09235749393701553, -0.1866973638534546, -0.021168243139982224, -0.09094691276550293, 0.07300164550542831, -0.20971564948558807, -0.1847742646932602, -0.009817334823310375, -0.05971141159534454, 0.09904412180185318, 0.0278592761605978, -0.012554554268717766, 0.09818517416715622, -0.1747943013906479, -0.31632938981056213, -0.0864541232585907, -0.13249783217906952, 0.002135572023689747, -0.04935726895928383, 0.010047778487205505, 0.04549024999141693, -0.26334646344184875, -0.05263081565499306, -0.013573898002505302, 0.2042253464460373, 0.06646320968866348, 0.08540669083595276, 0.12267164140939713, -0.018634958192706108, -0.19135263562202454, 0.01208433136343956, 0.09216200560331345, 0.2779296934604645, 0.1531585156917572, 0.10681629925966263, -0.021275708451867104, -0.059720948338508606, 0.06610126793384552, -0.21058350801467896, 0.005440462380647659, 0.18833838403224945, 0.08883830159902573, 0.025969548150897026, 0.0337764173746109, -0.1585341989994049, 0.02370697632431984, 0.10416869819164276, -0.19022507965564728, 0.11423652619123459, 0.09144753962755203, -0.08765758574008942, -0.0032832929864525795, -0.0051014479249715805, 0.19875964522361755, 0.07349056005477905, -0.1031823456287384, -0.10447365045547485, 0.11358538269996643, -0.24666038155555725, -0.05960353836417198, 0.07124857604503632, -0.039664581418037415, -0.20122921466827393, -0.31481748819351196, -0.006801256909966469, 0.41940364241600037, 0.1236235573887825, -0.12495145946741104, 0.12580059468746185, -0.02020396664738655, -0.03004150651395321, 0.11967054009437561, 0.09008713811635971, -0.07470540702342987, 0.09324200451374054, -0.13763070106506348, 0.07720538973808289, 0.19568027555942535, 0.036567769944667816, 0.030284458771348, 0.14119629561901093, -0.03820852190256119, 0.06232285499572754, 0.036639824509620667, 0.07704029232263565, -0.12276224792003632, -0.0035170004703104496, -0.13103705644607544, 0.027697769924998283, -0.01527332328259945, -0.04027168080210686, -0.03659897670149803, 0.03330300375819206, -0.12293602526187897, 0.09043421596288681, -0.019673841074109077, -0.07563626766204834, -0.13991905748844147, 0.014788001775741577, -0.07630413770675659, 0.00017269013915210962, 0.16345393657684326, -0.25710681080818176, 0.19869503378868103, 0.19393865764141083, -0.07422225922346115, 0.19553625583648682, 0.09189949929714203, 0.051557887345552444, -0.0008843056857585907, -0.006250975653529167, -0.1680600494146347, -0.10320111364126205, 0.03232177346944809, -0.08931156992912292, 0.11964476853609085, 0.00814182311296463]

以上数组的协方差矩阵是奇异矩阵,因此我无法求其逆矩阵。为什么会变成奇异矩阵?
编辑2:解决方法
由于这里的协方差矩阵是奇异矩阵,我必须使用np.linalg.pinv(V)进行伪逆操作。

3
IV 应该是从采样向量的 128 维分布的协方差矩阵的逆矩阵。你可以使用 V = np.cov(np.array([array_1, array_2]).T) 来估算协方差矩阵(在 np.cov 中,行是变量,列是观测值),但这只能使用这两个样本。理想情况下,你应该使用更好的估计方法,例如整个数据集的协方差矩阵。 - jdehesa
@jdehesa 我没有从中提取128维数组的分布,这些值是从图像生成的。基本上是面部的编码。当我尝试使用你上面提到的.T时,我在尝试反转协方差矩阵时遇到了错误,即numpy.linalg.linalg.LinAlgError: Singular matrix - Parthapratim Neog
在计算协方差之前,您只需要对数组进行转置即可。而且@jdehesa是正确的,从两个观测值计算协方差是一个坏主意。首先,您应该使用整个图像计算cov。然后,您可以使用相同的协方差矩阵找到任意两行之间的马氏距离。 - tel
1个回答

4
numpy.cov文档中可以看到,第一个参数应该是一个数组m,其中:

矩阵m的每一行代表一个变量,每一列代表所有这些变量的单个观测值。

因此,在调用cov之前,请将您的数组转置(使用.T)以修复您的代码:

V = np.cov(np.array([array_1, array_2]).T)
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))

我刚刚在一些随机数据上进行了测试,可以确认它有效。

此外,仅从两个观测值计算协方差是一个不好的想法,也不太可能非常准确。如果您的数据来自图像,则在计算协方差矩阵时应使用整个图像 img(或至少感兴趣的整个区域),然后使用该矩阵来找到两个感兴趣向量之间的马氏距离:

V = np.cov(np.array(img))
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))

根据您生成array_1array_2的方式,您可能需要将img替换为img.T,也可能不需要。

如果您得到的是奇异协方差矩阵,那么您面临的是一个数学问题,而不是代码问题。显然,这是一个常见的问题,已经有人提出了问题“为什么我的协方差矩阵是奇异的?”并得到了解答。非常广泛地说,当足够多的数据点在某种意义上“太相似”时,它似乎会发生。我想使用只有两个数据点也会增加这种情况发生的可能性。


你能否尝试使用我提到的数组来测试一下。因为np.array(img)实际上并不是我需要的东西。我需要使用的是我提到的那些经过两个不同图像上的许多计算生成的数组。但是,当我使用我提到的数组时,我实际上遇到了一个错误。错误提示:numpy.linalg.linalg.LinAlgError: Singular matrix - Parthapratim Neog
值得一提的是,array_1array_2来自不同的图像。 - Parthapratim Neog
没事了,对于我提供的数据它实际上是有效的,但是对于另一个数组对则会出现上述错误。谢谢。如果您有兴趣查看这个数组对,我会更新问题。 - Parthapratim Neog
@ParthapratimNeog 我在我的回答中添加了一个链接,其中深入讨论了为什么协方差矩阵最终会变成奇异的原因。如果我是你,我会找到一种使用两个以上数据点计算协方差的方法。这可能是最简单的解决方法。 - tel

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接