Using pd.corrwith on pandas DataFrames with different column names

Question

5

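I'd like to get the Pearson r between x1 and each of the three columns in y, in an efficient manner.

It appears that DataFrame.corrwith() only computes correlations between columns that have exactly the same column labels, e.g. x and y.

This seems a bit impractical, since I presume computing correlations between differently named variables is a common problem.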

In [1]: import pandas as pd; import numpy as np

In [2]: x = pd.DataFrame(np.random.randn(5,3),columns=['A','B','C'])

In [3]: y = pd.DataFrame(np.random.randn(5,3),columns=['A','B','C'])

In [4]: x1 = x.iloc[:, [0]]  # single-column DataFrame; .ix was removed from pandas

In [5]: x.corrwith(y)
Out[5]:
A   -0.752631
B   -0.525705
C    0.516071
dtype: float64

In [6]: x1.corrwith(y)
Out[6]:
A   -0.752631
B         NaN
C         NaN
dtype: float64

- themachinist
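
The NaN entries in Out[6] follow from label alignment: when corrwith is given a DataFrame, it pairs columns by name, so only 'A' finds a partner. A minimal sketch of that behavior, assuming a recent pandas version; the seed and the rename to 'B' are illustrative choices and not part of the question:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability
x = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])
y = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])

x1 = x[["A"]]  # single-column DataFrame, as in the question

# Columns are matched by label: only 'A' exists on both sides,
# so 'B' and 'C' come back as NaN.
aligned = x1.corrwith(y)

# Relabeling the same data as 'B' pairs it with y['B'] instead.
relabeled = x1.rename(columns={"A": "B"}).corrwith(y)

print(aligned)
print(relabeled)
```

So the column label, not the column position, decides which pair gets correlated.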

2 Answers

0

You can do it like this (with np.random.seed(0)):

# Copy the first column of x into every column so that each label in y has a partner
x1 = pd.DataFrame(np.repeat(x.iloc[:, 0].to_numpy(), x.shape[1]).reshape(x.shape), columns=x.columns)
x1.corrwith(y)

To get this result:

A   -0.509
B    0.041
C   -0.732

- Primer

- seth-p · Accepted Answer

You can use DataFrame.corrwith(Series) rather than DataFrame.corrwith(DataFrame) to get what you want:

In [203]: x1 = x['A']

In [204]: y.corrwith(x1)
Out[204]:
A    0.347629
B   -0.480474
C   -0.729303
dtype: float64
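
As a self-contained check of the Series form, the sketch below compares y.corrwith(x['A']) with a column-by-column Series.corr loop. The seed and the names by_corrwith and by_hand are illustrative assumptions, so the numbers will differ from Out[204]:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability
x = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])
y = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])

# Passing a Series correlates it with every column of y;
# no column-label matching is involved, only row-index alignment.
by_corrwith = y.corrwith(x["A"])

# The same numbers, computed one column at a time for comparison.
by_hand = pd.Series({col: x["A"].corr(y[col]) for col in y.columns})

print(by_corrwith)
print(by_hand)
```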

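Alternatively, you can form the matrix of correlations between each column of x and each column of y as follows (pd.expanding_corr belongs to an older pandas API that has since been removed):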

In [214]: pd.expanding_corr(x, y, pairwise=True).iloc[-1, :, :]
Out[214]:
          A         B         C
A  0.347629 -0.480474 -0.729303
B -0.334814  0.778019  0.654583
C -0.453273  0.212057  0.149544

Alas, DataFrame.corrwith() doesn't have a pairwise=True option.
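
For current pandas versions, one possible way to get the same x-by-y matrix is to concatenate the two frames under distinct keys and slice the off-diagonal block out of the full correlation matrix. This is a sketch of a swapped-in technique and not the answer's original method; the keys 'x' and 'y', the seed, and the name cross are illustrative assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability
x = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])
y = pd.DataFrame(rng.standard_normal((5, 3)), columns=["A", "B", "C"])

# Put both frames side by side under a two-level column index,
# so the identically named columns stay distinguishable.
combined = pd.concat([x, y], axis=1, keys=["x", "y"])

# Full 6x6 correlation matrix, then keep the x-rows / y-columns block.
cross = combined.corr().loc["x", "y"]

print(cross)
```

Each row of cross should match y.corrwith(x[col]) for the corresponding column of x. The full matrix also contains the x-x and y-y blocks, so this does more work than strictly needed, which is usually negligible for a handful of columns.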