Slicing a pandas DataFrame by row with a negative index using ix()

6

When I use a negative index, DataFrame.ix() does not seem to slice the DataFrame the way I want. I have a DataFrame object and want to slice out its last 2 rows.

    In [90]: df = pd.DataFrame(np.random.randn(10, 4))

    In [91]: df
    Out[91]: 
            0         1         2         3
    0  1.985922  0.664665 -2.800102  1.695480
    1  0.580509  0.782473  1.032970  1.559917
    2  0.584387  1.798743  0.095950  0.071999
    3  1.956221  0.075530 -0.391008  1.692585
    4 -0.644979 -1.959265  0.749394 -0.437995
    5 -1.204964  0.653912 -1.426602  2.409855
    6  1.178886  2.177259 -0.165106  1.145952
    7  1.410595 -0.761426 -1.280866  0.609122
    8  0.110534 -0.234781 -0.819976  0.252080
    9  1.798894  0.553394 -1.358335  1.278704

One way to do it:

    In [92]: df[-2:]
    Out[92]: 
              0         1         2         3
    8  0.110534 -0.234781 -0.819976  0.252080
    9  1.798894  0.553394 -1.358335  1.278704

Another way is:

    In [93]: df.ix[len(df)-2:, :]
    Out[93]: 
              0         1         2         3
    8  0.110534 -0.234781 -0.819976  0.252080
    9  1.798894  0.553394 -1.358335  1.278704

Now I want to use a negative index instead, but I run into a problem:

    In [94]: df.ix[-2:, :]
    Out[94]: 
              0         1         2         3
    0  1.985922  0.664665 -2.800102  1.695480
    1  0.580509  0.782473  1.032970  1.559917
    2  0.584387  1.798743  0.095950  0.071999
    3  1.956221  0.075530 -0.391008  1.692585
    4 -0.644979 -1.959265  0.749394 -0.437995
    5 -1.204964  0.653912 -1.426602  2.409855
    6  1.178886  2.177259 -0.165106  1.145952
    7  1.410595 -0.761426 -1.280866  0.609122
    8  0.110534 -0.234781 -0.819976  0.252080
    9  1.798894  0.553394 -1.358335  1.278704

How should I correctly use a negative index with DataFrame.ix()? Thanks.
2 Answers

8

This is a bug:

    In [1]: df = pd.DataFrame(np.random.randn(10, 4))

    In [2]: df
    Out[2]: 
              0         1         2         3
    0 -3.100926 -0.580586 -1.216032  0.425951
    1 -0.264271 -1.091915 -0.602675  0.099971
    2 -0.846290  1.363663 -0.382874  0.065783
    3 -0.099879 -0.679027 -0.708940  0.138728
    4 -0.302597  0.753350 -0.112674 -1.253316
    5 -0.213237 -0.467802  0.037350  0.369167
    6  0.754915 -0.569134 -0.297824 -0.600527
    7  0.644742  0.038862  0.216869  0.294149
    8  0.101684  0.784329  0.218221  0.965897
    9 -1.482837 -1.325625  1.008795 -0.150439

    In [3]: df.ix[-2:]
    Out[3]: 
              0         1         2         3
    0 -3.100926 -0.580586 -1.216032  0.425951
    1 -0.264271 -1.091915 -0.602675  0.099971
    2 -0.846290  1.363663 -0.382874  0.065783
    3 -0.099879 -0.679027 -0.708940  0.138728
    4 -0.302597  0.753350 -0.112674 -1.253316
    5 -0.213237 -0.467802  0.037350  0.369167
    6  0.754915 -0.569134 -0.297824 -0.600527
    7  0.644742  0.038862  0.216869  0.294149
    8  0.101684  0.784329  0.218221  0.965897
    9 -1.482837 -1.325625  1.008795 -0.150439

Issue: https://github.com/pydata/pandas/issues/2600

Note, though, that df[-2:] does work:

    In [4]: df[-2:]
    Out[4]: 
              0         1         2         3
    8  0.101684  0.784329  0.218221  0.965897
    9 -1.482837 -1.325625  1.008795 -0.150439
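
For readers on newer pandas: `.ix` was deprecated in 0.20 and removed in 1.0, so it may not be available at all. A minimal sketch of the positional alternative, assuming a pandas version that provides `.iloc`; the frame mirrors the one above, and the variable names `last_two` and `same_rows` are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4))

# .iloc is purely positional, so a negative start counts from the end,
# just as it does for a Python list or a NumPy array.
last_two = df.iloc[-2:, :]

# tail(n) is a shorthand for the last n rows.
same_rows = df.tail(2)

print(last_two.index.tolist())
```

Because `.iloc` never consults the labels, this should behave the same way whatever the index contains.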

3

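The main purpose of ix is to allow numpy-like indexing while also supporting row and column labels, so I'm not sure your use case is what it was intended for. Here are a few ways I can think of, most of them trivial: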

    In [142]: df.ix[:][-2:]
    Out[142]:
              0         1         2         3
    8  0.386882 -0.836112 -0.108250 -0.433797
    9  0.642468 -0.399255 -0.911456 -0.497720

    In [161]: df.ix[df.index[-2:],:]
    Out[161]:
              0         1         2         3
    8  0.386882 -0.836112 -0.108250 -0.433797
    9  0.642468 -0.399255 -0.911456 -0.497720

I don't think ix supports negative indexing at all. It seems to ignore it completely:

    In [181]: df.ix[-100:,:]
    Out[181]:
              0         1         2         3
    0 -1.144137 -1.042034 -2.158838  0.674055
    1 -0.424184  1.237318 -1.846130  0.575357
    2 -0.844974 -0.541060  2.197364 -0.031898
    3  0.846263  1.244450 -1.570566 -0.477919
    4 -0.193445  0.171045 -0.235587 -1.185583
    5  1.361539 -1.107389 -1.321081 -0.776407
    6  0.505907 -1.364414 -2.093770  0.144016
    7 -0.888465 -0.329153  0.491264 -0.363472
    8  0.386882 -0.836112 -0.108250 -0.433797
    9  0.642468 -0.399255 -0.911456 -0.497720
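
One plausible reading of that output is that the negative number is being treated as a label rather than ignored. The sketch below illustrates the idea with `.loc`, used here as a stand-in for the label-based behaviour; that substitution is an assumption, since `.ix` itself is not available in current pandas. The index running from -5 to 4 is made up for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose integer labels run from -5 to 4.
df = pd.DataFrame(np.random.randn(10, 4), index=np.arange(-5, 5))

# Label-based slice: starts at the row labelled -2 and runs to the end.
by_label = df.loc[-2:, :]

# Position-based slice: the last two rows, whatever their labels are.
by_position = df.iloc[-2:, :]

print(by_label.index.tolist())
print(by_position.index.tolist())
```

With the default labels 0 to 9 there is no label below 0, so a label-based slice starting at -2 or -100 has nothing to cut off and the whole frame comes back.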

Edit: From the pandas documentation:

Label-based indexing with integer axis labels is a thorny topic. It has been discussed heavily on mailing lists and among various members of the scientific Python community. In pandas, our general viewpoint is that labels matter more than integer locations. Therefore, with an integer axis index only label-based indexing is possible with the standard tools like .ix. The following code will generate exceptions:

    s = Series(range(5))
    s[-1]
    df = DataFrame(np.random.randn(5, 4))
    df
    df.ix[-2:]

This deliberate decision was made to prevent ambiguities and subtle bugs (many users reported finding bugs when the API change was made to stop “falling back” on position-based indexing).
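
The Series half of that quoted snippet can be checked without `.ix`. A small sketch, assuming a pandas version where a Series with an integer index treats integer keys as labels; `raised` and `last_value` are illustrative names:

```python
import pandas as pd

s = pd.Series(range(5))  # integer labels 0..4

try:
    s[-1]  # looked up as the label -1, which does not exist
    raised = False
except KeyError:
    raised = True

# Positional access to the final element goes through .iloc instead.
last_value = s.iloc[-1]

print(raised, last_value)
```

This matches the documentation's point: with an integer index, plain indexing is label-based, and position has to be requested explicitly.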

