Python Pandas: selecting head and tail

Question

30

In pandas, how do I select the first 5 and the last 5 values of a DataFrame?

For example:

In [11]: df
Out[11]: 
            A  B  C
2012-11-29  0  0  0
2012-11-30  1  1  1
2012-12-01  2  2  2
2012-12-02  3  3  3
2012-12-03  4  4  4
2012-12-04  5  5  5
2012-12-05  6  6  6
2012-12-06  7  7  7
2012-12-07  8  8  8
2012-12-08  9  9  9

How do I show the first two and the last two rows?

- fu xue

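Your question isn't very clear. You say you want to select the first 5 and last 5 values: do you mean rows or individual values? Please show the desired output. - EdChum

Beyond the other answers, head and tail can be chained (just like in bash :)) to give you values from the middle: df.head(90).tail(10) returns the rows at positions 80 through 89. - tjb

9 Answers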
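
The chaining idea from the comment above can be checked against plain positional slicing. This is only a minimal sketch on a made-up 100-row frame (the column name `x` is an assumption chosen for illustration):

```python
import pandas as pd

# Hypothetical 100-row frame; the column name "x" is only for illustration.
df = pd.DataFrame({"x": range(100)})

# head(90) keeps positions 0..89, then tail(10) keeps the last 10 of those.
middle = df.head(90).tail(10)

# The same rows selected by position: 80 through 89.
same = df.iloc[80:90]

print(middle.equals(same))
```

For a plain middle slice, `iloc` is probably the more direct choice; the chained form is mostly handy when already thinking in terms of head and tail.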

15

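Although this isn't quite the same question, if you only want to display the first and last 5 rows (for example with display in Jupyter, or with a plain print), there may be a simpler way: the pd.option_context context manager.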

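import numpy as np
import pandas as pd

# make 100 random 3-d points
df = pd.DataFrame(np.random.randn(100, 3))

# intended to sort rows by their row sum, but .index of the sums is just the
# original index, so the row order is unchanged (as the output below shows)
df = df.loc[df.sum(axis=1).index]

# temporarily cap the number of displayed rows at 10
with pd.option_context('display.max_rows', 10):
    print(df)

Output: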

           0         1         2
0  -0.649105 -0.413335  0.374872
1   3.390490  0.552708 -1.723864
2  -0.781308 -0.277342 -0.903127
3   0.433665 -1.125215 -0.290228
4  -2.028750 -0.083870 -0.094274
..       ...       ...       ...
95  0.443618 -1.473138  1.132161
96 -1.370215 -0.196425 -0.528401
97  1.062717 -0.997204 -1.666953
98  1.303512  0.699318 -0.863577
99 -0.109340 -1.330882 -1.455040

[100 rows x 3 columns]

- Bolster

1

I expanded on this answer in a few ways here: [https://dev59.com/O1gQ5IYBdhLWcg3wnFLJ#53311747]... - watsonic

15

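You can use df.head(5) and df.tail(5) to get the first five and the last five rows, respectively. Optionally, you can create a new DataFrame and use append() to combine the tail and the head: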

new_df = df.tail(5)
new_df = new_df.append(df.head(5))

- Linas Fx

13

To do this, you should use head() and tail() together. I think the simplest way is:

df.head(5).append(df.tail(5))

- Hamid

In pandas 2.0+, append has been removed. Use concat instead: pd.concat([df.head(5), df.tail(5)]). - philipnye
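
As a runnable illustration of the concat suggestion, here is a minimal sketch that rebuilds the 10-row frame from the question and keeps its first two and last two rows. The variable names `idx` and `ends` are assumptions made for this example.

```python
import pandas as pd

# Rebuild the 10-row frame from the question (dates as index, columns A/B/C).
idx = pd.date_range("2012-11-29", periods=10)
df = pd.DataFrame({"A": range(10), "B": range(10), "C": range(10)}, index=idx)

# First two and last two rows, kept in their original order.
ends = pd.concat([df.head(2), df.tail(2)])
print(ends)
```

Note that if the frame had fewer than four rows, head(2) and tail(2) would overlap and some rows would appear twice in the result.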

11

A simple little function:

def ends(df, x=5):
    return df.head(x).append(df.tail(x))

Use it like this:

df = pd.DataFrame(np.random.rand(15,6))
ends(df,2)

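I actually use this a lot, ~~I think it would be a great feature to add to pandas.~~ (No features are to be added to the pandas.DataFrame core API.) I add it right after the import, like this: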

import pandas as pd
def ends(df, x=5):
    return df.head(x).append(df.tail(x))
setattr(pd.DataFrame,'ends',ends)

Use it like this:

import numpy as np
df = pd.DataFrame(np.random.rand(15,6))
df.ends(2)

- ic_fl2

1

Could you submit this to the pandas Git repository? This should be a default function. - n3rd

1

To avoid the warning:

def ends(df, x=5):
    return pd.concat([df.head(x), df.tail(x)], axis=0)
setattr(pd.DataFrame,'ends',ends)

- undefined

5

In Jupyter, expanding on @bolster's answer, we can create a reusable convenience function:

def display_n(df,n): 
    with pd.option_context('display.max_rows',n*2):
        display(df)

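Then

display_n(df,2)

displays the following (in Jupyter it is rendered as a nicely formatted HTML table rather than plain text)

            0          1          2
0    0.167961  -0.732745   0.952637
1   -0.050742  -0.421239   0.444715
...       ...        ...        ...
98   0.085264   0.982093  -0.509356
99  -0.758963  -0.578267  -0.115865

when df is df = pd.DataFrame(np.random.randn(100,3)).

Notes:

Of course, you can print the same thing as text by changing display to print.
On Unix-like systems, you can load the function above automatically in all notebooks by placing a .py or .ipy file in ~/.ipython/profile_default/startup.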

- watsonic
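
One caveat with the max_rows approach: in pandas versions that have the display.min_rows option (10 by default), a truncated frame shows at most min_rows rows, so display_n(df, 8) may show only 10 rows instead of 16. The sketch below sets both options together. It assumes a pandas version where display.min_rows exists, and `ends_repr` is a hypothetical helper name. It returns the text repr so it also works outside Jupyter.

```python
import pandas as pd

def ends_repr(df, n):
    """Return the text repr of df limited to its first n and last n rows.

    Hypothetical helper: sets display.max_rows and display.min_rows together
    so the truncated view really contains 2 * n rows.
    """
    with pd.option_context("display.max_rows", 2 * n,
                           "display.min_rows", 2 * n):
        return repr(df)

# Made-up frame for the demonstration.
df = pd.DataFrame({"a": range(100)})
text = ends_repr(df, 8)
print(text)
```

In a notebook, the same pair of options can be passed to pd.option_context around display(df) instead of repr(df).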

1

Related to Linas Fx's answer.

Here is a definition:

pd.DataFrame.less = lambda df, n=10: df.head(n//2).append(df.tail(n//2))

Then you only need to type df.less()

This is the same as typing df.head().append(df.tail())

If you type df.less(2), the result is the same as df.head(1).append(df.tail(1))

- You Oneandzero

Nice, I didn't know you could add methods to pandas DataFrames like this! - Roald
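
As an alternative to assigning attributes on pd.DataFrame directly, pandas also provides pd.api.extensions.register_dataframe_accessor for attaching custom namespaces. The following is only a sketch of that route, not what the answers above do. The accessor name `peek` and the class `PeekAccessor` are hypothetical names chosen for this example, and concat is used in place of append.

```python
import pandas as pd

# "peek" is a hypothetical accessor name chosen for this example.
@pd.api.extensions.register_dataframe_accessor("peek")
class PeekAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor is used on.
        self._df = pandas_obj

    def ends(self, n=5):
        """Return the first n and last n rows as one DataFrame."""
        return pd.concat([self._df.head(n), self._df.tail(n)])

# Made-up 15-row frame for the demonstration.
df = pd.DataFrame({"a": range(15)})
result = df.peek.ends(2)
print(result)
```

The accessor keeps the helper in its own namespace (df.peek.ends) rather than adding a new top-level method name to every DataFrame.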

1

If you want to stick to pandas only, you can use apply() to concatenate the head and the tail of each column:

import pandas as pd
from string import ascii_lowercase, ascii_uppercase

df = pd.DataFrame(
    {"upper": list(ascii_uppercase), "lower": list(ascii_lowercase)}, index=range(1, 27)
)

df.apply(lambda x: pd.concat([x.head(2), x.tail(2)]))


   upper lower
1      A     a
2      B     b
25     Y     y
26     Z     z

- user1717828

0

Combining @ic_fl2 and @watsonic gives the following code for Jupyter:

def ends_attr():
    def display_n(df,n):
        with pd.option_context('display.max_rows',n*2):
            display(df)
    # set pd.DataFrame attribute where .ends runs display_n() function
    setattr(pd.DataFrame,'ends',display_n)

ends_attr()

To view the first three and the last three rows of a DataFrame:

your_df.ends(3)

I like this because I can copy a single function and know I have everything needed to use the ends attribute.

- Me J

- jezrael · Accepted Answer

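You can use iloc with numpy.r_:

import numpy as np

# np.r_ concatenates the two slices into one array of positions
print (np.r_[0:2, -2:0])
[ 0  1 -2 -1]

# first 2 and last 2 rows; assigned to a new name so df stays intact
df1 = df.iloc[np.r_[0:2, -2:0]]
print (df1)
            A  B  C
2012-11-29  0  0  0
2012-11-30  1  1  1
2012-12-07  8  8  8
2012-12-08  9  9  9

# first 4 and last 4 rows of the original df
df2 = df.iloc[np.r_[0:4, -4:0]]
print (df2)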
            A  B  C
2012-11-29  0  0  0
2012-11-30  1  1  1
2012-12-01  2  2  2
2012-12-02  3  3  3
2012-12-05  6  6  6
2012-12-06  7  7  7
2012-12-07  8  8  8
2012-12-08  9  9  9
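
The np.r_ selection above works when the frame has at least 2n rows. With fewer rows, the head and tail positions overlap (duplicated rows) or fall out of bounds (an IndexError from iloc). The sketch below wraps the same technique in a small guard. `head_tail` is a hypothetical helper name, and returning the whole frame for short inputs is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd

def head_tail(df, n=5):
    """Return the first n and last n rows using iloc and np.r_.

    Hypothetical helper: for frames with at most 2 * n rows it returns a
    copy of the whole frame instead of duplicating or overrunning rows.
    """
    if n <= 0:
        return df.iloc[0:0]
    if len(df) <= 2 * n:
        return df.copy()
    return df.iloc[np.r_[0:n, -n:0]]

# Made-up frames for the demonstration.
df = pd.DataFrame({"A": range(10)})
picked = head_tail(df, 2)
short = head_tail(df.iloc[:3], 2)
print(picked)
print(short)
```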