Pandas operation to get the first n columns per row

Question

3

I have the following DataFrame:

import pandas as pd

df1 = pd.DataFrame(data={'1': ['a', 'd', 'g', 'j'], 
                         '2': ['b', 'e', 'h', 'k'], 
                         '3': ['c', 'f', 'i', 'l'],
                         'top_n': [1, 3, 2, 1]},
                   index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4'], name='index'))
>>> df1
       1  2  3   top_n
index
ind1   a  b  c    1
ind2   d  e  f    3
ind3   g  h  i    2
ind4   j  k  l    1

How can I get the first N values of each row, based on the top_n column? The desired result is:

>>> df1
       1  2    3      top_n
index
ind1   a  NaN  NaN     1
ind2   d  e    f       3
ind3   g  h    NaN     2
ind4   j  NaN  NaN     1

In this example, ind3 keeps g and h because its top_n value is 2.

- bltSandwich21

3 Answers

1

Let's use NumPy broadcasting to build a boolean mask, then use that mask with where to keep the first N values in each row.

cols = ['1', '2', '3']
mask = df1['top_n'].values[:, None] > range(len(cols))

df1.assign(**df1[cols].where(mask))

       1    2    3  top_n
index                    
ind1   a  NaN  NaN      1
ind2   d    e    f      3
ind3   g    h  NaN      2
ind4   j  NaN  NaN      1
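
To see why the comparison yields one boolean per cell, it can help to look at the shapes involved. The following is a minimal sketch rather than the answer's exact code: the names top_n, positions, and result are introduced here for illustration, and the last step uses a plain column assignment instead of assign(**...).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    data={'1': ['a', 'd', 'g', 'j'],
          '2': ['b', 'e', 'h', 'k'],
          '3': ['c', 'f', 'i', 'l'],
          'top_n': [1, 3, 2, 1]},
    index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4'], name='index'))

cols = ['1', '2', '3']

# Column vector of per-row limits, shape (4, 1).
top_n = df1['top_n'].to_numpy()[:, None]

# Zero-based position of each value column, shape (3,).
positions = np.arange(len(cols))

# Broadcasting (4, 1) against (3,) gives a (4, 3) mask:
# a cell is True when its column position is below the row's top_n.
mask = top_n > positions

# Keep masked-in values, replace the rest with NaN; top_n is untouched.
result = df1.copy()
result[cols] = df1[cols].where(mask)
print(result)
```

Because the mask is built only for the columns listed in cols, the top_n column never needs special handling.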

- Shubham Sharma

0

Assuming the top_n values are always less than or equal to the number of other columns, simple slicing is enough:

df1.apply(lambda row: row[:row.top_n], axis=1)
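
Note that the slice also drops the top_n column from the result, since top_n sits after the columns being kept. Below is a minimal sketch, assuming top_n never exceeds the number of value columns. It uses row.iloc to make the positional slicing explicit and joins top_n back afterwards; the names out and restored are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame(
    data={'1': ['a', 'd', 'g', 'j'],
          '2': ['b', 'e', 'h', 'k'],
          '3': ['c', 'f', 'i', 'l'],
          'top_n': [1, 3, 2, 1]},
    index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4'], name='index'))

# Each row returns a Series of its first top_n entries;
# pandas aligns the pieces by label and fills the gaps with NaN.
out = df1.apply(lambda row: row.iloc[:row['top_n']], axis=1)

# top_n was sliced away, so attach it again by index.
restored = out.join(df1['top_n'])
print(restored)
```

This row-wise apply is convenient for small frames; the mask-based answers avoid a Python-level loop over rows.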

- Nuri Taş

- mozway · Accepted Answer

Using numpy for boolean masking:

import numpy as np

a = np.ones_like(df1).cumsum(1)

mask = ((a <= df1['top_n'].values[:,None]) | (a == df1.shape[1]))

out = df1.where(mask)

Output:

       1    2    3  top_n
index                    
ind1   a  NaN  NaN      1
ind2   d    e    f      3
ind3   g    h  NaN      2
ind4   j  NaN  NaN      1

mask:

array([[ True, False, False,  True],
       [ True,  True,  True,  True],
       [ True,  True, False,  True],
       [ True, False, False,  True]])
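
The role of each term in the mask can be checked step by step. In this sketch, a holds the 1-based position of every column, the first comparison keeps positions up to top_n, and the second term always keeps the last column, which assumes top_n is the last column of the frame. Passing dtype=int to np.ones_like is a small variation made here so that the counter is an integer array; the names n_cols, within_top_n, and is_last_col are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    data={'1': ['a', 'd', 'g', 'j'],
          '2': ['b', 'e', 'h', 'k'],
          '3': ['c', 'f', 'i', 'l'],
          'top_n': [1, 3, 2, 1]},
    index=pd.Series(['ind1', 'ind2', 'ind3', 'ind4'], name='index'))

n_cols = df1.shape[1]  # 4, counting top_n itself

# Running count along each row: every row becomes [1, 2, 3, 4].
a = np.ones_like(df1.to_numpy(), dtype=int).cumsum(axis=1)

# True where the 1-based column position is within the row's top_n.
within_top_n = a <= df1['top_n'].to_numpy()[:, None]

# True only in the last column, so top_n itself is always kept.
is_last_col = a == n_cols

mask = within_top_n | is_last_col
out = df1.where(mask)
print(mask)
print(out)
```

If top_n were not the last column, the second term would need to target its actual position instead.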