Pandas: fill missing values based on the previous row

Question

3

I have a dataframe like the following:

import numpy as np
import pandas as pd
data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1], 'col2': [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan]}
df = pd.DataFrame(data, columns=['col1', 'col2'])
print(df)

   col1  col2
0     1   NaN
1     3   1.0
2     3   NaN
3     1   1.0
4     2   NaN
5     3   NaN
6     2   NaN
7     2   2.0
8     1   NaN

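I am trying to create a third column that fills in the NaN values of col2 when the value of col2 is 1.0 or the previous row's col2 is 1.0. The final dataframe should look like this:

   col1  col2  col3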
0     1   NaN   NaN
1     3   1.0   1.0
2     3   NaN   1.0
3     1   1.0   1.0
4     2   NaN   1.0
5     3   NaN   1.0
6     2   NaN   1.0
7     2   2.0   2.0
8     1   NaN   NaN

The first approach I tried was:

df['col3'] = ((df['col2'] == 1) | (df['col2'].shift() == 1)).astype('int')

That gives me this dataframe:

   col1  col2  col3
0     1   NaN     0
1     3   1.0     1
2     3   NaN     1
3     1   1.0     1
4     2   NaN     1
5     3   NaN     0
6     2   NaN     0
7     2   2.0     0
8     1   NaN     0

This code fixes the first missing value, but it does not carry on filling the missing values that follow. I also tried np.where(), with the same result.
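
To illustrate why this happens: shift() only looks one row back at the original col2, so in a run of NaNs only the first one sees a 1.0 above it. A minimal sketch using the same data (the names prev_is_one and first_try are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1],
    'col2': [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan],
})

# shift() moves the original col2 down by one row; it never sees values
# that were "filled in" for an earlier row.
prev_is_one = df['col2'].shift() == 1
first_try = ((df['col2'] == 1) | prev_is_one).astype(int)

# Rows 5 and 6 stay 0 because the row directly above each of them is NaN.
print(first_try.tolist())
```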

Is there a way to write this in pandas so that several consecutive missing values are filled in a row?

- jth359

3 Answers

3

You can forward-fill with df.fillna, for example:

df.fillna(method='pad')  # same as df.ffill(), which newer pandas versions prefer

   col1  col2
0     1   NaN
1     3   1.0
2     3   1.0
3     1   1.0
4     2   1.0
5     3   1.0
6     2   1.0
7     2   2.0
8     1   2.0

- shish023

1

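I only want to fill in a value when col2 is 1.0 or the previous row is 1.0. Your suggestion also fills missing values where the previous row is 2.0. - jth359

Sorry, I missed that detail. The other answers solve the problem perfectly. - shish023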

2

ffilled = df.col2.ffill()
df.assign(col3=df.col2.fillna(ffilled[ffilled == 1]))

- piRSquared
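
The two-line answer above can be unpacked step by step. This is a sketch assuming the question's df; the intermediate names ones_only and result are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1],
    'col2': [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan],
})

# Step 1: forward-fill col2 so each NaN takes the last non-missing value above it.
ffilled = df['col2'].ffill()

# Step 2: keep only the positions where the forward-filled value is 1.
ones_only = ffilled[ffilled == 1]

# Step 3: fillna with a Series aligns on the index, so only NaNs whose index
# appears in ones_only are filled; all other NaNs are left alone.
result = df.assign(col3=df['col2'].fillna(ones_only))
print(result)
```

Because df.assign returns a new dataframe, df itself is left unchanged here.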

- root · Accepted Answer

You can use np.where: look at where the forward-filled column equals 1, fill in 1 where that is True, and fall back to the value of 'col2' where it is False:

df['col2'] = np.where(df['col2'].ffill() == 1, 1, df['col2'])

Resulting output:

   col1  col2
0     1   NaN
1     3   1.0
2     3   1.0
3     1   1.0
4     2   1.0
5     3   1.0
6     2   1.0
7     2   2.0
8     1   NaN
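
The same idea can also write into a separate col3, as the question asked, instead of overwriting col2. A minimal self-contained sketch under that assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1],
    'col2': [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan],
})

# Where the forward-filled value is 1, write 1; elsewhere keep col2 as it is,
# so NaNs that follow a 2.0 (or come before any value) stay NaN.
df['col3'] = np.where(df['col2'].ffill() == 1, 1, df['col2'])
print(df)
```

Here col2 keeps its original NaNs, and col3 matches the desired output from the question.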