Fill a column based on conditions

I have a DataFrame with 2 columns:

          Col1          Col2
1          NaN         Someval1
2           Y          Someval2
3           N          Someval3
4          NaN           NaN
5          NaN         Someval4

I would like to fill the NaN values in Col1 using the following conditions:
If Col1 is NaN and the Col2 value (e.g. Someval1) is in list 1, fill Col1 with Y
If Col1 is NaN and the Col2 value (e.g. Someval4) is in list 2, fill Col1 with N
If Col1 is NaN and Col2 is also NaN (NaN counts as a member of list 2), fill Col1 with N
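For reference, the sample data can be written out as a DataFrame like this. The full contents of list 1 and list 2 are never given, so `list1` and `list2` below are hypothetical and hold only the values named in the conditions; `dtype=object` is also an assumption, chosen so the frame behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (index 1..5)
df = pd.DataFrame(
    {
        "Col1": [np.nan, "Y", "N", np.nan, np.nan],
        "Col2": ["Someval1", "Someval2", "Someval3", np.nan, "Someval4"],
    },
    index=range(1, 6),
    dtype=object,
)

# Hypothetical lists: only the values mentioned in the conditions are known
list1 = ["Someval1"]
list2 = ["Someval4", np.nan]

# The conditions only ever apply to rows where Col1 is missing
needs_fill = df["Col1"].isnull()
print(df[needs_fill])
```

Rows 1, 4 and 5 are the only candidates; rows 2 and 3 already have a value and should be left alone.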

Any suggestions? (Not sure whether this is possible.)
Thanks a lot!

What does NaN in list 1 mean? :) - tworec
It's list 2, thanks for checking, I'll edit it. - datascana
3 answers

I think you need mask with conditions built from isnull and isin:

import numpy as np

L1 = ['Someval1','Someval8']
L2 = ['Someval4','Someval9', np.nan]
m1 = df['Col1'].isnull()
m2 = df['Col2'].isin(L1)
m3 = df['Col2'].isin(L2)

df['Col1'] = df['Col1'].mask(m1 & m2, 'Y')
df['Col1'] = df['Col1'].mask(m1 & m3, 'N')

print (df)
  Col1      Col2
1    Y  Someval1
2    Y  Someval2
3    N  Someval3
4    N       NaN
5    N  Someval4
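The third condition works because `Series.isin` treats a NaN in the list of values as matching missing entries, so putting `np.nan` into `L2` is enough to catch row 4. A small check on a throwaway Series (the `dtype=object` is an assumption to keep the behaviour version-independent):

```python
import numpy as np
import pandas as pd

s = pd.Series(["Someval4", np.nan, "Someval2"], dtype=object)

# np.nan in the list makes isin match the missing entry as well
print(s.isin(["Someval4", np.nan]).tolist())  # [True, True, False]
```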

Another solution with numpy.where:

df['Col1'] = np.where(m1 & m2, 'Y',
             np.where(m1 & m3, 'N', df['Col1']))

print (df)
  Col1      Col2
1    Y  Someval1
2    Y  Someval2
3    N  Someval3
4    N       NaN
5    N  Someval4
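A related option, not part of the original answers, is `numpy.select`, which takes the conditions and choices as lists instead of nesting `np.where` calls, so adding a fourth rule is just one more list entry. The sketch rebuilds the sample frame and reuses the `L1`/`L2` lists from this answer; `dtype=object` is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Col1": [np.nan, "Y", "N", np.nan, np.nan],
        "Col2": ["Someval1", "Someval2", "Someval3", np.nan, "Someval4"],
    },
    index=range(1, 6),
    dtype=object,
)
L1 = ['Someval1', 'Someval8']
L2 = ['Someval4', 'Someval9', np.nan]

m1 = df['Col1'].isnull()
m2 = df['Col2'].isin(L1)
m3 = df['Col2'].isin(L2)

# Conditions are checked in order and the first True one wins;
# rows matching neither keep their current Col1 value
df['Col1'] = np.select([m1 & m2, m1 & m3], ['Y', 'N'], default=df['Col1'])
print(df)
```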

Another solution with only one condition per step, combined with fillna:
L1 = ['Someval1','Someval8']
L2 = ['Someval4','Someval9', np.nan]

df['Col1'] = df['Col1'].mask(df['Col2'].isin(L1), df['Col1'].fillna('Y'))
df['Col1'] = df['Col1'].mask(df['Col2'].isin(L2), df['Col1'].fillna('N'))
print (df)
  Col1      Col2
1    Y  Someval1
2    Y  Someval2
3    N  Someval3
4    N       NaN
5    N  Someval4
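The isnull condition can be dropped here because the replacement values come from `fillna`, which only changes missing entries: where the condition is True but Col1 already has a value, `mask` substitutes that same value back. A minimal check on a throwaway Series (`dtype=object` assumed):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, "Y", np.nan], dtype=object)
cond = pd.Series([True, True, False])

# Row 0: NaN and cond True -> filled with "N"
# Row 1: cond True but already "Y" -> fillna leaves "Y", so it stays "Y"
# Row 2: cond False -> untouched, stays NaN
out = s.mask(cond, s.fillna("N"))
print(out.tolist())
```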

Here is a solution using .loc:

df.loc[df.Col1.isnull() & df.Col2.isin(['Someval1']), 'Col1'] = 'Y'
df.loc[df.Col1.isnull() & df.Col2.isin(['Someval4']), 'Col1'] = 'N'
df.loc[df.Col1.isnull() & df.Col2.isin([np.nan]), 'Col1'] = 'N'

Full script:

from io import StringIO

import numpy as np
import pandas as pd

df = pd.read_csv(StringIO("""Col1          Col2
1          NaN         Someval1
2           Y          Someval2
3           N          Someval3
4          NaN           NaN
5          NaN         Someval4
"""), sep=r"\s+")

# Assign only to Col1; using ':' as the column selector would overwrite Col2 too
df.loc[df.Col1.isnull() & df.Col2.isin(['Someval1']), 'Col1'] = 'Y'
df.loc[df.Col1.isnull() & df.Col2.isin(['Someval4']), 'Col1'] = 'N'
df.loc[df.Col1.isnull() & df.Col2.isin([np.nan]), 'Col1'] = 'N'

print(df)

  Col1      Col2
1    Y  Someval1
2    Y  Someval2
3    N  Someval3
4    N       NaN
5    N  Someval4
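Passing `:` as the column selector to `.loc` assigns to every column of the matching rows, not just Col1. A small comparison on a throwaway two-row frame (`dtype=object` assumed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Col1": [np.nan, "Y"], "Col2": [np.nan, "Someval2"]}, dtype=object
)
mask = df["Col1"].isnull() & df["Col2"].isnull()

a = df.copy()
a.loc[mask, :] = "N"       # every column in the matching row becomes "N"

b = df.copy()
b.loc[mask, "Col1"] = "N"  # only Col1 changes; Col2 keeps its NaN

print(a)
print(b)
```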

You can use:

df.loc[df['Col1'].isnull() & (df['Col2'] == 'Someval1'), 'Col1'] = 'Y'
df.loc[df['Col1'].isnull() & (df['Col2'] == 'Someval4'), 'Col1'] = 'N'
df.loc[df['Col1'].isnull() & df['Col2'].isnull(), 'Col1'] = 'N'
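The third line has to use isnull() rather than an == comparison, because NaN never compares equal to anything, itself included. A quick check on a throwaway object-dtype Series:

```python
import numpy as np
import pandas as pd

col2 = pd.Series(["Someval1", np.nan, "Someval4"], dtype=object)

# == against NaN is False everywhere, even for the missing entry
print((col2 == np.nan).tolist())  # [False, False, False]

# isnull() is what actually selects the missing entry
print(col2.isnull().tolist())     # [False, True, False]
```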

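The second line overwrites "Someval4" in the fifth row, and the second and third lines don't fill Col1 in the fourth and fifth rows. - tworec
@tworec I thought "list 2" meant the second column; I'll fix it. - danche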

Page content provided by Stack Overflow.