A simple example DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'mycol':['foo','bar','hello','there',np.nan,np.nan,np.nan,'foo'],
                   'mycol2':'this is here to make it a DF'.split()})
print(df)
   mycol mycol2
0    foo   this
1    bar     is
2  hello   here
3  there     to
4    NaN   make
5    NaN     it
6    NaN      a
7    foo     DF
I am trying to fill the NaN values with data sampled from mycol itself, e.g. I want each NaN replaced by a sample such as foo, bar, hello, etc.
# fill NA values with n samples (n= number of NAs) from df['mycol']
df['mycol'].fillna(df['mycol'].sample(n=df.isna().sum(), random_state=1,replace=True).values)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
# fill NA values with n samples, n=1. Dropna from df['mycol'] before sampling:
df['mycol'] = df['mycol'].fillna(df['mycol'].dropna().sample(n=1, random_state=1,replace=True)).values
# nothing happens
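A minimal sketch of why the second attempt appears to do nothing, assuming the cause is index alignment: when fillna is given a Series, it fills by matching index labels, and the single sampled value keeps the label of a row that was never NaN. The names s, picked and filled are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(['foo', 'bar', 'hello', 'there', np.nan, np.nan, np.nan, 'foo'])

# The sampled value keeps its original index label (one of 0, 1, 2, 3, 7).
picked = s.dropna().sample(n=1, random_state=1)

# fillna with a Series fills by index label, not by position.
# The label of `picked` never matches a NaN row (4, 5, 6),
# so no missing value is replaced.
filled = s.fillna(picked)
remaining_nans = int(filled.isna().sum())
print(remaining_nans)  # 3
```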
Expected output: the NaNs filled with data randomly sampled from mycol.
   mycol mycol2
0    foo   this
1    bar     is
2  hello   here
3  there     to
4    foo   make
5    foo     it
6  hello      a
7    foo     DF
Edit after the answers: @Jezrael's answer below solved my problem; my issue was with the index.
df['mycol'] = (df['mycol']
.dropna()
.sample(n=len(df),replace=True)
.reset_index(drop=True))
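Note that the snippet above resamples every row of mycol, including rows that already held a value. If the goal is the expected output shown earlier, where existing values are kept, one possible variant (a sketch, not necessarily what @Jezrael's answer does) is to assign samples only to the NaN rows. The names mask, n_missing and samples are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mycol': ['foo', 'bar', 'hello', 'there', np.nan, np.nan, np.nan, 'foo'],
    'mycol2': 'this is here to make it a DF'.split(),
})

mask = df['mycol'].isna()      # True on the rows that need filling
n_missing = int(mask.sum())    # a single number, for this column only

# Draw n_missing values from the non-missing entries. Converting to a
# NumPy array drops the sampled index, so the assignment below is
# positional rather than label-aligned.
samples = (df['mycol']
           .dropna()
           .sample(n=n_missing, replace=True, random_state=1)
           .to_numpy())

df.loc[mask, 'mycol'] = samples
print(df)
```

Which values land in rows 4 to 6 depends on random_state; rows that were not NaN are left untouched.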
n = df.isna().sum()
This part is the problem; check it and you will see that it gives two numbers (one count per column) instead of one. - help-ukraine-now
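To illustrate the comment, here is a small sketch comparing the two calls on the example data. Passing the per-column Series as n is presumably what triggers the "truth value of a Series is ambiguous" error, since sample expects a single integer. The names per_column and single are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mycol': ['foo', 'bar', 'hello', 'there', np.nan, np.nan, np.nan, 'foo'],
    'mycol2': 'this is here to make it a DF'.split(),
})

per_column = df.isna().sum()        # Series: one NaN count per column
single = df['mycol'].isna().sum()   # scalar: NaN count for mycol only

print(per_column)   # mycol 3, mycol2 0
print(single)       # 3
```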