Python / Pandas: Filling missing values in specific rows and columns

Question

I have a lot of experience with R and am now learning Python by trying to "translate" a set of existing R scripts into Python (df is a pandas DataFrame). I am stuck on this line:

df[df$id != df$id_old, c("col1", "col2")] <- NA

That is, I am trying to set specific rows and columns to NA. I have tried several approaches, and the most promising route seems to be

index = np.where(df.id != df.id_old)
df.col1[index] = np.repeat(np.nan, np.size(index))

but the second line raises the following error, which I do not really understand:

Can only tuple-index with a MultiIndex

What is the most concise way to achieve this?

Example:

import numpy as np
import pandas as pd

# col1 and col2 are random, so the values differ on every run
df = pd.DataFrame({'id' : [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5],
    'id_old' : [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5],
    'col1' : np.random.normal(size = 12),
    'col2' : np.random.randint(low = 20, high = 50, size = 12),
    'col3' : np.repeat('other info', 12)})
print(df)

Output:

   id  id_old      col1  col2        col3
0    1       1  0.320982    31  other info
1    1       1  0.398855    42  other info
2    1       2 -0.664073    30  other info
3    2       2  1.428694    48  other info
4    2       3 -1.240363    49  other info
5    3       4  0.023167    42  other info
6    4       4 -0.645114    44  other info
7    4       4 -1.033602    47  other info
8    4       4  0.295143    27  other info
9    4       5  0.531660    32  other info
10   5       5 -0.787401    33  other info
11   5       5  2.033503    48  other info

Desired result:

   id  id_old      col1  col2        col3
0    1       1  0.320982    31  other info
1    1       1  0.398855    42  other info
2    1       2       NaN   NaN  other info
3    2       2  1.428694    48  other info
4    2       3       NaN   NaN  other info
5    3       4       NaN   NaN  other info
6    4       4 -0.645114    44  other info
7    4       4 -1.033602    47  other info
8    4       4  0.295143    27  other info
9    4       5       NaN   NaN  other info
10   5       5 -0.787401    33  other info
11   5       5  2.033503    48  other info

- KaB

You want something like df.loc[index, 'col1'] = ... - lmo
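To expand on that comment: when np.where is called with only a condition, it returns a tuple of arrays rather than a single array, so df.col1[index] ends up indexing with a tuple. That is most likely what triggers the error in the question. The sketch below uses a small made-up frame (not the question's data) and assumes you want to keep the np.where approach; it unpacks the tuple and converts the positions to index labels before assigning through .loc.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
df = pd.DataFrame({
    'id': [1, 1, 1, 2],
    'id_old': [1, 1, 2, 2],
    'col1': [0.1, 0.2, 0.3, 0.4],
})

# With a single argument, np.where returns a tuple of arrays,
# one array per dimension of the input.
index = np.where(df.id != df.id_old)
print(type(index))

# Take the first (and only) array: the integer positions of matching rows.
positions = index[0]

# .loc works with labels, so map positions to labels via df.index.
df.loc[df.index[positions], 'col1'] = np.nan
print(df)
```

A boolean mask passed straight to .loc avoids the np.where step entirely, which is usually the simpler choice.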

1 Answer

- Haleemur Ali · Accepted Answer

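Use .loc with a boolean row mask and pass the column names as a list, which is the equivalent of c(...) in R.

Assignment through .loc modifies the DataFrame in place.

Example: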

df.loc[df.id!=df.id_old, ['col1', 'col2']] = np.nan

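Output (the values differ from the question's because the data are random; col2 is shown as float because NaN is a floating-point value):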

        col1  col2        col3  id  id_old
0   2.411473  31.0  other info   1       1
1   0.874083  43.0  other info   1       1
2        NaN   NaN  other info   1       2
3   2.156903  20.0  other info   2       2
4        NaN   NaN  other info   2       3
5        NaN   NaN  other info   3       4
6   0.933760  22.0  other info   4       4
7  -1.239806  42.0  other info   4       4
8  -0.493344  41.0  other info   4       4
9        NaN   NaN  other info   4       5
10 -0.751290  30.0  other info   5       5
11  0.327527  31.0  other info   5       5
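For reference, here is a self-contained sketch of the same approach that can be run as-is. The seed, the variable name changed, and the explicit cast of col2 to float are assumptions added for this sketch and are not part of the original answer. The cast is there because an integer column cannot hold NaN, and newer pandas releases may warn about or reject the implicit upcast.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(0)

df = pd.DataFrame({
    'id':     [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5],
    'id_old': [1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5],
    'col1': rng.normal(size=12),
    'col2': rng.integers(low=20, high=50, size=12),
    'col3': np.repeat('other info', 12),
})

# Integer columns cannot store NaN, so convert col2 to float up front
# instead of relying on pandas to upcast during the assignment.
df['col2'] = df['col2'].astype('float64')

# Boolean mask: True for rows where id differs from id_old.
changed = df['id'] != df['id_old']

# Row mask plus a list of column labels, like df[rows, c("col1", "col2")] in R.
df.loc[changed, ['col1', 'col2']] = np.nan

print(df)
print(df.dtypes)
```

Rows 2, 4, 5 and 9 are the ones where id and id_old differ, so those are the rows whose col1 and col2 become NaN; col3 is left untouched.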