Boolean Series key will be reindexed to match DataFrame index

Question

120

Here is how I ran into this warning:

df.loc[a_list][df.a_col.isnull()]

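a_list is an Int64Index holding a list of row labels. All of these labels belong to df.

The df.a_col.isnull() part is the condition I use for filtering.

If I run the following commands individually, I do not get any warning: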

df.loc[a_list]
df[df.a_col.isnull()]

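But if I put them together as df.loc[a_list][df.a_col.isnull()], I get this warning message (although I can still see the result):

Boolean Series key will be reindexed to match DataFrame index.

What does this warning message mean? Does it affect the result that is returned?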

- Cheng

3 Answers

19

If you get this warning, use .loc[] instead of [] to suppress it.¹

df.loc[boolean_mask]           # <--------- OK
df[boolean_mask]               # <--------- warning

For the specific case in the question, you can chain .loc[] indexers:

df.loc[a_list].loc[df['a_col'].isna()]

or chain all the conditions with and inside query():

# if a_list is a list of indices of df
df.query("index in @a_list and a_col != a_col")

# if a_list is a list of values in some other column such as b_col
df.query("b_col in @a_list and a_col != a_col")

or chain the conditions with & inside [] (as in @IanS's post).

This warning is shown if:

the index of the boolean mask is not in the same order as the index of the dataframe it is filtering.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a_col':[1, 2, np.nan]}, index=[0, 1, 2])
m1 = pd.Series([True, False, True], index=[2, 1, 0])
df.loc[m1]       # <--------- OK
df[m1]           # <--------- warning

the index of a boolean mask is a super set of the index of the dataframe it is filtering. For example:

m2 = pd.Series([True, False, True, True], np.r_[df.index, 10])
df.loc[m2]       # <--------- OK
df[m2]           # <--------- warning
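The two cases above can be checked without reading the console output by recording warnings. This is only a minimal sketch: the helper name select is made up for illustration, and it assumes a pandas version that still emits this UserWarning from [].

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"a_col": [1, 2, np.nan]}, index=[0, 1, 2])
# Same labels as df.index, but in a different order.
m1 = pd.Series([True, False, True], index=[2, 1, 0])
# Index is a superset of df.index (extra label 10).
m2 = pd.Series([True, False, True, True], index=[0, 1, 2, 10])


def select(indexer, mask):
    """Hypothetical helper: apply mask through indexer (df or df.loc) and
    report whether the "reindexed" warning was emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        rows = indexer[mask]
    warned = any("reindexed" in str(w.message) for w in caught)
    return rows, warned


for name, mask in [("m1", m1), ("m2", m2)]:
    rows_bracket, warned_bracket = select(df, mask)
    rows_loc, warned_loc = select(df.loc, mask)
    print(name,
          "| [] warned:", warned_bracket,
          "| .loc[] warned:", warned_loc,
          "| same rows:", rows_bracket.equals(rows_loc))
```

In both cases the selected rows should be identical; only the warning differs.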

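¹ If we look at the source code of [] and loc[], the only difference when the index of the boolean mask is a (weak) superset of the index of the dataframe is that [] shows this warning via the _getitem_bool_array method and loc[] does not.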

- cottontail

0

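I came across this page after getting the same warning: I was querying the full dataframe but applying the result to a subset of it.

Creating a subset of the data and storing it in the variable sub_df: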

sub_df = df[df['a'] == 1]
sub_df = sub_df[df['b'] == 1] # Note "df" hiding here

Solution:

Make sure you use the same dataframe every time (in my case, only sub_df):

# Last line should instead be:
sub_df = sub_df[sub_df['b'] == 1]

- KJ Price


- IanS · Accepted Answer

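Your approach will work despite the warning, but it is best not to rely on implicit, unclear behavior. Solution 1, turn the selection of the indices in a_list into a boolean mask: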

df[df.index.isin(a_list) & df.a_col.isnull()]

Solution 2, do it in two steps:

df2 = df.loc[a_list]
df2[df2.a_col.isnull()]

Solution 3, if you want a one-liner, use a query trick (a_col != a_col is true only where a_col is NaN):

df.loc[a_list].query('a_col != a_col')
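As a quick sanity check, the three solutions can be compared on a toy frame. The data and the contents of a_list here are made up for illustration. Note that solution 1 returns rows in df order while the other two follow the order of a_list, so the results only match exactly when a_list is in the same order as df.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; only the column name a_col comes from the question.
df = pd.DataFrame({"a_col": [1.0, np.nan, 3.0, np.nan]}, index=[10, 11, 12, 13])
a_list = pd.Index([11, 12])  # row labels that all belong to df

# Solution 1: a single boolean mask aligned with df.
sol1 = df[df.index.isin(a_list) & df.a_col.isnull()]

# Solution 2: subset first, then build the mask from the subset itself.
df2 = df.loc[a_list]
sol2 = df2[df2.a_col.isnull()]

# Solution 3: one-liner that relies on NaN != NaN.
sol3 = df.loc[a_list].query("a_col != a_col")

print(sol1.equals(sol2), sol2.equals(sol3))
```

None of the three builds a mask whose index differs from the frame it filters, which is why none of them should trigger the warning.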

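The warning comes from the fact that the boolean vector df.a_col.isnull() has the length of df, while df.loc[a_list] has the length of a_list, i.e. it is shorter. Therefore, some indices in df.a_col.isnull() are not in df.loc[a_list].

What pandas does is reindex the boolean series on the index of the calling dataframe. In effect, it takes from df.a_col.isnull() the values corresponding to the indices in a_list. This works, but the behavior is implicit and could very well change in the future, so that is what the warning is about.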
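The reindexing described above can be written out explicitly. The following is only a sketch with made-up data, assuming every label in a_list exists in df. It reindexes the mask by hand and compares the result with what [] returns implicitly.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical example data; only the column name a_col comes from the question.
df = pd.DataFrame({"a_col": [1.0, np.nan, 3.0, np.nan]}, index=[10, 11, 12, 13])
a_list = pd.Index([11, 12])

sub = df.loc[a_list]       # 2 rows
mask = df.a_col.isnull()   # 4 values, indexed like df

# Explicit: align the mask to the subset before filtering.
aligned = mask.reindex(sub.index)
explicit = sub[aligned]    # mask index equals sub.index

# Implicit: let [] do the alignment (this is the call that warns).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    implicit = sub[mask]

print(explicit.equals(implicit))
```

Writing the reindex step yourself keeps the result the same while making the alignment visible in the code.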