Remove leading NaN values in pandas

Question

18

How can I remove leading NaNs in pandas?

pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

I want to remove only the first 3 NaNs from the data above, so the result should be:

pd.Series([1, 2, np.nan, 3])

- Meh

Would this do? pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3][3:]) - clemtoy

@clemtoy That is just an example. I don't know in advance how many leading NaNs there are. - Meh

4 Answers

2

Find the first non-NaN index

To find the index of the first non-NaN item:

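import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

nans = s.apply(np.isnan)  # boolean series: True where the value is NaN

first_non_nan = nans[nans == False].index[0]  # label of the first non-NaN value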

Output

s[first_non_nan:]
Out[44]:
3     1
4     2
5   NaN
6     3
dtype: float64

- bakkal

1

Two more approaches can be suggested here, assuming the input series is A.

Approach 1: Using slicing -

A[np.where(~np.isnan(A))[0][0]:]

Approach 2: Using a mask -

A[np.maximum.accumulate(~np.isnan(A))]

Sample run -

In [219]: A = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

In [220]: A
Out[220]: 
0   NaN
1   NaN
2   NaN
3     1
4     2
5   NaN
6     3
dtype: float64

In [221]: A[np.where(~np.isnan(A))[0][0]:]       # Approach 1
Out[221]: 
3     1
4     2
5   NaN
6     3
dtype: float64

In [222]: A[np.maximum.accumulate(~np.isnan(A))]  # Approach 2
Out[222]: 
3     1
4     2
5   NaN
6     3
dtype: float64
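
To see why the mask in Approach 2 keeps the NaN in the middle, it can help to print the intermediate arrays. This is only a small sketch; the names `valid` and `keep` are illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

A = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])

# True where the value is not NaN.
valid = ~np.isnan(A.to_numpy())

# Running maximum of the booleans: stays False until the first True,
# then stays True for the rest of the array.
keep = np.maximum.accumulate(valid)

result = A[keep]
print(valid.tolist())
print(keep.tolist())
print(result.tolist())
```

Because the running maximum never drops back to False, only the leading NaNs are masked out; NaNs after the first valid value are kept.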

- Divakar

-1

To remove the leading np.nan:

tab = [np.nan, np.nan, np.nan, 1, 2, np.nan, 3]
# find the position of the first non-NaN item, then slice the list from there
pd.Series(tab[tab.index([n for n in tab if not np.isnan(n)].pop(0)):])

- clemtoy

I was hoping for a vectorized solution. - Meh

I think a list comprehension still loops in the Python interpreter rather than using the vectorized operations of a numerical library. - bakkal

@clemtoy, as bakkal said, [n for n in tab ...] is not vectorized. Also, you have to use np.isnan(n) to test for NaN; n != np.nan does not work (try np.nan == np.nan in the console). - Meh

@bakkal Ah, OK, I misunderstood "vectorized", sorry. - clemtoy
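
The point about NaN comparisons raised in the comments can be checked directly. A minimal sketch, where the variable `x` is just an illustrative name:

```python
import numpy as np

x = np.nan

# NaN never compares equal to anything, including itself,
# so an equality or inequality test cannot be used to detect it.
print(x == np.nan)   # False
print(x != np.nan)   # True

# np.isnan is the reliable test.
print(np.isnan(x))   # True
```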

- EdChum · Accepted Answer

Here is another approach that uses only pandas methods:

In [103]:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
first_valid = s[s.notnull()].index[0]
s.iloc[first_valid:]

Out[103]:
3     1
4     2
5   NaN
6     3
dtype: float64

So we use notnull to filter the series and get the first valid index, then slice the series using iloc.
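
Note that `index[0]` returns an index label, while `iloc` expects an ordinal position; the two coincide only for the default 0..n-1 index. The sketch below illustrates this with an index that starts at 10 (an arbitrary choice for illustration) and converts the label to a position with `Index.get_loc`.

```python
import numpy as np
import pandas as pd

# Same data, but the index starts at 10 instead of 0.
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3], index=range(10, 17))

first_label = s[s.notnull()].index[0]      # index label of the first valid value
first_pos = s.index.get_loc(first_label)   # ordinal position of that label

# Using the label as a position slices past the end of the series.
print(len(s.iloc[first_label:]))

# Using the position removes exactly the leading NaNs.
print(s.iloc[first_pos:])
```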

Edit

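As @ajcr points out, it is better to use the built-in method first_valid_index, because it does not build the temporary series that I used for masking in the answer above. Also, loc slices by index label, whereas iloc slices by ordinal position, so loc works in the general case where the index is not an Int64Index: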

In [104]:
s = pd.Series([np.nan, np.nan, np.nan, 1, 2, np.nan, 3])
s.loc[s.first_valid_index():]

Out[104]:
3     1
4     2
5   NaN
6     3
dtype: float64
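
As a rough sketch of how this could be wrapped up for reuse, the helper below (the name `drop_leading_nan` is hypothetical, not part of pandas) applies the same `first_valid_index` plus `loc` idea to a string-labelled index. It assumes unique index labels. It also guards the all-NaN case, since `first_valid_index()` returns None there and slicing from None would keep the whole series.

```python
import numpy as np
import pandas as pd

def drop_leading_nan(s):
    """Return s without its leading NaN values (hypothetical helper)."""
    first = s.first_valid_index()
    if first is None:
        # No valid value at all: return an empty slice of the series.
        return s.iloc[0:0]
    # Label-based slice from the first valid label to the end.
    return s.loc[first:]

s = pd.Series([np.nan, np.nan, 1.0, np.nan, 3.0], index=list("abcde"))
print(drop_leading_nan(s))

all_nan = pd.Series([np.nan, np.nan])
print(len(drop_leading_nan(all_nan)))
```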