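.any() and .all() are great for the extreme cases (no nulls or all nulls), but they don't help when you're looking for rows with a specific number of null values. Here is a very simple way to do what I believe you're asking. It's quite verbose, but it works.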
import numpy as np
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0, np.nan],
                   'num_wings': [2, 0, np.nan, 0, 9],
                   'num_specimen_seen': [10, np.nan, 1, 8, np.nan]})

def row_nan_sums(df):
    """Count the NaN values in each row."""
    sums = []
    for row in df.values:
        count = 0
        for el in row:
            # NaN is the only value that is not equal to itself
            if el != el:
                count += 1
        sums.append(count)
    return sums

def query_k_plus_sums(df, k):
    """Return the positions of rows with at least k NaN values.

    These positions match the index labels only for a default RangeIndex.
    """
    sums = row_nan_sums(df)
    indices = []
    for i, count in enumerate(sums):
        if count >= k:
            indices.append(i)
    return indices

print(df)
print(query_k_plus_sums(df, 2))
Output:
   num_legs  num_wings  num_specimen_seen
0       2.0        2.0               10.0
1       4.0        0.0                NaN
2       NaN        NaN                1.0
3       0.0        0.0                8.0
4       NaN        9.0                NaN
[2, 4]
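For comparison, the same per-row NaN count can be computed without explicit loops: `DataFrame.isna()` returns a boolean frame, and summing it across columns (`axis=1`) counts the NaNs in each row. This is a minimal sketch; the helper names `rows_with_at_least_k_nans` and `rows_with_exactly_k_nans` are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0, np.nan],
                   'num_wings': [2, 0, np.nan, 0, 9],
                   'num_specimen_seen': [10, np.nan, 1, 8, np.nan]})

def rows_with_at_least_k_nans(frame, k):
    """Hypothetical helper: index labels of rows with at least k NaN values."""
    # isna() marks each NaN as True; summing across columns counts them per row
    nan_counts = frame.isna().sum(axis=1)
    return nan_counts[nan_counts >= k].index.tolist()

def rows_with_exactly_k_nans(frame, k):
    """Hypothetical helper: index labels of rows with exactly k NaN values."""
    nan_counts = frame.isna().sum(axis=1)
    return nan_counts[nan_counts == k].index.tolist()

print(rows_with_at_least_k_nans(df, 2))  # [2, 4]
print(rows_with_exactly_k_nans(df, 1))   # [1]
```

Because these helpers return index labels rather than positions, their result can be passed straight to `df.drop(...)` even when the frame does not have a default integer index.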
If, like me, you want to remove those rows, you only need the following code:
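# drop() expects index labels; with the default RangeIndex they equal the positions
df.drop(query_k_plus_sums(df, 2), inplace=True)
# Optional: shuffle the remaining rows and renumber the index from 0,
# so the row order in the output varies from run to run
df = df.sample(frac=1).reset_index(drop=True)
print(df)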
Output (one possible row order, because of the shuffle):
   num_legs  num_wings  num_specimen_seen
0       4.0        0.0                NaN
1       0.0        0.0                8.0
2       2.0        2.0               10.0
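If the goal is only to drop the rows, pandas can do it in one call with `DataFrame.dropna(thresh=...)`, where `thresh` is the minimum number of non-NaN values a row needs in order to be kept. A row has fewer than `k` NaNs exactly when it has at least `ncols - k + 1` non-NaN values. A minimal sketch, with `k = 2` as in the answer and no shuffling:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0, np.nan],
                   'num_wings': [2, 0, np.nan, 0, 9],
                   'num_specimen_seen': [10, np.nan, 1, 8, np.nan]})

k = 2  # drop rows containing at least k NaN values

# Keep rows with at least (ncols - k + 1) non-NaN values, i.e. at most k - 1 NaNs
cleaned = df.dropna(thresh=df.shape[1] - k + 1)

# Equivalent boolean-mask form: keep rows whose NaN count is below k
cleaned_mask = df[df.isna().sum(axis=1) < k]

print(cleaned)
```

The `dropna` form states the intent directly, while the mask form is easier to adapt to other conditions such as "exactly k NaNs".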
df[df.isnull().any(axis=1)]
This statement works, but it raises the warning "UserWarning: Boolean Series key will be reindexed to match DataFrame index." How can it be rewritten more explicitly so that the warning is not triggered? - Vishal
df.loc[df.isnull().any(axis=1)] - James Draper
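As far as I understand it, that warning typically appears when the boolean mask's index does not match the frame being indexed, for example when a mask computed on the full frame is applied to an already filtered one. A minimal sketch using the `df` from the answer; `subset` and `full_mask` are hypothetical names for illustration, and `isna()` is simply the alias of `isnull()`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, np.nan, 0, np.nan],
                   'num_wings': [2, 0, np.nan, 0, 9],
                   'num_specimen_seen': [10, np.nan, 1, 8, np.nan]})

# The explicit form from the comment: label-based selection with a boolean
# mask computed from the same frame, so the indexes match exactly
rows_with_any_nan = df.loc[df.isna().any(axis=1)]

# Hypothetical mismatch: a mask built on df, applied to a filtered subset
subset = df[df['num_wings'] > 0]    # keeps labels 0 and 4 (NaN > 0 is False)
full_mask = df.isna().any(axis=1)   # has labels 0 through 4

# Safest fix: build the mask from the frame you are actually indexing
subset_rows = subset.loc[subset.isna().any(axis=1)]

# .loc also aligns a mask whose labels cover the subset's labels
aligned_rows = subset.loc[full_mask]

print(rows_with_any_nan)
print(subset_rows)
```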