If I understand correctly, you can convert column id with to_numeric and then compare it with 1:
print(pd.to_numeric(df.id, errors='coerce') == 1)
0 True
1 False
2 False
Name: id, dtype: bool
print(df[pd.to_numeric(df.id, errors='coerce') == 1])
id text country datetime
0 1 hello bye USA 3/20/2016
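A minimal self-contained sketch of the to_numeric filter. The first two rows mirror the output above. The third row (id 'abc') and its other values are hypothetical and stand in for a non-numeric id. The sketch also assumes id holds strings.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 1 mirror the output above,
# row 2 ('abc') is an assumed non-numeric id.
df = pd.DataFrame({
    'id': ['1', '0', 'abc'],
    'text': ['hello bye', 'good morning', 'hi'],
    'country': ['USA', 'UK', 'FR'],
    'datetime': ['3/20/2016', '3/21/2016', '3/22/2016'],
})

# errors='coerce' turns unparseable values into NaN instead of raising
numeric_id = pd.to_numeric(df['id'], errors='coerce')
mask = numeric_id == 1          # NaN == 1 evaluates to False
print(mask.tolist())            # [True, False, False]
print(df[mask])
```

Because unparseable ids become NaN and never compare equal to anything, they drop out of the mask automatically.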
If you need to remove the rows where column id is neither 0 nor 1, use isin:
print(df.id.isin(['0','1']))
0 True
1 True
2 False
Name: id, dtype: bool
print(df[df.id.isin(['0','1'])])
id text country datetime
0 1 hello bye USA 3/20/2016
1 0 good morning UK 3/21/2016
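A short sketch of keeping and dropping rows with isin. The data is hypothetical. Because the ids are strings, the values passed to isin must be strings as well. Integer 0 and 1 would match nothing.

```python
import pandas as pd

# Hypothetical data; ids are stored as strings
df = pd.DataFrame({'id': ['1', '0', 'abc', '2'],
                   'text': ['hello bye', 'good morning', 'hi', 'hey']})

keep = df['id'].isin(['0', '1'])   # exact matches against the listed strings
kept = df[keep]                    # rows whose id is '0' or '1'
dropped = df[~keep]                # the complementary rows that get removed
print(kept)

# Integers do not match string ids, so this mask is all False
print(df['id'].isin([0, 1]).tolist())
```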
Or combine to_numeric with notnull:
print(pd.to_numeric(df.id, errors='coerce').notnull())
0 True
1 True
2 False
Name: id, dtype: bool
print(df[pd.to_numeric(df.id, errors='coerce').notnull()])
id text country datetime
0 1 hello bye USA 3/20/2016
1 0 good morning UK 3/21/2016
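The two filters agree on the sample data above, but they are not equivalent in general. notnull keeps every id that parses as a number, while isin keeps only the listed values. A sketch with a hypothetical extra id '2' shows the difference:

```python
import pandas as pd

# Hypothetical ids: '2' is numeric but is neither '0' nor '1'
s = pd.Series(['1', '0', 'abc', '2'], name='id')

by_numeric = pd.to_numeric(s, errors='coerce').notnull()
by_isin = s.isin(['0', '1'])
print(by_numeric.tolist())  # [True, True, False, True]
print(by_isin.tolist())     # [True, True, False, False]
```

If the numeric ids can only ever be 0 or 1, either filter works. Otherwise, pick isin when other numbers must be dropped too.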
Finally, you can convert column id to bool with replace or with a double astype:
print(df.loc[df.id.isin(['0','1']),'id'].replace({'0': False, '1': True}))
0 True
1 False
Name: id, dtype: bool
print(df.loc[df.id.isin(['0','1']),'id'].astype(int).astype(bool))
0 True
1 False
Name: id, dtype: bool
print(df.loc[pd.to_numeric(df.id, errors='coerce').notnull(),'id'].astype(int).astype(bool))
0 True
1 False
Name: id, dtype: bool
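This sketch, using hypothetical data, shows why the astype route needs two steps. A single astype(bool) on strings tests string truthiness, so '0' (a non-empty string) becomes True. Going through int first, or mapping the strings explicitly, gives the intended result.

```python
import pandas as pd

s = pd.Series(['1', '0', 'abc'], name='id')   # hypothetical data
valid = s[s.isin(['0', '1'])]                 # keep only '0' and '1'

via_map = valid.map({'0': False, '1': True})  # explicit string -> bool mapping
via_astype = valid.astype(int).astype(bool)   # '1' -> 1 -> True, '0' -> 0 -> False
naive = valid.astype(bool)                    # truthiness of the strings themselves
print(via_map.tolist())     # [True, False]
print(via_astype.tolist())  # [True, False]
print(naive.tolist())       # [True, True]
```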
EDIT:
Timings, for the case where only the values 0 and 1 are converted to bool:
df = pd.concat([df]*10000).reset_index(drop=True)
In [628]: %timeit df.loc[np.in1d(df['id'], ['0','1']),'id'].map({'0': False, '1': True})
100 loops, best of 3: 2.19 ms per loop
In [629]: %timeit df.loc[np.in1d(df['id'], ['0','1']),'id'].replace({'0': False, '1': True})
The slowest run took 4.46 times longer than the fastest. This could mean that an intermediate result is being cached
100 loops, best of 3: 4.72 ms per loop
In [630]: %timeit df.loc[df['id'].isin(['0','1']),'id'].map({'0': False, '1': True})
100 loops, best of 3: 2.78 ms per loop
In [631]: %timeit df.loc[df['id'].str.contains('0|1'),'id'].map({'0': False, '1': True})
10 loops, best of 3: 20 ms per loop
In [632]: %timeit df.loc[df['id'].isin(['0','1']),'id'].astype(int).astype(bool)
100 loops, best of 3: 9.5 ms per loop
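One caveat about the str.contains('0|1') variant timed above: it is a regex substring search, not an exact match. It only behaves like isin when every id is a single character. A sketch with hypothetical multi-character ids:

```python
import pandas as pd

# Hypothetical ids, including multi-character ones that contain a 0 or 1
s = pd.Series(['1', '0', '10', 'a1', 'abc'], name='id')

exact = s.isin(['0', '1'])          # exact matches only
substring = s.str.contains('0|1')   # any occurrence of 0 or 1 in the string
print(exact.tolist())      # [True, True, False, False, False]
print(substring.tolist())  # [True, True, True, True, False]

# Ids matched by the substring test but absent from the dict become NaN after map
mapped = s[substring].map({'0': False, '1': True})
print(mapped.isna().tolist())  # [False, False, True, True]
```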
In these timings, the fastest option is numpy.in1d combined with map:
In [628]: %timeit df.loc[np.in1d(df['id'], ['0','1']),'id'].map({'0': False, '1': True})
100 loops, best of 3: 2.19 ms per loop
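In newer NumPy, np.in1d is deprecated (as of NumPy 2.0) in favor of np.isin, which accepts the same arguments here. The sketch below reruns the comparison outside IPython with the standard timeit module. The data is hypothetical and scaled like the timing setup above. Absolute numbers will differ by machine and library version, so the sketch only prints them.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data, scaled up like the timing setup above (3 rows x 10000)
base = pd.DataFrame({'id': ['1', '0', 'abc']})
df = pd.concat([base] * 10000).reset_index(drop=True)
mapping = {'0': False, '1': True}

def with_np_isin(frame):
    # np.isin replaces np.in1d, which is deprecated as of NumPy 2.0
    return frame.loc[np.isin(frame['id'], ['0', '1']), 'id'].map(mapping)

def with_series_isin(frame):
    return frame.loc[frame['id'].isin(['0', '1']), 'id'].map(mapping)

for fn in (with_np_isin, with_series_isin):
    seconds = timeit.timeit(lambda: fn(df), number=20)
    print(f'{fn.__name__}: {seconds / 20 * 1000:.2f} ms per call')
```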