drop_duplicates - ValueError: keep must be either "first", "last" or False

I installed pandas 0.17.0, and now I am getting a strange error:

ValueError: keep must be either "first", "last" or False

It happens when I try to run the following:

ids=ids.drop_duplicates('ID')

This always worked in earlier pandas versions, and the code has not changed. For what it's worth, ids is a DataFrame with integer columns...

Here is the traceback:

Traceback (most recent call last):

File "<ipython-input-34-6e98a890591b>", line 1, in <module>
     ids=ids.drop_duplicates('ID')

File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py",
 line 89, in wrapper
     return func(*args, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line
 1164, in drop_duplicates
     return super(Series, self).drop_duplicates(keep=keep, inplace=inplace)

File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py",
 line 89, in wrapper
     return func(*args, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\core\base.py", line 576,
 in drop_duplicates
     duplicated = self.duplicated(keep=keep)

File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py",
 line 89, in wrapper
     return func(*args, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line
 1169, in duplicated
     return super(Series, self).duplicated(keep=keep)

File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py",
 line 89, in wrapper
     return func(*args, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\core\base.py", line 603,
 in duplicated
     duplicated = lib.duplicated(keys, keep=keep)

File "pandas\lib.pyx", line 1383, in pandas.lib.duplicated
 (pandas\lib.c:24490)

ValueError: keep must be either "first", "last" or False

Notice the keep=keep? In pandas 0.17.0, drop_duplicates defaults to keep='first'. So if I don't specify it, shouldn't it fall back to that default? Why am I getting an error here? Is this a bug in pandas 0.17.0?


Your traceback shows that ids is a Series, so what does type(ids) show? - EdChum
2 Answers

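The error shows that ids is actually a Series, and the first parameter of Series.drop_duplicates is keep, so 'ID' is being passed as the keep value. If ids really were a DataFrame, this error would not occur, because the first parameter of DataFrame.drop_duplicates is subset.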
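A minimal sketch of the difference, using made-up data (the frame and its `value` column are assumptions for illustration, not the asker's data). It passes `keep="ID"` by keyword to reproduce the error, because recent pandas releases no longer accept positional arguments to `Series.drop_duplicates`:

```python
import pandas as pd

# Hypothetical data standing in for the asker's frame.
df = pd.DataFrame({"ID": [1, 1, 2, 3, 3], "value": list("abcde")})

# Selecting a single column returns a Series, not a DataFrame.
ids = df["ID"]
ids_type = type(ids).__name__

# A Series has no `subset` parameter; a column name ends up as `keep`.
error_message = None
try:
    ids.drop_duplicates(keep="ID")
except ValueError as exc:
    error_message = str(exc)

# On a Series, call drop_duplicates without a column name.
deduped_series = ids.drop_duplicates()

# On a DataFrame, the column name is passed as `subset`.
deduped_frame = df.drop_duplicates(subset="ID")

print(ids_type)
print(error_message)
print(deduped_series.tolist())
print(deduped_frame["value"].tolist())
```

Checking `type(ids)` first, as suggested in the comment above, is usually the quickest way to tell which of the two signatures applies.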

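Wow, that was fast. Thanks, Ed. I'm embarrassed that I missed this. Did anything change in 0.17 that would cause this error? My mistake was that this is a Series. Many thanks... I will accept the answer in 10 minutes, as soon as they let me! - clg4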
This was added in 0.17.0; scroll down a little and look at the part about drop_duplicates. - EdChum


I tried out the syntax (with keep, which used to be take_last)...

import pandas as pd
df = pd.DataFrame({'c1': ['cat'] * 3 + ['dog'] * 4,
                   'c2': [1, 1, 2, 3, 3, 4, 4]})

print(df)
print(df.drop_duplicates())
print(df.drop_duplicates(['c1', 'c2'], keep='first'))
print(df.drop_duplicates(['c1', 'c2'], keep='last'))
print(df.drop_duplicates(['c1', 'c2'], keep=False))   # drops every duplicated row; only ('cat', 2) remains

By default, drop_duplicates() uses keep='first' and takes all columns into account.
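As a rough check of those defaults, the sketch below reuses the same frame and compares the no-argument call with the explicit one, then records which row labels survive under each keep option. The variable names are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"c1": ["cat"] * 3 + ["dog"] * 4,
                   "c2": [1, 1, 2, 3, 3, 4, 4]})

# No arguments: all columns are compared and the first occurrence is kept.
default_result = df.drop_duplicates()
explicit_result = df.drop_duplicates(subset=["c1", "c2"], keep="first")
same_as_default = default_result.equals(explicit_result)

# Row labels that survive under each keep option.
kept_first = df.drop_duplicates(keep="first").index.tolist()
kept_last = df.drop_duplicates(keep="last").index.tolist()
kept_none = df.drop_duplicates(keep=False).index.tolist()

print(same_as_default)
print(kept_first, kept_last, kept_none)
```

With this data, keep='first' leaves rows 0, 2, 3, 5, keep='last' leaves rows 1, 2, 4, 6, and keep=False leaves only row 2, since every other row has a duplicate.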
