Filtering a DataFrame on a timestamp column with the Pandas query() function


I am trying to filter a timestamp column in a Pandas DataFrame using a string and the query() function:

df.query('Timestamp < "2020-02-01"')

However, I get the following error:

Traceback (most recent call last):
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-7bb40e9c631a>", line 1, in <module>
    df.query('Timestamp < "2020-02-01"')
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\frame.py", line 3199, in query
    res = self.eval(expr, **kwargs)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\frame.py", line 3315, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\eval.py", line 327, in eval
    ret = eng_inst.evaluate()
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\engines.py", line 142, in evaluate
    return self.expr()
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 837, in __call__
    return self.terms(self.env)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\ops.py", line 380, in __call__
    return self.func(left, right)
TypeError: '<' not supported between instances of 'type' and 'str'

I also tried converting the string to a datetime, but got a similar error:

df.query('Timestamp < @pd.to_datetime("2020-02-01")')
Traceback (most recent call last):
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\IPython\core\interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-23540526aad9>", line 1, in <module>
    df.query('Timestamp < @pd.to_datetime("2020-02-01")')
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\frame.py", line 3199, in query
    res = self.eval(expr, **kwargs)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\frame.py", line 3315, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\eval.py", line 322, in eval
    parsed_expr = Expr(expr, engine=engine, parser=parser, env=env, truediv=truediv)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 830, in __init__
    self.terms = self.parse()
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 847, in parse
    return self._visitor.visit(self.expr)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 441, in visit
    return visitor(node, **kwargs)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 447, in visit_Module
    return self.visit(expr, **kwargs)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 441, in visit
    return visitor(node, **kwargs)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 450, in visit_Expr
    return self.visit(node.value, **kwargs)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 441, in visit
    return visitor(node, **kwargs)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 747, in visit_Compare
    return self.visit(binop)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 441, in visit
    return visitor(node, **kwargs)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 565, in visit_BinOp
    return self._maybe_evaluate_binop(op, op_class, left, right)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 547, in _maybe_evaluate_binop
    return self._maybe_eval(res, self.binary_ops)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\expr.py", line 519, in _maybe_eval
    self.env, self.engine, self.parser, self.term_type, eval_in_python
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\ops.py", line 399, in evaluate
    res = self(env)
  File "C:\ENERCON\Python 3.7.2\lib\site-packages\pandas\core\computation\ops.py", line 380, in __call__
    return self.func(left, right)
TypeError: '<' not supported between instances of 'type' and 'Timestamp'

If I run the same comparison with .loc, I get the desired result. (But then I cannot use user-input strings.)
df.loc[df['Timestamp'] < "2020-02-01"]
Out[4]:                 
     Timestamp  Error  ...  ToD  Day_Night
0    2020-01-17 00:00:00      0  ...    0      Night  
1    2020-01-17 00:10:00      0  ...    0      Night
2    2020-01-17 00:20:00      0  ...    0      Night
3    2020-01-17 00:30:00      0  ...    0      Night 
4    2020-01-17 00:40:00      0  ...    0      Night 
2154 2020-01-31 23:10:00      0  ...   23      Night  
2155 2020-01-31 23:20:00      0  ...   23      Night 
2156 2020-01-31 23:30:00      0  ...   23      Night
2157 2020-01-31 23:40:00      0  ...   23      Night 
2158 2020-01-31 23:50:00      0  ...   23      Night
[2159 rows x 37 columns]

Does anyone know how to use query() on a datetime column?


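I think the error message gives the clue: Timestamp is a type and cannot be compared with a str or a datetime. Run a test with the Timestamp column renamed to something else and see whether the code works. df['Timestamp'] works because pandas treats it there as a column, not as a type. Read the warning box for more information: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#attribute-access - sammywemmy
Thanks, that was the problem. After renaming the column it works. - bnando
1 Answer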


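The column name Timestamp clashes with the pandas Timestamp type, which query() resolves by that same name; this is why the traceback reports a comparison involving 'type'. First, rename the column with rename(). Note that rename() returns a new DataFrame, so assign the result:

df = df.rename(columns={"Timestamp": "MyTimestamp"})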

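The following then handles the datetime comparison (the literal 20200201 is interpreted as the date 2020-02-01 when compared against a datetime column):
df.query('MyTimestamp < 20200201')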
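
As a minimal sketch of the rename-then-query approach, the example below builds a small stand-in frame and filters it with a quoted date string instead of the integer literal. The sample rows and the variable names `renamed` and `before_feb` are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(
            [
                "2020-01-17 00:00:00",
                "2020-01-31 23:50:00",
                "2020-02-01 00:00:00",
                "2020-02-03 12:00:00",
            ]
        ),
        "Error": [0, 0, 1, 0],
    }
)

# rename() returns a new DataFrame; keep the result under a new name.
renamed = df.rename(columns={"Timestamp": "MyTimestamp"})

# The quoted date is compared against the datetime column.
before_feb = renamed.query('MyTimestamp < "2020-02-01"')

# Same filter written as a boolean mask, for comparison.
expected = renamed[renamed["MyTimestamp"] < "2020-02-01"]

print(before_feb)
```

Both `before_feb` and `expected` should contain only the two January rows.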

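If you want to query the DataFrame with a full timestamp, where ts is a name defined in the calling scope (for example ts = pd.Timestamp; the original does not show this binding, so it is an assumption):

df.query('MyTimestamp < @ts("20200201T071320")')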
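
For a user-supplied cutoff, one option that avoids calling anything inside the query string is to parse the input first and reference the result with `@`. This is only a sketch: the sample frame and the names `user_input`, `cutoff`, and `earlier` are hypothetical, and the column is assumed to be already renamed to MyTimestamp.

```python
import pandas as pd

# Hypothetical frame with hourly rows: 05:00, 06:00, 07:00, 08:00, 09:00.
df = pd.DataFrame(
    {
        "MyTimestamp": pd.date_range("2020-02-01 05:00:00", periods=5, freq="60min"),
        "Error": [0, 0, 1, 0, 1],
    }
)

# Hypothetical user-supplied string, in the same format as the answer's example.
user_input = "20200201T071320"

# Parse once, outside the query expression.
cutoff = pd.Timestamp(user_input)

# '@cutoff' refers to the local Python variable named cutoff.
earlier = df.query("MyTimestamp < @cutoff")

# Same filter as a boolean mask, for comparison.
expected = df[df["MyTimestamp"] < cutoff]

print(earlier)
```

Parsing the input up front also means a malformed string fails at `pd.Timestamp(...)` rather than inside the query expression.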

Thanks for the feedback - I found that the @ts syntax does not work in newer versions, but @Timestamp does. - Frank
