Problem displaying multiple rows that meet a condition in pandas

Question

4

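I'm using pandas in Python to look at a weather CSV file. I can pull data out of it without trouble. However, I can't extract the rows that meet a particular condition, for example to show which dates had a temperature above 100 degrees.

My code so far is: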

import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt 

df = pd.read_csv('csv/weather.csv')

print(df[[df.MaxTemperatureF > 100 ]])

I believe the line I'm having trouble with is the last one. After trying the steps suggested below, the tracebacks I get now are:

Traceback (most recent call last):
File "weather.py", line 40, in <module>
print(df[df['MaxTemperatureF' > 100]])
TypeError: unorderable types: str() > int()
Mikes-MBP-2:dataframes mikecuddy$ python3 weather.py
Traceback (most recent call last):
File "weather.py", line 41, in <module>
print(df[[df.MaxTemperatureF > 100 ]])
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/frame.py", line 1991, in __getitem__
return self._getitem_array(key)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/core/frame.py", line 2028, in _getitem_array
(len(key), len(self.index)))
ValueError: Item wrong length 1 instead of 360.

I have been following this tutorial: http://www.gregreda.com/2013/10/26/working-with-pandas-dataframes/ . Any help would be appreciated. Thanks!

Output of df.info():

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 360 entries, 0 to 359
Data columns (total 23 columns):
PST                           360 non-null object
MaxTemperatureF               359 non-null float64
Mean TemperatureF             359 non-null float64
Min TemperatureF              359 non-null float64
Max Dew PointF                359 non-null float64
MeanDew PointF                359 non-null float64
Min DewpointF                 359 non-null float64
Max Humidity                  359 non-null float64
Mean Humidity                359 non-null float64
Min Humidity                 359 non-null float64
Max Sea Level PressureIn     359 non-null float64
Mean Sea Level PressureIn    359 non-null float64
Min Sea Level PressureIn     359 non-null float64
Max VisibilityMiles          355 non-null float64
Mean VisibilityMiles         355 non-null float64
Min VisibilityMiles          355 non-null float64
Max Wind SpeedMPH            359 non-null float64
Mean Wind SpeedMPH           359 non-null float64
Max Gust SpeedMPH            211 non-null float64
PrecipitationIn               360 non-null float64
CloudCover                   343 non-null float64
Events                       18 non-null object
WindDirDegrees               360 non-null int64
dtypes: float64(20), int64(1), object(2)
memory usage: 64.8+ KB
None

- ravenUSMC

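For the first problem, print(df[df['MaxTemperatureF'] > 100 ]) should be what you are looking for. I don't understand your other question, though: can't you just call head() after filtering? - Wboy

Maybe the max temperature is stored as a string? - mechanical_meat

Yes, the error message I get is: Traceback (most recent call last): File "weather.py", line 36, in <module> print(df[df['MaxTemperatureF' > 100]]) TypeError: unorderable types: str() > int() However, I'm not sure where to put int(). - ravenUSMC

Are the values in your MaxTemperatureF column purely numeric, or do some of them look like 95F? That would affect your attempt to convert to float. - Grr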

1

Re: "ValueError: Item wrong length 1 instead of 360." Remove the double brackets: "df[df.MaxTemperatureF > 100]". - ptrj
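
To illustrate the comments above, here is a minimal sketch using a small made-up DataFrame. Only the column names PST and MaxTemperatureF come from the question; the values are hypothetical. The comparison produces a boolean Series with one entry per row, and that Series goes directly inside a single pair of brackets:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "PST": ["2015-6-1", "2015-6-2", "2015-6-3", "2015-6-4"],
    "MaxTemperatureF": [98.0, 101.0, 104.0, 99.0],
})

# One boolean per row: True where the temperature exceeds 100.
mask = df["MaxTemperatureF"] > 100

# Single brackets: keep only the rows where the mask is True.
hot_days = df[mask]

# .loc with a column label returns just the dates of those rows.
hot_dates = df.loc[mask, "PST"]

print(hot_days)
print(hot_dates.tolist())
```

Two notes on the errors in the question. In df['MaxTemperatureF' > 100] the bracket is misplaced, so Python compares the string literal 'MaxTemperatureF' with 100 before any column is looked up, which is where "unorderable types: str() > int()" comes from. In df[[mask]] the extra brackets wrap the mask in a list of length 1, which appears to be what "Item wrong length 1 instead of 360" is reporting in the pandas version shown in the traceback.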

2 Answers

1

Try this; you are using '()' instead of '[]'.

print(df[df.MaxTemperatureF.astype(float) > 100 ])

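Notes:

df.isnull().sum()   # count missing values in each column
df.dropna()         # drop rows that contain missing values
df.fillna(0)        # or keep the rows and replace missing values with 0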

- Merlin

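I still get the error message ValueError: could not convert string to float. - ravenUSMC

Run df.isnull().sum() to check for nulls or NaN, which may be strings, then try dropna() to drop them. Alternatively, you may want fillna(0) to keep the rows and replace NaN with 0. - Merlin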
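
If astype(float) still raises ValueError, one way to find the offending entries is sketched below. It assumes the column was read as strings, and the sample values (including a "95F" entry of the kind Grr asked about) are made up. pd.to_numeric with errors='coerce' turns anything it cannot parse into NaN, so comparing the result with the original column shows which rows failed:

```python
import pandas as pd

# Hypothetical column read as strings, with one value that is not a plain number.
df = pd.DataFrame({"MaxTemperatureF": ["98", "101", "95F", "104"]})

# Unparsable entries become NaN instead of raising ValueError.
as_numbers = pd.to_numeric(df["MaxTemperatureF"], errors="coerce")

# Rows that had a value originally but could not be converted.
bad_rows = df[as_numbers.isna() & df["MaxTemperatureF"].notna()]
print(bad_rows)

# Comparisons with NaN are False, so those rows drop out of the filter.
df["MaxTemperatureF"] = as_numbers
print(df[df["MaxTemperatureF"] > 100])
```

Unlike fillna(0), this leaves the unparsable rows as NaN rather than recording a temperature of 0 for them; which of the two is preferable depends on what the data will be used for.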

- mechanical_meat · Accepted Answer

For the max temperature, you can specify a converter function:

df = pd.read_csv('csv/weather.csv', converters={'MaxTemperatureF':float})

Edit: as @ptrj mentioned in the comments, you can do this to replace string values in the MaxTemperatureF column with np.nan:

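# A lambda body cannot contain try/except, so this uses pd.to_numeric
# with errors='coerce' (a swapped-in equivalent) to map bad strings to NaN.
df = pd.read_csv('csv/weather.csv',
                 converters={'MaxTemperatureF':
                             lambda x: pd.to_numeric(x, errors='coerce')})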

Edit 2: @ptrj's solution, since he couldn't write it out in a comment...

def my_conv(x): 
    try: 
        return float(x)
    except ValueError: 
        return np.nan

df = pd.read_csv('csv/weather.csv', converters={'MaxTemperatureF': my_conv})
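
As a quick check of the converter approach, the sketch below runs my_conv on a small in-memory CSV instead of csv/weather.csv. The sample rows are invented, and io.StringIO merely stands in for the file:

```python
import io

import numpy as np
import pandas as pd


def my_conv(x):
    """Return x as a float, or NaN if it cannot be parsed."""
    try:
        return float(x)
    except ValueError:
        return np.nan


# Invented sample data standing in for csv/weather.csv.
sample = io.StringIO(
    "PST,MaxTemperatureF\n"
    "2015-6-1,98\n"
    "2015-6-2,\n"
    "2015-6-3,104\n"
)

df = pd.read_csv(sample, converters={"MaxTemperatureF": my_conv})

# The empty field becomes NaN, so the comparison works on every row.
print(df["MaxTemperatureF"].dtype)
print(df[df["MaxTemperatureF"] > 100])
```

The converter receives each field as a string, which is why an empty field ends up in the except branch rather than stopping the read.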

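Other things:

If the first line of the csv file is the header, you don't need to pass header=0.
Since you already have the header, you don't need to specify cols=... now.
The default sep is ',', so you don't need to specify it.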
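
The defaults mentioned above can be checked with a short sketch. The CSV text is invented; the point is only that spelling out header=0 and sep=',' gives the same DataFrame as leaving them out:

```python
import io

import pandas as pd

# Invented CSV text whose first row is the header.
text = "PST,MaxTemperatureF\n2015-6-1,98\n2015-6-2,101\n"

# Relying on the defaults.
df_default = pd.read_csv(io.StringIO(text))

# Spelling the same settings out explicitly.
df_explicit = pd.read_csv(io.StringIO(text), header=0, sep=",")

print(df_default.equals(df_explicit))
print(list(df_default.columns))
```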