I want to visualize EPA PFAS environmental media sampling data, using pandas and matplotlib. Here is my code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import csv
pd.set_option('display.max_columns', 500)
inputpath="CHI"
col_for_analysis=["Environmental Media Name", "Year", "Result Measure Value (ppt)"]
dataset=pd.read_csv(inputpath,sep=',', dtype={'a': str}, usecols= col_for_analysis, low_memory=False)
dataset.sort_values(by=["Year"], ascending=True, inplace=True)
print(dataset)
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].fillna(0, inplace=True)
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].astype(int)
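The end goal for now is to sort by year, then plot the "Year" column on the x-axis against the "Result Measure Value (ppt)" column on the y-axis. When I first tried this, I got an error message indicating that the "Result Measure Value (ppt)" column contained NoneType values, so matplotlib couldn't plot it. No problem, I thought, I'll just use
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].fillna(0, inplace=True)
to get rid of those NoneType values and replace them with a nice, plottable 0. That seemed to work. So I then tried to convert every value in the column to int so that matplotlib could plot them, by adding this line: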
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].astype(int)
This line throws the following rather lengthy error message:
Traceback (most recent call last):
File "main.py", line 18, in <module>
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].astype(int)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/generic.py", line 5912, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 419, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 304, in apply
applied = getattr(b, f)(**kwargs)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 580, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1292, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1237, in astype_array
values = astype_nansafe(values, dtype, copy=copy)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1154, in astype_nansafe
return lib.astype_intsafe(arr, dtype)
File "pandas/_libs/lib.pyx", line 668, in pandas._libs.lib.astype_intsafe
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
Now, here is what I thought this line of code meant:
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].fillna(0, inplace=True)
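I was hoping it would get rid of all the NoneType values in the "Result Measure Value (ppt)" column by replacing them with 0. If I'm wrong about that, how can I get rid of the NoneType values in this column, or otherwise convert all the values in the column into something that can be plotted against the year? Otherwise, how can I fix the code so that all values in the column are converted to int before plotting? Thanks!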
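For reference, here is a minimal sketch that isolates the fillna step on a small made-up Series (the numbers are hypothetical, not taken from the EPA file). It assumes the pandas 1.x shown in the traceback, where `fillna(..., inplace=True)` modifies the object and returns `None`, so assigning that return value back to the column would overwrite it. The sketch compares that with the non-inplace call, whose result can be assigned and then cast.

```python
import pandas as pd

# Hypothetical stand-in for the real column; these values are made up.
values = pd.Series([12.0, None, 7.0], name="Result Measure Value (ppt)")

# inplace=True changes `scratch` itself. In pandas 1.x the call returns None,
# so `returned` is not a usable column there (assumption: version from the traceback).
scratch = values.copy()
returned = scratch.fillna(0, inplace=True)
print("return value of the inplace call:", returned)

# Without inplace, fillna returns a new Series that is safe to assign back.
filled = values.fillna(0)

# With no missing values left, the cast to int goes through.
as_int = filled.astype(int)
print(as_int.tolist())
```

If the printed return value is `None`, that would explain why the later `astype(int)` call sees NoneType values even though fillna appeared to run without error.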