Python int too large to convert to C long - plotting pandas dates

I merged two datasets by matching the nearest timestamp, using the method shown in the accepted answer to this post:

pandas.merge: match the nearest time stamp >= the series of timestamps

However, when I try to plot the result, I get an error:

<matplotlib.collections.LineCollection at 0x2975a3547b8>
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\formatters.py", line 332, in __call__
    return printer(obj)
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\pylabtools.py", line 237, in <lambda>
    png_formatter.for_type(Figure, lambda fig: print_figure(fig, 'png', **kwargs))
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\pylabtools.py", line 121, in print_figure
    fig.canvas.print_figure(bytes_io, **kw)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\backend_bases.py", line 2208, in print_figure
    **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py", line 507, in print_png
    FigureCanvasAgg.draw(self)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py", line 430, in draw
    self.figure.draw(self.renderer)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\figure.py", line 1295, in draw
    renderer, self, artists, self.suppressComposite)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\image.py", line 138, in _draw_list_compositing_images
    a.draw(renderer)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 2399, in draw
    mimage._draw_list_compositing_images(renderer, self, artists)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\image.py", line 138, in _draw_list_compositing_images
    a.draw(renderer)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axis.py", line 1133, in draw
    ticks_to_draw = self._update_ticks(renderer)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axis.py", line 974, in _update_ticks
    tick_tups = list(self.iter_ticks())
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\axis.py", line 917, in iter_ticks
    majorLocs = self.major.locator()
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py", line 1054, in __call__
    self.refresh()
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py", line 1074, in refresh
    dmin, dmax = self.viewlim_to_dt()
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py", line 832, in viewlim_to_dt
    return num2date(vmin, self.tz), num2date(vmax, self.tz)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py", line 441, in num2date
    return _from_ordinalf(x, tz)
  File "C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py", line 256, in _from_ordinalf
    dt = datetime.datetime.fromordinal(ix).replace(tzinfo=UTC)
OverflowError: Python int too large to convert to C long

I also copied and pasted the accepted answer from that post as-is and got the same error.

My data looks like this (already merged):

cm_time_4           log_time_1
2017-06-25 10:30:35 2017-06-25 10:30:31
2017-06-25 10:50:35 2017-06-25 10:50:31
2017-06-25 11:10:35 2017-06-25 11:10:31
2017-06-25 11:30:35 2017-06-25 11:30:31
2017-06-25 11:50:35 2017-06-25 11:50:31
2017-06-25 12:10:35 2017-06-25 12:10:31
2017-06-25 12:30:35 2017-06-25 12:30:31
2017-06-25 12:50:35 2017-06-25 12:50:31
2017-06-25 13:10:35 2017-06-25 13:10:31
2017-06-25 13:30:35 2017-06-25 13:30:31
2017-06-25 13:50:35 2017-06-25 13:50:31

My code looks like this:

import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import netcdf_file

# filename (the CSV log) and nc (the netCDF file) are paths defined elsewhere
readcsv = pd.read_csv(filename, parse_dates={'timestamp': ['date', 'time']}, index_col=['timestamp'])

# round off times to the nearest second
log_time = readcsv.index.round('1s')

fh = netcdf_file(nc, mmap=False)
cm_time = fh.variables['time'][:]
ref_time = datetime.datetime(year=2017, month=6, day=25, hour=10, minute=30, second=35)  # Reference time
cm_time_2 = [ref_time + pd.Timedelta(minutes=float(i)) for i in cm_time]  # Add minutes to reference time
cm_time_3 = pd.to_datetime(cm_time_2)

# For each cm time, index of the last log time strictly before it (-1 if there is none)
idx = np.searchsorted(log_time, cm_time_3) - 1
mask = idx >= 0
df = pd.DataFrame({"log_time_1": log_time[idx][mask], "cm_time_4": cm_time_3[mask]})

# Plot
plt.figure(figsize=(18, 4))
plt.vlines(pd.Series(log_time), 0, 1, colors="g")
plt.vlines(df.log_time_1, 0.3, 0.7, colors="r", lw=2)
plt.vlines(df.cm_time_4, 0.3, 0.7, colors="b", lw=2)
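The matching step above pairs each cm time with the last log time strictly before it. Here is a minimal, self-contained sketch of just that step, using a few hypothetical timestamps modelled on the merged table (plus one cm time earlier than every log time, to show what the mask removes). The CSV and netCDF loading are left out on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps modelled on the question's merged table.
log_time = pd.DatetimeIndex(
    ["2017-06-25 10:30:31", "2017-06-25 10:50:31", "2017-06-25 11:10:31"]
)
# The first cm time is earlier than every log time, so it has no match.
cm_time_3 = pd.DatetimeIndex(
    ["2017-06-25 10:10:35", "2017-06-25 10:30:35",
     "2017-06-25 10:50:35", "2017-06-25 11:10:35"]
)

# searchsorted (side="left") counts the log times strictly earlier than each
# cm time; subtracting 1 turns that count into the index of the last one.
idx = np.searchsorted(log_time, cm_time_3) - 1

# idx == -1 means "no earlier log time", so those rows are dropped.
mask = idx >= 0
df = pd.DataFrame({
    "log_time_1": log_time[idx[mask]],
    "cm_time_4": cm_time_3[mask],
})
print(df)
```

Indexing with `idx[mask]` rather than `log_time[idx][mask]` gives the same rows but never evaluates the wrapped-around `-1` index.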

I'm using Python 3.6 on Windows 10.

How can I fix this error?

Many thanks.


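Is there more to the error message, e.g. line numbers? - ShpielMeister
@ShpielMeister, I've added the full error message; I hope that helps. - Jetman
Do you get the error immediately when you run plt.vlines(df.cm_time_4, 0.3, 0.7, colors="b", lw=2)? - ShpielMeister
@ShpielMeister, every one of the plots raises it; I've tried each of them on its own and always get the same error. - Jetman
vlines expects "scalar or 1D array_like x-indexes where to plot the lines." cm_time_4 and log_time_1 may not be in the right format. - ShpielMeister
@ShpielMeister, do you know how I could fix this? Or could you point me to another example? - Jetman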
3 Answers


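pandas stores datetimes as 64-bit integers, which allows nanosecond resolution (the default). matplotlib, however, relies on Python's built-in datetime module, which only has microsecond resolution. When matplotlib tries to use your dates, it converts them to built-in Python datetimes, and here that conversion goes wrong and raises an exception: as the last frames of the traceback show, the axis limits end up as numbers far too large to be valid day ordinals, so datetime.datetime.fromordinal overflows.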
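To see why the last frame overflows, here is a small sketch using only the standard library and NumPy. It assumes, as the final traceback frames suggest, that the raw datetime64[ns] count ends up where matplotlib expects a day ordinal (days since 0001-01-01); that is an interpretation of the traceback, not something confirmed from matplotlib's source, and the variable names are illustrative.

```python
import datetime

import numpy as np

# One of the question's timestamps, stored the way pandas stores it: datetime64[ns].
ts = np.datetime64("2017-06-25T10:30:35", "ns")

# The underlying value is a 64-bit count of nanoseconds since 1970-01-01.
raw = int(ts.astype(np.int64))

# Python's datetime only accepts day ordinals up to date.max (9999-12-31).
max_ordinal = datetime.date.max.toordinal()

# The day ordinal a date-aware axis would expect for the same day.
expected_ordinal = datetime.date(2017, 6, 25).toordinal()

# Assumption: the raw nanosecond count is used as if it were a day ordinal.
error = None
try:
    datetime.datetime.fromordinal(raw)
except (OverflowError, ValueError) as exc:  # exact type and message may vary by platform
    error = exc

print(raw, max_ordinal, expected_ordinal)
print(type(error).__name__, error)
```

The nanosecond count is roughly 1.5e18, while the largest valid ordinal is a few million, so the conversion cannot succeed.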

So you need to convert the datetimes to native Python datetime objects yourself before plotting. Modify the plotting code as follows:

import matplotlib.pyplot as plt

# Convert pandas Timestamps to native Python datetime objects.
# to_pydatetime() keeps the wall-clock time unchanged, whereas the
# datetime.fromtimestamp(dt.timestamp()) round trip can shift naive times by the local UTC offset.
log_time_py = [dt.to_pydatetime() for dt in log_time]
log_time_1_py = [dt.to_pydatetime() for dt in df.log_time_1]
cm_time_4_py = [dt.to_pydatetime() for dt in df.cm_time_4]

# Plot
plt.figure(figsize=(18, 4))
plt.vlines(log_time_py, 0, 1, colors="g")
plt.vlines(log_time_1_py, 0.3, 0.7, colors="r", lw=2)
plt.vlines(cm_time_4_py, 0.3, 0.7, colors="b", lw=2)
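A self-contained version of this approach that runs without a display might look like the sketch below. It assumes a few hypothetical timestamps in place of the question's CSV and netCDF data, uses the non-interactive Agg backend, and saves to an in-memory PNG, because the original error only appeared when the figure was actually drawn.

```python
import datetime
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical timestamps modelled on the question's merged table.
log_time = pd.DatetimeIndex(
    ["2017-06-25 10:30:31", "2017-06-25 10:50:31", "2017-06-25 11:10:31"]
)
df = pd.DataFrame({
    "log_time_1": log_time,
    "cm_time_4": log_time + pd.Timedelta(seconds=4),
})

# Convert every pandas Timestamp to a plain datetime.datetime.
log_time_py = [ts.to_pydatetime() for ts in log_time]
log_time_1_py = [ts.to_pydatetime() for ts in df.log_time_1]
cm_time_4_py = [ts.to_pydatetime() for ts in df.cm_time_4]
print(type(log_time_py[0]) is datetime.datetime, log_time_py[0])

fig = plt.figure(figsize=(18, 4))
plt.vlines(log_time_py, 0, 1, colors="g")
plt.vlines(log_time_1_py, 0.3, 0.7, colors="r", lw=2)
plt.vlines(cm_time_4_py, 0.3, 0.7, colors="b", lw=2)

# Rendering forces the date tick locator to run, which is where the
# original OverflowError surfaced.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Recent matplotlib releases should also accept datetime64 data directly, so the explicit conversion mainly matters on older versions; it is harmless either way.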


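This problem bothered me for a long time, and even with the suggestion above I couldn't solve it. For the record, I'm making time-series scatter plots with a colorbar.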

To make these plots I had to upgrade to matplotlib 2.1.0, and then convert the datetime64[ns] column to an ndarray with:

x = df['datecolumn'].values

Then I could plot it directly:

# y and z are array-like columns of df (e.g. df['y'] and df['z'])
cm = plt.get_cmap('RdYlBu')
sc = plt.scatter(x, y, c=z, vmin=df['z'].min(), vmax=df['z'].max(),
                 s=2, cmap=cm)
plt.colorbar(sc)
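A runnable sketch of this second approach, assuming a hypothetical DataFrame whose `datecolumn` name comes from the answer while the `y` and `z` columns and their values are made up for illustration. It renders to an in-memory PNG with the Agg backend so the date axis is actually drawn.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 20-minute steps like the question's timestamps.
df = pd.DataFrame({
    "datecolumn": pd.date_range("2017-06-25 10:30:35", periods=6, freq="20min"),
    "y": [1.0, 2.0, 1.5, 3.0, 2.5, 2.0],
    "z": [10.0, 12.0, 11.0, 15.0, 14.0, 13.0],
})

# .values gives a plain NumPy datetime64 array instead of a pandas Series.
x = df["datecolumn"].values

cm = plt.get_cmap("RdYlBu")
fig = plt.figure()
sc = plt.scatter(x, df["y"], c=df["z"], vmin=df["z"].min(), vmax=df["z"].max(),
                 s=2, cmap=cm)
plt.colorbar(sc)

# Saving forces a full draw, including the date tick locator.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```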
