decompose() for time series: ValueError: You must specify a period or x must be a pandas object with a DatetimeIndex with a freq not set to None

Question


python pandas matplotlib time-series decomposition

32

I have some problems running an additive model. I have the following data frame:

When I run this code:

import matplotlib
import statsmodels.api as sm

matplotlib.rcParams['figure.figsize'] = [9.0, 5.0]  # set the figure size before plotting
decomposition = sm.tsa.seasonal_decompose(df, model='additive')  # this call raises the ValueError
fig = decomposition.plot()

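I get this message:

ValueError: You must specify a period or x must be a pandas object with a DatetimeIndex with a freq not set to None

What should I do to get a result like the example above:

I took the screenshot above from this place.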

- and_and

9 Answers

11

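A likely cause is that your data has gaps. For example:

This data has gaps, which causes an exception in the seasonal_decompose() method.

This data is fine: all dates are covered, so no exception is raised.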
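To check whether gaps are the culprit before calling seasonal_decompose(), a minimal sketch such as the following may help. It assumes a daily series; the dates and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical daily index without gaps
full_index = pd.date_range("2021-01-01", periods=10, freq="D")

# The same index with two days removed from the middle
gappy_index = full_index.delete([3, 4])

# pandas can infer a frequency only when the spacing is regular
print(pd.infer_freq(full_index))   # D
print(pd.infer_freq(gappy_index))  # None

# List the timestamps that are missing from the gappy index
expected = pd.date_range(gappy_index.min(), gappy_index.max(), freq="D")
missing = expected.difference(gappy_index)
print(missing)
```

If missing is not empty, the index has no regular frequency, which matches the situation described in this answer.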

- Viacheslav Plekhanov

5

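I had the same problem, and it turned out (at least in my case) that some data points were missing from the dataset. For example, I had hourly data for a certain period, and 2 non-consecutive hourly data points were missing in the middle of the dataset, so I got the same error. When I tested on a different dataset without any missing data points, it worked without any error message. I hope this helps. It is not exactly a solution.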

- Nocciolate

4

I had the same problem. It was solved by forcing the period to be an integer.

So, for my specific case, I used the following.

decompose_result = seasonal_decompose(df.Sales, model='multiplicative', period=1)
decompose_result.plot();

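where df.Sales is a pandas Series with a step of 1 between two consecutive elements.

PS. You can find the details of seasonal_decompose() by typing the seasonal_decompose? command (for example in IPython or Jupyter). You will get the details below. Look at the description of each parameter.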

**Signature:**
seasonal_decompose(
    x,
    model='additive',
    filt=None,
    period=None,
    two_sided=True,
    extrapolate_trend=0,
)

- SJa

3

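I got the same error because some dates were missing. The quick fix here is to add those dates with a default value.

Be careful with the default value:

If your model is additive, it can be 0.
If your model is multiplicative, it cannot be 0, so you can use 1.

The code is as follows: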

from datetime import timedelta
import pandas as pd

#Start date and end_date
start_date = pd.to_datetime("2019-06-01")
end_date = pd.to_datetime("2021-08-20") - timedelta(days=1) #Excluding last

#List of all dates
all_date = pd.date_range(start_date, end_date, freq='d')

#Left join your main data on dates data
all_date_df = pd.DataFrame({'date':all_date})
tdf = df.groupby('date', as_index=False)['session_count'].sum()
tdf = pd.merge(all_date_df, tdf, on='date', how="left")
tdf.fillna(0, inplace=True)
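As an alternative to the merge, pandas' asfreq() can insert the missing dates and fill them in one step. This is only a sketch on made-up data; fill_value=0 assumes an additive model, in line with the caution about default values in this answer.

```python
import pandas as pd

# Made-up daily counts with 2021-01-03 missing
s = pd.Series(
    [5, 7, 6, 9],
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-04", "2021-01-05"]),
)

# Reindex to a daily frequency; rows created for missing dates get the fill value
filled = s.asfreq("D", fill_value=0)

print(filled)
print(filled.index.freq)
```

The result has a DatetimeIndex with a daily frequency set, which is what the error message asks for.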

- HimanshuGahlot

3

My guess is that you forgot to define a period and pass it to the freq argument of seasonal_decompose(). That is why it throws the following ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-9b030cf1055e> in <module>()
----> 1 decomposition = sm.tsa.seasonal_decompose(df, model = 'additive')
      2 decompose_result.plot()

/usr/local/lib/python3.7/dist-packages/statsmodels/tsa/seasonal.py in seasonal_decompose(x, model, filt, freq, two_sided, extrapolate_trend)
    125             freq = pfreq
    126         else:
--> 127             raise ValueError("You must specify a freq or x must be a "
    128                              "pandas object with a timeseries index with "
    129                              "a freq not set to None")

ValueError: You must specify a freq or x must be a pandas object with a time-series index with a freq not set to None

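Note: depending on your statsmodels version, the period parameter may not be available. In older releases, such as the one in the traceback above, the argument is named freq, and passing period to seasonal_decompose() raises the following TypeError: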

TypeError: seasonal_decompose() got an unexpected keyword argument 'period'

Follow the script below:

# import libraries
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose
 
# Generate time-series data
total_duration = 100
step = 0.01
time = np.arange(0, total_duration, step)
 
# Period of the sinusoidal signal in seconds
T= 15
 
# Period component
series_periodic = np.sin((2*np.pi/T)*time)
 
# Add a trend component
k0 = 2
k1 = 2
k2 = 0.05
k3 = 0.001
 
series_periodic = k0*series_periodic
series_trend    = k1*np.ones(len(time))+k2*time+k3*time**2
series          = series_periodic+series_trend 

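# Set the seasonal period (number of samples per cycle) in seasonal_decompose().
# Older statsmodels versions name this argument freq; newer ones name it period.
period = int(T/step)
results = seasonal_decompose(series, model='additive', freq=period)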

trend_estimate    = results.trend
periodic_estimate = results.seasonal
residual          = results.resid
 
# Plot the time-series components
plt.figure(figsize=(14,10))
plt.subplot(221)
plt.plot(series,label='Original time series', color='blue')
plt.plot(trend_estimate ,label='Trend of time series' , color='red')
plt.legend(loc='best',fontsize=20 , bbox_to_anchor=(0.90, -0.05))
plt.subplot(222)
plt.plot(trend_estimate,label='Trend of time series',color='blue')
plt.legend(loc='best',fontsize=20, bbox_to_anchor=(0.90, -0.05))
plt.subplot(223)
plt.plot(periodic_estimate,label='Seasonality of time series',color='blue')
plt.legend(loc='best',fontsize=20, bbox_to_anchor=(0.90, -0.05))
plt.subplot(224)
plt.plot(residual,label='Decomposition residuals of time series',color='blue')
plt.legend(loc='best',fontsize=20, bbox_to_anchor=(1.09, -0.05))
plt.tight_layout()
plt.savefig('decomposition.png')

Plot of the time-series components:

If you are using a pandas DataFrame:

# import libraries
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Generate some data
np.random.seed(0)
n = 1500
dates = np.array('2020-01-01', dtype=np.datetime64) + np.arange(n)
data = 12*np.sin(2*np.pi*np.arange(n)/365) + np.random.normal(12, 2, 1500)

#=================> Approach#1 <==================
# Set period after building dataframe
df = pd.DataFrame({'data': data}, index=dates)

# Reproduce the OP's example  
seasonal_decompose(df['data'], model='additive', freq=15).plot()

#=================> Approach#2 <==================
# create period once you create pandas dataframe by asfreq() after set dates as index
df = pd.DataFrame({'data': data,}, index=dates).asfreq('D').dropna()

# Reproduce the example for OP
seasonal_decompose(df , model='additive').plot()

- Mario

You say that the period parameter may have been removed, but it has not. In the latest version, statsmodels v0.13.2, it is in the parameter list, see https://www.statsmodels.org/stable/generated/statsmodels.tsa.seasonal.seasonal_decompose.html. How to use it, however, is still an open question, see [statsmodels seasonal_decompose(): What is the right "period of the series" in the context of a list column (constant vs. varying number of items)](https://stats.stackexchange.com/questions/482089/statsmodels-seasonal-decompose-what-is-the-right-period-of-the-series-in-th). - questionto42

0

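I have recently used the "Prophet" package.

It used to be called "FBProphet", but for some reason they dropped the FB (Facebook) part.

It is a bit difficult to install on a Windows PC (in that case you need miniconda to install it).

Once installed, however, it is very user-friendly and works like magic with just 1 line of code! It can also decompose seasonality, provide n% accuracy charts, and make forecasts.

If you want, it can also easily take holidays into account, which are prebuilt into the package.

There are plenty of YouTube videos about this package.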

https://github.com/facebook/prophet/tree/main/python

- Cornelis

0

decomposition = sm.tsa.seasonal_decompose(df, model = 'additive', period=7)

If you know the lag, adding the period will solve the problem. If we look at the description of the seasonal_decompose function, we can see that period defaults to None.

- Aaditya Bhardwaj

-1

To solve this problem, I ran sort_index, and the code above worked.

df.sort_index(inplace= True)
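Whether sorting helps can be checked on the index itself. The sketch below uses made-up dates and relies on the assumption that seasonal_decompose() falls back to the index's inferred frequency when no period is given, which is how recent statsmodels versions appear to behave.

```python
import pandas as pd

# Complete daily data, but in the wrong order
unsorted = pd.Series(
    range(6),
    index=pd.to_datetime([
        "2021-01-03", "2021-01-01", "2021-01-02",
        "2021-01-05", "2021-01-04", "2021-01-06",
    ]),
)
print(unsorted.index.inferred_freq)  # None

# After sorting, the regular daily spacing can be inferred
sorted_series = unsorted.sort_index()
print(sorted_series.index.inferred_freq)  # D

# Sorting does not help when a date is missing
gappy = sorted_series.drop(pd.Timestamp("2021-01-04"))
print(gappy.index.inferred_freq)  # None
```

Under that assumption, sorting can remove the error only when the data is complete and merely out of order; it cannot repair gaps.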

- and_and


- questionto42 · Accepted Answer

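This is only the result of my own tests and research, with no claim to completeness or expertise. Please comment or answer if you find something wrong.

Of course, your data should be in the right order of the index values, which you can ensure with df.sort_index(inplace=True), as you state in your answer. This is not wrong as such, though the error message is not about the sort order, and I have checked this: the error does not go away when I sort the index of a large dataset I have at hand. It is true that I also have to sort df.index, but decompose() can handle unsorted data as well, where items jump back and forth in time: then you simply get a lot of blue lines from left to right and back, until the whole graph is full of them.

What is more, the data is usually in the right order already. In my case, sorting does not help to fix the error. Thus I also doubt that sorting the index fixed the error in your case, because: what does the error actually say? ValueError: You must specify:

1. [either] a period
2. or x must be a pandas object with a DatetimeIndex with a freq not set to None

First of all, in case you have a list column so that your time series is still nested, see Convert pandas df with data in a "list column" into a time series in long format. Use three columns: [list of data] + [timestamp] + [duration] for details on how to unnest a list column. This is needed for both 1.) and 2.).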

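Details of 1.: "You must specify [either] a period ..."

Definition of period, from "period, int, optional" at https://www.statsmodels.org/stable/generated/statsmodels.tsa.seasonal.seasonal_decompose.html:

Period of the series. Must be used if x is not a pandas object or if the index of x does not have a frequency. Overrides default periodicity of x if x is a pandas object with a timeseries index.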

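The period parameter, set with an integer, is the number of observations that make up one cycle in the data (the error message in example 2 below is consistent with this: period=20 requires 40 observations for 2 complete cycles). If you have a df with 1000 rows and a list column in it (call it df_nested), and each list has, for example, 100 elements, then you have 100 elements per cycle and len(df_nested) = 1000 cycles, so period = 100 is the natural starting point for splitting seasonality and trend. If the number of elements per cycle varies over time, other values may work better. I am not sure how to set the parameter correctly in that case, hence the still unanswered question statsmodels seasonal_decompose(): What is the right "period of the series" in the context of a list column (constant vs. varying number of items) on Cross Validated.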

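The "period" parameter of option 1.) has a big advantage over option 2.). Although it uses the time index (DatetimeIndex) for its x-axis, it does not require each item to hit the frequency exactly, in contrast to option 2.). Instead, it simply joins together whatever is in a row, with the advantage that you do not need to fill any gaps: the last value of the previous event is just joined with the next value of the following event, whether that is in the next second or on the next day.

What is the maximum possible "period" value? In case you have a list column (call the df "df_nested" again), you should first unnest the list column to a normal column. The maximum period is len(df_unnested)/2.

Example 1: with 20 items in x (x is the number of all items of df_unnested), you can have at most period = 10.

Example 2: with the same 20 items and period=20, you get the following error: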

ValueError: x must have 2 complete cycles requires 40 observations. x only has 20 observation(s)

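Another side note: to get rid of the error in question, period = 1 should already be enough, but for time series analysis "=1" does not reveal anything new: with only 1 item per cycle, the trend is the same as the original data, the seasonality is 0, and the residuals are always 0.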

####

Example borrowed from Convert pandas df with data in a "list column" into a time series in long format. Use three columns: [list of data] + [timestamp] + [duration]

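import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

df_test = pd.DataFrame({'timestamp': [1462352000000000000, 1462352100000000000, 1462352200000000000, 1462352300000000000],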
                'listData': [[1,2,1,9], [2,2,3,0], [1,3,3,0], [1,1,3,9]],
                'duration_sec': [3.0, 3.0, 3.0, 3.0]})
tdi = pd.DatetimeIndex(df_test.timestamp)
df_test.set_index(tdi, inplace=True)
df_test.drop(columns='timestamp', inplace=True)
df_test.index.name = 'datetimeindex'

df_test = df_test.explode('listData') 
sizes = df_test.groupby(level=0)['listData'].transform('size').sub(1)
duration = df_test['duration_sec'].div(sizes)
df_test.index += pd.to_timedelta(df_test.groupby(level=0).cumcount() * duration, unit='s')

The resulting df_test['listData'] looks as follows:

2016-05-04 08:53:20    1
2016-05-04 08:53:21    2
2016-05-04 08:53:22    1
2016-05-04 08:53:23    9
2016-05-04 08:55:00    2
2016-05-04 08:55:01    2
2016-05-04 08:55:02    3
2016-05-04 08:55:03    0
2016-05-04 08:56:40    1
2016-05-04 08:56:41    3
2016-05-04 08:56:42    3
2016-05-04 08:56:43    0
2016-05-04 08:58:20    1
2016-05-04 08:58:21    1
2016-05-04 08:58:22    3
2016-05-04 08:58:23    9

Now look at different integer values for period.

period = 1:

result_add = seasonal_decompose(x=df_test['listData'], model='additive', extrapolate_trend='freq', period=1)
plt.rcParams.update({'figure.figsize': (5,5)})
result_add.plot().suptitle('Additive Decompose', fontsize=22)
plt.show()

period = 2:

result_add = seasonal_decompose(x=df_test['listData'], model='additive', extrapolate_trend='freq', period=2)
plt.rcParams.update({'figure.figsize': (5,5)})
result_add.plot().suptitle('Additive Decompose', fontsize=22)
plt.show()

If you take a quarter of all items as one cycle, which is 4 (out of 16 items) here.

period = 4:

result_add = seasonal_decompose(x=df_test['listData'], model='additive', extrapolate_trend='freq', period=int(len(df_test)/4))
plt.rcParams.update({'figure.figsize': (5,5)})
result_add.plot().suptitle('Additive Decompose', fontsize=22)
plt.show()

Or if you take the maximum possible size of a cycle, which is 8 (out of 16 items) here.

period = 8:

result_add = seasonal_decompose(x=df_test['listData'], model='additive', extrapolate_trend='freq', period=int(len(df_test)/2))
plt.rcParams.update({'figure.figsize': (5,5)})
result_add.plot().suptitle('Additive Decompose', fontsize=22)
plt.show()

Look at how the y-axes change their scale.

####

Depending on your needs, you will increase the period integer. The maximum in the case of your question is:

sm.tsa.seasonal_decompose(df, model = 'additive', period = int(len(df)/2))

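Details of 2.: "... or x must be a pandas object with a DatetimeIndex with a freq not set to None"

To make x a pandas object with a DatetimeIndex with a freq not set to None, you need to assign the freq of the DatetimeIndex using .asfreq('?'), with ? being your choice among the wide range of offset aliases listed at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases.

In your case, this option 2. is the better fit, as you seem to have a list without gaps. Your monthly data should then probably be introduced as "month start frequency" --> "MS" as the offset alias: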

sm.tsa.seasonal_decompose(df.asfreq('MS'), model = 'additive')

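See How to set frequency with pd.to_datetime()? for more details, also about how to deal with gaps.

If your data is highly scattered in time so that you have too many gaps to fill, or if gaps in time are not important to you, option 1. with "period" is probably the better choice.

In my example df_test, option 2. is not good. The data is totally scattered in time, and if I take a second as the frequency, you get this:

Output of df_test.asfreq('s') (= frequency in seconds):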

2016-05-04 08:53:20      1
2016-05-04 08:53:21      2
2016-05-04 08:53:22      1
2016-05-04 08:53:23      9
2016-05-04 08:53:24    NaN
                      ...
2016-05-04 08:58:19    NaN
2016-05-04 08:58:20      1
2016-05-04 08:58:21      1
2016-05-04 08:58:22      3
2016-05-04 08:58:23      9
Freq: S, Name: listData, Length: 304, dtype: object

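Here you see that although my data has only 16 rows, introducing a frequency in seconds forces the df to have 304 rows just to reach from "08:53:20" to "08:58:23"; 288 gaps are created here. What is more, you have to hit the exact time. If your real frequency is 0.1 or even 0.12314 seconds, you will not hit most of the items with your index.

Here is an example with min as the offset alias, df_test.asfreq('min'):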

2016-05-04 08:53:20      1
2016-05-04 08:54:20    NaN
2016-05-04 08:55:20    NaN
2016-05-04 08:56:20    NaN
2016-05-04 08:57:20    NaN
2016-05-04 08:58:20      1

We see that only the first and the last minute are filled at all; the rest is not hit.

Taking the day as the offset alias, df_test.asfreq('d'):

2016-05-04 08:53:20    1

We see that you get only the first row as the resulting df, since only one day is covered. It gives you the first item found; the rest is dropped.
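Before committing to option 2., it can help to count how many empty rows a given offset alias would create. A rough sketch on a shortened, made-up version of the data follows; gap_report is a hypothetical helper, not a pandas function.

```python
import pandas as pd

# Four made-up observations, scattered in time
s = pd.Series(
    [1, 2, 2, 2],
    index=pd.to_datetime([
        "2016-05-04 08:53:20", "2016-05-04 08:53:21",
        "2016-05-04 08:55:00", "2016-05-04 08:55:01",
    ]),
)

def gap_report(series, alias):
    """Return (rows after asfreq, rows that are NaN) for an offset alias."""
    resampled = series.asfreq(alias)
    return len(resampled), int(resampled.isna().sum())

print(gap_report(s, "s"))    # (102, 98)
print(gap_report(s, "min"))  # (2, 1)
```

A high share of NaN rows suggests that the period option is the more practical route for such data.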

The end of it all

Putting all of this together: in your case, take option 2., while in my example case of df_test, option 1. is needed.