How can I detect anomalies in time series data that has trend and seasonality?

Question

python machine-learning time-series anomaly-detection

7

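I want to detect outliers in time series data that contains trend and seasonal components. I want to ignore the seasonal peaks and flag only the other peaks as outliers. I am new to time series analysis, so any help with this problem would be appreciated.

The programming platform is Python.

Attempt 1: Using an ARIMA model

I trained a model and forecast the test data. I could then compute the difference between the forecasts and the actual test values, and pick out outliers based on the observed variance.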

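Applying auto-ARIMA

# Note: pyramid-arima was later renamed pmdarima (pip install pmdarima; from pmdarima import auto_arima)
!pip install pyramid-arima
from pyramid.arima import auto_arima

# train_log is the log-transformed training series created in the steps below
stepwise_model = auto_arima(train_log, start_p=1, start_q=1, max_p=3, max_q=3,
                            m=7, start_P=0, seasonal=True, d=1, D=1, trace=True,
                            error_action='ignore', suppress_warnings=True, stepwise=True)

import math
import numpy as np
import statsmodels.api as sm
import statsmodels.tsa.api as smt
from sklearn.metrics import mean_squared_error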

Split the data into training and test sets

train, test = actual_vals[0:-70], actual_vals[-70:]

Log transform

train_log, test_log = np.log10(train), np.log10(test)

Convert to a list

history = [x for x in train_log]
predictions = list()
predict_log=list()

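Fit the stepwise ARIMA model

for t in range(len(test_log)):
    stepwise_model.fit(history)  # refit on every observation seen so far
    output = stepwise_model.predict(n_periods=1)  # one-step-ahead forecast (log scale)
    predict_log.append(output[0])
    yhat = 10**output[0]  # back-transform to the original scale
    predictions.append(yhat)
    obs = test_log[t]
    history.append(obs)  # add the true observation before the next step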

Plotting

import matplotlib.pyplot as plt

figsize = (12, 7)
plt.figure(figsize=figsize)
plt.plot(test, label='Actuals')
plt.plot(predictions, color='red', label='Predicted')
plt.legend(loc='upper right')
plt.show()
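The outlier step described for Attempt 1 (comparing forecasts with actual values) is not shown in the code. A minimal sketch of one way to do it follows. It swaps the variance-based rule mentioned in the question for a robust median/MAD score, so that a few large errors do not inflate the scale they are measured against. The function name `residual_outliers`, the 3.5 threshold, and the toy arrays are assumptions for illustration, not part of the original post.

```python
import numpy as np

def residual_outliers(actual, predicted, threshold=3.5):
    """Return indices whose forecast error is extreme by a median/MAD z-score."""
    residuals = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    median = np.median(residuals)
    mad = np.median(np.abs(residuals - median))  # median absolute deviation
    if mad == 0:
        # No spread to compare against; return nothing rather than divide by zero
        return np.array([], dtype=int)
    robust_z = 0.6745 * (residuals - median) / mad  # 0.6745 rescales MAD to ~1 std for normal data
    return np.flatnonzero(np.abs(robust_z) > threshold)

# Toy values standing in for `test` and `predictions` (hypothetical data)
actual = [10.0, 12.0, 11.0, 30.0, 10.5, 11.5, 12.5]
predicted = [10.5, 11.5, 11.2, 11.0, 10.8, 11.4, 12.0]
outlier_idx = residual_outliers(actual, predicted)
print(outlier_idx)
```

With the real data, `actual` and `predicted` would be `test` and `predictions` from the loop above, which still covers only the test window.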

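However, this only lets me detect outliers in the test data. I actually need to detect outliers across the whole time series, including the training data I already have.

Attempt 2: Using seasonal decomposition

I used the code below to split the original data into seasonal, trend, and residual components, which can be seen in the image below.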

from statsmodels.tsa.seasonal import seasonal_decompose

decomposed = seasonal_decompose()

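I am now using the residual component to find outliers with a box plot, since the seasonal and trend components have been removed. Does this make sense?

Or is there a simpler or better approach I could use?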

- Raja Sahe S

1 Answer

- Dor · Accepted Answer

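You can:

In the fourth graph (the residual plot), check for extreme points; these may point you to anomalies in the seasonal series.
Supervised (if you have some labeled data): train a classifier.
Unsupervised: predict the next value and build a confidence interval, then check whether the observed value falls inside it.
Compute the relative extrema of the data, for example with argrelextrema as shown here: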

import numpy as np
from scipy.signal import argrelextrema

x = np.array([2, 1, 2, 3, 2, 0, 1, 0])
argrelextrema(x, np.greater)  # indices of strict local maxima

Output:

(array([3, 6]),)
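For the unsupervised option in the list above (forecast the next value and check it against an interval), a rough sketch might look like this. To stay self-contained it does not use ARIMA. It swaps in a much simpler forecaster, the median of the same weekday over the previous three cycles, and builds a rough interval from the mean and standard deviation of recent forecast errors. The names `interval_anomalies`, `window`, and `z`, and the synthetic data, are assumptions for illustration.

```python
import numpy as np

def interval_anomalies(y, period, window=28, z=3.0):
    """Return indices where the observation falls outside forecast +/- z * std of recent errors."""
    y = np.asarray(y, dtype=float)
    errors = []   # forecast errors of points that were not flagged
    flagged = []
    for t in range(3 * period, len(y)):
        # Stand-in forecaster: median of the same phase in the previous three cycles
        forecast = np.median(y[t - 3 * period:t:period])
        error = y[t] - forecast
        recent = errors[-window:]
        if len(recent) >= period:
            center = np.mean(recent)         # absorbs the bias caused by the trend
            half_width = z * np.std(recent)  # rough interval half-width
            if abs(error - center) > half_width:
                flagged.append(t)
                continue                     # keep anomalies out of the error history
        errors.append(error)
    return flagged

# Synthetic daily series: linear trend + weekly pattern + small noise (made-up data)
rng = np.random.default_rng(0)
t = np.arange(140)
weekly = np.tile([0.0, 1.0, 2.0, 5.0, 2.0, 1.0, 0.0], 20)
y = 0.05 * t + weekly + rng.normal(0, 0.1, size=t.size)
y[60] += 8.0  # injected anomaly that is not a seasonal peak

anomalies = interval_anomalies(y, period=7)
print(anomalies)
```

The first few cycles serve only as warm-up, so nothing can be flagged there. A model-based forecaster that reports its own prediction intervals could replace the median rule without changing the rest of the loop.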

Here is some random data (my implementation of the argrelextrema approach above):