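I need to compute the auto-correlation of a set of numbers, which as I understand it is just the correlation of the set with itself.
I've tried NumPy's correlate function, but I don't trust the result: it almost always gives a vector whose first number is not the largest, as it ought to be.
So this question is really two questions:
- What exactly is numpy.correlate doing?
- How can I use it (or something else) to do auto-correlation?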
import numpy as np
import talib as ta  # assumption: `ta` is TA-Lib, whose CORREL(real0, real1, timeperiod) matches the call below

def autocorrelate(x, period):
    # x is a deep indicator array
    # period of sample and slices of comparison
    # oldest data (period of input array) may be nan; remove it
    x = x[-np.count_nonzero(~np.isnan(x)):]
    # subtract mean to normalize indicator (on a copy, so the caller's array is not modified)
    x = x - np.mean(x)
    # isolate the recent sample to be autocorrelated
    sample = x[-period:]
    # create slices of indicator data
    correls = []
    for n in range((len(x)-1), period, -1):
        alpha = period + n
        slices = (x[-alpha:])[:period]
        # compare each slice to the recent sample
        correls.append(ta.CORREL(slices, sample, period)[-1])
    # fill in zeros for sample overlap period of recent correlations
    for n in range(period, 0, -1):
        correls.append(0)
    # oldest data (autocorrelation period) will be nan; remove it
    correls = np.array(correls[-np.count_nonzero(~np.isnan(correls)):])
    return correls

# CORRELATION OF BEST FIT
# (correls here is the array returned by autocorrelate)
# the highest value correlation
max_value = np.max(correls)
# index of the best correlation
max_index = np.argmax(correls)
I think the real answer to the OP's question is succinctly contained in this excerpt from the numpy.correlate documentation:
mode : {'valid', 'same', 'full'}, optional
    Refer to the `convolve` docstring. Note that the default
    is `valid`, unlike `convolve`, which uses `full`.
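To make the consequence of that default concrete, here is a minimal sketch on a made-up four-element array (the data and variable names are illustrative, not from the question). With two equal-length inputs, the default 'valid' mode returns only the zero-lag term, while 'full' returns every lag from -(n-1) to n-1 with lag 0 in the middle at index n-1. This is probably why the first element of a 'full' result is not the largest.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative data
n = len(x)

# Default mode is 'valid': for equal-length inputs only the zero-lag sum is returned.
valid = np.correlate(x, x)              # values: 30

# 'full' returns all 2*n - 1 lags, ordered from -(n-1) to +(n-1).
full = np.correlate(x, x, mode="full")  # values: 4, 11, 20, 30, 20, 11, 4

# Lag 0 sits at index n - 1, so slicing there keeps the non-negative lags
# and puts the zero-lag peak first.
acf = full[n - 1:]                      # values: 30, 20, 11, 4

print(valid)
print(full)
print(acf)
```

Slicing at n - 1 keeps only the non-negative lags; since an autocorrelation is symmetric in the lag, nothing is lost by discarding the first half.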
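import matplotlib.pyplot as plt
import numpy as np

def plot_autocorr(returns, lags):
    # returns is expected to be a pandas Series (it must provide .corr and .shift)
    autocorrelation = []
    for lag in range(lags+1):
        # Pearson correlation between the series and itself shifted by `lag`
        corr_lag = returns.corr(returns.shift(-lag))
        autocorrelation.append(corr_lag)
    plt.plot(range(lags+1), autocorrelation, '--o')
    plt.xticks(range(lags+1))
    return np.array(autocorrelation)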
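For readers without pandas, the per-lag quantity computed in that loop can be sketched with NumPy alone. This assumes, as I understand pandas' behaviour, that Series.corr ignores the NaN values introduced by shift, so that each lag uses only the overlapping samples. The helper name lagged_pearson and the sample data are hypothetical.

```python
import numpy as np

def lagged_pearson(x, lags):
    """Pearson correlation between x[t] and x[t + lag] for lag = 0..lags.

    Hypothetical helper: a NumPy-only sketch of a per-lag Pearson correlation.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = []
    for lag in range(lags + 1):
        # Pair x[t] with x[t + lag]; only the n - lag overlapping samples are used.
        out.append(np.corrcoef(x[: n - lag], x[lag:])[0, 1])
    return np.array(out)

x = np.array([3.73, 3.67, 3.77, 3.83, 4.67, 5.87, 6.70, 6.97, 6.40, 5.57])
acf = lagged_pearson(x, 3)
print(acf)
```

Note that this re-centers and re-scales each overlapping pair of segments separately, so for lag > 0 the values will generally differ slightly from an np.correlate-based estimate that uses the global mean and variance.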
What about autocorrelation_plot()? (see https://stats.stackexchange.com/questions/357300/what-does-pandas-autocorrelation-graph-show) - Qaswed
To implement IDL's a_correlate function with numpy, run np.correlate with mode="full" and add n-1 to the lag array (lag 0 sits at index n-1 of the full output).
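import numpy as np

def a_correlate(y, lag):
    y = np.asarray(y)
    lag = np.asarray(lag)
    n = len(y)
    # remove the mean, then normalize by the zero-lag sum of squares
    yunbiased = y - np.mean(y)
    ynorm = np.sum(yunbiased**2)
    r = np.correlate(yunbiased, yunbiased, "full") / ynorm
    # shift lags by n - 1 so that lag 0 maps to the center of the full output
    return r[lag + (n - 1)]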
Example (based on the example in the IDL documentation for a_correlate):
# Define an n-element sample population:
X = np.array([3.73, 3.67, 3.77, 3.83, 4.67, 5.87, 6.70, 6.97, 6.40, 5.57])
# Compute the autocorrelation of X for LAG = -3, 0, 1, 3, 4, 8:
lag = [-3, 0, 1, 3, 4, 8]
result = a_correlate(X, lag)
print(result)
# prints: [ 0.01461851 1. 0.81087925 0.01461851 -0.32527914 -0.15168379]
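As a cross-check, the same normalized autocorrelation can be written as a direct sum for one lag at a time: the lag-k product of the mean-removed series divided by its sum of squares. The sketch below (the helper name direct_autocorr is hypothetical) compares that direct sum with the np.correlate "full" result indexed at lag + (n - 1), using the same sample data.

```python
import numpy as np

def direct_autocorr(y, k):
    """Normalized autocorrelation at a single lag k by direct summation.

    Hypothetical helper, used only to cross-check the np.correlate indexing.
    """
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    k = abs(k)  # the autocorrelation is symmetric in the lag
    return np.sum(d[: len(d) - k] * d[k:]) / np.sum(d ** 2)

X = np.array([3.73, 3.67, 3.77, 3.83, 4.67, 5.87, 6.70, 6.97, 6.40, 5.57])
n = len(X)
d = X - X.mean()
r_full = np.correlate(d, d, "full") / np.sum(d ** 2)

lags = [-3, 0, 1, 3, 4, 8]
direct = np.array([direct_autocorr(X, k) for k in lags])
indexed = r_full[np.asarray(lags) + (n - 1)]

# Both routes should agree to floating-point precision.
print(np.allclose(direct, indexed))
```

The direct sum is slower for many lags, but it makes the n - 1 offset easy to verify by hand.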