How can I use numpy.correlate to do autocorrelation?

133

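I need to autocorrelate a set of numbers, which as I understand it is just the correlation of the set with itself.

I've tried NumPy's correlate function, but I don't trust the result, because it almost always gives a vector whose first number is not the largest, as it ought to be.

So this question is really two questions:

  1. What exactly is numpy.correlate doing?
  2. How can I use it (or something else) to do autocorrelation?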

See also https://dev59.com/0WjWa4cB1Zd3GeqPmw_0 for information about normalized autocorrelation with NumPy. - amcnabb
14 Answers

1
I use talib.CORREL for autocorrelation; I suspect you could do the same with other packages.
import numpy as np
import talib as ta

def autocorrelate(x, period):

    # x is a deep indicator array 
    # period of sample and slices of comparison

    # oldest data (period of input array) may be nan; remove it
    x = x[-np.count_nonzero(~np.isnan(x)):]
    # subtract mean to normalize indicator
    # (not in place, so the caller's array is left unmodified)
    x = x - np.mean(x)
    # isolate the recent sample to be autocorrelated
    sample = x[-period:]
    # create slices of indicator data
    correls = []
    for n in range((len(x)-1), period, -1):
        alpha = period + n
        slices = (x[-alpha:])[:period]
        # compare each slice to the recent sample
        correls.append(ta.CORREL(slices, sample, period)[-1])
    # fill in zeros for sample overlap period of recent correlations    
    for n in range(period,0,-1):
        correls.append(0)
    # oldest data (autocorrelation period) will be nan; remove it
    correls = np.array(correls[-np.count_nonzero(~np.isnan(correls)):])      

    return correls

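# CORRELATION OF BEST FIT
# x and period are the caller's indicator array and sample length
correls = autocorrelate(x, period)
# the highest value correlation
max_value = np.max(correls)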
# index of the best correlation
max_index = np.argmax(correls)

0

I think the real answer to the OP's question is succinctly contained in this excerpt from the numpy.correlate documentation:

mode : {'valid', 'same', 'full'}, optional
    Refer to the `convolve` docstring.  Note that the default
    is `valid`, unlike `convolve`, which uses `full`.

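This implies that, when numpy.correlate is called without specifying "mode", it returns a single value (a one-element array holding the zero-lag correlation) when the same vector is passed as both input arguments, i.e. when it is used to perform autocorrelation.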
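
To make the point about the default mode concrete, here is a minimal sketch (the array `x` is made-up sample data, not taken from the question). With the default `mode='valid'` and two equal-length inputs, only the zero-lag value comes back; `mode='full'` returns every lag, and the non-negative lags are the second half of that result:

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])

# Default mode is 'valid': equal-length inputs fully overlap in one position only
valid = np.correlate(x, x)
print(valid)  # one-element array holding sum(x * x)

# mode='full' returns lags -(n-1) .. (n-1); index n-1 is lag 0
full = np.correlate(x, x, mode="full")
acf = full[full.size // 2:]  # keep lags 0 .. n-1
print(acf)                   # the first entry (lag 0) is the largest
```

Taking the second half of the `full` output is probably what the OP wants, since the first entry is then the zero-lag term.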

0
Plot the statistical autocorrelation given a pandas datetime Series of returns:
import matplotlib.pyplot as plt
import numpy as np

def plot_autocorr(returns, lags):
    autocorrelation = []
    for lag in range(lags+1):
        corr_lag = returns.corr(returns.shift(-lag)) 
        autocorrelation.append(corr_lag)
    plt.plot(range(lags+1), autocorrelation, '--o')
    plt.xticks(range(lags+1))
    return np.array(autocorrelation)
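
As a cross-check on the loop above, pandas also has a built-in `Series.autocorr(lag)`, which computes the Pearson correlation between a series and a shifted copy of itself. A minimal sketch, assuming a made-up `returns` series (the data, the dates and the seed are all hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 random daily "returns", for illustration only
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=200, freq="D")
returns = pd.Series(rng.normal(size=200), index=index)

lags = 5
# Same quantity as the loop above, one value per lag
manual = np.array([returns.corr(returns.shift(-lag)) for lag in range(lags + 1)])
# Built-in equivalent
builtin = np.array([returns.autocorr(lag) for lag in range(lags + 1)])

print(np.allclose(manual, builtin))
print(manual[0])  # autocorrelation at lag 0
```

Shifting by `-lag` or by `lag` pairs up the same observations, so the two approaches should agree up to floating-point error.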

Why not use autocorrelation_plot() in this case? (See https://stats.stackexchange.com/questions/357300/what-does-pandas-autocorrelation-graph-show) - Qaswed

0

To implement the equivalent of IDL's a_correlate function with NumPy, run np.correlate with mode="full" and add n-1 to the lag array when indexing the result.

def a_correlate(y, lag):
    y = np.asarray(y)
    lag = np.asarray(lag)
    n = len(y)
    yunbiased = y - np.mean(y)
    ynorm = np.sum(yunbiased**2)
    r = np.correlate(yunbiased, yunbiased, "full") / ynorm
    return r[lag + (n - 1)]

Example (based on the example in the IDL documentation page linked above):

# Define an n-element sample population:
X = np.array([3.73, 3.67, 3.77, 3.83, 4.67, 5.87, 6.70, 6.97, 6.40, 5.57])
# Compute the autocorrelation of X for LAG = -3, 0, 1, 3, 4, 8:
lag = [-3, 0, 1, 3, 4, 8]
result = a_correlate(X, lag)
print(result)
# prints: [ 0.01461851  1.          0.81087925  0.01461851 -0.32527914 -0.15168379]
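
For anyone who wants to see what the `lag + (n - 1)` indexing is doing, here is a rough sketch that computes the same quantity straight from the defining sums and compares it with the `np.correlate` result. The helper name `a_correlate_direct` is made up for this illustration and is not part of IDL or NumPy:

```python
import numpy as np

def a_correlate_direct(y, lag):
    """Autocorrelation at one lag from the defining sums (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    k = abs(int(lag))  # autocorrelation is symmetric in the lag
    # sum of products of deviations that are k samples apart
    num = np.sum(d[:len(d) - k] * d[k:])
    return num / np.sum(d ** 2)

X = np.array([3.73, 3.67, 3.77, 3.83, 4.67, 5.87, 6.70, 6.97, 6.40, 5.57])
d = X - X.mean()
n = len(X)
# In the 'full' output, index n - 1 corresponds to lag 0
full = np.correlate(d, d, "full") / np.sum(d ** 2)

for lag in [-3, 0, 1, 3, 4, 8]:
    print(lag, a_correlate_direct(X, lag), full[lag + n - 1])
```

Both columns should match for every lag, with lag 0 normalized to 1.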
