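Matlab's cross-correlation function xcorr(x,y,maxlags) has a maxlags option that returns the cross-correlation sequence over the lag range [-maxlags:maxlags]. NumPy's numpy.correlate(N,M,mode) has three modes, but none of them lets me set a specific lag: the output is either full (N+M-1), same (max(M,N)), or valid (max(M,N)-min(M,N)+1). For len(N) = 60000 and len(M) = 200, I want to set the lag to 100.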
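To make the three output sizes concrete, here is a minimal sketch that calls numpy.correlate with each mode on arrays of the sizes mentioned above. The array contents are placeholders (an assumption for illustration); only the lengths matter here.

```python
import numpy as np

# Placeholder data with the lengths from the question
N = np.zeros(60000)
M = np.ones(200)

# Output length for each of the three modes
lengths = {}
for mode in ("full", "same", "valid"):
    lengths[mode] = len(np.correlate(N, M, mode=mode))
    print(mode, lengths[mode])
```

None of these lengths corresponds to 2*100 + 1 = 201 lags, which is why the result has to be sliced or computed differently.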
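matplotlib.pyplot.xcorr has a maxlags parameter. It is actually a wrapper around numpy.correlate, so there is no performance gain. Nevertheless, it gives exactly the same result as Matlab's cross-correlation function. Below I have edited the code from matplotlib so that it returns only the correlation. The reason is that matplotlib.pyplot.xcorr, used as is, also draws and returns the plot. The problem is that if we pass complex data as arguments, we get a "casting complex to real datatype" warning when matplotlib tries to draw the plot.
<!-- language: python -->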
import numpy as np

def xcorr(x, y, maxlags=10):
    """Return the cross-correlation of x and y at lags -maxlags..maxlags."""
    Nx = len(x)
    if Nx != len(y):
        raise ValueError('x and y must be equal length')

    # Full cross-correlation; zero lag sits at index Nx - 1
    c = np.correlate(x, y, mode='full')

    if maxlags is None:
        maxlags = Nx - 1

    if maxlags >= Nx or maxlags < 1:
        raise ValueError('maxlags must be None or strictly positive < %d' % Nx)

    # Keep only the 2*maxlags + 1 values centred on zero lag
    c = c[Nx - 1 - maxlags:Nx + maxlags]
    return c
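The function above requires equal-length inputs, while the question has len(N) = 60000 and len(M) = 200. As a rough sketch, one way to handle that is to zero-pad the shorter input at the end before slicing, which is what Matlab's xcorr is documented to do. The name xcorr_maxlags is hypothetical, and this still computes the full correlation internally, so it is no faster than numpy.correlate.

```python
import numpy as np

def xcorr_maxlags(x, y, maxlags):
    """Cross-correlation at lags -maxlags..maxlags (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = max(len(x), len(y))
    if not 0 <= maxlags < n:
        raise ValueError("maxlags must satisfy 0 <= maxlags < %d" % n)
    # Assumption: zero-pad the shorter input at the end so both have length n
    x = np.pad(x, (0, n - len(x)), mode="constant")
    y = np.pad(y, (0, n - len(y)), mode="constant")
    # Full correlation has 2*n - 1 values; zero lag is at index n - 1
    c = np.correlate(x, y, mode="full")
    lags = np.arange(-maxlags, maxlags + 1)
    return lags, c[n - 1 - maxlags:n + maxlags]

# Equal-length inputs
lags, c = xcorr_maxlags([1, 2, 3], [0, 1, 0.5], maxlags=1)

# Unequal-length inputs: the shorter one is padded to [1, 1, 0, 0]
lags2, c2 = xcorr_maxlags([1, 2, 3, 4], [1, 1], maxlags=1)
```

For the sizes in the question, the call would be xcorr_maxlags(N, M, 100), which returns 201 values.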
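Here is my implementation of lead-lag correlation. It only works for 1-D data and is not guaranteed to be optimal in terms of efficiency. It uses scipy.stats.pearsonr for the core computation, so it also returns the p-values of the coefficients. Feel free to modify this initial version to optimize it.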
def lagcorr(x, y, lag=None, verbose=True):
    '''Compute lead-lag correlations between 2 time series.

    <x>, <y>: 1-D time series.
    <lag>: lag option, could take different forms:
        if 0 or None, compute ordinary correlation and p-value;
        if positive integer, compute lagged correlations with lags
        up to <lag>;
        if negative integer, compute lead correlations with leads
        up to <-lag>;
        if a list, tuple or array of integers, compute
        lead/lag correlations at those leads/lags.

    Note: when talking about lead/lag, <y> is used as the reference.
    Therefore positive lag means <x> lags <y> by <lag>; the computation is
    done by shifting <x> to the left by <lag> with respect to <y>.
    Similarly negative lag means <x> leads <y> by <-lag>; the computation is
    done by shifting <x> to the right by <-lag> with respect to <y>.

    Return <result>: an (n*2) array, with 1st column the correlation
    coefficients, 2nd column the corresponding p-values.

    Currently only works for 1-D arrays.
    '''
    import numpy
    from scipy.stats import pearsonr

    if len(x) != len(y):
        raise ValueError('Input variables of different lengths.')

    #--------Unify types of <lag>-------------
    if lag is None:
        lag = [0,]
    elif numpy.isscalar(lag):
        if abs(lag) >= len(x):
            raise ValueError('Maximum lag equal or larger than array.')
        if lag < 0:
            lag = -numpy.arange(abs(lag) + 1)
        elif lag == 0:
            lag = [0,]
        else:
            lag = numpy.arange(lag + 1)
    else:
        lag = numpy.asarray(lag)

    #-------Loop over lags---------------------
    result = []
    if verbose:
        print('\n#<lagcorr>: Computing lagged-correlations at lags:', lag)

    for ii in lag:
        if ii < 0:
            r, p = pearsonr(x[:ii], y[-ii:])
        elif ii == 0:
            r, p = pearsonr(x, y)
        else:
            r, p = pearsonr(x[ii:], y[:-ii])
        # Store (coefficient, p-value) as a plain tuple
        result.append((r, p))

    result = numpy.asarray(result)
    return result
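To illustrate the sign convention described in the docstring (positive lag means x lags y), here is a small self-contained sketch. It uses numpy.corrcoef instead of scipy.stats.pearsonr, so it returns only the coefficient and no p-value; the helper name lagged_pearson and the synthetic data are assumptions for illustration.

```python
import numpy as np

def lagged_pearson(x, y, lag):
    """Pearson r between x and y at a single lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        # x lags y: drop the first `lag` samples of x and the last of y
        a, b = x[lag:], y[:-lag]
    elif lag < 0:
        # x leads y: drop the last samples of x and the first of y
        a, b = x[:lag], y[-lag:]
    else:
        a, b = x, y
    return np.corrcoef(a, b)[0, 1]

# Synthetic data: x is y delayed by 3 samples
rng = np.random.RandomState(0)
y = rng.randn(500)
x = np.roll(y, 3)

# Correlation at lags 0..5; the peak should be at the true delay
r = [lagged_pearson(x, y, k) for k in range(6)]
best = int(np.argmax(r))
print(best)
```

Because x is an exact delayed copy of y, the overlapping segments at lag 3 are identical and the coefficient there is 1 up to rounding.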