我想在Python 3x中应用Henze-Zirkler的多元正态性检验,并想知道是否可以在Jupyter笔记本中使用Python进行操作。
我已经对我的数据拟合了VAR模型,现在想要测试该拟合VAR模型的残差是否服从正态分布。
请问我可以如何在Jupyter笔记本中使用Python实现?
我想在Python 3x中应用Henze-Zirkler的多元正态性检验,并想知道是否可以在Jupyter笔记本中使用Python进行操作。
我已经对我的数据拟合了VAR模型,现在想要测试该拟合VAR模型的残差是否服从正态分布。
请问我可以如何在Jupyter笔记本中使用Python实现?
import rpy2.robjects as robjects
from rpy2.robjects import r
from rpy2.robjects.numpy2ri import numpy2ri
from rpy2.robjects.packages import importr
import numpy as np
# Create data
resi = pd.DataFrame(np.random.random((108, 2)), columns=['Number1','Number2'])
#Converting the dataframe from python to R
# firt take the values of the dataframe to numpy
resi1=np.array(resi, dtype=float)
# Taking the variable from Python to R
r_resi = numpy2ri(resi1)
# Creating this variable in R (from python)
r.assign("resi", r_resi)
# Calling libraries in R
r('library("MVN")')
# Calling a function in R (from python)
r("res <- hzTest(resi, qqplot = F)")
# Retrieving information from R to Python
r_result = r("res")
# Printing the output in python
print(r_result)
Henze-Zirkler's Multivariate Normality Test
---------------------------------------------
data : resi
HZ : 2.841424
p-value : 1.032563e-06
Result : Data are not multivariate normal.
---------------------------------------------
2021-08-25更新 MVN软件包和rpy2都发生了API更改。以下内容适用于MVN 5.9版本和rpy2 3.4版本。
"""Interface file to access the R MVN package"""
import numpy as np
import rpy2.robjects.packages as rpackages
from rpy2.robjects import numpy2ri
from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import StrVector
# Install packages, if they are not already installed
packages_to_install_if_needed = ("MVN",)
utils = rpackages.importr("utils")
utils.chooseCRANmirror(ind=1) # select the first mirror in the list
names_to_install = [x for x in packages_to_install_if_needed if not rpackages.isinstalled(x)]
if len(names_to_install) > 0:
utils.install_packages(StrVector(names_to_install))
# load the package
mvn = importr("MVN")
# Generate data
np_arr = np.random.multivariate_normal(np.ones(2), np.eye(2), size=100)
# activate automatic conversion from numpy to rpy2 interface objects
numpy2ri.activate()
# perform the work
res = mvn.mvn(np_arr)
print(res)
$multivariateNormality
Test HZ p value MVN
1 Henze-Zirkler 0.3885607 0.8343017 YES
$univariateNormality
Test Variable Statistic p value Normality
1 Anderson-Darling Column1 0.2443 0.7569 YES
2 Anderson-Darling Column2 0.3935 0.3692 YES
$Descriptives
n Mean Std.Dev Median Min Max 25th 75th
1 100 0.9619135 1.0353688 1.0222279 -1.994833 3.679615 0.2696537 1.758255
2 100 0.7664778 0.9134449 0.8121996 -1.568635 2.648268 0.2068718 1.418113
Skew Kurtosis
1 -0.2123274 -0.16171832
2 -0.3718904 -0.05279222
有一个名为Pingouin的开源Python包,提供Henze-Zirkler多元正态性检验,并已与R的MVM进行了测试。
https://pingouin-stats.org/generated/pingouin.multivariate_normality.html
示例来自文档:
import pingouin as pg
data = pg.read_dataset('multivariate')
X = data[['Fever', 'Pressure', 'Aches']]
pg.multivariate_normality(X, alpha=.05)
>>> HZResults(hz=0.5400861018514641, pval=0.7173686509624891, normal=True)
在R中有一个名为MVN
的程序包已经可以进行这个测试。
首先,按照 此处 的说明将MVN导入Python。
然后打开你的jupyter笔记本,并对你的数据拟合VAR(1)模型,代码如下:
# Fit VAR(1) Model
results = Model.fit(1)
results.summary()
将残差存储为resi
resi=results.resid
那么
# Call function from R
import os
os.environ['R_USER'] = '...\Lib\site-packages\rpy2'
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
from rpy2.robjects.packages import importr
MVN = importr("MVN", lib_loc = "C:/.../R/win-library/3.3")
导入MVN后,您可以像这样简单地进行正态性检验
MVNresult =MVN.hzTest(resi, qqplot = 0)
如果你按下
type(MVNresult)
你会发现这是一个
rpy2.robjects.methods.RS4
tuple(MVNresult.slotnames())
这将展示给您观察结果
('HZ', 'p.value', 'dname', 'dataframe')
然后你可以这样获取值
np.array(MVNresult.slots[tuple(MVNresult.slotnames())[i]])[0]
其中 i
表示 'HZ','p-value',...
中的 0, 1, 2, 3
。
因此,在情况下,如果 p 值即 i=1
小于0.05,则残差(resi)在5%置信水平下不服从多元正态分布。