Python中GLM的Anova检验

Question

Python中GLM的Anova检验

7

我正在尝试获取GLM中每个协变量的F统计量和p值。在Python中，我使用stats mode.formula.api来进行GLM分析。

formula = 'PropNo_Pred ~ Geography + log10BMI + Cat_OpCavity + CatLes_neles + CatRural_urban + \
        CatPred_Control + CatNative_Intro + Midpoint_of_study'

mod1 = smf.glm(formula=formula, data=A2, family=sm.families.Binomial()).fit()
mod1.summary()

接着，我尝试使用statsmodels.stats中的ANOVA测试来测试这个模型。

table1 = anova_lm(mod3)
print table1

但是我收到了一个错误，错误消息如下：'GLMResults'对象没有'ssr'属性

看起来这个anova_lm函数只适用于线性模型，请问在python中有哪个模块可以进行GLMs的ANOVA测试吗？

- DKangeyan

1

ANOVA类型3结果的初步答案。 - Josef

3个回答

3

很遗憾，没有现成的方法。但是，你可以通过在每个术语上使用模型的假设检验方法自己制作。实际上，他们的一些ANOVA方法甚至不使用属性ssr（它是模型的平方残差和，因此在二项式GLM中显然未定义）。你可以修改这段代码来执行GLM ANOVA。

- Ben Kuhn

如果您能够为实验数据添加如何使用假设检验方法的示例，那将非常棒。 - Michael Mauderer

-1

感谢您的回答，但我认为有一个小错误，函数应该是：

def calculate_nested_f_statistic(small_model, big_model):

"""Given two fitted GLMs, the larger of which contains the parameter space of the smaller, return the F Stat and P value corresponding to the larger model adding explanatory power"""

   addtl_params = small_model.df_model - big_model.df_model 

   f_stat = (small_model.deviance - big_model.deviance) / (addtl_params * big_model.scale)

   df_numerator = addtl_params

   # use fitted values to obtain n_obs from model object:

   df_denom = (big_model.fittedvalues.shape[0] - big_model.df_model)

   p_value = stats.f.sf(f_stat, df_numerator, df_denom)

   return (f_stat, p_value)

- raphael sebbah

你运行了吗？这应该引发一个异常。 - Josef

你好，请不要仅仅在你的回答中提交代码，还要加入一些细节说明为什么你认为这是最优解决方案。 - Destroy666

是的，我运行了它，但稍后我会再次运行。上面的答案很好，我只是在addtl_param中颠倒了方程式的两个部分。 - raphael sebbah

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- rmstmppr · Accepted Answer

这里是我自己的尝试。

嵌套模型的F统计量定义如下：

(D_s - D_b) / (addtl_parameters * phi_b)

其中：

D_s 是小模型的偏离度； D_b 是大模型的偏离度； addtl_parameters 是模型之间自由度的差异； phi_b 是大模型的离散参数估计值。

"统计学理论表明，F统计量服从F分布，其分子自由度等于添加参数的数量，分母自由度等于n-p_b，即样本记录数减去大模型参数数。"

我们可以用以下代码实现：

from scipy import stats

def calculate_nested_f_statistic(small_model, big_model):
    """Given two fitted GLMs, the larger of which contains the parameter space of the smaller, return the F Stat and P value corresponding to the larger model adding explanatory power"""
    addtl_params = big_model.df_model - small_model.df_model
    f_stat = (small_model.deviance - big_model.deviance) / (addtl_params * big_model.scale)
    df_numerator = addtl_params
    # use fitted values to obtain n_obs from model object:
    df_denom = (big_model.fittedvalues.shape[0] - big_model.df_model)
    p_value = stats.f.sf(f_stat, df_numerator, df_denom)
    return (f_stat, p_value)

这里是一个可重现的示例，按照statsmodels中的Gamma GLM示例（https://www.statsmodels.org/stable/glm.html）进行：

import numpy as np
import statsmodels.api as sm
data2 = sm.datasets.scotland.load()
data2.exog = sm.add_constant(data2.exog, prepend=False)

big_model = sm.GLM(data2.endog, data2.exog, family=sm.families.Gamma()).fit()
# Drop one covariate (column):
smaller_model = sm.GLM(data2.endog, data2.exog[:, 1:], family=sm.families.Gamma()).fit()

# Using function defined in answer:
calculate_nested_f_statistic(smaller_model, big_model)
# (9.519052917304652, 0.004914748992474178)

Source: https://www.casact.org/pubs/monographs/papers/05-Goldburd-Khare-Tevet.pdf