这里是我自己的尝试。
嵌套模型的F统计量定义如下:
(D_s - D_b) / (addtl_parameters * phi_b)
其中:
D_s 是小模型的偏离度;
D_b 是大模型的偏离度;
addtl_parameters 是模型之间自由度的差异;
phi_b 是大模型的离散参数估计值。
"统计学理论表明,F统计量服从F分布,其分子自由度等于添加参数的数量,分母自由度等于n-p_b,即样本记录数减去大模型参数数。"
我们可以用以下代码实现:
from scipy import stats
def calculate_nested_f_statistic(small_model, big_model):
"""Given two fitted GLMs, the larger of which contains the parameter space of the smaller, return the F Stat and P value corresponding to the larger model adding explanatory power"""
addtl_params = big_model.df_model - small_model.df_model
f_stat = (small_model.deviance - big_model.deviance) / (addtl_params * big_model.scale)
df_numerator = addtl_params
df_denom = (big_model.fittedvalues.shape[0] - big_model.df_model)
p_value = stats.f.sf(f_stat, df_numerator, df_denom)
return (f_stat, p_value)
这里是一个可重现的示例,按照statsmodels中的Gamma GLM示例(https://www.statsmodels.org/stable/glm.html)进行:
import numpy as np
import statsmodels.api as sm
data2 = sm.datasets.scotland.load()
data2.exog = sm.add_constant(data2.exog, prepend=False)
big_model = sm.GLM(data2.endog, data2.exog, family=sm.families.Gamma()).fit()
smaller_model = sm.GLM(data2.endog, data2.exog[:, 1:], family=sm.families.Gamma()).fit()
calculate_nested_f_statistic(smaller_model, big_model)
Source:
https://www.casact.org/pubs/monographs/papers/05-Goldburd-Khare-Tevet.pdf