我尝试重新创建一些数据,重点关注交互中的变量。我不确定目标是否仅是获取值,还是需要特定格式,但这是使用pandas解决问题的示例(因为您在原始帖子中导入了pandas):
import pandas as pd
import statsmodels.formula.api as sm
np.random.seed(2)
df = pd.DataFrame()
df['instagram_posts'] = np.random.rand(50)
df['airports'] = np.random.rand(50)
df['CNMCRGNNM'] = np.random.choice(['Kootenay','Nechako','North Coast','Northeast','Thompson-Okanagan'],50)
fit = sm.ols(formula="instagram_posts ~ airports * C(CNMCRGNNM)",data=df).fit()
print(fit.summary())
这是输出内容:
==============================================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------------------------------
Intercept 0.4594 0.159 2.885 0.006 0.138 0.781
C(CNMCRGNNM)[T.Nechako] -0.2082 0.195 -1.067 0.292 -0.602 0.186
C(CNMCRGNNM)[T.North Coast] -0.1268 0.360 -0.352 0.726 -0.854 0.601
C(CNMCRGNNM)[T.Northeast] 0.0930 0.199 0.468 0.642 -0.309 0.495
C(CNMCRGNNM)[T.Thompson-Okanagan] 0.1439 0.245 0.588 0.560 -0.351 0.638
airports -0.1616 0.277 -0.583 0.563 -0.722 0.398
airports:C(CNMCRGNNM)[T.Nechako] 0.7870 0.343 2.297 0.027 0.094 1.480
airports:C(CNMCRGNNM)[T.North Coast] 0.3008 0.788 0.382 0.705 -1.291 1.893
airports:C(CNMCRGNNM)[T.Northeast] -0.0104 0.348 -0.030 0.976 -0.713 0.693
airports:C(CNMCRGNNM)[T.Thompson-Okanagan] -0.0311 0.432 -0.072 0.943 -0.904 0.842
将 alpha 更改为您偏爱的显著性水平:
alpha = 0.05
df = pd.DataFrame(data = [x for x in fit.summary().tables[1].data[1:] if float(x[4]) < alpha], columns = fit.summary().tables[1].data[0])
数据框架
df 包含原始表格中对于 alpha 有显著意义的记录。在本例中,它是
Intercept 和
airports:C(CNMCRGNNM)[T.Nechako]。