在R中打印所有的glm系数

Question

在R中打印所有的glm系数

rglm

3

如何打印glm系数，包括参考水平的所有因子水平？ summary(glm_obj) 仅打印偏离参考值的值。

我知道那些是0，但我需要这个来进行集成，即告诉其他系统可能发生的因子水平。

如果太简单，请原谅，我找不到任何地方。

谢谢

为了说明我面临的问题：

> glm(Petal.Width~Species,data=iris)  

Call:  glm(formula = Petal.Width ~ Species, data = iris)  

Coefficients:
          (Intercept)  Speciesversicolor   Speciesvirginica  
                0.246              1.080              1.780  

Degrees of Freedom: 149 Total (i.e. Null);  147 Residual
Null Deviance:      86.57 
Residual Deviance: 6.157    AIC: -45.29`

上面的模型描述只提供了鸢尾花属种和维吉尼亚种的系数，从模型本身的角度来看这是完全可以接受的，正如Dason所指出的那样。

但是，我需要与另一个应用程序共享该模型，该应用程序必须知道何种“物种”水平可预期（例如，在出现未研究过的新水平时发出警告）。

Summary()给出了相同的结果：

> summary(glm(Petal.Width~Species,data=iris))

Call:
glm(formula = Petal.Width ~ Species, data = iris)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-0.626  -0.126  -0.026   0.154   0.474  

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)        0.24600    0.02894    8.50 1.96e-14 ***
Speciesversicolor  1.08000    0.04093   26.39  < 2e-16 ***
Speciesvirginica   1.78000    0.04093   43.49  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.04188163)

Null deviance: 86.5699  on 149  degrees of freedom
Residual deviance:  6.1566  on 147  degrees of freedom
AIC: -45.285

Number of Fisher Scoring iterations: 2

- coulminer

1

参考水平不会获得系数。它们不存在，因此不会被打印出来。 - Dason

当然，Dason，但请阅读问题：我仍然希望能够打印这些系数，因为我正在将模型发送到另一个应用程序，并且需要知道存在哪些系数。 - coulminer

请考虑包含一个小型的可重现示例，以便我们更好地理解和更容易地回答您的问题。 - Ben Bolker

4个回答

3

你可以重新编写summary.glm方法。你可以在控制台输入summary.glm查看其源代码，或者先使用sink将源代码转储到文件中。大多数显示方法都是用R本身编写的，因此你应该能够浏览代码，并在必要时添加一行。

另外，你可以为参考水平定义一个额外的虚拟变量并将其添加到模型中。R只会给出警告，并将系数设置为NA。例如：

 # no coefficient for the reference level
l = lm(Sepal.Width~Species,iris)

# make a dummy for the reference level
iris$Speciessetosa = as.numeric(iris$Species == "setosa")

# you get NA for the coefficient on new dummy
l = lm(Sepal.Width~Species+Speciessetosa,iris)

很遗憾，你不能只是设置l$coefficients[4] = 0，因为它不会显示在打印方法中。这种方法无效的原因可以从源代码中清楚地看出来，我建议略读一下源代码。

如果你真的需要0而不是NA，你可以通过运行输出结果使用sed将该行中的NA更改为0，或者将输出保存为R character向量并使用内置的gsub函数进行更改，如果只有少数这样的情况，也可以手动更改，将输出sink到文件中，并使用R编辑器或类似Word或Sublime的编辑器的查找和替换功能等。

- ssdecontrol

3

我认为这种方法比已提出的方法更适合，因此我给自己回答了。

分享预测模型并不是summary.glm方法的目的，因此summary(model)对应用模型的数据没有太多说明。

然而，有一种解决方案--使用PMML，它可以描述模型和应用于其上的数据。

示例：

> library(pmml)
> pmml(glm(Petal.Width~Species,data=iris))
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_2 http://www.dmg.org/v4-2/pmml-4-2.xsd">
 <Header copyright="Copyright (c) 2014 dmitrijsl" description="Generalized Linear     Regression Model">
  <Extension name="user" value="dmitrijsl" extender="Rattle/PMML"/>
  <Application name="Rattle/PMML" version="1.4"/>
  <Timestamp>2014-07-15 15:07:51</Timestamp>
 </Header>
 <DataDictionary numberOfFields="2">
  <DataField name="Petal.Width" optype="continuous" dataType="double"/>
  <DataField name="Species" optype="categorical" dataType="string">
   <Value value="setosa"/>
   <Value value="versicolor"/>
   <Value value="virginica"/>
  </DataField>
...

现在，Setosa也在接收系统的列表中，以便系统知道可以期望什么，而且模型描述也已经加入其中：

...
<ParameterList>
 <Parameter name="p0" label="(Intercept)"/>
 <Parameter name="p1" label="Speciesversicolor"/>
 <Parameter name="p2" label="Speciesvirginica"/>
</ParameterList>
<FactorList>
 <Predictor name="Species"/>
</FactorList>
<CovariateList/>
<PPMatrix>
 <PPCell value="versicolor" predictorName="Species" parameterName="p1"/>
 <PPCell value="virginica" predictorName="Species" parameterName="p2"/>
</PPMatrix>
<ParamMatrix>
 <PCell parameterName="p0" df="1" beta="0.245999999999997"/>
 <PCell parameterName="p1" df="1" beta="1.08"/>
 <PCell parameterName="p2" df="1" beta="1.78"/>
</ParamMatrix>

- coulminer

0

rating_factors() 返回 glm 对象的所有系数：

library(insurancerating)
x <- glm(Petal.Width~Species,data=iris) 
rating_factors(x, exponentiate = FALSE)$df
#>   risk_factor       level est_x signif_x
#> 1 (Intercept) (Intercept) 0.246      ***
#> 2     Species      setosa 0.000         
#> 3     Species  versicolor 1.080      ***
#> 4     Species   virginica 1.780      ***

^{由reprex package (v0.3.0)于2021年11月28日创建}

- mharinga

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Choppy123 · Accepted Answer

我知道这个问题很久了，但一个简单的解决方案是使用dummy.coef函数

fit<-glm(Petal.Width~Species,data=iris)  
summary(fit)
dummy.coef(fit)

> dummy.coef(fit)
Full coefficients are 

(Intercept):     0.246                     
Species:        setosa versicolor virginica
                  0.00       1.08      1.78

我希望这能有所帮助！