Marginal effects of probit regression in R

Question

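I am using R to replicate a study and want to obtain mostly the same results the author reported. At one point, however, the marginal effects I calculate seem far too small. I would greatly appreciate it if you could look at my reasoning and the code below and see whether I went wrong somewhere.

My sample contains 24535 observations. The dependent variable "x028bin" is binary, taking the values 0 and 1, and there are 10 explanatory variables. Nine of these are numeric; the explanatory variable "f025grouped" is a factor made up of different religious denominations.

I want to run a probit regression that includes dummies for religious denomination and then compute marginal effects. To do so, I first eliminate missing values and use cross-tabs between the dependent and independent variables to verify that there are no small or empty cells. Then I run the probit model, which works fine and gives reasonable results: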

probit4AKIE <- glm(x028bin ~ x003 + x003squ + x025secv2 + x025terv2 + x007bin + x04chief + x011rec + a009bin + x045mod + c001bin + f025grouped, family=binomial(link="probit"), data=wvshm5red2delna, na.action=na.pass)

summary(probit4AKIE)

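However, when I use the probit coefficients and a scale factor to calculate marginal effects with all variables at their means, the marginal effects I obtain are much too small (e.g. 2.6042e-78).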
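For reference, the quantity described here can be written out as a short formula. This is a sketch in notation introduced only for illustration: $\Phi$ and $\phi$ are the standard normal CDF and density, $\bar{x}$ is the vector of regressor means including the constant, and $x_j$ is assumed to enter the model linearly.

```latex
\Pr(y = 1 \mid x) = \Phi(x'\beta), \qquad
\left.\frac{\partial \Pr(y = 1 \mid x)}{\partial x_j}\right|_{x = \bar{x}}
  = \phi(\bar{x}'\beta)\,\beta_j
```

Since $\phi$ never exceeds about 0.399, the scale factor can only shrink a coefficient, and it becomes vanishingly small when $\bar{x}'\beta$ lies far out in the tail of the normal distribution.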

The code looks like this:

ttt <- cbind(wvshm5red2delna$x003,
wvshm5red2delna$x003squ,
wvshm5red2delna$x025secv2,
wvshm5red2delna$x025terv2,
wvshm5red2delna$x007bin,
wvshm5red2delna$x04chief,
wvshm5red2delna$x011rec,
wvshm5red2delna$a009bin,
wvshm5red2delna$x045mod,
wvshm5red2delna$c001bin,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped,
wvshm5red2delna$f025grouped) #I put variable "f025grouped" 9 times because this variable consists of 9 levels

ttt <- as.data.frame(ttt)

xbar <- as.matrix(mean(cbind(1,ttt[1:19]))) #1:19 position of variables in dataframe ttt

betaprobit4AKIE <- probit4AKIE$coefficients

zxbar <- t(xbar) %*% betaprobit4AKIE

scalefactor <- dnorm(zxbar)

marginprobit4AKIE <- scalefactor * betaprobit4AKIE[2:20] #2:20 are the positions of variables in the output of the probit model 'probit4AKIE' (variables need to be in the same ordering as in data.frame ttt), the constant in the model occupies the first position

marginprobit4AKIE #in this step I obtain values that are much too small

I apologize that I cannot provide a working example, as my data set is much too large. Any comment would be greatly appreciated. Thanks a lot.

Best,

Tobias

- Tobias

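I think you would be better off posting this on Cross Validated, SO's sister site for statistics: http://stats.stackexchange.com/ - Gavin Simpson

I assume you know that the marginal effect of a variable in a probit model depends on the values of the variables. Since your variables are categorical, using the mean may not make sense for them. - Manoel Galdino

@Tobias, I think that for categorical variables the mode is a better choice than the mean. For example, if the categorical variable is gender, what would the mean represent? However, if you have a variable that is continuous but discretized into categories (such as income brackets), then using its mean may be acceptable. - Manoel Galdino

@Tobias. Rereading your comment, I think I see what went wrong. If your categorical variable is a factor in R, then R may convert the factor to numbers when computing the mean (xbar), and since you have several categories the mean may come out as something like 4 or 5. Check the value of `xbar` in your old code. - Manoel Galdino

Regarding the variable f025grouped: I imported this variable as a factor, so I assumed R would not convert it to numeric. I now think my initial mistake was telling R to take the mean of a factor, which probably led to marginal effects close to 0. After modifying my code (see my first comment), I now take the means of 9 (different) dummy variables, and the results look broadly fine. - Tobias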
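The diagnosis in these comments can be illustrated with a minimal sketch on simulated data, since the original data set is not available. The names `dat`, `age`, `grp`, and `fit` are hypothetical. The sketch contrasts averaging the integer codes of a factor, which is what `cbind()` on a factor produces, with averaging the 0/1 dummy columns that `glm()` actually used, taken from `model.matrix()`.

```r
# Simulated stand-in for the real data; all names here are hypothetical
set.seed(42)
n <- 2000
dat <- data.frame(
  age = rnorm(n, mean = 40, sd = 10),
  grp = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE))
)
# Probit data-generating process with group-specific shifts
shift <- c(a = 0, b = 0.4, c = -0.3, d = 0.2)
eta <- -1 + 0.02 * dat$age + shift[as.character(dat$grp)]
dat$y <- rbinom(n, 1, pnorm(eta))

fit <- glm(y ~ age + grp, family = binomial(link = "probit"), data = dat)

# Problem: cbind() silently turns a factor into its integer codes (1, 2, 3, 4),
# so this "mean" is a number like 2.5, not a share of observations
wrong_mean <- mean(cbind(dat$grp))

# model.matrix() returns the design matrix glm() used:
# intercept, age, and one 0/1 dummy per non-reference level
X <- model.matrix(fit)
xbar <- colMeans(X)

# Scale factor: standard normal density at the linear predictor
# evaluated at the column means
zxbar <- sum(xbar * coef(fit))
scalefactor <- dnorm(zxbar)

# Marginal effects at the means, dropping the intercept
mem <- scalefactor * coef(fit)[-1]
print(wrong_mean)
print(mem)
```

Taking the means from `model.matrix()` also guarantees that `xbar` has the same length and ordering as `coef(fit)`, so there is no need to track column positions by hand.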

2 Answers

This works for either probit or logit:

mfxboot <- function(modform, dist, data, boot = 1000, digits = 3) {
  # dist is the link name: "probit" or "logit"
  x <- glm(modform, family = binomial(link = dist), data = data)
  # density of the link distribution, used as the scale factor
  dens <- if (dist == "probit") dnorm else dlogis
  # get marginal effects: mean density at the linear predictor times each coefficient
  pdf <- mean(dens(predict(x, type = "link")))
  marginal.effects <- pdf * coef(x)
  # start bootstrap
  bootvals <- matrix(NA_real_, nrow = boot, ncol = length(coef(x)))
  set.seed(1111)
  for (i in seq_len(boot)) {
    samp1 <- data[sample(nrow(data), replace = TRUE), ]
    x1 <- glm(modform, family = binomial(link = dist), data = samp1)
    # use the refitted model x1 here, not the original fit x
    pdf1 <- mean(dens(predict(x1, type = "link")))
    bootvals[i, ] <- pdf1 * coef(x1)
  }
  se <- apply(bootvals, 2, sd)
  res <- cbind(marginal.effects, se, marginal.effects / se)
  # drop the intercept row if the model has one
  if (names(coef(x))[1] == "(Intercept)") {
    res <- res[-1, , drop = FALSE]
  }
  res2 <- round(res, digits)
  colnames(res2) <- c("marginal.effect", "standard.error", "z.ratio")
  return(res2)
}

Source: http://www.r-bloggers.com/probitlogit-marginal-effects-in-r/
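Without the bootstrap, the core of the function above is a one-line computation of average marginal effects. The following is a minimal sketch on simulated data; the names `d`, `x1`, `x2`, and `fit` are hypothetical. For comparison it also computes the effects at the means and, for the binary regressor, the average discrete change in predicted probability. The discrete change is a common alternative for dummies and is not part of the quoted function.

```r
# Simulated probit data; all names are hypothetical
set.seed(1)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.4))
d$y <- rbinom(n, 1, pnorm(-0.3 + 0.8 * d$x1 + 0.5 * d$x2))

fit <- glm(y ~ x1 + x2, family = binomial(link = "probit"), data = d)
b <- coef(fit)

# Average marginal effects: average the density over observations, then scale
ame <- mean(dnorm(predict(fit, type = "link"))) * b[-1]

# Marginal effects at the means: density at the mean linear predictor
mem <- dnorm(sum(colMeans(model.matrix(fit)) * b)) * b[-1]

# Discrete change for the dummy x2: average difference in predicted
# probability when x2 is switched from 0 to 1 for every observation
p1 <- predict(fit, newdata = transform(d, x2 = 1), type = "response")
p0 <- predict(fit, newdata = transform(d, x2 = 0), type = "response")
dc_x2 <- mean(p1 - p0)

print(rbind(ame = ame, mem = mem))
print(dc_x2)
```

The two derivative-based versions differ only in where the density is evaluated, which is why they usually land close together when the linear predictor is not too spread out.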

- Hack-R

- Manoel Galdino · Accepted Answer

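@Gavin is right; it is better to ask this on the sister site.

In any case, here is my trick for interpreting probit coefficients.

Probit regression coefficients are approximately the same as logit coefficients, up to a scale factor of about 1.6. So if a probit model is fitted as Pr(y=1) = Phi(.5 - .3*x), this is roughly equivalent to the logistic model Pr(y=1) = invlogit(1.6*(.5 - .3*x)).

I use the function invlogit from the package arm to make the graphics. Another possibility is to multiply all coefficients (including the intercept) by 1.6 and then apply the "divide by 4 rule" (see the book by Gelman and Hill): dividing the new coefficients by 4 gives an upper bound on the predictive difference corresponding to a unit difference in x.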
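Both approximations can be checked numerically. The sketch below uses base R's `plogis()`, which is the inverse logit, in place of `arm::invlogit`; the grid `z` and the variable names are chosen here only for illustration.

```r
# Compare the probit curve with the rescaled logit curve on a grid
z <- seq(-4, 4, by = 0.01)
max_gap <- max(abs(pnorm(z) - plogis(1.6 * z)))
print(max_gap)

# Divide-by-4 rule applied to a probit coefficient:
# the logistic curve has maximum slope 1/4, so a probit coefficient b
# corresponds to at most about 1.6 * b / 4 = 0.4 * b on the probability scale
b_probit <- -0.3  # slope from the example Pr(y=1) = Phi(.5 - .3*x)
upper_bound <- 1.6 * b_probit / 4

# The exact steepest slope of the probit curve is dnorm(0) * b
exact_max <- dnorm(0) * b_probit
print(c(rule = upper_bound, exact = exact_max))
```

The rule and the exact value agree closely because 1.6 / 4 = 0.4 is nearly the peak of the standard normal density, `dnorm(0)`.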

Here is an example.

x1 <- rbinom(100, 1, .5)
x2 <- rbinom(100, 1, .3)
x3 <- rbinom(100, 1, .9)
ystar <- -.5 + x1 + x2 - x3 + rnorm(100)
y <- ifelse(ystar > 0, 1, 0)
probit <- glm(y ~ x1 + x2 + x3, family = binomial(link = 'probit'))
# means of the constant, x1, x2 and x3, in the same order as the coefficients
xbar <- colMeans(cbind(1, x1, x2, x3))
b <- coef(probit)

# now the graphic, i.e., the marginal effect of x1, x2 and x3,
# varying one variable at a time and holding the other two at their means
library(arm)
curve(invlogit(1.6*(b[1] + b[2]*x + b[3]*xbar[3] + b[4]*xbar[4]))) #x1
curve(invlogit(1.6*(b[1] + b[2]*xbar[2] + b[3]*x + b[4]*xbar[4]))) #x2
curve(invlogit(1.6*(b[1] + b[2]*xbar[2] + b[3]*xbar[3] + b[4]*x))) #x3