Calculating marginal effects of a logit model on weighted survey data in R

Question

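I am trying to estimate the marginal effects of a logistic regression model that contains several binary explanatory variables.

Suppose the model is estimated as follows: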

logit <- svyglm(if_member ~ if_female + dummy_agegroup_2 + dummy_agegroup_3 + dummy_education_2 + dummy_education_3 + dummy_education_4, family = quasibinomial(link = "logit"), design = survey_design)

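I know about the marginpred function in the survey package, but I am not very familiar with it. My model contains only binary variables, so I would like to know how to estimate marginal effects with this function. In particular, I am unsure about predictat (a data frame giving the values of the model variables at which to predict).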

- david_sst

https://dev59.com/kITba4cB1Zd3GeqP5264 - Carl

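@Carl Yes, I saw that link before asking. However, it does not answer my question. I tried to modify the logitmfx function based on the link you provided, but without success. - david_sst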

Not sure whether it is useful, but the predicted marginals section of this tutorial might help: http://www.asdfree.com/2015/11/statistically-significant-trends-with.html - Anthony Damico

This might be a better fit for http://stats.stackexchange.com/. - Stedy

1 Answer

- Leonardo Fontenelle · Accepted Answer

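Are you looking for marginal effects or marginal predictions? As the name suggests, the marginpred() function returns predictions. The predictat argument is a data frame containing the control variables and the variables in the model. Note that the control variables should be left out of the model.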

library("survey")

odds2prob <- function(x) x / (x + 1)
prob2odds <- function(x) x / (1 - x)
expit <- function(x) odds2prob(exp(x))
logit <- function(x) log(prob2odds(x))

set.seed(1)

survey_data <- data.frame(
  if_female = rbinom(n = 100, size = 1, prob = 0.5), 
  agegroup = factor(sample(x = 1:3, size = 100, replace = TRUE)), 
  education = NA_integer_,
  if_member = NA_integer_)
survey_data["agegroup"] <- relevel(survey_data$agegroup, ref = 3)
# Different probabilities between female and male persons
survey_data[survey_data$if_female == 0, "education"] <- sample(
  x = 1:4, 
  size = sum(survey_data$if_female == 0), 
  replace = TRUE, 
  prob = c(0.1, 0.1, 0.5, 0.3))
survey_data[survey_data$if_female == 1, "education"] <-sample(
  x = 1:4, 
  size = sum(survey_data$if_female == 1), 
  replace = TRUE, 
  prob = c(0.1, 0.1, 0.3, 0.5))
survey_data["if_member"] <- rbinom(n = 100, size = 1, prob = 
                                     expit((survey_data$education - 3)/2))
survey_data["education"] <- factor(survey_data$education)
survey_data["education"] <- relevel(survey_data$education, ref = 3)
survey_design <- svydesign(ids = ~ 1, data = survey_data)

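# Full model, with agegroup and education as covariates
full_model <- svyglm(if_member ~ if_female + agegroup + education, 
                     family = quasibinomial(link = "logit"), 
                     design = survey_design)
exp(cbind(`odds ratio` = coef(full_model), confint(full_model)))
newdf <- data.frame(if_female = 0:1, education = c(3, 3), agegroup = c(3, 3))
# Fails, because agegroup and education are also covariates in the model;
# try() lets the rest of the script keep running
mp <- try(marginpred(model = full_model, adjustfor = ~ agegroup + education, 
                     predictat = newdf, se = TRUE, type = "response"))
# Reduced model: the control variables appear only in adjustfor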
logit2 <- svyglm(if_member ~ if_female, 
                family = quasibinomial(link = "logit"), 
                design = survey_design)
mp <- marginpred(model = logit2, adjustfor = ~ agegroup + education, 
                 predictat = newdf, se = TRUE, type = "response")
# Probability for male and for female persons controlling for agegroup and education
cbind(prob = mp, confint(mp))

This is how I estimate marginal effects with the survey package:

# Probability difference between female and male persons
# when agegroup and education are set to 3
svycontrast(full_model, quote(
  (exp(`(Intercept)` + if_female) / (exp(`(Intercept)` + if_female) + 1)) - 
  (exp(`(Intercept)`) / (exp(`(Intercept)`) + 1))))
# Can't use custom functions like expit :_(
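
The contrast above is a difference between two inverse-logit values, so its size depends on the baseline log-odds at which it is evaluated. Here is a minimal base R sketch of that arithmetic. The coefficient values are made up for illustration (they are not estimates from the model above), and prob_diff is a hypothetical helper:

```r
# Hypothetical coefficients on the log-odds scale (NOT estimates from the model above)
beta <- c(intercept = -0.2, if_female = 0.4, education4 = 0.5)

# Female-minus-male difference in predicted probability at a given baseline log-odds
prob_diff <- function(baseline) {
  unname(plogis(baseline + beta["if_female"]) - plogis(baseline))
}

# Education at its reference level (3): the baseline is the intercept alone
diff_edu3 <- prob_diff(beta["intercept"])
# Education = 4: the baseline also includes the education4 coefficient
diff_edu4 <- prob_diff(beta["intercept"] + beta["education4"])

c(education3 = diff_edu3, education4 = diff_edu4)
```

In the svycontrast() call, the analogous change would presumably be to add the relevant education coefficient to both linear predictors inside quote(), using the name that coef(full_model) reports for it.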

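There may be a smarter way, but I hope this helps.

Note that the difference between the probabilities predicted by marginpred() is not the same as the difference estimated by svycontrast(). The probabilities predicted by marginpred() do not seem to be affected by changing the values of the control variables (for example, education = c(4, 4) instead of education = c(3, 3)), whereas the svycontrast() estimate is affected, as the regression model implies.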
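
One way to see why a marginal (standardized) quantity does not depend on a chosen covariate value is to average predictions over the observed covariate distribution rather than fixing the covariates. The sketch below does this in base R on simulated data. It is not the algorithm marginpred() uses (as I understand the documentation, marginpred() reweights the design by calibration), and the data, the weights w, and all object names are assumptions made for illustration. It gives a point estimate only, with no design-based standard error.

```r
set.seed(42)
n <- 500

# Simulated data; every name and value here is illustrative
d <- data.frame(
  if_female = rbinom(n, size = 1, prob = 0.5),
  education = factor(sample(1:4, size = n, replace = TRUE)),
  w = runif(n, min = 0.5, max = 2)  # hypothetical sampling weights
)
d$if_member <- rbinom(
  n, size = 1,
  prob = plogis(-0.5 + 0.4 * d$if_female + 0.3 * as.integer(d$education))
)

# Weighted logit fit (point estimates only; standard errors ignore the design)
fit <- glm(if_member ~ if_female + education,
           family = quasibinomial(link = "logit"),
           data = d, weights = w)

# Predict for everyone as if male, then as if female, keeping observed education
p0 <- predict(fit, newdata = transform(d, if_female = 0), type = "response")
p1 <- predict(fit, newdata = transform(d, if_female = 1), type = "response")

# Weighted average difference: an average marginal effect of if_female
ame <- weighted.mean(p1 - p0, d$w)
ame
```

Because education is averaged over rather than set to a single level, no value of education has to be chosen, unlike in the svycontrast() contrast.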