Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels): factor X has new levels

7

I ran a logistic regression:

 EW <- glm(everwrk~age_p + r_maritl, data = NH11, family = "binomial")

I would also like to predict everwrk for each level of r_maritl. r_maritl has the following levels:
levels(NH11$r_maritl)
 "0 Under 14 years" 
 "1 Married - spouse in household" 
 "2 Married - spouse not in household"
 "3 Married - spouse in household unknown" 
 "4 Widowed"                               
 "5 Divorced"                             
 "6 Separated"                             
 "7 Never married"                        
 "8 Living with partner"  
 "9 Unknown marital status"  

So I did this:

predEW <- with(NH11,
  expand.grid(r_maritl = c("0 Under 14 years", "1 Married - spouse in household",
                           "2 Married - spouse not in household",
                           "3 Married - spouse in household unknown",
                           "4 Widowed", "5 Divorced", "6 Separated", "7 Never married",
                           "8 Living with partner", "9 Unknown marital status"),
              age_p = mean(age_p, na.rm = TRUE)))

cbind(predEW, predict(EW, type = "response",
                        se.fit = TRUE, interval = "confidence",
                        newdata = predEW))

The problem is that I get the following response:
``` Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor r_maritl has new levels 0 Under 14 years, Married - spouse in household unknown ```
Sample data:
str(NH11$age_p)
num [1:33014] 47 18 79 51 43 41 21 20 33 56 ...

str(NH11$everwrk)
Factor w/ 2 levels "2 No","1 Yes": NA NA 2 NA NA NA NA NA 2 2 ...

str(NH11$r_maritl)
Factor w/ 10 levels "0 Under 14 years",..: 6 8 5 7 2 2 8 8 8 2 ...

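Could you provide some sample data? With mtcars, for example, I currently cannot reproduce your problem. Also, are there unused factor levels in your NH11 dataset? - coffeinjunky
2 Answers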

14
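tl;dr: it looks like your factor has some levels that are not represented in the data, so they are dropped from the factor that the model actually uses. In hindsight this is not terribly surprising, since you cannot predict responses for levels the model never saw. That said, it is mildly surprising that R does not do something convenient for you, such as generating NA values automatically. You can solve the problem by building the prediction frame from levels(droplevels(NH11$r_maritl)) or, equivalently, from EW$xlevels$r_maritl.
A reproducible example: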
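As an illustration of the "generate NA values automatically" idea, here is a minimal sketch, not part of the original answer. The helper name `predict_known_levels` and the toy data are hypothetical; the sketch only relies on `fit$xlevels`, which `glm()` stores for each factor predictor.

```r
# Hypothetical helper: predict only for levels the model knows about and
# return NA for rows whose factor level was dropped during fitting.
predict_known_levels <- function(fit, newdata, var, ...) {
  known <- as.character(newdata[[var]]) %in% fit$xlevels[[var]]
  out <- rep(NA_real_, nrow(newdata))
  if (any(known)) {
    nd <- newdata[known, , drop = FALSE]
    # Rebuild the factor with exactly the levels stored in the model
    nd[[var]] <- factor(as.character(nd[[var]]), levels = fit$xlevels[[var]])
    out[known] <- predict(fit, newdata = nd, ...)
  }
  out
}

# Toy data (assumed): the factor declares level "c", but no row uses it
set.seed(1)
d <- data.frame(
  y = rbinom(40, size = 1, prob = 0.5),
  x = rnorm(40),
  g = factor(sample(c("a", "b"), 40, replace = TRUE), levels = c("a", "b", "c"))
)
fit <- glm(y ~ x + g, data = d, family = binomial)

newd <- expand.grid(g = levels(d$g), x = mean(d$x))
out <- predict_known_levels(fit, newd, "g", type = "response")
```

Here `out` has one entry per row of `newd`, with NA in the position of the unused level instead of an error.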
maritl_levels <- c( "0 Under 14 years", "1 Married - spouse in household", 
  "2 Married - spouse not in household", "3 Married - spouse in household unknown", 
  "4 Widowed", "5 Divorced", "6 Separated", "7 Never married", "8 Living with partner", 
 "9 Unknown marital status")
set.seed(101)
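# factor() is explicit here: since R 4.0.0, data.frame() no longer converts strings to factors
NH11 <- data.frame(everwrk=rbinom(1000,size=1,prob=0.5),
                 age_p=runif(1000,20,50),
                 r_maritl = factor(sample(maritl_levels,size=1000,replace=TRUE),
                                   levels=maritl_levels))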

Let's create a missing level:

NH11 <- subset(NH11,as.numeric(NH11$r_maritl) != 3)

Fit the model:

EW <- glm(everwrk~r_maritl+age_p,data=NH11,family=binomial)
predEW <- with(NH11,
  expand.grid(r_maritl=levels(r_maritl),age_p=mean(age_p,na.rm=TRUE)))
predict(EW,newdata=predEW)

Success! The error is reproduced:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor r_maritl has new levels 2 Married - spouse not in household


Building the prediction frame from the levels stored in the fitted model avoids the error:
predEW <- with(NH11,
           expand.grid(r_maritl=EW$xlevels$r_maritl,age_p=mean(age_p,na.rm=TRUE)))
predict(EW,newdata=predEW)
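One caveat that may matter for data like the question's, where everwrk is mostly NA: rows removed by the NA handling can take levels with them, so the levels in the raw data and the levels in the model need not match. The following sketch uses assumed toy data (the names `d`, `y`, `x`, `g` are hypothetical) to show a level that exists in the data but not in the model.

```r
# Toy data (assumed): level "c" occurs only in rows where the response is NA
set.seed(2)
d <- data.frame(
  y = rbinom(60, size = 1, prob = 0.5),
  x = runif(60, 20, 50),
  g = factor(rep(c("a", "b", "c"), each = 20))
)
d$y[d$g == "c"] <- NA

# glm() drops the NA rows first, then drops the now-unused level "c"
fit <- glm(y ~ g + x, data = d, family = binomial)

raw_levels   <- levels(droplevels(d$g))  # "c" is still used in the raw data
model_levels <- fit$xlevels$g            # levels the model can predict for

# A frame built from the raw levels fails; capture the error message
bad <- expand.grid(g = raw_levels, x = mean(d$x))
res <- tryCatch(predict(fit, newdata = bad),
                error = function(e) conditionMessage(e))

# A frame built from the model's own levels works
good <- expand.grid(g = model_levels, x = mean(d$x))
p <- predict(fit, newdata = good, type = "response")
```

In a situation like this one, taking the levels from the fitted object (`fit$xlevels`) is the safer of the two options, because it reflects the rows that were actually used in the fit.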

1
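I really, really doubt it [version differences]. This is all core R functionality that has been around for decades and is used by thousands of users. My prior is very strongly that this is a subtle typo on the OP's part. - Ben Bolker
Cool, I learned something new today. (I don't think I had run into this particular problem before.) - Ben Bolker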
1
That command won't do what you want. Use droplevels() - Ben Bolker

0

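Thank you very much for your answer; I ran into the same "new levels" problem. I made the following changes to my code.

  1. I had been using data.frame() and replaced it with expand.grid().
  2. In the call to mean(), I added na.rm=TRUE after the variable.
  3. I replaced factor(1:2) with glmoutput$xlevels$variablename.

This solution worked!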
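The three changes above might look roughly like this. It is only a sketch under assumed toy data: the answer does not show its code, so `d`, `y`, `x` and `g` are hypothetical stand-ins, and only `glmoutput$xlevels` comes from the answer itself.

```r
# Toy data (assumed): a two-level factor with a declared but unused third
# level, and one missing value in the numeric predictor
set.seed(3)
d <- data.frame(
  y = rbinom(50, size = 1, prob = 0.5),
  x = c(NA, runif(49, 20, 50)),
  g = factor(sample(c("1", "2"), 50, replace = TRUE), levels = c("1", "2", "3"))
)
glmoutput <- glm(y ~ g + x, data = d, family = binomial)

# 1. expand.grid() instead of data.frame()
# 2. na.rm = TRUE inside mean()
# 3. levels taken from the fitted model instead of a hand-written factor
newd <- expand.grid(g = glmoutput$xlevels$g,
                    x = mean(d$x, na.rm = TRUE))
pred <- cbind(newd, fit = predict(glmoutput, newdata = newd, type = "response"))
```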

