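I'm using mice on a dataset and have run into trouble. One variable clearly depends on another, and I can't work out how to get mice to impute only some of the missing values in a variable (leaving the rest as genuinely missing).
For example, I have a dataset with sex, pregnancy status, and an outcome. Only women can be pregnant, so when "preg" is missing but the subject is male, I don't want a value imputed there.
I do, however, want a value imputed when pregnancy status is missing for a woman. All variables (including sex and outcome) have some missing values.
I've read the advice here and tried the 'where' option in mice: "'R', 'mice', missing variable imputation - how to only do one column in sparse matrix". But with 'where', not all of the sex values seem to get imputed?
For example: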
library(mice)
library(tidyverse)
library(haven)
library(janitor)
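# create some data
# (note: sex has 16 values but preg and outcome have 15, so cbind() recycles the shorter vectors with a warning)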
sex <- c("m","f","m","f","m",NA,NA,"f","f","m","f","f","m","m","f","m")
preg <- c(NA,"not_preg",NA,NA,NA,NA,"preg","not_preg",NA,"not_preg","preg",NA,NA,NA,NA)
outcome <- c(1,0,1,0,0,NA,NA,0,0,1,0,1,1,0,0)
df <- cbind(sex,preg,outcome) %>% as_tibble() %>% mutate(sex=as_factor(sex)) %>% mutate(preg=as_factor(preg))
# look at what's missing
md.pattern(df)
df %>% tabyl(sex,preg)
df %>% tabyl(preg)
# Try to impute over everything to show mice working
mice_a <- mice(df, m=2, maxit=2, seed=3,method="pmm")
df_imp_a <- complete(mice_a, action="long", include = FALSE)
df_imp_a %>% filter(.imp==1) %>% tabyl(sex,preg) # this has imputed that some men are pregnant (understandable, but not what I want!)
df_imp_a %>% filter(.imp==1) %>% tabyl(sex) #but everyone has a sex imputed
df_imp_a %>% filter(.imp==1) %>% tabyl(preg)
# Try to use the 'where' option
# b. Using it with a 'blank' where as proof of principle
grid_b <- is.na(df) #this is just default
mice_b <- mice(df, m=2, maxit=2, seed=3,method="pmm",where=grid_b)
df_imp_b <- complete(mice_b, action="long", include = FALSE)
df_imp_b %>% filter(.imp==1) %>% tabyl(sex,preg) #same problem of pregnant men (obviously, haven't changed anything yet)
df_imp_b %>% filter(.imp==1) %>% tabyl(sex) # but at least everyone has a sex imputed
df_imp_b %>% filter(.imp==1) %>% tabyl(preg)
# c. Making a proper grid of data that I do and don't want imputed
grid_c <- df %>%
  mutate(preg=case_when(
    sex=="f" & is.na(preg) ~ TRUE,
    TRUE ~ FALSE
  )) %>%
  mutate(sex=is.na(sex)) %>%
  mutate(outcome=is.na(outcome))
grid_c
grid_c %>% tabyl(preg) # so we are looking for 4 imputed values of 'preg' (so I've done it right -- there are 4 females with unknown pregnancy status)
mice_c <- mice(df,m=2,maxit=2,seed=3,method="pmm",where=grid_c)
df_imp_c <- complete(mice_c,action="long",include=FALSE)
df_imp_c %>% filter(.imp==1) %>% tabyl(sex,preg) # now I have no pregnant men -- which is good!
df_imp_c %>% filter(.imp==1) %>% tabyl(sex) # but I am missing sex for one person??
df_imp_c %>% filter(.imp==1) %>% tabyl(preg) # have imputed all the pregnancy data that I wanted though -- only 7 NAs (for the 7 men)
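One thing that may be relevant, offered as a rough base-R sketch rather than a confirmed diagnosis: the condition I used to build the preg column of the grid evaluates to NA when sex itself is missing, and case_when() then falls through to FALSE. The vectors below (sex_demo, preg_demo) are made-up illustration values, not the data above.

```r
# Hypothetical illustration values, not the data set above
sex_demo  <- c("m", "f", NA, NA, "f")
preg_demo <- c(NA, NA, NA, "preg", "not_preg")

# Same condition as the first case_when() branch used for grid_c
cond <- sex_demo == "f" & is.na(preg_demo)
print(cond)        # the third element is NA: unknown sex, missing preg

# case_when() treats an NA condition as "no match", so the fallback
# TRUE ~ FALSE applies; a base-R equivalent of that behaviour is:
where_preg <- !is.na(cond) & cond
print(where_preg)  # only element 2 (female, missing preg) is TRUE

# Rows where preg is missing but the grid says "do not impute"
left_missing <- which(is.na(preg_demo) & !where_preg)
print(left_missing)  # rows 1 (male) and 3 (unknown sex)
```

If the same thing happens in my real grid, a row with unknown sex and missing preg would keep preg as NA after imputation. I don't know whether that is what stops sex from being imputed for that person, but it seems worth ruling out.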
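How do I tell mice to impute only specific rows of one column rather than all of them, while still imputing every row of another column? And why isn't the 'where' option working the way I expected?
Many thanks for any help you can offer!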