Replace all NA with FALSE in selected columns in R

Question


Tags: r, dataframe, na, missing-data, imputation

23

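I have a question similar to this one, but my dataset is a bit bigger: 50 columns, with 1 column as the UID and the other columns holding either TRUE or NA. I want to change all the NA values to FALSE, but I don't want to use an explicit loop.

Can plyr do the trick? Thanks.

UPDATE #1

Thanks for the quick replies, but what if my dataset looks like this: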

df <- data.frame(
  id = c(rep(1:19),NA),
  x1 = sample(c(NA,TRUE), 20, replace = TRUE),
  x2 = sample(c(NA,TRUE), 20, replace = TRUE)
)

I only want x1 and x2 to be processed. How can this be done?

- lokheart

6 Answers

17

tidyr::replace_na is an excellent function for this.

df %>%
  replace_na(list(x1 = FALSE, x2 = FALSE))

This is a nice, quick solution. The only catch is that you have to list the columns you want to change.

- mtelesha

9

Try this code:

df <- data.frame(
  id = c(rep(1:19), NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)
replace(df, is.na(df), FALSE)

Update: here is another solution that leaves the id column untouched.

df2 <- df <- data.frame(
  id = c(rep(1:19), NA),
  x1 = sample(c(NA, TRUE), 20, replace = TRUE),
  x2 = sample(c(NA, TRUE), 20, replace = TRUE)
)
df2[names(df) == "id"] <- FALSE
df2[names(df) != "id"] <- TRUE
replace(df, is.na(df) & df2, FALSE)

- Triad sou.

4

You can use the NAToUnknown function from the gdata package.

df[,c('x1', 'x2')] = gdata::NAToUnknown(df[,c('x1', 'x2')], unknown = 'FALSE')

- Ramnath

3

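Works great, but there is one problem: if I want to change unknown values to 0 and the vector already contains both missing values and 0s, I get the error Error in NAToUnknown.default(x = dots[[1L]][[1L]], unknown = dots[[2L]][[1L]], : 'x' already has value “0”. - Jubbles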

4

With dplyr, you can also do the following:

df %>% mutate_each(funs(replace(., is.na(.), F)), x1, x2)

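It is slightly less readable than a plain replace(), but more generic because it lets you select the columns to transform. This solution is especially useful if you want to keep the NAs in some columns but get rid of them in others.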
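In recent dplyr releases, mutate_each() and funs() are deprecated. A minimal sketch of the same idea using across(), assuming dplyr 1.0.0 or later is installed; the small data frame is made up for illustration and is not the question's data:

```r
library(dplyr)

# Illustrative data: id keeps its NA on purpose.
df <- data.frame(
  id = c(1, 2, NA),
  x1 = c(TRUE, NA, TRUE),
  x2 = c(NA, NA, TRUE)
)

# Replace NA with FALSE in x1 and x2 only; id is left untouched.
res <- df %>%
  mutate(across(c(x1, x2), ~ replace(.x, is.na(.x), FALSE)))
```

The first argument of across() accepts the usual tidyselect helpers, so a selection such as -id should also work when every other column needs the same treatment.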

- Holger Brandl

0

One option is to use a for loop.

for(i in c("x1", "x2")) df[[i]][is.na(df[[i]])] <- FALSE
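Since the question asks to avoid an explicit loop, the same per-column replacement can also be written with lapply(). This is a base R sketch; `cols` and the sample data are illustrative names, not part of the answer, and it is not included in the benchmark:

```r
# Illustrative data: id keeps its NA on purpose.
df <- data.frame(
  id = c(1, 2, NA),
  x1 = c(TRUE, NA, TRUE),
  x2 = c(NA, NA, TRUE)
)

cols <- c("x1", "x2")

# Apply the replacement to each selected column and write the results back.
df[cols] <- lapply(df[cols], function(col) replace(col, is.na(col), FALSE))
```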

Benchmark

set.seed(42)
df <- data.frame(
  id = c(rep(1:19),NA),
  x1 = sample(c(NA,TRUE), 20, replace = TRUE),
  x2 = sample(c(NA,TRUE), 20, replace = TRUE)
)

bench::mark(check=FALSE,
"Holger Brandl" = local(dplyr::mutate_each(df, dplyr::funs(replace(., is.na(.), F)), x1, x2)),
"mtelesha" = local(df <- tidyr::replace_na(df, list(x1 = FALSE, x2 = FALSE))),
Ramnath = local(df[,c('x1', 'x2')] <- gdata::NAToUnknown(df[,c('x1', 'x2')], unknown = 'FALSE')),
"Hong Ooi" = local(df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE),
GKi = local(for(i in c("x1", "x2")) df[[i]][is.na(df[[i]])] <- FALSE) )
#  expression         min   median `itr/sec` mem_al…¹ gc/se…² n_itr  n_gc total…³
#  <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t>
#1 Holger Brandl  16.93ms  17.33ms      57.6  34.43KB    19.2    21     7   365ms
#2 mtelesha        3.94ms   4.39ms     226.    8.15KB    13.1   103     6   456ms
#3 Ramnath       400.28µs 415.44µs    2381.    1.55KB    16.7  1142     8   480ms
#4 Hong Ooi      196.87µs 206.72µs    4755.      488B    18.8  2276     9   479ms
#5 GKi             61.8µs  66.16µs   14808.      280B    20.9  7076    10   478ms

In this benchmark the for loop is about 3 times faster than the second-fastest option (Hong Ooi's) and allocates the least memory.

- GKi


- Hong Ooi · Accepted Answer

34

If you want to do the replacement for only a subset of the variables, you can still use the is.na(*) <- trick, as follows:

df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE

In my opinion, using a temporary variable makes the logic easier to follow:

vars.to.replace <- c("x1", "x2")
df2 <- df[vars.to.replace]
df2[is.na(df2)] <- FALSE
df[vars.to.replace] <- df2
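With around 50 columns, typing every name gets tedious. One possible variation, assuming the identifier column is called id as in the question's example, is to build the column list with setdiff(). The sample data below is made up for illustration:

```r
# Illustrative data: id keeps its NA on purpose.
df <- data.frame(
  id = c(1, 2, NA),
  x1 = c(TRUE, NA, TRUE),
  x2 = c(NA, NA, TRUE)
)

# Every column except the identifier.
vars.to.replace <- setdiff(names(df), "id")

# Same technique as above, applied to the computed column list.
df[vars.to.replace][is.na(df[vars.to.replace])] <- FALSE
```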

- Hong Ooi

3

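I know this is an old post, but could you explain the first line for me? I follow the logic when you split it up with temporary variables, but I would like to understand the one-line form. I thought I understood subsetting, but I don't get what [ ][ ] means. I searched for "double brackets", but that turned up something different. - tmakino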

3

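You just read the successive subsets from left to right. For example, if x <- 1:10, then x[5:10][1:4] gives you the vector 5 6 7 8. Broken into steps, you can take the first subset and call it y, i.e. y <- x[5:10], so that y is 5 6 7 8 9 10. Subsetting y with y[1:4] then gives 5 6 7 8 again. - blakeoft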
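The left-to-right reading can be checked directly in base R. The sketch below also shows the assignment form, which is what the one-liner in the answer relies on; the variable names are illustrative:

```r
x <- 1:10

# Chained extraction: take elements 5..10, then the first four of those.
one_step <- x[5:10][1:4]

# The same thing in two steps.
y <- x[5:10]
two_step <- y[1:4]

# Chained assignment: R extracts z[5:10], replaces its first four
# elements, and writes the modified piece back into z[5:10].
z <- 1:10
z[5:10][1:4] <- 0L
```

After running this, one_step and two_step are both 5 6 7 8, and z has zeros in positions 5 to 8 while the rest is unchanged.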

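You can also use column positions instead of naming the columns explicitly, which is handy when you need to convert many variables or when they have long names. For example, df2[,14:16][is.na(df2[,14:16])] <- 0 replaces NA with 0 in columns 14, 15 and 16 of the data frame df2. - coip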
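A small sketch of the positional form applied to the question's layout, assuming the identifier is the first column and the logical columns sit in positions 2 and 3. The data is made up for illustration:

```r
# Illustrative data: id keeps its NA on purpose.
df <- data.frame(
  id = c(1, 2, NA),
  x1 = c(TRUE, NA, TRUE),
  x2 = c(NA, NA, TRUE)
)

# Columns 2 and 3 are x1 and x2; column 1 (id) is not touched.
df[, 2:3][is.na(df[, 2:3])] <- FALSE
```

Replacing with FALSE keeps the columns logical. Replacing with 0 instead would coerce a logical column containing NA to numeric, so pick the replacement value to match the column type you want.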