Replace specific values in a data frame with NA

3
Suppose I have a data frame:
names  <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
gender <- c("M", "M", "M", "M", "F", "F", "M")
mark <- c(1, 2, 3, 1, 2, 3, 1)
df <- data.frame(names, gender, mark)
df

  names gender mark
1  John      M    1
2  Mark      M    2
3 Larry      M    3
4  Will      M    1
5  Kate      F    2
6 Daria      F    3
7   Tom      M    1

I don't know how to replace certain values with NA. For example, suppose I want to replace the mark values for Kate, Daria, and Tom with NA:

  names gender mark
1  John      M    1
2  Mark      M    2
3 Larry      M    3
4  Will      M    1
5  Kate      F   NA
6 Daria      F   NA
7   Tom      M   NA

1
You probably want to use %in%, e.g. df$mark[df$names %in% c('Kate', 'Daria', 'Tom')] <- NA - akrun
2 Answers

3

Try:

df <- within(df, mark <- replace(mark, names %in% c('Kate', 'Daria', 'Tom'), NA))
df
#    names gender mark
#1  John      M    1
#2  Mark      M    2
#3 Larry      M    3
#4  Will      M    1
#5  Kate      F   NA
#6 Daria      F   NA
#7   Tom      M   NA

Or

 df$mark[df$names %in% c('Kate', 'Daria', 'Tom')] <- NA

Or

 is.na(df$mark) <- df$names %in% c('Kate', 'Daria', 'Tom')

Thanks! If I only know the row numbers (5 to 7 in this case) and not the names, how would I do that? - Zlo
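
For the row-number case raised in the comment above, here is a minimal sketch, assuming the rows to blank out are at positions 5 to 7 of the example data frame from the question. It uses only base R indexing; the commented-out lines are alternative spellings of the same assignment.

```r
# Rebuild the example data frame from the question
names  <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
gender <- c("M", "M", "M", "M", "F", "F", "M")
mark   <- c(1, 2, 3, 1, 2, 3, 1)
df <- data.frame(names, gender, mark)

# Index the mark column by row position and assign NA
df$mark[5:7] <- NA

# Equivalent matrix-style indexing: rows 5 to 7, column "mark"
# df[5:7, "mark"] <- NA

# The same idea with is.na<-, which also accepts integer positions
# is.na(df$mark) <- 5:7
```

Positional indexing depends on the current row order, so it is probably safest when the data frame has not been sorted or filtered since the positions were determined.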

1
is.na(df$mark[df$names %in% c('Kate', 'Daria', 'Tom')]) <- TRUE

I sometimes find this syntax useful, but in this case it is not the fastest option.
Benchmark
big.df1 <- data.frame(names = rep(names, 1e3), 
                      gender = rep(gender, 1e3), 
                      mark = rep(mark, 1e3))
big.df4 <- big.df3 <- big.df2 <- big.df1

library(microbenchmark)  # provides microbenchmark()

microbenchmark(
  plafort = is.na(big.df1$mark[big.df1$names %in% c('Kate', 'Daria', 'Tom')]) <- TRUE,
  akrun1  = within(big.df2, mark <- replace(mark, names %in% c('Kate', 'Daria', 'Tom'), NA)),
  akrun2  = big.df3$mark[big.df3$names %in% c('Kate', 'Daria', 'Tom')] <- NA,
  akrun3  = is.na(big.df4$mark) <- big.df4$names %in% c('Kate', 'Daria', 'Tom')
  )
#
# Unit: microseconds
#     expr     min       lq     mean   median       uq       max neval
#  plafort 389.623 408.9660 484.6090 426.9275 540.8135   777.272   100
#   akrun1 287.381 319.3570 388.3125 357.2530 419.8220   661.214   100
#   akrun2 193.035 204.2860 627.6559 227.7735 327.8440 37325.194   100
#   akrun3 208.431 221.6555 274.1615 235.2740 287.3825  1110.445   100

Thanks for the benchmark. The is.na(..) <- approach is usually slower. Maybe match could be faster. - akrun
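
The comment above mentions match as a possible alternative without showing it. A minimal sketch of what that could look like follows; the variable names targets and hit are hypothetical, and no timing was measured for it here. Since %in% is itself defined in terms of match, any speed difference may be small.

```r
# Rebuild the example data frame from the question
names  <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
gender <- c("M", "M", "M", "M", "F", "F", "M")
mark   <- c(1, 2, 3, 1, 2, 3, 1)
df <- data.frame(names, gender, mark)

# Names whose mark should become NA (hypothetical variable name)
targets <- c("Kate", "Daria", "Tom")

# match() gives each name's position in targets, or NA when it is absent,
# so a non-NA result marks a row to blank out
hit <- !is.na(match(df$names, targets))

df$mark[hit] <- NA
```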
