How to use if else flow control in R to test multiple conditions

4

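This seems like a very simple flow-control question, but I am having a hard time finding the right syntax in R. I have tried many variations without success, so I must be missing something very obvious.

I want to loop over a list of Brazilian state codes and return the region each one belongs to. My goal is to manipulate a larger dataset rather than a list, but here is a minimal working example that uses a list: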

a <- c("RO", "AC", "AM" ,"RR", "PA", "AP", "TO", "MA", "PI", "CE", "RN", "PB", "PE", "AL", "SE", "BA", "MG", "ES", "RJ", "SP")

setregion <- function(uf) {
  pb = txtProgressBar(min = 0, max = length(uf), initial = 0) 
  region_out<-list()
  for (i in length(uf)) {
    if (uf %in% c("RO"  ,"AC" ,"AM" ,"RR", "PA" , "AP" , "TO")) {
      region_out <- append(region_out,"North")
    } else if (  uf %in% c("MA","PI","CE","RN","PB","PE","AL","SE","BA")) {
      region_out <-append(region_out,"Northeast")
    } else if ( uf %in% c("MG","ES","RJ","SP")){
      region_out <- append(region_out,"Southeast")
    } else if ( uf %in% c("PR", "SC", "RS")){
      region_out <- append(region_out,"South") 
    } else if ( uf %in% c("MS","MT","GO", "DF")){
      region_out <-append(region_out,"Midwest")
    }
    setTxtProgressBar(pb,i)
  }
  return(region_out)
}

setregion(a)

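After running the code above, it seems that the if statement also breaks the for loop: only "North" is returned, which is the response for the first item in the list.
I was expecting a list that looks like this:
"North", "North", "North", "North", "North", "North", "North", "Northeast", "Northeast", ...
  • What am I missing?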

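Could you show what your expected output looks like? - jay.sf
As requested, see above. - lf_araujo
You could use a named vector, as described here: Create new variable from lookup table. See also Lookup table (character subsetting), matching and merging - Henrik
4 Answers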

4

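The regular if-else is not vectorized: it tests a single condition rather than one condition per element. You therefore need a vectorized approach, such as the ifelse function. In your case, though, since there are many conditions, case_when from the dplyr library may be more suitable: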

library(dplyr)

setregion <- function(uf) {
    region_out <- case_when(
        uf %in% c("RO","AC","AM","RR","PA","AP","TO") ~ "North",
        uf %in% c("MA","PI","CE","RN","PB","PE","AL","SE","BA") ~ "Northeast",
        uf %in% c("MG","ES","RJ","SP") ~ "Southeast",
        uf %in% c("PR", "SC", "RS") ~ "South",
        uf %in% c("MS","MT","GO", "DF") ~ "Midwest"
    )
    return(region_out)
}
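
For comparison, here is a minimal base-R sketch of the two alternatives mentioned above: a nested ifelse() and a loop that tests one element at a time. The helper vectors north, northeast and southeast, the shortened input and the made-up code "XX" are assumptions for illustration only, and only three regions are covered to keep it short.

```r
# Region membership taken from the question (only three regions, for brevity)
north     <- c("RO", "AC", "AM", "RR", "PA", "AP", "TO")
northeast <- c("MA", "PI", "CE", "RN", "PB", "PE", "AL", "SE", "BA")
southeast <- c("MG", "ES", "RJ", "SP")

uf <- c("RO", "MA", "SP", "XX")  # "XX" is a made-up code with no region

# A plain `if` needs a single TRUE/FALSE. A longer condition is an error in
# R >= 4.2.0; older versions used only its first element, with a warning.
# ifelse() evaluates the test element by element instead:
region_vec <- ifelse(uf %in% north, "North",
              ifelse(uf %in% northeast, "Northeast",
              ifelse(uf %in% southeast, "Southeast", NA_character_)))

# Loop version: `for (i in length(uf))` runs only once (i is the length),
# so iterate over seq_along(uf) and test the single element uf[i].
region_loop <- rep(NA_character_, length(uf))
for (i in seq_along(uf)) {
  if (uf[i] %in% north) {
    region_loop[i] <- "North"
  } else if (uf[i] %in% northeast) {
    region_loop[i] <- "Northeast"
  } else if (uf[i] %in% southeast) {
    region_loop[i] <- "Southeast"
  }
}

print(region_vec)
print(region_loop)
```

Both versions leave NA for codes that match no region, which is also what case_when does when none of its conditions is TRUE.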

This works, though I will need to spend some time reading the case_when documentation. - lf_araujo

3
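The best practice is to avoid hard-coding this mapping. Instead, keep it in a file or table, so that the code is independent of the mapping (which may change later).
Consider building a table like the following (I may have made mistakes in associating the correct regions, but anyway):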
ufToRegionMap <- structure(list(uf = c("RO", "AC", "AM", "RR", "PA", "AP", "TO", 
"MA", "PI", "CE", "RN", "PB", "PE", "AL", "SE", "BA", "MG", "ES", 
"RJ", "SP", "PR", "SC", "RS", "MS", "MT", "GO", "DF"), region = c("North", 
"North", "North", "North", "North", "North", "North", "Northeast", 
"Northeast", "Northeast", "Northeast", "Northeast", "Northeast", 
"Northeast", "Northeast", "Northeast", "Southeast", "Southeast", 
"Southeast", "Southeast", "South", "South", "South", "Midwest", 
"Midwest", "Midwest", "Midwest")), class = "data.frame", row.names = c(NA, 
-27L))

Then you can define the function simply as:
setregion <- function(uf, ufToRegionMap) {
   ufToRegionMap$region[match(uf,ufToRegionMap$uf)]
}

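This avoids all the if-else headaches and lets the code vectorize naturally. In addition, if you want to change or create another region/association, you only need to change ufToRegionMap, not the setregion function.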
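
A short usage sketch of this lookup, assuming a trimmed five-row version of the table and a made-up code "XX" (neither is from the answer), shows how codes missing from the table are handled. The named-vector variant at the end is the idea suggested in the comments under the question.

```r
# Trimmed, illustrative mapping table (one state per region, not the full list)
ufToRegionMap <- data.frame(
  uf     = c("RO", "MA", "SP", "PR", "DF"),
  region = c("North", "Northeast", "Southeast", "South", "Midwest"),
  stringsAsFactors = FALSE
)

setregion <- function(uf, ufToRegionMap) {
  # match() returns the row position of each code, or NA when it is absent
  ufToRegionMap$region[match(uf, ufToRegionMap$uf)]
}

# "XX" is a made-up code that is not in the table, so its region is NA
regions <- setregion(c("SP", "RO", "XX", "SP"), ufToRegionMap)
print(regions)

# The same lookup with a named vector: names are the codes, values the regions
region_by_uf <- setNames(ufToRegionMap$region, ufToRegionMap$uf)
named_lookup <- unname(region_by_uf[c("SP", "RO")])
print(named_lookup)
```

Because match() yields NA for unknown codes, unmapped states show up as NA in the result rather than being silently dropped, which makes gaps in the table easy to spot.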


2
If you don't like case_when(), you can use within() with simple conditional assignments inside a function.
regionizer <- function(dat, a) within(dat, {
  # Inside within(), `a` resolves to the column dat$a.
  # region_out must exist before it can be assigned into by index.
  region_out <- rep(NA_character_, length(a))
  region_out[a %in% c("RO", "AC", "AM", "RR", "PA", "AP", "TO")] <- "North"
  region_out[a %in% c("MA", "PI", "CE", "RN", "PB", "PE", "AL", "SE", "BA")] <- "Northeast"
  region_out[a %in% c("MG", "ES", "RJ", "SP")] <- "Southeast"
  region_out[a %in% c("PR", "SC", "RS")] <- "South"
  region_out[a %in% c("MS", "MT", "GO", "DF")] <- "Midwest"
})

regionizer(dat, a)

#     a           x region_out
# 1  RO  0.15983063      North
# 2  AC -0.24371961      North
# 3  AM -0.52700098      North
# 4  RR  0.38777302      North
# 5  PA  0.91111258      North
# 6  AP -1.31696659      North
# 7  TO -0.16136374      North
# 8  MA -0.85951191  Northeast
# 9  PI  0.13187218  Northeast
# 10 CE -1.62908394  Northeast
...

Data: dat <- data.frame(a, x=rnorm(length(a)))


2
Alternatively, this can be solved by merging/joining with a look-up table lut.
a <- c("RO", "AC", "AM" ,"RR", "PA", "AP", "TO", "MA", "PI", "CE", "RN", "PB", "PE", "AL", "SE", "BA", "MG", "ES", "RJ", "SP")

library(data.table)
library(magrittr)

# create look-up table from code snippets supplied by OP
lut <- list(
  North = c("RO"  ,"AC" ,"AM" ,"RR", "PA" , "AP" , "TO"),
  Northeast = c("MA","PI","CE","RN","PB","PE","AL","SE","BA"),
  Southeast = c("MG","ES","RJ","SP"),
  South = c("PR", "SC", "RS"),
  Midwest = c("MS","MT","GO", "DF")
) %>% 
  lapply(as.data.table) %>% 
  rbindlist(idcol = "region")

# update join
as.data.table(a)[lut, on = .(a == V1), region_out := region][]
     a region_out
 1: RO      North
 2: AC      North
 3: AM      North
 4: RR      North
 5: PA      North
 6: AP      North
 7: TO      North
 8: MA  Northeast
 9: PI  Northeast
10: CE  Northeast
11: RN  Northeast
12: PB  Northeast
13: PE  Northeast
14: AL  Northeast
15: SE  Northeast
16: BA  Northeast
17: MG  Southeast
18: ES  Southeast
19: RJ  Southeast
20: SP  Southeast

The look-up table was built from the code snippets supplied by the OP:

       region V1
 1:     North RO
 2:     North AC
 3:     North AM
 4:     North RR
 5:     North PA
 6:     North AP
 7:     North TO
 8: Northeast MA
 9: Northeast PI
10: Northeast CE
11: Northeast RN
12: Northeast PB
13: Northeast PE
14: Northeast AL
15: Northeast SE
16: Northeast BA
17: Southeast MG
18: Southeast ES
19: Southeast RJ
20: Southeast SP
21:     South PR
22:     South SC
23:     South RS
24:   Midwest MS
25:   Midwest MT
26:   Midwest GO
27:   Midwest DF
       region V1
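
If data.table is not available, a similar look-up table can be sketched in base R with stack() and match(). This is only an illustrative alternative to the update join above, not the answer's method. The names region_list and region_out_base and the shortened input vector are assumptions.

```r
# Shortened input for illustration
a <- c("RO", "MA", "SP", "DF")

# Region membership as supplied in the question
region_list <- list(
  North     = c("RO", "AC", "AM", "RR", "PA", "AP", "TO"),
  Northeast = c("MA", "PI", "CE", "RN", "PB", "PE", "AL", "SE", "BA"),
  Southeast = c("MG", "ES", "RJ", "SP"),
  South     = c("PR", "SC", "RS"),
  Midwest   = c("MS", "MT", "GO", "DF")
)

# stack() turns the named list into a two-column data frame:
# `values` holds the state codes and `ind` (a factor) holds the region names
lut <- stack(region_list)

# Look up each code's row in the table and pull out its region
region_out_base <- as.character(lut$ind[match(a, lut$values)])

print(data.frame(a, region_out = region_out_base))
```

Unlike merge(), indexing through match() keeps the rows in the original order of a.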
