Nested ifelse statement

Question

r if-statement nested sas

72

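I'm still learning how to translate SAS code into R, and I'm getting warnings. I need to understand where I'm making mistakes. What I want to do is create a variable that summarizes and differentiates three statuses of a population: mainland, overseas, foreigner. I have a database with two variables:

id nationality: idnat (french, foreigner)

If idnat is french, then:

id birthplace: idbp (mainland, colony, overseas)

I want to summarize the information from idnat and idbp into a new variable called idnat2:

status: k (mainland, overseas, foreigner)

All these variables use "character type".

Expected results in column idnat2: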

   idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign

Here is the SAS code that I want to translate into R:

if idnat = "french" then do;
   if idbp in ("overseas","colony") then idnat2 = "overseas";
   else idnat2 = "mainland";
end;
else idnat2 = "foreigner";
run;

Here is my attempt in R:

if(idnat=="french"){
    idnat2 <- "mainland"
} else if(idbp=="overseas"|idbp=="colony"){
    idnat2 <- "overseas"
} else {
    idnat2 <- "foreigner"
}

I receive this warning:

Warning message:
In if (idnat=="french") { :
  the condition has length > 1 and only the first element will be used

I was advised to use a "nested ifelse" instead because it is easier, but I get even more warnings:

idnat2 <- ifelse (idnat=="french", "mainland",
        ifelse (idbp=="overseas"|idbp=="colony", "overseas")
      )
            else (idnat2 <- "foreigner")

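According to the warning message, the length is greater than 1, so only what is between the first brackets will be taken into account. Sorry, but I don't understand what this length has to do with it. Does anybody know where I'm going wrong?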

- balour

7

You shouldn't mix ifelse and else. - Roland

1

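@Roland You are right, thanks for the advice; I have added the expected result. If possible, I only want the data in the idnat2 column. @KarlForner Thank you, that is exactly what I wanted to do with simple examples, but I am really struggling with "R". I tried the same thing in SPSS and it was somewhat easier. - balour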

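My point is that SO is not a substitute for learning a language. There are plenty of books, tutorials, and other resources available... When you get stuck, you can post here for help, provided you have already tried all the other resources. Best. - Karl Forner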

7

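I completely agree with you. However, in this particular case (if vs. ifelse) I upvoted, because I ran into exactly the same problem when I started with R. It is not explained clearly in "An Introduction to R", there is nothing about ifelse in the R Language Definition, and only R For Dummies has some examples. Are there other sources describing the difference between if and ifelse? - Tomas Greif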

10 Answers

13

Try something like the following:

# some sample data
idnat <- sample(c("french","foreigner"),100,TRUE)
idbp <- rep(NA,100)
idbp[idnat=="french"] <- sample(c("mainland","overseas","colony"),sum(idnat=="french"),TRUE)

# recoding
out <- ifelse(idnat=="french" & !idbp %in% c("overseas","colony"), "mainland",
              ifelse(idbp %in% c("overseas","colony"),"overseas",
                     "foreigner"))
cbind(idnat,idbp,out) # check result

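Your confusion comes from how SAS and R handle if-else constructions. In R, if and else are not vectorized, meaning they check whether a single condition is true (i.e., if("french"=="french") works) and cannot handle multiple logicals (i.e., if(c("french","foreigner")=="french") does not work). That is why R gives you the warning you are receiving.

By contrast, ifelse is vectorized, so it can take your vectors (aka input variables) and test the logical condition on each of their elements, as you are used to in SAS. An alternative way to wrap your head around this is to build a loop using if and else statements (as you have started to do here), but the vectorized ifelse approach will be more efficient and generally involves less code.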
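A minimal sketch of the difference, using made-up vectors that mirror the question's data (the names first_status and idnat2 are only illustrative). Note that since R 4.2.0 a condition of length greater than one in if() is an error rather than a warning.

```r
# Hypothetical sample vectors mirroring the question's data
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# if() expects a single TRUE/FALSE, so it is applied to one element here
first_status <- if (idnat[1] == "french") "french" else "not french"

# ifelse() tests every element and returns a vector of the same length
idnat2 <- ifelse(idnat == "french",
                 ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
                 "foreign")
print(idnat2)
```

The nesting follows the SAS logic directly: the inner ifelse() only matters for the rows where idnat is "french".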

- Thomas

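Hello, IF and ELSE in R are not vectorized, which is why I got the warning about length > 1 and only the first TRUE argument being recorded. I will try your hint with IFELSE, although Tomas Greif's hint also works well. - balour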

9

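If the data set contains many rows, it might be more efficient to join with a lookup table using data.table instead of nested ifelse().

Provided the lookup table below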

lookup

     idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign

and a sample data set

library(data.table)
n_row <- 10L
set.seed(1L)
DT <- data.table(idnat = "french",
                 idbp = sample(c("mainland", "colony", "overseas", "foreign"), n_row, replace = TRUE))
DT[idbp == "foreign", idnat := "foreign"][]

      idnat     idbp
 1:  french   colony
 2:  french   colony
 3:  french overseas
 4: foreign  foreign
 5:  french mainland
 6: foreign  foreign
 7: foreign  foreign
 8:  french overseas
 9:  french overseas
10:  french mainland

then we can do an update while joining:

DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]

      idnat     idbp   idnat2
 1:  french   colony overseas
 2:  french   colony overseas
 3:  french overseas overseas
 4: foreign  foreign  foreign
 5:  french mainland mainland
 6: foreign  foreign  foreign
 7: foreign  foreign  foreign
 8:  french overseas overseas
 9:  french overseas overseas
10:  french mainland mainland
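The answer prints lookup without showing how it was built. As a rough base-R sketch of the same lookup idea (not the data.table join itself), the table can be written out by hand and matched on a combined key; the names dat, key_dat and key_lookup are assumptions made for this illustration.

```r
# Lookup table written out by hand, mirroring the printed table above
lookup <- data.frame(
  idnat  = c("french", "french", "french", "foreign"),
  idbp   = c("mainland", "colony", "overseas", "foreign"),
  idnat2 = c("mainland", "overseas", "overseas", "foreign"),
  stringsAsFactors = FALSE
)

# Small hypothetical data set to recode
dat <- data.frame(
  idnat = c("french", "foreign", "french", "french"),
  idbp  = c("colony", "foreign", "mainland", "overseas"),
  stringsAsFactors = FALSE
)

# Build one key per row and look it up; match() keeps the original row order
key_dat    <- paste(dat$idnat, dat$idbp, sep = "|")
key_lookup <- paste(lookup$idnat, lookup$idbp, sep = "|")
dat$idnat2 <- lookup$idnat2[match(key_dat, key_lookup)]
print(dat)
```

Combinations that are missing from the lookup table come back as NA, which makes unexpected input easy to spot.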

- Uwe

8

You can create the vector idnat2 without if and ifelse.

The function replace can be used to replace all occurrences of "colony" with "overseas":

idnat2 <- replace(idbp, idbp == "colony", "overseas")
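A quick sketch of why this is enough, assuming (as in the question's expected output) that foreigners already have idbp equal to "foreign", so that only "colony" needs recoding:

```r
# Birthplace values from the question's expected-result table
idbp <- c("mainland", "colony", "overseas", "foreign")

# replace() returns a copy of idbp with the selected elements overwritten
idnat2 <- replace(idbp, idbp == "colony", "overseas")
print(idnat2)
```

If idbp were missing or coded differently for foreigners, idnat would have to be taken into account as well.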

- Sven Hohenstein

1

df$idnat2 <- df$idbp; df$idnat2[df$idbp == 'colony'] <- 'overseas' - Jaap

7

Using the SQL CASE statement with the sqldf package, and case_when with dplyr:

Data

df <-structure(list(idnat = structure(c(2L, 2L, 2L, 1L), .Label = c("foreign", 
"french"), class = "factor"), idbp = structure(c(3L, 1L, 4L, 
2L), .Label = c("colony", "foreign", "mainland", "overseas"), class = "factor")), .Names = c("idnat", 
"idbp"), class = "data.frame", row.names = c(NA, -4L))

sqldf

library(sqldf)
sqldf("SELECT idnat, idbp,
        CASE 
          WHEN idbp IN ('colony', 'overseas') THEN 'overseas' 
          ELSE idbp 
        END AS idnat2
       FROM df")

dplyr

library(dplyr)
df %>% 
mutate(idnat2 = case_when(idbp == 'mainland' ~ "mainland", 
                          idbp %in% c("colony", "overseas") ~ "overseas", 
                         TRUE ~ "foreign"))

Output

    idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign

- mpalanco

2

With data.table, the solution is:

DT[, idnat2 := ifelse(idbp %in% "foreign", "foreign", 
        ifelse(idbp %in% c("colony", "overseas"), "overseas", "mainland" ))]

ifelse is vectorized, whereas if-else is not. Here, DT is:

    idnat     idbp
1  french mainland
2  french   colony
3  french overseas
4 foreign  foreign

This gives:

   idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign

- Sun Bee

I think a better approach is: DT[, idnat2 := idbp][idbp %in% c('colony','overseas'), idnat2 := 'overseas'] - Jaap

2

DT[, idnat2 := idbp][idbp == 'colony', idnat2 := 'overseas'] - Jaap

Another data.table approach is a join with a lookup table: DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][] - Uwe

1

# Read in the data.

idnat=c("french","french","french","foreign")
idbp=c("mainland","colony","overseas","foreign")

# Initialize the new variable.

idnat2=as.character(vector())

# Logically evaluate "idnat" and "idbp" for each case, assigning the appropriate level to "idnat2".

for(i in 1:length(idnat)) {
  if(idnat[i] == "french" & idbp[i] == "mainland") {
    idnat2[i] = "mainland"
  } else if (idnat[i] == "french" & (idbp[i] == "colony" | idbp[i] == "overseas")) {
    idnat2[i] = "overseas"
  } else {
    idnat2[i] = "foreign"
  }
}

# Create a data frame with the two old variables and the new variable.

data.frame(idnat,idbp,idnat2)

- Azul

1

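The explanation with the examples was key to helping me, but the code did not work when I copied it, so I had to adjust it in several ways to get it working. (I am very new to R and had some issues with the third ifelse due to my lack of knowledge.)

So, for those who are new to R and running into issues...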

   ifelse(x < -2,"pretty negative", ifelse(x < 1,"close to zero", ifelse(x < 3,"in [1, 3)","large")##all one line
     )#normal tab
)

I used this inside a function, so the "ifelse..." was indented one tab to the right, but the last ")" was all the way to the left.

- Tiffany T

1

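Just a heads-up: when binning numbers, cut may be a better choice. You could rewrite this as cut(x, breaks = c(-Inf, -2, 1, 3, Inf), labels = c("pretty negative", "close to zero", "in [1, 3)", "large")). ifelse works just as well for one or two levels of nesting, but with deeper nesting cut relieves you of keeping track of all the nesting and parentheses. - Gregor Thomas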

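Thanks, I had not used cut before. It seems to split the data into the intervals (-inf,-2],(-2,1],(1,3],(3,inf], so it works well as long as the bins are expressed as "x <= some Z". I tested reversing breaks and labels, labels only, breaks only, and so on, but none of these gave me the [-inf,-2),[-2,1),[1,3),[3,inf) bins I needed... In practice, though, cut does seem nicer to use. - Tiffany T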

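cut also has an argument right (default TRUE), which says that the intervals are closed on the right. Set right = FALSE to get [-inf,-2),[-2,1),[1,3),[3,inf). The -Inf and Inf boundaries make no difference here, but you can also use include.lowest to toggle whether the two extremes are closed. See ?cut for more details. - Gregor Thomas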
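To illustrate the comments above, here is a small sketch comparing the nested ifelse() with cut(..., right = FALSE). The test values in x are made up and include the boundary points -2, 1 and 3.

```r
# Hypothetical test values, including the boundary points -2, 1 and 3
x <- c(-3, -2, 0, 1, 2.5, 3, 10)
labs <- c("pretty negative", "close to zero", "in [1, 3)", "large")

# Nested ifelse() from the answer above
nested <- ifelse(x < -2, labs[1],
                 ifelse(x < 1, labs[2],
                        ifelse(x < 3, labs[3], labs[4])))

# right = FALSE gives left-closed bins: [-Inf,-2), [-2,1), [1,3), [3,Inf)
binned <- cut(x, breaks = c(-Inf, -2, 1, 3, Inf), labels = labs, right = FALSE)

print(data.frame(x, nested, binned))
```

cut() returns a factor, so as.character(binned) is needed when a plain character vector is wanted.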

0

I wrote a function to nest if-else statements. It is not optimized for speed, but I thought it might be useful to others.

ifelse_nested <- function(...) {
  args <- list(...)
  nargs <- length(args)
  
  default_ind <- nargs
  condition_inds <- which(seq_len(nargs) %% 2 == 1)
  condition_inds <- condition_inds[-length(condition_inds)] # remove default_ind
  value_inds <- which(seq_len(nargs) %% 2 == 0)
  
  .init <- args[[default_ind]]
  .x <- mapply(
    function(icond_ind, ivalue_ind) {
      return(list(condition=args[[icond_ind]], value=args[[ivalue_ind]]))
    }
    , icond_ind=condition_inds
    , ivalue_ind=value_inds
    , SIMPLIFY = FALSE
  ) # generate pairs of conditions & resulting-values
  
  out <- Reduce(
    function(x, y) ifelse(x$condition, x$value, y)
    , x = .x
    , init=.init
    , right=TRUE
  )
  
  return(out)
}

For example:

x <- seq_len(10)
ifelse_nested(x%%2==0, 2,x%%3==0, x^2, 0)

- drakethrice

-1

Sorry for joining the party so late. Here is an easy solution.

#building up your initial table
idnat <- c(1,1,1,2) #1 is french, 2 is foreign

idbp <- c(1,2,3,4) #1 is mainland, 2 is colony, 3 is overseas, 4 is foreign

t <- cbind(idnat, idbp)

#the last column will be a vector of row length = row length of your matrix
idnat2 <- vector()

#.. and we will populate that vector with a cursor

for(i in 1:length(idnat)) { # the cursor runs over the length of one of the vectors
  if (t[i,1] == 2) { # this says: if idnat = foreign, then it's foreign
    idnat2[i] <- 3 # 3 is foreign
  } else if (t[i,2] == 1) { # this says: if not foreign and idbp = mainland then it's mainland
    idnat2[i] <- 2 # 2 is mainland
  } else { # this says: anything else will be classified as colony or overseas
    idnat2[i] <- 1 # 1 is colony or overseas
  }
}


cbind(t,idnat2)

- Jorge Lopez

1

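Straightforward and easy to understand, yes. But also verbose and unidiomatic... and not well illustrated (why use these integers instead of the data provided in the question?). It also duplicates Azul's answer, which takes essentially the same approach but works on the text data from the question rather than on integers... - Gregor Thomas

Because doing it that way is cumbersome, Gregor. See? How many wonderful ways we have to communicate... Azul's... Jorge's... Gregor's... - Jorge Lopez

Because what is more logical is a matter of personal choice for me, you, and the OP. Regards, Gregor. - Jorge Lopez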


- Tomas Greif · Accepted Answer

If you use any spreadsheet application, there is a basic function if() with the syntax:

if(<condition>, <yes>, <no>)

ifelse() in R has exactly the same syntax:

ifelse(<condition>, <yes>, <no>)

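The only difference from if() in a spreadsheet application is that ifelse() in R is vectorized (it takes vectors as input and returns a vector as output). Consider the following comparison of formulas in a spreadsheet application and in R, for an example where we want to test whether a > b and return 1 if yes and 0 if not.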

  A  B C
1 3  1 =if(A1 > B1, 1, 0)
2 2  2 =if(A2 > B2, 1, 0)
3 1  3 =if(A3 > B3, 1, 0)

In R:

> a <- 3:1; b <- 1:3
> ifelse(a > b, 1, 0)
[1] 1 0 0

ifelse() can be nested in many ways:

ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))

ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)

ifelse(<condition>, 
       ifelse(<condition>, <yes>, <no>), 
       ifelse(<condition>, <yes>, <no>)
      )

ifelse(<condition>, <yes>, 
       ifelse(<condition>, <yes>, 
              ifelse(<condition>, <yes>, <no>)
             )
       )

To calculate the column idnat2 you can:

df <- read.table(header=TRUE, text="
idnat idbp idnat2
french mainland mainland
french colony overseas
french overseas overseas
foreign foreign foreign"
)

with(df, 
     ifelse(idnat=="french",
       ifelse(idbp %in% c("overseas","colony"),"overseas","mainland"),"foreign")
     )
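As a quick check of the expression above, the result can be stored in a new column and compared with the expected idnat2 values. The column name idnat2_check is made up for this sketch, and stringsAsFactors = FALSE is set explicitly so that the comparison is between character vectors on any R version.

```r
# Same data as above, read as character columns
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
idnat idbp idnat2
french mainland mainland
french colony overseas
french overseas overseas
foreign foreign foreign")

# Store the nested ifelse() result next to the expected values
df$idnat2_check <- with(df,
  ifelse(idnat == "french",
         ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
         "foreign"))

# TRUE when the computed column matches the expected one
print(identical(df$idnat2_check, df$idnat2))
```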

R Documentation

What does "the condition has length > 1 and only the first element will be used" mean? Let's see:

> # What is first condition really testing?
> with(df, idnat=="french")
[1]  TRUE  TRUE  TRUE FALSE
> # This is result of vectorized function - equality of all elements in idnat and 
> # string "french" is tested.
> # Vector of logical values is returned (has the same length as idnat)
> df$idnat2 <- with(df,
+   if(idnat=="french"){
+   idnat2 <- "xxx"
+   }
+   )
Warning message:
In if (idnat == "french") { :
  the condition has length > 1 and only the first element will be used
> # Note that the first element of comparison is TRUE and that's why we get:
> df
    idnat     idbp idnat2
1  french mainland    xxx
2  french   colony    xxx
3  french overseas    xxx
4 foreign  foreign    xxx
> # There is really logic in it, you have to get used to it

Can I still use if()? Yes, you can, but the syntax is not so cool :)

test <- function(x) {
  if(x=="french") {
    "french"
  } else{
    "not really french"
  }
}

apply(array(df[["idnat"]]),MARGIN=1, FUN=test)
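A possibly more common way to apply a scalar if() function to every element is vapply(). This is only a sketch: the idnat vector below is made up to mirror df[["idnat"]], and test is the same function as above.

```r
# Hypothetical vector mirroring df[["idnat"]]
idnat <- c("french", "french", "french", "foreign")

# Scalar function: if() sees exactly one value per call
test <- function(x) {
  if (x == "french") "french" else "not really french"
}

# vapply() calls test() once per element and checks that each result
# is a single character string
res <- vapply(idnat, test, character(1), USE.NAMES = FALSE)
print(res)
```

For a simple two-way recode like this, ifelse(idnat == "french", "french", "not really french") gives the same result without an explicit function.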

If you are familiar with SQL, you can also use the CASE statement in the sqldf package.