Mutate by group in R

3

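I am trying to identify pieces from sensor data and give each one an ID. To do that, I want to group the following dataset by the sensor column and check whether values switches from 0 to 1. When it does, the first piece has been detected and caseid switches to 1 (as in the hand-made caseid column). As long as values stays at 1, caseid stays at 1. When values goes back to 0, caseid should switch back to 0. At the next switch from 0 to 1, caseid should become 2, because the sensor has detected a second piece, and so on.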

time = c("07:00:01","07:00:01","07:00:01","07:00:02","07:00:02","07:00:02","07:00:03","07:00:03","07:00:03","07:00:04",
     "07:00:04","07:00:04","07:00:05","07:00:05","07:00:05","07:00:06","07:00:06","07:00:06","07:00:07","07:00:07",
     "07:00:07","07:00:08","07:00:08","07:00:08","07:00:09","07:00:09","07:00:09")
sensor = c(10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,
       10001,10002,10003,10001,10002,10003,10001,10002,10003)
values = c(0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,1,1,0,1)
caseid = c(0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,2,0,1,2,0,1)

data = data.frame(time,sensor,values,caseid)

(So data$caseid is the result I want to obtain.)

I think this can be done with grouping, but I could not get it right, so I went with a different (rough) approach. This is what I have so far.

library(dplyr)

# The column is named `sensor` (lowercase) and holds numeric IDs
data %>% 
  filter(sensor == 10002) -> sensor_data_temp

sensor_data_temp$CaseID2 <- NA 
case_id = 1

for(i in 1:nrow(sensor_data_temp)){

   current_value <- sensor_data_temp[i,"values"]
   next_value <- sensor_data_temp[i+1,"values"]

   if(i+1 > nrow(sensor_data_temp)){
     break
   }

   if(current_value==0 & next_value==1 || current_value==1 & next_value==1){
     sensor_data_temp$CaseID2[i+1] <- case_id
   }
   else if(current_value==1 & next_value==0){
     sensor_data_temp$CaseID2[i+1] <- 0
     case_id = case_id +1
   }
   else{
     sensor_data_temp$CaseID2[i+1] <- 0
   }

}

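I think this is how I can get the case IDs for a single sensor. But I don't know how to handle every sensor within one data frame (like the one above).

I am sure there is a more elegant way to achieve this.

I hope someone can help me. Thanks in advance! :)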


I think you need the lead function from dplyr. - akrun
2 Answers

3
Here is one way:
library(dplyr)

mutate(group_by(arrange(data, sensor, time), sensor),
       caseID = case_when(values != 0 ~ cumsum(diff(c(0, values)) > 0),
                          TRUE ~ 0L))
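For readers without dplyr, the same per-sensor logic can be sketched in base R with ave(), which applies a function within each group and returns the results in the original row order. This is only an illustrative cross-check against the hand-made caseid column from the question, not part of the answer itself; the helper name number_pieces is made up for this sketch, and the sensor vector is rebuilt with rep() because the question's data cycles through the three sensors in order.

```r
# Data from the question (rows are already ordered by time within each sensor)
sensor   <- rep(c(10001, 10002, 10003), times = 9)
values   <- c(0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,1,1,0,1)
expected <- c(0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,2,0,1,2,0,1)

# Hypothetical helper: number the pieces seen by ONE sensor.
# diff(c(0, v)) > 0 marks each 0 -> 1 switch, cumsum() counts them,
# and rows where the sensor reads 0 are reset to 0.
number_pieces <- function(v) {
  ifelse(v != 0, cumsum(diff(c(0, v)) > 0), 0)
}

# ave() runs number_pieces() separately for every sensor
case_id <- ave(values, sensor, FUN = number_pieces)

all(case_id == expected)
```

Note that this relies on the rows being in time order within each sensor, which is why the dplyr version calls arrange() first.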

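Perfect! I can hardly believe it is that simple ^^ I understand that case_when is vectorized, but I don't quite see how it works here, because cumsum(diff(c(0, values)) > 0) gives me a vector. How does it match that vector to a single column entry? Thank you very much for your answer! - Jonas Pirner
@JonasPirner case_when is already well documented; ?case_when explains it better than I could. - Ista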
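As a rough illustration of the question in the comment above: within each group, the right-hand side is computed as a whole vector with the same length as values, and the condition then picks from it position by position. The base R sketch below walks through the intermediate vectors for a single, made-up sensor series and uses ifelse() as a stand-in for the two-branch case_when(); the variable names are only for illustration.

```r
values <- c(0, 0, 1, 1, 0, 1, 0)    # hypothetical readings of one sensor

step     <- diff(c(0, values))      # change relative to the previous row
rising   <- step > 0                # TRUE exactly where 0 switches to 1
piece_no <- cumsum(rising)          # running count of pieces seen so far

# Element-wise choice: row i gets piece_no[i] if values[i] != 0, else 0
case_id <- ifelse(values != 0, piece_no, 0L)

print(rbind(values, step, rising, piece_no, case_id))
```

So no single entry is matched against the whole vector; both sides have one element per row, and the selection happens row by row.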

1
Here is a solution using data.table.
library("data.table")

data <- data.table(
  time = c("07:00:01","07:00:01","07:00:01","07:00:02","07:00:02","07:00:02","07:00:03","07:00:03","07:00:03","07:00:04",
         "07:00:04","07:00:04","07:00:05","07:00:05","07:00:05","07:00:06","07:00:06","07:00:06","07:00:07","07:00:07",
         "07:00:07","07:00:08","07:00:08","07:00:08","07:00:09","07:00:09","07:00:09"),
  sensor = c(10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,10001,10002,10003,
           10001,10002,10003,10001,10002,10003,10001,10002,10003),
  values = c(0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,1,1,0,1),
  caseid = c(0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,2,0,1,2,0,1))

data[, caseID:=ifelse(values==0, 0, cumsum(diff(c(0, values))==1)), sensor][]

And a version that does not need ifelse():

data[, caseID:= { v <- rep(0, .N); v[values==1] <- cumsum(diff(c(0, values))==1)[values==1]; v }, sensor][]
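The ifelse()-free variant relies on logical subsetting: start from a vector of zeros and overwrite only the positions where the sensor reads 1. A minimal base R sketch of that step on a single, made-up series (variable names are illustrative, and the readings are assumed to be strictly 0 or 1, since the `== 1` tests depend on that):

```r
values <- c(0, 1, 1, 0, 1)                    # hypothetical readings of one sensor

piece_no <- cumsum(diff(c(0, values)) == 1)   # running piece count per row
is_on    <- values == 1                       # rows where a piece is present

case_id <- rep(0, length(values))             # default: no piece, ID 0
case_id[is_on] <- piece_no[is_on]             # fill in the IDs only where values == 1

case_id
```

Inside data.table the same steps run once per sensor because of the grouping argument, with .N standing in for length(values).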

1
Very nice! Thank you! I definitely need to take a look at data.table! I have only just started with the tidyverse ^^ - Jonas Pirner
