在R中使用最大值和最小值组织数据

3

我有一个表格,像这样:

enter image description here

这个表格是由以下代码生成的:

id <- c("1","2","1","2","1","1")
status <- c("open","open","closed","closed","open","closed")
date <- c("11-10-2017 15:10","10-10-2017 12:10","12-10-2017 22:10","13-10-2017 06:30","13-10-2017 09:30","13-10-2017 10:30")
data <- data.frame(id,status,date)
hour <- data.frame(do.call('rbind', strsplit(as.character(data$date),' ',fixed=TRUE)))
hour <- hour[,2]
hour <- as.POSIXlt(hour, format = "%H:%M") 

我想要实现的是选择每个ID的最早开放时间最晚关闭时间。因此,最终结果将如下所示:
enter image description here
目前我使用sqldf来解决这个问题:
sqldf("select * from (select id, status, date as closeDate, max(hour) as hour from data 
  where status='closed'
   group by id,status) as a
   join 
   (select id, status, date  as openDate, min(hour) as hour from data 
   where status='open'
   group by id,status) as b
  using(id);")

问题1:有没有更简单的方法做到这一点?

问题2:如果我选择将max(hour)命名为其他名称而不是hour,则结果不会以日期和时间格式呈现,而是一系列数字,如15078642001507807800。如何在为列指定不同名称的同时保留时间格式?


2
你的意思是hour应该是数据中的一列吗?也许你忘记了加上data$hour <- hour这一行代码? - Gregor Thomas
1个回答

0
使用plyr包:
(由于某种原因,如此处所示,您必须将小时转换为as.POSIXct类,否则会收到错误消息):
#add hour to data.frame:
data$hour <- as.POSIXct(hour)
library(plyr)
ddply(data, .(id), summarize, open=min(hour[status=="open"]),
     closed=max(hour[status=="closed"]))

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接