How do I calculate the time difference between consecutive rows?

3

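The raw data is shown below. I want to sort it by visitor and time so that I can calculate the time difference between consecutive rows before saving the result to a new file.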

  visitor         v_time payment items
1    Jack 1/2/2018 16:07      35     3
2    Jack 1/2/2018 16:09     160     1
3   David 1/2/2018 16:12      25     2
4    Kate 1/2/2018 16:16       3     3
5   David 1/2/2018 16:21      25     5
6    Jack 1/2/2018 16:32      85     5
7    Kate 1/2/2018 16:33     639     3
8    Jack 1/2/2018 16:55       6     2

The grouping and sorting work, but the time difference is not calculated and the new columns are not saved to the file.
visitor <- c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack")
v_time <- c("1/2/2018 16:07","1/2/2018 16:09","1/2/2018 16:12","1/2/2018 16:16","1/2/2018 16:21","1/2/2018 16:32","1/2/2018 16:33", "1/2/2018 16:55")
payment <- c(35,160,25,3,25,85,639,6)
items <- c(3,1,2,3,5,5,3,2)
df <- data.frame(visitor, v_time, payment, items)

df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M")), diff_secs = as.numeric(diff, units = 'secs'))

write.csv(df,"C:/output.csv", row.names = F)

What is my mistake, and what is the correct way to do this?

# A tibble: 8 x 6
# Groups: visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12   25.0   2.00 NA            NA
2 David   1/2/2018 16:21   25.0   5.00 NA            NA
3 Jack    1/2/2018 16:07   35.0   3.00 NA            NA
4 Jack    1/2/2018 16:09  160     1.00 NA            NA
5 Jack    1/2/2018 16:32   85.0   5.00 NA            NA
6 Jack    1/2/2018 16:55    6.00  2.00 NA            NA
7 Kate    1/2/2018 16:16    3.00  3.00 NA            NA
8 Kate    1/2/2018 16:33  639     3.00 NA            NA

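What result do you expect? - Onyambu
1
Just add default = strptime(v_time, "%d/%m/%Y %H:%M")[1] to the lag part. - Jaap
@Onyambu, I expect the results to appear in the columns "diff" and "diff_secs", and these two new columns to be saved in the new file. - Mark K
1
Please use as.POSIXct instead of strptime for the conversion. - akrun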
1
df%>%group_by(visitor)%>%mutate(diff=c(0,diff(strptime(v_time,"%d/%m/%Y %H:%M")))) - Onyambu
4 Answers

5
If you simply add default = strptime(v_time, "%d/%m/%Y %H:%M")[1] to the lag part:
df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]),
         diff_secs = as.numeric(diff, units = 'secs'))

you get the expected result:

> df
# A tibble: 8 x 6
# Groups:   visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12     25.    2. 0             0.
2 David   1/2/2018 16:21     25.    5. 540         540.
3 Jack    1/2/2018 16:07     35.    3. 0             0.
4 Jack    1/2/2018 16:09    160.    1. 120         120.
5 Jack    1/2/2018 16:32     85.    5. 1380       1380.
6 Jack    1/2/2018 16:55      6.    2. 1380       1380.
7 Kate    1/2/2018 16:16      3.    3. 0             0.
8 Kate    1/2/2018 16:33    639.    3. 1020       1020.

Another option is to use difftime:
df <- df %>%
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  mutate(diff = difftime(strptime(v_time, "%d/%m/%Y %H:%M"), lag(strptime(v_time, "%d/%m/%Y %H:%M"), default = strptime(v_time, "%d/%m/%Y %H:%M")[1]), units = 'mins'),
         diff_secs = as.numeric(diff, units = 'secs'))

Now the diff column is in minutes and the diff_secs column is in seconds:

> df
# A tibble: 8 x 6
# Groups:   visitor [3]
  visitor v_time         payment items diff   diff_secs
  <fct>   <fct>            <dbl> <dbl> <time>     <dbl>
1 David   1/2/2018 16:12     25.    2. 0             0.
2 David   1/2/2018 16:21     25.    5. 9           540.
3 Jack    1/2/2018 16:07     35.    3. 0             0.
4 Jack    1/2/2018 16:09    160.    1. 2           120.
5 Jack    1/2/2018 16:32     85.    5. 23         1380.
6 Jack    1/2/2018 16:55      6.    2. 23         1380.
7 Kate    1/2/2018 16:16      3.    3. 0             0.
8 Kate    1/2/2018 16:33    639.    3. 17         1020.

Now you can save the result again with write.csv(df,"C:/output.csv", row.names = FALSE).


Thank you. How do I save/export it to a new file? - Mark K
1
@MarkK Just as before: run write.csv(df,"C:/output.csv", row.names = F). - Jaap

2

The error comes from lag(strptime(v_time, "%d/%m/%Y %H:%M")).

The error message:

# Error in format.POSIXlt(x, usetz = TRUE) : 
#  invalid component [[10]] in "POSIXlt" should be 'zone'

To avoid it, apply lag to the raw column first and parse afterwards, i.e. strptime(lag(v_time), "%d/%m/%Y %H:%M"):
df <- df %>%
    arrange(visitor, v_time) %>%
    group_by(visitor) %>%
    mutate(diff = strptime(v_time, "%d/%m/%Y %H:%M") - strptime(lag(v_time), "%d/%m/%Y %H:%M"), diff_secs = as.numeric(diff, units = 'secs'))
print(df)

Output:

# A tibble: 8 x 6
# Groups:   visitor [3]
  visitor         v_time payment items    diff diff_secs
   <fctr>         <fctr>   <dbl> <dbl>  <time>     <dbl>
1   David 1/2/2018 16:12      25     2 NA mins        NA
2   David 1/2/2018 16:21      25     5  9 mins       540
3    Jack 1/2/2018 16:07      35     3 NA mins        NA
4    Jack 1/2/2018 16:09     160     1  2 mins       120
5    Jack 1/2/2018 16:32      85     5 23 mins      1380
6    Jack 1/2/2018 16:55       6     2 23 mins      1380
7    Kate 1/2/2018 16:16       3     3 NA mins        NA
8    Kate 1/2/2018 16:33     639     3 17 mins      1020

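Before exporting the data, don't forget to assign the result back to df with df <-. Otherwise the new columns are only printed and df stays unchanged.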
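The diagnosis above (strptime returns a POSIXlt object, which lag does not handle well) also suggests following akrun's comment and parsing into POSIXct once, before sorting. The following is a minimal sketch of that idea, not the answerer's code: it assumes the timestamps are day/month/year as in the question's format string, uses UTC as an arbitrary time zone, and stores the result under the illustrative name res. Parsing first also means arrange sorts by actual time rather than by the text of the timestamp.

```r
library(dplyr)

df <- data.frame(
  visitor = c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack"),
  v_time  = c("1/2/2018 16:07", "1/2/2018 16:09", "1/2/2018 16:12", "1/2/2018 16:16",
              "1/2/2018 16:21", "1/2/2018 16:32", "1/2/2018 16:33", "1/2/2018 16:55"),
  payment = c(35, 160, 25, 3, 25, 85, 639, 6),
  items   = c(3, 1, 2, 3, 5, 5, 3, 2),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Parse once into POSIXct (an atomic vector); format and tz are assumptions
  mutate(v_time = as.POSIXct(v_time, format = "%d/%m/%Y %H:%M", tz = "UTC")) %>%
  # Sort by the parsed time, not by the original string
  arrange(visitor, v_time) %>%
  group_by(visitor) %>%
  # The first row of each visitor has no predecessor, so it stays NA
  mutate(diff_secs = as.numeric(difftime(v_time, lag(v_time), units = "secs"))) %>%
  ungroup()

print(res)
```

If a 0 is preferred over NA for the first row of each visitor, lag(v_time, default = first(v_time)) can be used instead, as in the other answers.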


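Thank you for your guidance. Would you mind if I accept Jaap's answer as the best answer to this question, since it received the most votes? - Mark K
1
@MarkK No problem. - pe-perry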

1
Here is an approach using the lubridate package.
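library(dplyr)
library(lubridate)
# dmy_hm matches the day/month/year format used in the question
df$v_time <- dmy_hm(df$v_time)
df <- df %>%
  arrange(visitor, v_time)
df$diff <- rep(0, nrow(df))
for (i in seq_len(nrow(df) - 1)) {
  # Only take the difference when both rows belong to the same visitor
  if (df$visitor[i + 1] == df$visitor[i]) {
    # Fix the unit to seconds so every row uses the same scale
    df$diff[i + 1] <- as.numeric(difftime(df$v_time[i + 1], df$v_time[i], units = "secs"))
  }
}
write.csv(df, "C:/output.csv", row.names = FALSE)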
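The loop above walks the rows one by one. For comparison, here is a loop-free sketch in base R that needs no extra packages. It is an illustration under assumptions rather than part of the answer: the data frame name visits is made up, the dates are assumed to be day/month/year, and UTC is an arbitrary time zone choice. ave applies the function to each visitor's rows separately, so differences never cross from one visitor to the next.

```r
visits <- data.frame(
  visitor = c("Jack", "Jack", "David", "Kate", "David", "Jack", "Kate", "Jack"),
  v_time  = c("1/2/2018 16:07", "1/2/2018 16:09", "1/2/2018 16:12", "1/2/2018 16:16",
              "1/2/2018 16:21", "1/2/2018 16:32", "1/2/2018 16:33", "1/2/2018 16:55"),
  stringsAsFactors = FALSE
)

# Parse the text into POSIXct (assumed day/month/year, UTC)
visits$v_time <- as.POSIXct(visits$v_time, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Sort by visitor, then by the parsed time
visits <- visits[order(visits$visitor, visits$v_time), ]

# Within each visitor: 0 for the first row, then the gap to the previous row.
# as.numeric() on POSIXct gives seconds since the epoch, so the gaps are in seconds.
visits$diff_secs <- ave(
  as.numeric(visits$v_time),
  visits$visitor,
  FUN = function(s) c(0, diff(s))
)

print(visits)
```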

1

Here is an option with difftime. We convert 'v_time' to a datetime with dmy_hm from lubridate, then after arrange, group by 'visitor' and use difftime to get the difference in seconds.

library(tidyverse)
library(lubridate)  # provides dmy_hm(); older tidyverse versions do not attach it
out <- df %>% 
        mutate(v_time = dmy_hm(v_time)) %>% 
        arrange(visitor, v_time) %>% 
        group_by(visitor) %>%
        mutate(diff = difftime(v_time, lag(v_time, default = first(v_time)), units = "secs"))
# A tibble: 8 x 5
# Groups: visitor [3]
#  visitor v_time              payment items diff  
#  <fctr>  <dttm>                <dbl> <dbl> <time>
#1 David   2018-02-01 16:12:00   25.0   2.00 0     
#2 David   2018-02-01 16:21:00   25.0   5.00 540   
#3 Jack    2018-02-01 16:07:00   35.0   3.00 0     
#4 Jack    2018-02-01 16:09:00  160     1.00 120   
#5 Jack    2018-02-01 16:32:00   85.0   5.00 1380  
#6 Jack    2018-02-01 16:55:00    6.00  2.00 1380  
#7 Kate    2018-02-01 16:16:00    3.00  3.00 0     
#8 Kate    2018-02-01 16:33:00  639     3.00 1020  

Then we write it to a csv file with write_csv.
write_csv(out, "yourfile.csv")
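Since saving keeps coming up in this thread, here is a small sketch of the two steps involved: assign the piped result to a name, then write that object out. It uses a two-row toy data frame and a temporary file from tempfile() in place of a real path; the names sorted, path, and back are illustrative only.

```r
library(dplyr)

df <- data.frame(
  visitor = c("Kate", "David"),
  payment = c(3, 25),
  stringsAsFactors = FALSE
)

# Without an assignment the sorted result is only printed; df itself is unchanged
df %>% arrange(visitor)
unchanged <- identical(df$visitor, c("Kate", "David"))

# Assign the result, then write that object (tempfile() stands in for a real path)
sorted <- df %>% arrange(visitor)
path <- tempfile(fileext = ".csv")
write.csv(sorted, path, row.names = FALSE)

# Read the file back to confirm what was saved
back <- read.csv(path, stringsAsFactors = FALSE)
print(back)
```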

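Thank you. How do I save/export it to a new file? - Mark K
Thank you for your guidance and understanding. Would you mind if I accept Jaap's answer as the best answer to this question, since it received the most votes? Thanks! - Mark K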

Content provided by Stack Overflow.