Sum multiple time intervals in lubridate without counting overlapping time

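I need to sum the number of days across several time periods within the same observation. I have seen many different examples of this task on StackOverflow. However, because my periods overlap and span several interval columns, I have not been able to reproduce those examples with my data.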

library(lubridate)
library(dplyr)

a <- c(as_date(0), as_date(8), as_date(80), as_date(60))
b <- c(as_date(2), as_date(20), as_date(100), as_date(80))
c <- c(as_date(1), as_date(16), as_date(95), as_date(85))
d <- c(as_date(100), as_date(19), as_date(120), as_date(100))
e <- c(as_date(0), as_date(50), as_date(101), as_date(65))
f <- c(as_date(150), as_date(100), as_date(200), as_date(200))

df <- data.frame(int.1 = interval(a, b), int.2 = interval(c, d), int.3 = interval(e, f))

I can compute the total time across the intervals, but the overlapping time is counted as well:

df %>%
  mutate(overlapping.time = int.1 %/% days(1) + int.2 %/% days(1) + int.3 %/% days(1))


                           int.1                          int.2                          int.3 overlapping.time
1 1970-01-01 UTC--1970-01-03 UTC 1970-01-02 UTC--1970-04-11 UTC 1970-01-01 UTC--1970-05-31 UTC              251
2 1970-01-09 UTC--1970-01-21 UTC 1970-01-17 UTC--1970-01-20 UTC 1970-02-20 UTC--1970-04-11 UTC               65
3 1970-03-22 UTC--1970-04-11 UTC 1970-04-06 UTC--1970-05-01 UTC 1970-04-12 UTC--1970-07-20 UTC              144
4 1970-03-02 UTC--1970-03-22 UTC 1970-03-27 UTC--1970-04-11 UTC 1970-03-07 UTC--1970-07-20 UTC              170
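
To see where the double counting comes from, here is a minimal base-R sketch using plain day offsets for row 2 of the table above. The helper name `pair_overlap_days` is hypothetical and not part of lubridate; it assumes the days shared by two intervals run from the later start to the earlier end.

```r
# Hypothetical helper (not a lubridate function): days shared by two
# intervals, each given as start/end day offsets.
pair_overlap_days <- function(start1, end1, start2, end2) {
  # The shared span runs from the later start to the earlier end;
  # a negative length means the intervals do not overlap, so clamp at 0.
  pmax(0, pmin(end1, end2) - pmax(start1, start2))
}

# Row 2: int.1 = days 8 to 20, int.2 = days 16 to 19, int.3 = days 50 to 100.
naive_sum <- (20 - 8) + (19 - 16) + (100 - 50)
shared <- pair_overlap_days(8, 20, 16, 19) +
  pair_overlap_days(8, 20, 50, 100) +
  pair_overlap_days(16, 19, 50, 100)

naive_sum           # 65, as in the table
naive_sum - shared  # 62, with the 3 shared days counted only once
```

In this row only int.1 and int.2 overlap, so removing their 3 shared days is enough.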


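Could you update your post with the expected output? - Ronak Shah
How many columns do you need to do this for? Can int.2 be entirely contained in int.1, or entirely contained in the union of int.1 and int.3? - smingerson
1 Answer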

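Below is a function, overlapping_days(), that takes a set of interval columns and computes the total number of overlapping days. See the inline comments for how it works. It handles intervals fully contained in another interval as well as partial overlaps, and it makes no assumptions about how the columns relate to each other. Subtracting the function's result from your earlier calculation gives the result you want. Note that I made some changes to the data you originally posted.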
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:lubridate':
#> 
#>     intersect, setdiff, union
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

a <- c(as_date(0), as_date(1), as_date(80), as_date(60))
b <- c(as_date(20), as_date(22), as_date(100), as_date(80))
c <- c(as_date(1), as_date(16), as_date(95), as_date(85))
d <- c(as_date(3), as_date(19), as_date(120), as_date(100))
e <- c(as_date(0), as_date(50), as_date(101), as_date(65))
f <- c(as_date(150), as_date(100), as_date(200), as_date(200))

df <- data.frame(int.1 = interval(a, b), int.2 = interval(c, d), int.3 = interval(e, f))
overlapping_days <- function(...) {
  # Collect the vectors passed into a list
  ll <- list(...)
  # Create all possible 2-combinations for the number of columns passed in.
  combinations <- combn(length(ll), 2)
  # Create a column for each combination, and a row for each element in the vectors.
  overlaps <- matrix(data = 0, nrow = length(ll[[1]]), ncol = ncol(combinations))
  # Loop through the combinations
  iterations <- seq_len(ncol(combinations))
  for (k in iterations) {
    # I'll refer to each of these indices as intervals -- they each represent
    # a vector passed in.
    i <- combinations[1, k]
    j <- combinations[2, k]
    overlaps[,k] <- case_when(
      # If the interval i is within interval j, add i to the overlap
      ll[[i]] %within% ll[[j]] ~ ll[[i]] %/% days(1),
      # If the interval j is within interval i, add j to the overlap
      ll[[j]] %within% ll[[i]] ~ ll[[j]] %/% days(1),
      # Otherwise, if they overlap, the shared span runs from the later of the
      # two starts to the earlier of the two ends, regardless of which
      # interval begins first.
      int_overlaps(ll[[i]], ll[[j]]) ~
        interval(
          pmax(int_start(ll[[i]]), int_start(ll[[j]])),
          pmin(int_end(ll[[i]]), int_end(ll[[j]]))
        ) %/% days(1),
      # If none of these are true, the intervals do not overlap and we add 0 to
      # the overlap amount.
      TRUE ~ 0
    )
  }
  # Sum across rows to get the total number of overlapping days.
  rowSums(overlaps)
}

df %>%
  mutate(
    overlapping.time = int.1 %/% days(1) + int.2 %/% days(1) + int.3 %/% days(1),
    overlap = overlapping_days(int.1, int.2, int.3)
  )
#> Note: method with signature 'Timespan#Timespan' chosen for function '%/%',
#>  target signature 'Interval#Period'.
#>  "Interval#ANY", "ANY#Period" would also be valid
#>                            int.1                          int.2
#> 1 1970-01-01 UTC--1970-01-21 UTC 1970-01-02 UTC--1970-01-04 UTC
#> 2 1970-01-02 UTC--1970-01-23 UTC 1970-01-17 UTC--1970-01-20 UTC
#> 3 1970-03-22 UTC--1970-04-11 UTC 1970-04-06 UTC--1970-05-01 UTC
#> 4 1970-03-02 UTC--1970-03-22 UTC 1970-03-27 UTC--1970-04-11 UTC
#>                            int.3 overlapping.time overlap
#> 1 1970-01-01 UTC--1970-05-31 UTC              172      24
#> 2 1970-02-20 UTC--1970-04-11 UTC               74       3
#> 3 1970-04-12 UTC--1970-07-20 UTC              144      24
#> 4 1970-03-07 UTC--1970-07-20 UTC              170      30
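
One caveat with subtracting pairwise overlaps: when three intervals cover the same days, those days are subtracted more than once. In row 1 above, 172 - 24 = 148, while the three intervals together span days 0 to 150. A possible alternative is to compute the union of each row's intervals directly. The sketch below does this in base R on plain day offsets (the same numbers passed to as_date() above). The name `union_days` is hypothetical and not part of lubridate, and the sketch assumes every interval has start <= end.

```r
# Hypothetical helper (not a lubridate function): total days covered by the
# union of several intervals, given vectors of start and end day offsets.
union_days <- function(starts, ends) {
  starts <- as.numeric(starts)
  ends <- as.numeric(ends)
  # Sort the intervals by start so they can be merged in a single pass.
  ord <- order(starts)
  starts <- starts[ord]
  ends <- ends[ord]
  total <- 0
  cur_start <- starts[1]
  cur_end <- ends[1]
  for (k in seq_along(starts)[-1]) {
    if (starts[k] <= cur_end) {
      # Overlaps (or touches) the current run: extend the run.
      cur_end <- max(cur_end, ends[k])
    } else {
      # Gap: close the current run and start a new one.
      total <- total + (cur_end - cur_start)
      cur_start <- starts[k]
      cur_end <- ends[k]
    }
  }
  total + (cur_end - cur_start)
}

# Day offsets of the modified data: one row per observation,
# one column per interval (int.1, int.2, int.3).
starts <- cbind(c(0, 1, 80, 60), c(1, 16, 95, 85), c(0, 50, 101, 65))
ends <- cbind(c(20, 22, 100, 80), c(3, 19, 120, 100), c(150, 100, 200, 200))

non_overlapping <- sapply(
  seq_len(nrow(starts)),
  function(r) union_days(starts[r, ], ends[r, ])
)
non_overlapping  # 150 71 120 140
```

Rows 2 to 4 agree with overlapping.time - overlap (71, 120, 140); only row 1 differs, because int.2 lies inside both int.1 and int.3. With Interval columns, the day offsets could presumably be obtained from int_start() and int_end() before calling the helper.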
