R - Replicate rows based on a sequence of start and end dates

8

I have a data frame "DF" like this:

Flight.Start   Flight.End   Device      Partner   Creative   Days.in.Flight 
2015-08-31     2015-10-04   Standard    MSN       Video      35

What I need to do is "expand" it like this:
Flight.Start   Flight.End   Date         Device      Partner   Creative   Days.in.Flight 
2015-08-31     2015-10-04   2015-08-31   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-01   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-02   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-03   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-04   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-05   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-06   Standard    MSN       Video      35
2015-08-31     2015-10-04   2015-09-07   Standard    MSN       Video      35

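Each row is replicated until the Date variable reaches 2015-10-04, and then the process moves on to the next row to be replicated.

Basically, each row is replicated "Days.in.Flight - 1" times (since the existing row already accounts for one day of the interval), and a new column "Date" is then filled in with the relevant dates within that flight. For example, if a row has start and end dates of 9/1 and 9/5, 4 duplicate rows are appended to the existing row, a new column (Date) is created, and the sequence of dates from the original row's Flight.Start to Flight.End populates that column.

All date values are of class Date, Days.in.Flight is numeric, and the rest are factors.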
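To make the 9/1 to 9/5 example concrete, here is a minimal base R sketch. The one-row data frame `flight` and the year 2015 are assumptions made for illustration, not part of the real data.

```r
# Hypothetical one-row flight running 9/1 to 9/5 (year 2015 assumed)
flight <- data.frame(
  Flight.Start   = as.Date("2015-09-01"),
  Flight.End     = as.Date("2015-09-05"),
  Device         = factor("Standard"),
  Days.in.Flight = 5
)

# One date per day of the flight, both endpoints included
dates <- seq(flight$Flight.Start, flight$Flight.End, by = "day")

# Repeat the single row once per date, then attach the dates
expanded <- flight[rep(1, length(dates)), ]
expanded$Date <- dates
rownames(expanded) <- NULL

nrow(expanded)  # the original row plus 4 copies
```

The result has one row per day, with the original columns repeated and Date running from Flight.Start to Flight.End.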

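Edit

In response to the duplicate flag:

To clarify, this is not like the question it was flagged as a duplicate of, because my question is not really about how to replicate rows based on Days.in.Flight (I already know how to do that!), but about how to add a column to the output data frame and fill it with sequential dates over the corresponding flight period. Thanks for the heads-up...


Hey @Jay, definitely not, thanks. I probably shouldn't have included everything about replicating rows, since I know how to use expandRows(); this question is more about how to fill in a sequential date column to go along with that expansion. - YungBoy
4 Answers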

9
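Here is one way using splitstackshape and dplyr. With expandRows() from the splitstackshape package you can expand the data frame the way you described. Then you need to add a date sequence with mutate(). I grouped the data by the combination of Flight.Start and Flight.End and used seq() to create the date sequence for each group. first() takes the first element of Flight.Start and of Flight.End within each group, which gives seq() a single start and end value. This way you can create the sequence you want. I hope this helps.

Data and code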
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)

#  Flight.Start Flight.End   Device Creative Days.in.Flight
#1   2015-09-01 2015-09-03 Standard    Video              3
#2   2015-09-10 2015-09-15 Standard    Video              6

library(splitstackshape)
library(dplyr)

expandRows(mydf, "Days.in.Flight", drop = FALSE) %>%
  group_by(Flight.Start, Flight.End) %>%
  mutate(Date = seq(first(Flight.Start),
                    first(Flight.End),
                    by = 1))

#  Flight.Start Flight.End   Device Creative Days.in.Flight       Date
#        (date)     (date)    (chr)    (chr)          (dbl)     (date)
#1   2015-09-01 2015-09-03 Standard    Video              3 2015-09-01
#2   2015-09-01 2015-09-03 Standard    Video              3 2015-09-02
#3   2015-09-01 2015-09-03 Standard    Video              3 2015-09-03
#4   2015-09-10 2015-09-15 Standard    Video              6 2015-09-10
#5   2015-09-10 2015-09-15 Standard    Video              6 2015-09-11
#6   2015-09-10 2015-09-15 Standard    Video              6 2015-09-12
#7   2015-09-10 2015-09-15 Standard    Video              6 2015-09-13
#8   2015-09-10 2015-09-15 Standard    Video              6 2015-09-14
#9   2015-09-10 2015-09-15 Standard    Video              6 2015-09-15
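
One thing that may be worth checking before running a pipeline like the one above: expandRows() repeats each row Days.in.Flight times, while seq() yields one date per calendar day from Flight.Start through Flight.End, so the two counts need to agree for the dates to line up. A minimal base R sketch of that check, where `span` is just an illustrative name:

```r
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)

# Inclusive number of days between start and end for each row
span <- as.numeric(difftime(mydf$Flight.End, mydf$Flight.Start,
                            units = "days")) + 1

# Stop early if any row's Days.in.Flight disagrees with its date range
stopifnot(all(span == mydf$Days.in.Flight))
```

If the check fails for some rows, the sequence length and the number of replicated rows would differ, so it is probably safer to fix the data first or to derive the row count from the dates.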

8

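Or, using data.table: we convert the 'data.frame' to a 'data.table' (setDT(mydf)), replicate the sequence of rows by 'Days.in.Flight', subset the dataset based on that index (.SD[rep(...)]), and then, grouped by 'Flight.Start' and 'Flight.End', create the 'Date' column.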

library(data.table)
setDT(mydf)[, .SD[rep(1:.N, Days.in.Flight)]][, 
     Date:= seq(Flight.Start , Flight.End, by = '1 day'),
     by = .(Flight.Start, Flight.End)][]
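
To see what the replication index in the first step looks like, here is a small base R sketch using the same sample data. Inside data.table, `1:.N` plays the role that `seq_len(nrow(mydf))` plays here; the name `idx` is only for illustration.

```r
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Days.in.Flight = c(3, 6))

# Row 1 is repeated 3 times and row 2 is repeated 6 times
idx <- rep(seq_len(nrow(mydf)), mydf$Days.in.Flight)
idx

# Subsetting with the index produces the replicated rows
replicated <- mydf[idx, ]
nrow(replicated)
```

Each row number appears as many times as that row's Days.in.Flight, so subsetting by `idx` gives the expanded table that the Date column is then added to.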

2
Here is an approach using base R:

mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)

expanded <- mydf[rep(row.names(mydf), mydf$Days.in.Flight), ]
data.frame(expanded, Date = expanded$Flight.Start + (sequence(mydf$Days.in.Flight) - 1))

> data.frame(expanded,Date=expanded$Flight.Start+(sequence(mydf$Days.in.Flight)-1))
    Flight.Start Flight.End   Device Creative Days.in.Flight       Date
1     2015-09-01 2015-09-03 Standard    Video              3 2015-09-01
1.1   2015-09-01 2015-09-03 Standard    Video              3 2015-09-02
1.2   2015-09-01 2015-09-03 Standard    Video              3 2015-09-03
2     2015-09-10 2015-09-15 Standard    Video              6 2015-09-10
2.1   2015-09-10 2015-09-15 Standard    Video              6 2015-09-11
2.2   2015-09-10 2015-09-15 Standard    Video              6 2015-09-12
2.3   2015-09-10 2015-09-15 Standard    Video              6 2015-09-13
2.4   2015-09-10 2015-09-15 Standard    Video              6 2015-09-14
2.5   2015-09-10 2015-09-15 Standard    Video              6 2015-09-15
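
The key piece in the base R answer is sequence(), which counts from 1 up to each value in turn. A short sketch of how the day offsets come out, with `days`, `offsets`, and `starts` as illustrative names:

```r
days <- c(3, 6)

# sequence() restarts the count for each flight; subtracting 1 gives
# offsets of 0, 1, 2, ... days from each flight's start
offsets <- sequence(days) - 1
offsets

# Repeat each start date once per day of its flight, then add the offsets
starts <- rep(as.Date(c("2015-09-01", "2015-09-10")), days)
dates <- starts + offsets
dates
```

Because the offsets restart at 0 for every flight, adding them to the repeated start dates gives consecutive dates within each flight.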

0

Another approach uses dplyr, tidyr, and purrr from the tidyverse.

library(dplyr, quietly = TRUE, warn.conflicts = FALSE)

mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)


mydf %>% 
  mutate(id = row_number()) %>% 
  group_by(id) %>% 
  tidyr::nest() %>% 
  mutate(
    Date = purrr::map(
      data, ~ seq(.x$Flight.Start, .x$Flight.End, by = "1 day")
    )) %>% 
  tidyr::unnest(c(data, Date))
#> # A tibble: 9 x 7
#> # Groups:   id [2]
#>      id Flight.Start Flight.End Device   Creative Days.in.Flight Date      
#>   <int> <date>       <date>     <chr>    <chr>             <dbl> <date>    
#> 1     1 2015-09-01   2015-09-03 Standard Video                 3 2015-09-01
#> 2     1 2015-09-01   2015-09-03 Standard Video                 3 2015-09-02
#> 3     1 2015-09-01   2015-09-03 Standard Video                 3 2015-09-03
#> 4     2 2015-09-10   2015-09-15 Standard Video                 6 2015-09-10
#> 5     2 2015-09-10   2015-09-15 Standard Video                 6 2015-09-11
#> 6     2 2015-09-10   2015-09-15 Standard Video                 6 2015-09-12
#> 7     2 2015-09-10   2015-09-15 Standard Video                 6 2015-09-13
#> 8     2 2015-09-10   2015-09-15 Standard Video                 6 2015-09-14
#> 9     2 2015-09-10   2015-09-15 Standard Video                 6 2015-09-15

Created on 2022-02-21 by the reprex package (v2.0.1)
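
For comparison, the same row-by-row idea (build a date sequence per row, then stack the pieces) can be sketched in base R without any packages. This is only an illustrative equivalent, not part of the answer above; `pieces` and `result` are made-up names. Note that it derives the number of rows from the dates themselves rather than from Days.in.Flight.

```r
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)

# For each row, build its date sequence and repeat the row once per date
pieces <- lapply(seq_len(nrow(mydf)), function(i) {
  row <- mydf[i, ]
  dates <- seq(row$Flight.Start, row$Flight.End, by = "1 day")
  data.frame(row[rep(1, length(dates)), ], Date = dates, row.names = NULL)
})

# Stack the per-row pieces into one data frame
result <- do.call(rbind, pieces)
result
```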

