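I'm working on a problem where I need to fill in rows for missing dates (that is, the dates that fall between two dates stored in columns of a Pandas DataFrame). See the example below. My current approach (described below) uses Pandas.
Sample input data (roughly 25,000 rows):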
A | B | C | Date1 | Date2
a1 | b1 | c1 | 1Jan1990 | 15Aug1990 <- this row should be repeated for all dates between the two dates
.......................
a3 | b3 | c3 | 11May1986 | 11May1986 <- this row should NOT be repeated. Just 1 entry since both dates are same.
.......................
a5 | b5 | c5 | 1Dec1984 | 31Dec2017 <- this row should be repeated for all dates between the two dates
..........................
..........................
Expected output:
A | B | C | Month | Year
a1 | b1 | c1 | 1 | 1990 <- Since date 1 column for this row was Jan 1990
a1 | b1 | c1 | 2 | 1990
.......................
.......................
a1 | b1 | c1 | 7 | 1990
a1 | b1 | c1 | 8 | 1990 <- Since date 2 column for this row was Aug 1990
..........................
a3 | b3 | c3 | 5 | 1986 <- only 1 row since two dates in input dataframe were same for this row.
...........................
a5 | b5 | c5 | 12 | 1984 <- since date 1 column for this row was Dec 1984
a5 | b5 | c5 | 1 | 1985
..........................
..........................
a5 | b5 | c5 | 11 | 2017
a5 | b5 | c5 | 12 | 2017 <- Since date 2 column for this row was Dec 2017
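I know of a more conventional way to do this (my current approach):
- Iterate over each row.
- Compute the difference in days between the two date columns.
- If both columns hold the same date, include only one row for that month and year in the output DataFrame.
- If the dates differ (difference > 0), get every (month, year) combination covered by that row's date range and append them to a new DataFrame.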
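
In code, the steps above look roughly like the sketch below. It is only a minimal illustration, not a tuned solution. It assumes the columns are named A, B, C, Date1 and Date2 as in the sample, and that the dates are strings such as 1Jan1990. The function name expand_months is made up for this example. Instead of working from a day difference, it uses pd.period_range with monthly frequency, which returns a single period when both dates fall in the same month.

```python
import pandas as pd

def expand_months(df):
    """Row-by-row expansion: one output row per (month, year)
    between Date1 and Date2, inclusive. Hypothetical helper name."""
    rows = []
    for rec in df.itertuples(index=False):
        # Assumed date format: day, abbreviated month name, 4-digit year
        start = pd.to_datetime(rec.Date1, format="%d%b%Y")
        end = pd.to_datetime(rec.Date2, format="%d%b%Y")
        # Monthly periods from the start month through the end month;
        # identical dates produce exactly one period
        for period in pd.period_range(start=start, end=end, freq="M"):
            rows.append({"A": rec.A, "B": rec.B, "C": rec.C,
                         "Month": period.month, "Year": period.year})
    return pd.DataFrame(rows, columns=["A", "B", "C", "Month", "Year"])

# The three rows shown in the sample input
sample = pd.DataFrame({
    "A": ["a1", "a3", "a5"],
    "B": ["b1", "b3", "b5"],
    "C": ["c1", "c3", "c5"],
    "Date1": ["1Jan1990", "11May1986", "1Dec1984"],
    "Date2": ["15Aug1990", "11May1986", "31Dec2017"],
})
out = expand_months(sample)
print(out.head(10))
```

With about 25,000 input rows, the Python-level loop and the per-row list appends are likely where the time goes, which is why I'm looking for something more idiomatic.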