I have the following DataFrame:
fid date stage
test_fid 4/22/2019 a1
test_fid 4/23/2019 a1
test_fid 4/24/2019 a2
test_fid 4/25/2019 a2
test_fid 4/26/2019 a2
test_fid 4/27/2019 a3
test_fid 4/28/2019 a3
test_fid 4/29/2019 a3
test_fid1 4/30/2019 a1
test_fid1 5/1/2019 a1
test_fid1 5/2/2019 a1
test_fid1 5/3/2019 a1
test_fid1 5/4/2019 a2
test_fid1 5/5/2019 a2
test_fid1 5/6/2019 a2
test_fid1 5/7/2019 a2
test_fid1 5/8/2019 a3
test_fid1 5/9/2019 a3
test_fid1 5/10/2019 a3
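I want to find the date on which each value in the stage column starts and the date on which it ends. For example, test_fid is in stage a1 from 4/22/2019 to 4/23/2019. The result should look like this: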
fid stage start_date end_date
test_fid a1 4/22/2019 4/23/2019
test_fid a2 4/24/2019 4/26/2019
test_fid a3 4/27/2019 4/29/2019
test_fid1 a1 4/30/2019 5/3/2019
test_fid1 a2 5/4/2019 5/7/2019
test_fid1 a3 5/8/2019 5/10/2019
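A minimal sketch of one way to build that table with pandas. It assumes each stage occupies a single contiguous block of dates within each fid, which holds for the sample above. The dates are parsed first so that min/max compare real dates rather than text. Then named aggregation takes the earliest and latest date for each (fid, stage) pair.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "fid": ["test_fid"] * 8 + ["test_fid1"] * 11,
    "date": ["4/22/2019", "4/23/2019", "4/24/2019", "4/25/2019", "4/26/2019",
             "4/27/2019", "4/28/2019", "4/29/2019",
             "4/30/2019", "5/1/2019", "5/2/2019", "5/3/2019", "5/4/2019",
             "5/5/2019", "5/6/2019", "5/7/2019", "5/8/2019", "5/9/2019", "5/10/2019"],
    "stage": ["a1", "a1", "a2", "a2", "a2", "a3", "a3", "a3",
              "a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2", "a3", "a3", "a3"],
})

# Parse the m/d/Y strings so min/max compare dates, not text.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Assumes each stage forms one contiguous block per fid.
result = (
    df.groupby(["fid", "stage"], as_index=False)
      .agg(start_date=("date", "min"), end_date=("date", "max"))
)
print(result)
```

The start_date and end_date columns come back as Timestamps rather than the original m/d/Y strings. If a stage can reappear later for the same fid (for example a1, a2, a1), this version merges both a1 stretches into one row.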
I tried this:
df['stage_change'] = df['stage'].diff()
df_filtered = df[df['stage_change'] != 0]
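Series.diff() subtracts each value from the next one. On a string column such as stage, it raises a TypeError instead of flagging changes. A sketch of the same "detect where the stage changes" idea uses shift() to compare each row with the previous one. It assumes the rows are already in fid/date order. A cumulative sum then turns every change point into a new run id. The small frame below is hypothetical and was chosen so that a stage repeats and a new fid starts on the same stage the previous fid ended on.

```python
import pandas as pd

# Hypothetical data (not from the question): a1 returns after a2 for fid x,
# and fid y starts with a1 right after x ended on a1.
df = pd.DataFrame({
    "fid": ["x", "x", "x", "x", "y", "y"],
    "date": pd.to_datetime(["2019-04-22", "2019-04-23", "2019-04-24",
                            "2019-04-25", "2019-04-26", "2019-04-27"]),
    "stage": ["a1", "a2", "a2", "a1", "a1", "a2"],
})

# True wherever the stage or the fid differs from the previous row.
# shift() works on strings, whereas diff() would try to subtract them.
changed = (df["stage"] != df["stage"].shift()) | (df["fid"] != df["fid"].shift())

# Each change point starts a new run; cumsum numbers the runs 1, 2, 3, ...
run_id = changed.cumsum().rename("run")

result = (
    df.groupby(run_id)
      .agg(fid=("fid", "first"), stage=("stage", "first"),
           start_date=("date", "min"), end_date=("date", "max"))
      .reset_index(drop=True)
)
print(result)
```

Grouping by run id instead of by (fid, stage) keeps the two a1 stretches of fid x as separate rows.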

…sort_values(['fid', 'date', 'stage']) would be safer. - Erfan
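One caveat on sorting by date: while the dates are still m/d/Y strings, sort_values compares them as text. In that case "5/10/2019" sorts before "5/2/2019". A small sketch with hypothetical out-of-order rows parses the dates before sorting.

```python
import pandas as pd

# Hypothetical rows given out of order, with dates still stored as strings.
df = pd.DataFrame({
    "fid": ["test_fid1", "test_fid1", "test_fid1"],
    "date": ["5/9/2019", "5/2/2019", "5/10/2019"],
    "stage": ["a3", "a1", "a3"],
})

# Sorting the raw strings compares text character by character.
text_order = df.sort_values(["fid", "date", "stage"])["date"].tolist()

# Parsing first makes sort_values order the rows chronologically.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
df = df.sort_values(["fid", "date", "stage"], ignore_index=True)

print(text_order)  # text order: 5/10, 5/2, 5/9
print(df)
```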