Suppose I have a Pandas DataFrame:
import numpy as np
import pandas as pd

# Revenue is random, so the numbers differ between runs
df = pd.DataFrame({'Company': ['company A']*5 + ['company B']*5,
                   'Date': ['01.01.2020', '01.02.2020', '01.03.2020', '01.04.2020', '01.05.2020'] +
                           ['01.04.2020', '01.05.2020', '01.06.2020', '01.07.2020', '01.08.2020'],
                   'Revenue': np.random.rand(1, 10)[0]*10000})
     Company        Date      Revenue
0  company A  01.01.2020  5033.243098
1  company A  01.02.2020  5967.112256
2  company A  01.03.2020  6328.425874
3  company A  01.04.2020  7289.514777
4  company A  01.05.2020  9642.728016
5  company B  01.04.2020   805.708717
6  company B  01.05.2020   162.177508
7  company B  01.06.2020  7549.296095
8  company B  01.07.2020  4398.211089
9  company B  01.08.2020  1651.938946
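The `Date` column holds strings that look like day-first dates (DD.MM.YYYY). Sorting them as text only happens to work here because every date is the 1st of a month in 2020. For example, `'01.02.2020'` sorts before `'15.01.2020'` as a string even though 1 February comes after 15 January. A minimal sketch, assuming the day-first reading is correct, that parses the column once with `pd.to_datetime`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Company': ['company A'] * 5 + ['company B'] * 5,
                   'Date': ['01.01.2020', '01.02.2020', '01.03.2020', '01.04.2020', '01.05.2020'] +
                           ['01.04.2020', '01.05.2020', '01.06.2020', '01.07.2020', '01.08.2020'],
                   'Revenue': np.random.rand(10) * 10000})

# Assumption: the strings are day-first (DD.MM.YYYY), i.e. the 1st of each month.
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y')

# Plain string comparison gets this pair wrong: as text, '01.02.2020' < '15.01.2020',
# although 1 February 2020 is later than 15 January 2020.
print('01.02.2020' < '15.01.2020')  # True
print(pd.Timestamp(2020, 2, 1) > pd.Timestamp(2020, 1, 15))  # True

# With a real datetime column, sorting is chronological.
print(df.sort_values(['Company', 'Date']))
```

Once `Date` is a datetime column, any per-company "first N" logic can rely on ordinary sorting.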
The goal is to get a DataFrame in which the first N months of each company are excluded (N = 2 in this example):
     Company        Date      Revenue
2  company A  01.03.2020  6328.425874
3  company A  01.04.2020  7289.514777
4  company A  01.05.2020  9642.728016
7  company B  01.06.2020  7549.296095
8  company B  01.07.2020  4398.211089
9  company B  01.08.2020  1651.938946
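"First N months" can be read literally as the first N calendar months of each company's history. A minimal sketch of that reading, assuming day-first (DD.MM.YYYY) dates and N = 2; the helper names `month_idx` and `months_since_start` are hypothetical, chosen for illustration:

```python
import numpy as np
import pandas as pd

N = 2  # number of leading calendar months to exclude per company

df = pd.DataFrame({'Company': ['company A'] * 5 + ['company B'] * 5,
                   'Date': ['01.01.2020', '01.02.2020', '01.03.2020', '01.04.2020', '01.05.2020'] +
                           ['01.04.2020', '01.05.2020', '01.06.2020', '01.07.2020', '01.08.2020'],
                   'Revenue': np.random.rand(10) * 10000})

# Assumption: dates are day-first strings (DD.MM.YYYY).
dates = pd.to_datetime(df['Date'], format='%d.%m.%Y')

# Absolute month number (year * 12 + month), so month differences are integer arithmetic.
month_idx = dates.dt.year * 12 + dates.dt.month

# Months elapsed since each company's earliest month: 0 for its first month, 1 for the next, ...
months_since_start = month_idx - month_idx.groupby(df['Company']).transform('min')

# Keep everything from month N onward; the row order of df does not matter here.
result = df[months_since_start >= N]
print(result)
```

With gap-free monthly data like this example, the result has the same rows as dropping the first N rows per company. The two readings diverge only when a company is missing a month, in which case this version skips a fixed calendar window rather than a fixed number of rows.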
For example, I can do it with a loop like this:
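for company in df['Company'].unique():
    # Sort this company's rows chronologically; the dates are DD.MM.YYYY strings,
    # so sort by the parsed dates rather than the raw text (key= needs pandas >= 1.1)
    company_df = df[df['Company'] == company].sort_values(
        by='Date', key=lambda s: pd.to_datetime(s, format='%d.%m.%Y'))
    # Drop the first 2 (= N) rows of this company
    ind_to_drop = company_df.iloc[:2].index
    df = df.drop(ind_to_drop)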
I am looking for a more efficient way to do this than looping over the companies.
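One vectorized replacement for the loop, sketched under the assumption that "first N months" means the first N rows of each company once its rows are in date order, uses `groupby(...).cumcount()` to number the rows within each company and keeps those numbered N or higher. Dates are again assumed to be day-first (DD.MM.YYYY):

```python
import numpy as np
import pandas as pd

N = 2  # number of leading rows (months) to drop per company

df = pd.DataFrame({'Company': ['company A'] * 5 + ['company B'] * 5,
                   'Date': ['01.01.2020', '01.02.2020', '01.03.2020', '01.04.2020', '01.05.2020'] +
                           ['01.04.2020', '01.05.2020', '01.06.2020', '01.07.2020', '01.08.2020'],
                   'Revenue': np.random.rand(10) * 10000})

# Assumption: dates are day-first strings; parse them so sorting is chronological.
df['Date'] = pd.to_datetime(df['Date'], format='%d.%m.%Y')
df = df.sort_values(['Company', 'Date'])

# cumcount() numbers the rows of each company 0, 1, 2, ... in the current (sorted) order,
# so a single boolean mask drops the first N rows of every company without a Python loop.
result = df[df.groupby('Company').cumcount() >= N]
print(result)
```

The original index labels are preserved, as in the expected output above. If the `Date` column must stay as strings, `sort_values` also accepts a `key=` function (pandas 1.1+) so the sort can use parsed dates without converting the column.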