I have a DataFrame like the one below and want to add a "Streak" column to it (see the example below):
Date Home_Team Away_Team Winner Streak
2005-08-06 A G A 0
2005-08-06 B H H 0
2005-08-06 C I C 0
2005-08-06 D J J 0
2005-08-06 E K K 0
2005-08-06 F L F 0
2005-08-13 A B A 1
2005-08-13 C D D 1
2005-08-13 E F F 0
2005-08-13 G H H 0
2005-08-13 I J J 0
2005-08-13 K L K 1
2005-08-20 B C B 0
2005-08-20 A D A 2
2005-08-20 G K K 0
2005-08-20 I E E 0
2005-08-20 F H F 2
2005-08-20 J L J 2
2005-08-27 A H A 3
2005-08-27 B F B 1
2005-08-27 J C C 3
2005-08-27 D E D 0
2005-08-27 I K K 0
2005-08-27 L G G 0
2005-09-05 B A A 2
2005-09-05 D C D 1
2005-09-05 F E F 0
2005-09-05 H G H 0
2005-09-05 J I I 0
2005-09-05 K L K 4
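The DataFrame has roughly 200,000 rows, spanning 2005 to 2020.
For each row, I want to count how many consecutive matches the home team had won before the date in the "Date" column. I have a solution, but it is far too slow: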
df["Streak"] = 0

def home_streak(x):  # x is a row of the DataFrame
    """Keep track of a team's win streak."""
    home_team = x["Home_Team"]
    date = x["Date"]
    # all previous matches for the home team, most recent first
    home_df = df[(df["Home_Team"] == home_team) | (df["Away_Team"] == home_team)]
    home_df = home_df[home_df["Date"] < date].sort_values(by="Date", ascending=False).reset_index()
    if len(home_df.index) == 0:  # no previous matches for that team, so start streak at 0
        return 0
    elif home_df.iloc[0]["Winner"] != home_team:  # lost the last match
        return 0
    else:  # they won the last game
        winners = home_df["Winner"]
        streak = 0
        for i in winners.index:
            if home_df.iloc[i]["Winner"] == home_team:
                streak += 1
            else:  # they lost, return the streak
                return streak
        return streak  # every previous match was a win

df["Streak"] = df.apply(home_streak, axis=1)
How can I speed this up?
What happens when the team wins as the away side? What if it loses? Does that continue/end the streak, or is the information lost? - Mad Physicist
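
One possible way to avoid the per-row filtering is sketched below. It is not the asker's method, and it takes the example table as the specification: a win counts toward the streak whether it came at home or away, and any loss resets the streak to 0. It also assumes that a team plays at most one match per date and that the "Date" values can be parsed by pd.to_datetime. The function name add_home_streak and the helper columns (match_id, is_home, won, streak_before) are made up for illustration. The idea is to reshape the data to one row per (match, team), sort by team and date, count wins within each run between losses using grouped cumulative sums, and shift by one match so that each row only sees earlier results.

```python
import numpy as np
import pandas as pd


def add_home_streak(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'Streak' column for the home team.

    Assumptions (not confirmed by the question): away wins count,
    any loss resets the streak, one match per team per date.
    """
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    n = len(out)
    match_id = np.arange(n)  # positional id, independent of df's index

    # Long format: every match appears twice, once per participating team.
    long = pd.DataFrame({
        "match_id": np.tile(match_id, 2),
        "Date": np.tile(out["Date"].to_numpy(), 2),
        "Team": np.concatenate([out["Home_Team"].to_numpy(),
                                out["Away_Team"].to_numpy()]),
        "is_home": np.repeat([True, False], n),
    })
    long["won"] = long["Team"].to_numpy() == np.tile(out["Winner"].to_numpy(), 2)

    # Chronological order within each team.
    long = long.sort_values(["Team", "Date"]).reset_index(drop=True)

    won = long["won"].astype(int)
    lost = 1 - won
    # Each loss opens a new block; the running loss count labels the block.
    block = lost.groupby(long["Team"]).cumsum()
    # Wins accumulated inside the current block, including the current match.
    streak_incl = won.groupby([long["Team"], block]).cumsum()
    # Shift by one match per team so the current result is excluded.
    long["streak_before"] = (
        streak_incl.groupby(long["Team"]).shift(1).fillna(0).astype(int)
    )

    # Keep only the home-team rows and put them back in the original row order.
    home = long[long["is_home"]].set_index("match_id")["streak_before"]
    out["Streak"] = home.reindex(match_id).to_numpy()
    return out
```

Calling df = add_home_streak(df) would replace the apply step. Because the work is a single sort plus a few grouped cumulative operations, it should scale much better than re-filtering the whole DataFrame for every row, though I have not timed it on the full 200,000 rows.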