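Background: I want to modify the values in a DataFrame so that only the top 20 sports are kept and every other sport is shown as 'Others'. I start from a copy of the existing column: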
athlete_events['Sport_modified'] = athlete_events['Sport']
The filter containing the names of the top 20 sports is built like this:
top20_sport = athlete_events['Sport'].value_counts().head(20).index
I tried to do the modification in two ways.
Method 1:
def classify_sports(cols, filters):
    for i in cols:
        if i in filters:
            pass
        else:
            i = 'Others'

classify_sports(athlete_events.Sport_modified, top20_sport)
Method 2:
athlete_events.Sport_modified.apply(lambda x : x if x in top20_sport else 'Others')
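However, neither of the two methods above had any effect: the Sport_modified column stayed unchanged. The only way I could get the result I wanted was with code like this: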
athlete_events.loc[
    (athlete_events['Sport'] != 'Athletics') &
    (athlete_events['Sport'] != 'Gymnastics') &
    (athlete_events['Sport'] != 'Swimming') &
    (athlete_events['Sport'] != 'Shooting') &
    (athlete_events['Sport'] != 'Cycling') &
    (athlete_events['Sport'] != 'Fencing') &
    (athlete_events['Sport'] != 'Rowing') &
    (athlete_events['Sport'] != 'Cross Country Skiing') &
    (athlete_events['Sport'] != 'Alpine Skiing') &
    (athlete_events['Sport'] != 'Wrestling') &
    (athlete_events['Sport'] != 'Football') &
    (athlete_events['Sport'] != 'Sailing') &
    (athlete_events['Sport'] != 'Equestrianism') &
    (athlete_events['Sport'] != 'Canoeing') &
    (athlete_events['Sport'] != 'Boxing') &
    (athlete_events['Sport'] != 'Speed Skating') &
    (athlete_events['Sport'] != 'Ice Hockey') &
    (athlete_events['Sport'] != 'Hockey') &
    (athlete_events['Sport'] != 'Biathlon') &
    (athlete_events['Sport'] != 'Basketball'),
    'Sport_modified'
] = 'Others'
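For reference, the long chain of != comparisons above can probably be written more compactly with Series.isin, reusing the index of top sports instead of typing each name out. This is only a minimal sketch: the tiny DataFrame below is a made-up stand-in for the real athlete_events, and it keeps the top 2 sports rather than the top 20 so the toy data stays small (both are assumptions for illustration).

```python
import pandas as pd

# Hypothetical stand-in for athlete_events; the real data has many more rows.
athlete_events = pd.DataFrame(
    {"Sport": ["Athletics", "Athletics", "Athletics",
               "Swimming", "Swimming", "Curling"]}
)

# Top 2 instead of top 20, only to keep the toy example small.
top_sports = athlete_events["Sport"].value_counts().head(2).index

# Start from a copy of the original column, as in the question.
athlete_events["Sport_modified"] = athlete_events["Sport"]

# Boolean mask: True for rows whose sport is NOT among the top sports.
not_top = ~athlete_events["Sport"].isin(top_sports)

# Assign through .loc so the change is written back into the DataFrame.
athlete_events.loc[not_top, "Sport_modified"] = "Others"

print(athlete_events["Sport_modified"].tolist())
# ['Athletics', 'Athletics', 'Athletics', 'Swimming', 'Swimming', 'Others']
```

With the real data, replacing head(2) with head(20) should give the same selection as the hand-written condition, provided the 20 names listed there match the output of value_counts().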
What is wrong with the two methods above? Thanks for your help.