My raw data looks like this.
   id   season home_team away_team  home_goals  away_goals result winner
0   0  2006-07       Shu       Liv           1           1      D    NaN
1   1  2006-07       Ars       Avl           1           1      D    NaN
2   2  2006-07       Eve       Wat           2           1      H    Eve
3   3  2006-07       New       Wig           2           1      H    New
4   4  2006-07       Por       Bla           3           0      H    Por
The goal is to build a model that predicts the probability of each outcome, i.e.
Home Team Win 55%
Draw 13%
Away Team Win 32%
I selected these three columns and label-encoded them.
home_team, away_team, winner
Then I created these new classes/labels.
# 1 if the home side won, else 0 (for draws, winner is NaN and never compares equal)
df["home_team_win"] = (df["winner"] == df["home_team"]).astype(int)
# 1 if the match was drawn, else 0
df["draw"] = (df["result"] == "D").astype(int)
# 1 if the away side won, else 0
df["away_team_win"] = (df["winner"] == df["away_team"]).astype(int)
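Since every match has exactly one outcome, the three indicator columns above carry the same information as a single three-class column. A minimal sketch of that equivalent target follows, assuming `result` only ever holds `H`, `D`, or `A` (the `A` code for an away win is an assumption, as the sample shows only `H` and `D`). The toy rows, the `outcome` column name, and the `outcome_codes` mapping are hypothetical.

```python
import pandas as pd

# Toy rows in the shape of the sample above; the last row and the 'A'
# code for an away win are assumptions made for illustration only.
df = pd.DataFrame({
    "home_team": ["Shu", "Ars", "Eve", "New", "Wat"],
    "away_team": ["Liv", "Avl", "Wat", "Wig", "Eve"],
    "result":    ["D",   "D",   "H",   "H",   "A"],
})

# Hypothetical integer encoding: one class per mutually exclusive outcome
outcome_codes = {"H": 0, "D": 1, "A": 2}
df["outcome"] = df["result"].map(outcome_codes)

# The same information as three 0/1 indicator columns (one per result code)
one_hot = pd.get_dummies(df["result"])
```

Each row of `one_hot` has exactly one active column, which is what makes this a multiclass target rather than a multilabel one.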
Now the encoded data looks like this:
   home_team  away_team  home_team_win  away_team_win  draw
0         28         19              0              0     1
1          1          2              0              0     1
2         14         34              1              0     0
3         23         37              1              0     0
4         25          4              1              0     0
Initially, I used the code below for the single label 'home_team_win'. It works fine, but it does not support multiple classes/labels.
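from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X = prediction_df.drop(['home_team_win'], axis=1)
y = prediction_df['home_team_win']
logReg = LogisticRegression(solver='lbfgs')
# n_features_to_select must be passed by keyword in recent scikit-learn releases
rfe = RFE(logReg, n_features_to_select=20)
rfe = rfe.fit(X, y.values.ravel())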
How can I do multi-label or multi-class classification for this problem?
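One possible direction, sketched under assumptions rather than offered as a definitive answer: because home win, draw, and away win are mutually exclusive, this can be treated as a single multiclass target, and `LogisticRegression.predict_proba` then returns one probability per class. The match rows below are made up purely so the sketch runs. One-hot encoding the team names is a swapped-in choice (instead of the label encoding used above), so the model does not read the integer codes as ordered. The only predictors are the two team columns, since columns derived from the result would leak the answer.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up toy matches, only to make the sketch runnable; 'A' = away win (assumed)
matches = pd.DataFrame({
    "home_team": ["Eve", "New", "Por", "Shu", "Ars", "Liv", "Wat", "Eve", "Liv"],
    "away_team": ["Wat", "Wig", "Bla", "Liv", "Avl", "Eve", "New", "Ars", "Por"],
    "result":    ["H",   "H",   "H",   "D",   "D",   "A",   "A",   "D",   "A"],
})

X = matches[["home_team", "away_team"]]
y = matches["result"]  # single multiclass target: H, D or A

# One-hot encode both team columns; unseen teams are encoded as all zeros
encoder = ColumnTransformer(
    [("teams", OneHotEncoder(handle_unknown="ignore"), ["home_team", "away_team"])]
)
model = make_pipeline(encoder, LogisticRegression(solver="lbfgs", max_iter=1000))
model.fit(X, y)

# Class probabilities for a hypothetical fixture
new_match = pd.DataFrame({"home_team": ["Eve"], "away_team": ["Liv"]})
proba = model.predict_proba(new_match)[0]
probs = dict(zip(model[-1].classes_, proba))
print(probs)
```

The keys of `probs` are the class labels and the values sum to 1, which matches the "Home Team Win / Draw / Away Team Win" percentages described at the top. With so few toy rows the numbers themselves mean nothing.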