I'm working with the Titanic dataset and trying to fill in the missing Age values. My DataFrame looks like this:
DataFrame df
   Survived  Pclass   Age  SibSp  Parch     Fare  male  Q  S  Title
0         0       3  22.0      1      0   7.2500     1  0  1     Mr
1         1       1  38.0      1      0  71.2833     0  0  0    Mrs
2         1       3  26.0      0      0   7.9250     0  0  1   Miss
3         1       1  35.0      1      0  53.1000     0  0  1    Mrs
4         0       3  35.0      0      0   8.0500     1  0  1     Mr
5         0       3   NaN      0      0   8.4583     1  1  0     Mr
and
DataFrame age_df
                3        1        2
Mr        28.7249  41.5805  32.7683
Mrs       33.5152  40.8824  33.6829
Miss      16.1232       30  22.3906
Master    5.35083  5.30667  2.25889
Don            40       40       40
Rev       43.1667  43.1667  43.1667
Dr             42    43.75     38.5
Mme            24       24       24
Ms             28       28       28
Major        48.5     48.5     48.5
Lady           48       48       48
Sir            49       49       49
Mlle           24       24       24
Col            58       58       58
Capt           70       70       70
Countess       33       33       33
Jonkheer       38       38       38
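I want to fill the missing values in df['Age'] with the corresponding value from age_df, looked up by df['Title'] (row) and df['Pclass'] (column).
I came up with the approach below, but none of the NaN values get overwritten.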
for tit in df['Title'].unique():
    for cls in [1,2,3]:
        df.loc[ (df['Age'].isna() == True) &
                (df['Title'] == tit) &
                (df['Pclass'] == cls)]['Age'] = age_df.loc[tit][cls]
Also, I don't think this should need nested loops. How should I do it?
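
For reference, here is a minimal sketch of one loop-free way this kind of lookup could be done. It assumes that age_df is indexed by Title and that its column labels are the integers 1, 2, 3 (not strings). The small frames below are stand-ins built from the first rows shown above, and the names `lookup`, `fill` and `AgeFill` are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in for df: only the columns needed for the fill are included.
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan],
    "Title": ["Mr", "Mrs", "Miss", "Mrs", "Mr", "Mr"],
})

# Stand-in for age_df. Assumption: index = Title, columns = integer Pclass.
age_df = pd.DataFrame(
    {
        3: [28.7249, 33.5152, 16.1232],
        1: [41.5805, 40.8824, 30.0],
        2: [32.7683, 33.6829, 22.3906],
    },
    index=["Mr", "Mrs", "Miss"],
)

# Reshape the wide table into a Series keyed by (Title, Pclass).
lookup = age_df.stack()
lookup.index.names = ["Title", "Pclass"]
lookup.name = "AgeFill"  # hypothetical column name for the joined values

# Left-join the lookup onto df by its Title and Pclass columns;
# df's own index is preserved, so the result aligns row by row.
fill = df.join(lookup, on=["Title", "Pclass"])["AgeFill"]

# Only rows where Age is NaN take the looked-up value.
df["Age"] = df["Age"].fillna(fill)

print(df.loc[5, "Age"])  # 28.7249 (Title "Mr", Pclass 3)
```

As for why the loop changes nothing: `df.loc[mask]['Age'] = value` is chained indexing. `df.loc[mask]` returns a new object, and the assignment is applied to that temporary copy rather than to df. Passing the row mask and the column label in a single call, `df.loc[mask, 'Age'] = value`, writes into df itself.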