Starting from a dataframe that contains both numerical and nominal data:
>>> import pandas as pd
>>> d = {'m': {0: 'M1', 1: 'M2', 2: 'M7', 3: 'M1', 4: 'M2', 5: 'M1'},
...      'qj': {0: 'q23', 1: 'q4', 2: 'q9', 3: 'q23', 4: 'q23', 5: 'q9'},
...      'Budget': {0: 39, 1: 15, 2: 13, 3: 53, 4: 82, 5: 70}}
>>> df = pd.DataFrame.from_dict(d)
>>> df
   Budget   m   qj
0      39  M1  q23
1      15  M2   q4
2      13  M7   q9
3      53  M1  q23
4      82  M2  q23
5      70  M1   q9
The get_dummies function converts categorical variables into dummy/indicator variables:
>>> df_dummies = pd.get_dummies(df)
>>> df_dummies
   Budget  m_M1  m_M2  m_M7  qj_q23  qj_q4  qj_q9
0      39     1     0     0       1      0      0
1      15     0     1     0       0      1      0
2      13     0     0     1       0      0      1
3      53     1     0     0       1      0      0
4      82     0     1     0       1      0      0
5      70     1     0     0       0      0      1
What is an elegant way to get from df_dummies back to df?
>>> (back_from_dummies(df_dummies) == df).all()
Budget    True
m         True
qj        True
dtype: bool
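
For illustration, here is one possible sketch of such a back_from_dummies helper. It is not a pandas built-in, and it rests on assumptions that go beyond the question: every dummy column is named '<original column><sep><value>' with the default separator "_", column labels are strings, no pass-through column (such as Budget) contains the separator, and each row has exactly one active dummy per original column. The sep parameter and the internal variable names are hypothetical.

```python
import pandas as pd


def back_from_dummies(df_dummies, sep="_"):
    """Sketch of an inverse of pd.get_dummies (hypothetical helper).

    Assumes dummy columns are named '<original><sep><value>' and that
    columns without `sep` in their name were never encoded.
    """
    groups = {}       # original column name -> list of its dummy columns
    passthrough = []  # columns that were left untouched by get_dummies
    for col in df_dummies.columns:
        if sep in col:
            prefix = col.split(sep, 1)[0]
            groups.setdefault(prefix, []).append(col)
        else:
            passthrough.append(col)

    result = df_dummies[passthrough].copy()
    for prefix, cols in groups.items():
        # Cast to int so the lookup works whether the dummies are 0/1 or
        # booleans; idxmax(axis=1) gives the label of the active column.
        active = df_dummies[cols].astype(int).idxmax(axis=1)
        # Strip '<prefix><sep>' to recover the original category value.
        result[prefix] = active.str[len(prefix) + len(sep):]
    return result


d = {'m': {0: 'M1', 1: 'M2', 2: 'M7', 3: 'M1', 4: 'M2', 5: 'M1'},
     'qj': {0: 'q23', 1: 'q4', 2: 'q9', 3: 'q23', 4: 'q23', 5: 'q9'},
     'Budget': {0: 39, 1: 15, 2: 13, 3: 53, 4: 82, 5: 70}}
df = pd.DataFrame.from_dict(d)
restored = back_from_dummies(pd.get_dummies(df))

# Column order may differ between pandas versions, so align before comparing.
print((restored[df.columns] == df).all())
```

The explicit reindexing with df.columns is there because == on two dataframes requires identical labels in the same order. Recent pandas versions (1.5 and later) also provide pd.from_dummies, which may be worth checking before rolling your own helper.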