I have a DataFrame with duplicate index values:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(6, 6), columns=pd.date_range('1/1/2010', periods=6), index=["A", "C", "B", "F", "D", "E"])
df1.rename(index={"C": "A", "B": "E"}, inplace=True)
ipdb> df1
   2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A   -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F    1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D    0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560
I only want to rename the duplicated index labels, so that the DataFrame looks like this:
ipdb> df1
      2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A      -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A_dp   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E      -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F       1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D       0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E_dp   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560
My approach:
(i) Create a dictionary of new names
old_names = df1[df1.index.duplicated()].index.values
new_names = df1[df1.index.duplicated()].index.values + "_dp"
dictionary = dict(zip(old_names, new_names))
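Step (i) builds a mapping keyed by label, not by row position. A quick check on a small hypothetical stand-in for `df1` (placeholder zeros, same duplicated labels) shows the consequence: applying that dictionary with `rename` changes every row labelled `A` or `E`, including the first occurrences.

```python
import numpy as np
import pandas as pd

# Small stand-in for df1 with the same duplicated labels as in the question.
df1 = pd.DataFrame(np.zeros((6, 2)), index=["A", "A", "E", "F", "D", "E"])

old_names = df1[df1.index.duplicated()].index.values
new_names = df1[df1.index.duplicated()].index.values + "_dp"
dictionary = dict(zip(old_names, new_names))
print(dictionary)  # {'A': 'A_dp', 'E': 'E_dp'}

# rename() maps labels, so both "A" rows and both "E" rows are renamed.
renamed = df1.rename(index=dictionary)
print(renamed.index.tolist())
```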
(ii) Rename only the duplicated rows
df1.loc[df1.index.duplicated(),:].rename(index = dictionary, inplace = True)
But this doesn't seem to work: df1 is left unchanged.
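A likely reason: `df1.loc[mask, :]` with a boolean mask returns a new DataFrame, so `rename(..., inplace=True)` modifies that temporary object and `df1` itself is never touched. A minimal sketch on a hypothetical stand-in frame, followed by a positional alternative that edits a copy of the labels and assigns it back to `df1.index` (the name `labels` is illustrative only):

```python
import numpy as np
import pandas as pd

# Small stand-in for df1 with the same duplicated labels (placeholder data).
df1 = pd.DataFrame(np.zeros((6, 2)), index=["A", "A", "E", "F", "D", "E"])
mask = df1.index.duplicated()

# Boolean .loc selection returns a new DataFrame, so inplace=True renames
# that temporary object and df1 keeps its old labels.
df1.loc[mask, :].rename(index={"A": "A_dp", "E": "E_dp"}, inplace=True)
print(df1.index.tolist())  # ['A', 'A', 'E', 'F', 'D', 'E']

# Positional fix: edit a copy of the labels, then assign it back to df1.index.
labels = df1.index.to_numpy(copy=True)
labels[mask] = labels[mask] + "_dp"
df1.index = labels
print(df1.index.tolist())
```

Because only the positions selected by `mask` are edited, the first `A` and first `E` keep their original names, which the label-based dictionary cannot do.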
df1['col'] = df1['col'] + df1.groupby(['col']).cumcount().astype(str).replace('0','')
- DuCorey
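DuCorey's suggestion numbers repeats with `groupby(...).cumcount()` on a column called `col`. Pointed at the index instead (grouping with `level=0`), the same counter can drive the `_dp` naming. A sketch on a hypothetical 7-row frame in which `A` appears three times; the helper `new_label` and the `A_dp2` form for a third occurrence are assumptions, since the question only shows pairs.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where "A" occurs three times (placeholder data).
df1 = pd.DataFrame(np.zeros((7, 2)), index=["A", "A", "E", "F", "D", "E", "A"])

# 0 for the first occurrence of each label, 1 for the second, 2 for the third, ...
counts = df1.groupby(level=0).cumcount().to_numpy()

def new_label(label, n):
    # Hypothetical naming scheme: A, A_dp, A_dp2, A_dp3, ...
    if n == 0:
        return label
    return f"{label}_dp" if n == 1 else f"{label}_dp{n}"

df1.index = [new_label(label, n) for label, n in zip(df1.index, counts)]
print(df1.index.tolist())
```

The first repeat gets a plain `_dp` to match the desired output, while later repeats stay unique.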