Pandas- 基于参考字典重复数据框列

3

我需要根据一个参考字典来重命名和复制我的数据框列。下面我创建了一个虚拟数据框:

rawdata= {'id':['json','molly','tina','jake','molly'],'entity':['present','absent','absent','present','present'],'entity2':['present','present','present','absent','absent'],'entity3':['absent','absent','absent','present','absent']}
df= pd.DataFrame(rawdata)
df.set_index('id')

        entity  entity2  entity3
id                              
json   present  present   absent
molly   absent  present   absent
tina    absent  present   absent
jake   present   absent  present
molly  present   absent   absent

现在我有以下示例字典:
ref_dict= {'entity':['entity_exp1'],'entity2':['entity2_exp1','entity2_exp2'],'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}

我需要按照字典的值替换列名,如果一个列有多个值,则应该重复该列。以下是我期望的数据框:

       entity_exp1  entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id                      
json    present      present      present      absent      absent    absent
molly   absent       present      present      absent      absent    absent
tina    absent       present      present      absent      absent    absent
jake    present      absent       absent       present     present   present
molly   present      absent       absent       absent      absent    absent

感谢您接受我的答案。也请随意投票支持该答案。 - piRSquared
谢谢piRSquared。你总是有最棒的解决方案。 - Rtut
4个回答

2

选项 1
使用字典推导式上的 pd.concat

pd.concat({k: df[v] for v, l in ref_dict.items() for k in l}, axis=1)

      entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3 entity_exp1
id                                                                                
json       present      present       absent       absent       absent     present
molly      present      present       absent       absent       absent      absent
tina       present      present       absent       absent       absent      absent
jake        absent       absent      present      present      present     present
molly       absent       absent       absent       absent       absent     present

选项2
切片数据框并重命名列

repeats = df.columns.map(lambda x: len(ref_dict[x]))
d1 = df.reindex_axis(df.columns.repeat(repeats), 1)
d1.columns = df.columns.map(ref_dict.get).values.sum()
d1

      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id                                                                                
json      present      present      present       absent       absent       absent
molly      absent      present      present       absent       absent       absent
tina       absent      present      present       absent       absent       absent
jake      present       absent       absent      present      present      present
molly     present       absent       absent       absent       absent       absent

0

对于 df 中的每一列,您可以查找在 ref_dict 中的新列数,并为它们创建 新列,最后删除旧列。您可以尝试以下操作:

# for key, value in ref_dict where old column and new columns are 
for old_column,new_columns in ref_dict.items():
    for new_column in new_columns:  # for each new_column in new_columns defined
        df[new_column] = df[old_column] # the content remains same as old column
    del df[old_column]  # now remove the old column

0
你可以简单地使用循环:
rawdata= {'id':['json','molly','tina','jake','molly'],
          'entity':['present','absent','absent','present','present'],
          'entity2':['present','present','present','absent','absent'],
          'entity3':['absent','absent','absent','present','absent']}
df= pd.DataFrame(rawdata)
df.set_index('id')
ref_dict= {'entity':['entity_exp1'],
           'entity2':['entity2_exp1','entity2_exp2'],
           'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}

# here comes the new part:
df2 = pd.DataFrame()
for key, val in sorted(ref_dict.items()):
    for subval in val:
        df2[subval] = df[key]

df2['id'] = df['id']
df2.set_index('id', inplace=True)

print(df2)
      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2  entity3_exp3  
id                                                                      
json      present      present      present       absent       absent        absent   
molly      absent      present      present       absent       absent        absent   
tina       absent      present      present       absent       absent        absent   
jake      present       absent       absent      present      present       present    
molly     present       absent       absent       absent       absent        absent   

0

您可以使用字典键作为列名重新索引您的数据框,并使用字典的值重命名这些列。

df_new = df.reindex(columns=sum([[k]*len(v) for k,v in ref_dict.items()],[]))
df_new.columns=sum(ref_dict.values(),[])
df_new
Out[573]: 
  entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
0     present      present      present       absent       absent       absent
1      absent      present      present       absent       absent       absent
2      absent      present      present       absent       absent       absent
3     present       absent       absent      present      present      present
4     present       absent       absent       absent       absent       absent

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接