How to merge multiple CSV files on their common columns and keep the non-common columns as separate columns?


I have three CSV files containing COVID-19 data. The first contains the number of confirmed cases, the second the number of deaths, and the third the number of recoveries.

This is what the DataFrames look like:

import pandas as pd

df1 = pd.read_csv('/Users/sr/covid_csvs/confirmed.csv')

df2 = pd.read_csv('/Users/sr/covid_csvs/deaths.csv')

df3 = pd.read_csv('/Users/sr/covid_csvs/recovery.csv')

print(df1.head(5))

  Province/State Country/Region      Lat     Long     Date  Confirmed
0            NaN    Afghanistan  33.0000  65.0000  1/22/20          0
1            NaN        Albania  41.1533  20.1683  1/22/20          0
2            NaN        Algeria  28.0339   1.6596  1/22/20          0
3            NaN        Andorra  42.5063   1.5218  1/22/20          0
4            NaN         Angola -11.2027  17.8739  1/22/20          0


print(df2.head(5))

  Province/State Country/Region      Lat     Long     Date     Deaths
0            NaN    Afghanistan  33.0000  65.0000  1/22/20          0
1            NaN        Albania  41.1533  20.1683  1/22/20          0
2            NaN        Algeria  28.0339   1.6596  1/22/20          0
3            NaN        Andorra  42.5063   1.5218  1/22/20          0
4            NaN         Angola -11.2027  17.8739  1/22/20          0


print(df3.head(5))

  Province/State Country/Region      Lat     Long     Date  Recovery
0            NaN    Afghanistan  33.0000  65.0000  1/22/20         0
1            NaN        Albania  41.1533  20.1683  1/22/20         0
2            NaN        Algeria  28.0339   1.6596  1/22/20         0
3            NaN        Andorra  42.5063   1.5218  1/22/20         0
4            NaN         Angola -11.2027  17.8739  1/22/20         0

Now I want to merge all three DataFrames to get the following result:

  Province/State Country/Region      Lat     Long     Date  Confirmed  Deaths Recovery
0            NaN    Afghanistan  33.0000  65.0000  1/22/20          0       0        0
1            NaN        Albania  41.1533  20.1683  1/22/20          0       0        0
2            NaN        Algeria  28.0339   1.6596  1/22/20          0       0        0
3            NaN        Andorra  42.5063   1.5218  1/22/20          0       0        0
4            NaN         Angola -11.2027  17.8739  1/22/20          0       0        0

So I tried the following:

df_merged = pd.concat([df1, df2, df3])
df_merged.to_csv('merged.csv', sep=',', encoding='utf-8', index=False)

But I don't get the CSV file I want. How can I do this?
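To see what the attempt above actually produces, here is a minimal reproduction. It is a sketch that builds the first two rows shown above in memory instead of reading the CSV files. `pd.concat` defaults to `axis=0`, so it stacks the three frames vertically (one block of rows per file) rather than placing the value columns side by side.

```python
import numpy as np
import pandas as pd

# In-memory stand-ins for confirmed.csv, deaths.csv and recovery.csv
# (first two rows from the question's output)
common = {
    "Province/State": [np.nan, np.nan],
    "Country/Region": ["Afghanistan", "Albania"],
    "Lat": [33.0, 41.1533],
    "Long": [65.0, 20.1683],
    "Date": ["1/22/20", "1/22/20"],
}
df1 = pd.DataFrame({**common, "Confirmed": [0, 0]})
df2 = pd.DataFrame({**common, "Deaths": [0, 0]})
df3 = pd.DataFrame({**common, "Recovery": [0, 0]})

# Default axis=0: rows are stacked, columns become the union of all columns
df_stacked = pd.concat([df1, df2, df3])

print(df_stacked.shape)  # (6, 8): 3 x 2 rows instead of 2
# Each value column is filled only for the rows that came from its own file
print(df_stacked["Confirmed"].isna().sum())  # 4
```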

1 Answer

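The idea is to use DataFrame.set_index to turn the common columns into a MultiIndex for each DataFrame, then concat with axis=1 so the frames are aligned on that index and placed side by side, and finally remove index=False from to_csv so the index levels are written to the file: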
cols = ['Province/State', 'Country/Region','Lat','Long','Date']

dfs = [df1, df2, df3]
df_merged = pd.concat([x.set_index(cols) for x in dfs], axis=1)
df_merged.to_csv('merged.csv', sep=',', encoding='utf-8')
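A self-contained sketch of the first approach, assuming in-memory frames built from the first two rows in the question instead of the real CSV files. Calling `to_csv()` without a path returns the CSV text, which makes it easy to check that the index levels end up as the leading columns of the header.

```python
import numpy as np
import pandas as pd

# In-memory stand-ins for the three CSV files (first two rows from the question)
common = {
    "Province/State": [np.nan, np.nan],
    "Country/Region": ["Afghanistan", "Albania"],
    "Lat": [33.0, 41.1533],
    "Long": [65.0, 20.1683],
    "Date": ["1/22/20", "1/22/20"],
}
df1 = pd.DataFrame({**common, "Confirmed": [0, 0]})
df2 = pd.DataFrame({**common, "Deaths": [0, 0]})
df3 = pd.DataFrame({**common, "Recovery": [0, 0]})

cols = ["Province/State", "Country/Region", "Lat", "Long", "Date"]
dfs = [df1, df2, df3]

# Move the shared columns into a MultiIndex, then concatenate side by side;
# with axis=1 pandas aligns the rows on that index
df_merged = pd.concat([x.set_index(cols) for x in dfs], axis=1)

# Without a path, to_csv returns the CSV text; index levels come first
csv_text = df_merged.to_csv()
print(csv_text.splitlines()[0])
# Province/State,Country/Region,Lat,Long,Date,Confirmed,Deaths,Recovery
```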

Alternatively, convert the MultiIndex back into columns with reset_index and keep index=False in to_csv:
cols = ['Province/State', 'Country/Region','Lat','Long','Date']

dfs = [df1, df2, df3]
df_merged = pd.concat([x.set_index(cols) for x in dfs], axis=1).reset_index()
df_merged.to_csv('merged.csv', sep=',', encoding='utf-8', index=False)
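A commonly used alternative, not part of the answer above, is to chain `pd.merge` over the five shared columns with `functools.reduce`. The sketch below again uses in-memory frames built from the first two rows of the question. Unlike SQL, pandas matches missing key values against each other in a merge, so the rows with an empty Province/State still line up. `how="outer"` is chosen here to mirror the default `join="outer"` of `pd.concat`, so a row present in only some files keeps NaN in the other value columns.

```python
from functools import reduce

import numpy as np
import pandas as pd

# In-memory stand-ins for the three CSV files (first two rows from the question)
common = {
    "Province/State": [np.nan, np.nan],
    "Country/Region": ["Afghanistan", "Albania"],
    "Lat": [33.0, 41.1533],
    "Long": [65.0, 20.1683],
    "Date": ["1/22/20", "1/22/20"],
}
df1 = pd.DataFrame({**common, "Confirmed": [0, 0]})
df2 = pd.DataFrame({**common, "Deaths": [0, 0]})
df3 = pd.DataFrame({**common, "Recovery": [0, 0]})

cols = ["Province/State", "Country/Region", "Lat", "Long", "Date"]

# Merge pairwise on the shared key columns: ((df1 + df2) + df3)
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on=cols, how="outer"),
    [df1, df2, df3],
)

print(list(df_merged.columns))
# Key columns are already regular columns, so index=False is appropriate
csv_text = df_merged.to_csv(index=False)
```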

Content provided by Stack Overflow.