Pandas: How to map a dictionary of dictionaries onto 2 columns?

Question

4

I have the following dictionary:

rates = {'USD': 
              {'2019': 1,
               '2020': 2,
               '2021': 3},
         'CAD':
              {'2019': 4,
               '2020': 5,
               '2021': 6}
         }

and I have the following dummy dataframe:

   Item Currency Year Rate
0  1    USD      2019 
1  2    USD      2020
2  3    CAD      2021
3  4    CAD      2019
4  5    GBP      2020

I now want to fill in the Rate column by mapping the correct rate (rate = f(currency, year)). I am trying the following code:

def map_rate(data, rates):

    for index, row in data.iterrows():

        currency = str(row['Currency'])

        if currency in list(rates.keys()):

            year = str(row['Year'])
            rate = rates[currency][year]

        else:
            rate = 1

    return rate

I call it as follows:

df['Rate'] = map_rate(test, rates)

However, this returns only the first rate (here, the value 1) for every row, instead of the matching rate:

    Item Currency Year  Rate
0   1    USD      2019  1
1   2    USD      2020  1
2   3    CAD      2021  1
3   4    CAD      2019  1
4   5    GBP      2020  1

The expected result is:

    Item Currency Year  Rate
0   1    USD      2019  1
1   2    USD      2020  2
2   3    CAD      2021  6
3   4    CAD      2019  4
4   5    GBP      2020  1

Where is my mistake?

- Zizzipupp

2

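As an aside: you can check whether a key exists in the dictionary directly with if currency in rates, rather than if currency in list(rates.keys()). The latter builds a list and loses the ~O(1) lookup time. - Mustafa Aydın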
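On the question itself: map_rate visits every row but keeps overwriting a single rate variable and returns it only once the loop has finished, so pandas broadcasts that one scalar (the value computed for the last row, GBP, which is 1) to the whole column. A minimal sketch of a fix collects one rate per row instead. The example frame, the integer Year column, and the default parameter are assumptions added for illustration, not part of the original post.

```python
import pandas as pd

rates = {'USD': {'2019': 1, '2020': 2, '2021': 3},
         'CAD': {'2019': 4, '2020': 5, '2021': 6}}

# Assumed reconstruction of the asker's frame (Year stored as integers)
df = pd.DataFrame({'Item': [1, 2, 3, 4, 5],
                   'Currency': ['USD', 'USD', 'CAD', 'CAD', 'GBP'],
                   'Year': [2019, 2020, 2021, 2019, 2020]})

def map_rate(data, rates, default=1):
    """Return one rate per row; `default` (hypothetical) covers unknown keys."""
    result = []
    for _, row in data.iterrows():
        currency = str(row['Currency'])
        year = str(row['Year'])  # the keys in `rates` are strings
        if currency in rates:    # direct membership test, no list() needed
            result.append(rates[currency].get(year, default))
        else:
            result.append(default)
    return result

df['Rate'] = map_rate(df, rates)
print(df)
```

Returning a list with one entry per row lets the assignment line up row by row instead of repeating a single value.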

4 Answers

2

Here is one approach: use stack to build a MultiIndex Series from rates, then reindex it with the values from df to get the rate for each row.

df['rate'] = (
    pd.DataFrame(rates)
      .stack()
      .reindex(pd.MultiIndex.from_frame(df[['Year','Currency']].astype(str)), 
               fill_value=1)
     .to_numpy()
)
print(df)
   Item Currency  Year  rate
0     1      USD  2019     1
1     2      USD  2020     2
2     3      CAD  2021     6
3     4      CAD  2019     4
4     5      GBP  2020     1
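For a version that can be pasted and run as-is, here is a sketch of the same stack/reindex idea split into named steps. The example frame is rebuilt by hand, and the integer Year dtype is an assumption about the asker's data; the names lookup and keys are introduced only for illustration.

```python
import pandas as pd

rates = {'USD': {'2019': 1, '2020': 2, '2021': 3},
         'CAD': {'2019': 4, '2020': 5, '2021': 6}}

# Assumed reconstruction of the asker's frame (Year stored as integers)
df = pd.DataFrame({'Item': [1, 2, 3, 4, 5],
                   'Currency': ['USD', 'USD', 'CAD', 'CAD', 'GBP'],
                   'Year': [2019, 2020, 2021, 2019, 2020]})

# Rows are years and columns are currencies, so stack() yields a
# Series indexed by (year, currency)
lookup = pd.DataFrame(rates).stack()

# One (year, currency) key per row of df, cast to str to match the dict keys
keys = pd.MultiIndex.from_frame(df[['Year', 'Currency']].astype(str))

# Keys missing from `lookup` (here GBP) receive the fill value 1
df['rate'] = lookup.reindex(keys, fill_value=1).to_numpy()
print(df)
```

The column order in keys has to be Year first, then Currency, because that is the level order stack() produces here.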

- Ben.T

2

Create another dataframe to hold the rate information.

rates_df = pd.DataFrame(rates).unstack().reset_index()
rates_df.columns = ['Currency', 'Year', 'Rates']
rates_df['Year'] = rates_df['Year'].astype(int)

Then merge.

df.merge(rates_df, on=['Currency', 'Year'], how='left').fillna(1)

Rates dataframe

  Currency  Year  Rates
0      USD  2019      1
1      USD  2020      2
2      USD  2021      3
3      CAD  2019      4
4      CAD  2020      5
5      CAD  2021      6

Output

   Item Currency  Year  Rates
0     1      USD  2019    1.0
1     2      USD  2020    2.0
2     3      CAD  2021    6.0
3     4      CAD  2019    4.0
4     5      GBP  2020    1.0
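Two details of the merge are worth noting: calling fillna(1) on the whole frame fills missing values in every column, not just Rates, and the NaN introduced for unmatched rows turns Rates into floats. A hedged sketch that fills only the Rates column and restores integers might look like this. The example frame and its integer Year column are assumptions, and merged is a name introduced for illustration.

```python
import pandas as pd

rates = {'USD': {'2019': 1, '2020': 2, '2021': 3},
         'CAD': {'2019': 4, '2020': 5, '2021': 6}}

# Assumed reconstruction of the asker's frame (Year stored as integers)
df = pd.DataFrame({'Item': [1, 2, 3, 4, 5],
                   'Currency': ['USD', 'USD', 'CAD', 'CAD', 'GBP'],
                   'Year': [2019, 2020, 2021, 2019, 2020]})

# One row per (Currency, Year) pair, as in the answer above
rates_df = pd.DataFrame(rates).unstack().reset_index()
rates_df.columns = ['Currency', 'Year', 'Rates']
rates_df['Year'] = rates_df['Year'].astype(int)  # match df's integer Year

# A left merge keeps every row of df; unmatched rows (GBP) get NaN in Rates
merged = df.merge(rates_df, on=['Currency', 'Year'], how='left')

# Fill only the Rates column, then cast back to integers
merged['Rates'] = merged['Rates'].fillna(1).astype(int)
print(merged)
```

If the real rates are fractional, the final astype(int) should be dropped.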

- Vishnudev Krishnadas

1

This can be done easily with the built-in pandas method df.apply(). Below is an example that is more verbose than the other posted answers. Code:

def get_rate(row):
  # Keys in rates are strings, so cast Year in case the column holds integers
  if row['Currency'] in rates:
    return rates[row['Currency']][str(row['Year'])]
  else:
    return 1

df['Rate'] = df.apply(get_rate, axis=1)

print(df)

- filiabel

- Rakesh · Accepted Answer

Use .apply

Example:

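# Raises KeyError if a currency or year is missing from rates (e.g. GBP);
# str() guards against Year being stored as integers
df['Rate'] = df.apply(lambda x: rates[x['Currency']][str(x['Year'])], axis=1)
# OR, falling back to 1 when the currency or year is missing
df['Rate'] = df.apply(lambda x: rates.get(x['Currency'], dict()).get(str(x['Year']), 1), axis=1)
print(df)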

Output:

  Item Currency  Year  Rate
0    1      USD  2019     1
1    2      USD  2020     2
2    3      CAD  2021     6
3    4      CAD  2019     4
4    5      GBP  2020     1
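As an alternative that avoids row-wise apply, the nested dictionary can be flattened into a plain dict keyed by (currency, year) tuples and looked up while iterating over the two columns together. This is a different technique from the answer above, sketched under the assumption that Year may be stored as integers; flat is a hypothetical name.

```python
import pandas as pd

rates = {'USD': {'2019': 1, '2020': 2, '2021': 3},
         'CAD': {'2019': 4, '2020': 5, '2021': 6}}

# Assumed reconstruction of the asker's frame (Year stored as integers)
df = pd.DataFrame({'Item': [1, 2, 3, 4, 5],
                   'Currency': ['USD', 'USD', 'CAD', 'CAD', 'GBP'],
                   'Year': [2019, 2020, 2021, 2019, 2020]})

# Flatten {currency: {year: rate}} into {(currency, year): rate}
flat = {(cur, year): rate
        for cur, by_year in rates.items()
        for year, rate in by_year.items()}

# Look up each row's (Currency, Year) pair; fall back to 1 when it is unknown
df['Rate'] = [flat.get((cur, str(year)), 1)
              for cur, year in zip(df['Currency'], df['Year'])]
print(df)
```

Because it only iterates over two columns rather than building a Series per row, this tends to be lighter than apply with axis=1 on large frames.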