Convert a Pandas DataFrame to a dictionary

Question

python python-3.x pandas dictionary dataframe

4

I have a pandas DataFrame like this:

import pandas as pd
df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})
df

It looks like this:

    a       b   c   d
0   red     0   0   1
1   yellow  0   1   0
2   blue    1   0   0

I want to convert it to a dictionary so that I get the following result:

red     d
yellow  c
blue    b

The dataset is large, so please avoid any iterative approach. I have not been able to come up with a solution yet. Any help is appreciated.

- singh

1

Possible duplicate of "Convert a Pandas DataFrame to a dictionary". - Vivek Kalyanarangan

Subset your data, then use the to_dict function that pandas provides to convert it to a dictionary. For usage, see: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html - Vivek Kalyanarangan

Can there be two consecutive 1s? - Tai

1

@tai: Each row contains exactly one 1. - singh

6 Answers

2

You can try this.

df = df.set_index('a')
df.where(df > 0).stack().reset_index().drop(0, axis=1)


    a   level_1
0   red     d
1   yellow  c
2   blue    b
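
The result above is still a DataFrame rather than a dictionary. As a minimal sketch of one way to finish the conversion (the names `indexed`, `stacked` and `mapping` are illustrative and not part of the answer), the stacked index already holds the (a, column) pairs, so it can be passed to dict(). The explicit dropna() is an assumption added here so the result does not depend on how a given pandas version's stack treats missing values.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

indexed = df.set_index('a')

# Mask the zeros as NaN, stack into a Series indexed by (a, column),
# and drop the masked entries explicitly.
stacked = indexed.where(indexed > 0).stack().dropna()

# Each remaining index entry is an (a, column) tuple, which dict() accepts.
mapping = dict(stacked.index.tolist())
print(mapping)
```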

- Tai

1

You need dot and zip here.

dict(zip(df.a,df.iloc[:,1:].dot(df.iloc[:,1:].columns)))
Out[508]: {'blue': 'b', 'red': 'd', 'yellow': 'c'}
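
The dot product works here because Python defines integer-times-string as repetition, so a 0 flag contributes an empty string and a 1 flag contributes the column label. The sketch below reproduces that arithmetic in plain Python for illustration only (it loops, so it is not a replacement for the vectorized call); `decode`, `rows` and `labels` are hypothetical names.

```python
labels = ['b', 'c', 'd']
rows = {'red': [0, 0, 1], 'yellow': [0, 1, 0], 'blue': [1, 0, 0]}

def decode(flags, labels):
    # 0 * 'b' == '' and 1 * 'd' == 'd', so concatenating the products
    # leaves only the label whose flag is 1.
    return ''.join(flag * label for flag, label in zip(flags, labels))

mapping = {key: decode(flags, labels) for key, flags in rows.items()}
print(mapping)
```

This also shows the limit of the trick: a row with two 1s would yield the two labels concatenated, which is why the one-1-per-row guarantee matters.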

- BENY

1

Maybe just use df.set_index('a').dot(df.columns[1:]).to_dict() - Bharath M Shetty

0

Hopefully this works:

import pandas as pd
df=pd.DataFrame({'a':['red','yellow','blue'], 'b':[0,0,1], 'c':[0,1,0], 'd':[1,0,0]})

# idxmax(axis=1) already returns the label of the column holding the 1 in each row
df['e'] = df.iloc[:, 1:].idxmax(axis=1)

newdf = df[["a","e"]]

print (newdf.to_dict(orient='index'))

Output:

{0: {'a': 'red', 'e': 'd'}, 1: {'a': 'yellow', 'e': 'c'}, 2: {'a': 'blue', 'e': 'b'}}
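
The nesting in that output comes from orient='index', which keys the outer dictionary by row label and stores each row as an inner dictionary. A rough sketch of the difference, assuming the same data (the names `labels`, `nested` and `flat` are illustrative): pairing column a with the idxmax labels directly gives a flat mapping instead.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

# Label of the column holding the 1 in each row.
labels = df.iloc[:, 1:].idxmax(axis=1)

# orient='index' nests one dict per row under the row label.
nested = pd.DataFrame({'a': df['a'], 'e': labels}).to_dict(orient='index')

# Zipping the two columns gives a flat {a: label} mapping.
flat = dict(zip(df['a'], labels))

print(nested)
print(flat)
```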

- Bhushan Pant

Yes, I am using Python 2.7. - Bhushan Pant

It is tagged "3.x". The output does not look like what the OP wants. - James Schinner

It seems I forgot to use the axis. I also checked with Python 3, and it works fine. - Bhushan Pant

@bhushan, thanks for your answer, but the output is not right. I want a different format. - singh

0

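You can use Pandas' to_dict function to convert your DataFrame to a dict, passing 'list' as the argument. Then iterate over the resulting dict and pick the label of the column whose value is 1.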

>>> {k:df.columns[1:][v.index(1)] for k,v in df.set_index('a').T.to_dict('list').items()}
{'yellow': 'c', 'blue': 'b', 'red': 'd'}

- Sohaib Farooqi

Thanks for your solution, but it is iterative and slow for my large dataset. - singh

0

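Set column a as the index, then look through each row of df for the index label whose value is 1, and finally use to_dict to convert the resulting Series to a dictionary.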

Here is the code:

df.set_index('a').apply(lambda row:row[row==1].index[0],axis=1).to_dict()

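Alternatively, set a as the index and use idxmax to find the label of the maximum value in each row, then convert to a dictionary with to_dict. (In older pandas versions Series.argmax also returned the label; current versions return an integer position, so idxmax is the call to use.)

df.set_index('a').apply(lambda row: row.idxmax(), axis=1).to_dict()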

In both cases, the result will be:

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

By the way, I used apply with axis=1 to iterate over the rows of df.
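
Because apply with axis=1 calls the lambda once per row in Python, it may be slow on a large frame. One possible alternative, sketched here with NumPy and not benchmarked, finds the position of the 1 in every row with a single argmax call and then looks up the column labels by position. The names `flag_columns`, `positions`, `labels` and `mapping` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

flag_columns = df.columns[1:]  # the one-hot columns: b, c, d

# Position of the maximum (the 1) in each row, computed in one array operation.
positions = df[flag_columns].to_numpy().argmax(axis=1)

# Translate positions back to column labels with integer-array indexing.
labels = flag_columns.to_numpy()[positions]

mapping = dict(zip(df['a'], labels))
print(mapping)
```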

- sgDysregulation

- PaSTE · Accepted Answer

First, if you really want to convert this to a dictionary, it is best to turn the values you want as keys into the DataFrame's index:

df.set_index('a', inplace=True)

This looks like:

        b  c  d
a              
red     0  0  1
yellow  0  1  0
blue    1  0  0

Your data appears to be "one-hot" encoded. You first need to reverse that encoding, which idxmax along the columns does:

series = df.idxmax(axis=1)

This looks like:

a
red       d
yellow    c
blue      b
dtype: object

Almost there! Now call the to_dict method on the "value" column (this is where having column a as the index pays off):

series.to_dict()

This gives:

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

I think this is what you are looking for. As a one-liner:

df.set_index('a').idxmax(axis=1).to_dict()
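
For completeness, here is a self-contained sketch of the one-liner on the question's data. The guard in the middle is an addition, not part of the answer: idxmax returns the first maximum in each row, so a row with no 1 or with several would still receive a label, and the check makes the one-1-per-row assumption explicit. The names `flags` and `mapping` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

flags = df.set_index('a')

# Verify the one-hot assumption: exactly one 1 in every row.
if not flags.eq(1).sum(axis=1).eq(1).all():
    raise ValueError("each row must contain exactly one 1")

# Column label of the 1 in each row, keyed by the values of column a.
mapping = flags.idxmax(axis=1).to_dict()
print(mapping)
```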