Create a Pandas DataFrame from a dictionary of list values

Question

I have a dictionary whose values are lists, for example:

cols = {'animals':['dog','cat','fish'],
        'colors':['red','black','blue','dog']}

I want to convert this into a DataFrame in which every list element is listed against its key, so that the result is:

key      variable
animals  dog
animals  cat
animals  fish
colors   red
colors   black
colors   blue
colors   dog

So far I have done this, but it does not give me the result I want:

cols_df = pd.DataFrame.from_dict(cols, orient='index')

How can I modify this to get the result above?

- owwoow14

You want long format, but from_dict(.. orient='index') only gives you wide format, and from_dict(.. orient='columns') fails with ValueError('arrays must all be same length'). - smci
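A minimal sketch of the two behaviours described in the comment, assuming a reasonably recent pandas. The exact wording of the ValueError differs between pandas versions, so the sketch only relies on the exception type.

```python
import pandas as pd

cols = {'animals': ['dog', 'cat', 'fish'],
        'colors': ['red', 'black', 'blue', 'dog']}

# orient='index': one row per key; the shorter list is padded with a missing value.
wide = pd.DataFrame.from_dict(cols, orient='index')
print(wide.shape)  # (2, 4): wide format, not the long format asked for

# orient='columns' (the default): each list becomes a column, so lengths must match.
try:
    pd.DataFrame.from_dict(cols, orient='columns')
    failed = False
except ValueError as exc:
    failed = True
    print('ValueError:', exc)
```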

6 Answers

pd.DataFrame.from_dict(cols, orient='index').T.unstack().dropna().reset_index(level=1,drop=True)

animals      dog
animals      cat
animals     fish
colors       red
colors     black
colors      blue
colors       dog

To use from_dict(.. orient='columns'), we first need to pad the columns to equal length to avoid the error. There are two ways to do this:

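pd.DataFrame.from_dict(cols, orient='index').T is an undocumented trick I found in an answer by root. orient='index' pads the shorter lists with missing values (None/NaN) so the result is rectangular, and .T then turns the keys into columns.
The manual alternative is to work out how many cells each row needs to be padded with, something like: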

compute the target length with max(len(c) for c in cols.values()), then append NaN to the end of each shorter list, e.g. cols['animals'].append(np.nan)
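A minimal sketch of the manual padding route, assuming NumPy is available for the NaN marker. It pads copies of the lists (so `cols` itself is not mutated, unlike the `append` example), builds the frame with orient='columns', and then converts it to long format with `melt`; `melt` is an illustrative choice here, not part of the original answer.

```python
import numpy as np
import pandas as pd

cols = {'animals': ['dog', 'cat', 'fish'],
        'colors': ['red', 'black', 'blue', 'dog']}

# Length every column has to reach.
n = max(len(v) for v in cols.values())

# Pad a copy of each list with NaN at the end so all columns are equally long.
padded = {k: v + [np.nan] * (n - len(v)) for k, v in cols.items()}

# orient='columns' now succeeds because the lists have the same length.
wide = pd.DataFrame.from_dict(padded, orient='columns')

# Wide -> long: one row per (key, value), then drop the padding cells again.
long_df = (wide.melt(var_name='key', value_name='variable')
               .dropna()
               .reset_index(drop=True))
print(long_df)
```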

- smci

This is probably not the fastest solution, and it needs two extra lists:

import pandas as pd

d = {'animals': ['dog','cat','fish'],
     'colors': ['red','black','blue','dog']}

keys = [k for k in d.keys() for v in d[k]]
values = [v for k in d.keys() for v in d[k]]
pd.DataFrame.from_dict({'index': keys, 'values': values})

- dtt

You can use stack:

df = pd.DataFrame.from_dict(cols, orient='index')
df = df.stack().to_frame().reset_index().drop('level_1', axis=1)
df.columns = ['key', 'variable']

df

       key variable
0   colors      red
1   colors    black
2   colors     blue
3   colors      dog
4  animals      dog
5  animals      cat
6  animals     fish

Step by step:

df = pd.DataFrame.from_dict(cols, orient='index')
df

           0      1     2     3
colors   red  black  blue   dog
animals  dog    cat  fish  None

df.stack() returns a Series. to_frame() converts it into a DataFrame, and reset_index() then moves the index levels into ordinary columns:

df.stack().to_frame().reset_index()


   level_0  level_1      0
0   colors        0    red
1   colors        1  black
2   colors        2   blue
3   colors        3    dog
4  animals        0    dog
5  animals        1    cat
6  animals        2   fish

Finally, drop('level_1', axis=1) and set the column names to get the desired DataFrame.
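The same stack-based idea can be written as a single chain. This is a sketch, assuming pandas 0.24 or later for `Series.droplevel`. The explicit `dropna()` is a precaution: newer pandas versions are moving `stack()` to an implementation that keeps missing values instead of dropping them, and pandas 2.1 to 2.x may print a FutureWarning about this; the result here is the same either way.

```python
import pandas as pd

cols = {'animals': ['dog', 'cat', 'fish'],
        'colors': ['red', 'black', 'blue', 'dog']}

df = (pd.DataFrame.from_dict(cols, orient='index')
        .stack()                       # Series indexed by (key, position)
        .dropna()                      # remove the padding cell if stack kept it
        .droplevel(1)                  # discard the position level
        .rename_axis('key')            # name the remaining index level
        .reset_index(name='variable')) # index -> 'key' column, values -> 'variable'
print(df)
```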

- akilat90

Using itertools.chain and itertools.repeat:

import pandas as pd
from itertools import chain, repeat

chainer = chain.from_iterable

d = {'animals': ['dog', 'cat', 'fish'],
     'colors': ['red', 'black', 'blue', 'dog']}

df = pd.DataFrame({'key': list(chainer(repeat(k, len(v)) for k, v in d.items())),
                   'variable': list(chainer(d.values()))})

print(df)

       key variable
0  animals      dog
1  animals      cat
2  animals     fish
3   colors      red
4   colors    black
5   colors     blue
6   colors      dog

- jpp

Use itertools.product (the Cartesian product) to build a list of key/value pairs that can be loaded into a DataFrame.

import itertools
import pandas as pd

cols = {'animals': ['dog', 'cat', 'fish'],
        'colors': ['red', 'black', 'blue', 'dog']}

data = []
for key, values in cols.items():
    # product([key], values) yields (key, v) for every v in values
    data.extend(itertools.product([key], values))

df = pd.DataFrame(data, columns=['category', 'value'])
print(df)

Output:

  category  value
0  animals    dog
1  animals    cat
2  animals   fish
3   colors    red
4   colors  black
5   colors   blue
6   colors    dog

- Golden Lion

Page content provided by Stack Overflow.

- BallpointBen · Accepted Answer

No extra imports needed, and it works for any input:

>>> pd.DataFrame([(key, var) for (key, L) in cols.items() for var in L],
...              columns=['key', 'variable'])

       key variable
0  animals      dog
1  animals      cat
2  animals     fish
3   colors      red
4   colors    black
5   colors     blue
6   colors      dog
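For comparison, pandas 0.25 and later also provide `Series.explode`, which expands list-valued cells into rows directly. This is a different technique from the accepted answer, shown as a sketch; the exploded column keeps `object` dtype, and an empty list would become a single NaN row rather than disappearing.

```python
import pandas as pd

cols = {'animals': ['dog', 'cat', 'fish'],
        'colors': ['red', 'black', 'blue', 'dog']}

# One row per key, each cell holding the whole list.
s = pd.Series(cols)

# explode() emits one row per list element and repeats the key in the index.
df = (s.explode()
       .rename_axis('key')
       .reset_index(name='variable'))
print(df)
```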