pandas: convert a column of dict-like lists into separate columns


I have a dataset like this:

name    status    number   message
matt    active    12345    [job:  , money: none, wife: none]
james   active    23456    [group: band, wife: yes, money: 10000]
adam    inactive  34567    [job: none, money: none, wife:  , kids: one, group: jail]

How can I extract the key-value pairs and expand them into a fully flattened DataFrame?
Expected output:
name    status   number    job    money    wife    group   kids
matt    active   12345     none   none     none    none    none
james   active   23456     none   10000    yes     band    none
adam    inactive 34567     none   none     none    jail    one

The message field contains many different keys, and the set of keys varies from row to row.

Any help would be appreciated.
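To try out the answers below, you can rebuild the sample data as follows. This is a sketch that assumes the message column holds plain strings such as "[job:  , money: none, wife: none]", which is how the first answer treats it. If the cells are real Python lists, the parsing step will be different.

```python
import pandas as pd

# Sample data from the question.
# Assumption: "message" holds plain strings, not real lists or dicts.
df = pd.DataFrame({
    'name': ['matt', 'james', 'adam'],
    'status': ['active', 'active', 'inactive'],
    'number': [12345, 23456, 34567],
    'message': [
        '[job:  , money: none, wife: none]',
        '[group: band, wife: yes, money: 10000]',
        '[job: none, money: none, wife:  , kids: one, group: jail]',
    ],
})
print(df)
```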

2 Answers


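This is not straightforward.

First use replace (\s+ matches one or more whitespace characters) to rewrite each value into the text of a Python dict literal, then parse it with ast.literal_eval.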
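Here is a plain-re sketch that shows what the chained replacement does to a single cell. It applies the pattern/replacement pairs one after another, and to_dict_literal is a hypothetical helper name. Note that the empty-value pattern also consumes the whitespace after the comma. With only :\s+, the next key keeps a leading space (for example ' money') and ends up as a separate column.

```python
import ast
import re

# Pattern/replacement pairs, applied in this order.
PAIRS = [
    (r':\s*,\s*', '":"none","'),  # empty value (and the following space) -> "none"
    (r'\[', '{"'),                # opening bracket -> start of a dict literal
    (r'\]', '"}'),                # closing bracket -> end of a dict literal
    (r':\s+', '":"'),             # "key: value" separator
    (r',\s+', '","'),             # ", " between pairs
]


def to_dict_literal(raw):
    """Rewrite '[k: v, ...]' into the text of a dict literal (hypothetical helper)."""
    for pattern, repl in PAIRS:
        raw = re.sub(pattern, repl, raw)
    return raw


s = to_dict_literal('[job:  , money: none, wife: none]')
print(s)                    # {"job":"none","money":"none","wife":"none"}
print(ast.literal_eval(s))  # {'job': 'none', 'money': 'none', 'wife': 'none'}
```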

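Then build the new columns with the DataFrame constructor and join them back with concat. pop removes the message column from df at the same time: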

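import ast
import pandas as pd

# Rewrite "[job:  , money: none, ...]" into '{"job":"none","money":"none", ...}'.
# The pairs are applied one after another. The first one handles empty values and
# also consumes the space after the comma, so no key is left with a leading space.
df.message = df.message.replace([r':\s*,\s*', r'\[', r'\]', r':\s+', r',\s+'],
                                ['":"none","', '{"', '"}', '":"', '","'], regex=True)
df.message = df.message.apply(ast.literal_eval)

# One column per key; keys missing from a row become NaN
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print(df1)
    job  money  wife  group  kids
0  none   none  none    NaN   NaN
1   NaN  10000   yes   band   NaN
2  none   none  none   jail   one

df = pd.concat([df, df1], axis=1)
print(df)
    name    status  number   job  money  wife  group  kids
0   matt    active   12345  none   none  none    NaN   NaN
1  james    active   23456   NaN  10000   yes   band   NaN
2   adam  inactive   34567  none   none  none   jail   one

(Columns follow the order in which keys first appear. Older pandas versions sorted them alphabetically instead.)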
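The question's expected output shows none wherever a key is missing, and it uses a fixed column order. Here is a minimal sketch of that last step. It assumes the message cells have already been parsed into dicts, which is the state right after ast.literal_eval, and it uses fillna plus column selection.

```python
import pandas as pd

# Assumption: "message" already holds parsed dicts (values kept as strings).
df = pd.DataFrame({
    'name': ['matt', 'james', 'adam'],
    'status': ['active', 'active', 'inactive'],
    'number': [12345, 23456, 34567],
    'message': [
        {'job': 'none', 'money': 'none', 'wife': 'none'},
        {'group': 'band', 'wife': 'yes', 'money': '10000'},
        {'job': 'none', 'money': 'none', 'wife': 'none', 'kids': 'one', 'group': 'jail'},
    ],
})

# Expand the dicts into columns; keys missing from a row become NaN.
expanded = pd.DataFrame(df.pop('message').tolist(), index=df.index)

# Replace NaN with "none" and use the column order from the question.
out = pd.concat([df, expanded], axis=1).fillna('none')
out = out[['name', 'status', 'number', 'job', 'money', 'wife', 'group', 'kids']]
print(out)
```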

Edit:

Another solution, using yaml:

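import yaml
import pandas as pd

# Turn the brackets into a YAML flow mapping ("{job:  , money: none, ...}") and parse it.
# safe_load is used because yaml.load requires an explicit Loader in PyYAML 6.0+.
# YAML converts types: "yes" -> True, "10000" -> 10000, a blank value -> None.
df.message = df.message.replace([r'\[', r'\]'], ['{', '}'], regex=True).apply(yaml.safe_load)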

df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
  group   job kids  money  wife
0   NaN  None  NaN   none  none
1  band   NaN  NaN  10000  True
2  jail  none  one   none  None

df = pd.concat([df, df1], axis=1)
print (df)
    name    status  number group   job kids  money  wife
0   matt    active   12345   NaN  None  NaN   none  none
1  james    active   23456  band   NaN  NaN  10000  True
2   adam  inactive   34567  jail  none  one   none  None
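You can see in the output above that YAML converts some values on the way in: yes becomes True, 10000 becomes an int, and a blank value becomes None. If every value should stay a string, a plain-Python split is an alternative. This sketch does not come from either answer. It assumes keys and values never contain commas or colons, and parse_message is a hypothetical helper name.

```python
import pandas as pd


def parse_message(raw, missing='none'):
    """Parse '[k1: v1, k2:  , ...]' into a dict of strings (hypothetical helper).

    Assumption: keys and values never contain ',' or ':' themselves.
    Blank values are replaced by `missing`.
    """
    body = raw.strip().strip('[]')
    parsed = {}
    for item in body.split(','):
        if not item.strip():
            continue  # tolerate stray or trailing commas
        key, _, value = item.partition(':')
        parsed[key.strip()] = value.strip() or missing
    return parsed


df = pd.DataFrame({
    'name': ['matt', 'james', 'adam'],
    'status': ['active', 'active', 'inactive'],
    'number': [12345, 23456, 34567],
    'message': ['[job:  , money: none, wife: none]',
                '[group: band, wife: yes, money: 10000]',
                '[job: none, money: none, wife:  , kids: one, group: jail]'],
})

# Every value stays a string, so "yes" and "10000" are not converted.
expanded = pd.DataFrame(df.pop('message').map(parse_message).tolist(), index=df.index)
out = pd.concat([df, expanded], axis=1)
print(out)
```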

You tagged it as a list but described it as a dict, so if each cell really holds a dict, this should work:
pd.concat([data.drop(['message'], axis=1), data['message'].apply(pd.Series)], axis=1)

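It's a column of lists, but the values are separated by colons. The only problem is that I have blank values. Unfortunately, this code didn't work. - johnnyb
The new columns are labeled with numbers instead of the keys from {key: value}. - johnnyb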
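The one-liner in the second answer only works when each cell is an actual dict. The sketch below runs the same expression on two small hypothetical frames. It shows why the asker got numeric column labels: pd.Series built from a list gets a positional index 0, 1, ..., while one built from a dict uses the keys.

```python
import pandas as pd

# Case 1: every cell is a real dict -> apply(pd.Series) gives one column per key.
with_dicts = pd.DataFrame({
    'name': ['matt', 'james'],
    'message': [{'job': 'none', 'wife': 'none'},
                {'group': 'band', 'wife': 'yes'}],
})
by_key = pd.concat([with_dicts.drop(['message'], axis=1),
                    with_dicts['message'].apply(pd.Series)], axis=1)
print(by_key)

# Case 2: every cell is a list of "key: value" strings -> each list becomes a
# Series with a positional index, so the new columns are labeled 0, 1, ...
with_lists = pd.DataFrame({
    'name': ['matt', 'james'],
    'message': [['job: ', 'wife: none'],
                ['group: band', 'wife: yes']],
})
by_position = pd.concat([with_lists.drop(['message'], axis=1),
                         with_lists['message'].apply(pd.Series)], axis=1)
print(by_position)
```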

Web content provided by Stack Overflow.