Converting list-like column values into multiple rows in a Pandas DataFrame


CSV file (sample1.csv):

Location_City, Location_State, Name, hobbies
Los Angeles,   CA,             John, "['Music', 'Running']"
Texas,         TX,             Jack, "['Swimming', 'Trekking']"

I want to convert the hobbies column of the CSV into the following output:
Location_City, Location_State, Name, hobbies
Los Angeles,   CA,             John, Music
Los Angeles,   CA,             John, Running
Texas,         TX,             Jack, Swimming
Texas,         TX,             Jack, Trekking

I have already read the CSV file into a DataFrame, but I don't know how to do the conversion:

import pandas as pd

df = pd.read_csv("sample1.csv")  # read_csv already returns a DataFrame
df
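Whether the hobbies values arrive as lists or as strings is worth checking first. A minimal sketch, assuming the file looks exactly like the sample above (with a space after each comma): it inlines the sample through io.StringIO instead of opening sample1.csv, and passes skipinitialspace=True so that the quoted hobbies field is parsed as a single field. The CSV_TEXT name is only for illustration.

```python
import io

import pandas as pd

# The sample CSV from the question, inlined so the sketch is self-contained.
# Assumption: the real file has a space after each comma, as shown above.
CSV_TEXT = """Location_City, Location_State, Name, hobbies
Los Angeles,   CA,             John, "['Music', 'Running']"
Texas,         TX,             Jack, "['Swimming', 'Trekking']"
"""

# skipinitialspace=True skips the blanks after each comma, so the quoted
# hobbies field is recognised as one field and the header names stay clean.
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

print(df.dtypes)
# Each hobbies cell is a plain str that only looks like a Python list.
print(type(df.loc[0, 'hobbies']))
```

A `dtype: object` column is therefore not proof of list values: here every cell is a str and still has to be parsed before it can be split into rows.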

Are the values in the hobbies column lists or strings? - Sociopath
When it's loaded into the DataFrame, it shows dtype: object. - Rohan Pawar
2 Answers

We can solve this with pandas.DataFrame.explode, which was introduced in pandas 0.25.0. If your version is 0.25.0 or later, you can use the following code.
Reference for explode: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html
import pandas as pd
import ast

data = {
    'Location_City': ['Los Angeles','Texas'],
    'Location_State': ['CA','TX'],
    'Name': ['John','Jack'],
    'hobbies': ["['Music', 'Running']", "['Swimming', 'Trekking']"]
}
df = pd.DataFrame(data)

# Converting a string representation of a list into an actual list object
df['hobbies'] = df['hobbies'].apply(ast.literal_eval)

# Exploding the list
df = df.explode('hobbies')

print(df)

  Location_City Location_State  Name   hobbies
0   Los Angeles             CA  John     Music
0   Los Angeles             CA  John   Running
1         Texas             TX  Jack  Swimming
1         Texas             TX  Jack  Trekking
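explode keeps each row's original index label, which is why 0 and 1 each appear twice above. A minimal sketch of the same approach written as one chain, assuming pandas 1.1.0 or later (where explode gained the ignore_index parameter) so the result is renumbered 0..n-1:

```python
import ast

import pandas as pd

df = pd.DataFrame({
    'Location_City': ['Los Angeles', 'Texas'],
    'Location_State': ['CA', 'TX'],
    'Name': ['John', 'Jack'],
    'hobbies': ["['Music', 'Running']", "['Swimming', 'Trekking']"],
})

# Parse each string with ast.literal_eval, then emit one row per list element.
# ignore_index=True (pandas >= 1.1.0) replaces the repeated labels with 0..n-1.
out = (
    df.assign(hobbies=df['hobbies'].map(ast.literal_eval))
      .explode('hobbies', ignore_index=True)
)
print(out)
```

On older pandas versions, calling `.reset_index(drop=True)` after `explode` gives the same renumbering.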

You can use findall or extractall to get the list items from the hobbies column, then flatten them with chain.from_iterable and repeat the values of the other columns:
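from itertools import chain

import pandas as pd

# findall already returns an object Series of Python lists, so the original
# .astype(np.object) is unnecessary (np.object was removed in NumPy 1.24)
a = df['hobbies'].str.findall("'(.*?)'")
lens = a.str.len()

# Repeat each row's values once per hobby, then flatten the hobby lists
df1 = pd.DataFrame({
    'Location_City' : df['Location_City'].values.repeat(lens),
    'Location_State' : df['Location_State'].values.repeat(lens),
    'Name' : df['Name'].values.repeat(lens),
    'hobbies' : list(chain.from_iterable(a.tolist())),
})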
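The constructor above names every column by hand. A minimal sketch of the same findall + repeat + chain.from_iterable idea that repeats whichever other columns exist, keeping their order; explode_quoted is a hypothetical helper name, and it assumes every hobbies cell is a non-missing string of single-quoted items.

```python
from itertools import chain

import pandas as pd


def explode_quoted(df, col):
    """Hypothetical helper: one output row per single-quoted item in df[col]."""
    items = df[col].str.findall(r"'(.*?)'")  # Series of Python lists
    lens = items.map(len).to_numpy()         # number of items per input row
    flat = list(chain.from_iterable(items))  # all items, in row order
    # Repeat every other column's values lens[i] times; keep column order.
    data = {
        c: flat if c == col else df[c].to_numpy().repeat(lens)
        for c in df.columns
    }
    return pd.DataFrame(data)


df = pd.DataFrame({
    'Location_City': ['Los Angeles', 'Texas'],
    'Location_State': ['CA', 'TX'],
    'Name': ['John', 'Jack'],
    'hobbies': ["['Music', 'Running']", "['Swimming', 'Trekking']"],
})
print(explode_quoted(df, 'hobbies'))
```

A missing value in the hobbies column would make `len` fail, so fill or drop NaNs before calling it.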

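Alternatively, build a Series with extractall, drop its extra `match` index level, and join it back to the original DataFrame (note that df.pop removes the hobbies column from df in place):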

df1 = (df.join(df.pop('hobbies').str.extractall("'(.*?)'")[0]
               .reset_index(level=1, drop=True)
               .rename('hobbies'))
         .reset_index(drop=True))

print(df1)

  Location_City Location_State  Name   hobbies
0   Los Angeles             CA  John     Music
1   Los Angeles             CA  John   Running
2         Texas             TX  Jack  Swimming
3         Texas             TX  Jack  Trekking
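To see why the join repeats the other columns, it helps to look at what extractall returns. A small sketch on just the hobbies strings: the result has one row per regex match, indexed by (original row label, match number), and dropping the `match` level leaves the repeated labels 0, 0, 1, 1 that join aligns on.

```python
import pandas as pd

hobbies = pd.Series(["['Music', 'Running']", "['Swimming', 'Trekking']"])

# One row per match; the index is (original label, "match"), and column 0
# holds the first (and only) capture group.
matches = hobbies.str.extractall(r"'(.*?)'")
print(matches)

# Dropping the "match" level leaves duplicated original labels, so joining
# on them repeats each source row once per hobby.
flat = matches[0].reset_index(level='match', drop=True)
print(flat)
```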

Source: Stack Overflow (original English post).