How do I split a list column in a Pandas DataFrame based on a condition on the list elements?

Question

3

I have a pandas DataFrame with a single column, for example:

import pandas as pd
df = pd.DataFrame({"combined_list": [["Netherlands|NL", "Germany|DE", "United_States|US", "Poland|PL"], ["Netherlands|NL", "Austria|AU", "Belgium|BE"], ["United_States|US", "Germany|DE"]]})

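I want to create two columns from the combined_list column:

one containing all the regular country names (i.e. everything before the last occurrence of |)
one containing all the 2-letter abbreviations (always of length 2), i.e. essentially all the remaining text after the last occurrence of |.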

The resulting DataFrame should look like this:

countries                                      abbreviations
[Netherlands, Germany, United_States, Poland]  [NL, DE, US, PL]
[Netherlands, Austria, Belgium]                [NL, AU, BE]
[United_States, Germany]                       [US, DE]

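How can I achieve this?

I know that if the DataFrame column were just a string, I could use the various string-splitting functions, but I can't find any way to do this for a column whose values are lists.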

- Peter

3 Answers

2

Use explode, split the exploded strings into a DataFrame (str.split with expand=True), then aggregate back into lists with groupby.agg:

out_cols = ["countries", "abbreviations"]
out = (df["combined_list"].explode().str.split("|", expand=True)
       .groupby(level=0).agg(list).set_axis(out_cols, axis=1))

print(out)

                                       countries     abbreviations
0  [Netherlands, Germany, United_States, Poland]  [NL, DE, US, PL]
1                [Netherlands, Austria, Belgium]      [NL, AU, BE]
2                       [United_States, Germany]          [US, DE]
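
The chained expression above can be unpacked into its individual steps. The following is only a sketch for illustration: the intermediate names exploded and parts are not part of the answer, and the column labels are assigned directly instead of through set_axis.

```python
import pandas as pd

df = pd.DataFrame({"combined_list": [
    ["Netherlands|NL", "Germany|DE", "United_States|US", "Poland|PL"],
    ["Netherlands|NL", "Austria|AU", "Belgium|BE"],
    ["United_States|US", "Germany|DE"],
]})

# Step 1: one row per list element; the original row label is repeated,
# so the index becomes 0, 0, 0, 0, 1, 1, 1, 2, 2.
exploded = df["combined_list"].explode()

# Step 2: split each "Name|XX" string into two columns labelled 0 and 1.
parts = exploded.str.split("|", expand=True)

# Step 3: group by the original row label (index level 0) and collect
# each column's values back into a Python list.
out = parts.groupby(level=0).agg(list)
out.columns = ["countries", "abbreviations"]

print(out)
```

Because the repeated index is what ties each exploded element back to its source row, this approach relies on the original index labels being unique.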

- anky

1

Very informative solution! (the best so far, in my opinion) But is the second agg(list) a typo? - Mustafa Aydın

@MustafaAydın Ah, yes, thank you :) I'll edit it. - anky

1

Here is another solution:

df2 = pd.DataFrame()
# For every row, keep the part of each string before "|" (the country name)
df2['Countries'] = df['combined_list'].apply(lambda items: [s.split('|')[0] for s in items])
# ... and the part after "|" (the abbreviation)
df2['Abbreviations'] = df['combined_list'].apply(lambda items: [s.split('|')[1] for s in items])

print(df2)

                                       Countries     Abbreviations
0  [Netherlands, Germany, United_States, Poland]  [NL, DE, US, PL]
1                [Netherlands, Austria, Belgium]      [NL, AU, BE]
2                       [United_States, Germany]          [US, DE]

- pakpe

- Andrej Kesely · Accepted Answer

df_out = pd.DataFrame(
    df["combined_list"]
    .apply(lambda x: list(zip(*[s.split("|") for s in x])))
    .tolist(),
    columns=["countries", "abbreviations"],
)
print(df_out)

Prints:

                                       countries     abbreviations
0  (Netherlands, Germany, United_States, Poland)  (NL, DE, US, PL)
1                (Netherlands, Austria, Belgium)      (NL, AU, BE)
2                       (United_States, Germany)          (US, DE)
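
How the zip(*...) step transposes the split pairs can be seen on a single row in plain Python. This is only an illustrative sketch; the names row, pairs and transposed are not part of the answer.

```python
# One row of the combined_list column.
row = ["Netherlands|NL", "Austria|AU", "Belgium|BE"]

# Split every element into a [name, abbreviation] pair.
pairs = [s.split("|") for s in row]
# pairs == [["Netherlands", "NL"], ["Austria", "AU"], ["Belgium", "BE"]]

# Unpacking the pairs into zip groups all first items together and
# all second items together, i.e. it transposes the list of pairs.
transposed = list(zip(*pairs))
# transposed == [("Netherlands", "Austria", "Belgium"), ("NL", "AU", "BE")]

print(transposed)
```

zip yields tuples, which is why the columns in the output above hold tuples rather than lists.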

To get lists in the columns instead:

df_out = pd.DataFrame(
    df["combined_list"]
    .apply(lambda x: list(map(list, zip(*[s.split("|") for s in x]))))
    .tolist(),
    columns=["countries", "abbreviations"],
)
print(df_out)

Output:

                                       countries     abbreviations
0  [Netherlands, Germany, United_States, Poland]  [NL, DE, US, PL]
1                [Netherlands, Austria, Belgium]      [NL, AU, BE]
2                       [United_States, Germany]          [US, DE]
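
The question asks for a split at the last occurrence of |, while the answers here use split("|"), which gives the same result for the sample data because every element contains exactly one |. If a country name could itself contain a |, a variant based on str.rsplit with maxsplit=1 might be safer. The sketch below assumes such data; the "Trinidad|Tobago|TT" entry is made up purely for illustration and does not come from the question.

```python
import pandas as pd

# Hypothetical data: the first element has a "|" inside the country name.
df = pd.DataFrame({"combined_list": [
    ["Trinidad|Tobago|TT", "Germany|DE"],
    ["Poland|PL"],
]})

# rsplit("|", 1) splits only once, starting from the right, so everything
# before the last "|" stays together as the country name.
countries = df["combined_list"].map(
    lambda items: [s.rsplit("|", 1)[0] for s in items]
)
abbreviations = df["combined_list"].map(
    lambda items: [s.rsplit("|", 1)[1] for s in items]
)

df_out = pd.DataFrame({"countries": countries, "abbreviations": abbreviations})
print(df_out)
```

With a plain split("|") and index [1], the first element would yield "Tobago" as its abbreviation instead of "TT".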