Convert a list of lists of tuples to a Pandas DataFrame?

Question

python python-3.x pandas tuples list-comprehension

9

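I have a list of lists containing tuples, all of equal length. I need to convert these tuples into a Pandas DataFrame so that the number of columns equals the tuple length and each tuple item becomes a row entry across the columns.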

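I have looked at other questions on this topic (for example, converting a list of tuples to a pandas DataFrame, and splitting a list of tuples into a list of lists of tuples), but without success.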

The closest I have come is a list comprehension from another Stack Overflow question.

import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Trying list comprehension from previous stack question:
pd.DataFrame([[y for y in x] for x in tupList])

But this produces an unexpected result:

    0                                 1
0   (commentID, commentText, date)    (123456, blahblahblah, 2019)
1   (45678, hello world, 2018)        (0, text, 2017)

whereas the desired result is:

      0            1                 2
0     commentID    commentText       date
1     123456       blahblahblah      2019
2     45678        hello world       2018
3     0            text              2017

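To summarize: I need the number of columns to equal the length of each tuple (3 in this example), with each item in a tuple being a row entry across the columns.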

Thanks!

- n0ro

4 Answers

3

import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
print(pd.DataFrame(sum(tupList, [])))

Output

           0             1     2
0  commentID   commentText  date
1     123456  blahblahblah  2019
2      45678   hello world  2018
3          0          text  2017
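
A note on `sum(tupList, [])`: it flattens by repeated list concatenation, which copies the accumulated list at every step, so the cost grows quadratically with the number of sublists. For the small input here that does not matter. Below is a minimal sketch of the same flattening with `itertools.chain.from_iterable`, which may be preferable for larger inputs. The names `flat_sum` and `flat_chain` are illustrative only.

```python
from itertools import chain

import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
           [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# sum() starts from [] and concatenates each sublist in turn.
flat_sum = sum(tupList, [])

# chain.from_iterable walks the sublists lazily; list() materializes the result.
flat_chain = list(chain.from_iterable(tupList))

# Both are the same flat list of four 3-tuples, so they build the same 4x3 frame.
df = pd.DataFrame(flat_chain)
```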

- ComplicatedPhenomenon

Oh, I really like this one, very clever. Hope the OP accepts it, upvoted. - Erfan

2

A shorter version looks like this:

Original answer

from itertools import chain
import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

new_list = [x for x in chain.from_iterable(tupList)]
df = pd.DataFrame.from_records(new_list)

Edit

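You can put the list comprehension directly inside the from_records call.

DataFrame.from_records builds a DataFrame from a structured or record array, or from a sequence of tuples or dicts. Inlining the comprehension removes the intermediate variable and shortens the code.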
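
The code for this edit does not appear in the answer as shown. Here is a sketch of what the inlined version might look like, assuming the same `tupList` and using `list(...)` in place of the equivalent comprehension:

```python
from itertools import chain

import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
           [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Flatten inline: each tuple becomes one row, and the columns
# receive the default integer labels 0, 1, 2.
df = pd.DataFrame.from_records(list(chain.from_iterable(tupList)))
```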

- ndclt

0

You can do it like this :D

import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Trying list comprehension from previous stack question:
df = pd.DataFrame([[y for y in x] for x in tupList])
df_1 = df[0].apply(pd.Series).assign(index= range(0, df.shape[0]*2, 2)).set_index("index")
df_2 = df[1].apply(pd.Series).assign(index= range(1, df.shape[0]*2, 2)).set_index("index")

pd.concat([df_1, df_2], axis=0).sort_index()
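
`apply(pd.Series)` constructs a Series for every row, which tends to be slow on large frames. Here is a hedged sketch of the same interleaving that builds each half from a plain list of tuples instead. Unlike the code above, it leaves the index unnamed, and the names `n` and `result` are illustrative.

```python
import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
           [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

df = pd.DataFrame([[y for y in x] for x in tupList])
n = len(df)

# Expand each column of tuples via tolist() rather than apply(pd.Series).
df_1 = pd.DataFrame(df[0].tolist(), index=range(0, n * 2, 2))  # even row positions
df_2 = pd.DataFrame(df[1].tolist(), index=range(1, n * 2, 2))  # odd row positions

# Interleave the two halves by sorting on the combined index.
result = pd.concat([df_1, df_2], axis=0).sort_index()
```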

- ivallesp

"apply(pd.Series)" is one of the worst operations you can use in pandas. It is very slow. - Erfan

How do you know? - Erfan

- RomanPerekhrest · Accepted Answer

Flatten your list into a list of tuples (your initial list contains sublists of tuples):

In [1251]: tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

In [1252]: pd.DataFrame([t for lst in tupList for t in lst])
Out[1252]: 
           0             1     2
0  commentID   commentText  date
1     123456  blahblahblah  2019
2      45678   hello world  2018
3          0          text  2017
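
The question keeps `('commentID', 'commentText', 'date')` as row 0. If that first tuple is actually meant as a header (an assumption, not something the question states), the same flattening can feed the column labels, as in this sketch:

```python
import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
           [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Same flattening as in the answer above.
rows = [t for lst in tupList for t in lst]

# Assumption: the first tuple holds column names rather than data.
header, *records = rows
df = pd.DataFrame(records, columns=list(header))
```

This produces three data rows under the columns `commentID`, `commentText`, and `date`.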