Creating a new column in a Pandas DataFrame from the existing column names

11

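I want to deconstruct a pandas DataFrame, using the column headers as a new data column, and build a list containing every combination of row index and column. It is easier to show with an example: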

import pandas as pd

index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2,3,4],[5,6,7],[8,9,10]]
df = pd.DataFrame(values, index=index_col, columns=cols)

From this DataFrame, I would like to get the following list:

[['store1', 'January', 2],
 ['store1', 'February', 3],
 ['store1', 'March', 4],
 ['store2', 'January', 5],
 ['store2', 'February', 6],
 ['store2', 'March', 7],
 ['store3', 'January', 8],
 ['store3', 'February', 9],
 ['store3', 'March', 10]]

Is there a convenient way to do this?

6 Answers

10
df.unstack().swaplevel().reset_index().values.tolist()
#OR
df.reset_index().melt(id_vars="index").values.tolist()
# [['store1', 'January', 2],
#  ['store2', 'January', 5],
#  ['store3', 'January', 8],
#  ['store1', 'February', 3],
#  ['store2', 'February', 6],
#  ['store3', 'February', 9],
#  ['store1', 'March', 4],
#  ['store2', 'March', 7],
#  ['store3', 'March', 10]]

With the following code, the order of the elements matches the output in the question.

df.transpose().unstack().reset_index().values.tolist()
# [['store1', 'January', 2],
#  ['store1', 'February', 3],
#  ['store1', 'March', 4],
#  ['store2', 'January', 5],
#  ['store2', 'February', 6],
#  ['store2', 'March', 7],
#  ['store3', 'January', 8],
#  ['store3', 'February', 9],
#  ['store3', 'March', 10]]
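
To see why the transposed version matches the order in the question, it can help to look at the index each unstack call produces. This is a minimal sketch that rebuilds the df from the question so it runs on its own; the names by_month and by_store are only for illustration.

```python
import pandas as pd

index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]
df = pd.DataFrame(values, index=index_col, columns=cols)

# unstack() on a DataFrame puts the column labels in the outer index level,
# so the resulting Series is grouped by month first.
by_month = df.unstack()
print(list(by_month.index[:3]))

# After transposing, the stores are the columns, so they land in the outer
# level and the Series is grouped by store first, as in the question.
by_store = df.transpose().unstack()
print(list(by_store.index[:3]))

result = by_store.reset_index().values.tolist()
```

Because the store is already the outer level after the transpose, no swaplevel() call is needed before reset_index().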

1
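This is definitely the best one (mine is decent too, but this is more concise and even more "true Pandas style"). - user17242583
Does the same order need to be preserved? - Larry the Llama
2
@LarrytheLlama, if the order matters, you can do df.unstack().swaplevel().reset_index().sort_values("level_0").values.tolist() - d.b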
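
A caveat on the sort_values suggestion: its default algorithm is quicksort, which the pandas documentation does not list as stable, so the month order inside each store is not guaranteed in general. The sketch below is a hedged variant that requests a stable algorithm explicitly with kind="mergesort"; df is rebuilt from the question so the snippet runs on its own, and the names flat and ordered are only for illustration.

```python
import pandas as pd

index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]
df = pd.DataFrame(values, index=index_col, columns=cols)

# reset_index() names the two unnamed index levels level_0 (store) and
# level_1 (month); the values end up in a column named 0.
flat = df.unstack().swaplevel().reset_index()

# A stable sort keeps rows with equal keys in their existing order, so the
# months stay in their original column order within each store.
ordered = flat.sort_values("level_0", kind="mergesort").values.tolist()
print(ordered)
```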

6

True Pandas style:

lst = [[*k, v] for k, v in df.unstack().swaplevel().to_dict().items()]

1
Beautiful use of a nested list comprehension together with multiple Pandas methods! - Celius Stingher

2

I prefer stacking over unstacking and then swapping levels:

>>> df.stack().reset_index().to_numpy()
array([['store1', 'January', 2],
       ['store1', 'February', 3],
       ['store1', 'March', 4],
       ['store2', 'January', 5],
       ['store2', 'February', 6],
       ['store2', 'March', 7],
       ['store3', 'January', 8],
       ['store3', 'February', 9],
       ['store3', 'March', 10]], dtype=object)
>>> 

Or use melt with ignore_index=False:

>>> df.melt(ignore_index=False).reset_index().to_numpy()
array([['store1', 'January', 2],
       ['store2', 'January', 5],
       ['store3', 'January', 8],
       ['store1', 'February', 3],
       ['store2', 'February', 6],
       ['store3', 'February', 9],
       ['store1', 'March', 4],
       ['store2', 'March', 7],
       ['store3', 'March', 10]], dtype=object)
>>> 
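
If a tidy DataFrame is more useful than a bare array, the stack approach can also carry column names along. This is a minimal sketch, assuming the df from the question; the labels "store", "month" and "sales" are hypothetical names chosen for illustration and do not appear in the original data.

```python
import pandas as pd

index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]
df = pd.DataFrame(values, index=index_col, columns=cols)

tidy = (
    df.rename_axis(index="store", columns="month")  # hypothetical level names
      .stack()                                      # months move into the index
      .rename("sales")                              # hypothetical value name
      .reset_index()                                # index levels become columns
)
print(tidy.head(3))

rows = tidy.values.tolist()
```

Naming the axes before stacking means reset_index() produces readable column names instead of level_0 and level_1.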

1
This is a nice option too! - user17242583

1

The data structure you want is quite messy, so given the data you need, this is probably the best approach.

# Results
res = []

# Nested loop: first for length of index col, then next for cols
for i in range(len(index_col)):
    for j in range(len(cols)):
        # Format of data
        res.append([index_col[i], cols[j], values[i][j]])

# Print the results
print(res)
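
If the loop is wanted as a reusable function, one possible shape is sketched below. The function name flatten_table is hypothetical, and the version here works on the plain lists from the question, so it does not need pandas at all.

```python
def flatten_table(index_col, cols, values):
    """Return [row_label, col_label, value] for every cell, row by row."""
    res = []
    # Pair each row label with its list of values, then each column label
    # with the value in that position.
    for row_label, row_values in zip(index_col, values):
        for col_label, value in zip(cols, row_values):
            res.append([row_label, col_label, value])
    return res


index_col = ["store1", "store2", "store3"]
cols = ["January", "February", "March"]
values = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]

res = flatten_table(index_col, cols, values)
print(res[:3])
```

Using zip instead of range(len(...)) avoids manual indexing; the same function should also accept df.index, df.columns and df.values if the data already lives in a DataFrame.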

1

You can iterate over the items of the DataFrame as follows:


data = []

for col, row in df.items():
    for ind, val in row.reset_index().values:
        data.append([ind, col, val])

data

You can avoid the second loop at the cost of the output order, since that completely takes apart the way the structure started out.

1
temp = df.stack()

[[*ent, val] for ent, val in zip(temp.index, temp)]

[['store1', 'January', 2],
 ['store1', 'February', 3],
 ['store1', 'March', 4],
 ['store2', 'January', 5],
 ['store2', 'February', 6],
 ['store2', 'March', 7],
 ['store3', 'January', 8],
 ['store3', 'February', 9],
 ['store3', 'March', 10]]

Content provided by Stack Overflow.