How to convert a pandas DataFrame to a list of multiple namedtuples

3
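I am working on a piece of code where I need to map multiple NamedTuples placed in a list. Here is the sample code. My main issue is how to map the pair of NamedTuples PeopleName, PeopleAge inside the List - I am not sure how this should be done. Should it be done in two steps: 1/ extract the whole row into a generic NamedTuple, then 2/ split the record into the different NamedTuples PeopleName, PeopleAge?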
from typing import NamedTuple, List

import pandas as pd

data = [["tom", 10, "ab 11"], ["nick", 15, "ab 22"], ["juli", 14, "ab 11"]]
people = pd.DataFrame(data, columns=["Name", "Age", "PostalCode"])

PeopleName = NamedTuple("PeopleName", [("Name", str)])
PeopleAge = NamedTuple("PeopleAge", [("Age", int)])
PeoplePC = NamedTuple("PeoplePC", [("PostalCode", str)])

# The code below is not correct
Demography = NamedTuple(
    "Demography", [("names", List[(PeopleName, PeopleAge)]), ("postalcodes", PeoplePC)],
)


def to_nested_tuple(k, g):
    peoples = list(
        g["Name"].to_frame().itertuples(name="Person", index=False),
        # rec["Age"].to_frame().itertuples(name="PeopleAge", index=False),
    )
    return Demography(peoples, PeoplePC(k))


d = [to_nested_tuple(*item) for item in people.groupby("PostalCode")]

print(d)

Can you please share some example output? I'm not quite sure what you're trying to do. - AMC
2 Answers

2

The annotation List[(PeopleName, PeopleAge)] raises TypeError: Too many parameters for typing.List; actual 2, expected 1

A tuple holding 2 different types should itself be annotated with typing.Tuple:

List[Tuple[PeopleName, PeopleAge]]
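
To make the difference concrete, here is a minimal sketch of both annotations side by side. Subscripting with a tuple is the same as passing two type arguments, which is why List rejects it. The alias NamePairs and the flag raised are names made up for this illustration; the exact wording of the error message can vary between Python versions, so the sketch only checks the exception type.

```python
from typing import List, NamedTuple, Tuple

PeopleName = NamedTuple("PeopleName", [("Name", str)])
PeopleAge = NamedTuple("PeopleAge", [("Age", int)])

# List[(A, B)] is the same subscription as List[A, B]: two type arguments,
# while typing.List accepts exactly one.
try:
    List[(PeopleName, PeopleAge)]
    raised = False
except TypeError:
    raised = True

# Tuple describes the pair; List then receives that single argument.
# NamePairs is a hypothetical alias, not part of the original code.
NamePairs = List[Tuple[PeopleName, PeopleAge]]

pairs: NamePairs = [(PeopleName("tom"), PeopleAge(10))]
print(raised, pairs[0][0].Name, pairs[0][1].Age)
```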

However, when annotating arguments it is preferable to use abstract collection types such as Sequence or Iterable:

Demography = NamedTuple(
    "Demography", [("names", Sequence[Tuple[PeopleName, PeopleAge]]), ("postalcodes", PeoplePC)],
)

Instead of applying to_nested_tuple to each group, I would build the list directly, like this:

d = [Demography([(PeopleName(row['Name']), PeopleAge(row['Age'])) for _, row in group.iterrows()], PeoplePC(k))
     for k, group in people.groupby("PostalCode")] 

Now the result is printed as:
[Demography(names=[(PeopleName(Name='tom'), PeopleAge(Age=10)), (PeopleName(Name='juli'), PeopleAge(Age=14))], postalcodes=PeoplePC(PostalCode='ab 11')),
 Demography(names=[(PeopleName(Name='nick'), PeopleAge(Age=15))], postalcodes=PeoplePC(PostalCode='ab 22'))]
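
For convenience, the pieces of this answer can be assembled into one self-contained script. This is only a sketch under one substitution of my own: it pairs the Name and Age columns with zip rather than calling iterrows() as the answer does, which avoids building a Series per row. Everything else (the data, the type names, the grouping by PostalCode) comes from the question and the answer.

```python
from typing import NamedTuple, Sequence, Tuple

import pandas as pd

data = [["tom", 10, "ab 11"], ["nick", 15, "ab 22"], ["juli", 14, "ab 11"]]
people = pd.DataFrame(data, columns=["Name", "Age", "PostalCode"])

PeopleName = NamedTuple("PeopleName", [("Name", str)])
PeopleAge = NamedTuple("PeopleAge", [("Age", int)])
PeoplePC = NamedTuple("PeoplePC", [("PostalCode", str)])
Demography = NamedTuple(
    "Demography",
    [("names", Sequence[Tuple[PeopleName, PeopleAge]]), ("postalcodes", PeoplePC)],
)

# One Demography per postal code; groupby sorts the group keys by default.
# zip over the two columns is an assumption of this sketch (the answer uses iterrows).
d = [
    Demography(
        [(PeopleName(n), PeopleAge(a)) for n, a in zip(group["Name"], group["Age"])],
        PeoplePC(k),
    )
    for k, group in people.groupby("PostalCode")
]

for item in d:
    print(item.postalcodes.PostalCode, [(n.Name, a.Age) for n, a in item.names])
```

Each element of d holds the list of (PeopleName, PeopleAge) pairs for one postal code, together with a single PeoplePC for that code.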

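Thanks, however the final result does not look right: shouldn't it be a list of PeopleName, PeopleAge types, with one PeoplePC for each common postal code? - Michael
Also, how do you convert this annotated sequence into the output from the pandas df? - Michael
@Michael, what do you mean by "convert this annotated sequence into the output"? I have posted the result of your final statement print(d). - RomanPerekhrest
Thanks @RomanPerekhrest - in my initial code, how do you convert the original dataframe into the sequence: def to_nested_tuple(k, g): peoples = list(g["Name"].to_frame().itertuples(name="Person", index=False),) return Demography(peoples, PeoplePC(k)) - Michael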
1
Thanks @RomanPerekhrest - this works as expected. Thank you for taking the time to update your answer! - Michael

1

Use list(df.itertuples()), where df is your dataframe.
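
As a rough illustration of how this fits the two-step approach described in the question, the sketch below first turns every row into a generic namedtuple and then splits each row into PeopleName and PeopleAge. The arguments index=False and name="Row" are choices made for this example (the answer calls itertuples() with its defaults), and the variables rows and pairs are hypothetical names. Note that this alone does not group by PostalCode.

```python
from typing import NamedTuple

import pandas as pd

data = [["tom", 10, "ab 11"], ["nick", 15, "ab 22"], ["juli", 14, "ab 11"]]
df = pd.DataFrame(data, columns=["Name", "Age", "PostalCode"])

PeopleName = NamedTuple("PeopleName", [("Name", str)])
PeopleAge = NamedTuple("PeopleAge", [("Age", int)])

# Step 1: each row becomes one generic namedtuple with a field per column.
# index=False drops the Index field; "Row" is an arbitrary type name.
rows = list(df.itertuples(index=False, name="Row"))

# Step 2: split each generic row into the two specific namedtuples.
pairs = [(PeopleName(r.Name), PeopleAge(r.Age)) for r in rows]

print(rows[0])
print(pairs[0])
```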


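This is what I would like to do, but for each dataframe record in the list I should have a double mapping to PeopleAge, PeopleName. In the end I should have a list of (PeopleAge, PeopleName) with one corresponding PeoplePC, if that makes sense. - Michael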
