Grouping rows of an ordered pandas DataFrame based on column values

Question



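I have a question about grouping only certain rows of a pandas DataFrame together according to their column values (the DataFrame is sorted by timestamp).

Here is an example: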

import pandas as pd

df = pd.DataFrame({
    "text": [
        "Hello.",
        "I had a question.",
        "Hi!",
        "Yes how can I help?",
        "Do you ship to the UK?",
    ],
    "timestamp": [
        pd.Timestamp('20131213 11:50:00'),
        pd.Timestamp('20131213 11:51:00'),
        pd.Timestamp('20131213 11:52:00'),
        pd.Timestamp('20131213 11:53:00'),
        pd.Timestamp('20131213 11:54:00'),
    ],
    "direction": ["In", "In", "Out", "Out", "In"],
})

The DataFrame is sorted by timestamp and could be a chat log, where direction "In" is one person speaking and "Out" is the other.

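What I want is a final DataFrame in which the text of consecutive rows with the same direction is combined into a single row, but only up to the next row with a different direction, and with the order of the messages preserved. For the example above that means three rows: "Hello. I had a question." (In), "Hi! Yes how can I help?" (Out), and "Do you ship to the UK?" (In).

Does anyone have any ideas?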

- Imu

2 Answers


How about something like this:

# flag rows whose direction differs from the previous row
# (bfill makes the first row compare equal to itself)
df['dir'] = df.direction.shift(1).bfill()
df['dir_change'] = (df.direction != df.dir).astype(int)

# number each run of consecutive rows with the same direction
df['new_group'] = df.dir_change.cumsum()

# group on the run number: join the text, keep the first timestamp
agg_df = df.groupby('new_group').agg({'text': ' '.join, 'timestamp': 'first'})
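
For anyone who wants to try this without adding helper columns to `df`, here is a minimal self-contained sketch of the same idea. The function name `merge_runs` and the grouping name `new_group` are made up for illustration, and keeping the first `direction` of each run is an addition that the snippet above does not include; it also assumes the frame has `text` and `timestamp` columns as in the question.

```python
import pandas as pd

def merge_runs(frame, key="direction"):
    """Hypothetical helper: collapse consecutive rows that share the same `key`."""
    # Value of the previous row; bfill makes the first row compare equal to itself.
    previous = frame[key].shift(1).bfill()
    # 1 where the value changes, 0 otherwise; the running sum labels runs 0, 1, 2, ...
    group = (frame[key] != previous).astype(int).cumsum().rename("new_group")
    # Assumes "text" and "timestamp" columns, as in the question's example.
    return frame.groupby(group).agg(
        {"text": " ".join, "timestamp": "first", key: "first"}
    )

df = pd.DataFrame({
    "text": ["Hello.", "I had a question.", "Hi!",
             "Yes how can I help?", "Do you ship to the UK?"],
    "timestamp": pd.to_datetime([
        "2013-12-13 11:50:00", "2013-12-13 11:51:00", "2013-12-13 11:52:00",
        "2013-12-13 11:53:00", "2013-12-13 11:54:00",
    ]),
    "direction": ["In", "In", "Out", "Out", "In"],
})

agg_df = merge_runs(df)
print(agg_df)
```

Because the run labels only ever increase, the grouped result comes back in the original message order.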

- katelie


- user3483203 · Accepted Answer

Setup:

operations = {
    'text': ' '.join,
    'direction': 'first',
}

Use agg together with a common trick for grouping by consecutive values:

df.groupby(df.direction.ne(df.direction.shift()).cumsum()).agg(operations)

                               text direction
direction
1          Hello. I had a question.        In
2           Hi! Yes how can I help?       Out
3            Do you ship to the UK?        In
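
In case the one-liner is hard to read, the grouping key can be unpacked into its intermediate steps, roughly as sketched below. The variable names `changed` and `run_id` are made up for illustration, and carrying `timestamp` along with `'first'` is an extra assumption that is not part of the `operations` dict above.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Hello.", "I had a question.", "Hi!",
             "Yes how can I help?", "Do you ship to the UK?"],
    "timestamp": pd.to_datetime([
        "2013-12-13 11:50:00", "2013-12-13 11:51:00", "2013-12-13 11:52:00",
        "2013-12-13 11:53:00", "2013-12-13 11:54:00",
    ]),
    "direction": ["In", "In", "Out", "Out", "In"],
})

# Step 1: True wherever a row's direction differs from the previous row.
# The first row is compared with a missing value, so it counts as a change.
changed = df["direction"].ne(df["direction"].shift())

# Step 2: the running total of changes gives every run its own id.
run_id = changed.cumsum().rename("run_id")

# Step 3: aggregate per run; keeping "timestamp" is an added assumption.
result = df.groupby(run_id).agg(
    {"text": " ".join, "timestamp": "first", "direction": "first"}
)
print(result)
```

Here `run_id` comes out as 1, 1, 2, 2, 3, which is why the first two and the middle two messages end up merged while the last "In" message stays in its own row.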