How to count sequences of values in Pandas?

Question

3

Suppose I have a DataFrame in Pandas like this:

index    value

1        1
2        0
3        1
4        1
5        0
6        1

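I want to count how many times a specific sequence of values occurs, for example how many times a 1 is followed by a 0 (that is, the number of occurrences of [1,0], which is two in the example above), or how many times [1,0,1] occurs (also two).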

Is there a way to do this without resorting to a plain for loop?

- Gian Segato

2 Answers

1

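I'm not aware of a way to do this by operating on the pandas Series directly; you may need to convert the Series to a string first. The approach below converts the Series to a string and then uses the string's count method.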

import pandas as pd
import re

s = pd.Series([1,0,1,1,0,1])

# convert to string and remove all whitespace
re.sub(r'\s+', '', s.to_string(index=False)).count('101')
# 2

- 3novak
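
One caveat with the string approach above: `str.count` only counts non-overlapping matches, so a series such as 1, 0, 1, 0, 1 yields 1 for '101' even though the pattern starts at two positions. The sketch below shows one possible workaround using a regex lookahead. The names `values` and `text` are illustrative and not from the original answer. Like the original, it assumes every value is a single digit; with multi-digit values the joined string becomes ambiguous.

```python
import re

# Plain list standing in for a pandas Series (s.tolist() would give the same thing).
values = [1, 0, 1, 0, 1]

# Join the single-digit values into one string: '10101'
text = ''.join(str(v) for v in values)

# str.count skips overlapping matches.
non_overlapping = text.count('101')

# A zero-width lookahead matches at every start position, overlaps included.
overlapping = len(re.findall('(?=101)', text))

print(non_overlapping, overlapping)  # 1 2
```

For the series in the question the two counts agree (both are 2), because its two occurrences of 1, 0, 1 do not overlap.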

- piRSquared · Accepted Answer

Generic solution

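import pandas as pd

def tuplify(s, k):
    # build every length-k sliding window of s as a list of tuples
    return list(zip(*[s.values[i:].tolist() for i in range(k)]))

s = pd.Series([1, 0, 1, 1, 0, 1])

# count how often each window occurs
# (pd.value_counts is deprecated since pandas 2.1; pd.Series(...).value_counts() is the replacement)
pd.value_counts(tuplify(s, 3))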

(1, 0, 1)    2
(1, 1, 0)    1
(0, 1, 1)    1
dtype: int64

You can assign the result to a variable and look up the tuple you are interested in.

counts = pd.value_counts(tuplify(s, 3))
counts[(1, 0, 1)]

2

Breakdown

tuplify(s, 3)

[(1, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 1)]

Tuples are hashable and can therefore be counted, which is exactly what pd.value_counts does above.
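
If only one pattern is of interest, the same sliding-window idea can be sketched with NumPy instead of building tuples. This is a minimal sketch, not part of the original answer: `count_pattern` is a hypothetical helper name, and it assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
import pandas as pd

def count_pattern(s, pattern):
    """Count occurrences (overlapping ones included) of `pattern` in Series `s`."""
    pattern = np.asarray(pattern)
    k = len(pattern)
    # sliding_window_view raises if the window is longer than the data, so guard first.
    if k == 0 or len(s) < k:
        return 0
    # Shape (len(s) - k + 1, k): one row per length-k window.
    windows = np.lib.stride_tricks.sliding_window_view(s.to_numpy(), k)
    # A window matches when all k positions equal the pattern.
    return int((windows == pattern).all(axis=1).sum())

s = pd.Series([1, 0, 1, 1, 0, 1])
print(count_pattern(s, [1, 0]))     # 2
print(count_pattern(s, [1, 0, 1]))  # 2
```

Unlike the value_counts approach, this returns 0 for a pattern that never occurs rather than raising a KeyError on lookup.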