pandas: does a groupby object store the index?

Question

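As far as I know, groupby has to compute an index on the grouping variables. However, I'm not sure whether this index is stored in the groupby object.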

My code looks like this:

df.groupby(["col1","col2"]).agg( something )
( ... some code ... )
df.groupby(["col1","col2"]).agg( something else )

My understanding is that the following avoids building the index twice. Is that correct?

my_group = df.groupby(["col1","col2"])
my_group.agg( something )
( ... some code ... )
my_group.agg( something else )
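To make the pattern concrete, here is a runnable sketch. The DataFrame, the value column `val`, and the choice of sum and mean as stand-ins for "something" and "something else" are all made up for illustration.

```python
import pandas as pd

# Hypothetical data: col1/col2 are the grouping columns from the question,
# "val" is an invented value column.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": ["x", "y", "x", "x", "y"],
    "val": [1, 2, 3, 4, 5],
})

# Build the groupby object once...
my_group = df.groupby(["col1", "col2"])

# ...and reuse it for two different aggregations
# (sum and mean are placeholders for "something" and "something else").
totals = my_group.agg({"val": "sum"})
means = my_group.agg({"val": "mean"})

print(totals)
print(means)
```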

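This matters to me because I'm writing something that has to go through the grouping twice. If the index isn't stored, I may need to implement my own groupby.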

- user1950164
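If the grouping really did have to be handled by hand, one minimal sketch of that "own groupby" route (a swapped-in technique, not something pandas requires) is to compute integer group codes once with `GroupBy.ngroup()` and reuse them in NumPy reductions via `np.bincount`. The data and column names below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the question's grouping columns.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": ["x", "y", "x", "x", "y"],
    "val": [1, 2, 3, 4, 5],
})

# Factorize the (col1, col2) pairs once: ngroup() labels each row with its
# group number 0..k-1, numbered in sorted key order.
codes = df.groupby(["col1", "col2"]).ngroup().to_numpy()
k = int(codes.max()) + 1

vals = df["val"].to_numpy()

# Reuse the same codes for several reductions without regrouping.
sums = np.bincount(codes, weights=vals, minlength=k)   # per-group sum (float)
counts = np.bincount(codes, minlength=k)               # per-group row count
means = sums / counts                                   # per-group mean

print(sums, means)
```

The trade-off is that this only covers reductions NumPy can express on integer codes; for anything more general, reusing the pandas groupby object is simpler.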

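Can you give more context? Are you worried that forming the groups takes a long time, so you only want to do it once? Or do you need the result of the first aggregation in the second one? - ALollz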

1 Answer

Content provided by Stack Overflow.

- Ankit Kumar Namdeo · Accepted Answer

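groupby computes the group index needed for the aggregation. If you keep the groupby object in a variable, the index it builds is stored in that object and can be reused.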

import pandas as pd

df3 = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                          "bar", "bar", "bar", "bar"],
                    "B": ["one", "one", "one", "two", "two",
                          "one", "one", "two", "two"],
                    "C": ["small", "large", "large", "small",
                          "small", "large", "small", "small",
                          "large"],
                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df4 = df3.sort_values(['A', 'B'])   # same rows, pre-sorted by the grouping keys
res1 = df3.groupby(['A', 'B'])['D'].mean()
res2 = df4.groupby(['A', 'B'])['D'].median()

print(res1.index)
# Output from an older pandas on Python 2 (newer versions use a different MultiIndex repr):
# MultiIndex(levels=[[u'bar', u'foo'], [u'one', u'two']],
#            labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
#            names=[u'A', u'B'])

print(res2.index)
# MultiIndex(levels=[[u'bar', u'foo'], [u'one', u'two']],
#            labels=[[0, 0, 1, 1], [0, 1, 0, 1]],
#            names=[u'A', u'B'])
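Because the printed MultiIndex repr differs between pandas versions, a version-independent way to check the same point is to compare the two indexes directly. A minimal sketch, keeping only the columns used above:

```python
import pandas as pd

df3 = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})
df4 = df3.sort_values(["A", "B"])

res1 = df3.groupby(["A", "B"])["D"].mean()
res2 = df4.groupby(["A", "B"])["D"].median()

# Both results carry the same sorted (A, B) MultiIndex,
# regardless of the row order of the input frame.
same = res1.index.equals(res2.index)
print(same)              # True
print(list(res1.index))  # list of (A, B) tuples in sorted order
```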

You can certainly do that.

my_group = df3.groupby(['A', 'B'])
print(type(my_group))
# pandas.core.groupby.groupby.DataFrameGroupBy  (the module path varies between pandas versions)

You can then run different aggregations on the same groupby object, and it will not recompute the group index each time.
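The groupby object also exposes the grouping it computed through public attributes such as `ngroups` and `indices`. The sketch below reuses the answer's data (trimmed to the columns needed) to inspect them and run two aggregations on the same object. Whether pandas additionally caches its internal factorization is an implementation detail that the public API does not promise.

```python
import pandas as pd

df3 = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

my_group = df3.groupby(["A", "B"])

# The grouping lives on the groupby object itself.
print(my_group.ngroups)   # 4 distinct (A, B) pairs
print(my_group.indices)   # dict: (A, B) key -> NumPy array of row positions

# Two different aggregations on the same object.
means = my_group["D"].mean()
medians = my_group["D"].median()
print(means)
print(medians)
```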

Let me know if this helps.