StopIteration error when using groupby on a pandas DataFrame


I'm running into a problem with the groupby method similar to the one this person posted on StackOverflow:

pandas group StopIteration error

I'm trying to use the groupby method and get a similar StopIteration error, even though what I'm doing is simpler.

Traceback (most recent call last):
  File "prepare_data_TJ2012_v1p0.py", line 107, in <module>
    grouped = df.groupby('hh').apply(f)
  File "/Users/shafiquejamal/allfiles/htdocs/venvs/easyframes-py3/lib/python3.4/site-packages/pandas/core/groupby.py", line 637, in apply
    return self._python_apply_general(f)
  File "/Users/shafiquejamal/allfiles/htdocs/venvs/easyframes-py3/lib/python3.4/site-packages/pandas/core/groupby.py", line 644, in _python_apply_general
    not_indexed_same=mutated)
  File "/Users/shafiquejamal/allfiles/htdocs/venvs/easyframes-py3/lib/python3.4/site-packages/pandas/core/groupby.py", line 2657, in _wrap_applied_output
    v = next(v for v in values if v is not None)
StopIteration

Here is the code that produces it:
import pandas as pd

df = pd.DataFrame(
            {'educ': {0: 'pri', 1: 'bach', 2: 'pri', 3: 'hi', 4: 'bach', 5: 'sec', 
                6: 'hi', 7: 'hi', 8: 'pri', 9: 'pri'}, 
             'hh': {0: 1, 1: 1, 2: 1, 3: 2, 4: 3, 5: 3, 6: 4, 7: 4, 8: 4, 9: 4}, 
             'id': {0: 1, 1: 2, 2: 3, 3: 1, 4: 1, 5: 2, 6: 1, 7: 2, 8: 3, 9: 4}, 
             'has_car': {0: 1, 1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 1, 7: 1, 8: 1, 9: 1}, 
             'weighthh': {0: 2, 1: 2, 2: 2, 3: 3, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 3}, 
             'house_rooms': {0: 3, 1: 3, 2: 3, 3: 2, 4: 1, 5: 1, 6: 3, 7: 3, 8: 3, 9: 3}, 
             'prov': {0: 'BC', 1: 'BC', 2: 'BC', 3: 'Alberta', 4: 'BC', 5: 'BC', 6: 'Alberta', 
                7: 'Alberta', 8: 'Alberta', 9: 'Alberta'}, 
             'age': {0: 44, 1: 43, 2: 13, 3: 70, 4: 23, 5: 20, 6: 37, 7: 35, 8: 8, 9: 15}, 
             'fridge': {0: 'yes', 1: 'yes', 2: 'yes', 3: 'no', 4: 'yes', 5: 'yes', 6: 'no', 
                7: 'no', 8: 'no', 9: 'no'}, 
             'male': {0: 1, 1: 0, 2: 1, 3: 1, 4: 1, 5: 0, 6: 1, 7: 0, 8: 0, 9: 0}})
print(df)
print('-- groupby dataframes ---')
def f(df):
    print('-------------------------')
    print('DataFrame' )
    print(df)
    s = df['age']
    print(s)
    print('----> Not nulls:')
    s_notnulls = ~s.isnull()
    print(s_notnulls)
    print('----> Number of non-nulls: %d' % len(s_notnulls[s_notnulls==True]))
df.groupby('hh').apply(f)

I would like to perform an operation on a column, group by group, if another column contains at least one non-null value.

I'm using pandas==0.14.1. It looks like the groupby loop runs longer than it should. Is this a bug? (Or perhaps I'm using the groupby method incorrectly...)

1 Answer


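You get this error because the function you pass to apply doesn't return anything. If all you care about is the printed output, you can simply return df, like this: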

def f(df):
    print('-------------------------')
    print('DataFrame' )
    print(df)
    s = df['age']
    print(s)
    print('----> Not nulls:')
    s_notnulls = ~s.isnull()
    print(s_notnulls)
    print('----> Number of non-nulls: %d' % len(s_notnulls[s_notnulls==True]))

    return df

The apply call will then run without an error.

In [295]: df.groupby('hh').apply(f)
-------------------------
DataFrame
   age  educ fridge  has_car  hh  house_rooms  id  male prov  weighthh
0   44   pri    yes        1   1            3   1     1   BC         2
1   43  bach    yes        1   1            3   2     0   BC         2
2   13   pri    yes        1   1            3   3     1   BC         2
.....
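
For the goal stated in the question (operate on one column per group only when another column has at least one non-null value), a minimal sketch might look like the following. The small DataFrame and the helper name mean_weight_if_any_age are made up for illustration and are not from the original post; the point is only that the function returns a value for every group, so apply has something to combine.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the DataFrame from the question:
# household 2 has no known ages at all.
df = pd.DataFrame({
    "hh": [1, 1, 2, 3, 3],
    "age": [44.0, np.nan, np.nan, 23.0, 20.0],
    "weighthh": [2, 2, 3, 2, 2],
})

def mean_weight_if_any_age(group):
    """Return the group's mean weighthh, or NaN if every age is missing."""
    if group["age"].notnull().any():
        return group["weighthh"].mean()
    return np.nan  # still return something, never fall through to None

# Select the needed columns explicitly, then apply the function per group.
result = df.groupby("hh")[["age", "weighthh"]].apply(mean_weight_if_any_age)
print(result)

# If only the number of non-null values per group is needed,
# count() already skips NaN and no custom function is required.
non_null_counts = df.groupby("hh")["age"].count()
print(non_null_counts)
```

Because the function returns a scalar for each group, result is a Series indexed by hh.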
