Why does pandas.DataFrame.apply print out junk?

Question

10

Consider this simple dataframe:

   a  b
0  1  2
1  2  3
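
The thread never shows how df was built. A minimal sketch that reproduces the frame printed above, assuming both columns hold plain integers:

```python
import pandas as pd

# Assumed construction: the question only shows the printed frame,
# not the code that created it.
df = pd.DataFrame({"a": [1, 2], "b": [2, 3]})
print(df)
```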

I call .apply on it as follows:

In [4]: df.apply(lambda x: [x.values])
Out[4]: 
a    [[140279910807944, 140279910807920]]
b    [[140279910807944, 140279910807920]]
dtype: object

In [5]: df.apply(lambda x: [x.values])
Out[5]: 
a    [[37, 37]]
b    [[37, 37]]
dtype: object

In [6]: df.apply(lambda x: [x.values])
Out[6]: 
a    [[11, 11]]
b    [[11, 11]]
dtype: object

Why does pandas print out junk each time?

I have verified that this happens on v0.20.

Edit: I'm looking for an answer, not a workaround.

- cs95

1

The same happens with df.apply(lambda x: [x]). - DYZ

The same happens with a one-row dataframe: df1=pd.DataFrame({'a':[1],'b':[2]}); df1.apply(lambda x: [x],axis=1) outputs: 0 [[0, 0]]. - DYZ

@DYZ Thanks. So it's not just my machine. - cs95

3

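Nice job breaking apply... now you need to fix it! - piRSquared

@piRSquared Very funny; let's see whether this is a bug... :p (in which case it's out of my hands) - cs95

I can confirm this behavior on the 0.21.0.dev+ master branch. - Zero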

3 Answers

6

I don't have an answer... only a workaround

f = lambda x: x.values.reshape(1, -1).tolist()

df.apply(f)

a    [[1, 2]]
b    [[2, 3]]
dtype: object
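
What the lambda does to a single column can be seen with NumPy alone. This is only a sketch of the reshape-then-tolist step on a hand-built array standing in for x.values; it does not touch apply or explain the bug itself:

```python
import numpy as np

# Stand-in for x.values of column "a" (assumed to be a 1-D integer array).
values = np.array([1, 2])

# reshape(1, -1) turns the 1-D array into a single row: shape (1, 2).
row = values.reshape(1, -1)
print(row.shape)     # (1, 2)

# tolist() copies the data into nested plain Python lists.
print(row.tolist())  # [[1, 2]]
```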

I traced it down to pd.lib.reduce

pd.lib.reduce(df.values, lambda x: [list(x)])

array([list([[1, 2]]), list([[2, 3]]), list([['a', 'b']])], dtype=object)

versus

pd.lib.reduce(df.values, lambda x: [x])

array([list([array([None, None], dtype=object)]),
       list([array([None, None], dtype=object)]),
       list([array([None, None], dtype=object)])], dtype=object)

- piRSquared

Until now I had been checking the same thing in the main [code][1], starting from the _apply_raw method. I thought it was a mixed-dtype problem in np.apply_along_axis, but it is actually a problem with reduce. Really nice. [1]: https://github.com/pandas-dev/pandas/blob/3a7f956c30528736beaae5784f509a76d892e229/pandas/core/frame.py#L4280 - Bharath M Shetty

3

Another workaround:

df.apply(lambda x: [list(x)])

- DYZ

- jezrael · Accepted Answer

This looks like a bug, so issue 17487 was opened.

For me, it is necessary to use tolist:

print (df.apply(lambda x: [x.values.tolist()]))
a    [[1, 2]]
b    [[2, 3]]
dtype: object

print (df.apply(lambda x: [list(x.values)]))
a    [[1, 2]]
b    [[2, 3]]
dtype: object
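
Both calls above convert the column to a plain Python list before wrapping it. Here is a small sketch of that conversion on its own, run per column without apply so the result does not depend on the pandas version. The helper name wrap_as_list and the frame construction are made up for illustration, and the sketch does not show why the unconverted version misbehaves:

```python
import pandas as pd

# Assumed construction of the frame shown in the question.
df = pd.DataFrame({"a": [1, 2], "b": [2, 3]})

def wrap_as_list(col):
    # Hypothetical helper: copy the column's values into a new Python
    # list, then wrap that list in an outer list.
    return [col.values.tolist()]

# Apply the helper to each column with an ordinary dict comprehension.
wrapped = {name: wrap_as_list(df[name]) for name in df.columns}
print(wrapped)
```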