How to get an integer array from numpy.bincount when the weights parameter is integer

Question

Consider the numpy array a

a = np.array([1, 0, 2, 1, 1])

If I do a bincount, I get integers

np.bincount(a)

array([1, 3, 1])

But if I add weights to perform the equivalent bincount

np.bincount(a, np.ones_like(a))

array([ 1.,  3.,  1.])

Same values, but as floats. What is the smartest way to convert these to int? And why doesn't numpy assume the same dtype as the weights that were passed?

- piRSquared

1 Answer

- MSeifert · Accepted Answer

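Why doesn't numpy assume the same dtype as the weights that were passed?

There are two reasons:

1. There are several ways to weight a count, for example by multiplying the values by the weights, or by multiplying the values by the weights divided by the sum of the weights. In the latter case the result will always be a double (simply because the division would otherwise be inexact). In my experience, weighting with normalized weights (the second case) is more common, so assuming floats is actually reasonable (and definitely faster).

2. Overflow. A plain count cannot exceed the integer limit, because an array cannot hold more values than that limit (which makes sense, otherwise you couldn't index the array). But once the counts are multiplied by weights, it is easy to make them overflow.

I guess in this case the latter is probably the reason.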
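To make the first reason concrete, here is a minimal sketch of the two weighting styles. The weight values `w` are made up for illustration and are not from the question.

```python
import numpy as np

a = np.array([1, 0, 2, 1, 1])
w = np.array([2, 3, 4, 5, 6])  # hypothetical integer weights

# Style 1: sum the raw weights per bin (integer-valued, but returned as float64).
raw = np.bincount(a, weights=w)

# Style 2: normalize the weights first; the per-bin sums are fractions of the total,
# so an integer result dtype could not represent them.
normalized = np.bincount(a, weights=w / w.sum())

print(raw, raw.dtype)
print(normalized, normalized.sum())
```

With normalized weights the per-bin results are fractions that add up to 1 (up to rounding), which is why a float return type is the natural fit for that use.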

It is unlikely that anyone uses really large integer weights together with lots of repeated values, but suppose someone did:

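import numpy as np

i = 10000000  # 10**7 repeated values, all falling into bin 1
# Each weight is 10**12, so the true weighted count for bin 1 is 10**19,
# which is larger than the int64 maximum (about 9.22e18).
np.bincount(np.ones(i, dtype=int), weights=np.ones(i, dtype=int)*1000000000000)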

If the result kept the integer dtype of the weights, this would return:

array([0, -8446744073709551616])

instead of the actual result:

array([  0.00000000e+00,   1.00000000e+19])
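The same wrap-around can be reproduced without allocating ten million elements. This is a small sketch with made-up values, comparing an int64 accumulation of the weights with what bincount actually returns:

```python
import numpy as np

idx = np.array([1, 1])                        # both values fall into bin 1
w = np.array([2**62, 2**62], dtype=np.int64)  # hypothetical huge integer weights

# Accumulating in int64 wraps around silently: 2**62 + 2**62 does not fit in int64.
wrapped = w.sum()

# bincount accumulates in float64 instead, so the sum stays meaningful.
as_float = np.bincount(idx, weights=w)

print(wrapped)   # a negative number caused by the overflow
print(as_float)
```

Here the true sum is 2**63, one past the largest int64 value, so the integer accumulation flips sign while the float result does not.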

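The overflow risk, together with the first reason and the fact that converting a float array to an integer array is very easy (personally I find it trivial):

np.asarray(np.bincount(...), dtype=int)

is probably why float was chosen as the "actual" return dtype of weighted bincount.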
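For the "how do I get integers back" part of the question, here is a sketch of two options, assuming the weights really are integers. The `np.add.at` variant is an alternative I am adding for comparison, not something from the answer above, and the weights `w` are made up.

```python
import numpy as np

a = np.array([1, 0, 2, 1, 1])
w = np.array([2, 3, 4, 5, 6])  # hypothetical integer weights

# Option 1: let bincount work in float64, then cast back.
# This is exact as long as every per-bin sum stays below 2**53.
int_counts = np.bincount(a, weights=w).astype(np.int64)

# Option 2: accumulate directly in an integer array, never touching floats.
# np.add.at performs unbuffered in-place addition, so repeated indices are all counted.
acc = np.zeros(a.max() + 1, dtype=np.int64)
np.add.at(acc, a, w)

print(int_counts)
print(acc)
```

Option 1 is usually the simpler choice. Option 2 avoids the float round trip, but it gives up the overflow protection described above.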

The "literal" reason:

The numpy source code actually states that weights must be convertible to double (float64):

/*
 * arr_bincount is registered as bincount.
 *
 * bincount accepts one, two or three arguments. The first is an array of
 * non-negative integers The second, if present, is an array of weights,
 * which must be promotable to double. Call these arguments list and
 * weight. Both must be one-dimensional with len(weight) == len(list). If
 * weight is not present then bincount(list)[i] is the number of occurrences
 * of i in list.  If weight is present then bincount(self,list, weight)[i]
 * is the sum of all weight[j] where list [j] == i.  Self is not used.
 * The third argument, if present, is a minimum length desired for the
 * output array.
 */

The function then casts the weights to double. That is the "literal" reason you get a floating-point result.
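A quick way to observe this from Python is to pass weights of several dtypes and inspect the result. This is a sketch, and the dtypes chosen are only examples:

```python
import numpy as np

a = np.array([1, 0, 2, 1, 1])

# Whatever dtype the weights have, the result comes back as float64.
result_dtypes = [
    np.bincount(a, weights=np.ones(a.size, dtype=dt)).dtype
    for dt in (np.int8, np.int64, np.float32)
]
print(result_dtypes)

# The third argument from the source comment is exposed as `minlength`.
padded = np.bincount(a, minlength=5)
print(padded)
```

Without weights the result keeps an integer dtype, and `minlength` only pads the output with empty bins.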