I'm trying to compute the variance using np.std(array, ddof=0). A problem arises if I happen to have a long delta array, i.e. an array in which all values are identical: instead of returning std = 0, it returns some small value, which in turn causes further estimation errors downstream. The mean is returned correctly... Example:
np.std([0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],ddof = 0)
returns 1.80411241502e-16,
but
np.std([0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1],ddof = 0)
gives std = 0.
Other than checking the data for uniqueness on every iteration instead of computing the std, is there any way to overcome this problem?
Thanks
P.S. Following the duplicate flag pointing to Is floating point math broken?, copying @kxr's reply to explain why this is a different question:
"The current duplicate marking is wrong. It is not just about simple float comparison, but about the internal aggregation of small errors that np.std performs over a long array - as the asker additionally pointed out. Compare e.g. >>> np.std([0.1, 0.1, 0.1, 0.1, 0.1, 0.1]*200000) -> 2.0808632594793153e-12. So he could work around it e.g. by >>> mean = a.mean(); xmean = round(mean, int(-log10(mean)+9)); std = np.sqrt(((a - xmean) ** 2).sum() / a.size)"
The problem indeed starts with floating-point representation, but it doesn't end there.
@kxr - thank you for your comment and the example.
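For completeness, kxr's one-liner can be wrapped into a small self-contained function. This is my sketch, not code from the thread: the name std_rounded_mean, the abs() and the zero-mean guard are my additions, since rounding to roughly 9 significant digits via log10 only works for a nonzero mean.

```python
import numpy as np
from math import log10

def std_rounded_mean(a, ddof=0):
    """@kxr's workaround: round the mean to ~9 significant digits before
    subtracting, so arrays of identical values give exactly std = 0."""
    a = np.asarray(a, dtype=float)
    mean = a.mean()
    if mean == 0.0:
        xmean = 0.0  # guard: log10 is undefined at 0 (my addition)
    else:
        xmean = round(mean, int(-log10(abs(mean)) + 9))
    return np.sqrt(((a - xmean) ** 2).sum() / (a.size - ddof))

print(std_rounded_mean([0.1] * 90))       # exactly 0.0
print(std_rounded_mean([0.1] * 1200000))  # exactly 0.0
```

The trade-off is that any genuine variation below the 9-significant-digit rounding of the mean is also discarded, so this is only appropriate when values differing by less than ~1e-9 relative should be treated as identical.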