Transforming data to a normal distribution: what is the best function for a given case?

Question

8

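Is there a function or package that searches for the best (or one of the best) transformations of a variable so that the model residuals are as close to normally distributed as possible?

For example: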

# some_transformation() is a placeholder for the unknown transformation
frml = formula(some_transformation(A) ~ B + I(B^2) + B:C + C)
model = aov(frml, data = data)
shapiro.test(residuals(model))

Is there a function that tells me which some_transformation() optimizes the normality of the residuals?

- Remi.b

Why not run several normality tests and compare each statistic across the candidate models? - statquant

2

@statquant Because that's a really bad idea? - hadley

2

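@statquant Basically, normality tests are more sensitive to departures from normality than the tests you actually want to run. For example, moderate kurtosis or slight skewness has almost no effect on a t-test, yet a normality test will reject the null hypothesis. This is a very commonly discussed topic. - hadley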

1

@statquant, that doesn't make it any more statistically valid. You may want to read up on some of the related questions. - hadley

1

Unless you give me a concrete reference, I'm afraid you haven't made a point here... - statquant

2 Answers

6

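Unfortunately, this is not a solved problem in statistics. The approach suggested by user @statquant is about the best option available, but it has its own pitfalls.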

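One thing to keep in mind is that normality tests such as shapiro.test become very sensitive to small deviations once you have a reasonable sample size (i.e. a few hundred observations), so you should not rely on them blindly.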
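
A rough way to see this sensitivity is a small simulation. The sketch below is an illustration, not part of the original answer: `reject_rate` is a hypothetical helper, and the t distribution with 5 degrees of freedom, the two sample sizes, the number of replicates, and the seed are arbitrary assumptions standing in for "somewhat heavier tails than normal". Note that `shapiro.test` only accepts between 3 and 5000 observations.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: share of simulated samples of size n for which
# the Shapiro-Wilk test rejects normality at the 5% level.
reject_rate <- function(n, reps = 200) {
  mean(replicate(reps, shapiro.test(rt(n, df = 5))$p.value < 0.05))
}

# Same underlying distribution, two different sample sizes
rates <- sapply(c(small = 30, large = 3000), reject_rate)
rates
```

The distribution is identical in both cases; only the sample size changes, so any difference in the rejection rate reflects the power of the test rather than a change in how non-normal the data are.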

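Personally, I put this problem in the too-hard basket. If the data don't look at least roughly normally distributed, I try to find a non-parametric version of the statistic I need.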
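
As one illustration of the non-parametric route, the sketch below compares a one-way ANOVA with its rank-based counterpart, the Kruskal-Wallis test. The built-in `PlantGrowth` dataset is an assumption chosen only to make the example self-contained. This covers a single grouping factor only; a model with several predictors and interactions, like the one in the question, may need a different tool.

```r
# Parametric one-way ANOVA: inference relies on roughly normal residuals
fit_aov <- aov(weight ~ group, data = PlantGrowth)
summary(fit_aov)

# Rank-based counterpart: does not assume normally distributed residuals
fit_kw <- kruskal.test(weight ~ group, data = PlantGrowth)
fit_kw
```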

- Scott Ritchie

- Roland · Accepted Answer

Do you mean the Box-Cox transformation?

library(car)
m0 <- lm(cycles ~ len + amp + load, Wool)
plot(m0, which=2)

[Figure: normal Q-Q plot of the residuals of m0]

# Box Cox Method, univariate
summary(p1 <- powerTransform(m0))
# bcPower Transformation to Normality 
# 
#    Est.Power Std.Err. Wald Lower Bound Wald Upper Bound
# Y1   -0.0592   0.0611          -0.1789           0.0606
# 
# Likelihood ratio tests about transformation parameters
#                              LRT df      pval
# LR test, lambda = (0)  0.9213384  1 0.3371238
# LR test, lambda = (1) 84.0756559  1 0.0000000


# fit linear model with transformed response:
coef(p1, round=TRUE)
summary(m1 <- lm(bcPower(cycles, p1$roundlam) ~ len + amp + load, Wool))
plot(m1, which=2)

[Figure: normal Q-Q plot of the residuals of m1]
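
A closely related estimate can be sketched without car, using `boxcox()` from MASS, which profiles the Box-Cox log-likelihood over a grid of lambda values. This is a minimal illustration under stated assumptions, not the method of the answer above: the built-in `trees` dataset, the lambda grid, and the helper `box_cox` are all choices made here for the example, and the response must be strictly positive.

```r
library(MASS)

# Untransformed model on a built-in dataset (an assumption for this sketch)
m <- lm(Volume ~ Girth + Height, data = trees)

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(m, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]  # grid value with the highest likelihood
lambda_hat

# Hypothetical helper: Box-Cox power transform,
# (y^lambda - 1) / lambda for lambda != 0, and log(y) for lambda == 0
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Refit with the transformed response and inspect the Q-Q plot as before
m_bc <- lm(box_cox(Volume, lambda_hat) ~ Girth + Height, data = trees)
plot(m_bc, which = 2)
```

In practice it is common to round the estimate to a nearby interpretable value (for example 0 for a log transform or 0.5 for a square root), which is what `coef(p1, round=TRUE)` does in the car workflow above.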