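Computing the p-value of the intercept takes two steps:
- Start from the t-value, which is the intercept estimate divided by its standard error (see the tvalue function below).
- Then compute the two-sided p-value from the survival function of the t distribution with n - 2 degrees of freedom (see the pvalue function below).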
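Written out, the two steps look like this. The formula assumes the usual null hypothesis that the true intercept is zero. Here b_0 is the fitted intercept, SE(b_0) its standard error, n the number of data points, F_ν the CDF of Student's t distribution with ν degrees of freedom, and sf_ν = 1 - F_ν its survival function, which is what scipy.stats.t.sf evaluates:

```latex
t = \frac{b_0}{\mathrm{SE}(b_0)}, \qquad \nu = n - 2, \qquad
p = 2\left[1 - F_{\nu}(|t|)\right] = 2\,\mathrm{sf}_{\nu}(|t|)
```

With the numbers in the output below (t ≈ 0.309, ν = 8), this gives p ≈ 0.765.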
Python example (using scipy):
import numpy as np
from scipy import stats
def tvalue(mean, stderr):
    # t-statistic for the null hypothesis "coefficient == 0"
    return mean / stderr
def pvalue(tvalue, dof):
    # two-sided p-value: twice the upper-tail probability of |t|
    return 2 * stats.t.sf(abs(tvalue), dof)
np.random.seed(42)
x = np.random.random(10)
y = np.random.random(10)
scipy_results = stats.linregress(x,y)
print(scipy_results)
# simple linear regression estimates two parameters (slope and intercept), so dof = n - 2
dof = 1.0*len(x) - 2
print("degrees of freedom = ", dof)
tvalue_intercept = tvalue(scipy_results.intercept, scipy_results.intercept_stderr)
tvalue_slope = tvalue(scipy_results.slope, scipy_results.stderr)
pvalue_intercept = pvalue(tvalue_intercept, dof)
pvalue_slope = pvalue(tvalue_slope, dof)
print(f"""tvalues(intercept, slope) = {tvalue_intercept, tvalue_slope}
pvalues(intercept, slope) = {pvalue_intercept, pvalue_slope}
""")
Output:
LinregressResult(slope=0.6741948478345656, intercept=0.044594333294114996, rvalue=0.7042846127289285, pvalue=0.02298486740535295, stderr=0.24027039310814322, intercept_stderr=0.14422953722007206)
degrees of freedom = 8.0
tvalues(intercept, slope) = (0.30919001858870915, 2.8059838713924172)
pvalues(intercept, slope) = (0.7650763497698203, 0.02298486740535295)
Compare with the results you get from statsmodels:
import statsmodels.api as sm
import math
# add_constant prepends a column of ones so that OLS also fits an intercept
X = sm.add_constant(x)
model = sm.OLS(y,X)
statsmodels_results = model.fit()
print(f"""intercept, slope = {statsmodels_results.params}
rvalue = {math.sqrt(statsmodels_results.rsquared)}
tvalues(intercept, slope) = {statsmodels_results.tvalues}
pvalues(intercept, slope) = {statsmodels_results.pvalues}""")
Output:
intercept, slope = [0.04459433 0.67419485]
rvalue = 0.7042846127289285
tvalues(intercept, slope) = [0.30919002 2.80598387]
pvalues(intercept, slope) = [0.76507635 0.02298487]
Notes
- The random seed is fixed so that the results are reproducible
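- The code uses the LinregressResult object, which (since SciPy 1.6) also provides intercept_stderr; it is accessed as an attribute and is not part of the 5-tuple you get when unpacking the result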
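On SciPy versions older than 1.6, where intercept_stderr is not available, the intercept's standard error can be computed from the textbook closed form SE(b_0) = s * sqrt(1/n + x̄²/Sxx), where s² is the residual variance on n - 2 degrees of freedom and Sxx is the sum of squared deviations of x. The sketch below assumes simple linear regression with homoscedastic errors; the function name intercept_stderr is hypothetical (it just mirrors the attribute name) and is not a SciPy API.

```python
import numpy as np

def intercept_stderr(x, y):
    """Standard error of the OLS intercept in y = a + b*x (hypothetical helper).

    Assumes simple linear regression, homoscedastic errors and n > 2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_mean = x.mean()
    sxx = np.sum((x - x_mean) ** 2)            # sum of squared deviations of x
    slope = np.sum((x - x_mean) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x_mean
    residuals = y - (intercept + slope * x)
    s2 = np.sum(residuals ** 2) / (n - 2)      # residual variance, n - 2 dof
    return np.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx))
```

The result can be passed, together with the intercept, to the tvalue and pvalue functions above. On SciPy 1.6 or later it should agree with linregress(x, y).intercept_stderr up to floating-point error.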