我目前正在尝试使用自助法(bootstrap method)在一些参数上实现置信区间,但是我遇到了一个小问题。即使我使用了3000个样本,我的置信区间也变化很大。
这是情况:
我有一个数据集,约有300个点,以传统的y=f(x)方式定义。我知道适合数据的模型。因此我使用curve_fit找到参数,并尝试为每个参数建立置信区间。我尝试混合描述如下方法:
confidence interval with leastsq fit in scipy python
和这里:
http://www.variousconsequences.com/2010/02/visualizing-confidence-intervals.html
这是我使用的代码:
def model(t, Vs, Vi, k):
"""
Fitting model, following a Burst kinetics.
t is the time
Vs is the steady velocity
Vi is the initial velocity
k is the Burst rate constant
"""
y = Vs * t - ((Vs - Vi) * (1 - np.exp(-k * t)) / k)
return y
[some code]
bootindex = np.random.random_integers
nboot = 3000
local_t = np.array(local_t)
local_fluo = np.array(local_fluo)
concentration = np.array(concentration)
#Initializing time values in hours
local_scaled_t = [ index /3600 for index in local_t ]
local_scaled_t = np.array(local_scaled_t)
conc_produit = [ concentration[0] - value_conc for value_conc in concentration ]
conc_produit = np.array(conc_produit)
popt, pcov = curve_fit(model, local_scaled_t, conc_produit, maxfev=3000)
popt = [ popt[0] / 3600, popt[1] / 3600 , popt[2] / 3600 ]
ymod = list()
for each in local_t:
ymod.append(model(each, popt[0], popt[1], popt[2]))
ymod = np.array(ymod)
r = conc_produit - ymod
list_para = list()
# loop over n bootstrap samples from the resids
for i in range(nboot):
pc, pout = curve_fit(model, local_scaled_t, ymod + r[bootindex(0, len(r)-1, len(r))], maxfev=3000)
pc = [ pc[0] / 3600, pc[1] / 3600 , pc[2] / 3600 ]
list_para.append(pc)
ymod = list()
for each in local_t:
ymod.append(model(each, pc[0], pc[1], pc[2]))
ymod = np.array(ymod)
list_para = np.array(list_para)
mean_params = np.mean(list_para,0)
std_params = np.std(list_para,0)
print(popt)
for true_para, para, std in zip(popt, mean_params, std_params):
print("{0} between {1} and {2}".format(round(true_para, 6), round(para - std * 1.95996, 6), round(para + std * 1.95996, 6)))
print("{0} between {1} and {2}".format(round(true_para, 6), round(para - std * 1.95996, 6), round(para + std * 1.95996, 6)))
这里没有什么复杂的,只是需要注意我重新调整了时间以标准化我的数据,并获得更好的参数。
最后,这是同一段代码的两个输出:
[1.9023455671995163e-05, 0.01275941716148471, 0.026540319119773129]
1.9e-05 between 1.6e-05 and 2.1e-05
0.012759 between -0.042697 and 0.092152
0.02654 between -0.073456 and 0.159983
[1.9023455671995163e-05, 0.01275941716148471, 0.026540319119773129]
1.9e-05 between 1.5e-05 and 2.9e-05
0.012759 between -0.116499 and 0.17112
0.02654 between -0.186011 and 0.27797
如您所见,差异非常大。这是否符合预期或者我做错了什么?例如,我不太明白为什么要将标准偏差乘以1.95996。
vcov
矩阵的对角线元素。除非您想要多维置信区域:http://en.wikipedia.org/wiki/Confidence_region - CT Zhu