How to write a multi-parameter log-likelihood function in R

Question

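I would like to estimate the power for the following problem. I am interested in comparing two groups that both follow a Weibull distribution. Group A has two parameters (shape a1, scale b1) and group B also has two parameters (a2, b2). By simulating random variables from the distributions of interest (e.g. assuming different scale and shape parameters, i.e. a1 = 1.5*a2 and b1 = b2*0.5; or groups that differ only in the shape or only in the scale parameter), I want to apply a log-likelihood ratio test of a1 = a2 and b1 = b2 (or, for example, of a1 = a2 when we know that b1 = b2) and estimate the power of the test.

The question is what the log-likelihood of the full model is, and how to code it in R, when a) the data are exact and b) the data are interval-censored.

For the reduced model (a1 = a2, b1 = b2), the log-likelihoods for exact and interval-censored data are: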

LL.reduced.exact <- function(par,data){sum(log(dweibull(data,shape=par[1],scale=par[2])))};
LL.reduced.interval.censored<-function(par, data.lower, data.upper) {sum(log((1-pweibull(data.lower, par[1], par[2])) - (1-pweibull(data.upper, par[1],par[2]))))}

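What does the full model look like when a1 != a2 and b1 != b2, taking the two observation schemes into account, i.e. when 4 parameters have to be estimated (or 3 parameters, when only a difference in the shape parameters is of interest)?

Can it be estimated by building the log-likelihoods of the two groups separately and adding them up (i.e. LL.full <- LL.group1 + LL.group2)?

Regarding the log-likelihood for interval-censored data: the censoring is non-informative and all observations are interval-censored. Any better ideas for how to perform this task would be appreciated.

Please find the R code below that illustrates the problem. Thank you very much for your help.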

R Code:    
# n (sample size) = 500
# sim (number of simulations) = 1000
# alpha  = .05
# Parameters of Weibull distributions: 
   #group 1: a1=1, b1=20
   #group 2: a2=1*1.5 b2=b1

n=500
sim=1000
alpha=.05
a1=1
b1=20
a2=a1*1.5
b2=b1
#OR: a1=1, b1=20, a2=a1*1.5, b2=b1*0.5 

# the main question is how to build this log-likelihood model, when a1!=a2, and b1=b2
# (or a1!=a2, and b1!=b2)
LL.full<-????? 
LL.reduced <- function(par,data){sum(log(dweibull(data,shape=par[1],scale=par[2])))}

LR.test<-function(red,full,df) {
lrt<-(-2)*(red-full)
pvalue<-1-pchisq(lrt,df)
return(data.frame(lrt,pvalue))
}

rejections<-NULL

for (i in 1:sim) {

RV1<-rweibull (n, a1, b1)
RV2<-rweibull (n, a2, b2)
RV.Total<-c(RV1, RV2)

par.start<-c(1, 15)

mle.full<- ????????????  
mle.reduced<-optim(par.start, LL.reduced, data=RV.Total, control=list(fnscale=-1))

LL.full.value<-????? 
LL.reduced.value<-mle.reduced$value

LRT<-LR.test(LL.reduced.value, LL.full.value, 1)

rejections1<-ifelse(LRT$pvalue<alpha,1,0)
rejections<-c(rejections, rejections1)
}

table(rejections)
sum(table(rejections)[[2]])/sim   # estimated power

- user36478

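This question appears to be off-topic for Stack Overflow because it is about how to derive a log-likelihood function, which is outside the scope of Stack Overflow. It should be migrated to stats.stackexchange.com. - Roland

The question could be made on-topic with a slight rewording, e.g. "How to write a multi-parameter log-likelihood function in R". - Nate Pope

1 Answer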

- Nate Pope · Accepted Answer

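Yes, you can sum the log-likelihoods of the two groups (if they were calculated separately), just as you would sum the log-likelihood over a vector of observations in which each observation has different generating parameters.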
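As a quick check of that statement, here is a minimal sketch for exact data. The names (`LL_full_exact`, `x1`, `x2`) and the parameter layout `par = c(shape1, shape2, common scale)` are illustrative assumptions, not taken from the question. It compares the sum of the two per-group log-likelihoods with a single vectorized call that uses one shape value per observation.

```r
## Sketch under assumed names: two independent Weibull groups, common scale
set.seed(1)
x1 <- rweibull(50, shape = 1.0, scale = 2)   # group 1
x2 <- rweibull(50, shape = 1.5, scale = 2)   # group 2

## full model, exact data: par = c(shape1, shape2, common scale)
LL_full_exact <- function(par, x1, x2) {
    LL_group1 <- sum(dweibull(x1, shape = par[1], scale = par[3], log = TRUE))
    LL_group2 <- sum(dweibull(x2, shape = par[2], scale = par[3], log = TRUE))
    LL_group1 + LL_group2
}

## the same quantity, computed with one shape value per observation
par <- c(1, 1.5, 2)
shape_vec <- rep(par[1:2], times = c(length(x1), length(x2)))
LL_vectorized <- sum(dweibull(c(x1, x2), shape = shape_vec, scale = par[3], log = TRUE))

## the two routes should agree up to floating-point error
all.equal(LL_full_exact(par, x1, x2), LL_vectorized)
```

Letting the scale differ as well would only require a fourth element in `par`.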

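I prefer to think in terms of one large vector (e.g. of shape parameters) whose values vary according to the covariate structure (e.g. group membership). In a linear-model context, this vector can equal the linear predictor (once appropriately transformed by a link function): the dot product of the design matrix and the vector of regression coefficients.

Here is a (non-functionalized) example: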
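To make the design-matrix idea concrete, the following small sketch builds the indicator matrices with base R's `model.matrix()` instead of by hand. The factor `group` and its level names are hypothetical, and only three observations per group are used so the result is easy to read.

```r
## Sketch: design matrices from a (hypothetical) grouping factor
group <- factor(rep(c("grp1", "grp2"), each = 3))

mm_full <- model.matrix(~ 0 + group)                     # one indicator column per group
mm_null <- model.matrix(~ 1, data = data.frame(group))   # single intercept column

beta <- c(1, 1.5)                            # one shape per group
shapes_full <- as.vector(mm_full %*% beta)   # linear predictor = shape per observation
shapes_full
```

Each observation picks up the coefficient of its own group, so the first three shapes equal `beta[1]` and the last three equal `beta[2]`.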

## setup true values
nobs = 50 ## number of observations
a1 = 1  ## shape for first group
b1 = 2  ## scale parameter for both groups
beta = c(a1, a1 * 1.5)  ## vector of linear coefficients (group shapes)

## model matrix for full, null models
mm_full = cbind(grp1 = rep(c(1,0), each = nobs), grp2 = rep(c(0,1), each = nobs))
mm_null = cbind(grp1 = rep(1, nobs*2))

## shape parameter vector for the full, null models
shapes_full = mm_full %*% beta ## different shape parameters by group (full model)
shapes_null = mm_null %*% beta[1] ## same shape parameter for all obs
scales = rep(b1, length(shapes_full)) ## scale parameters the same for both groups

## simulate response from full model
response = rweibull(length(shapes_full), shapes_full, scales)

## the log likelihood for the full, null models:
LL_full = sum(dweibull(response, shapes_full, scales, log = T)) 
LL_null = sum(dweibull(response, shapes_null, scales, log = T)) 

## likelihood ratio test
LR_test = function(LL_null, LL_full, df) {
    LR = -2 * (LL_null - LL_full) ## test statistic
    pchisq(LR, df = df, ncp = 0, lower.tail = FALSE) ## upper-tail probability of test statistic under central chi-sq distribution
    }
LR_test(LL_null, LL_full, 1) ## 1 degree of freedom (1 parameter added)

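To write a log-likelihood function for finding the maximum likelihood estimate (MLE) of a Weibull model in which the shape parameter is a linear function of some covariates, you can use the same approach: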

## (negative) log-likelihood function
LL_weibull = function(par, data, mm, inv_link_fun = function(.) .){
    P = ncol(mm) ## number of regression coefficients
    N = nrow(mm) ## number of observations
    shapes = inv_link_fun(mm %*% par[1:P]) ## shape vector (possibly transformed)
    scales = rep(par[P+1], N) ## scale vector
    -sum(dweibull(data, shape = shapes, scale = scales, log = T)) ## negative log likelihood
    }

Then your power simulation might look something like this:

## function to simulate data, perform LRT
weibull_sim = function(true_shapes, true_scales, mm_full, mm_null){
    ## simulate response
    response = rweibull(length(true_shapes), true_shapes, true_scales)

    ## find MLE
    mle_full = optim(par = rep(1, ncol(mm_full)+1), fn = LL_weibull, data = response, mm = mm_full) 
    mle_null = optim(par = rep(1, ncol(mm_null)+1), fn = LL_weibull, data = response, mm = mm_null)

    ## likelihood ratio test
    df = ncol(mm_full) - ncol(mm_null)
    return(LR_test(-mle_null$value, -mle_full$value, df))
    }

## run simulations
nsim = 1000
pvals = sapply(1:nsim, function(.) weibull_sim(shapes_full, scales, mm_full, mm_null) )

## calculate power
alpha = 0.05
power = sum(pvals < alpha) / nsim

In the example above the identity link works fine, but more complex models may require some sort of transformation.
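One such transformation is a log link, which keeps the shape positive whatever values the optimizer tries. The sketch below is self-contained and uses assumed names (`NLL_log_link`, `mm`, `y`). Putting the scale on the log scale as well is an extra assumption made here, not something the answer prescribes.

```r
## Sketch, assuming a log link: shape = exp(linear predictor)
set.seed(2)
nobs <- 50
mm <- cbind(grp1 = rep(c(1, 0), each = nobs), grp2 = rep(c(0, 1), each = nobs))
y <- rweibull(2 * nobs, shape = as.vector(mm %*% c(1, 1.5)), scale = 2)

## negative log-likelihood; par = c(log-shape coefficients, log-scale)
NLL_log_link <- function(par, data, mm) {
    P <- ncol(mm)
    shapes <- exp(as.vector(mm %*% par[1:P]))  # positive by construction
    scale <- exp(par[P + 1])                   # assumption: scale also on the log scale
    -sum(dweibull(data, shape = shapes, scale = scale, log = TRUE))
}

fit <- optim(par = rep(0, ncol(mm) + 1), fn = NLL_log_link, data = y, mm = mm)
exp(fit$par)  # back-transformed: shape (grp1), shape (grp2), scale
```

With the `LL_weibull` function shown earlier, passing `inv_link_fun = exp` gives the same treatment for the shape vector, while the scale stays on its natural scale.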

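You don't have to use linear algebra in the log-likelihood function: obviously, you can construct the vector of shapes in any way you see fit (as long as you explicitly index the appropriate generating parameters in the vector par).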
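For instance, for exact data the shape vector can be built by indexing `par` with a grouping factor. This is only a sketch with hypothetical names (`NLL_by_group`, `group`, `true_shape`); the layout `par = c(one shape per group, common scale)` is an assumption chosen for illustration.

```r
## Sketch: index the shape parameters by group instead of using a model matrix
set.seed(3)
group <- factor(rep(c("A", "B"), times = c(40, 60)))   # hypothetical group labels
true_shape <- c(A = 1, B = 1.5)
y <- rweibull(length(group), shape = true_shape[as.character(group)], scale = 2)

## negative log-likelihood; par = c(shape per group level, common scale)
NLL_by_group <- function(par, data, group) {
    G <- nlevels(group)
    shapes <- par[1:G][as.integer(group)]   # each observation's shape, looked up by group
    -sum(dweibull(data, shape = shapes, scale = par[G + 1], log = TRUE))
}

fit <- optim(par = c(1, 1, 1), fn = NLL_by_group, data = y, group = group)
fit$par   # estimated shape (A), shape (B), common scale
```

Unlike a count-per-group approach, this does not require the observations to be sorted by group.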

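Interval-censored data

The cumulative distribution function F(T) of the Weibull distribution (pweibull in R) gives the probability of failure before time T. So, if an observation is interval-censored between times T[0] and T[1], the probability that the object failed between T[0] and T[1] is F(T[1]) - F(T[0]): the probability that the object failed before T[1] minus the probability that it failed before T[0] (the integral of the PDF between T[0] and T[1]). Because the Weibull CDF is already implemented in R, it is trivial to modify the likelihood function above: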

LL_ic_weibull <- function(par, data, mm){
    ## 'data' has two columns, left and right times of censoring interval
    P = ncol(mm) ## number of regression coefficients
    shapes = mm %*% par[1:P]
    scales = par[P+1]
    -sum(log(pweibull(data[,2], shape = shapes, scale = scales) - pweibull(data[,1], shape = shapes, scale = scales)))
    }
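The interval probability used in this function can be checked numerically. The sketch below, with arbitrary illustrative values for the parameters and the interval, compares the CDF difference with a numerical integral of the density, and with the survival-function form (1 - F(T[0])) - (1 - F(T[1])) used in the question.

```r
## Sketch: three equivalent ways to get P(t0 < T <= t1) for a Weibull variable
shape <- 1.5; scale <- 2     # arbitrary illustrative parameters
t0 <- 0.8; t1 <- 1.2         # arbitrary censoring interval

## difference of CDFs
p_cdf <- pweibull(t1, shape, scale) - pweibull(t0, shape, scale)

## numerical integral of the PDF over the interval
p_int <- integrate(dweibull, lower = t0, upper = t1, shape = shape, scale = scale)$value

## difference of survival functions
p_surv <- pweibull(t0, shape, scale, lower.tail = FALSE) -
          pweibull(t1, shape, scale, lower.tail = FALSE)

all.equal(p_cdf, p_int, tolerance = 1e-6)
all.equal(p_cdf, p_surv)
```

All three give the same likelihood contribution, so the choice between them is a matter of taste.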

Or, if you don't want to use a model matrix and just want to index the shape parameter vector by group, you could do something like this:

LL_ic_weibull2 <- function(par, data, nobs){
    ## 'data' has two columns, left and right times of censoring interval
    ## 'nobs' is a vector that contains the num. observations for each group (grp1, grp2, ...)
    P = length(nobs) ## number of regression coefficients
    shapes = rep(par[1:P], nobs)
    scales = par[P+1]
    -sum(log(pweibull(data[,2], shape = shapes, scale = scales) - pweibull(data[,1], shape = shapes, scale = scales)))
    }

Test whether both functions give the same solution:

## generate intervals from simulated response (above)
left = ifelse(response - 0.2 < 0, 0, response - 0.2)
right = response + 0.2
response_ic = cbind(left, right)

## find MLE w/ first LL function (model matrix)
mle_ic_full = optim(par = c(1,1,3), fn = LL_ic_weibull, data = response_ic, mm = mm_full)
mle_ic_null = optim(par = c(1,3), fn = LL_ic_weibull, data = response_ic, mm = mm_null)

## find MLE w/ second LL function (groups only)
nobs_per_group = apply(mm_full, 2, sum) ## just contains number of observations per group
nobs_one_group = nrow(mm_null) ## one group so only one value
mle_ic_full2 = optim(par = c(1,1,3), fn = LL_ic_weibull2, data = response_ic, nobs = nobs_per_group)
mle_ic_null2 = optim(par = c(1,3), fn = LL_ic_weibull2, data = response_ic, nobs = nobs_one_group)
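The fitted objects can then be fed into a likelihood ratio test. The following self-contained sketch repeats the interval-censored fit in compact form under assumed names (`NLL_ic`, `ic`, `fit_full`, `fit_null`) and computes the test statistic from the minimized negative log-likelihoods that optim returns in `$value`.

```r
## Sketch: LRT on interval-censored data (names are illustrative)
set.seed(4)
nobs <- 50
y <- rweibull(2 * nobs, shape = rep(c(1, 1.5), each = nobs), scale = 2)
ic <- cbind(left = pmax(y - 0.2, 0), right = y + 0.2)   # censoring intervals

## negative log-likelihood; par = c(shape per group, common scale)
NLL_ic <- function(par, data, nobs) {
    P <- length(nobs)
    shapes <- rep(par[1:P], nobs)
    scale <- par[P + 1]
    -sum(log(pweibull(data[, 2], shapes, scale) - pweibull(data[, 1], shapes, scale)))
}

fit_full <- optim(c(1, 1, 3), NLL_ic, data = ic, nobs = c(nobs, nobs))
fit_null <- optim(c(1, 3), NLL_ic, data = ic, nobs = 2 * nobs)

## $value holds negative log-likelihoods, hence null minus full
LR <- 2 * (fit_null$value - fit_full$value)
p_value <- pchisq(LR, df = 1, lower.tail = FALSE)
c(LR = LR, p_value = p_value)
```

optim may warn about NaNs when Nelder-Mead tries a non-positive shape or scale; a log link or a bounded method such as L-BFGS-B is one way to avoid that.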