R: Writing variable-ranking model automation code as a function

How can I turn the list of commands below into a function?
For example: VariableRanking <- function(formula, variables, .....) { insert commands ........ }

#Variable Ranking Model automation
#exclusion of the variables that are not model variables
exclude <- c("~,", "+" ) # tokens to drop after splitting the formula string
formula <- toString(formula)
formula

#listing the entire model formula out
variables_pre <- unlist(strsplit(formula, split = " "))
variables_pre

#keeping only the model variables
variables <-  sort(variables_pre[!variables_pre %in% exclude])
variables

#Exclude "," on the target variable 
variables[1] <- substr(variables[1], 1, nchar(variables[1])-1)
variables

#Assigning the variables into a data frame
d <- c(1:length(variables))
d
d= data.frame(d)
d
d= t(d)
d
colnames(d)=variables
d

# exclude target variable on the data frame
allvariables <- colnames(d)[-1]
allvariables
# container for models
listOfModels <- vector("list", length(allvariables))
listOfModels
# loop over variables
for (i in seq_along(allvariables)) {
  # exclude variable i
  currentvariable <- allvariables[-i]
  # programmatically assemble regression formula
  regressionFormula <- as.formula(
    paste(variables[1],"~", paste(currentvariable, collapse="+")))
  # fit model
  currentModel <- glm(formula = regressionFormula, family=binomial(link = "logit"), data=dataL_TT)
  # store model in container
  listOfModels[[i]] <- currentModel
} 
listOfModels

#List of AICs for each model 
lapply(listOfModels,function(xx) xx$aic)

#Assign X as the AIC of the full model
X <- modelTT$aic
X

# Difference of AICs of each model to the AIC of the full model
AICdifference <- lapply(listOfModels,function(xx) xx$aic - X)
AICdifference

# Naming the AIC Difference
AICdifference2 = data.frame(variables=allvariables, AICdiff=unlist(AICdifference))
AICdifference2

#Graph the Barchart of the AIC decrease of each variables and save it to pdf

pdf("Barchart.pdf",width=12,height=10)
par(mar=c(2,18,2,5))

barplot(sort(AICdifference2$AICdiff, decreasing = F), main="Variable Ranking based on AIC decrease", 
        horiz=TRUE, xlab="AIC Increase", names.arg= AICdifference2$variables[order(AICdifference2$AICdiff, decreasing = F)], 
        las=1, col= 'dodgerblue4')

dev.off()

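Is this possible, given that it has so many parameters? Basically I only need the AICdifference2 data frame as output, plus the bar chart saved as a PDF and popped up.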


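I think the function's arguments need to include exclude, formula, and data_LTT. It would be best to show a small reproducible example. - akrun
Isn't "exclude <- c("~,", "+")" just a fixed value? I think what he needs is formula, data_LTT, and modelTT. - LAP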
1 Answer

Try this:

FOO <- function(myformula, data, fullmodel_AIC, plotname){

  exclude <- c("~,", "+" ) # tokens to drop after splitting the formula string
  myformula <- toString(myformula)

  variables_pre <- unlist(strsplit(myformula, split = " "))
  variables <-  sort(variables_pre[!variables_pre %in% exclude])
  variables[1] <- substr(variables[1], 1, nchar(variables[1])-1)

  d <- t(data.frame(c(1:length(variables))))
  colnames(d)=variables

  allvariables <- colnames(d)[-1]

  listOfModels <- vector("list", length(allvariables))

  for (i in seq_along(allvariables)) {
    # exclude variable i
    currentvariable <- allvariables[-i]
    # programmatically assemble regression formula
    regressionFormula <- as.formula(
      paste(variables[1],"~", paste(currentvariable, collapse="+")))
    # fit model
    currentModel <- glm(formula = regressionFormula, family=binomial(link = "logit"), data = data)
    # store model in container
    listOfModels[[i]] <- currentModel
  } 

  AICdifference <- lapply(listOfModels,function(xx) xx$aic - fullmodel_AIC)
  AICdifference2 <- data.frame(variables=allvariables, AICdiff=unlist(AICdifference))

  pdf(paste0(plotname, ".pdf"),width=12,height=10)
  par(mar=c(2,18,2,5))

  barplot(sort(AICdifference2$AICdiff, decreasing = F), main="Variable Ranking based on AIC decrease", 
          horiz=TRUE, xlab="AIC Increase", names.arg= AICdifference2$variables[order(AICdifference2$AICdiff, decreasing = F)], 
          las=1, col= 'dodgerblue4')

  dev.off()

  return(AICdifference2)
}

You need four arguments: myformula (the formula), data (dataL_TT in your code), fullmodel_AIC (modelTT$aic in your code), and a string to name your plot.
Try calling it with FOO(myformula, dataL_TT, modelTT$aic, "test"), passing your formula object as myformula.
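To see what the function does with the formula object before any model is fitted, the parsing steps can be run on their own. The following is a minimal sketch with a made-up formula: bound_count is taken from the comment in the question, while x1, x2, and x3 are placeholder predictor names. Note that the approach relies on the target name sorting ahead of the predictor names, since variables[1] is assumed to be the target.

```r
# Hypothetical formula; x1, x2, x3 are placeholder predictor names
myformula <- bound_count ~ x1 + x2 + x3

# toString() flattens the formula into one comma-separated string
formula_string <- toString(myformula)

# Split on spaces, then drop the "~," and "+" tokens
tokens <- unlist(strsplit(formula_string, split = " "))
variables <- sort(tokens[!tokens %in% c("~,", "+")])

# Strip the trailing comma from the first element (assumed to be the target)
variables[1] <- substr(variables[1], 1, nchar(variables[1]) - 1)

target <- variables[1]      # name on the left-hand side of the formula
predictors <- variables[-1] # names that are dropped one at a time in the loop
```

If the target name would sort after any predictor name, the first element would no longer be the target, so it may be worth checking target and predictors before fitting.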
I changed formula to myformula because formula is a base function in the stats package, and reusing the name of a base function for an object is generally unwise.
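As a possible variation (not part of the answer above), the same leave-one-out ranking can be sketched without parsing the formula as a string, by using terms() to list the predictors and update() to drop one at a time. The function name rank_by_aic, the simulated data, and the choice to fit the full model inside the function are all assumptions made for illustration, and the plotting step is left out.

```r
# Sketch only: rank_by_aic is a hypothetical name, not from the original answer
rank_by_aic <- function(myformula, data) {
  # Predictor terms as R itself parses them from the formula
  predictors <- attr(terms(myformula), "term.labels")

  # Fit the full model here, so its AIC does not have to be passed in
  full_model <- glm(myformula, family = binomial(link = "logit"), data = data)

  # Refit once per predictor with that predictor removed
  aic_diff <- vapply(predictors, function(v) {
    reduced_formula <- update(myformula, paste(". ~ . -", v))
    reduced_model <- glm(reduced_formula, family = binomial(link = "logit"),
                         data = data)
    AIC(reduced_model) - AIC(full_model)
  }, numeric(1))

  data.frame(variables = predictors, AICdiff = unname(aic_diff))
}

# Simulated data purely for illustration
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(1.5 * sim$x1 - 0.5 * sim$x2))

result <- rank_by_aic(y ~ x1 + x2 + x3, sim)
result[order(result$AICdiff), ]
```

Because the predictors come from terms(), this version does not depend on how the variable names happen to sort, at the cost of refitting the full model on every call.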
