Update a specific R package and its dependencies

47

I have around 4000 R packages installed on my system (a server), and most of them are outdated because they were built before R-3.0.0. Now, I know that

update.packages(checkBuilt=TRUE, ask=FALSE)

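would update all my packages, but that is too slow. The thing is that the users don't use most of the packages, and every now and then they ask me to update a package they need (say, fields). If I run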

install.packages("fields")
it only updates the package fields and not the package maps, even though fields depends on maps. So when I try to load the package fields:
library("fields")

I get an error message

Error: package ‘maps’ was built before R 3.0.0: please re-install it

Is there a way to upgrade fields so that it also automatically updates the packages that fields depends on?


3
Rather than trying to re-engineer or rewrite R's package system, what you really should do is bite the bullet and run update.packages(checkBuilt=TRUE, ask=FALSE) - Dirk Eddelbuettel
2
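I would start with ap <- available.packages(); pkgs <- tools::package_dependencies("fields",db=ap,recursive=TRUE). Then you need to filter out the base and recommended packages and install the rest. (This doesn't deal with the ordering of the dependency graph, but it might work for your case.) - Ben Bolker
Please don't roll back the edit I made to apply the correct code markup! You were using blockquote markup (>) where you should use code/preformatted markup, indented by 4 spaces. - Gavin Simpson
Oops! Did I roll back your edit? I only wanted to put double quotes around fields in install.packages(fields). - user3175783
@user3175783 Yes, and you were right about the quotes. No worries, I'll add them now. I hope the answer is useful? It could probably use some work to make it more robust, but it's a start. Also, pay attention to the which argument. If I use fields with which = "most", you would need to install nearly 400 packages! For some of the more popular packages you may end up installing a large chunk of CRAN, in which case you might as well update everything from CRAN over a weekend. - Gavin Simpson
@DirkEddelbuettel: Why is it better to use the existing update.packages() function, even though it has limitations? Why do you consider writing a new package-installation function an attempt to "re-engineer or rewrite R's package system"? Isn't it an attempt to improve R's package system? Aren't the functions we are using, such as package_dependencies() and installed.packages(), provided for exactly this purpose? - Metamorphic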
2 Answers

23

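As Ben indicated in his comment, you need to get the dependencies for fields, filter out the packages whose Priority is "base" or "recommended", and then pass the resulting list of packages to install.packages() to handle the installation. Something like: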

instPkgPlusDeps <- function(pkg, install = FALSE,
                            which = c("Depends", "Imports", "LinkingTo"),
                            inc.pkg = TRUE) {
  stopifnot(require("tools")) ## load tools
  ap <- available.packages() ## takes a minute on first use
  ## get dependencies for pkg recursively through all dependencies
  deps <- package_dependencies(pkg, db = ap, which = which, recursive = TRUE)
  ## the next line can generate warnings; I think these are harmless
  ## returns the Priority field. `NA` indicates not Base or Recommended
  pri <- sapply(deps[[1]], packageDescription, fields = "Priority")
  ## filter out Base & Recommended pkgs - we want the `NA` entries
  deps <- deps[[1]][is.na(pri)]
  ## install pkg too?
  if (inc.pkg) {
    deps = c(pkg, deps)
  }
  ## are we installing?
  if (install) {
    install.packages(deps)
  }
  deps ## return dependencies
}

This gives:

R> instPkgPlusDeps("fields")
Loading required package: tools
[1] "fields" "spam"   "maps"

which matches

> packageDescription("fields", fields = "Depends")
[1] "R (>= 2.13), methods, spam, maps"

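If any of the dependencies in deps is not installed, the sapply() line will produce warnings. I think these are harmless, however, because the value returned in that case is NA, which is exactly what we use to flag the packages we want to install. With 4000 packages installed, you are unlikely to be affected.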
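If the warnings are a nuisance, one possible alternative is to read the Priority column from the installed.packages() matrix instead of calling packageDescription() on each name. The following is only a sketch; the helper name priorityOf is made up for illustration and is not part of the function above. As with the original approach, NA covers both packages that are not installed and ordinary packages that have no Priority.

```r
# Hypothetical helper (not part of the answer's function): look up the
# Priority field from the installed-package matrix, without warnings.
priorityOf <- function(pkgs, ip = installed.packages()) {
  rows <- match(pkgs, ip[, "Package"])  # NA where a package is not installed
  pri <- ip[rows, "Priority"]           # an NA row index yields NA
  names(pri) <- pkgs
  pri
}

# "stats" ships with R; the second name is deliberately not a real package.
pri <- priorityOf(c("stats", "notARealPackageXYZ"))
print(pri)
```

Packages for which is.na(pri) is TRUE are then the candidates for installation, in the same way as in the sapply() version.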

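By default no packages are installed; the function only returns the list of dependencies. I figured this was safest, as you may not realise the chain of dependencies implied and end up installing hundreds of packages by accident. Pass install = TRUE if you are happy to install the packages indicated.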
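Because the function only returns names by default, the result can be inspected before anything is installed. As a rough sketch (the helper builtBefore is hypothetical and not part of the answer), the Built column of installed.packages() can be used to see which of the returned packages were built under an R version older than 3.0.0, or are missing altogether:

```r
# Hypothetical helper: which of `pkgs` are missing or were built under an
# R version older than `r_version`?
builtBefore <- function(pkgs, r_version = "3.0.0", ip = installed.packages()) {
  rows <- match(pkgs, ip[, "Package"])
  built <- ip[rows, "Built"]   # R version the package was built under; NA if absent
  stale <- is.na(built)        # packages that are not installed count as stale
  if (any(!stale)) {
    stale[!stale] <- package_version(built[!stale]) < package_version(r_version)
  }
  pkgs[stale]
}

# Base packages are built with the running R, so none are reported here
# when this is run on R 3.0.0 or later.
print(builtBefore(c("stats", "utils")))
```

Something like builtBefore(instPkgPlusDeps("fields")) would then list only the packages that actually need reinstalling, assuming instPkgPlusDeps() is defined as above and CRAN is reachable.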

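Note that I restrict the types of dependencies searched for. Things balloon if you use which = "most": fields has over 300 such dependencies once you recursively resolve them (which includes the Suggests: field as well). which = "all" will look for everything, including Enhances:, which will be an even bigger list of packages. See ?tools::package_dependencies for valid inputs for the which argument.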
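The effect of the which argument can be seen without contacting CRAN by passing a small hand-built database to package_dependencies(). In the sketch below only the Depends entry for fields is taken from the output above; every other row, including the packages extraA and extraB, is invented purely for illustration.

```r
library(tools)

# Toy package database with only the columns package_dependencies() reads.
# Apart from the Depends entry of "fields", all entries are made up.
db <- rbind(
  c("fields", "R (>= 2.13), methods, spam, maps", NA, NA, "extraA", NA),
  c("spam",   NA, "grid",     NA, NA,       NA),
  c("maps",   NA, "graphics", NA, "extraB", NA),
  c("extraA", NA, NA,         NA, NA,       NA),
  c("extraB", NA, NA,         NA, NA,       NA)
)
colnames(db) <- c("Package", "Depends", "Imports", "LinkingTo",
                  "Suggests", "Enhances")

# Restricted search, as used in instPkgPlusDeps()
strong <- package_dependencies("fields", db = db,
                               which = c("Depends", "Imports", "LinkingTo"),
                               recursive = TRUE)[[1]]

# "most" additionally follows Suggests
most <- package_dependencies("fields", db = db, which = "most",
                             recursive = TRUE)[[1]]

print(strong)
print(most)
```

Even in this tiny example the "most" result is a strict superset of the restricted one, which is the same effect that makes the real list for fields grow so quickly.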


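This works! Thanks Gavin. By the way, I edited the command install.packages(deps) to install.packages(c(pkg,deps)) - user3175783
I suppose, given the name I gave the function, that change should have been made. Two other users rejected it, so I'll make the edit myself. I'll also see what I can do about that; they shouldn't have rejected it. - Gavin Simpson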

12
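My answer builds on Gavin's answer... Note that the original poster, user3175783, asked for a more intelligent version of update.packages(). That function skips installing packages which are already up to date. Gavin's solution, however, installs a package and all of its dependencies whether or not they are up to date. I used Gavin's tip of skipping base packages (which can't actually be installed) and coded up a solution which also skips up-to-date packages.
The main function is installPackages(). This function and its helpers perform a topological sort of the dependency tree rooted at a given set of packages. The packages in the resulting list are checked for staleness and installed one by one. Here is some example output: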
> remove.packages("tibble")
Removing package from ‘/home/frederik/.local/lib/x86_64/R/packages’
(as ‘lib’ is unspecified)
> installPackages(c("ggplot2","stringr","Rcpp"), dry_run=T)
##  Package  digest  is out of date ( 0.6.9 < 0.6.10 )
Would have installed package  digest 
##  Package  gtable  is up to date ( 0.2.0 )
##  Package  MASS  is up to date ( 7.3.45 )
##  Package  Rcpp  is out of date ( 0.12.5 < 0.12.8 )
Would have installed package  Rcpp 
##  Package  plyr  is out of date ( 1.8.3 < 1.8.4 )
Would have installed package  plyr 
##  Package  stringi  is out of date ( 1.0.1 < 1.1.2 )
Would have installed package  stringi 
##  Package  magrittr  is up to date ( 1.5 )
##  Package  stringr  is out of date ( 1.0.0 < 1.1.0 )
Would have installed package  stringr 
...
##  Package  lazyeval  is out of date ( 0.1.10 < 0.2.0 )
Would have installed package  lazyeval 
##  Package  tibble  is not currently installed, installing
Would have installed package  tibble 
##  Package  ggplot2  is out of date ( 2.1.0 < 2.2.0 )
Would have installed package  ggplot2 
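
The ordering idea behind this output can be shown on a small scale. The following is a simplified sketch of a depth-first, post-order traversal, not the answer's own code: the dependency list toy_deps is invented for illustration, the function name topoOrder is hypothetical, and the graph is assumed to contain no cycles.

```r
# Made-up dependency graph: each entry lists the direct dependencies of a package.
toy_deps <- list(
  ggplot2 = c("gtable", "plyr"),
  plyr    = "Rcpp",
  gtable  = character(0),
  Rcpp    = character(0)
)

# Depth-first post-order: a package is emitted only after all of its
# dependencies, so every dependency precedes the packages that need it.
topoOrder <- function(roots, children) {
  visited <- character(0)
  visit <- function(p) {
    if (p %in% visited) return(invisible(NULL))
    for (ch in children(p)) visit(ch)
    visited <<- c(visited, p)
  }
  for (r in roots) visit(r)
  visited
}

ord <- topoOrder("ggplot2", function(p) toy_deps[[p]])
print(ord)
```

Installing in this order means a package is never installed before the packages it depends on.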

Here is the code; sorry about the length:
library(tools)

# Helper: a "functional" interface depth-first-search
fdfs = function(get.children) {
  rec = function(root) {
    cs = get.children(root);
    out = c();
    for(c in cs) {
      l = rec(c);
      out = c(out, setdiff(l, out));
    }
    c(out, root);
  }
  rec
}

# Entries in the package "Priority" field which indicate the
# package can't be upgraded. Not sure why we would exclude
# recommended packages, since they can be upgraded...
#excl_prio = c("base","recommended")
excl_prio = c("base")

# Find the non-"base" dependencies of a package.
nonBaseDeps = function(packages,
  ap=available.packages(),
  ip=installed.packages(), recursive=T) {

  stopifnot(is.character(packages));
  all_deps = c();
  for(p in packages) {
    # Get package dependencies. Note we are ignoring version
    # information
    deps = package_dependencies(p, db = ap, recursive = recursive)[[1]];
    ipdeps = match(deps,ip[,"Package"])
    # We want dependencies which are either not installed, or not part
    # of Base (e.g. not installed with R)
    deps = deps[is.na(ipdeps) | !(ip[ipdeps,"Priority"] %in% excl_prio)];
    # Now check that these are in the "available.packages()" database
    apdeps = match(deps,ap[,"Package"])
    notfound = is.na(apdeps)
    if(any(notfound)) {
      notfound=deps[notfound]
      stop("Package ",p," has dependencies not in database: ",paste(notfound,collapse=" "));
    }
    all_deps = union(deps,all_deps);
  }
  all_deps
}

# Return a topologically-sorted list of dependencies for a given list
# of packages. The output vector contains the "packages" argument, and
# recursive dependencies, with each dependency occurring before any
# package depending on it.
packageOrderedDeps = function(packages, ap=available.packages()) {

  # get ordered dependencies
  odeps = sapply(packages,
    fdfs(function(p){nonBaseDeps(p,ap=ap,recursive=F)}))
  # "unique" preserves the order of its input
  odeps = unique(unlist(odeps));

  # sanity checks
  stopifnot(length(setdiff(packages,odeps))==0);
  seen = list();
  for(d in odeps) {
    ddeps = nonBaseDeps(d,ap=ap,recursive=F)
    stopifnot(all(ddeps %in% seen));
    seen = c(seen,d);
  }

  as.vector(odeps)
}

# Checks if a package is up-to-date. 
isPackageCurrent = function(p,
  ap=available.packages(),
  ip=installed.packages(),
  verbose=T) {

    if(verbose) msg = function(...) cat("## ",...)
    else msg = function(...) NULL;

    aprow = match(p, ap[,"Package"]);
    iprow = match(p, ip[,"Package"]);
    if(!is.na(iprow) && (ip[iprow,"Priority"] %in% excl_prio)) {
      msg("Package ",p," is a ",ip[iprow,"Priority"]," package\n");
      return(T);
    }
    if(is.na(aprow)) {
      stop("Couldn't find package ",p," among available packages");
    }
    if(is.na(iprow)) {
      msg("Package ",p," is not currently installed, installing\n");
      F;
    } else {
      iv = package_version(ip[iprow,"Version"]);
      av = package_version(ap[aprow,"Version"]);
      if(iv < av) {
        msg("Package ",p," is out of date (",
            as.character(iv),"<",as.character(av),")\n");
        F;
      } else {
        msg("Package ",p," is up to date (",
            as.character(iv),")\n");
        T;
      }
    }
}

# Like install.packages, but skips packages which are already
# up-to-date. Specify dry_run=T to just see what would be done.
installPackages =
    function(packages,
             ap=available.packages(), dry_run=F,
             want_deps=T) {

  stopifnot(is.character(packages));

  ap=tools:::.remove_stale_dups(ap)
  ip=installed.packages();
  ip=tools:::.remove_stale_dups(ip)

  if(want_deps) {
    packages = packageOrderedDeps(packages, ap);
  }

  for(p in packages) {
    curr = isPackageCurrent(p,ap,ip);
    if(!curr) {
      if(dry_run) {
        cat("Would have installed package ",p,"\n");
      } else {
        install.packages(p,dependencies=F);
      }
    }
  }
}

# Convenience function to make sure all the libraries we have loaded
# in the current R session are up-to-date (and to update them if they
# are not)
updateAttachedLibraries = function(dry_run=F) {
  s=search();
  s=s[grep("^package:",s)];
  s=gsub("^package:","",s)
  installPackages(s,dry_run=dry_run);
}
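
A note on the version check in isPackageCurrent(): it compares package_version objects rather than raw strings. The short sketch below, using the digest version numbers from the example output, suggests why this matters; the variable names are only illustrative.

```r
iv <- "0.6.9"    # installed version, as a string
av <- "0.6.10"   # available version, as a string

# Plain string comparison works character by character, so "9" sorts after "1"
string_says_outdated <- iv < av

# package_version() compares the numeric components 0.6.9 and 0.6.10
version_says_outdated <- package_version(iv) < package_version(av)

print(c(string = string_says_outdated, version = version_says_outdated))
```

Only the package_version() comparison reports the installed copy as out of date, which is the behaviour the update check relies on.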

1
If you downvote, please leave a comment before or after, so I know what needs fixing... - Metamorphic
1
Someone may have downvoted this answer because of its complexity, but I think it is a very good answer, complicated as it is. - jangorecki
When I try to run this function, I get the following error: Error in nonBaseDeps(p, ap = ap, recursive = F) : Package dplyr has dependencies not in database: methods utils Calls: installPackages ... sapply -> lapply -> FUN -> get.children -> nonBaseDeps Execution halted Do you know what causes this? The version from @Gavin runs fine, though. - understorey
