Cannot use dput output to recreate a data.table in R

14

I have the following data.table, but I cannot recreate it from the output of dput:

> ddt
   Unit Anything index new
1:    A      3.4     1   1
2:    A      6.9     2   1
3:   A1      1.1     1   2
4:   A1      2.2     2   2
5:    B      2.0     1   3
6:    B      3.0     2   3
> 
> 
> str(ddt)
Classes ‘data.table’ and 'data.frame':  6 obs. of  4 variables:
 $ Unit    : Factor w/ 3 levels "A","A1","B": 1 1 2 2 3 3
 $ Anything: num  3.4 6.9 1.1 2.2 2 3
 $ index   : num  1 2 1 2 1 2
 $ new     : int  1 1 2 2 3 3
 - attr(*, ".internal.selfref")=<externalptr> 
 - attr(*, "sorted")= chr  "Unit" "Anything"
> 
> 
> dput(ddt)
structure(list(Unit = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("A", 
"A1", "B"), class = "factor"), Anything = c(3.4, 6.9, 1.1, 2.2, 
2, 3), index = c(1, 2, 1, 2, 1, 2), new = c(1L, 1L, 2L, 2L, 3L, 
3L)), .Names = c("Unit", "Anything", "index", "new"), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x8948f68>, sorted = c("Unit", 
"Anything"))
> 

When I paste the output back into the console, I get the following error:

> dt = structure(list(Unit = structure(c(1L, 1L, 2L, 2L, 3L, 3L), .Label = c("A", 
+ "A1", "B"), class = "factor"), Anything = c(3.4, 6.9, 1.1, 2.2, 
+ 2, 3), index = c(1, 2, 1, 2, 1, 2), new = c(1L, 1L, 2L, 2L, 3L, 
+ 3L)), .Names = c("Unit", "Anything", "index", "new"), row.names = c(NA, 
+ -6L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x8948f68>, sorted = c("Unit", 
Error: unexpected '<' in:
"3L)), .Names = c("Unit", "Anything", "index", "new"), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"), .internal.selfref = <"
> "Anything"))
Error: unexpected ')' in ""Anything")"

What is the problem, and how can I fix it? Thanks for your help.

3 Answers

17

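The problem is that dput prints the external pointer address (something data.table uses internally and rebuilds when needed). The `<pointer: 0x...>` text is not valid R syntax, so the output cannot be parsed back in.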

If you cut out the .internal.selfref part by hand, the rest works fine, although data.table will complain once on certain operations.
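As a minimal sketch of that manual fix, the block below rebuilds the table from the question with the `.internal.selfref` argument left out. The data is retyped from the question rather than pasted from dput, and the `setDT()` call is an assumption about how to avoid the one-time complaint: it should re-allocate the internal self-reference before the first `:=`.

```r
library(data.table)

# Same structure() call as the dput output, minus the unparseable
# `.internal.selfref = <pointer: ...>` argument.
dt <- structure(
  list(
    Unit     = factor(c("A", "A", "A1", "A1", "B", "B")),
    Anything = c(3.4, 6.9, 1.1, 2.2, 2, 3),
    index    = c(1, 2, 1, 2, 1, 2),
    new      = c(1L, 1L, 2L, 2L, 3L, 3L)
  ),
  row.names = c(NA, -6L),
  class     = c("data.table", "data.frame"),
  sorted    = c("Unit", "Anything")   # the key recorded in the original dput output
)

# Assumption: setDT() notices the missing self-reference and restores it,
# so the following := does not trigger the warning about an invalid selfref.
setDT(dt)

dt[, flag := TRUE]   # add a column by reference
```

Without the `setDT()` step the table is still usable; data.table repairs the reference itself the first time it needs it and warns while doing so.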

You could file a feature request with data.table to address this, but it would require data.table to modify the base function, similar to how rbind is currently handled.


I once ran into the same problem; what I did was convert my factor columns to character columns, and then it worked. - Johnny5ish

6

I also find this behavior annoying, so I wrote my own dput function that ignores the .internal.selfref attribute.

dput <- function (x, file = "", control = c("keepNA", "keepInteger", 
                                    "showAttributes")) 
{
  if (is.character(file)) 
    if (nzchar(file)) {
      file <- file(file, "wt")
      on.exit(close(file))
    }
  else file <- stdout()
  opts <- .deparseOpts(control)
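  # Added for data.tables: drop the external pointer attribute so it is
  # not deparsed. Note that setattr() works by reference, so the caller's
  # object loses the attribute as well.
  if (is.data.table(x)) {
    setattr(x, '.internal.selfref', NULL)
  }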
  if (isS4(x)) {
    clx <- class(x)
    cat("new(\"", clx, "\"\n", file = file, sep = "")
    for (n in .slotNames(clx)) {
      cat("    ,", n, "= ", file = file)
      dput(slot(x, n), file = file, control = control)
    }
    cat(")\n", file = file)
    invisible()
  }
  else .Internal(dput(x, file, opts))
}

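Thanks for your answer. Are you sure this won't affect how dput prints every other kind of object? We could rename this function to dputdt and use it only for data.table objects. - rnso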
3
Why so complicated? Wouldn't dput = function(x, ...) { if(is.data.table(x)) { setattr(x, '.internal.selfref', NULL) }; base::dput(x, ...) } do the job? Or better still, replace is.data.table with inherits - eddi
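A sketch of the shorter wrapper suggested in this comment, with two changes that are not from the thread: it is given its own name (`dput_dt`, hypothetical) so base `dput` is not masked, and it strips the attribute from a copy so the caller's table keeps its self-reference.

```r
library(data.table)

# Hypothetical helper; not part of data.table or base R.
dput_dt <- function(x, ...) {
  if (inherits(x, "data.table")) {
    x <- copy(x)                            # work on a copy, leave the original intact
    setattr(x, ".internal.selfref", NULL)   # drop the pointer so it is not deparsed
  }
  base::dput(x, ...)
}

# Usage: the printed text no longer contains the pointer and parses cleanly.
ddt <- data.table(Unit = factor(c("A", "A1", "B")), Anything = c(3.4, 1.1, 2))
out <- capture.output(dput_dt(ddt))
dt2 <- eval(parse(text = out))
```

Copying costs memory on large tables; the in-place version from the comment avoids that at the price of modifying its argument.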

1
If you have already written files with dput and don't want to hand-edit them much before calling dget, you can use the following.
data.table.parse<-function (file = "", n = NULL, text = NULL, prompt = "?", keep.source = getOption("keep.source"), 
                            srcfile = NULL, encoding = "unknown") 
{
  keep.source <- isTRUE(keep.source)
  if (!is.null(text)) {
    if (length(text) == 0L) 
      return(expression())
    if (missing(srcfile)) {
      srcfile <- "<text>"
      if (keep.source) 
        srcfile <- srcfilecopy(srcfile, text)
    }
    file <- stdin()
  }
  else {
    if (is.character(file)) {
      if (file == "") {
        file <- stdin()
        if (missing(srcfile)) 
          srcfile <- "<stdin>"
      }
      else {
        filename <- file
        file <- file(filename, "r")
        if (missing(srcfile)) 
          srcfile <- filename
        if (keep.source) {
          text <- readLines(file, warn = FALSE)
          if (!length(text)) 
            text <- ""
          close(file)
          file <- stdin()
          srcfile <- srcfilecopy(filename, text, file.mtime(filename), 
                                 isFile = TRUE)
        }
        else {
          text <- readLines(file, warn = FALSE)
          if (!length(text)) {
            text <- ""
          } else {
            text <- gsub("(, .internal.selfref = <pointer: 0x[0-9A-Fa-f]+>)","",text,perl=TRUE)
          }
          on.exit(close(file))
        }
      }
    }
  }
  #  text <- gsub("(, .internal.selfref = <pointer: 0x[0-9A-F]+>)","",text)
  .Internal(parse(file, n, text, prompt, srcfile, encoding))
}
data.table.get <- function(file, keep.source = FALSE)
  eval(data.table.parse(file = file, keep.source = keep.source))
dtget <- data.table.get

Then change your dget calls to dtget. Note that, because of the inline parsing, dtget is slower than dget, so use it only where you may be retrieving a data.table object.
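For comparison, the same idea can be sketched without copying the internals of `parse`: read the file as text, delete the pointer argument with a regular expression, and evaluate the rest. The function name `dtget_text` is hypothetical, and the pattern assumes the pointer is printed in the `<pointer: 0x...>` form shown in the question.

```r
library(data.table)

# Hypothetical helper; not part of data.table or base R.
dtget_text <- function(file) {
  # Collapse to one string so the pattern still matches if dput wrapped the
  # line between the comma and `.internal.selfref`.
  txt <- paste(readLines(file, warn = FALSE), collapse = "\n")
  txt <- gsub(",\\s*\\.internal\\.selfref\\s*=\\s*<pointer: [^>]*>", "",
              txt, perl = TRUE)
  eval(parse(text = txt))
}

# Usage: write a table with dput, then read it back.
ddt <- data.table(Unit = factor(c("A", "B")), Anything = c(3.4, 2))
tmp <- tempfile(fileext = ".R")
dput(ddt, file = tmp)
dt <- dtget_text(tmp)
```

Like dtget, this only makes sense for files known to hold dput output, since it evaluates whatever the file contains.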

