There are a couple of issues here. One of them lies in the tstrsplit function itself, which is defined as:
function (x, ..., fill = NA, type.convert = FALSE, keep, names = FALSE)
{
    if (!isTRUEorFALSE(names) && !is.character(names))
        stop("'names' must be TRUE/FALSE or a character vector.")
    ans = transpose(strsplit(as.character(x), ...), fill = fill,
        ignore.empty = FALSE)
    if (!missing(keep)) {
        keep = suppressWarnings(as.integer(keep))
        chk = min(keep) >= min(1L, length(ans)) & max(keep) <=
            length(ans)
        if (!isTRUE(chk))
            stop("'keep' should contain integer values between ",
                min(1L, length(ans)), " and ", length(ans),
                ".")
        ans = ans[keep]
    }
    if (type.convert)
        ans = lapply(ans, type.convert, as.is = TRUE)
    if (isFALSE(names))
        return(ans)
    else if (isTRUE(names))
        names = paste0("V", seq_along(ans))
    if (length(names) != length(ans)) {
        str = if (missing(keep))
            "ans"
        else "keep"
        stop("length(names) (= ", length(names), ") is not equal to length(",
            str, ") (= ", length(ans), ").")
    }
    setattr(ans, "names", names)
    ans
}
<bytecode: 0x0000019bffd6da98>
<environment: namespace:data.table>
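The important thing to note is the if (!missing(keep)) block, which checks that every value in keep falls within the range of pieces that can be returned. In your example, the first row has too few pieces and should return NA. The hard-coded example works because strsplit is vectorized: the row that should yield NA is processed together with the rows that work, so length(ans) is large enough and this if block is never triggered. You can verify this by changing 4 to 40, after which you get this message:
Error in tstrsplit(ValueId, "-", fixed = TRUE, keep = 40) : 'keep' should contain integer values between 1 and 9.
because in that case no row has enough pieces.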
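The range check can be illustrated with a small base-R sketch. The input vector and the variable names (x, pieces, n_pieces, max_pieces) are made up for illustration and are not the asker's data; the sketch only mimics the comparison against the longest row, which is what length(ans) reflects once the split pieces have been transposed with fill.

```r
# Toy input (illustrative only): one string with 1 piece, one with 3 pieces.
x <- c("a", "a-b-c")

# strsplit is vectorized: it returns one list element per input string.
pieces <- strsplit(x, "-", fixed = TRUE)

n_pieces <- lengths(pieces)    # number of pieces in each row
max_pieces <- max(n_pieces)    # number of columns after transposing with fill

keep <- 2L
whole_column_ok <- keep <= max_pieces   # the check when the full vector is passed
first_row_ok <- keep <= n_pieces[1]     # the check if row 1 were processed alone
```

Here whole_column_ok is TRUE while first_row_ok is FALSE, which is why the error only appears once rows are processed one at a time.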
So you need to redefine the tstrsplit function so that it returns NA instead of stopping.
tstrsplitNA <- function (x, ..., fill = NA, type.convert = FALSE, keep)
{
    # Split every string, then transpose so that piece i of each row is in ans[[i]]
    ans = transpose(strsplit(as.character(x), ...), fill = fill,
        ignore.empty = FALSE)
    if (!missing(keep)) {
        keep = suppressWarnings(as.integer(keep))
        chk = min(keep) >= min(1L, length(ans)) & max(keep) <=
            length(ans)
        # 'keep' out of range: fall back to NA instead of calling stop()
        if (!isTRUE(chk))
            ans = NA_character_
        ans = ans[keep]
    }
    if (type.convert)
        ans = lapply(ans, type.convert, as.is = TRUE)
    ans
}
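That alone is not enough. Because strsplit is vectorized, running
foo[, newvar := tstrsplitNA(ValueId, split="-", fixed = TRUE, keep = Level)]
does not run the function row by row. Instead, it passes the entire ValueId column to strsplit and then transposes the result, which returns nonsense that does not match what you want.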
You can tell data.table to perform the operation row by row by passing Level and ValueId as the by argument.
foo[, newvar := tstrsplitNA(ValueId, split="-", fixed = TRUE, keep = Level), by=c('Level','ValueId')]
foo
Level ValueId newvar
1: 2 11983:1055521 NA
2: 2 11983:1055521-5168:290668-198:100798 5168:290668
3: 3 11983:1055521-5168:290668-198:100798-92:91604-139:94569-135:94719-5161:290771-5162:290728-5166:290620 198:100798
4: 4 11983:1055521-5168:290668-198:100798-92:91604-139:94569-135:94719-5161:290771 92:91604
5: 3 11983:1055521-5168:290676-198:100794-92:91781-139:95090-135:95353 198:100794
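As a cross-check on the output above, the same per-row selection can be sketched in base R without tstrsplit. This is an alternative illustration rather than the approach used in this answer; the names value_id, level, pieces, and picked are hypothetical, and only the first two rows of the table are reproduced.

```r
# First two rows of the example table (values copied from the output above).
value_id <- c("11983:1055521",
              "11983:1055521-5168:290668-198:100798")
level <- c(2L, 2L)

# One character vector of pieces per row.
pieces <- strsplit(value_id, "-", fixed = TRUE)

# Pick piece k from each row; indexing past the end of a vector yields NA.
picked <- mapply(function(p, k) p[k], pieces, level)
```

For these two rows, picked holds NA followed by "5168:290668", matching the first two newvar values.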