Can someone explain what is going on here? When a variable is coded as a factor, why doesn't nchar coerce it to character and count the characters correctly?
> x <- c("73210", "73458", "73215", "72350")
> nchar(x)
[1] 5 5 5 5
>
> x <- factor(x)
> nchar(x)
[1] 1 1 1 1
>
> nchar(as.character(x))
[1] 5 5 5 5
thanks.
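Because x is a factor, its values are stored internally as integer codes (1, 2, and so on), and those codes are what nchar sees. What you want is the number of characters in the level labels: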
> nchar(levels(x)[x])
[1] 5 5 5 5
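Here is a minimal sketch of what the factor holds internally, using the same data as the question. The variable names `codes` and `labels` are only for illustration.

```r
x <- factor(c("73210", "73458", "73215", "72350"))

# A factor is an integer vector of codes plus a "levels" attribute.
codes  <- as.integer(x)   # position of each element within levels(x)
labels <- levels(x)       # the unique values, sorted as character strings

print(codes)
print(labels)

# Indexing the levels by the codes recovers the original strings,
# so nchar() counts the characters of the labels rather than of the codes.
print(nchar(labels[codes]))
print(nchar(as.character(x)))
```

Each code here is a single digit, which is consistent with the 1 1 1 1 result shown in the question.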
From the Warning section of ?factor:
The interpretation of a factor depends on both the codes and the
‘"levels"’ attribute. Be careful only to compare factors with the
same set of levels (in the same order). In particular,
‘as.numeric’ applied to a factor is meaningless, and may happen by
implicit coercion. To transform a factor ‘f’ to approximately its
original numeric values, ‘as.numeric(levels(f))[f]’ is recommended
and slightly more efficient than ‘as.numeric(as.character(f))’.
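The same pitfall applies to numeric conversion. As a rough sketch with the question's data (the names `as_codes`, `via_levels` and `via_character` are made up for this example), the two idioms from the help page can be compared like this:

```r
f <- factor(c("73210", "73458", "73215", "72350"))

# as.numeric() on the factor itself returns the integer codes, not the values.
as_codes <- as.numeric(f)

# Convert the levels once, then index by the codes (the form ?factor recommends).
via_levels <- as.numeric(levels(f))[f]

# Same result, but every element is converted through character first.
via_character <- as.numeric(as.character(f))

print(as_codes)
print(via_levels)
print(identical(via_levels, via_character))
```

The first form only converts each distinct level once, which is why the help page calls it slightly more efficient.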
nchar(levels(x))
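One thing to watch with nchar(levels(x)): it returns one count per level, in level order, not one count per element. With the question's data every value is unique and has 5 characters, so the difference does not show. A small sketch with a made-up factor `y` that has a repeated value:

```r
y <- factor(c("ab", "abcd", "ab", "abc"))

# One count per level; the levels are "ab", "abc", "abcd".
per_level <- nchar(levels(y))

# One count per element of y, in the original order.
per_element <- nchar(levels(y)[y])

print(per_level)
print(per_element)
```

Here `per_level` has length 3 and `per_element` has length 4, so the two are not interchangeable in general.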
The str_length function avoids this annoying bug (as well as the annoying NA behaviour). - hadley