How to convert a data frame of factors to numeric?

Question

9

I have a data frame in which all values are factors:

V1 V2 V3
 a  b  c
 c  b  a
 c  b  c
 b  b  a

How can I convert all the values in the data frame to new numeric values (a becomes 1, b becomes 2, c becomes 3, and so on)?

- mamatv

3 Answers

10

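Converting a factor to numeric gives its integer codes. However, if the levels of the factor columns were specified as c('b', 'a', 'c', 'd') or c('c', 'b', 'a'), the integer values will follow that order. To avoid this, we can specify the levels explicitly by calling factor again (safer).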

df1[] <- lapply(df1, function(x) 
                as.numeric(factor(x, levels=letters[1:3])))
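
To illustrate why the level order matters, here is a minimal sketch. The vector x and the names f_default and f_custom are made up for this example and are not part of the original answer.

```r
# Hypothetical values, used only to show how level order drives the codes
x <- c("a", "c", "c", "b")

f_default <- factor(x)                             # levels sorted: a, b, c
f_custom  <- factor(x, levels = c("c", "b", "a"))  # levels in a custom order

as.numeric(f_default)  # 1 3 3 2
as.numeric(f_custom)   # 3 1 1 2

# Calling factor() again with explicit levels restores the desired mapping
relevelled <- as.numeric(factor(f_custom, levels = letters[1:3]))
relevelled             # 1 3 3 2
```

The same values produce different integers depending on the level order, which is why the code above re-levels each column before converting it.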

If we are using data.table, one option is to use set. This would be more efficient for large datasets. Converting to a matrix may cause memory problems.

library(data.table)
setDT(df1)
for(j in seq_along(df1)){
 set(df1, i=NULL, j=j, 
     value= as.numeric(factor(df1[[j]], levels= letters[1:3])))
 }

- akrun

I'm curious: how is df1[] <- ... different from df1 <- ...? I think they end up with the same result, but perhaps by different routes? - atiretoo

@atiretoo It preserves the structure of the original dataset. - akrun

1

Aha! Thanks. In particular, df1 will still be a data frame. - atiretoo
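
The difference discussed in these comments can be checked with a small sketch. The data and the helper to_num are hypothetical and introduced only for illustration.

```r
# Hypothetical two-column data frame of factors
df1 <- data.frame(V1 = factor(c("a", "c")), V2 = factor(c("b", "b")))

# Helper (illustrative name) that maps a/b/c to 1/2/3
to_num <- function(x) as.numeric(factor(x, levels = letters[1:3]))

# Without the empty brackets, the result of lapply() is a plain list
as_list <- lapply(df1, to_num)
class(as_list)  # "list"

# With df1[] the columns are replaced in place and the data frame is kept
df1[] <- lapply(df1, to_num)
class(df1)      # "data.frame"
```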

5

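This approach is similar to Ananda's, but uses unlist() instead of factor(as.matrix()). Since all the columns are already factors, unlist() will combine them into a single factor vector with the appropriate levels.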

So let's look at what happens when we unlist() your data frame.

unlist(df, use.names = FALSE)
#  [1] a c c b b b b b c a c a
# Levels: a b c

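Now we can simply run as.integer() on the result above, because the integer codes of the factor match the mapping you want. (In R versions before 4.1.0, c() also worked because it dropped the factor class; from R 4.1.0 onward, c() on a factor returns a factor.) The following therefore reassigns the values of your whole data frame.

df[] <- as.integer(unlist(df, use.names = FALSE))
## note: before R 4.1.0 you could also drop the factor class with c();
## since R 4.1.0, c() on a factor returns a factor, so prefer as.integer()
## df[] <- c(unlist(df, use.names = FALSE))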
df
#   V1 V2 V3
# 1  1  2  3
# 2  3  2  1
# 3  3  2  3
# 4  2  2  1

Note: use.names = FALSE is not required. However, dropping the names attribute is more efficient than keeping it.

Data:

df <- structure(list(V1 = structure(c(1L, 3L, 3L, 2L), .Label = c("a", 
"b", "c"), class = "factor"), V2 = structure(c(1L, 1L, 1L, 1L
), .Label = "b", class = "factor"), V3 = structure(c(2L, 1L, 
2L, 1L), .Label = c("a", "c"), class = "factor")), .Names = c("V1", 
"V2", "V3"), class = "data.frame", row.names = c(NA, -4L))
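
One caveat worth checking before relying on unlist(): the pooled levels are the union of the columns' level sets in the order in which they occur, not in sorted order. The sketch below uses made-up factors f1 and f2 to show a case where the resulting codes would not be a = 1, b = 2, c = 3.

```r
# Hypothetical factors whose level sets do not start with "a"
f1 <- factor(c("b", "c"))  # levels: b, c
f2 <- factor("a")          # levels: a

pooled <- unlist(list(f1, f2), use.names = FALSE)
levels(pooled)             # "b" "c" "a"
as.integer(pooled)         # 1 2 3, so "a" is coded as 3 here

# Re-levelling explicitly gives the intended mapping
as.integer(factor(pooled, levels = letters[1:3]))  # 2 3 1
```

With the data above this is not an issue, because the first column already has the levels a, b, c in the desired order.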

- Rich Scriven

- A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

12

I would try:

> mydf[] <- as.numeric(factor(as.matrix(mydf)))
> mydf
  V1 V2 V3
1  1  2  3
2  3  2  1
3  3  2  3
4  2  2  1

- A5C1D2H2I1M1N2O1R2T1

Can you explain why a simple apply(mydf, 2, as.numeric) doesn't work? - Masclins

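@AlbertMasclans, read the first line of the "Details" section of ?apply. apply first calls as.matrix on the data.frame, which converts everything to character. If you use as.numeric directly on a character vector of letters, you get a bunch of NA values. - A5C1D2H2I1M1N2O1R2T1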
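
The behaviour described in this comment can be reproduced with a short sketch. The data frame is rebuilt from the question's values; the names m, res and codes are illustrative only.

```r
# Data frame rebuilt from the values shown in the question
mydf <- data.frame(
  V1 = factor(c("a", "c", "c", "b")),
  V2 = factor(c("b", "b", "b", "b")),
  V3 = factor(c("c", "a", "c", "a"))
)

# apply() coerces the data frame with as.matrix(), giving a character matrix
m <- as.matrix(mydf)
typeof(m)            # "character"

# as.numeric() on letters yields NA (warnings suppressed for readability)
res <- suppressWarnings(apply(mydf, 2, as.numeric))
all(is.na(res))      # TRUE

# factor() on the character matrix sorts the levels a, b, c before coding
codes <- as.numeric(factor(m))
codes                # 1 3 3 2 2 2 2 2 3 1 3 1
```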