Subsetting a table column in a data frame with R

Question

In the data frame below, one column (new) holds a table in each row.

# Output of dput(head(df1)), assigned to df1 so the example can be run
df1 <-
structure(list(a = c(1, 2, 3, 4, 5, 7), b = c(2, 3, 3, 5, 5, 
7), c = c(1, 3, 2, 4, 5, 7), new = list(structure(2:1, .Dim = 2L, .Dimnames = structure(list(
    c("1", "2")), .Names = ""), class = "table"), structure(1:2, .Dim = 2L, .Dimnames = structure(list(
    c("2", "3")), .Names = ""), class = "table"), structure(1:2, .Dim = 2L, .Dimnames = structure(list(
    c("2", "3")), .Names = ""), class = "table"), structure(2:1, .Dim = 2L, .Dimnames = structure(list(
    c("4", "5")), .Names = ""), class = "table"), structure(c(`5` = 3L), .Dim = 1L, .Dimnames = structure(list(
    "5"), .Names = ""), class = "table"), structure(c(`7` = 3L), .Dim = 1L, .Dimnames = structure(list(
    "7"), .Names = ""), class = "table"))), row.names = c(NA, 
6L), class = "data.frame")

The new column is the result of apply(df1, 1, table). One way to pull out a single element of it is df1[4, "new"][[1]], which prints the following.

df1[4, "new"][[1]]

#4 5 --> Vals
#2 1 --> Freq
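
For anyone who wants to reproduce the setup, here is a minimal sketch of how a column like new could be built. It assumes only the six rows shown in the dput output (the full df1 may have more), and the name df is a stand-in, not the original object.

```r
# Stand-in data: columns a, b, c copied from the dput output above.
df <- data.frame(a = c(1, 2, 3, 4, 5, 7),
                 b = c(2, 3, 3, 5, 5, 7),
                 c = c(1, 3, 2, 4, 5, 7))

# apply() hands each row to table(). The rows here give tables of
# different lengths, so apply() returns a list, which is stored as a
# list column with one table per row.
df$new <- apply(df, 1, table)

# The table for row 4: names are the values, entries are the frequencies.
tab4 <- df[4, "new"][[1]]
print(tab4)
```

Note that if every row happened to produce a table of the same length, apply() would simplify the result to a matrix rather than a list, so this construction depends on the data.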

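I would like to set up a condition such as "give me all the Vals in the new column whose Freq is greater than or equal to some threshold" and use it to subset the new column.

Here is an example, along with what I have done so far.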

df1[4, "new"][[1]][]>=2
#    4     5 
# TRUE FALSE 

# Subsetting using the above logical
as.integer(names(df1[4, "new"][[1]][df1[4, "new"][[1]][]>=2]))
#[1] 4

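The result is what I expected, but the code is too verbose. I would be glad to see a shorter version (this is not the urgent part, but I would be grateful and delighted to learn how to write concise, clear code).

The urgent problem is how to modify the condition as.integer(names(df1[4, "new"][[1]][df1[4, "new"][[1]][]>=2])) and apply it to the whole column. For example, for the condition new == 3 on the column, the expected output is 5 and 7.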

I have looked at similar posts here and here, but could not work out how to apply a subsetting condition to a table column.

Thank you.

- deepseefan

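Can you pin down what you want returned? If the condition is >3, do you want the last two rows? Or just a vector containing 5 and 7? What about the condition >=2? Would you return all rows? - Calum You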

Thank you. The output I want is just 5 and 7; for the condition >=2, it would be the specific values (names) in the new column that satisfy the condition. - deepseefan

1 Answer

- jay.sf · Accepted Answer

Inspecting the class of the object (i.e. the column) returns "list".

class(df1$new)
# [1] "list"

We typically use a function such as lapply() to apply a function to each element of a list. If we want a vector or matrix back instead of a list, we can try sapply().
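
To make the difference concrete, here is a small sketch on a stand-in list of two tables (tabs is illustrative, not the original column). It also shows that sapply() falls back to returning a list when the per-element results have different lengths.

```r
# Stand-in for two elements of a list column of tables.
tabs <- list(table(c(4, 5, 4)), table(c(5, 5, 5)))

# lapply() always returns a list, one element per input.
as_list <- lapply(tabs, length)

# sapply() simplifies equal-length results to a vector.
as_vec <- sapply(tabs, length)

# With unequal lengths (two names vs. one), sapply() cannot simplify
# and returns a list instead.
ragged <- sapply(tabs, names)

print(as_vec)
```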

So, define your condition,

COND <- 2

and use your function inside sapply:

sapply(df1$new, function(x) as.numeric(names(x[x >= COND])))
# [1] 1 3 3 4 5 7
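
For the condition from the question (frequency equal to 3, expected output 5 and 7), most rows match nothing, so the per-row results have different lengths and sapply() would return a list. One possible way to handle that, sketched below, is to flatten with unlist(). The list tabs is a stand-in for df1$new rebuilt from the six rows shown, and vals_where is a hypothetical helper name, not something from the original answer.

```r
# Stand-in for df1$new: one frequency table per row of the example data.
tabs <- lapply(
  list(c(1, 2, 1), c(2, 3, 3), c(3, 3, 2), c(4, 5, 4), c(5, 5, 5), c(7, 7, 7)),
  table
)

# Hypothetical helper: return the values (table names) whose frequency
# satisfies the predicate `keep`.
vals_where <- function(tab, keep) as.numeric(names(tab)[keep(tab)])

# Frequency exactly 3: rows without a match contribute numeric(0),
# which unlist() drops when flattening.
exact3 <- unlist(lapply(tabs, vals_where, keep = function(f) f == 3))

# Frequency at least 2: every row has exactly one match in this data.
atleast2 <- unlist(lapply(tabs, vals_where, keep = function(f) f >= 2))

print(exact3)
print(atleast2)
```

Indexing names(tab) directly with the logical condition also avoids repeating the long df1[4, "new"][[1]] expression, which is one way to shorten the code in the question.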