Subsetting a list via a logical index vector


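I have a complex list and need to select a subset of it based on the value of a boolean element (I need the records whose hidden value equals FALSE). I tried the following code, based on index vectors, but it fails (as shown at the end of this output):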

startups <- data$startups[data$startups$hidden == FALSE]

Or, alternatively:
startups <- data$startups[!as.logical(data$startups$hidden)]

An interactive R session shows that the data is there:

Browse[1]> str(data$startups, list.len=3)
List of 50
 $ :List of 23
  ..$ id               : num 357496
  ..$ hidden           : logi FALSE
  ..$ community_profile: logi FALSE
  .. [list output truncated]
 $ :List of 2
  ..$ id    : num 352159
  ..$ hidden: logi TRUE
 $ :List of 2
  ..$ id    : num 352157
  ..$ hidden: logi TRUE
  [list output truncated]

Browse[1]> data$startups[data$startups$hidden == FALSE]
list()

Browse[1]> data$startups[!as.logical(data$startups$hidden)]
list()

What is wrong with my code?

UPDATE (hopefully containing a reproducible example; apologies for the complex structure)

aa <- dput(head(data$startups, n=3))

produces the following output:

list(structure(list(id = 386938, hidden = FALSE, community_profile = FALSE, 
    name = "Pritunl", angellist_url = "https://angel.co/pritunl", 
    logo_url = "https://s3.amazonaws.com/photos.angel.co/startups/i/386938-fac0b8cba76c7e9252eee6646ec5b681-medium_jpg.jpg?buster=1398401450", 
    thumb_url = "https://s3.amazonaws.com/photos.angel.co/startups/i/386938-fac0b8cba76c7e9252eee6646ec5b681-thumb_jpg.jpg?buster=1398401450", 
    quality = 0, product_desc = "Enterprise VPN/cloud networking server", 
    high_concept = "Enterprise cloud networking", follower_count = 1, 
    company_url = "http://pritunl.com", created_at = "2014-04-25T04:50:57Z", 
    updated_at = "2014-04-25T06:02:05Z", crunchbase_url = NULL, 
    twitter_url = "http://twitter.com/pritunl", blog_url = "", 
    video_url = "", markets = list(structure(list(id = 12, tag_type = "MarketTag", 
        name = "enterprise software", display_name = "Enterprise Software", 
        angellist_url = "https://angel.co/enterprise-software"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 59, tag_type = "MarketTag", name = "open source", 
        display_name = "Open Source", angellist_url = "https://angel.co/open-source"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 123, tag_type = "MarketTag", name = "internet infrastructure", 
        display_name = "Internet Infrastructure", angellist_url = "https://angel.co/internet-infrastructure"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 306, tag_type = "MarketTag", name = "cloud management", 
        display_name = "Cloud Management", angellist_url = "https://angel.co/cloud-management"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url"))), locations = list(
        structure(list(id = 2071, tag_type = "LocationTag", name = "new york", 
            display_name = "New York", angellist_url = "https://angel.co/new-york"), .Names = c("id", 
        "tag_type", "name", "display_name", "angellist_url"))), 
    company_size = "1-10", company_type = list(structure(list(
        id = 94212, tag_type = "CompanyTypeTag", name = "startup", 
        display_name = "Startup", angellist_url = "https://angel.co/startup"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url"))), status = NULL, 
    screenshots = list(structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/5f7410543201d583eaba1975b931f3fd-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/5f7410543201d583eaba1975b931f3fd-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/006c4fb50d4b10df7caf7800ee482c6b-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/006c4fb50d4b10df7caf7800ee482c6b-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/741225c3de5021399c0cfc33cecb8830-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/741225c3de5021399c0cfc33cecb8830-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/969b60b6ccda577e77b7c9a5c169b2fd-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/969b60b6ccda577e77b7c9a5c169b2fd-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/2b2cc3a046c5a4d20b328045ca7f0254-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/2b2cc3a046c5a4d20b328045ca7f0254-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/053c3a1c74fc7f39de1117770f9debef-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/053c3a1c74fc7f39de1117770f9debef-original.png"), .Names = c("thumb", 
    "original")), structure(list(thumb = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/8adcf2d6a6cafc9c6b810f8359a3fedf-thumb_jpg.jpg", 
        original = "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/8adcf2d6a6cafc9c6b810f8359a3fedf-original.png"), .Names = c("thumb", 
    "original")))), .Names = c("id", "hidden", "community_profile", 
"name", "angellist_url", "logo_url", "thumb_url", "quality", 
"product_desc", "high_concept", "follower_count", "company_url", 
"created_at", "updated_at", "crunchbase_url", "twitter_url", 
"blog_url", "video_url", "markets", "locations", "company_size", 
"company_type", "status", "screenshots")), structure(list(id = 385596, 
    hidden = FALSE, community_profile = TRUE, name = "Lariat ", 
    angellist_url = "https://angel.co/lariat-1", logo_url = "https://s3.amazonaws.com/photos.angel.co/startups/i/385596-29de05d584176c3972da411aed5485f0-medium_jpg.jpg?buster=1398260121", 
    thumb_url = "https://s3.amazonaws.com/photos.angel.co/startups/i/385596-29de05d584176c3972da411aed5485f0-thumb_jpg.jpg?buster=1398260121", 
    quality = 0, product_desc = "Thus far, the internet has gone from discovery to search discovery, and then social discovery, but with little focus on recall. Remembering your digital footprint is difficult. We aim to solve that problem. Lariat is a cloud-based recall engine to securely recall information from any page in your search history instantly through intuitive keyword search, not just from page titles, but from the contents and context of the underlying pages.\r\n\r\nWrangle in the information you want, easier and faster.", 
    high_concept = "Recall your digital footprint on the web instantly", 
    follower_count = 1, company_url = "http://www.lariattech.com", 
    created_at = "2014-04-23T13:17:47Z", updated_at = "2014-04-23T13:48:38Z", 
    crunchbase_url = NULL, twitter_url = "", blog_url = "", video_url = NULL, 
    markets = list(structure(list(id = 4, tag_type = "MarketTag", 
        name = "digital media", display_name = "Digital Media", 
        angellist_url = "https://angel.co/digital-media"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 12, tag_type = "MarketTag", name = "enterprise software", 
        display_name = "Enterprise Software", angellist_url = "https://angel.co/enterprise-software"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 59, tag_type = "MarketTag", name = "open source", 
        display_name = "Open Source", angellist_url = "https://angel.co/open-source"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url")), structure(list(
        id = 282, tag_type = "MarketTag", name = "semantic search", 
        display_name = "Semantic Search", angellist_url = "https://angel.co/semantic-search"), .Names = c("id", 
    "tag_type", "name", "display_name", "angellist_url"))), locations = list(
        structure(list(id = 1620, tag_type = "LocationTag", name = "boston", 
            display_name = "Boston", angellist_url = "https://angel.co/boston"), .Names = c("id", 
        "tag_type", "name", "display_name", "angellist_url"))), 
    company_size = "1-10", company_type = structure(list(), class = "AsIs"), 
    status = NULL, screenshots = structure(list(), class = "AsIs")), .Names = c("id", 
"hidden", "community_profile", "name", "angellist_url", "logo_url", 
"thumb_url", "quality", "product_desc", "high_concept", "follower_count", 
"company_url", "created_at", "updated_at", "crunchbase_url", 
"twitter_url", "blog_url", "video_url", "markets", "locations", 
"company_size", "company_type", "status", "screenshots")), structure(list(
    id = 385595, hidden = TRUE), .Names = c("id", "hidden")))

The same data in a more readable form (printing aa):

[[1]]
[[1]]$id
[1] 386938

[[1]]$hidden
[1] FALSE

[[1]]$community_profile
[1] FALSE

[[1]]$name
[1] "Pritunl"

[[1]]$angellist_url
[1] "https://angel.co/pritunl"

[[1]]$logo_url
[1] "https://s3.amazonaws.com/photos.angel.co/startups/i/386938-fac0b8cba76c7e9252eee6646ec5b681-medium_jpg.jpg?buster=1398401450"

[[1]]$thumb_url
[1] "https://s3.amazonaws.com/photos.angel.co/startups/i/386938-fac0b8cba76c7e9252eee6646ec5b681-thumb_jpg.jpg?buster=1398401450"

[[1]]$quality
[1] 0

[[1]]$product_desc
[1] "Enterprise VPN/cloud networking server"

[[1]]$high_concept
[1] "Enterprise cloud networking"

[[1]]$follower_count
[1] 1

[[1]]$company_url
[1] "http://pritunl.com"

[[1]]$created_at
[1] "2014-04-25T04:50:57Z"

[[1]]$updated_at
[1] "2014-04-25T06:02:05Z"

[[1]]$crunchbase_url
NULL

[[1]]$twitter_url
[1] "http://twitter.com/pritunl"

[[1]]$blog_url
[1] ""

[[1]]$video_url
[1] ""

[[1]]$markets
[[1]]$markets[[1]]
[[1]]$markets[[1]]$id
[1] 12

[[1]]$markets[[1]]$tag_type
[1] "MarketTag"

[[1]]$markets[[1]]$name
[1] "enterprise software"

[[1]]$markets[[1]]$display_name
[1] "Enterprise Software"

[[1]]$markets[[1]]$angellist_url
[1] "https://angel.co/enterprise-software"


[[1]]$markets[[2]]
[[1]]$markets[[2]]$id
[1] 59

[[1]]$markets[[2]]$tag_type
[1] "MarketTag"

[[1]]$markets[[2]]$name
[1] "open source"

[[1]]$markets[[2]]$display_name
[1] "Open Source"

[[1]]$markets[[2]]$angellist_url
[1] "https://angel.co/open-source"


[[1]]$markets[[3]]
[[1]]$markets[[3]]$id
[1] 123

[[1]]$markets[[3]]$tag_type
[1] "MarketTag"

[[1]]$markets[[3]]$name
[1] "internet infrastructure"

[[1]]$markets[[3]]$display_name
[1] "Internet Infrastructure"

[[1]]$markets[[3]]$angellist_url
[1] "https://angel.co/internet-infrastructure"


[[1]]$markets[[4]]
[[1]]$markets[[4]]$id
[1] 306

[[1]]$markets[[4]]$tag_type
[1] "MarketTag"

[[1]]$markets[[4]]$name
[1] "cloud management"

[[1]]$markets[[4]]$display_name
[1] "Cloud Management"

[[1]]$markets[[4]]$angellist_url
[1] "https://angel.co/cloud-management"



[[1]]$locations
[[1]]$locations[[1]]
[[1]]$locations[[1]]$id
[1] 2071

[[1]]$locations[[1]]$tag_type
[1] "LocationTag"

[[1]]$locations[[1]]$name
[1] "new york"

[[1]]$locations[[1]]$display_name
[1] "New York"

[[1]]$locations[[1]]$angellist_url
[1] "https://angel.co/new-york"



[[1]]$company_size
[1] "1-10"

[[1]]$company_type
[[1]]$company_type[[1]]
[[1]]$company_type[[1]]$id
[1] 94212

[[1]]$company_type[[1]]$tag_type
[1] "CompanyTypeTag"

[[1]]$company_type[[1]]$name
[1] "startup"

[[1]]$company_type[[1]]$display_name
[1] "Startup"

[[1]]$company_type[[1]]$angellist_url
[1] "https://angel.co/startup"



[[1]]$status
NULL

[[1]]$screenshots
[[1]]$screenshots[[1]]
[[1]]$screenshots[[1]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/5f7410543201d583eaba1975b931f3fd-thumb_jpg.jpg"

[[1]]$screenshots[[1]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/5f7410543201d583eaba1975b931f3fd-original.png"


[[1]]$screenshots[[2]]
[[1]]$screenshots[[2]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/006c4fb50d4b10df7caf7800ee482c6b-thumb_jpg.jpg"

[[1]]$screenshots[[2]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/006c4fb50d4b10df7caf7800ee482c6b-original.png"


[[1]]$screenshots[[3]]
[[1]]$screenshots[[3]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/741225c3de5021399c0cfc33cecb8830-thumb_jpg.jpg"

[[1]]$screenshots[[3]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/741225c3de5021399c0cfc33cecb8830-original.png"


[[1]]$screenshots[[4]]
[[1]]$screenshots[[4]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/969b60b6ccda577e77b7c9a5c169b2fd-thumb_jpg.jpg"

[[1]]$screenshots[[4]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/969b60b6ccda577e77b7c9a5c169b2fd-original.png"


[[1]]$screenshots[[5]]
[[1]]$screenshots[[5]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/2b2cc3a046c5a4d20b328045ca7f0254-thumb_jpg.jpg"

[[1]]$screenshots[[5]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/2b2cc3a046c5a4d20b328045ca7f0254-original.png"


[[1]]$screenshots[[6]]
[[1]]$screenshots[[6]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/053c3a1c74fc7f39de1117770f9debef-thumb_jpg.jpg"

[[1]]$screenshots[[6]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/053c3a1c74fc7f39de1117770f9debef-original.png"


[[1]]$screenshots[[7]]
[[1]]$screenshots[[7]]$thumb
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/8adcf2d6a6cafc9c6b810f8359a3fedf-thumb_jpg.jpg"

[[1]]$screenshots[[7]]$original
[1] "https://s3.amazonaws.com/screenshots.angel.co/ae/386938/8adcf2d6a6cafc9c6b810f8359a3fedf-original.png"




[[2]]
[[2]]$id
[1] 385596

[[2]]$hidden
[1] FALSE

[[2]]$community_profile
[1] TRUE

[[2]]$name
[1] "Lariat "

[[2]]$angellist_url
[1] "https://angel.co/lariat-1"

[[2]]$logo_url
[1] "https://s3.amazonaws.com/photos.angel.co/startups/i/385596-29de05d584176c3972da411aed5485f0-medium_jpg.jpg?buster=1398260121"

[[2]]$thumb_url
[1] "https://s3.amazonaws.com/photos.angel.co/startups/i/385596-29de05d584176c3972da411aed5485f0-thumb_jpg.jpg?buster=1398260121"

[[2]]$quality
[1] 0

[[2]]$product_desc
[1] "Thus far, the internet has gone from discovery to search discovery, and then social discovery, but with little focus on recall. Remembering your digital footprint is difficult. We aim to solve that problem. Lariat is a cloud-based recall engine to securely recall information from any page in your search history instantly through intuitive keyword search, not just from page titles, but from the contents and context of the underlying pages.\r\n\r\nWrangle in the information you want, easier and faster."

[[2]]$high_concept
[1] "Recall your digital footprint on the web instantly"

[[2]]$follower_count
[1] 1

[[2]]$company_url
[1] "http://www.lariattech.com"

[[2]]$created_at
[1] "2014-04-23T13:17:47Z"

[[2]]$updated_at
[1] "2014-04-23T13:48:38Z"

[[2]]$crunchbase_url
NULL

[[2]]$twitter_url
[1] ""

[[2]]$blog_url
[1] ""

[[2]]$video_url
NULL

[[2]]$markets
[[2]]$markets[[1]]
[[2]]$markets[[1]]$id
[1] 4

[[2]]$markets[[1]]$tag_type
[1] "MarketTag"

[[2]]$markets[[1]]$name
[1] "digital media"

[[2]]$markets[[1]]$display_name
[1] "Digital Media"

[[2]]$markets[[1]]$angellist_url
[1] "https://angel.co/digital-media"


[[2]]$markets[[2]]
[[2]]$markets[[2]]$id
[1] 12

[[2]]$markets[[2]]$tag_type
[1] "MarketTag"

[[2]]$markets[[2]]$name
[1] "enterprise software"

[[2]]$markets[[2]]$display_name
[1] "Enterprise Software"

[[2]]$markets[[2]]$angellist_url
[1] "https://angel.co/enterprise-software"


[[2]]$markets[[3]]
[[2]]$markets[[3]]$id
[1] 59

[[2]]$markets[[3]]$tag_type
[1] "MarketTag"

[[2]]$markets[[3]]$name
[1] "open source"

[[2]]$markets[[3]]$display_name
[1] "Open Source"

[[2]]$markets[[3]]$angellist_url
[1] "https://angel.co/open-source"


[[2]]$markets[[4]]
[[2]]$markets[[4]]$id
[1] 282

[[2]]$markets[[4]]$tag_type
[1] "MarketTag"

[[2]]$markets[[4]]$name
[1] "semantic search"

[[2]]$markets[[4]]$display_name
[1] "Semantic Search"

[[2]]$markets[[4]]$angellist_url
[1] "https://angel.co/semantic-search"



[[2]]$locations
[[2]]$locations[[1]]
[[2]]$locations[[1]]$id
[1] 1620

[[2]]$locations[[1]]$tag_type
[1] "LocationTag"

[[2]]$locations[[1]]$name
[1] "boston"

[[2]]$locations[[1]]$display_name
[1] "Boston"

[[2]]$locations[[1]]$angellist_url
[1] "https://angel.co/boston"



[[2]]$company_size
[1] "1-10"

[[2]]$company_type
list()

[[2]]$status
NULL

[[2]]$screenshots
list()


[[3]]
[[3]]$id
[1] 385595

[[3]]$hidden
[1] TRUE

Finally, applying the subsetting operation via a logical index vector:
aa[data$startups$hidden == FALSE]

results in an empty list (even though the hidden attribute of the 1st and 2nd elements is FALSE):
list()

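Sorry for the size of the output, but I had to preserve the structure of the list.

Notes:

According to the R Project's "An Introduction to R" (http://cran.r-project.org/doc/manuals/R-intro.html#Index-vectors),

"Subsets of the elements of a vector may be selected by appending to the name of the vector an index vector in square brackets. More generally any expression that evaluates to a vector may have subsets of its elements similarly selected by appending an index vector in square brackets immediately after the expression."

Also, according to Hadley Wickham's "Advanced R" (http://adv-r.had.co.nz/Subsetting.html),

"Subsetting a list works in the same way as subsetting an atomic vector."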
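
As a minimal illustration of the two quoted statements, the sketch below applies the same logical index vector to an atomic vector and to a list. The names v, l and keep are made up for this example and do not come from the question's data.

```r
# Toy objects (hypothetical, not from the question)
v <- c(a = 1, b = 2, c = 3)            # atomic vector
l <- list(a = 1, b = "two", c = TRUE)  # list
keep <- c(TRUE, FALSE, TRUE)           # logical index vector, one flag per element

v[keep]   # keeps elements a and c of the atomic vector
l[keep]   # keeps elements a and c of the list; the result is still a list
```

In both cases the index vector must have one entry per top-level element of the object being subset.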


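@Roland: I did as you suggested, but I am still not sure what causes the error. I think I need to reference the whole list, as in my original code, while subsetting it via an index vector: data$startups[data$startups$hidden == FALSE]. However, this returns an empty list. - Aleksandr Blekh
@Roland: I used the "[" notation in a previous version of the code: https://github.com/abnova/diss-floss/blob/master/import/getAngelListData.R, line 78. Why did you downvote my question? Are you punishing beginner R users for their lack of experience and a brief period of confusion? - Aleksandr Blekh
Please construct a minimal reproducible example. That will require showing the output of dput(data) for a cut-down version of data. - G. Grothendieck
@G.Grothendieck: OK, I will do my best. I provided output from a real data collection session because I thought the problem might be related to the real data. I will try to provide a reproducible example. - Aleksandr Blekh
@G.Grothendieck: Sorry for the delay; the site was in read-only mode, and then I was offline as well. I have finally been able to update my question with a reproducible example. - Aleksandr Blekh
2 Answers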

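The example data in the question is a list of length 3; call it L. Each of its components is itself a list, and each sublist has a component named hidden. We can extract the hidden components of the sublists into a logical vector, hidden. Using that logical vector, we can subset the original list, giving a new list that contains only those sublists whose hidden component is TRUE.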
hidden <- sapply(L, "[[", "hidden") # create logical vector hidden
L[hidden]

For the data provided, we get a list with a single component:

> length(L[hidden])
[1] 1

If we know there is only one such component, then L[hidden][[1]] or L[[which(hidden)]] will return that single component.
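
Since the question asks for the records that are not hidden, the same idea can be sketched with the index negated. The list below is a cut-down, hypothetical stand-in for the question's data (ids and hidden flags copied from it, all other fields dropped); vapply() is swapped in for sapply() only because it checks that each flag is a single logical value, and Filter() is shown as a base R alternative.

```r
# Cut-down stand-in for the question's data: a list of records
L <- list(
  list(id = 386938, hidden = FALSE, name = "Pritunl"),
  list(id = 385596, hidden = FALSE, name = "Lariat "),
  list(id = 385595, hidden = TRUE)
)

# One flag per record; vapply() raises an error if a record's
# hidden field is missing or is not a single logical value
hidden <- vapply(L, function(x) x$hidden, logical(1))

# Negate the index to keep the records with hidden == FALSE
visible <- L[!hidden]
length(visible)   # 2

# The same selection with base R's Filter()
visible2 <- Filter(function(x) !x$hidden, L)
```

Applied to the real data, L would be data$startups.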


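Subsetting with the data provided gives a list of length 1, not an empty list. I have added an example to illustrate this. - G. Grothendieck
In terms of L, the code in the question is L[L$hidden], but L has no component named hidden; hence L$hidden is NULL. The fact that L[[1]], L[[2]] and L[[3]] each have a hidden component has nothing to do with L itself. - G. Grothendieck
Hmm... my original code seems to be exactly the same as in your previous comment: startups <- data$startups[data$startups$hidden == FALSE], with data$startups in place of L. - Aleksandr Blekh
The code in the question is equivalent to L[L$hidden], whereas the answer is L[hidden], where hidden is the result of sapply. The two are different. If you want everything that is not hidden, use L[!hidden]. - G. Grothendieck
Now I understand. I thought I could use this semantics and R would understand it and build the logical index vector automatically. I guess R's magic has its limits too :-). Thanks again! - Aleksandr Blekh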
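
The point made in these comments can be traced step by step. The sketch below uses a small made-up list L (not the question's data) to show why both of the question's expressions return list() instead of raising an error.

```r
# Made-up list of records: the outer list itself has no names
L <- list(
  list(id = 1, hidden = FALSE),
  list(id = 2, hidden = TRUE)
)

L$hidden                    # NULL: no top-level element is named "hidden"
L$hidden == FALSE           # logical(0): comparing NULL gives a zero-length vector
L[L$hidden == FALSE]        # list(): a zero-length index selects nothing

!as.logical(L$hidden)       # logical(0) as well
L[!as.logical(L$hidden)]    # list()
```

No step fails, which is why the mistake shows up only as an empty result.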

Data frames are indexed with two indices. If you want to select rows only, you need to do the following:
data$startups[data$startups$hidden == FALSE, ]

data$startups is not a data frame but a list: data <- jsonlite::fromJSON(startupData, simplifyVector = FALSE). Please also see the str() output in my question. - Aleksandr Blekh
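
The difference between the two structures can be sketched as follows. The objects df and recs are hypothetical: df imitates the data frame this answer assumes, and recs imitates the list of records the question actually has.

```r
# A data frame: hidden is a column, so two-index row selection works
df <- data.frame(id = c(386938, 385596, 385595),
                 hidden = c(FALSE, FALSE, TRUE))
visible_rows <- df[df$hidden == FALSE, ]   # rows first, then all columns

# A plain list of records, like data$startups in the question
recs <- list(
  list(id = 386938, hidden = FALSE),
  list(id = 385595, hidden = TRUE)
)
is.data.frame(recs)   # FALSE

# Two-index subsetting is an error for a list without a dim attribute;
# the error message is captured here instead of stopping the script
msg <- tryCatch(recs[c(TRUE, FALSE), ],
                error = function(e) conditionMessage(e))
```

For a list of records, the flags have to be extracted element by element before indexing with a single index.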

Content provided by Stack Overflow.