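I read small text files from many folders and store them in a list. As a result, I have a list of length n in which every element contains 2 data.frames. Here is an example of the 3rd element of the list (see the dput at the end of the question).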
ip_list[[3]]
$`dc:a6:32:2d:b6:c4`
# A tibble: 2 x 1
  X1
  <chr>
1 MAC address is: dc:a6:32:2d:b6:c4
2 IP is: 18.21.162.74

$`dc:a6:32:2d:b6:c4_running`
# A tibble: 1 x 1
  datetime
  <dttm>
1 2020-03-13 19:11:07
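My goal is to turn the list into a data frame with n rows (one per machine) and 3 columns (mac, ip, datetime). I managed to do it in what is probably a cumbersome way: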
library(dplyr)    # bind_rows(), bind_cols(), rename(), mutate(), %>%
library(stringr)  # str_remove()

n_machines <- length(ip_list)
# first element will be the mac and ip
df <- lapply(1:n_machines,
             function (xx) as.data.frame(t(ip_list[[xx]][[1]]),
                                         stringsAsFactors = FALSE)) %>%
  bind_rows() %>%
  # now clean
  rename(mac = V1, ip = V2) %>%
  mutate(mac = str_remove(mac, "MAC address is: "),
         ip = str_remove(ip, "IP is: "))
# second element will be running time
running_time <- lapply(1:n_machines,
                       function (xx) as.data.frame(t(ip_list[[xx]][[2]]),
                                                   stringsAsFactors = FALSE)) %>%
  bind_rows() %>%
  rename(datetime = V1)
# join stuff (order should be kept)
df <- bind_cols(df, running_time)
This produces the expected result:
df
                mac            ip            datetime
1 dc:a6:32:21:59:2b  18.21.129.94
2 dc:a6:32:2d:8c:ca 18.21.171.210
3 dc:a6:32:2d:b6:c4  18.21.162.74 2020-03-13 19:11:07
4 dc:a6:32:2d:b8:62  18.21.178.96
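To see why the rename(mac = V1, ip = V2) step is needed: t() turns a one-column data frame into a 1 x 2 character matrix with no column names, so as.data.frame() falls back to its default names V1 and V2. A minimal base-R check, using a plain data.frame as a stand-in for the tibble in ip_list[[3]][[1]] (an assumption; both should go through as.matrix() the same way here):

```r
# Assumed stand-in for ip_list[[3]][[1]]: one character column X1 with two lines
info <- data.frame(
  X1 = c("MAC address is: dc:a6:32:2d:b6:c4", "IP is: 18.21.162.74"),
  stringsAsFactors = FALSE
)

m <- t(info)    # 1 x 2 character matrix: row name "X1", no column names
dim(m)          # 1 2
colnames(m)     # NULL

# With no column names available, as.data.frame() invents V1, V2,
# which is why the original code has to rename them afterwards
wide <- as.data.frame(m, stringsAsFactors = FALSE)
names(wide)     # "V1" "V2"
```

The original "X1" name survives only as the row name, so the information about which line is the MAC and which is the IP has to be re-added by hand.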
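Question: is there a better way? I have a feeling there should be a more direct way to do this. In particular:
- Relying on the order of the elements could hide problems (instead of relying on the 1:n order, I would rather merge by MAC address).
- The whole lapply approach gets the job done, but I feel it is hard to debug (and it will certainly break if n_machines = 0).
- Using as.data.frame(t(...)) is very clunky, and I lose the names (which I have to reassign afterwards), but I couldn't find a way to do this with pivot_wider or something similar.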
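A sketch of one way to address the three points above, in base R, assuming every element of ip_list has the shape shown earlier: a first data frame whose X1 column holds the "MAC address is: " and "IP is: " lines, and a <mac>_running element whose datetime column is either POSIXct or an empty string. The helper names info_row(), running_row() and machines_df() are hypothetical. Each part is keyed by its own MAC (the running part takes it from the "_running" element name) and the two tables are merged on mac rather than aligned by position; lapply() iterates over the list itself instead of 1:n_machines, which for n_machines = 0 would be c(1, 0); and the regular-expression extraction creates named columns directly, so no t() or rename() is needed.

```r
# Hypothetical helper: one-row data.frame with mac and ip, parsed from the
# "MAC address is: ..." and "IP is: ..." lines of the first data frame.
info_row <- function(el) {
  lines <- el[[1]]$X1
  data.frame(
    mac = sub("^MAC address is: ", "", grep("^MAC address is: ", lines, value = TRUE)),
    ip  = sub("^IP is: ", "", grep("^IP is: ", lines, value = TRUE)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: one-row data.frame with mac (taken from the element
# name "<mac>_running") and datetime; a non-POSIXct value such as "" becomes NA.
running_row <- function(el) {
  run_name <- grep("_running$", names(el), value = TRUE)
  dt <- el[[run_name]]$datetime
  if (!inherits(dt, "POSIXct")) dt <- .POSIXct(NA_real_, tz = "UTC")
  data.frame(mac = sub("_running$", "", run_name), datetime = dt,
             stringsAsFactors = FALSE)
}

machines_df <- function(ip_list) {
  if (length(ip_list) == 0) {
    # Empty input: return an empty result with the expected columns
    return(data.frame(mac = character(), ip = character(),
                      datetime = .POSIXct(numeric(), tz = "UTC"),
                      stringsAsFactors = FALSE))
  }
  info    <- do.call(rbind, lapply(ip_list, info_row))
  running <- do.call(rbind, lapply(ip_list, running_row))
  # Join on the MAC address instead of relying on element order
  merge(info, running, by = "mac", all.x = TRUE)
}
```

On the dput sample this should give one row per machine, sorted by MAC (merge() sorts on the key by default), with datetime kept as POSIXct and NA instead of "" where no running time was recorded.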
Here is a small dput sample:
dput(ip_list[1:4])
list(list(`dc:a6:32:21:59:2b` = structure(list(X1 = c("MAC address is: dc:a6:32:21:59:2b",
"IP is: 18.21.129.94")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -2L), spec = structure(list(
cols = list(X1 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 0), class = "col_spec")), `dc:a6:32:21:59:2b_running` = structure(list(
datetime = ""), class = "data.frame", row.names = c(NA, -1L
))), list(`dc:a6:32:2d:8c:ca` = structure(list(X1 = c("MAC address is: dc:a6:32:2d:8c:ca",
"IP is: 18.21.171.210")), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -2L), spec = structure(list(
cols = list(X1 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 0), class = "col_spec")), `dc:a6:32:2d:8c:ca_running` = structure(list(
datetime = ""), class = "data.frame", row.names = c(NA, -1L
))), list(`dc:a6:32:2d:b6:c4` = structure(list(X1 = c("MAC address is: dc:a6:32:2d:b6:c4",
"IP is: 18.21.162.74")), class = c("spec_tbl_df", "tbl_df", "tbl",
"data.frame"), row.names = c(NA, -2L), spec = structure(list(
cols = list(X1 = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 0), class = "col_spec")), `dc:a6:32:2d:b6:c4_running` = structure(list(
datetime = structure(1584126667.65542, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -1L), spec = structure(list(
cols = list(datetime = structure(list(format = ""), class = c("collector_datetime",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 0), class = "col_spec"))), list(`dc:a6:32:2d:b8:62` = structure(list(
X1 = c("MAC address is: dc:a6:32:2d:b8:62", "IP is: 18.21.178.96"
)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), spec = structure(list(cols = list(
X1 = structure(list(), class = c("collector_character", "collector"
))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 0), class = "col_spec")), `dc:a6:32:2d:b8:62_running` = structure(list(
datetime = ""), class = "data.frame", row.names = c(NA, -1L
))))
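Update
This turned out to be a dplyr issue, which was resolved when I upgraded to the development version (currently 0.8.99.9002).
Both answers give the expected result, and I think both are improvements. I find the accepted answer easier to read, but that is highly subjective. My only concern is that my previous lapply option is likely to stay quite stable over time, whereas purrr/dplyr functions get deprecated fairly often.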
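When I run the map_dfr() command I get Error: Argument 2 must be length 2, not 1. Could it be because my purrr version (0.3.3) is too old? - Matias Andina
…purrr version. - akrun
Do you get an error when running map(ip_list, ~ .x %>% bind_cols %>% mutate(datetime = ymd_hms(datetime)))? - akrun
…a dplyr issue. For example, map(ip_list, bind_cols) works for me. - akrun
Do you get any error with bind_cols(ip_list[[1]])? I noticed that one of the datasets is a data.frame and the other is a tibble, but that shouldn't cause any problem. - akrun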