How do I select the last n values of each row in R?

3
I want to select the last three non-empty values of each row of my data frame:
df <- structure(list(V1 = c("Johannes Gutenberg University of Mainz", 
"Eldagsener Str. 38", "Linneper Weg 1", "Gohrstraße 74", "Düppelstraße 36", 
"Blutspende: Haus A3"), V2 = c(" Gebäude 900", " 31832 Springe", 
" 40885 Ratingen", " 47475 Kamp-Lintfort", " 12163 Berlin", " Ebene -3"
), V3 = c(" Augustuspl. 4", " Germany", " Germany", " Germany", 
" Germany", " Zentrum Innere Medizin (ZIM Blutbank / Immunhämatologisches Labor Haus A1"
), V4 = c(" 55131 Mainz", "", "", "", "", " Zentrum Operative Medizin (ZOM"
), V5 = c(" Germany", "", "", "", "", " Oberdürrbacher Str. 6"
), V6 = c("", "", "", "", "", " 97080 Würzburg"), V7 = c("", 
"", "", "", "", " Germany")), row.names = 24:29, class = "data.frame")

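Sometimes there is irrelevant text at the beginning of a row, and there may be empty cells at the end. The key information in each row is always the last 3 non-empty entries. I would prefer a tidyverse solution, but other approaches are welcome too.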



What format do you want your result in? - zephryl
A data frame with the 3 extracted columns. - Marco
4 Answers

2
Using tidyr and dplyr:
library(dplyr)
library(tidyr)

df %>% 
  mutate(row = row_number()) %>%
  pivot_longer(!row) %>%
  filter(value != "") %>%
  group_by(row) %>%
  slice_tail(n = 3) %>%
  mutate(name = paste0("V", 1:3)) %>%
  ungroup() %>%
  pivot_wider()

# A tibble: 6 × 4
    row V1                       V2                     V3        
  <int> <chr>                    <chr>                  <chr>     
1     1 " Augustuspl. 4"         " 55131 Mainz"         " Germany"
2     2 "Eldagsener Str. 38"     " 31832 Springe"       " Germany"
3     3 "Linneper Weg 1"         " 40885 Ratingen"      " Germany"
4     4 "Gohrstraße 74"          " 47475 Kamp-Lintfort" " Germany"
5     5 "Düppelstraße 36"        " 12163 Berlin"        " Germany"
6     6 " Oberdürrbacher Str. 6" " 97080 Würzburg"      " Germany"
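
Since the question asks for the last n values in general, the same pipeline can be wrapped in a function. This is only a sketch: the helper name `last_n_nonempty`, its parameter `k`, and the output column names `last_1`, `last_2`, ... are made up for illustration and are not part of the answer above. It assumes all columns are character and that the data has no column called `row`.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not from the answer above): keep the last `k`
# non-empty values of each row, returned as columns last_1 .. last_k.
# Assumes character columns and no existing column named `row`.
last_n_nonempty <- function(data, k = 3) {
  data %>%
    mutate(row = row_number()) %>%      # remember which row each value came from
    pivot_longer(!row) %>%              # one line per (row, column) pair
    filter(value != "") %>%             # drop empty cells
    group_by(row) %>%
    slice_tail(n = k) %>%               # last k remaining values per row
    mutate(name = paste0("last_", row_number())) %>%
    ungroup() %>%
    pivot_wider()                       # back to one line per original row
}

last_n_nonempty(df, k = 2)
```

A row with fewer than `k` non-empty values gets `NA` in its trailing columns, and a row with no non-empty values at all drops out of the result.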

2

Loop over the rows with apply, drop the empty strings, and take the last 3 values:

data.frame(t(apply(df, 1, function(i){ tail(i[ i != "" ], 3) })))
#                        X1                   X2       X3
# 24          Augustuspl. 4          55131 Mainz  Germany
# 25     Eldagsener Str. 38        31832 Springe  Germany
# 26         Linneper Weg 1       40885 Ratingen  Germany
# 27          Gohrstraße 74  47475 Kamp-Lintfort  Germany
# 28        Düppelstraße 36         12163 Berlin  Germany
# 29  Oberdürrbacher Str. 6       97080 Würzburg  Germany

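Note: if there is an empty cell between two values, it is dropped as well, so the columns may no longer line up. Compare the first row, for example: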

df[1, 4] <- ""
data.frame(t(apply(df, 1, function(i){ tail(i[ i != "" ], 3)})))
#                        X1                   X2       X3
# 24            Gebäude 900        Augustuspl. 4  Germany
# 25     Eldagsener Str. 38        31832 Springe  Germany
# 26         Linneper Weg 1       40885 Ratingen  Germany
# 27          Gohrstraße 74  47475 Kamp-Lintfort  Germany
# 28        Düppelstraße 36         12163 Berlin  Germany
# 29  Oberdürrbacher Str. 6       97080 Würzburg  Germany
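
A related caveat: if some row has fewer than 3 non-empty values, `apply()` returns a list instead of a matrix and the `t()` call fails. Below is a minimal base R sketch that pads short rows with `NA` on the left, so the last column always holds the last non-empty value. The name `last_n_padded` and the left-padding choice are my own assumptions, and it expects a data frame of character columns with at least one row.

```r
# Hypothetical helper: last `n` non-empty values per row, left-padded with NA
# so that every row yields exactly `n` entries.
# Assumes character columns and at least one row.
last_n_padded <- function(data, n = 3) {
  m <- as.matrix(data)                        # character matrix, one row per record
  rows <- lapply(seq_len(nrow(m)), function(i) {
    x <- m[i, ]
    x <- unname(x[!is.na(x) & x != ""])       # drop NAs and empty strings
    x <- tail(x, n)                           # keep at most the last n values
    c(rep(NA_character_, n - length(x)), x)   # pad on the left up to length n
  })
  out <- as.data.frame(matrix(unlist(rows), ncol = n, byrow = TRUE),
                       stringsAsFactors = FALSE)
  names(out) <- paste0("X", seq_len(n))
  rownames(out) <- rownames(data)             # keep the original row names
  out
}

last_n_padded(df, n = 3)
```

Padding on the right instead would keep the first retained value in `X1`; which side is preferable depends on which column should stay aligned.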

2
library(tidyverse)

df |> 
  mutate(rn = row_number()) |>          # id of the original row
  pivot_longer(cols = V1:V7) |> 
  filter(value != "") |>                # keep non-empty cells only
  group_by(rn) |> 
  slice_tail(n = 3) |>                  # last 3 values per row
  mutate(pos = row_number()) |>         # position 1..3 within the row
  ungroup() |> 
  select(-name) |> 
  pivot_wider(names_from = pos, values_from = value) |> 
  select(-rn)

#> # A tibble: 6 × 3
#>   `1`                      `2`                    `3`       
#>   <chr>                    <chr>                  <chr>     
#> 1 " Augustuspl. 4"         " 55131 Mainz"         " Germany"
#> 2 "Eldagsener Str. 38"     " 31832 Springe"       " Germany"
#> 3 "Linneper Weg 1"         " 40885 Ratingen"      " Germany"
#> 4 "Gohrstraße 74"          " 47475 Kamp-Lintfort" " Germany"
#> 5 "Düppelstraße 36"        " 12163 Berlin"        " Germany"
#> 6 " Oberdürrbacher Str. 6" " 97080 Würzburg"      " Germany"

0
Assuming your last three columns are named "V5", "V6" and "V7" as in the example, you can use filter in a one-liner:
filter(df, V5 != "", V6 != "", V7 != "")

If you only need those columns, you can do this:

df |> 
  select(V5:V7) |> 
  filter(V5!= "",V6 != "",V7 != "")

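Hey, "the last 3 values of each row" means, for the second row of the test data, the "first 3 values". I need a more flexible way to select the last n values of each row. In most cases the information is in the first 3 columns, but there are exceptions. This data is limited to 7 columns. - Marco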
