How can I find a match in a string based on the character in another column?

4

I have this data:

df1 <- data.frame(matrix(, nrow=2, ncol=2))
colnames(df1) <- c("ca", "ea")
df1$ca <- c("A=C,T=G", "T=C,G=G")
df1$ea <- c("G", "T")

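I want to create a new column called "match" that gives the letter in the "ca" column that is paired with the letter in the "ea" column. My output would look like this: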

df1 <- data.frame(matrix(, nrow=2, ncol=2))
colnames(df1) <- c("ca", "ea")
df1$ca <- c("A=C,T=G", "T=C,G=G")
df1$ea <- c("G", "T")
df1$match <- c("T", "C")

This is tricky because in the first row the letter to match appears after the "=", while in the second row it appears before it.

4 Answers

1
I believe this approach will work for you:
df1 <- data.frame(matrix(, nrow=2, ncol=2))
colnames(df1) <- c("ca", "ea")
df1$ca <- c("A=C,T=G", "T=C,G=G")
df1$ea <- c("G", "T")

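library(stringr)  # provides str_extract_all()

my_f <- function(x) {
  # Match the pair that contains the ea letter on either side of "="
  my_pattern <- paste("[ACGT]=", df1[x, "ea"], "|", df1[x, "ea"], "=[ACGT]", sep = "")
  my_a <- str_extract_all(string = df1[x, "ca"], pattern = my_pattern, simplify = TRUE)
  # Strip the ea letter and "=" so only the partner letter remains
  my_pattern <- paste(df1[x, "ea"], "|=", sep = "")
  my_a <- gsub(pattern = my_pattern, replacement = "", x = my_a)
  return(my_a)
}
# sapply() returns a character vector; lapply() would create a list column
df1$match <- sapply(1:nrow(df1), my_f)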

1

Here is another tidyverse solution that uses a regular expression and may be simpler. If your R version is older than 4.1, use %>% in place of the |> pipe operator.

library(tidyverse)

df1 |>
  # add a named match column as an extracted string by the following 
  # two possible patterns
  mutate(match = str_extract(ca, 
                             # Search for the letter preceded by ea=
                             paste0(paste0("(?<=",ea,"\\=)","[A-Z]"),
                                    # or
                                    "|",
                                    # search for the letter followed by =ea
                                    paste0("[A-Z]","(?=\\=",ea,")"))))

#        ca ea match
# 1 A=C,T=G  G     T
# 2 T=C,G=G  T     C
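
To make the lookaround pattern above easier to inspect, here is a minimal base R sketch that builds the same kind of pattern for a single value and applies it with regmatches(). The variable names are illustrative, and the "=" is left unescaped on the assumption that it needs no escaping in a Perl-compatible regex.

```r
# Build the pattern for one ea value (illustrative variable names)
ea <- "G"
pattern <- paste0("(?<=", ea, "=)[A-Z]|[A-Z](?==", ea, ")")
pattern
# [1] "(?<=G=)[A-Z]|[A-Z](?==G)"

# Lookarounds need perl = TRUE in base R
first_match <- regmatches("A=C,T=G", regexpr(pattern, "A=C,T=G", perl = TRUE))
first_match
# [1] "T"
```

The lookbehind handles the case where ea comes before the "=", and the lookahead handles the case where it comes after, so only the partner letter is returned.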

1

This is a good opportunity for a simple branch reset group.

df1 <- data.frame(matrix(, nrow=2, ncol=2))
colnames(df1) <- c("ca", "ea")
df1$ca <- c("A=C,T=G", "T=C,G=G")
df1$ea <- c("G", "T")
df1$match <- c("T", "C")

mapply(
  function(p, x)
  gsub(sprintf('(?|%s=(.)|(.)=%s)|.', p, p), '\\1', x, perl = TRUE),
  df1$ea, df1$ca, USE.NAMES = FALSE
)
# [1] "T" "C"
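
As a small illustration of why the branch reset group helps, the sketch below prints the pattern that sprintf() builds for one letter and applies it to two strings. In a (?|...) group both alternatives write to capture group 1, so "\\1" picks up the partner letter whichever side of the "=" it is on; every other character is consumed by the trailing "." and replaced with an empty group. The second input string is made up for the example.

```r
p <- "G"
pattern <- sprintf("(?|%s=(.)|(.)=%s)|.", p, p)
pattern
# [1] "(?|G=(.)|(.)=G)|."

# G is after "=": the second alternative captures the letter before it
after_eq <- gsub(pattern, "\\1", "A=C,T=G", perl = TRUE)
after_eq
# [1] "T"

# G is before "=": the first alternative captures the letter after it
before_eq <- gsub(pattern, "\\1", "G=A,C=T", perl = TRUE)
before_eq
# [1] "A"
```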

0
library(tidyverse)

df1 <- data.frame(matrix(, nrow = 2, ncol = 2))
colnames(df1) <- c("ca", "ea")
df1$ca <- c("A=C,T=G", "T=C,G=G")
df1$ea <- c("G", "T")
df1
#>        ca ea
#> 1 A=C,T=G  G
#> 2 T=C,G=G  T

df1 %>%
  mutate(
    match = ea %>% map2_chr(ca, function(ea, ca) {
      ca %>%
        str_split(",") %>%
        simplify() %>%
        keep(~ str_detect(.x, ea)) %>%
        str_remove_all(str_glue("[=|{ea}]"))
    })
  )
#>        ca ea match
#> 1 A=C,T=G  G     T
#> 2 T=C,G=G  T     C

Created on 2021-12-08 by the reprex package (v2.0.1)
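
The steps in the pipeline above (split ca on commas, keep the pair containing ea, drop ea and the "=") can also be sketched in base R without regular expressions. The helper name find_partner is hypothetical and not part of any answer here. It assumes each entry has the form X=Y and returns NA when no pair contains ea.

```r
# Hypothetical helper: return the partner of `ea` in the first
# "X=Y" pair of `ca` that contains it, or NA if none does.
find_partner <- function(ca, ea) {
  pairs <- strsplit(ca, ",", fixed = TRUE)[[1]]       # e.g. "A=C" "T=G"
  for (pair in pairs) {
    sides <- strsplit(pair, "=", fixed = TRUE)[[1]]   # e.g. "T" "G"
    if (length(sides) == 2 && ea %in% sides) {
      # If ea is the left side return the right side, otherwise the left
      return(if (sides[1] == ea) sides[2] else sides[1])
    }
  }
  NA_character_
}

df1 <- data.frame(ca = c("A=C,T=G", "T=C,G=G"), ea = c("G", "T"),
                  stringsAsFactors = FALSE)
df1$match <- mapply(find_partner, df1$ca, df1$ea, USE.NAMES = FALSE)
df1$match
# [1] "T" "C"
```

Splitting on fixed strings avoids having to escape anything in ea, which may matter if the values are not always single letters.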

