Replace A1-A9 with A01-A09, etc. in strings

Question


11

Hi, I have the following strings in my data. I would like to replace A1-A9 with A01-A09 and B1-B9 with B01-B09, but leave numbers >= 10 unchanged.


rep_data=data.frame(Str= c("A1B10", "A2B3", "A11B1", "A5B10"))

    Str
1 A1B10
2  A2B3
3 A11B1
4 A5B10

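There is a similar post (link), but my problem is a bit different! I also couldn't find a similar example for str_replace.

I'd be very grateful if you know a solution.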

Expected output

     Str
1 A01B10
2 A02B03
3 A11B01
4 A05B10

- Alexander

{btsdaf} - Wiktor Stribiżew

not necessarily! - Alexander

7 Answers

3

How about this?

num_pad <- function(x) {
  x <- as.character(x)
  mm <- gregexpr("\\d+|\\D+",x)  
  parts <- regmatches(x, mm)
  pad_number <- function(x) {
    nn<-suppressWarnings(as.numeric(x))
    x[!is.na(nn)] <- sprintf("%02d", nn[!is.na(nn)])
    x
  }
  parts <- lapply(parts, pad_number)
  sapply(parts, paste0, collapse="")
}


num_pad(rep_data$Str)
# [1] "A01B10" "A02B03" "A11B01" "A05B10"

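Basically, we use a regular expression to split each string into runs of digits and non-digits. We then find the runs that look like numbers and zero-pad them to 2 characters with sprintf(). Finally, we put the padded values back into the vector and paste everything together.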
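To make the steps above concrete, here is a sketch that runs them by hand on a single string, "A2B3". The variable names are only for illustration; the calls are the same ones num_pad() uses.

```r
x <- "A2B3"

# Step 1: split into alternating runs of digits and non-digits
parts <- regmatches(x, gregexpr("\\d+|\\D+", x))[[1]]
print(parts)  # "A" "2" "B" "3"

# Step 2: digit runs convert to numbers, letter runs become NA
nn <- suppressWarnings(as.numeric(parts))

# Step 3: zero-pad only the numeric runs to width 2
parts[!is.na(nn)] <- sprintf("%02d", nn[!is.na(nn)])

# Step 4: glue the runs back together
result <- paste0(parts, collapse = "")
print(result)  # "A02B03"
```

Because each run goes through as.numeric(), a run such as "007" would come back as "07"; that does not matter for the data in the question.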

- MrFlick

2

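Not thoroughly tested

x = c("A1B10", "A2B3", "A11B1", "A5B10")
sapply(strsplit(x, ""), function(s){
    # Start a new group at every capital letter: "A" "2" "B" "3" -> ids 1 1 2 2
    paste(sapply(split(s, cumsum(s %in% LETTERS)), function(a){
        # A letter followed by a single digit: insert a leading zero
        if(length(a) == 2){
            a[2] = paste0(0, a[2])
        }
        paste(a, collapse = "")
    }), collapse = "")
})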
#[1] "A01B10" "A02B03" "A11B01" "A05B10"

- d.b

2

A solution using tidyverse and stringr.

library(tidyverse)
library(stringr)

rep_data2 <- rep_data %>%
  extract(Str, into = c("L1", "N1", "L2", "N2"), regex = "(A)(\\d+)(B)(\\d+)") %>%
  # funs() is deprecated in current dplyr; across() with a lambda does the same padding
  mutate(across(starts_with("N"), ~ str_pad(.x, width = 2, pad = "0"))) %>%
  unite(Str, everything(), sep = "")
rep_data2
     Str
1 A01B10
2 A02B03
3 A11B01
4 A05B10

- www

2

This is the most concise tidyverse solution I could come up with:

library(tidyverse)
library(stringr)

rep_data %>%
  mutate(
    num_1 = str_match(Str, "A([0-9]+)")[, 2],
    num_2 = str_match(Str, "B([0-9]+)")[, 2],
    num_1 = str_pad(num_1, width = 2, side = "left", pad = "0"),
    num_2 = str_pad(num_2, width = 2, side = "left", pad = "0"),
    Str = str_c("A", num_1, "B", num_2)
  ) %>%
  select(- num_1, - num_2)

- Stijn

2

Somewhat similar to @Mike's answer, but this solution uses a positive lookahead:

gsub("(\\D)(?=\\d(\\D|\\b))", "\\10", rep_data$Str, perl = TRUE)
# [1] "A01B10" "A02B03" "A11B01" "A05B10"

With the tidyverse:

library(dplyr)
library(stringr)

rep_data %>%
  mutate(Str = str_replace_all(Str, "(\\D)(?=\\d(\\D|\\b))", "\\10"))

#      Str
# 1 A01B10
# 2 A02B03
# 3 A11B01
# 4 A05B10

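This regex matches every non-digit that is followed by exactly one digit and then by another non-digit or a word boundary (such as the end of the string). \\10 looks like a reference to a tenth capture group, but it is actually read as the first capture group followed by a literal zero.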
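A small sketch to check the \\10 claim and the lookahead's behaviour, assuming base R's gsub() with perl = TRUE. Base R documents backreferences only from \\1 to \\9, so "\\10" means group 1 plus a "0". The second call uses made-up inputs (not from the question) to show where the lookahead stops matching: a single digit at the very start of a string has no non-digit before it, so it is left unpadded.

```r
# "\\10" = capture group 1 followed by a literal "0"
demo <- gsub("(\\D)", "\\10", "AB", perl = TRUE)
print(demo)  # "A0B0"

pat <- "(\\D)(?=\\d(\\D|\\b))"
# "X7":  single digit at the end of the string -> padded
# "X77": two digits -> untouched
# "7X":  leading digit has no non-digit before it -> untouched
padded <- gsub(pat, "\\10", c("X7", "X77", "7X"), perl = TRUE)
print(padded)  # "X07" "X77" "7X"
```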

- acylam

1

Here is an option using gsubfn.

library(gsubfn)
gsubfn("(\\d+)", ~sprintf("%02d", as.numeric(x)), as.character(rep_data$Str))
#[1] "A01B10" "A02B03" "A11B01" "A05B10"
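gsubfn() calls the formula on every match, with x bound to the text captured by (\\d+). If you'd rather avoid the extra package, the same "apply a function to each match" idea can be sketched in base R with gregexpr() and the regmatches<- replacement function. This is an alternative, not part of the gsubfn answer:

```r
x <- c("A1B10", "A2B3", "A11B1", "A5B10")

# Locate every run of digits in each string
m <- gregexpr("[0-9]+", x)

# Pad each matched run to width 2, then write the runs back in place
regmatches(x, m) <- lapply(regmatches(x, m), function(d) sprintf("%02d", as.numeric(d)))
print(x)  # "A01B10" "A02B03" "A11B01" "A05B10"
```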

- akrun


- Mike H. · Accepted Answer

I think this should do what you want:

gsub("(?<![0-9])([0-9])(?![0-9])", "0\\1", rep_data$Str, perl = TRUE)
#[1] "A01B10" "A02B03" "A11B01" "A05B10"

It uses PCRE lookbehind/lookahead to match a single digit that has no other digit on either side, and puts a "0" in front of it.
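Because the lookbehind and lookahead only inspect the neighboring characters, the pattern pads any standalone single digit regardless of which letters surround it, including one at the very start of the string. A quick sketch with a few made-up inputs (not from the question):

```r
pat <- "(?<![0-9])([0-9])(?![0-9])"
x <- c("A1B10C3", "7", "A100")

# Standalone single digits get a leading zero; digits inside longer runs are skipped
out <- gsub(pat, "0\\1", x, perl = TRUE)
print(out)  # "A01B10C03" "07" "A100"
```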