Splitting a string into substrings of different lengths in R

Question

3

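I have read similar threads, but my substrings have different lengths (9, 3, and 5 characters), so none of the answers I found apply.

I need to split a 17-character string into three substrings: the first 9 characters long, the next 3, and the last 5.

For example: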

 N12345671004UN005
 N34567892902UN002

I want to split the strings into three columns:

The first column is 9 characters long.

"N12345671"      
"N34567892"

The second column is 3 characters long.

"004"          
"902"

The third column is 5 characters long.

"UN005"  
"UN002"

- zone1

2

?substr will help you. - mts

2 Answers

4

instr = c("N12345671004UN005", "N34567892902UN002")
out1 = substr(instr, 1, 9)
out2 = substr(instr, 10, 12)
out3 = substr(instr, 13, 17)
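
The three substr calls above hard-code the start and end positions. As a minimal sketch, assuming the field widths are kept in a vector (the names widths, starts, ends, and out are illustrative and not part of the original answer), the positions can be derived with cumsum so the same code works for other layouts:

```r
# Input strings from the question
instr <- c("N12345671004UN005", "N34567892902UN002")

# Field widths in characters (9, 3 and 5, as in the question)
widths <- c(9, 3, 5)

# Each field ends at the running total of the widths
# and starts right after the previous field ends
ends   <- cumsum(widths)
starts <- ends - widths + 1

# substr is vectorised over instr, so each list element is one column
parts <- lapply(seq_along(widths),
                function(i) substr(instr, starts[i], ends[i]))
names(parts) <- paste0("V", seq_along(widths))

# Assemble the columns into a data frame of character vectors
out <- as.data.frame(parts, stringsAsFactors = FALSE)
out
```

Building the columns with lapply rather than sapply keeps the result shaped correctly even when instr holds a single string.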

- mts

- akrun · Accepted Answer

You can try the read.fwf function and specify the widths argument.

ff <- tempfile()
cat(file=ff, instr, sep='\n')
read.fwf(ff, widths=c(9,3,5), colClasses=rep('character', 3))
#        V1  V2    V3
#1 N12345671 004 UN005
#2 N34567892 902 UN002
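
If writing a temporary file is not wanted, read.fwf also accepts a connection. The following is a sketch of that variation, not part of the original answer; it assumes the strings are already in memory and uses textConnection to expose them as lines (con and out are illustrative names):

```r
# Input strings from the question
instr <- c("N12345671004UN005", "N34567892902UN002")

# Expose the character vector as a read-only text connection,
# one element per line
con <- textConnection(instr)

# colClasses = "character" is recycled across all three columns,
# which keeps leading zeros such as "004"
out <- read.fwf(con, widths = c(9, 3, 5), colClasses = "character")

# The connection was opened by textConnection, so close it explicitly
close(con)

out
```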

Or use tidyr/dplyr.

library(dplyr)
library(tidyr)
as.data.frame(instr) %>%
       extract(instr, into=paste0('V', 1:3), '(.{9})(.{3})(.{5})')
#         V1  V2    V3
#1 N12345671 004 UN005
#2 N34567892 902 UN002

Or combine sub with read.table.

read.table(text=sub('(.{9})(.{3})(.{5})', '\\1 \\2 \\3', instr),
              colClasses=rep('character', 3))
#         V1  V2    V3
#1 N12345671 004 UN005 
#2 N34567892 902 UN002
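
The same capture-group pattern can also be applied without the intermediate space-separated text. This is a sketch of an alternative, not something from the original answer, and it assumes R 3.4.0 or later, where utils::strcapture is available. The proto data frame (an illustrative name) declares the column names and types of the result:

```r
# Input strings from the question
instr <- c("N12345671004UN005", "N34567892902UN002")

# Empty prototype: one character column per capture group
proto <- data.frame(V1 = character(),
                    V2 = character(),
                    V3 = character(),
                    stringsAsFactors = FALSE)

# Anchored pattern: exactly 9 + 3 + 5 characters.
# Strings that do not match yield a row of NA values.
out <- utils::strcapture("^(.{9})(.{3})(.{5})$", instr, proto)
out
```

Anchoring the pattern with ^ and $ means a string of the wrong length is reported as NA instead of being partially split.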

Data

instr = c("N12345671004UN005", "N34567892902UN002")