Extracting information from a multi-line string in R

Question

This is my character vector:

my_string <- "\n
1. the user first name: Jamie.xx \n
2. the user name: yumi.xx \n
3. the name is: Myrile.xx \n
...
"

As you can see, the data is fairly irregular. For example, the colon does not always appear at the same position.

I have tried a regular expression:

y <- gsub("\\:(.)(.*?)\\n","\\1",my_string)

My expected result is:

the user first name
the user name
the name is

However, what I get is:

\n1. the user first name 2. the user name 3. the name is

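I'm not sure where I went wrong. Can anyone help? I need to do two things. First, the result should not contain the colon or the numbering (1., 2., 3.).

Second, I also want to remove the \n and convert my_string into a list.

Thanks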

- JamesNEW

1 Answer

- Tim Biegeleisen · Accepted Answer

Here is a regex substitution approach that works:

my_string <- "\n
1. the user first name: Jamie.xx \n
2. the user name: yumi.xx \n
3. the name is: Myrile.xx \n"

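# Replace each numbered line (and its newline) with just the text
# between the leading "N." and the first colon
output <- gsub("(?<=\n)\\d\\.\\s*(.*?):.*?\n", "\\1", my_string, perl=TRUE)
# Trim whitespace at both ends (gsub, so the leading and trailing runs are both removed)
output <- gsub("^\\s+|\\s+$", "", output)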
output  # if you want a newline-separated string, stop here

lines <- strsplit(output, "\n")[[1]]
lines   # if you want a vector of lines, then use this

[1] "the user first name\nthe user name\nthe name is"
[1] "the user first name" "the user name"       "the name is"
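
The question also asks for a list rather than a character vector, and the pattern above only allows for single-digit numbering. As a rough alternative sketch (not part of the accepted answer), the string can be split into lines first and each line cleaned on its own. The variable names parts, entries, labels and label_list are made up for illustration, and the sketch assumes every line of interest has the form "<number>. <label>: <value>".

```r
# Sample input copied from the answer above
# (each "\n" escape is followed by a real line break)
my_string <- "\n
1. the user first name: Jamie.xx \n
2. the user name: yumi.xx \n
3. the name is: Myrile.xx \n"

# Split on newlines, then keep only lines shaped like "<number>. <label>: ..."
parts   <- strsplit(my_string, "\n", fixed = TRUE)[[1]]
entries <- grep("^\\s*\\d+\\.\\s*[^:]+:", parts, value = TRUE, perl = TRUE)

# Drop the leading number and everything from the first colon onward
labels <- sub("^\\s*\\d+\\.\\s*([^:]*?)\\s*:.*$", "\\1", entries, perl = TRUE)

# as.list() turns the character vector into a list, one element per line
label_list <- as.list(labels)
label_list
```

Because the split happens before any matching, blank lines and lines without a colon (such as the "..." placeholder in the question) are simply filtered out by grep(), and \\d+ also accepts numbering like "10.".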