Is there a way to split a camel case string in R?
I have tried:
string.to.split = "thisIsSomeCamelCase"
unlist(strsplit(string.to.split, split="[A-Z]") )
# [1] "this" "s" "ome" "amel" "ase"
string.to.split = "thisIsSomeCamelCase"
gsub("([A-Z]){1}", " \\1", string.to.split)
# [1] "this Is Some Camel Case"
# added a counter to prevent situation mentioned in comment
strsplit(gsub("([A-Z]{1})", " \\1", string.to.split), " ")
# [[1]]
# [1] "this" "Is" "Some" "Camel" "Case"
# another attempt to meet the commenter's concern
# inserts space between lower-single upper sequence
gsub("([[:lower:]])([[:upper:]]){1}", "\\1 \\2", string.to.split)
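Looking at Ramnath's answer and mine, I can say that my initial impression that this was an underspecified question has been supported.
Also, upvotes to Tommy and Ramnath for pointing out [:upper:]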
strsplit(gsub("([[:upper:]])", " \\1", string.to.split), " ")
# [[1]]
# [1] "this" "Is" "Some" "Camel" "Case"
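One caveat with inserting a space before every uppercase letter: if the string starts with a capital, the result begins with a space and strsplit returns a leading empty string. A minimal sketch, assuming a hypothetical PascalCase input that is not part of the question, and reusing the lower-then-upper pattern from above so that a space is only inserted between a lowercase and an uppercase letter:

```r
pascal.string <- "ThisIsPascalCase"  # hypothetical input, not from the question

# Space before every capital: the first element of the result is ""
with.empty <- strsplit(gsub("([[:upper:]])", " \\1", pascal.string), " ")[[1]]

# Space only between a lowercase letter and the uppercase letter after it
spaced <- gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", pascal.string)
pieces <- strsplit(spaced, " ")[[1]]
pieces
# [1] "This" "Is" "Pascal" "Case"
```

Neither variant says anything about runs of capitals such as acronyms; how those should be split depends on what the question intends.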
Here is one way to do it:
split_camelcase <- function(...){
  strings <- unlist(list(...))
  strings <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", strings)
  strings <- gsub("(?!^)(?=[[:upper:]])", " ", strings, perl = TRUE)
  return(strsplit(tolower(strings), " ")[[1]])
}
split_camelcase("thisIsSomeGood")
# [1] "this" "is" "some" "good"
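Note that the function above keeps only the first element of the strsplit result, so if several strings are passed, only the first one is split. A possible variant is sketched below; split_camelcase_all is a hypothetical name, not part of the original answer, and it returns a list with one character vector per input string:

```r
# Hypothetical variant of split_camelcase that keeps every input string
split_camelcase_all <- function(...) {
  strings <- unlist(list(...))
  # Trim leading and trailing non-alphanumeric characters
  strings <- gsub("^[^[:alnum:]]+|[^[:alnum:]]+$", "", strings)
  # Insert a space before each uppercase letter that is not at the start
  strings <- gsub("(?!^)(?=[[:upper:]])", " ", strings, perl = TRUE)
  # Return the whole list instead of only its first element
  strsplit(tolower(strings), " ")
}

result <- split_camelcase_all("thisIsSomeGood", "anotherOne")
result
```

Returning the full list mirrors what strsplit itself does; callers who pass a single string can still take `[[1]]`.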
Here is a way to do it with a single regular expression (lookahead and lookbehind):
strsplit(string.to.split, "(?<=[a-z])(?=[A-Z])", perl = TRUE)
## [[1]]
## [1] "this" "Is" "Some" "Camel" "Case"
strsplit(string.to.split, "(?<=[[:lower:]])(?=[[:upper:]])", perl = TRUE)
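- malcook
This uses strapply from the gsubfn package. The regular expression matches the start of the string (^), followed by one or more lowercase letters ([[:lower:]]+), or (|) an uppercase letter ([[:upper:]]) followed by zero or more lowercase letters ([[:lower:]]*), and processes the matched strings with c (which concatenates all matches into a vector). Like strsplit, it returns a list, so we take the first component ([[1]]).
library(gsubfn)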
strapply(string.to.split, "^[[:lower:]]+|[[:upper:]][[:lower:]]*", c)[[1]]
## [1] "this" "Is" "Some" "Camel" "Case"
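If installing gsubfn is not an option, the same pattern can be used to extract the words with base R. This is a sketch of an alternative, not the method of the answer above: it pairs gregexpr with regmatches to collect the matches instead of calling strapply.

```r
string.to.split <- "thisIsSomeCamelCase"

# Same pattern as above: a leading lowercase run, or a capital plus its lowercase tail
pattern <- "^[[:lower:]]+|[[:upper:]][[:lower:]]*"

# gregexpr finds every match; regmatches extracts them as a list, one element per input
words <- regmatches(string.to.split, gregexpr(pattern, string.to.split))[[1]]
words
# [1] "this" "Is" "Some" "Camel" "Case"
```

Because this extracts matches rather than splitting on a delimiter, characters that fit neither alternative (digits, for example) are dropped from the result.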
I think my other answer is better than the one below, but if all you need is a one-liner to split... here we go:
library(snakecase)
unlist(strsplit(to_parsed_case(string.to.split), "_"))
#> [1] "this" "Is" "Some" "Camel" "Case"
The beginning of an answer is to split the string into individual characters:
sp.x <- strsplit(string.to.split, "")
Then find which positions in the string are uppercase letters:
ind.x <- lapply(sp.x, function(x) which(!tolower(x) == x))
Then use that to split out each run of characters . . .
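The answer stops before the last step. One possible way to finish it, offered as a sketch and not necessarily what the author had in mind, is to turn the uppercase positions into a running group number with cumsum and paste each group back together. The names chars, is.upper and groups are introduced here for illustration only.

```r
string.to.split <- "thisIsSomeCamelCase"

# Split into single characters
chars <- strsplit(string.to.split, "")[[1]]

# TRUE wherever a character differs from its lowercase form, i.e. is uppercase
is.upper <- chars != tolower(chars)

# Each uppercase letter starts a new group: 0 0 0 0 1 1 2 2 2 2 ...
groups <- cumsum(is.upper)

# Paste the characters of each group back into one word
words <- unname(vapply(split(chars, groups), paste, character(1), collapse = ""))
words
# [1] "this" "Is" "Some" "Camel" "Case"
```

This avoids regular expressions entirely, at the cost of handling one string at a time.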
Here is a simple solution using snakecase and a few tidyverse helpers:
install.packages("snakecase")
library(snakecase)
library(magrittr)
library(stringr)
library(purrr)
string.to.split = "thisIsSomeCamelCase"
to_parsed_case(string.to.split) %>%
str_split(pattern = "_") %>%
purrr::flatten_chr()
#> [1] "this" "Is" "Some" "Camel" "Case"
GitHub link for snakecase: https://github.com/Tazinho/snakecase
See ?regex to find the right pattern for lowercase characters and the right quantifier for any number of them. - IRTFM