我希望确定下面数据框中的字符串列在前20个字符中是否至少重复了5次字母“V”或“G”。
示例数据:
data = data.frame(class = c('a','b','C'), string =
c("ASADSASAVVVVGVGGGSDASSSDDDFGDFGHFGHFGGGGGDDFFDDFGDFGTYJ",
"AWEERTGVTHRGEFGDFSDFSGGGGGGDAWSDFAASDADAADWERWEQWD",
"GRTVVGGVVVGGSWERGERVGEGDDFASDGGVQWEQWEQWERERYRYER"))
例如,第一行字符串的前20个字符位置中包含“VVVVG”。同样,第三行字符串中包含“VVGGV”。
data
# class string
#1 a ASADSASAVVVVGVGGGSDASSSDDDFGDFGHFGHFGGGGGDDFFDDFGDFGTYJ
#2 b AWEERTGVTHRGEFGDFSDFSGGGGGGDAWSDFAASDADAADWERWEQWD
#3 C GRTVVGGVVVGGSWERGERVGEGDDFASDGGVQWEQWEQWERERYRYER
期望的输出应该是这样的:
# class string result
# 1 a ASADSASAVVVVGVGGGSDASSSDDDFGDFGHFGHFGGGGGDDFFDDFGDFGTYJ TRUE
# 2 b AWEERTGVTHRGEFGDFSDFSGGGGGGDAWSDFAASDADAADWERWEQWD FALSE
# 3 C GRTVVGGVVVGGSWERGERVGEGDDFASDGGVQWEQWEQWERERYRYER TRUE
data$result <- grepl('(V|G){5,}', substr(data$string, 1,20))
- akrunVVVVV
而没有任何 'G',那该怎么办? - akrun