Removing repeated characters in a string

4
This question may be related to this one.
Unfortunately, the solution given there does not work on my data.
I have the following example vector:
example<-c("ChildrenChildren", "Clothing and shoesClothing and shoes","Education, health and beautyEducation, health and beauty", "Leisure activities, travelingLeisure activities, traveling","LoansLoans","Loans and financial servicesLoans and financial services" ,"Personal transfersPersonal transfers" ,"Savings and investmentsSavings and investments","TransportationTransportation","Utility servicesUtility services")

Naturally, I would like to get the same strings without the duplication, i.e.:

  > result
 [1]   "Children" "Clothing and shoes" "Education, health and beauty"

Is that possible?

1
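How did you end up with a vector like this in the first place? It seems easier to fix the step that produced this data than to repair it afterwards. - MrFlick
3 Answers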

10
You can use sub to directly capture the part you want in the pattern:
sub("(.+)\\1", "\\1", example)
 #[1] "Children"                      "Clothing and shoes"            "Education, health and beauty"  "Leisure activities, traveling" "Loans"                        
 #[6] "Loans and financial services"  "Personal transfers"            "Savings and investments"       "Transportation"                "Utility services"

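(.+) captures some text, and \\1 refers back to whatever was just captured. So the pattern looks for "anything, repeated twice", and the replacement puts back that same "anything" only once.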
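As a small sketch of how the backreference behaves, the snippet below compares the pattern from the answer with an anchored variant. The anchored pattern `^(.+)\\1$` and the sample vector `x` are assumptions added for illustration and are not part of the original answer. Without anchors, any adjacent repeat is collapsed, so a doubled letter inside a word is shortened too.

```r
# Hypothetical input: a fully doubled string, and a string that merely
# contains a doubled letter ("cc" in "account").
x <- c("LoansLoans", "Savings account")

# Pattern from the answer: collapses the first adjacent repeat it finds.
unanchored <- sub("(.+)\\1", "\\1", x)

# Anchored variant (an assumption): only rewrites strings that consist
# entirely of one chunk repeated exactly twice.
anchored <- sub("^(.+)\\1$", "\\1", x)

print(unanchored)  # "Loans" "Savings acount"
print(anchored)    # "Loans" "Savings account"
```

For the example vector in the question both patterns give the same result, because every element is an exact doubling.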


I am not good with regular expressions. Did you take a course, or is there some documentation you would recommend? - Henry Navarro
1
@HenryNavarro I learned mostly from the help page ?regex and from reading questions and answers on SO ;-) - Cath
1
All the answers are great, but I have to pick one. I have upvoted every answer, thanks everyone. - Henry Navarro
2
I didn't know \\1 could also be used inside the pattern itself! Thanks! - iod

5

If all the strings are doubled, then they are twice as long as they need to be, so take the first half of each string:

> substr(example, 1, nchar(example)/2)
 [1] "Children"                      "Clothing and shoes"           
 [3] "Education, health and beauty"  "Leisure activities, traveling"
 [5] "Loans"                         "Loans and financial services" 
 [7] "Personal transfers"            "Savings and investments"      
 [9] "Transportation"                "Utility services"             

2
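This does depend heavily on every string being doubled. @Cath's solution has the property that if a string is not doubled, the whole string is returned, rather than only half of it as with my code. - Spacedman
All the answers are great, but I have to pick one. I have upvoted every answer, thanks everyone. - Henry Navarro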
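To illustrate the caveat raised in the comment above, here is a minimal sketch of what halving does to a string that is not doubled, together with a guarded variant that halves only when both halves match. The vector `mixed` and the names `is_doubled` and `guarded` are hypothetical and not taken from the answer.

```r
# Hypothetical mixed input: the first element is doubled, the second is not.
mixed <- c("TransportationTransportation", "Transportation")

# Plain halving assumes every element is an exact doubling.
halved <- substr(mixed, 1, nchar(mixed) / 2)

# Guarded variant (an assumption): halve only when the length is even
# and the two halves are identical; otherwise keep the string unchanged.
n <- nchar(mixed)
is_doubled <- n %% 2 == 0 & substr(mixed, 1, n / 2) == substr(mixed, n / 2 + 1, n)
guarded <- ifelse(is_doubled, substr(mixed, 1, n / 2), mixed)

print(halved)   # "Transportation" "Transpo"
print(guarded)  # "Transportation" "Transportation"
```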

3
We can try the following:
stringr::str_remove_all(example,"[a-z].*[A-Z]")

Result:

[1] "Children"                      "Clothing and shoes"            "Education, health and beauty" 
 [4] "Leisure activities, traveling" "Loans"                         "Loans and financial services" 
 [7] "Personal transfers"            "Savings and investments"       "Transportation"               
[10] "Utility services"  

1
All the answers are great, but I have to pick one. I have upvoted every answer, thanks everyone. - Henry Navarro
