How to find a substring in a string, ignoring whitespace, and get its indices in the original string

Question

4

Usually I use str.find() to find a substring in Python.

Now I have a special case.

First, I want to ignore whitespace. For example:

s1= ' first words s t r i n g last words '
s2= 'string'
s3= 's tring'
# s4 = any other combination with the spaces

I want the search to succeed when I look for s2 or s3 in s1.

Second, I want to get the starting and ending index of the substring within the original string.

There may be many spaces in the original string, for example:
```
 s1= ' first words s t r    i n g last words '
```
I would like to have indices starting at s and ending at g in the original string.

Edit 1

To clarify, whitespace is irrelevant in both the source and the target strings.

Thanks

- Shan

1

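Would you expect a search for "s t" to succeed when the target string is "st"? In other words, is whitespace significant in the search string but not in the target string? - holdenweb

You can use a regular expression search like this one. - PyHomer

Whitespace is irrelevant in both the source and the target strings; thanks for asking. I will edit the question accordingly. - Shan

3 Answers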

1

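You can build a regex pattern by first removing the spaces from the string you are looking for and then putting ' *' (any number of spaces) between each pair of characters. Since you want to be able to use any special characters in the search string, we also need to escape them: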

import re

def find_with_spaces(pattern, text):
    pattern = pattern.replace(' ', '')
    pattern_re = re.compile(' *'.join(map(re.escape, pattern)))

    m = pattern_re.search(text)
    if m:
        return m.start(), m.end()


s1= ' first words s { r * n g? last words '
s2= 's{r*ng?'

start, end = find_with_spaces(s2, s1)
print(start, end)
print(s1[start:end])

# 13 25
# s { r * n g?

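In this case, the regex pattern that the function builds and uses is r's *\{ *r *\* *n *g *\?'. Note that the end index is 25 while the final '?' is at index 24; this lets you use s1[start:end] to get the matched substring.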

s3= ' * ng?la'
start, end = find_with_spaces(s3, s1)
print(start, end)
print(s1[start:end])

# 19 28
# * n g? la

- Thierry Lathuille

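Hi, this looks like a nice solution, but it does not work if "pattern" contains "*". Here is the error message: "sre_constants.error: multiple repeat at position 12". - Shan

Hi, I found that every special character used by regular expressions causes a problem; for example, it now fails on '?'. There is an 're.escape()' function that should solve this. I tried simply applying it as re.escape(pattern), but it did not work. - Shan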

You're right, I should have handled them along with '*'. Done! - Thierry Lathuille

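You have to escape each character independently before joining them; otherwise you would also insert ' *' between each backslash and the special character it escapes. See the updated answer. - Thierry Lathuille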
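To make the point in the comment above concrete, here is a minimal sketch comparing the two orders of operations. The helper names (join_after_escaping_whole, join_escaping_each, compiles) are made up for illustration and are not part of the answer.

```python
import re


def join_after_escaping_whole(needle):
    """Escape the whole string first, then join: each escape sequence gets split."""
    return " *".join(re.escape(needle))


def join_escaping_each(needle):
    """Escape character by character, then join: escape sequences stay intact."""
    return " *".join(map(re.escape, needle))


def compiles(pattern):
    """Report whether the pattern is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


needle = "r*n"
for build in (join_after_escaping_whole, join_escaping_each):
    pattern = build(needle)
    print(build.__name__, repr(pattern), "compiles:", compiles(pattern))
```

With the first helper, ' *' ends up between the backslash and the '*', so the pattern no longer escapes the character it was meant to escape. Only the per-character version keeps each backslash attached to its character.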

0

You can try this:

import re

# Part 1. Check whether the substring exists, ignoring spaces
s1 = input("String>\t\t")
s2 = input("Substring>\t")
print("Is substring present in string?\t-", s2.replace(" ", "") in s1.replace(" ", ""))

# Part 2. Search for the pattern and report its indices in the original string
needle = s2.replace(" ", "")
# Allow any number of spaces between characters; escape regex metacharacters
pattern = re.compile(" *".join(map(re.escape, needle)))
match = pattern.search(s1)  # search once and reuse the match object
if match:
    print(match.start(), match.end())

" *" is the pattern inserted after each character: it matches any number of spaces, with * as the appropriate quantifier. Sorry, I am still getting used to runtime input and to keeping the number of variables down, but this approach works fine.

- Sai Kiran
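As a quick illustration of the quantifier described in this answer, the sketch below shows that ' *' matches zero or more spaces, so one pattern covers the target with no spaces, with single spaces, and with runs of spaces. The helper name spaced_pattern is hypothetical, and the sample texts are adapted from the question.

```python
import re


def spaced_pattern(needle):
    """Build a regex allowing zero or more spaces between the needle's characters."""
    needle = needle.replace(" ", "")
    return re.compile(" *".join(map(re.escape, needle)))


pattern = spaced_pattern("s tring")
for middle in ("string", "s t r i n g", "s t r    i n g"):
    text = "first words " + middle + " last words"
    m = pattern.search(text)
    # span() gives (start, end) in the original text; end is exclusive
    print(m.span(), repr(m.group()))
```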

- John · Accepted Answer

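To ignore whitespace in any string, you can use string.replace(" ", ""). To find a substring in a string, you can use string.find(substr). To map the find result back to a position in the original string, you have to shift it by the number of spaces that were removed before that point.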

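from functools import reduce  # reduce is not a builtin in Python 3

s1 = ' first words s t r i n g last words '
s2 = 'string'

s1_nospace = s1.replace(" ", "")
s2_nospace = s2.replace(" ", "")

# Position of the match in the space-free string (-1 if not found)
nospace_index = s1_nospace.find(s2_nospace)

isnt_space = [x != " " for x in s1]
# Cumulative sum of isnt_space: chars_before[i] is the number of
# non-space characters in s1[:i + 1]
chars_before = reduce(lambda c, x: c + [c[-1] + x], isnt_space, [0])[1:]

# Map the first and last matched characters back to indices in s1
start_index = chars_before.index(nospace_index + 1)
end_index = chars_before.index(nospace_index + len(s2_nospace))

# start_index == 13
# end_index == 23 (inclusive: s1[23] is the final 'g')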

You can of course clean this up and/or speed it up, but it should do the job in a reasonably readable way.
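One possible cleanup, offered as a sketch and not as the answer's own code: record the original index of every non-space character once, then use that list to translate the find result back. The function name find_ignoring_spaces is hypothetical. Unlike the snippet above, it returns an exclusive end index so that haystack[start:end] is the matched text.

```python
def find_ignoring_spaces(needle, haystack):
    """Return (start, end) of needle in haystack, ignoring spaces in both.

    end is exclusive. Returns None when there is no match or when the
    needle contains no non-space characters.
    """
    # Original index of every non-space character in haystack
    positions = [i for i, ch in enumerate(haystack) if ch != " "]
    compact = "".join(haystack[i] for i in positions)
    needle = needle.replace(" ", "")
    if not needle:
        return None
    k = compact.find(needle)
    if k == -1:
        return None
    # Map the first and last matched characters back to the original string
    return positions[k], positions[k + len(needle) - 1] + 1


s1 = ' first words s t r    i n g last words '
span = find_ignoring_spaces('s tring', s1)
if span is not None:
    start, end = span
    print(start, end, repr(s1[start:end]))
```

This handles only the space character, as in the answer above; treating tabs or newlines as ignorable would need a different test than ch != " ".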