Python: How to replace substrings in a string given a list of indices

Question

8

I have a string:

"A XYZ B XYZ C"

and a list of index tuples:

((2, 5), (8, 11))

I want to replace each substring defined by a pair of indices with the sum of those two indices:

A 7 B 19 C

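I can't use str.replace, because it would match both instances of XYZ. Replacing by index breaks on the second and every later iteration, because each replacement shifts the positions of everything after it.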
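
To make the shifting problem concrete, here is a minimal sketch of the naive in-order loop (an illustration only, not the attempt shown further down). The variable name `naive` is an assumption introduced for this example:

```python
text = "A XYZ B XYZ C"
replace_list = ((2, 5), (8, 11))

naive = text
for start, end in replace_list:
    # After the first pass the string is two characters shorter,
    # so (8, 11) no longer points at the second "XYZ".
    naive = naive[:start] + str(start + end) + naive[end:]

print(naive)  # A 7 B XY19
```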

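Is there a nice solution to this problem?

UPDATE. The string is only an example. I don't know its contents in advance and can't rely on them in the solution.

My stopgap solution is: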

text = "A XYZ B XYZ C"
replace_list = ((2, 5), (8, 11))

offset = 0
for rpl in replace_list:
    l = rpl[0] + offset
    r = rpl[1] + offset

    replacement = str(rpl[0] + rpl[1])  # sum the original indices, not the shifted ones
    text = text[0:l] + replacement + text[r:]

    offset += len(replacement) - (r - l)

It relies on the index tuples being in ascending order. Is there a better way?

- Denis Kulagin

9 Answers

5

You can use re.sub():

In [16]: import re

In [17]: s = "A XYZ B XYZ C"

In [18]: ind = ((2, 5), (8, 11))

In [19]: inds = map(sum, ind)

In [20]: re.sub(r'XYZ', lambda _: str(next(inds)), s)
Out[20]: 'A 7 B 19 C'

Note, however, that if there are more matches than index pairs, this raises StopIteration. In that case you can pass a default argument to next() to use as the replacement.
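
A minimal sketch of that fallback, assuming a hypothetical input with one more match than there are index pairs. Passing the matched text as the default to next() leaves any surplus match unchanged; that choice of default is an illustration, not something the answer prescribes:

```python
import re

s = "A XYZ B XYZ C XYZ"        # three matches, but only two index pairs
ind = ((2, 5), (8, 11))
inds = map(sum, ind)           # lazy iterator over the sums 7 and 19

# next(inds, default) returns the default once the iterator is exhausted,
# so the third match is put back unchanged instead of raising StopIteration.
result = re.sub(r"XYZ", lambda m: str(next(inds, m.group(0))), s)
print(result)  # A 7 B 19 C XYZ
```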

If you want to locate the substrings by the index tuples instead, here is another solution:

In [81]: flat_ind = tuple(i for sub in ind for i in sub)
# Build every consecutive pair of boundaries, so the gaps between the intended ranges are covered too.
In [82]: inds = [(0, ind[0][0]), *zip(flat_ind, flat_ind[1:]), (ind[-1][-1], len(s))]
# Replace a slice with the sum of its indices if the pair is one of the intended ranges; otherwise keep the substring itself.
In [85]: ''.join([str(i+j) if (i, j) in ind else s[i:j] for i, j in inds])
Out[85]: 'A 7 B 19 C'

- Mazdak

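XYZ is just an example; they want to replace whatever is in the given ranges. - Ashwini Chaudhary

@AshwiniChaudhary Yes, I've just seen the edit. I'll update the answer, thanks for the heads-up. - Mazdak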

2

Here's a quick and slightly dirty solution using string formatting and tuple unpacking:

s = 'A XYZ B XYZ C'
reps = ((2, 5), (8, 11))
totals = (sum(r) for r in reps)
print(s.replace('XYZ', '{}').format(*totals))

This prints:

A 7 B 19 C

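First, we use a generator expression to compute the total for each replacement. Then, by replacing 'XYZ' with '{}', we can use string formatting; *totals ensures the totals are filled in in the right order.

Edit

I didn't realise the indices were actually string indices, my mistake. To handle that, we can use re.sub: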

import re
s = 'A XYZ B XYZ C'

reps = ((2, 5), (8, 11))
for a, b in reps:
    s = s[:a] + '~'*(b-a) + s[b:]
totals = (sum(r) for r in reps)
print(re.sub(r'(~+)', r'{}', s).format(*totals))

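This assumes that the tilde (~) does not occur in your string; if it does, use a different placeholder character. It also assumes that no two replacement ranges are directly adjacent, since adjacent runs of tildes would merge into a single match.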

- asongtoruin

1

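That's a special case. In practice I don't know the substrings in advance; they are defined by the indices. XYZ is just an example of a repeated token. - Denis Kulagin

@DenisKulagin Sorry, I misunderstood the question. Let me update the answer. - asongtoruin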

2

Using itertools.groupby is one way to do this.

from itertools import groupby


indices = ((2, 5), (8, 11))
data = list("A XYZ B XYZ C")

First we replace each range with the same number of None values.

for a, b in indices:
    data[a:b] = [None] * (b - a)

print(data)
# ['A', ' ', None, None, None, ' ', 'B', ' ', None, None, None, ' ', 'C']

Then we iterate over the grouped data and replace each group of None values with the sum of the next pair from indices.

it = iter(indices)
output = []
for k, g in groupby(data, lambda x: x is not None):
    if k:
        output.extend(g)
    else:
        output.append(str(sum(next(it))))

print(''.join(output))
# A 7 B 19 C

- Ashwini Chaudhary

2

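Assuming the ranges don't overlap, you can process them in reverse order, so that each replacement only shifts text that has already been handled.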

text = "A XYZ B XYZ C"
replace_list = ((2, 5), (8, 11))

for start, end in reversed(replace_list):
    text = f'{text[:start]}{start + end}{text[end:]}'

# A 7 B 19 C
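
If the ranges might not arrive in ascending order, one possible variation is to sort them in descending order first. This is a sketch under the same no-overlap assumption; the function name `replace_ranges_reversed` is hypothetical:

```python
def replace_ranges_reversed(text, ranges):
    """Replace each (start, end) slice with str(start + end), right to left."""
    # Sorting in descending order means every edit happens to the right of
    # the ranges still to be processed, so their indices stay valid.
    for start, end in sorted(ranges, reverse=True):
        text = text[:start] + str(start + end) + text[end:]
    return text


print(replace_ranges_reversed("A XYZ B XYZ C", ((8, 11), (2, 5))))  # A 7 B 19 C
```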

- Steven Summers

1

Here is a solution that assigns to list slices in reverse order:

text = "A XYZ B XYZ C"
indices = ((2, 5), (8, 11))
chars = list(text)

for start, end in reversed(indices):
    chars[start:end] = str(start + end)

text = ''.join(chars) # A 7 B 19 C

- Jared Goguen

0

Another itertools solution

from itertools import chain

s = "A XYZ B XYZ C"
inds = ((2, 5), (8, 11))
res = 'A 7 B 19 C'


inds = list(chain([0], *inds, [len(s)]))
res_ = ''.join(s[i:j] if k % 2 == 0 else str(i + j)
        for k, (i,j) in enumerate(zip(inds, inds[1:])))

assert res == res_

- hilberts_drinking_problem

0

Anticipating that if these integer-pair selections are useful here they will be useful elsewhere too, I would probably do something like this:

def make_selections(data, selections):
    start = 0
    # sorted(selections) if you don't want to require the caller to provide them in order
    for selection in selections:
        yield None, data[start:selection[0]]
        yield selection, data[selection[0]:selection[1]]
        start = selection[1]
    yield None, data[start:]

def replace_selections_with_total(data, selections):
    return ''.join(
        str(selection[0] + selection[1]) if selection else value
        for selection, value in make_selections(data, selections)
    )

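This still relies on the selections not overlapping, but I'm not even sure what it would mean for them to overlap.

You can then make the replacement itself more flexible: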

def replace_selections(data, selections, replacement):
    return ''.join(
        replacement(selection, value) if selection else value
        for selection, value in make_selections(data, selections)
    )

def replace_selections_with_total(data, selections):
    return replace_selections(data, selections, lambda s,_: str(s[0]+s[1]))
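
As a usage sketch, the same helper can take any callback of the form replacement(selection, value). The two functions are repeated here so the example runs on its own; the lower-casing callback is an illustrative assumption, not part of the answer:

```python
def make_selections(data, selections):
    # Yields (None, text) for untouched stretches and (selection, text)
    # for each selected slice; selections must be ascending and disjoint.
    start = 0
    for selection in selections:
        yield None, data[start:selection[0]]
        yield selection, data[selection[0]:selection[1]]
        start = selection[1]
    yield None, data[start:]


def replace_selections(data, selections, replacement):
    return ''.join(
        replacement(selection, value) if selection else value
        for selection, value in make_selections(data, selections)
    )


# Hypothetical callback: lower-case the selected text instead of summing indices.
lowered = replace_selections("A XYZ B XYZ C", ((2, 5), (8, 11)),
                             lambda sel, value: value.lower())
print(lowered)  # A xyz B xyz C
```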

- Steve Jessop

0

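There is also a solution that does exactly what you need. I haven't fully worked it out yet, but you may want to use re.sub() from the re library.

Look here and search for the functions re.sub() or re.subn(): https://docs.python.org/2/library/re.html

If I have time, I'll work through your example later today.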

- Ramon van der Werf

- Mike Müller · Accepted Answer

Imperative and stateful:

s = 'A XYZ B XYZ C'
indices = ((2, 5), (8, 11))
res = []
i = 0
for start, end in indices:
    res.append(s[i:start] + str(start + end))
    i = end
res.append(s[i:])  # remainder after the last range; also works when indices is empty
print(''.join(res))

Result:

A 7 B 19 C
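
The same single-pass idea can be wrapped in a function. The sketch below is one possible packaging, not the answer's own code: the name `replace_spans`, the sorting step, and the overlap check are all assumptions added for illustration:

```python
def replace_spans(s, spans):
    """Replace each (start, end) slice of s with str(start + end) in one pass."""
    parts = []
    pos = 0  # end of the previously processed span
    for start, end in sorted(spans):
        if start < pos or end < start:
            raise ValueError(f"overlapping or invalid span: {(start, end)}")
        parts.append(s[pos:start])      # untouched text before the span
        parts.append(str(start + end))  # the replacement itself
        pos = end
    parts.append(s[pos:])               # untouched tail after the last span
    return ''.join(parts)


print(replace_spans("A XYZ B XYZ C", ((2, 5), (8, 11))))  # A 7 B 19 C
```

Collecting the pieces in a list and joining once avoids rebuilding the whole string for every span.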