Is there a way to replace and delete certain lines in a multi-line string?

Question

I'm trying to process a multi-line string, replacing some lines and deleting others. Here is the code.

>>> import re
>>> txt
'1 Introduction\nPart I: Applied Math and Machine Learning Basics\n2 Linear Algebra'
>>> tmp = []
>>> for line in txt.splitlines():
...     if re.findall('[0-9]', line):
...         replaced = re.sub('[0-9]', '#', line)
...         tmp.append(replaced)
...
>>> print(tmp)
['# Introduction', '# Linear Algebra']

This code does the job, but I'm not sure it is the most efficient approach.

I tried this post and this doc, but the multiple-find approaches they describe don't seem to work across multiple lines.

Is there a more efficient way?

- user11074017

Your code is fine. If you find it more readable, you could condense it into a single line with a list comprehension: [re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.findall('[0-9]', line)]. - Selcuk

1 Answer


- Sreeram TP · Answer 1

You can use a list comprehension for the code in your question, which makes it tidier.

[re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.findall('[0-9]', line) ]

# Output 
['# Introduction', '# Linear Algebra']

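Also, as @CertainPerformance mentioned in the comments, if you only want to know whether a digit is present in the string, it is better to use search than findall: search stops at the first match, whereas findall scans the whole line and builds a list of every match. You can then rewrite the list comprehension as: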

[re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.search('[0-9]', line) ]

# Output 
['# Introduction', '# Linear Algebra']
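For a version that runs on its own, here is a minimal sketch of the same search-based filter wrapped in a function. The names DIGIT and mask_numbered_lines are hypothetical, chosen only for illustration, and precompiling the pattern is my own addition rather than something from the question.

```python
import re

# Hypothetical name: the pattern is compiled once and reused for both
# the membership test and the substitution.
DIGIT = re.compile(r'[0-9]')

def mask_numbered_lines(text):
    """Keep only the lines that contain a digit, replacing every digit with '#'."""
    return [DIGIT.sub('#', line) for line in text.splitlines() if DIGIT.search(line)]

txt = '1 Introduction\nPart I: Applied Math and Machine Learning Basics\n2 Linear Algebra'
print(mask_numbered_lines(txt))
# ['# Introduction', '# Linear Algebra']
```

Lines without a digit are dropped by the if clause, so the deletion and the replacement happen in one pass over the lines.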

On my machine, I see a small performance improvement when using search.

%%timeit 1000000

[re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.search('[0-9]', line) ]

# 4.76 µs ± 53.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit 1000000

[re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.findall('[0-9]', line) ]

# 5.21 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
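If you are not working in IPython, a rough equivalent of the comparison above can be sketched with the standard timeit module. The function names with_search and with_findall and the loop counts are assumptions picked for illustration; the numbers you get will depend on your machine and Python version.

```python
import re
import timeit

txt = '1 Introduction\nPart I: Applied Math and Machine Learning Basics\n2 Linear Algebra'

def with_search():
    # search stops at the first digit found on each line
    return [re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.search('[0-9]', line)]

def with_findall():
    # findall collects every digit on each line before the truth test
    return [re.sub('[0-9]', '#', line) for line in txt.splitlines() if re.findall('[0-9]', line)]

NUMBER = 10_000  # calls per timing run (assumed value, adjust as needed)

for fn in (with_search, with_findall):
    # Take the best of 5 runs to reduce noise from other processes.
    best = min(timeit.repeat(fn, number=NUMBER, repeat=5))
    print(f"{fn.__name__}: {best / NUMBER * 1e6:.2f} µs per call")
```

With only three short lines of input, the gap between the two is small; it should grow with longer lines that contain many digits, since findall has to collect all of them.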