How do I remove all strings matching a specific format from a list?

Question

7

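Question: given a list a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36'], how do I remove the elements of the form number:number|number and those ending in AM/PM? In other words, how do I remove these elements: '4:45 AM', '6:31 PM', '2:36'?

Honestly, I haven't tried much, because I'm not sure where to start beyond something like this: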

[x for x in a if x != something]

- cjg123

1

Have you looked into regular expressions? - dustinos3

1

^(\d+:\d+|\d+)$ is your regex. - Jean-François Fabre

What if a string looks like a time but isn't a valid one? For example: 133: 89 PM - Jerry
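On the question of strings that look like times but are not valid ones, one possible approach is to let the standard library parse each candidate instead of pattern-matching it. The following is only a sketch, assuming Python 3 and an English/C locale for the %p directive. The helper name is_valid_time and the two format strings are illustrative choices, not part of the original question.

```python
from datetime import datetime

def is_valid_time(s, formats=('%I:%M %p', '%H:%M')):
    """Return True if s parses as a time under any of the given formats.

    is_valid_time and the default formats are hypothetical; adjust them
    to whatever time notations the real data uses.
    """
    for fmt in formats:
        try:
            datetime.strptime(s, fmt)
            return True
        except ValueError:
            # Not parseable with this format; try the next one.
            pass
    return False

a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...',
     '6:31 PM', '2:36', '133: 89 PM']

# Keep everything that does not parse as a real time.
kept = [s for s in a if not is_valid_time(s)]
print(kept)
```

With this approach '133: 89 PM' stays in the list because it is not a valid time, whereas the purely pattern-based answers below decide on shape alone.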

7 Answers

3

Consider using the built-in filter function with a compiled regular expression.

>>> import re
>>> no_times = re.compile(r'^(?!\d\d?:\d\d(\s*[AP]M)?$).*$')
>>> a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']

>>> list(filter(no_times.match, a))
['abd', ' the dog', '1234 total', 'etc...']

If you would rather not compile the regex, you can pass a lambda as the first argument instead, although it looks more cluttered.

>>> list(filter(lambda s: not re.match(r'^\d\d?:\d\d(\s*[AP]M)?$', s), a))
['abd', ' the dog', '1234 total', 'etc...']

Note that in Python 3, filter returns an iterator rather than a list; wrap it in list() if you need a list.

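The regex here accepts every string except those matching \d\d?:\d\d(\s*[AP]M)?$. In other words, it accepts everything except strings of the form HH:MM, optionally followed by any amount of whitespace and then AM or PM.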
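To make the negative lookahead concrete, here is a small sketch, assuming Python 3. It compares the lookahead pattern with simply negating a match of the positive pattern. The names time_like, kept_lookahead and kept_negated are illustrative only.

```python
import re

# Positive pattern: H:MM or HH:MM, optionally followed by whitespace and AM/PM.
time_like = re.compile(r'\d\d?:\d\d(\s*[AP]M)?$')
# The same pattern wrapped in a negative lookahead, as in this answer.
no_times = re.compile(r'^(?!\d\d?:\d\d(\s*[AP]M)?$).*$')

a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...', '6:31 PM', '2:36']

# Approach 1: keep strings that the lookahead pattern accepts.
kept_lookahead = list(filter(no_times.match, a))
# Approach 2: keep strings that the positive pattern rejects.
kept_negated = [s for s in a if not time_like.match(s)]

print(kept_lookahead)
print(kept_negated)
```

For this sample list both approaches keep 'abd', ' the dog', '1234 total' and 'etc...'. The lookahead form is convenient when the pattern's match method is passed directly to filter, while the negated form is usually easier to read.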

- TehVulpes

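What would be the reason to avoid compiling the regex? - sigjuice

The only one I can think of is avoiding another variable name, so it's probably not a great reason. In the CPython implementation at least, match() compiles and caches the pattern anyway. - TehVulpes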

3

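Try this code in pure Python. First, it checks the last two characters: if they equal 'am' or 'pm', the element should be removed from the list. Second, it checks whether the element contains ':'. If it does, it checks the text before and after the ':', and if both are digits, the element is removed from the list. The idea supports both number|number:number and number:number|number.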

def removeElements(a):
    removed_elements = []
    for element in a:
        # Rule 1: the element ends in 'am' or 'pm' (case-insensitive).
        if element[-2:].lower() in ('am', 'pm'):
            removed_elements.append(element)
        # Rule 2: the text before the first ':' and after the last ':' is all digits.
        if ':' in element:
            parts = element.split(':')
            if parts[-1].isdigit() and parts[0].isdigit():
                removed_elements.append(element)
    # Keep only the elements that were not flagged above.
    output = []
    for element in a:
        if element not in removed_elements:
            output.append(element)
    return output

a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']
output = removeElements(a)
print(output)

The output for this example is: ['abd', ' the dog', '1234 total', 'etc...']
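One caveat of the two-character suffix check is that any element ending in 'am' or 'pm' is flagged, not only times. The sketch below illustrates this, assuming ordinary words can appear in the list. The helper names naive_suffix and strict_suffix and the sample words are hypothetical, and the stricter rule is just one possible refinement.

```python
def naive_suffix(s):
    # Same test as in removeElements: last two characters, case-insensitive.
    return s[-2:].lower() in ('am', 'pm')

def strict_suffix(s):
    # Hypothetical stricter variant: the AM/PM suffix only counts when the
    # character right before it is a digit or whitespace.
    return naive_suffix(s) and len(s) > 2 and (s[-3].isdigit() or s[-3].isspace())

words = ['program', '4:45 AM', '6:31pm', 'spam']

naive = [w for w in words if naive_suffix(w)]
strict = [w for w in words if strict_suffix(w)]

print(naive)   # all four words are flagged
print(strict)  # only the two time-like strings are flagged
```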

- user10941319

2

Take a look at this implementation.

import re

a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']
regex = re.compile(r'^[0-2]{0,1}[0-9]\:[0-5][0-9]\s{0,1}([AP][M]){0,1}')

a  = [x for x in a if not regex.match(x)]
print(a)

Output

['abd', ' the dog', '1234 total', 'etc...']

- chandima

2

The regex \d:\d\d$ matches a single digit, followed by a :, followed by two digits.

>>> import re
>>> a = ['abd', ' the dog', '4:45', '1234 total', 'etc...', '6:31']
>>> regex = re.compile(r'\d:\d\d$')
>>> [s for s in a if regex.match(s)]
['4:45', '6:31']
>>> [s for s in a if not regex.match(s)]
['abd', ' the dog', '1234 total', 'etc...']

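\d+:\d+$ matches one or more digits on either side of the colon. I suggest experimenting with it; see the documentation for the re module.

Details: $ marks the end of the string, and re.match starts looking at the beginning of the string.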
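The anchoring described here can be seen by running the same pattern through re.match, re.search and re.fullmatch. This is a minimal sketch, assuming Python 3.4+ (for re.fullmatch); the sample strings are made up for illustration.

```python
import re

pattern = re.compile(r'\d+:\d+$')

samples = ['4:45', 'at 4:45', '4:45 AM', '12:30']

# re.match anchors at the start only; the trailing $ supplies the end anchor.
matched = [s for s in samples if pattern.match(s)]
# re.search scans the whole string, so 'at 4:45' is found as well.
searched = [s for s in samples if pattern.search(s)]
# re.fullmatch anchors both ends, so the $ is not needed.
full = [s for s in samples if re.fullmatch(r'\d+:\d+', s)]

print(matched)   # ['4:45', '12:30']
print(searched)  # ['4:45', 'at 4:45', '12:30']
print(full)      # ['4:45', '12:30']
```

Using re.search with this pattern would therefore also remove strings that merely end in a time, which may or may not be what is wanted.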

- timgeb

2

Regex is the easy answer.

Here is an alternative in pure Python:

>>> a = ['abd', ' the dog', '4:45', '1234 total', 'etc...','6:31', '1234']
>>> [s for s in a if not all(e.isdigit() for e in s.split(':'))]
['abd', ' the dog', '1234 total', 'etc...']

Note that a side effect of '1234'.split(':') is that strings consisting only of digits are filtered out as well.

If something like '1:2:3' can occur, take note:

>>> a = ['abd', ' the dog', '4:45', '1234 total', 'etc...','6:31', '1234', '1:2:3']
>>> [s for s in a if len(s.split(':'))<=2 and not all(e.isdigit() for e in s.split(':'))]
['abd', ' the dog', '1234 total', 'etc...']

- dawg

[s for s in a if len(s.split(':'))<=2 and not all(e.isdigit() for e in s.split(':'))] fixes that... - dawg

Yes, but compared with a regex that is horrible and slow. I also thought of [x for x in a if not x.replace(":","").isdigit()], but it has holes. - Jean-François Fabre

In what way is using the re module not pure Python? - guidot

1

You don't need a regex; try:

>>> a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']
>>> [i for i in a if ':' not in i and not i[-2:] in ['AM','PM']]
['abd', ' the dog', '1234 total', 'etc...']
>>>

Or use a simpler regex solution:

>>> import re
>>> a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']
>>> [i for i in a if not re.search(r'\d+:\d+',i)]
['abd', ' the dog', '1234 total', 'etc...']
>>>

Or a non-regex version that is simpler still:

>>> a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']
>>> [i for i in a if ':' not in i]
['abd', ' the dog', '1234 total', 'etc...']
>>>

- U13-Forward

@cjg123 I added better solutions: a short one using re.match, and an even shorter list comprehension. - U13-Forward

- vks · Accepted Answer

You can use the regex \d+(?::\d+)?$ and filter with it.

See the demo.

https://regex101.com/r/HoGZYh/1

import re
a = ['abd', ' the dog', '4:45', '1234 total', '123', '6:31']
print([i for i in a if not re.match(r"\d+(?::\d+)?$", i)])

Output: ['abd', ' the dog', '1234 total']