Is there a built-in function for string natural sort?

Question

414

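I have a list of strings that I would like to sort in natural alphabetical order.

For example, the following list is in natural order (what I want):

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

And here is the "sorted" version of the same list (what sorted() returns):

['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

I'm looking for a sort function that behaves like the first one.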

- snakile

1

Related: Python analog of the natsort function (sort a list using a "natural order" algorithm) - jfs

24 Answers

- pepoluan · Answer 1

Let me offer my own take on this need:

from typing import Tuple, Union, Optional, Generator


StrOrInt = Union[str, int]


# On Python 3.6, string concatenation is REALLY fast
# Tested myself, and this fella also tested:
# https://blog.ganssle.io/articles/2019/11/string-concat.html
def griter(s: str) -> Generator[StrOrInt, None, None]:
    last_was_digit: Optional[bool] = None
    cluster: str = ""
    for c in s:
        if last_was_digit is None:
            last_was_digit = c.isdigit()
            cluster += c
            continue
        if c.isdigit() != last_was_digit:
            if last_was_digit:
                yield int(cluster)
            else:
                yield cluster
            last_was_digit = c.isdigit()
            cluster = ""
        cluster += c
    if last_was_digit:
        yield int(cluster)
    else:
        yield cluster
    return


def grouper(s: str) -> Tuple[StrOrInt, ...]:
    return tuple(griter(s))

Now, if we have a list like this:

filelist = [
    'File3', 'File007', 'File3a', 'File10', 'File11', 'File1', 'File4', 'File5',
    'File9', 'File8', 'File8b1', 'File8b2', 'File8b11', 'File6'
]

we can simply use the key= keyword argument to get a natural sort:

>>> sorted(filelist, key=grouper)
['File1', 'File3', 'File3a', 'File4', 'File5', 'File6', 'File007', 'File8', 
'File8b1', 'File8b2', 'File8b11', 'File9', 'File10', 'File11']

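The obvious drawback here is that, as written, the function sorts uppercase letters before lowercase ones.

I'll leave a case-insensitive grouper as an exercise for the reader :-)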
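
For readers who want a starting point for that exercise, here is a minimal sketch. It is not pepoluan's code: it assumes a regex split is acceptable in place of the character loop, and the names natural_key_ci and _DIGITS are made up for illustration. Text runs are case-folded, and every piece is tagged so that a number and a text run are never compared directly (on Python 3 that comparison raises TypeError, for example with 'a1' versus '1a').

```python
import re
from typing import Tuple, Union

# Hypothetical name; the capturing group makes re.split keep the digit runs.
_DIGITS = re.compile(r"(\d+)")


def natural_key_ci(s: str) -> Tuple[Tuple[int, Union[int, str]], ...]:
    """Case-insensitive natural-sort key (illustrative sketch)."""
    parts = []
    # With one capturing group, re.split alternates: text, digits, text, ...
    for i, chunk in enumerate(_DIGITS.split(s)):
        if i % 2:
            # Odd index: a run of digits, compared by integer value.
            parts.append((0, int(chunk)))
        elif chunk:
            # Even index: a text run (skipped when empty), compared case-folded.
            parts.append((1, chunk.casefold()))
    return tuple(parts)


names = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
print(sorted(names, key=natural_key_ci))
# ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
```

One consequence of the tagging is that, at the same position, a number sorts before a text run. That is a design choice of this sketch, not something the answer specifies.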

- Johny Vaknin · Answer 2

I suggest you simply use the key keyword argument of sorted to get the list you want.
For example:

to_order = ['e2', 'E1', 'e5', 'E4', 'e3']
ordered = sorted(to_order, key=lambda x: x.lower())
# ordered is ['E1', 'e2', 'e3', 'E4', 'e5']
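
To see what a lowercasing key does and does not cover, here is a quick check on the list from the question (the variable names are only for illustration). Case is ignored, but digits are still compared character by character, so 'elm10' lands before 'Elm2'.

```python
names = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

# Case-insensitive, but still a plain character-by-character comparison.
by_lower = sorted(names, key=str.lower)
print(by_lower)
# ['elm0', 'elm1', 'elm10', 'Elm11', 'Elm12', 'elm13', 'Elm2', 'elm9']
```

So this key works for single-digit suffixes like the ones in the answer's example, while multi-digit numbers need a key that compares the numeric part as an integer.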

- Varadaraju G · Answer 3

a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
b = ''
c = []

def bubble(bad_list):  # bubble sort method
    length = len(bad_list) - 1
    is_sorted = False  # renamed so the built-in sorted() is not shadowed

    while not is_sorted:
        is_sorted = True
        for i in range(length):
            if bad_list[i] > bad_list[i+1]:
                is_sorted = False
                bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i]  # sort the integer list
                a[i], a[i+1] = a[i+1], a[i]  # sort the main list based on the integer list index value

for a_string in a:  # extract the number in the string character by character
    for letter in a_string:
        if letter.isdigit():
            b += letter
    c.append(b)
    b = ''

print('Before sorting....')
print(a)
c = list(map(int, c))  # convert the string list into a number list (list() is needed on Python 3)
print(c)
bubble(c)

print('After sorting....')
print(c)
print(a)

Credits:

Bubble Sort Homework

How to read a string one letter at a time in Python
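
The approach above pulls the digits out of each string and orders the strings by the resulting integer. A compact sketch of the same idea using the built-in sorted() instead of a hand-written bubble sort might look like this. The helper name digits_as_int is hypothetical, and placing digit-free strings first (key -1) is an assumption the answer does not address.

```python
def digits_as_int(s: str) -> int:
    """Join every decimal digit found in s and read the result as one integer."""
    digits = ''.join(ch for ch in s if ch.isdecimal())
    # Assumption: strings without any digit get -1 and therefore sort first.
    return int(digits) if digits else -1


a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
print(sorted(a, key=digits_as_int))
# ['H1', 'H2', 'H3', 'H5', 'H6', 'H8', 'H10', 'H11', 'H50', 'H99', 'H100']
```

Like the original, this ignores the letters entirely and merges separate digit groups ('a1b2' is keyed as 12), so it only suits lists where one number per string decides the order.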

- SilentGhost · Answer 4

-3

>>> import re
>>> lst = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> sorted(lst, key=lambda x: int(re.findall(r'\d+$', x)[0]))
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
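
The one-liner assumes every string ends in a run of digits; on a string without one, re.findall returns an empty list and the [0] lookup raises IndexError. A hedged variant that tolerates such strings could look like the sketch below. The names trailing_number_key and _TRAILING_NUM are illustrative, and the choices to compare the prefix case-insensitively and to use -1 for a missing number are assumptions, not part of the answer.

```python
import re

_TRAILING_NUM = re.compile(r'\d+$')  # hypothetical name: digits at the very end


def trailing_number_key(s):
    """Key on (lowercased prefix, trailing number); -1 when there is no number."""
    m = _TRAILING_NUM.search(s)
    if m is None:
        return (s.lower(), -1)
    return (s[:m.start()].lower(), int(m.group()))


lst = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
print(sorted(lst, key=trailing_number_key))
# ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
print(sorted(['silent', 'ghost'], key=trailing_number_key))
# ['ghost', 'silent']
```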

- SilentGhost

5

Your implementation only handles the numbers. It fails if a string contains no number. For example, trying it on ['silent','ghost'] raises a "list index out of range" error. - snakile

2

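@snakile: your question doesn't provide enough examples. You haven't explained what you're trying to do, and you haven't updated your question with this new information either. You haven't posted anything you've tried, so please don't be so dismissive of my attempt at telepathy. - SilentGhost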

5

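@SilentGhost: First, I gave you an upvote because I think your answer is useful (even though it doesn't solve my problem). Second, I can't cover every possible case with examples. I think I defined natural sort quite clearly. I don't think giving a complicated example or a long definition is a good idea for such a simple concept. If you can think of a better way to phrase it, you're welcome to edit my question. - snakile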

1

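@SilentGhost: I want these strings handled the same way Windows handles such file names when it sorts files by name (ignoring case, and so on). It seems clear to me, but anything I say seems clear to me, so I can't judge whether it actually is. - snakile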

1

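@snakile, you haven't come close to defining natural sorting. That is hard to do and needs a lot of detail. If you want the sort order used by Windows Explorer, do you know there is a simple API call that provides it? - David Heffernan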
