Converting a sequence of numbers into ranges

12

I have a bunch of numbers, for example:

1 2 3 4  6 7 8  20 24 28 32

The information in that list can be expressed in Python as ranges:

[range(1, 5), range(6, 9), range(20, 33, 4)]

In my output I would write 1..4, 6..8, 20..32..4, but that is just a matter of presentation.
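The presentation mentioned above could be produced with a small formatter. This is only a sketch: the names format_range and format_ranges are hypothetical, and printing a single-element range as a bare number is an assumption, since the question does not say how that case should look.

```python
def format_range(r):
    """Render a non-empty range as 'first..last' or 'first..last..step'."""
    first, last = r[0], r[-1]
    if len(r) == 1:
        # Assumption: a one-element range is shown as just the number.
        return str(first)
    if r.step == 1:
        return f"{first}..{last}"
    return f"{first}..{last}..{r.step}"


def format_ranges(ranges):
    """Join the rendered ranges with ', '."""
    return ", ".join(format_range(r) for r in ranges)


print(format_ranges([range(1, 5), range(6, 9), range(20, 33, 4)]))
# prints: 1..4, 6..8, 20..32..4
```

Using r[-1] rather than r.stop keeps the printed upper bound inclusive, whatever stop value the range was built with.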

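Another answer shows how to handle contiguous ranges, but I don't see how to easily handle strided ranges like the one above. Is there a similar trick?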

4 Answers

4
Here is a straightforward way to solve the problem.
def get_ranges(ls):
    N = len(ls)
    while ls:
        # single element remains, yield the trivial range
        if N == 1:
            yield range(ls[0], ls[0] + 1)
            break

        diff = ls[1] - ls[0]
        # find the last index that satisfies the determined difference
        i = next(i for i in range(1, N) if i + 1 == N or ls[i+1] - ls[i] != diff)

        yield range(ls[0], ls[i] + 1, diff)

        # update variables
        ls = ls[i+1:]
        N -= i + 1

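get_ranges([1,2,4,5,7,9]) ends up with a range for [7, 9]. - George
@George What did you expect? The algorithm above yields [1,2], [4,5], [7,9] as expected, because it fills ranges greedily. If you want a non-greedy algorithm, you need an entirely different approach, and nothing in the question suggests it should be non-greedy. - Jared Goguen
Ah, I misunderstood the question. Never mind :) - George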

4
def ranges(data):
    result = []
    if not data:
        return result
    idata = iter(data)
    first = prev = next(idata)
    for following in idata:
        if following - prev == 1:
            prev = following
        else:
            result.append((first, prev + 1))
            first = prev = following
    # There was either exactly 1 element and the loop never ran,
    # or the loop just normally ended and we need to account
    # for the last remaining range.
    result.append((first, prev+1))
    return result

Test:

>>> data = list(range(1, 5)) + list(range(6, 9)) + list(range(20, 24))
>>> print(ranges(data))
[(1, 5), (6, 9), (20, 24)]

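As an exercise, add a parameter that sets the step. Detecting the step automatically requires a preliminary full scan. - 9000
I don't believe that's true. See the other answers, including mine. - Jared Goguen
@JaredGoguen: Your solution assumes the input starts with a sequence of length greater than 1, so that the first difference is the step. Consider the input [10, 15, 5, 6, 7, 30, 31, 32]. The step could be determined as the smallest difference between elements (which requires a full scan), but that causes problems when ranges touch: [1, 2, 3, 3, 4, 5], or even intersect: [10, 11, 12, 11, 12, 13]. Automatic step detection, like any pattern-recognition problem, is not trivial and requires some assumptions to be stated explicitly. - 9000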
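The exercise suggested in the comments (a caller-supplied step rather than an auto-detected one) might look roughly like this. It is a sketch, not the answerer's code: the name ranges_with_step is hypothetical, and it assumes a positive step, since the stop value is computed as prev + 1.

```python
def ranges_with_step(data, step=1):
    """Group neighbours that differ by exactly `step` into (start, stop, step) tuples.

    Assumes step > 0; stop is one past the last element of each run.
    """
    result = []
    idata = iter(data)
    try:
        first = prev = next(idata)
    except StopIteration:
        return result  # empty input
    for following in idata:
        if following - prev == step:
            prev = following
        else:
            # The run ended; record it and start a new one.
            result.append((first, prev + 1, step))
            first = prev = following
    # Account for the last remaining run.
    result.append((first, prev + 1, step))
    return result


print(ranges_with_step([20, 24, 28, 32, 40], step=4))
# prints: [(20, 33, 4), (40, 41, 4)]
print(ranges_with_step([10, 15, 5, 6, 7, 30, 31, 32]))
# prints: [(10, 11, 1), (15, 16, 1), (5, 8, 1), (30, 33, 1)]
```

With a fixed step, the awkward input from the last comment simply falls apart into single-element runs plus the two runs that match the step. No guessing is involved, but the caller has to know the step in advance.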

1
You can use groupby and count from the itertools module, and Counter from the collections module, as in this example:

Update: see the comments for the logic and limitations of this solution.

from itertools import groupby, count
from collections import Counter

def ranges_list(data=list, func=range, min_condition=1):
    # Sort in place the ranges list
    data.sort()

    # Find all the steps between the ranges's elements
    steps = [v-k for k,v in zip(data, data[1:])]

    # Find the repeated items's steps based on condition. 
    # Default: repeated more than once (min_condition = 1)
    repeated = [item for item, count in Counter(steps).items() if count > min_condition]

    # Group the items in to a dict based on the repeated steps
    groups = {k:[list(v) for _,v in groupby(data, lambda n, c = count(step = k): n-next(c))] for k in repeated}

    # Create a dict:
    # - keys are the steps
    # - values are the grouped elements
    sub = {k:[j for j in v if len(j) > 1] for k,v in groups.items()}

    # Those two lines are for pretty printing purpose:
    # They are meant to have a sorted output.
    # You can replace them by:
    # return [func(j[0], j[-1]+1,k) for k,v in sub.items() for j in v]
    # Otherwise:
    final = [(j[0], j[-1]+1,k) for k,v in sub.items() for j in v]
    return [func(*k) for k in sorted(final, key = lambda x: x[0])]

ranges1 = [1, 2, 3, 4, 6, 7, 8, 20, 24, 28, 32]
ranges2 = [1, 2, 3, 4, 6, 7, 10, 20, 24, 28, 50,51,59,60]

print(ranges_list(ranges1))
print(ranges_list(ranges2))

Output:

[range(1, 5), range(6, 9), range(20, 33, 4)]
[range(1, 5), range(6, 8), range(20, 29, 4), range(50, 52), range(59, 61)]

Limitations:
With this kind of input:
ranges3 = [1,3,6,10]
print(ranges_list(ranges3))
print(ranges_list(ranges3, min_condition=0))

it will output:

# Steps are repeated <= 1 with the condition: min_condition = 1
# Will output an empty list
[]
# With min_condition = 0
# Will output the ranges using: zip(data, data[1:])
[range(1, 4, 2), range(3, 7, 3), range(6, 11, 4)]

Feel free to use this solution and adapt or modify it to suit your needs.
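The grouping key used in this answer, n - next(c), relies on the fact that inside an arithmetic run the value minus step times its position stays constant. The sketch below isolates that one idea; runs_with_step is a hypothetical helper name and it expects the data to be sorted already.

```python
from itertools import count, groupby


def runs_with_step(data, step):
    """Split sorted data into runs whose neighbours differ by `step`."""
    counter = count(step=step)  # yields 0, step, 2*step, ...
    # Within a run, value - step*index is constant, so groupby
    # sees a single key for the whole run.
    return [list(group) for _, group in groupby(data, key=lambda n: n - next(counter))]


data = [1, 2, 3, 4, 6, 7, 8, 20, 24, 28, 32]
print(runs_with_step(data, 1))
# prints: [[1, 2, 3, 4], [6, 7, 8], [20], [24], [28], [32]]
print(runs_with_step(data, 4))
# prints: [[1], [2], [3], [4], [6], [7], [8], [20, 24, 28, 32]]
```

Each candidate step produces its own partition, which is why the answer then filters out the groups of length 1 before building the ranges.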


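Shouldn't the second sequence produce range(10, 21, 10)? - Jared Goguen
Yes. When I set min_condition = 0, it outputs: [range(1, 5), range(4, 7, 2), range(6, 8), range(7, 11, 3), range(10, 21, 10), range(20, 29, 4), range(28, 51, 22), range(50, 52), range(51, 60, 8), range(59, 61)], so range(10, 21, 10) is included. This is the case shown with the third sequence under the limitations, and I think it produces undesired output. I'm still waiting for a comment from the OP before deciding whether to keep the code as is or modify it. - Chiheb Nexus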

0

It may not be very short or elegant, but it seems to work:

def ranges(ls):
    li = iter(ls)
    try:
        first = next(li)
    except StopIteration:
        # Empty input: nothing to yield
        return
    while True:
        try:
            element = next(li)
        except StopIteration:
            yield range(first, first+1)
            return
        step = element - first
        last = element
        while True:
            try:
                element = next(li)
            except StopIteration:
                yield range(first, last+step, step)
                return
            if element - last != step:
                yield range(first, last+step, step)
                first = element
                break
            last = element

This walks over an iterator of the list and yields range objects:

>>> list(ranges([1, 2, 3, 4, 6, 7, 8, 20, 24, 28, 32]))
[range(1, 5), range(6, 9), range(20, 36, 4)]

It also handles descending ranges and ranges with only one element:

>>> list(ranges([9,8,7, 1,3,5, 99]))
[range(9, 6, -1), range(1, 7, 2), range(99, 100)]
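A quick way to sanity-check any of these solutions is to expand the ranges again and compare the result with the original numbers. The helper name expand is hypothetical. The check deliberately compares elements rather than repr strings, because in Python 3 two range objects can describe the same sequence with different stop values.

```python
from itertools import chain


def expand(ranges):
    """Flatten an iterable of range objects back into a list of numbers."""
    return list(chain.from_iterable(ranges))


numbers = [1, 2, 3, 4, 6, 7, 8, 20, 24, 28, 32]
found = [range(1, 5), range(6, 9), range(20, 33, 4)]
print(expand(found) == numbers)
# prints: True

# Same elements, different stop: the two ranges still compare equal.
print(range(20, 33, 4) == range(20, 36, 4))
# prints: True

print(expand([range(9, 6, -1), range(1, 7, 2), range(99, 100)]))
# prints: [9, 8, 7, 1, 3, 5, 99]
```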
