我有以下这些范围:
7,10
11,13
11,15
14,20
23,39
我需要对重叠的范围执行联合,以得到不重叠的范围,因此在这个例子中:
7,20
23,39
我已经用 Ruby 完成了这个任务,我将范围的开始和结束推入数组中并进行排序,然后执行重叠范围的并集。有没有在 Python 中快速完成此操作的方法?
我有以下这些范围:
7,10
11,13
11,15
14,20
23,39
我需要对重叠的范围执行联合,以得到不重叠的范围,因此在这个例子中:
7,20
23,39
我已经用 Ruby 完成了这个任务,我将范围的开始和结束推入数组中并进行排序,然后执行重叠范围的并集。有没有在 Python 中快速完成此操作的方法?
假设有(7, 10)
和(11, 13)
两个数对,它们的结果是(7, 13)
:
a = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
b = []
for begin,end in sorted(a):
if b and b[-1][1] >= begin - 1:
b[-1] = (b[-1][0], end)
else:
b.append((begin, end))
b
现在是
[(7, 20), (23, 39)]
编辑:
正如 @CentAu 正确指出的那样,[(2,4), (1,6)]
将返回(1,4)
而不是(1,6)
。 这里是新版本,已经正确处理了这种情况:
a = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
b = []
for begin,end in sorted(a):
if b and b[-1][1] >= begin - 1:
b[-1][1] = max(b[-1][1], end)
else:
b.append([begin, end])
[(2,4),(1,6)]
的结果将是 [(1, 4)]
而不是 [(1, 6)]
。 - CentAub[-1][1] = max(b[-1][1], end)
。它会导致 TypeError: 'tuple' object does not support item assignment
错误。你需要使用可变列表 b.append([begin, end])
。 - tommy.carstensena = ([[7, 10], [7, 13], [11, 15], [6, 19], [6, 7], [14, 20], [23, 39], [40, 4], [200, 1] ])
,它会输出 [[6, 20], [23, 39], [200, 1]]
。难道不应该排除 [200,1]
吗? - JVK这是一个老问题,但为了以后的参考,我想要添加这个答案。 sympy可以用来实现区间的并集:
from sympy import Interval, Union
def union(data):
""" Union of a list of intervals e.g. [(1,2),(3,4)] """
intervals = [Interval(begin, end) for (begin, end) in data]
u = Union(*intervals)
return [list(u.args[:2])] if isinstance(u, Interval) \
else list(u.args)
如果Union
的输出超过两个区间,则为Union
对象,而当只有一个区间时,输出为Interval
对象。这就是在返回行中使用if语句
的原因。
示例:
In [26]: union([(10, 12), (14, 16), (15, 22)])
Out[26]: [[10, 12], [14, 22]]
In [27]: union([(10, 12), (9, 16)])
Out[27]: [[9, 16]]
def yi(li):
gen = (x for a,b in li for x in xrange(a,b+1))
start = p = gen.next()
for x in gen:
if x>p+2:
yield (start,p)
start = p = x
else:
p = x
yield (start,x)
aff
设置为True,则会显示执行步骤。def yi(li):
aff = 0
gen = (x for a,b in li for x in xrange(a,b+1))
start = p = gen.next()
for x in gen:
if aff:
print ('start %s p %d p+2 %d '
'x==%s' % (start,p,p+2,x))
if x>p+2:
if aff:
print 'yield range(%d,%d)' % (start,p+1)
yield (start,p)
start = p = x
else:
p = x
if aff:
print 'yield range(%d,%d)' % (start,x+1)
yield (start,x)
for li in ([(7,10),(23,39),(11,13),(11,15),(14,20),(45,46)],
[(7,10),(23,39),(11,13),(11,15),(14,20),(45,46),(45,45)],
[(7,10),(23,39),(11,13),(11,15),(14,20),(45,45)],
[(7,10),(23,39),(11,13),(11,6),(14,20),(45,46)],
#1 presence of (11, 6)
[(7,10),(23,39),(11,13),(-1,-5),(14,20),(45,45)],
#2 presence of (-1,-5)
[(7,10),(23,39),(11,13),(-9,-5),(14,20),(45,45)],
#3 presence of (-9, -5)
[(7,10),(23,39),(11,13),(-3,10),(14,20),(45,45)]):
#4 presence of (-3, 10)
li.sort()
print 'sorted li %s'%li
print '\n'.join(' (%d,%d) %r' % (a,b,range(a,b))
for a,b in li)
print 'list(yi(li)) %s\n' % list(yi(li))
结果
sorted li [(7, 10), (11, 13), (11, 15), (14, 20),
(23, 39), (45, 46)]
(7,10) [7, 8, 9]
(11,13) [11, 12]
(11,15) [11, 12, 13, 14]
(14,20) [14, 15, 16, 17, 18, 19]
(23,39) [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38]
(45,46) [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]
sorted li [(7, 10), (11, 13), (11, 15), (14, 20),
(23, 39), (45, 45), (45, 46)]
(7,10) [7, 8, 9]
(11,13) [11, 12]
(11,15) [11, 12, 13, 14]
(14,20) [14, 15, 16, 17, 18, 19]
(23,39) [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38]
(45,45) []
(45,46) [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]
sorted li [(7, 10), (11, 13), (11, 15), (14, 20),
(23, 39), (45, 45)]
(7,10) [7, 8, 9]
(11,13) [11, 12]
(11,15) [11, 12, 13, 14]
(14,20) [14, 15, 16, 17, 18, 19]
(23,39) [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38]
(45,45) []
list(yi(li)) [(7, 20), (23, 39), (45, 45)]
sorted li [(7, 10), (11, 6), (11, 13), (14, 20),
(23, 39), (45, 46)]
(7,10) [7, 8, 9]
(11,6) []
(11,13) [11, 12]
(14,20) [14, 15, 16, 17, 18, 19]
(23,39) [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38]
(45,46) [45]
list(yi(li)) [(7, 20), (23, 39), (45, 46)]
sorted li [(-1, -5), (7, 10), (11, 13), (14, 20),
(23, 39), (45, 45)]
(-1,-5) []
(7,10) [7, 8, 9]
(11,13) [11, 12]
(14,20) [14, 15, 16, 17, 18, 19]
(23,39) [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38]
(45,45) []
list(yi(li)) [(7, 20), (23, 39), (45, 45)]
sorted li [(-9, -5), (7, 10), (11, 13), (14, 20),
(23, 39), (45, 45)]
(-9,-5) [-9, -8, -7, -6]
(7,10) [7, 8, 9]
(11,13) [11, 12]
(14,20) [14, 15, 16, 17, 18, 19]
(23,39) [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38]
(45,45) []
list(yi(li)) [(-9, -5), (7, 20), (23, 39), (45, 45)]
sorted li [(-3, 10), (7, 10), (11, 13), (14, 20),
(23, 39), (45, 45)]
(-3,10) [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
(7,10) [7, 8, 9]
(11,13) [11, 12]
(14,20) [14, 15, 16, 17, 18, 19]
(23,39) [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38]
(45,45) []
list(yi(li)) [(-3, 20), (23, 39), (45, 45)]
def range_overlap_adjust(list_ranges):
overlap_corrected = []
for start, stop in sorted(list_ranges):
if overlap_corrected and start-1 <= overlap_corrected[-1][1] and stop >= overlap_corrected[-1][1]:
overlap_corrected[-1] = min(overlap_corrected[-1][0], start), stop
elif overlap_corrected and start <= overlap_corrected[-1][1] and stop <= overlap_corrected[-1][1]:
break
else:
overlap_corrected.append((start, stop))
return overlap_corrected
list_ranges = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
print(range_overlap_adjust(list_ranges))
# prints [(7, 20), (23, 39)]
functools.reduce
的一行代码(假设(x, 10)和(11, y)重叠):reduce(
lambda acc, el: acc[:-1:] + [(min(*acc[-1], *el), max(*acc[-1], *el))]
if acc[-1][1] >= el[0] - 1
else acc + [el],
ranges[1::],
ranges[0:1]
)
这从第一个范围开始,并使用reduce
遍历其余的范围。它将最后一个元素(acc[-1]
)与下一个范围(el
)进行比较。如果它们重叠,它将用两个范围的最小值和最大值替换最后一个元素(acc[:-1:] + [min, max]
)。如果它们不重叠,它只是将这个新范围放在列表的末尾(acc + [el]
)。
例子:
from functools import reduce
example_ranges = [(7, 10), (11, 13), (11, 15), (14, 20), (23, 39)]
def combine_overlaps(ranges):
return reduce(
lambda acc, el: acc[:-1:] + [(min(*acc[-1], *el), max(*acc[-1], *el))]
if acc[-1][1] >= el[0] - 1
else acc + [el],
ranges[1::],
ranges[0:1],
)
print(combine_overlaps(example_ranges))
输出:
[(7, 20), (23, 39)]
print(combine_overlaps([(7,10), (23,39), (9,20)]))
。这将打印出[(7, 10), (9, 39)]
而不是[(7,20), (23,39)]
。 - Stef