Loop through two arrays in Perl, removing overlaps

Question

5

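I have two sets of ranges, represented as [start, stop]. Some of the ranges overlap, meaning that the start of one range falls within the [start, stop] of another. I'd like to build a new set of ranges that has no such overlap and that also doesn't include any values not already covered by one of the ranges.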

The ranges look like this:

@starts  @ends
      5    108 
      5    187
     44    187
     44    229 
     44    236 
     64    236 
    104    236
    580    644
    632    770

The output I expect is this:

@starts  @ends
      5    236
    580    770

This is because the first seven ranges all fall within the interval 5 => 236, and the last two ranges fall within the interval 580 => 770.

Here is the code I tried:

$fix = 0;
foreach (@ends) {  
    if ($starts[$fix + 1] < $ends[$fix]) {
        splice(@ends, $fix, $fix);
        splice(@starts, $fix + 1, $fix + 1);
    } else {
        $fix += 1;
    }
}

I can print the values myself; I just need help with the merging algorithm.

- Orion

@Blender - It falls within the range [5, 236]. - Ted Hopp

2

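@Orion - Would [5, 10] and [10, 12] be merged into [5, 12]? What about [5, 10] and [11, 12]? Also, are the arrays always sorted? If so, by start or by end? (It's impossible to tell from the example you posted.) - Ted Hopp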

The arrays are sorted by start position, and yes, [5, 10] and [5, 12] would be merged into [5, 12]. - Orion

@Orion - I was asking about [5, 10] and [10, 12] (adjacent but not overlapping). - Ted Hopp
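
The thread never settles whether ranges that only share an endpoint should be merged, so here is a minimal sketch that makes the choice explicit. The helper merge_sorted and its $merge_touching flag are hypothetical names introduced for illustration, and the sketch assumes the ranges arrive sorted by start, as Orion states.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): merge ranges sorted by start.
# $merge_touching decides whether [5, 10] and [10, 12] count as overlapping.
sub merge_sorted {
    my ($merge_touching, @ranges) = @_;
    my @merged;
    for my $r (@ranges) {
        my ($start, $end) = @$r;
        # Compare against the most recently merged range only; with input
        # sorted by start, no earlier merged range can be reached.
        my $joins = @merged
            && ( $merge_touching ? $start <= $merged[-1][1]
                                 : $start <  $merged[-1][1] );
        if ($joins) {
            $merged[-1][1] = $end if $end > $merged[-1][1];
        } else {
            push @merged, [ $start, $end ];   # copy, so the input is untouched
        }
    }
    return @merged;
}

my @input = ( [5, 10], [10, 12], [20, 30] );
for my $flag (1, 0) {
    my @out = merge_sorted($flag, @input);
    print "merge_touching=$flag: ",
        join(' ', map { sprintf '[%d, %d]', @$_ } @out), "\n";
}
```

Neither setting merges [5, 10] with [11, 12]; treating integer neighbours as one range would need a comparison against the previous end plus one.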

5 Answers

1

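I think this is what you want. You have a series of ranges of the form [start, stop], and you'd like to merge the overlapping ranges. The approach below is fairly simple.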

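There are two sets of ranges: the original ranges and the merged ranges.
Add the first range to the set of merged (non-overlapping) ranges. For each remaining candidate range from the original set, make a choice:
- If the candidate overlaps a range already in the merged set, extend the boundaries of that merged range accordingly.
- If the candidate overlaps none of the ranges in the merged set, add it to the merged set.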

Hopefully this makes sense. It's not entirely clear from your question whether this is what you wanted, so let me know if it isn't right.

#!/usr/bin/perl

use strict;
use warnings;

my @starts = qw/ 5 5 44 44 44 64 104 580 632 /;
my @ends   = qw/ 108 187 187 229 236 236 236 644 770 /;

my @ranges;
while ( @starts && @ends ) {
    my $s = shift @starts;
    my $e = shift @ends;
    push @ranges, [ $s, $e ];
}

my @merged_ranges;
push @merged_ranges, shift @ranges;

foreach my $range (@ranges) {
    my $overlap = 0;
    foreach my $m_range (@merged_ranges) {
        if ( ranges_overlap($range,$m_range) ) {
            $overlap = 1;
            $m_range = merge_range($range,$m_range);
        }
    }
    if ( !$overlap ) {
        push @merged_ranges, $range;
    }
}

print join ' ', qw/ start end /;
print "\n";
foreach my $range (@merged_ranges) {
    print join ' ', ( $range->[0], $range->[1] );
    print "\n";
}

sub ranges_overlap {
    my $r1 = shift;
    my $r2 = shift;

    return ( $r1->[0] <= $r2->[1] && $r2->[0] <= $r1->[1] );
}

sub merge_range {
    my $r1 = shift;
    my $r2 = shift;
    use List::Util qw/ min max/;

    my $merged = [ min($r1->[0],$r2->[0]), max($r1->[1],$r2->[1]) ];
    return $merged;
}

- James Thompson

How would I get my data into a single array? Right now it is in two separate arrays, one holding the starts and the other the ends. - Orion

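I just added some code to do that. In Perl, square brackets create an array reference, so a range is represented by a pair of values [$start, $end]. You can index into an array reference with the $ref->[0] and $ref->[1] syntax. If that doesn't make sense, search for array references or multi-dimensional arrays in Perl. - James Thompson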
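
For readers new to array references, here is a minimal sketch of pairing two parallel arrays into [start, end] references and reading them back. The three sample values are taken from the question; the names @ranges and $first are only illustrative.

```perl
use strict;
use warnings;

my @starts = (5, 44, 580);
my @ends   = (108, 236, 644);

# Pair each start with the end at the same index; [ ... ] builds an
# anonymous array and returns a reference to it.
my @ranges = map { [ $starts[$_], $ends[$_] ] } 0 .. $#starts;

# Index into a reference with the arrow syntax.
my $first = $ranges[0];
printf "first range: %d to %d\n", $first->[0], $first->[1];

# Or unpack both values at once by dereferencing with @$ref.
for my $range (@ranges) {
    my ($start, $end) = @$range;
    print "$start => $end\n";
}
```

Unlike the shift-based loop in the answer, this version leaves @starts and @ends intact.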

1

ranges_overlap should just be: $r1->[0] <= $r2->[1] && $r2->[0] <= $r1->[1]. You were incorrectly returning false for cases such as ([1,5],[3,4]). - ysth

@ysth - Thanks for catching that. I've fixed the code to reflect your bug fix. - James Thompson
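
The overlap test ysth describes can be checked against a few small cases, including the containment case that used to fail. This is only a sketch; the test pairs other than ([1,5],[3,4]) are made up for illustration.

```perl
use strict;
use warnings;

# Closed intervals overlap when each one starts no later than the other ends.
sub ranges_overlap {
    my ($r1, $r2) = @_;
    return $r1->[0] <= $r2->[1] && $r2->[0] <= $r1->[1];
}

my @cases = (
    [ [1, 5], [3, 4] ],   # one range contains the other
    [ [3, 4], [1, 5] ],   # same pair, arguments swapped
    [ [1, 5], [5, 9] ],   # shared endpoint counts as overlap with <=
    [ [1, 5], [6, 9] ],   # disjoint
);

for my $case (@cases) {
    my ($r1, $r2) = @$case;
    my $verdict = ranges_overlap($r1, $r2) ? 'overlap' : 'no overlap';
    printf "[%d, %d] vs [%d, %d]: %s\n", @$r1, @$r2, $verdict;
}
```

Because the test is symmetric, the order in which the two ranges are passed does not matter.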

1

Since the arrays are sorted by start position, the simplest approach is to work from the end:

# this assumes at least one element in @starts, @ends
my $n = $#starts;
for (my $i = $#starts - 1; $i >= 0; $i--) {
    if ($ends[$i] < $starts[$n]) {
        # new interval
        $n--;
        ($starts[$n], $ends[$n]) = ($starts[$i], $ends[$i]);
    } else {
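        # merge intervals - first absorb every kept interval that this one reaches
        while ($n < $#starts && $ends[$i] >= $starts[$n+1]) {
            $n++;
        }
        $starts[$n] = $starts[$i];
        # keep the larger end in case this interval extends past the kept one
        $ends[$n] = $ends[$i] if $ends[$i] > $ends[$n];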
    }
}
@starts = @starts[$n..$#starts];
@ends   = @ends[$n..$#ends];

- Ted Hopp

0

How about this?

#!perl

use strict;
use warnings;

my @starts = qw(5   5   44  44  44  64  104 580 632);
my @ends =   qw(108 187 187 229 236 236 236 644 770);

my @starts_new;
my @ends_new;

if ((scalar @starts) != (scalar @ends)) {
    die "Arrays are not of equal length!\n";
}

my %ranges;
my $next_i = 0;
for (my $i=0; $i <= $#starts; $i=$next_i) {
    # If nothing changes below, the next array item we'll visit is the next sequential one.
    $next_i = $i + 1;

    # Init some temp stuff.
    my $start = $starts[$i]; # this one shouldn't change during this "for $i" loop
    my $end = $ends[$i];
    for (my $j=$i+1; $j <= $#ends; $j++) {
        if ($starts[$j] <= $end) {
            # This item further down the @starts array is actually less than
            # (or equal to) the current $end.
            # So, we need to "skip" this item in @starts and update
            # $end to reflect the corresponding entry in @ends.
            $next_i = $j +1;
            $end = $ends[$j] if ($ends[$j] > $end);
        }
    }
    # We have a valid start/end pair.
    push (@starts_new, $start);
    push (@ends_new, $end);
}

for (my $i=0; $i <= $#starts_new; $i++) {
    print "$starts_new[$i], $ends_new[$i]\n";
}

- jimtut

0

I'm not very good at Perl, but the following pseudocode solution should be easy to adapt:

for(i=0; i<N;){
    //we know that the next merged interval starts here:
    start = starts[i]
    end   = ends[i]

    for(i=i+1; i < N && starts[i] < end; i++){  //perhaps you want <= ?
        end = maximum(end, ends[i]);
    }

    add (start, end) to merged array
}
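
One possible Perl adaptation of the pseudocode above, written with while loops so the shared index is easier to follow. This is a sketch rather than the answerer's own code: the names @merged_starts and @merged_ends are made up here, and it assumes the <= comparison (touching ranges merge), which the pseudocode leaves open.

```perl
use strict;
use warnings;

my @starts = qw(5 5 44 44 44 64 104 580 632);
my @ends   = qw(108 187 187 229 236 236 236 644 770);

my (@merged_starts, @merged_ends);
my $i = 0;
while ($i < @starts) {
    # The next merged interval begins at the current position.
    my ($start, $end) = ($starts[$i], $ends[$i]);
    $i++;

    # Absorb the following ranges that begin at or before the running end
    # (assumption: <= merges touching ranges; use < to keep them apart).
    while ($i < @starts && $starts[$i] <= $end) {
        $end = $ends[$i] if $ends[$i] > $end;
        $i++;
    }

    push @merged_starts, $start;
    push @merged_ends,   $end;
}

print "$merged_starts[$_] $merged_ends[$_]\n" for 0 .. $#merged_starts;
```

Each input range is visited once, because the inner loop advances the same index the outer loop uses.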

- hugomg

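This will create extra results. You'll get one result for every pass of the outer FOR loop, so with this test data it produces 9 results instead of the desired 2. I think my algorithm avoids that, at least for this test data. - jimtut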

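Maybe I should have used a while loop to make it clearer - the inner loop also updates the i variable, so everything works out in the end. Looking closely, it's as if I had just dropped your next_i. - hugomg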

Yes, I can see that now. Thanks! - jimtut

- Chad A. Davis · Accepted Answer

This edits the arrays in place, simply collapsing the boundaries when they overlap.

# Since they're sorted by @starts, accept the 0th interval, start at 1
for (1..$#starts) {
    # extra check on array bounds, since we edit in-place
    last unless $_ < @starts;
    # don't need to collapse if no overlap with previous end
    next unless $starts[$_] <= $ends[$_-1];
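    # delete this start and the smaller of the two ends, so the merged
    # range keeps the larger end even when @ends is not sorted
    my $drop = $ends[$_] > $ends[$_-1] ? $_-1 : $_;
    splice(@starts,$_,1);
    splice(@ends,$drop,1);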
    # rerun this loop for the same value of $_ since it was deleted
    redo;
}