未排序数组中最长连续序列

18

给定一个数字数组,它们是未排序/随机顺序的。你需要找到数组中最长的连续数字序列。注意,序列不需要在数组内按排序顺序排列。这里是一个例子:

输入:

A[] = {10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101}  

输出结果是:

{16,17,18,19,20,21,22}  

这个解决方案需要O(n)的复杂度。

我被告知解决方案涉及使用哈希表,我发现一些实现使用了2个哈希表。无法使用排序解决,因为排序需要O(nlgn)的时间复杂度,这不是期望得到的。


2
“最长连续数字序列” - 这将是整个列表。 - Thorbjørn Ravn Andersen
示例 {16, 17,18,19,20,21,22} 的输出不是吗? - Petar Ivanov
这些是排序算法,需要知道数组中元素的起始和结束范围,这是一个很大的限制。如果我的问题不清楚,元素集合将不会给程序员任何关于元素起始/结束范围的先前信息。 - Anoop Menon
2
基数排序只需要元素大小恒定即可获得O(n)的时间复杂度,而它们确实是恒定的。 - harold
@Saeed Amiri - 谢谢。我以后会注意的 :) - Anoop Menon
显示剩余8条评论
8个回答

14
您可以拥有两个表格:
起始表格:(起始点,长度)
结束表格:(结束点,长度)
添加新项时,您需要检查:
1. 起始表格中是否存在值+1?如果是,则删除它,并创建一个新的条目(value,length+1),其中length是“当前”长度。您还需要更新结束表格,使其具有相同的结束点但更大的长度。
2. 结束表格中是否存在值-1?如果是,则删除它,并创建一个新的条目(value,length+1),这次要更新起始表格(起始位置将保持不变,但长度将增加)。
如果两个条件都成立,则实际上您正在将两个现有序列缝合在一起 - 用表示单个较长序列的两个新条目替换四个现有条目。
如果两个条件都不成立,则只需在两个表格中创建长度为1的新条目。
在添加完所有值后,您只需遍历起始表格以找到具有最大值的键。
我认为这样做是可行的,如果我们假设哈希查找/添加/删除是O(1),那么时间复杂度是O(n)。
C#实现如下。花了一点时间才弄对,但我认为它可以工作 :)
using System;
using System.Collections.Generic;

class Test
{
    static void Main(string[] args)
    {
        int[] input = {10,21,45,22,7,2,67,19,13,45,12,
                11,18,16,17,100,201,20,101};

        Dictionary<int, int> starts = new Dictionary<int, int>();
        Dictionary<int, int> ends = new Dictionary<int, int>();

        foreach (var value in input)
        {
            int startLength;
            int endLength;
            bool extendsStart = starts.TryGetValue(value + 1,
                                                   out startLength);
            bool extendsEnd = ends.TryGetValue(value - 1,
                                               out endLength);

            // Stitch together two sequences
            if (extendsStart && extendsEnd)
            {
                ends.Remove(value + 1);
                starts.Remove(value - 1);
                int start = value - endLength;
                int newLength = startLength + endLength + 1;
                starts[start] = newLength;
                ends[start + newLength - 1] = newLength;
            }
            // Value just comes before an existing sequence
            else if (extendsStart)
            {
                int newLength = startLength + 1;
                starts[value] = newLength;
                ends[value + newLength - 1] = newLength;
                starts.Remove(value + 1);
            }
            else if (extendsEnd)
            {
                int newLength = endLength + 1;
                starts[value - newLength + 1] = newLength;
                ends[value] = newLength;
                ends.Remove(value - 1);
            }
            else
            {
                starts[value] = 1;
                ends[value] = 1;
            }
        }

        // Just for diagnostics - could actually pick the longest
        // in O(n)
        foreach (var sequence in starts)
        {
            Console.WriteLine("Start: {0}; Length: {1}",
                              sequence.Key, sequence.Value);
        }
    }
}

编辑:这里也有用C#实现的单个哈希集合答案 - 我同意,它比上面的答案更简单,但我会留下我的原始答案给后人。
using System;
using System.Collections.Generic;
using System.Linq;

class Test
{
    static void Main(string[] args)
    {
        int[] input = {10,21,45,22,7,2,67,19,13,45,12,
                11,18,16,17,100,201,20,101};

        HashSet<int> values = new HashSet<int>(input);

        int bestLength = 0;
        int bestStart = 0;
        // Can't use foreach as we're modifying it in-place
        while (values.Count > 0)
        {
            int value = values.First();
            values.Remove(value);
            int start = value;
            while (values.Remove(start - 1))
            {
                start--;
            }
            int end = value;
            while (values.Remove(end + 1))
            {
                end++;
            }

            int length = end - start + 1;
            if (length > bestLength)
            {
                bestLength = length;
                bestStart = start;
            }
        }
        Console.WriteLine("Best sequence starts at {0}; length {1}",
                          bestStart, bestLength);
    }
}

@Jon - 现在你提到的拼接操作是指删除4个条目并用2个替换它。假设在你对20的开始和结束表格完成后,对于DaveBall来说,这意味着{end = 19,length = 5}和{start = 20,length = 3}。那么拼接操作的成本是多少?这可能是O(n + k),其中k与原始数组中连续元素的数量成比例,对吧? - Anoop Menon
@rrenaud:我不明白这是如何工作的,因为哈希集是无序的。但如果有证据证明我错了,我会很高兴等待你的答案 :) - Jon Skeet
1
我在答案中发布了我的解决方案和可工作的代码。 - Rob Neuhaus
@Jon - 我并不是在质疑你的解决方案是否有效,只是想澄清我的疑惑。实际上,我应该注意到你正在使用'TryGetValue',这不会改变endLength和startLength在函数调用之间的值 - 这是我的错误 :) 是的,我注意到rrenaud坚持认为一个哈希就足够了。 - Anoop Menon
@Simply Aj:TryGetValue总是会设置endLength和startLength,因为我正在使用它们作为输出参数-但在每次迭代中,我只进入四个块中的一个,只使用已正确设置的值。 - Jon Skeet
显示剩余15条评论

7

将所有内容转储到一个哈希集中。

现在遍历哈希集。对于每个元素,查找当前值相邻的所有值。在删除找到的元素的同时,跟踪您可以找到的最大序列。保存计数以进行比较。

重复此过程,直到哈希集为空为止。

假设查找、插入和删除都是O(1)时间,这个算法的时间复杂度将为O(N)。

伪代码:

 int start, end, max
 int temp_start, temp_end, count

 hashset numbers

 for element in array:
     numbers.add(element)

 while !numbers.empty():
     number = numbers[0]
     count = 1
     temp_start, temp_end = number 

     while numbers.contains(number - 1):
         temp_start = number - 1; count++
         numbers.remove(number - 1)

     while numbers.contains(number + 1):
         temp_end = number + 1; count++
         numbers.remove(number + 1)

     if max < count:
         max = count
         start = temp_start; end = temp_end

 max_range = range(start, end)

嵌套的while看起来不太美观,但每个数字只能使用一次,因此时间复杂度应该为O(N)。

非常整洁,除了一个我遗漏的部分。如果我在序列中继续删除“n-1”元素,因为我已经在哈希表中找到了“n”,那么我该如何找到序列的起始位置呢? - Anoop Menon
@Anoop,我添加了伪代码,希望现在更清晰了。 - Henry Z
哈希表的第一个添加是O(n)。现在,对于哈希表中的每个数字,您都要向减量方向迭代“x”次,并向增量方向迭代“y”次。在您的解决方案中,while内嵌while几乎是不可避免的,这使得它> O(n),尽管常数k与原始集合中存在的连续元素数量成比例。然而,Jon的解决方案中,“k”的限制似乎受到“start”和“end”表操作均计算为true的影响。 - Anoop Menon
1
你需要更新擦除循环中的数字。 - Rob Neuhaus
@renaud - 如果你指的是count++在循环内部执行,那么是这样的。 - Anoop Menon
显示剩余2条评论

5

这里有一个使用单个哈希集合的Python解决方案,不需要进行任何复杂的区间合并。

def destruct_directed_run(num_set, start, direction):
  while start in num_set:
    num_set.remove(start)
    start += direction
  return start

def destruct_single_run(num_set):
  arbitrary_member = iter(num_set).next()
  bottom = destruct_directed_run(num_set, arbitrary_member, -1) 
  top = destruct_directed_run(num_set, arbitrary_member + 1, 1)
  return range(bottom + 1, top)

def max_run(data_set):
  nums = set(data_set)
  best_run = []
  while nums:
    cur_run = destruct_single_run(nums)
    if len(cur_run) > len(best_run):
      best_run = cur_run
  return best_run

def test_max_run(data_set, expected):
  actual = max_run(data_set)
  print data_set, actual, expected, 'Pass' if expected == actual else 'Fail'

print test_max_run([10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101], range(16, 23))
print test_max_run([1,2,3], range(1, 4))
print max_run([1,3,5]), 'any singleton output fine'

我在理解Python代码和您选择传递的range()参数方面遇到了一些困难。 - Anoop Menon
@AnoopMenon test_max_run 是一个单元测试,范围是数据集应返回的 max_run 的预期范围。 - hgf

2
另一种解决方案是使用哈希搜索,其运行时间为O(n)。
int maxCount = 0;
for (i = 0; i<N; i++) 
{ 
    // Search whether a[i] - 1 is present in the list.If it is present, 
    // you don't need to initiate count since it  will be counted when 
    // (a[i] - 1) is traversed.
    if (hash_search(a[i]-1))
        continue;

    // Now keep checking if a[i]++ is present in the list, increment the count
    num = a[i]; 
    while (hash_search(++num)) 
        count++;

    // Now check if this count is greater than the max_count got previously 
    // and update if it is
    if (count > maxCount)
    {
        maxIndex = i;
        count = maxCount;
    }
}

1

这里是实现:

static int[] F(int[] A)
{
    Dictionary<int, int> low = new Dictionary<int, int>();
    Dictionary<int, int> high = new Dictionary<int, int>();

    foreach (int a in A)
    {
        int lowLength, highLength;

        bool lowIn = low.TryGetValue(a + 1, out lowLength);
        bool highIn = high.TryGetValue(a - 1, out highLength);

        if (lowIn)
        {
            if (highIn)
            {
                low.Remove(a + 1);
                high.Remove(a - 1);
                low[a - highLength] = high[a + lowLength] = lowLength + highLength + 1;
            }
            else
            {
                low.Remove(a + 1);
                low[a] = high[a + lowLength] = lowLength + 1;
            }
        }
        else
        {
            if (highIn)
            {
                high.Remove(a - 1);
                high[a] = low[a - highLength] = highLength + 1;
            }
            else
            {
                high[a] = low[a] = 1;
            }
        }
    }

    int maxLow = 0, maxLength = 0;
    foreach (var pair in low)
    {
        if (pair.Value > maxLength)
        {
            maxLength = pair.Value;
            maxLow = pair.Key;
        }
    }

    int[] ret = new int[maxLength];
    for (int i = 0; i < maxLength; i++)
    {
        ret[i] = maxLow + i;
    }

    return ret;
}

谢谢!不错的解决方案,但是使用单个哈希桶的解决方案相对更好 :) 请查看Jon和rrenaud的解决方案。再次感谢Peter。 - Anoop Menon

1
class Solution {
public:
    struct Node{
        int lower;
        int higher;
        Node(int l, int h):lower(l),higher(h){

    }
};
int longestConsecutive(vector<int> &num) {
    // Start typing your C/C++ solution below
    // DO NOT write int main() function

    map<int,Node> interval_map;
    map<int,Node>::iterator curr_iter,inc_iter,des_iter;

    //first collect
    int curr = 0;
    int max = -1;
    for(size_t i = 0; i < num.size(); i++){
        curr = num[i];
        curr_iter = interval_map.find(curr);
        if (curr_iter == interval_map.end()){
            interval_map.insert(make_pair(curr,Node(curr,curr)));
        }
    } 
    //the next collect    
    for(curr_iter = interval_map.begin(); curr_iter != interval_map.end(); curr_iter++)
    {
        int lower = curr_iter->second.lower;
        int higher = curr_iter->second.higher;
        int newlower = lower, newhigher = higher;

        des_iter = interval_map.find(lower - 1);
        if (des_iter != interval_map.end())
        {
            curr_iter->second.lower = des_iter->second.lower;
            newlower = des_iter->second.lower;
        }

        inc_iter = interval_map.find(higher + 1);
        if (inc_iter != interval_map.end()){
            curr_iter->second.higher = inc_iter->second.higher;
            newhigher = inc_iter->second.higher;
        }

        if (des_iter != interval_map.end()){
            des_iter->second.higher = newhigher;
        }
        if (inc_iter != interval_map.end()){
            inc_iter->second.lower = newlower;
        }
        if (curr_iter->second.higher - curr_iter->second.lower + 1> max){
             max = curr_iter->second.higher - curr_iter->second.lower + 1;
         }
    }   
    return max;
}
};

感谢您发布答案!虽然代码片段可以回答问题,但添加一些额外的信息,如解释等,仍然很棒。 - j0k

0

这是 Grigor Gevorgyan 的解决方案,来自于这个问题的一个副本,但我认为它更简化了:

data = [1,3,5,7,4,6,10,3]

# other_sides[x] == other end of interval starting at x
# unknown values for any point not the end of an interval
other_sides = {}
# set eliminates duplicates, and is assumed to be an O(n) operation
for element in set(data):
    # my intervals left hand side will be the left hand side
    # of an interval ending just before this element
    try:
        left = other_sides[element - 1]
    except KeyError:
        left = element

    # my intervals right hand side will be the right hand side
    # of the interval starting just after me
    try:
        right = other_sides[element + 1]
    except KeyError:
        right = element

    # satisfy the invariants
    other_sides[left] = right
    other_sides[right] = left

# convert the dictionary to start, stop segments
# each segment is recorded twice, so only keep half of them
segments = [(start, stop) for start, stop in other_sides.items() if start <= stop]
# find the longest one
print max(segments, key = lambda segment: segment[1] - segment[0])

0

这是基于Grigor Gevorgyan的答案编写的Python代码,针对类似问题,我认为这是一个非常优雅的解决方案。

l = [10,21,45,22,7,2,67,19,13,45,12,11,18,16,17,100,201,20,101]
d = {x:None for x in l}
print d
for (k, v) in d.iteritems():
    if v is not None: continue
    a, b = d.get(k - 1), d.get(k + 1)
    if a is not None and b is not None: d[k], d[a], d[b] = k, b, a
    elif a is not None: d[a], d[k] = k, a
    elif b is not None: d[b], d[k] = k, b
    else: d[k] = k
    print d

m = max(d, key=lambda x: d[x] - x)
print m, d[m]

输出:

{2: 2, 67: None, 100: None, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: None, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 100, 101: None, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: None, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: None, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: None, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 10, 11: None, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 11, 11: 10, 12: None, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 12, 11: 10, 12: 10, 45: None, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 12, 11: 10, 12: 10, 45: 45, 13: None, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: None, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 16, 17: None, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 17, 17: 16, 18: None, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 18, 17: 16, 18: 16, 19: None, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 19, 17: 16, 18: 16, 19: 16, 20: None, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 20, 17: 16, 18: 16, 19: 16, 20: 16, 21: None, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 21, 17: 16, 18: 16, 19: 16, 20: 16, 21: 16, 22: None}
{2: 2, 67: 67, 100: 101, 101: 100, 7: 7, 201: 201, 10: 13, 11: 10, 12: 10, 45: 45, 13: 10, 16: 22, 17: 16, 18: 16, 19: 16, 20: 16, 21: 16, 22: 16}
16 22

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接