从给定的列表中检测至少三个连续数字的序列

25

我有一个数字列表,例如21,4,7,9,12,22,17,8,2,20,23。

我想要挑出连续数字序列(长度至少为3项),因此从上面的例子中,答案应该是7,8,9和20,21,22,23。

我已经尝试了一些复杂且冗长的函数,但我想知道是否有一种简洁的LINQ-ish方式来实现它。

有什么建议吗?

更新:

非常感谢所有回复,十分感激。我目前正在测试它们,看哪个最适合我们的项目。


该列表允许有重复数字吗? - Ryan Lundy
@Kyralessa 不,该列表永远不会包含重复项。 - David
12个回答

27

我认为你应该首先对列表进行排序。之后,只需要遍历它,记住当前序列的长度并检测何时它已结束。说实话,我 怀疑 一个简单的foreach循环将是最简单的方法 - 我不能立即想到任何非常简洁的类似于LINQ的方法。如果你真的想这样做,可以在迭代器块中完成,但请记住,首先对列表进行排序意味着你已经有了相当“前期”的成本。所以我的解决方案会像这样:

var ordered = list.OrderBy(x => x);
int count = 0;
int firstItem = 0; // Irrelevant to start with
foreach (int x in ordered)
{
    // First value in the ordered list: start of a sequence
    if (count == 0)
    {
        firstItem = x;
        count = 1;
    }
    // Skip duplicate values
    else if (x == firstItem + count - 1)
    {
        // No need to do anything
    }
    // New value contributes to sequence
    else if (x == firstItem + count)
    {
        count++;
    }
    // End of one sequence, start of another
    else
    {
        if (count >= 3)
        {
            Console.WriteLine("Found sequence of length {0} starting at {1}",
                              count, firstItem);
        }
        count = 1;
        firstItem = x;
    }
}
if (count >= 3)
{
    Console.WriteLine("Found sequence of length {0} starting at {1}",
                      count, firstItem);
}

编辑:好吧,我刚想到一种更符合LINQ思路的方法。我现在没有时间完全实现它,但是:

  • 对序列进行排序
  • 使用类似于SelectWithPrevious(可能更好地命名为SelectConsecutive)的东西来获取连续的元素对
  • 使用包含索引的Select重载以获取元组(索引、当前元素、前一个元素)
  • 过滤掉任何其中(current=previous + 1)的项目,从而获得任何计数为序列开始的位置(特殊情况下的索引=0)
  • 在结果上使用SelectWithPrevious,以获取两个起始点之间序列的长度(从前面减去一个索引)
  • 过滤掉长度小于3的任何序列

猜测你需要在有序序列上连接int.MinValue,以确保最后一个项目被正确使用。

编辑:好吧,我已经实现了这个方法。这大概是我能想到的最LINQ的方式...我使用空值作为“哨兵”值强制启动和结束序列-有关详细信息,请参见注释。

总的来说,我不建议使用这个解决方案。它很难理解,虽然我相当有信心它是正确的,但我还是花了一段时间思考可能出现的偏移错误等问题。这是进入LINQ可以做什么的有趣之旅...以及您可能不应该做什么的例子。

哦,还要注意,我将“长度至少为3”部分上推到了调用者 - 当你有这样的元组序列时,单独过滤它更清洁,我认为。

using System;
using System.Collections.Generic;
using System.Linq;

static class Extensions
{
    public static IEnumerable<TResult> SelectConsecutive<TSource, TResult>
        (this IEnumerable<TSource> source,
         Func<TSource, TSource, TResult> selector)
    {
        using (IEnumerator<TSource> iterator = source.GetEnumerator())
        {
           if (!iterator.MoveNext())
           {
               yield break;
           }
           TSource prev = iterator.Current;
           while (iterator.MoveNext())
           {
               TSource current = iterator.Current;
               yield return selector(prev, current);
               prev = current;
           }
        }
    }
}

class Test
{
    static void Main()
    {
        var list = new List<int> {  21,4,7,9,12,22,17,8,2,20,23 };

        foreach (var sequence in FindSequences(list).Where(x => x.Item1 >= 3))
        {
            Console.WriteLine("Found sequence of length {0} starting at {1}",
                              sequence.Item1, sequence.Item2);
        }
    }

    private static readonly int?[] End = { null };

    // Each tuple in the returned sequence is (length, first element)
    public static IEnumerable<Tuple<int, int>> FindSequences
         (IEnumerable<int> input)
    {
        // Use null values at the start and end of the ordered sequence
        // so that the first pair always starts a new sequence starting
        // with the lowest actual element, and the final pair always
        // starts a new one starting with null. That "sequence at the end"
        // is used to compute the length of the *real* final element.
        return End.Concat(input.OrderBy(x => x)
                               .Select(x => (int?) x))
                  .Concat(End)
                  // Work out consecutive pairs of items
                  .SelectConsecutive((x, y) => Tuple.Create(x, y))
                  // Remove duplicates
                  .Where(z => z.Item1 != z.Item2)
                  // Keep the index so we can tell sequence length
                  .Select((z, index) => new { z, index })
                  // Find sequence starting points
                  .Where(both => both.z.Item2 != both.z.Item1 + 1)
                  .SelectConsecutive((start1, start2) => 
                       Tuple.Create(start2.index - start1.index, 
                                    start1.z.Item2.Value));
    }
}

又一个 bug:它不会找到以排序列表开头开始的任何序列,除非它以 0 开头,例如 [5, 1, 2, 3] 不会找到 1,2,3 - Timwi
@Timwi:谢谢,已经修复了。我在实现过程中改变了一半的轨迹,忘记了初始值:( - Jon Skeet
1
仍然存在第一个错误,并且也没有重叠序列(我的有)。 - Timwi
@Timwi:通过调用Distinct()方法进行修复。 - Jon Skeet
@Timwi:现在已经修复,而且没有调用Distinct(),只是通过检测“与之前相同的值”来解决问题。 - Jon Skeet
显示剩余9条评论

14

Jon Skeet和Timwi的解决方案是正确的选择。

为了好玩,这里有一个执行此任务的LINQ查询(非常低效):

var sequences = input.Distinct()
                     .GroupBy(num => Enumerable.Range(num, int.MaxValue - num + 1)
                                               .TakeWhile(input.Contains)
                                               .Last())  //use the last member of the consecutive sequence as the key
                     .Where(seq => seq.Count() >= 3)
                     .Select(seq => seq.OrderBy(num => num)); // not necessary unless ordering is desirable inside each sequence.
查询的性能可以通过将输入加载到 HashSet 中(以改善Contains)稍微提高,但这仍然不能产生任何接近高效的解决方案。
我所知道的唯一错误是,如果序列包含大量负数(我们无法表示Rangecount参数),可能会出现算术溢出。这可以通过自定义的 static IEnumerable<int> To(this int start, int end) 扩展方法轻松修复。如果有人能想出其他简单的规避溢出的技巧,请让我知道。
编辑:以下是一个略微冗长(但同样低效)的变体,没有溢出问题。
var sequences = input.GroupBy(num => input.Where(candidate => candidate >= num)
                                          .OrderBy(candidate => candidate)
                                          .TakeWhile((candidate, index) => candidate == num + index)
                                          .Last())
                     .Where(seq => seq.Count() >= 3)
                     .Select(seq => seq.OrderBy(num => num));

1
关于负数,即使是-1也会因为溢出而失败。实际上,由于调用了Enumerable.Range(num, int.MaxValue - num + 1),所以0也会失败。 - configurator
1
为什么不在那里使用int.MaxValue呢? - configurator
@configurator:你关于负值的观点是正确的。已经进行了编辑。至于在count中使用int.MaxValue,这是不可能的,因为它会无法处理正数。还有其他建议吗?我认为Enumerable.Range将始终存在此错误,因为无法将Int32的范围表示为Int32。 - Ani

4

我认为我的解决方案更加优雅和简洁,因此更容易验证其正确性:

/// <summary>Returns a collection containing all consecutive sequences of
/// integers in the input collection.</summary>
/// <param name="input">The collection of integers in which to find
/// consecutive sequences.</param>
/// <param name="minLength">Minimum length that a sequence should have
/// to be returned.</param>
static IEnumerable<IEnumerable<int>> ConsecutiveSequences(
    IEnumerable<int> input, int minLength = 1)
{
    var results = new List<List<int>>();
    foreach (var i in input.OrderBy(x => x))
    {
        var existing = results.FirstOrDefault(lst => lst.Last() + 1 == i);
        if (existing == null)
            results.Add(new List<int> { i });
        else
            existing.Add(i);
    }
    return minLength <= 1 ? results :
        results.Where(lst => lst.Count >= minLength);
}

与其他解决方案相比的优点:

  • 它可以找到重叠的序列。
  • 它是可重复使用和有文档记录的。
  • 我没有发现任何错误 ;-)

你能举个“重叠序列”的例子吗?你是指一个值出现多次的情况吗? - Jon Skeet
好的,我现在明白了,但是我仍然觉得它不像我的那么简单。 - Jon Skeet
出于兴趣,我觉得在这种情况下你实际上并不需要转换为数组 - 除非我漏掉了什么,因为你只是用 i 来表示 arr[i] ... 所以你能否只使用 foreach 循环?在我看来,这肯定会简化事情。 - Jon Skeet
@Timwi:我很想听听你对我在答案的后半部分使用的“LINQy”方法的想法。我不认为它实际上是好的,但它很有趣。 - Jon Skeet
你为什么选择不使用一个迭代器方法,而是用一个保留了List<T>本地变量并持续yield的方法呢? - Ani
显示剩余2条评论

2
以下是以“LINQish”方式解决问题的方法:

以下是以“LINQish”方式解决问题的方法:

int[] arr = new int[]{ 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
IOrderedEnumerable<int> sorted = arr.OrderBy(x => x);
int cnt = sorted.Count();
int[] sortedArr = sorted.ToArray();
IEnumerable<int> selected = sortedArr.Where((x, idx) =>
    idx <= cnt - 3 && sortedArr[idx + 1] == x + 1 && sortedArr[idx + 2] == x + 2);
IEnumerable<int> result = selected.SelectMany(x => new int[] { x, x + 1, x + 2 }).Distinct();

Console.WriteLine(string.Join(",", result.Select(x=>x.ToString()).ToArray()));

由于需要进行数组复制和重构,这种解决方案当然比传统的循环解决方案效率低。

与 Jon 相同的 Bug:给定输入 [1, 2, 3, 2],它无法找到 1, 2, 3。在你的情况下,修复方法更简单:只需在 .OrderBy() 后添加 .Distinct()(并将 IOrderedEnumerable 更改为 IEnumerable,甚至更好的是,将所有内容都更改为 var)。 - Timwi
嗯,这种方法的另一个问题是,您只会在最后得到一个列表,而不是一个正确划分的序列集。 - Timwi

1

虽然不是100%的Linq,但这是一个通用的变体:

static IEnumerable<IEnumerable<TItem>> GetSequences<TItem>(
        int minSequenceLength, 
        Func<TItem, TItem, bool> areSequential, 
        IEnumerable<TItem> items)
    where TItem : IComparable<TItem>
{
    items = items
        .OrderBy(n => n)
        .Distinct().ToArray();

    var lastSelected = default(TItem);

    var sequences =
        from startItem in items
        where startItem.Equals(items.First())
            || startItem.CompareTo(lastSelected) > 0
        let sequence =
            from item in items
            where item.Equals(startItem) || areSequential(lastSelected, item)
            select (lastSelected = item)
        where sequence.Count() >= minSequenceLength
        select sequence;

    return sequences;
}

static void UsageInt()
{
    var sequences = GetSequences(
            3,
            (a, b) => a + 1 == b,
            new[] { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 });

    foreach (var sequence in sequences)
        Console.WriteLine(string.Join(", ", sequence.ToArray()));
}

static void UsageChar()
{
    var list = new List<char>(
        "abcdefghijklmnopqrstuvwxyz".ToCharArray());

    var sequences = GetSequences(
            3,
            (a, b) => (list.IndexOf(a) + 1 == list.IndexOf(b)),
            "PleaseBeGentleWithMe".ToLower().ToCharArray());

    foreach (var sequence in sequences)
        Console.WriteLine(string.Join(", ", sequence.ToArray()));
}

1

What about sorting the array then create another array that is the difference between each element the previous one


sortedArray = 8, 9, 10, 21, 22, 23, 24, 27, 30, 31, 32
diffArray   =    1,  1, 11,  1,  1,  1,  3,  3,  1,  1
Now iterate through the difference array; if the difference equlas 1, increase the count of a variable, sequenceLength, by 1. If the difference is > 1, check the sequenceLength if it is >=2 then you have a sequence of at at least 3 consecutive elements. Then reset sequenceLenght to 0 and continue your loop on the difference array.


1

这是我的尝试:

public static class SequenceDetector
{
    public static IEnumerable<IEnumerable<T>> DetectSequenceWhere<T>(this IEnumerable<T> sequence, Func<T, T, bool> inSequenceSelector)
    {
        List<T> subsequence = null;
        // We can only have a sequence with 2 or more items
        T last = sequence.FirstOrDefault();
        foreach (var item in sequence.Skip(1))
        {
            if (inSequenceSelector(last, item))
            {
                // These form part of a sequence
                if (subsequence == null)
                {
                    subsequence = new List<T>();
                    subsequence.Add(last);
                }
                subsequence.Add(item);
            }
            else if (subsequence != null)
            {
                // We have a previous seq to return
                yield return subsequence;
                subsequence = null;
            }
            last = item;
        }
        if (subsequence != null)
        {
            // Return any trailing seq
            yield return subsequence;
        }
    }
}

public class test
{
    public static void run()
    {
        var list = new List<int> { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
        foreach (var subsequence in list
            .OrderBy(i => i)
            .Distinct()
            .DetectSequenceWhere((first, second) => first + 1 == second)
            .Where(seq => seq.Count() >= 3))
        {
            Console.WriteLine("Found subsequence {0}", 
                string.Join(", ", subsequence.Select(i => i.ToString()).ToArray()));
        }
    }
}

这将返回形成子序列的特定项,并允许任何类型的项和任何标准的定义,只要可以通过比较相邻项来确定。


0

这里是我用F#想出来的一个解决方案,将其转换成C# LINQ查询应该很容易,因为fold基本上等同于LINQ的aggregate运算符。

#light

let nums = [21;4;7;9;12;22;17;8;2;20;23]

let scanFunc (mainSeqLength, mainCounter, lastNum:int, subSequenceCounter:int, subSequence:'a list, foundSequences:'a list list) (num:'a) =
        (mainSeqLength, mainCounter + 1,
         num,
         (if num <> lastNum + 1 then 1 else subSequenceCounter+1),
         (if num <> lastNum + 1 then [num] else subSequence@[num]),
         if subSequenceCounter >= 3 then
            if mainSeqLength = mainCounter+1
                then foundSequences @ [subSequence@[num]]
            elif num <> lastNum + 1
                then foundSequences @ [subSequence]
            else foundSequences
         else foundSequences)

let subSequences = nums |> Seq.sort |> Seq.fold scanFunc (nums |> Seq.length, 0, 0, 0, [], []) |> fun (_,_,_,_,_,results) -> results

0

Linq并不是万能的解决方案,有时候使用简单的循环会更好。这里提供了一个解决方案,只需使用少量的Linq来对原始序列进行排序和过滤结果。

void Main()
{
    var numbers = new[] { 21,4,7,9,12,22,17,8,2,20,23 };
    var sequences =
        GetSequences(numbers, (prev, curr) => curr == prev + 1);
        .Where(s => s.Count() >= 3);
    sequences.Dump();
}

public static IEnumerable<IEnumerable<T>> GetSequences<T>(
    IEnumerable<T> source,
    Func<T, T, bool> areConsecutive)
{
    bool first = true;
    T prev = default(T);
    List<T> seq = new List<T>();
    foreach (var i in source.OrderBy(i => i))
    {
        if (!first && !areConsecutive(prev, i))
        {
            yield return seq.ToArray();
            seq.Clear();
        }
        first = false;
        seq.Add(i);
        prev = i;
    }
    if (seq.Any())
        yield return seq.ToArray();
}

0

我和Jon想到了同样的事情:要表示一系列连续的整数,你只需要两个微不足道的整数!所以我会从这里开始:

struct Range : IEnumerable<int>
{
    readonly int _start;
    readonly int _count;

    public Range(int start, int count)
    {
        _start = start;
        _count = count;
    }

    public int Start
    {
        get { return _start; }
    }

    public int Count
    {
        get { return _count; }
    }

    public int End
    {
        get { return _start + _count - 1; }
    }

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = 0; i < _count; ++i)
        {
            yield return _start + i;
        }
    }

    // Heck, why not?
    public static Range operator +(Range x, int y)
    {
        return new Range(x.Start, x.Count + y);
    }

    // skipping the explicit IEnumerable.GetEnumerator implementation
}

从那里开始,您可以编写一个静态方法,返回一堆这些Range值,对应于您的序列连续数字。

static IEnumerable<Range> FindRanges(IEnumerable<int> source, int minCount)
{
    // throw exceptions on invalid arguments, maybe...

    var ordered = source.OrderBy(x => x);

    Range r = default(Range);

    foreach (int value in ordered)
    {
        // In "real" code I would've overridden the Equals method
        // and overloaded the == operator to write something like
        // if (r == Range.Empty) here... but this works well enough
        // for now, since the only time r.Count will be 0 is on the
        // first item.
        if (r.Count == 0)
        {
            r = new Range(value, 1);
            continue;
        }

        if (value == r.End)
        {
            // skip duplicates
            continue;
        }
        else if (value == r.End + 1)
        {
            // "append" consecutive values to the range
            r += 1;
        }
        else
        {
            // return what we've got so far
            if (r.Count >= minCount)
            {
                yield return r;
            }

            // start over
            r = new Range(value, 1);
        }
    }

    // return whatever we ended up with
    if (r.Count >= minCount)
    {
        yield return r;
    }
}

演示:

int[] numbers = new[] { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };

foreach (Range r in FindConsecutiveRanges(numbers, 3))
{
    // Using .NET 3.5 here, don't have the much nicer string.Join overloads.
    Console.WriteLine(string.Join(", ", r.Select(x => x.ToString()).ToArray()));
}

输出:

7、8、9
20、21、22、23

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接