两数之和算法变种 - 优化

Question

两数之和算法变种 - 优化

4

我需要计算目标值t在区间[-10000,10000] （包括端点）内的数量，使输入文件中存在不同的数字x、y满足x+y=t。我有能够工作的代码，并希望对其进行优化以提高运行速度。

此外，我们只会计算和t一次。如果有两个x,y对加起来等于55，你只会将结果增加一次。输入数字可能是正整数和负整数。

public static int Calculate(int start, int finish, HashSet<long> numbers)
{
    int result = 0;
    for (int sum = start; sum <= finish; sum++)
    {
        foreach (long n in numbers)
        {
            if (numbers.Contains(sum - n) && n != (sum - n))
            {
                result++;
                break;
            }
        }
    }

    return result;
}

这是一个任务，我已经拿到了满分。我的代码针对1百万个数字的数据集运行需要约30分钟。我尝试着思考如何优化我的代码，但未能找到正确的想法，希望得到一些帮助。

- PushCode

2

在这里，优化工作正题不属于讨论范围。本站旨在修复错误的代码。你可以尝试 codereview.stackexchange 来进行代码优化。 - deathismyfriend

3

有时候优化意味着使用完全不同的算法，这在本网站的范围内。尽管在 [c#] 标签下可能不太常见，但这是 [algorithm] 标签的主要目的之一。 - Juan Lopes

我投票关闭此问题，因为建议在codereview.stackexchange.com上发布。 - PushCode

2

@deathismyfriend 这个网站不仅仅是为了修复破损的代码。 - Jashaszun

2

我认为这个问题就放在原地不动就好了。 - Carlos Rodriguez

显示剩余17条评论

5个回答

1

请看下面的解决方案：

    public static int Calculate(int start, int finish, List<long> numbers)
    {
        HashSet<long> resultSet = new HashSet<long>();            
        for (int indexA = 0; indexA < numbers.Count - 1; indexA++)
        {
            for (int indexB = indexA+1; indexB < numbers.Count; indexB++)
            {
                long tempSum = numbers[indexA] + numbers[indexB];
                if ((tempSum >= start) && (tempSum <= finish))
                {
                    resultSet.Add(tempSum);
                }
            }
        }
        return resultSet.Count();
    }

- Vitali Donghvani

你能解释一下你和其他人做的不同之处，为什么这样做以及它有什么影响吗？ - Jeroen Vannevel

是的，这个想法是在一组数字中找到x和y，它们必须等于数字t。数字t是在start和finish之间的任何数字。为了避免重复的目标总和，我使用HashSet。 - Vitali Donghvani

1

不，不要... :) 请仔细阅读代码... Start 和 finish 不是索引。它们是可能的 t 值。 t 可能是：Start，Start+1，**...，Start+n，...，finish-1，finish**。 - Vitali Donghvani

1

英特尔Core i5-2500K（四核）的MIPS为83000万（每秒百万条指令）逻辑的Big O是O(n^2) 如果列表中的数字数量为100万，则Big O等于1万亿因此，Core i5需要：1000000000000 / 83000000000 = 12.05秒 - Vitali Donghvani

1

开始 <= t <= 结束;
t = x + y;
x 和 y 是列表中不同的数字;

- Vitali Donghvani

显示剩余7条评论

0

我相信这可以在O(NlogN)的时间复杂度内完成，其中N是用于构建总和的点数，并假设M，要查找的总和范围与之相比较小。

private int CalcSlide(int first, int last, HashSet<long> numbers)
{
  if (last < first)
    return 0;

  // Strategy:
  // running over the input numbers, in ordered fashion from below, we'll scan an "appropriate" sliding window
  // in that same collection to see if we find any sums that we didn't find before.
  // We start out with the lowest number as an "anchor" and determine the HIGHEST number that might be relevant,
  // i.e. that will not sum to beyond "last".
  // Saving that separately, for later use, we scan down until we sum (always with the "anchor") to below "first".
  //
  // We maintain the sums-to-be-found in a hash.
  // As soon as our scan hits a yet-to-be-found sum, we remove it from the hash and update first/last as appropriate.
  // Having completed this cycle we increase the anchor, redetermine the hightest relevant number to start a new scan,
  // (this redetermination starts at the saved previous value, walking down), and then we scan again.
  // Exits occur whever the obvious conditions are fullfilled: there's nothing left to search for, we've nothing left to scan.
  //
  // Time-complexity:
  // Given an input hash set we first construct a sorted array from that: O(nLogn)
  // The subsequent sliding-window-scan procedure is O(N) if we look at scnario's with N (the number of input points) being much larger
  // than the range of sums-to-find. More precisely the worst-case behaviour is O(N M logM), where M is the range of sums to find.
  // (It would be somewhat difficult to construct an scenario where we really have such "worst case behaviour").
  // 1. Obviously, the "outer loop" over the anchor is just "from 1 to N".
  // 2. Computing the upperbound of the scan, for each anchor again, is also linear: it just runs from N-1 down to the anchor
  //    (in separate strides for each anchor).
  // 3. The scan, at each round, is somewhat more complicated. It is potentially long - over all numbers down to the anchor.
  //    However, since all numbers are different (coming from a hashset), the scanwindow can never be larger than the range of sums (yet) to find.
  //    The worst-case behaviour is then O(N*MlogM), where the logM factor arises from having to search for a sum in the hashSet of sums.
  //    Of course, as we find "some" sums on the way, both the search-in-sumHashSet diminishes and the scanwindow gets smaller whenever we find
  //    the current "first" or "last".

  int sumCountOriginal = last - first + 1;

  // Get a sorted array from the provided numbers.
  long[] sortedSet = new long[this.HNumbers.Count];
  int j = 0;
  foreach (long value in HNumbers)
    sortedSet[j++] = value;
  Array.Sort(sortedSet);

  // Create a (relatively small) HashSet from the sums sought.
  // This serves to easily remove items that we find, and then verify if there are any left.
  HashSet<int> HSums = new HashSet<int>();
  for (int sum = first; sum <= last; ++sum)
    HSums.Add(sum);

  int N = sortedSet.Length;

  int jLow = 0;
  int jScanStart = sortedSet.Length - 1;

  while (HSums.Count > 0)
  {
    long xLow = sortedSet[jLow];
    long xScan = sortedSet[jScanStart];

    // decrease jScanStart to get maxSum or less.
    while (jScanStart > jLow + 1 && xScan + xLow > last)
      xScan = sortedSet[--jScanStart];

    // scan until we get below minSum.
    int jScan = jScanStart;
    while (jScan > jLow && xLow + xScan >= first)
    {
      int sum = (int)(xLow + xScan);
      if (HSums.Contains(sum))
      {
        HSums.Remove(sum);

        if (sum == last)
          while (last >= first && !HSums.Contains(last))
            last--;

        if (sum == first)
          while (first <= last && !HSums.Contains(first))
            first++;

        if (last < first)
          return sumCountOriginal;
      }

      xScan = sortedSet[--jScan];
    } // scan

    // next xLow.
    if (++jLow == N - 1 || jScanStart <= jLow)
      return sumCountOriginal - HSums.Count;

  } // there are still unresolved sums.

  // unexpected fall-thru - we should never get here.
  return sumCountOriginal - HSums.Count;
}

- Bert te Velde

0

这段代码可以通过使用更好的数据结构进行大量优化。使用SortedSet而不是HashSet。现在代码可以轻松优化：

public static int calculate(int start , int finish , SortedSet<long> numbers){
    int count = 0;

    for(long l in numbers){
        SortedSet<long> sub = numbers.GetViewBetween(l - start , l - finish);
        count += sub.Count() - (sub.Contains(l) ? 1 : 0);
    }

    return count;
}

请注意，由于我已经有一段时间没有使用C#了，所以这段代码可能无法编译。您甚至可以通过简单地忽略满足number.min() + n > finish或number.max() + n < start的值来进一步优化此代码。

- user4668606

我对C#不太熟悉 - 这段代码是如何避免计算重复的目标和的？我认为问题要求计算在start和finish之间可达到的不同和的数量。 - גלעד ברקן

在“x + y = t”中，只有“x”和“y”必须不同。或者至少这是我理解问题的方式，但我找不到任何禁止多个出现相同“t”的内容。如果您的意思是“这段代码如何避免'x = y'”，那么可以通过“-（sub.Contains(l)？1：0）”实现。 - user4668606

那么你的代码是如何确保下一个l不会再计算另外的55呢？ - גלעד ברקן

他说他得到了满分的代码只是计算不同的总和。 - גלעד ברקן

1

@PushCode，你应该在问题中添加这个约束条件，到目前为止这是不清晰的。 - user4668606

显示剩余4条评论

0

感谢David Eisenstat的贡献，将所有内容整合起来应用于具体案例中：

public static int Calculate(this IEnumerable<long> input, int minSum, int maxSum)
{
    var sortedSet = input.AsParallel().Distinct().OrderBy(n => n).ToArray();
    for (int lo = 0, hi = sortedSet.Length - 1; lo < hi;)
    {
        var sum = sortedSet[lo] + sortedSet[hi];
        if (sum < minSum) lo++;
        else if (sum > maxSum) hi--;
        else return 1 + Calculate(sortedSet, lo, hi, minSum, (int)sum - 1) + Calculate(sortedSet, lo, hi, (int)sum + 1, maxSum);
    }
    return 0;
}
private static int Calculate(long[] sortedSet, int lo, int hi, int minSum, int maxSum)
{
    int count = 0;
    for (int targetSum = minSum; targetSum <= maxSum; targetSum++)
    {
        for (int i = lo, j = hi; i < j;)
        {
            var sum = sortedSet[i] + sortedSet[j];
            if (sum < targetSum) i++;
            else if (sum > targetSum) j--;
            else { count++; break; }
        }
    }
    return count;
}

编辑更好的是：

public static int Calculate(this IEnumerable<long> input, int minSum, int maxSum)
{
    if (minSum > maxSum) return 0;
    var sortedSet = input.AsParallel().Distinct().OrderBy(n => n).ToArray();
    return CalculateCore(sortedSet, 0, sortedSet.Length - 1, minSum, maxSum);
}
private static int CalculateCore(long[] sortedSet, int lo, int hi, int minSum, int maxSum)
{
    if (minSum > maxSum) return 0;
    int count = 0;
    while (lo < hi)
    {
        var sum = sortedSet[lo] + sortedSet[hi];
        if (sum < minSum) lo++;
        else if (sum > maxSum) hi--;
        else
        {
            count++;
            if (minSum == maxSum) break;
            if (sum - minSum <= maxSum - sum)
            {
                count += CalculateCore(sortedSet, lo, hi - 1, minSum, (int)sum - 1);
                minSum = (int)sum + 1;
                lo++;
            }
            else
            {
                count += CalculateCore(sortedSet, lo + 1, hi, (int)sum + 1, maxSum);
                maxSum = (int)sum - 1;
                hi--;
            }
        };
    }
    return count;
}

- Ivan Stoev

运行时间还不是O(n^2)吗？ - PushCode

@PushCode 我认为这是 *O(NM)**。 - Ivan Stoev

在我的笔记本电脑上（当然是在发布模式下并且在VS之外运行），它在1M个数字的情况下完成不到一秒钟。 - Ivan Stoev

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- David Eisenstat · Accepted Answer

有一种经典的线性时间算法，可以在已排序数组A中找到两个数字，使它们的和等于指定的目标。将i初始化为第一个索引，将j初始化为最后一个索引。如果A[i]+A[j]小于目标，则增加i。如果小于目标，则减少j。当i>=j时停止。

使用比较排序对数组进行排序的渐近成本比哈希更高，但这种设置成本是可以忽略的，并且数组的改进内存布局将产生显着的性能提升。

为了进一步优化，您可以倒置循环结构，并为数组中的每个数字计算它所组成的组合。

要进一步优化（至少在渐近意义下），可以使用傅里叶变换将数组（部分）与彼此卷积。由于您的数字是不同的，我不确定实际上这是否会有所改善。