JavaScript函数中查找数组中最常见元素的速度差异问题

Question

JavaScript函数中查找数组中最常见元素的速度差异问题

3

我正在准备面试，正在做一些练习题。问题是：

在一个数组中找到重复次数最多的整数。

这是我创建的函数和他们创建的函数。它们被适当地命名了。

var arr = [3, 6, 6, 1, 5, 8, 9, 6, 6]

function mine(arr) {
  arr.sort()
  var count = 0;
  var integer = 0;
  var tempCount = 1;
  var tempInteger = 0;
  var prevInt = null
  for (var i = 0; i < arr.length; i++) {
    tempInteger = arr[i]
    if (i > 0) {
      prevInt = arr[i - 1]
    }
    if (prevInt == arr[i]) {
      tempCount += 1
      if (tempCount > count) {
        count = tempCount
        integer = tempInteger
      }
    } else {
      tempCount = 1
    }
  }
  console.log("most repeated is: " + integer)
}

function theirs(a) {
  var count = 1,
    tempCount;
  var popular = a[0];
  var temp = 0;
  for (var i = 0; i < (a.length - 1); i++) {
    temp = a[i];
    tempCount = 0;
    for (var j = 1; j < a.length; j++) {
      if (temp == a[j])
        tempCount++;
    }
    if (tempCount > count) {
      popular = temp;
      count = tempCount;
    }
  }
  console.log("most repeated is: " + popular)
}
console.time("mine")
mine(arr)
console.timeEnd("mine")
console.time("theirs")
theirs(arr)
console.timeEnd("theirs")

以下是结果：

most repeated is: 6
mine: 16.929ms
most repeated is: 6
theirs: 0.760ms

我的函数为什么比他们的慢？

- user3347300

我本以为我的更有效率，因为我没有额外的循环。 - user3347300

2

你在 mine 函数中使用了 sort，而 their 已经是排序好的数组。 - Grundy

2

我注释掉了 console.log("most repeated is: " + integer) 和 console.log("most repeated is: " + popular) 这两行代码，然后得到了 mine: 0.000ms, theirs 0.000ms 的结果。:-D - Grundy

1

我认为为了测试，数组应该稍微大一些。 - Grundy

1

@grundy 是的，大概要大上1000倍左右。 - Pointy

显示剩余3条评论

3个回答

2

我的测试结果

当我对一个包含50,000个元素的随机数组进行测试时，我得到了以下结果（JSFiddle）：

mine:     28.18 ms
theirs: 5374.69 ms

换句话说，你的算法似乎要快得多。这是预期的。

为什么你的算法更快呢？

您首先对数组进行排序，然后循环一次。Firefox使用归并排序，而Chrome使用变体的快速排序（根据此问题）。两者平均需要O(n*log(n))时间。然后您循环遍历数组，需要O(n)时间。总共可以简化为只有O(n*log(n))。

另一方面，他们的解决方案有一个嵌套循环，其中外部和内部循环都迭代所有元素。应该需要O(n^2)。换句话说，它更慢。

为什么你的测试结果不同呢？

那么，为什么您的测试结果与我的不同？我看到了许多可能性：

您使用了太小的样本。如果您只在代码中使用了九个数字，那肯定是这种情况。当您在测试中使用短数组时，开销（例如运行Gundy在评论中建议的console.log）占用所需时间的主导地位。这会使结果看起来完全随机。

neuronaut认为，这与他们的代码操作已由您的代码排序的数组有关。虽然这是一种不好的测试方式，但我无法看出它如何影响结果。

浏览器差异之类的东西。

关于.sort()的说明

进一步说明：您不应该使用.sort()对数字进行排序，因为它按字母顺序对事物进行排序。相反，请使用.sort(function(a, b){return a-b})。在此处阅读更多。

关于进一步说明的另一个说明：在这种特殊情况下，仅使用.sort()实际上可能更聪明。因为您不关心排序，只关心分组，所以它无论如何都会将具有相同值的元素分组在一起。如果没有比较函数更快（我认为是），那么无需使用它来排序。

一个更快的算法

您用O(n*log(n))解决了问题，但您可以只用O(n)来解决。实现这一点的算法非常直观。循环遍历数组，并跟踪每个数字出现的次数。然后选择出现最多次数的数字。

假设数组中有 m 个不同的数字。循环遍历数组需要 O(n) 的时间，查找最大值需要 O(m) 的时间。这样总共需要的时间为 O(n) + O(m)，但由于 m < n，所以简化后的时间复杂度为 O(n)。

以下是代码：

function anders(arr) {

    //Instead of an array we use an object and properties.
    //It works like a dictionary in other languages.
    var counts = new Object();

    //Count how many of each number there is.
    for(var i=0; i<arr.length; i++) {
        //Make sure the property is defined.
        if(typeof counts[arr[i]] === 'undefined')
            counts[arr[i]] = 0;
        //Increase the counter.
        counts[arr[i]]++;
    }

    var max;             //The number with the largest count.
    var max_count = -1;  //The largest count.

    //Iterate through all of the properties of the counts object
    //to find the number with the largerst count.
    for (var num in counts) {
        if (counts.hasOwnProperty(num)) {
            if(counts[num] > max_count) {
                max_count = counts[num];
                max = num;
            }
        }
    }

    //Return the result.
    return max;

}

在我的电脑上，对一个由 0 到 49 之间的 50,000 个元素组成的随机数组执行此操作仅需要 3.99 毫秒。换句话说，这是最快的方法。缺点是你需要 O(m) 的内存来存储每个数字出现的次数。

- Anders

1

他得到不同结果的原因是他尝试了太小的样本。 - Barmar

我不明白它如何会影响结果 - 也许是分支预测？ - ColBeseder

@ColBeseder 很好的观点。毕竟，这是 SO上投票最多的答案。 - Anders

0

这里的其他答案已经很好地解释了为什么theirs更快，以及如何优化你的代码。在处理大型数据集时，你的代码实际上更好（@Anders）。我设法优化了theirs的解决方案；也许这里有一些有用的东西。

通过使用一些基本的JS微优化，我可以获得一致更快的结果。这些优化也可以应用于您的原始函数，但我将它们应用于theirs。

预增量比后增量稍微快一点，因为不需要先将值读入内存
反向while循环非常快（在我的机器上），比我尝试过的任何其他东西都要快，因为JS被翻译成操作码，并且保证>= 0非常快。对于这个测试，我的电脑得分514,271,438 ops/sec，而下一个最快的得分198,959,074。
缓存length的结果-对于较大的数组，这将使better比theirs更明显地更快

代码：

function better(a) {
    var top = a[0],
        count = 0,
        i = len = a.length - 1;
    while (i--) {
        var j = len,
            temp = 0;
        while (j--) {
            if (a[j] == a[i]) ++temp;
        }
        if (temp > count) {
            count = temp;
            top = a[i];
        }
    }
    console.log("most repeated is " + top);
}

[fiddle]

如果不是完全相同的话，它与theirs非常相似，但具有上述微观优化。

在运行每个函数500次后，这里是结果。数组在运行任何函数之前预先排序，并从mine()中删除了sort。

mine: 44.076ms
theirs: 35.473ms
better: 32.016ms

- Scott

1

您IP地址为143.198.54.68，由于运营成本限制，当前对于免费用户的使用频率限制为每个IP每72小时10次对话，如需解除限制，请点击左下角设置图标按钮（手机用户先点击左上角菜单按钮）。 - Anders

1

真奇怪！我猜你只能选择适用于你特定元素集的更快的方法。 - Scott

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- neuronaut · Accepted Answer

看起来这不是一个公平的测试。当您首先运行您的函数时，它会对数组进行排序。这意味着他们的函数最终使用已经排序好的数据，但不会承受执行排序操作的时间成本。我尝试交换测试运行的顺序，并且得到了几乎相同的时间：

console.time("theirs")
theirs(arr)
console.timeEnd("theirs")
console.time("mine")
mine(arr)
console.timeEnd("mine")

most repeated is: 6
theirs: 0.307ms
most repeated is: 6
mine: 0.366ms

此外，如果您使用两个单独的数组，您会发现您的函数和他们的函数运行时间大致相同。

最后，请参考Anders的答案--它展示了更大的数据集揭示了您的函数的O(n*log(n))+O(n)性能与他们的函数的O(n^2)性能。