如何加速我的数组搜索函数？

Question

如何加速我的数组搜索函数？

3

我正在开发一个用react-native编写的字典应用。

当我想要从搜索框中过滤数组时，我编写了下面的函数。在我使用2000个单词列表进行测试时，这个函数运行得非常好。但是，当单词列表增加到成千上万个单词时，搜索速度变得非常慢。

那么，我应该如何改进这个搜索函数呢？

//Filter array when input text (Search)

let filteredWords = []
if(this.state.searchField != null)
{
  filteredWords = this.state.glossaries.filter(glossary => {
    return glossary.word.toLowerCase().includes(this.state.searchField.toLowerCase());
  })
}

- Sras

此外，Array 函数（例如 filter）与常规的 for 循环相比速度较慢。如果您想让代码更快，则可以使用 for 循环进行重构，这并不难做到。 - ibrahim mahrir

1

@ibrahimmahrir 这个问题缺乏很多上下文，以目前的形式在代码审查上不符合主题。请参阅*Stack Overflow用户的代码审查指南* 以获取更多信息。 - Zeta

1

@ibrahimmahrir 我不同意。也许问题的措辞可以更好，但这不是关于提出如何改进代码质量的建议。这个问题是关于一个问题：性能差。 - Wazner

糟糕。将includes()误读为Array.prototype.includes变体。 - Zeta

一个零成本的解决方案是使用每个单词的第一个字母创建索引。例如，filteredWords['a'] = []; filteredWords['b'] = []; filteredWords['c'] = []; 这样，您可以将搜索应用于比原始数组小的数组。当然，这种方法并不可扩展，但我猜您的词汇表也有一个定义好的增长上限。 - Joe Aspara

显示剩余5条评论

2个回答

2

由于这个问题似乎不适合在 CodeReview 上，我认为有几件事情可以让您的代码大大加快速度[citation needed]：

将对 this.state.searchField.toLowerCase() 的调用缓存起来，因为您不需要在每次迭代时都调用它。
使用普通的 for 循环而不是花哨但较慢的 Array 函数。

这就是最终结果：

let filteredWords = []
if(this.state.searchField != null) {
    let searchField = this.state.searchField.toLowerCase(),
        theArray = this.state.glossaries;                          // cache this too

    for(let i = 0, l = theArray.length; i < l; ++i) {
        if(theArray[i].word.toLowerCase().includes(searchField)) {
            filteredWords.push(theArray[i]);
        }
    }
}

编辑：

如果您想搜索以 searchField 开头的词汇表，则使用 indexOf === 0 作为条件，而不是使用 includes，代码如下：

if(theArray[i].word.toLowerCase().indexOf(searchField) === 0) {

- ibrahim mahrir

1

非常感谢你！你的代码比我写的快太多了。还有一件事，我发现这个搜索是“searchField = 'he'”。它会选择“{ahello, ahellob, hello, aahello, hellob, helloc}”。但是我认为如果搜索算法是“{searchField ='he'}”，那应该会更快。它应该只给出结果“{he, hell, hello, hellob, helloc}”。你能否给我一些想法来编写这个搜索？再次感谢你的帮助。 - Sras

我发现的是这样的：搜索词 'A' 结果为 'Aa，ab，ac，abbb，aee。' 不包括 'bAa，Aa，ab，Bac,.----' - Sras

@user3036876 请查看编辑！ - ibrahim mahrir

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Wazner · Accepted Answer

这段代码运行缓慢的原因有多个：

你在使用lambda表达式时使用了filter()方法。这会为每个被搜索的项添加一个函数调用开销。
在调用includes()之前，你对两个字符串都调用了toLowercase()方法。这将为每次比较分配两个新的字符串对象。
你在调用includes()方法。由于某些浏览器优化不足，includes()方法并不像indexOf()方法那样高效。

`for`循环（-11%）

我建议你不要使用filter()方法，而是创建一个新的Array并使用for循环来填充它。

const glossaries = this.state.glossaries;
const searchField = this.state.searchField;
const filteredWords = [];   

for (let i = 0; i < glossaries.length; i++) {
  if (glossaries[i].toLowerCase().includes(searchField.toLowerCase())) {
    filteredWords.push(glossaries[i]);
  }
}

toLowerCase()分配内存减少45%

由于JavaScript使用垃圾回收机制释放已使用的内存，因此内存分配非常昂贵。当进行垃圾回收时，整个程序会暂停，同时尝试查找不再使用的内存。

您可以通过在每次更新词汇表时完全消除搜索循环中的toLowerCase()（我假设这不经常发生）来摆脱它。

// When you build the glossary
this.state.glossaries = ...;
this.state.searchGlossaries = this.state.glossaries.map(g => g.toLowerCase());

你还可以在循环之前调用 toLowerCase() 以删除 searchText 上的该方法。经过这些更改，代码将如下所示：

const glossaries = this.state.glossaries;
const searchGlassaries = this.state.searchGlossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = []; 

for (let i = 0; i < glossaries.length; i++) {
  if (searchGlassaries[i].includes(searchField)) {
    filteredWords.push(glossaries[i]);
  }
}

`indexOf()`代替`includes()`（-13%）

我并不确定为什么会这样，但测试显示indexOf比includes快得多。

const glossaries = this.state.glossaries;
const searchGlassaries = this.state.searchGlossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = []; 

for (let i = 0; i < glossaries.length; i++) {
  if (searchGlassaries[i].indexOf(searchField) !== -1) {
    filteredWords.push(glossaries[i]);
  }
}

总体表现有所提升，提升了70%。我从 https://jsperf.com/so-question-perf 获取了性能百分比。

优化算法

在评论中，您说您想要一个示例，展示在仅匹配以搜索文本开头的单词时可以进行的优化。一种方法是使用二分搜索算法。

让我们以上面的代码为起点。在存储之前，我们对术语表进行排序。为了不区分大小写地排序，JavaScript公开了 Intl.Collator 构造函数。它提供了compare(x, y) 方法，返回如下结果:

negative value  | X is less than Y
zero            | X is equal to Y
positive value  | X is greater than Y

最终的代码如下：

// Static in the file
const collator = new Intl.Collator(undefined, {
  sensitivity: 'base'
});

function binarySearch(glossaries, searchText) {
  let lo = 0;
  let hi = glossaries.length - 1;

  while (lo <= hi) {
    let mid = (lo + hi) / 2 | 0;
    let comparison = collator.compare(glossaries[mid].word, searchText);

    if (comparison < 0) {
      lo = mid + 1;
    }
    else if (comparison > 0) {
      hi = mid - 1;
    }
    else {
      return mid;
    }
  }

  return -1;
}

// When you build the glossary
this.state.glossaries = ...;
this.state.glossaries.sort(function(x, y) {
  return collator.compare(x.word, y.word);
});

// When you search
const glossaries = this.state.glossaries;
const searchField = this.state.searchField.toLowerCase();
const filteredWords = [];

const idx = binarySearch(glossaries, searchField);

if (idx != -1) {
  // Find the index of the first matching word, seeing as the binary search
  // will end up somewhere in the middle
  while (idx >= 0 && collator.compare(glossaries[idx].word, searchField) < 0) {
    idx--;
  }

  // Add each matching word to the filteredWords
  while (idx < glossaries.length && collator.compare(glossaries[idx].word, searchField) == 0) {
    filteredWords.push(glossaries[idx]);
  }
}

如何加速我的数组搜索函数？

for循环（-11%）

toLowerCase()分配内存减少45%

indexOf()代替includes()（-13%）

优化算法

`for`循环（-11%）

`indexOf()`代替`includes()`（-13%）