为什么startswith比切片操作慢？

Question

为什么startswith比切片操作慢？

55

为什么实现 startwith 比切片慢？

In [1]: x = 'foobar'

In [2]: y = 'foo'

In [3]: %timeit x.startswith(y)
1000000 loops, best of 3: 321 ns per loop

In [4]: %timeit x[:3] == y
10000000 loops, best of 3: 164 ns per loop

令人惊讶的是，即使包括长度计算在内，切片仍然明显更快:

In [5]: %timeit x[:len(y)] == y
1000000 loops, best of 3: 251 ns per loop

注意：这种行为的第一部分在Python数据分析（第三章）中有提到，但没有解释。

.

如果有帮助的话：这里是startswith的C代码；以及dis.dis的输出结果：

In [6]: import dis

In [7]: dis_it = lambda x: dis.dis(compile(x, '<none>', 'eval'))

In [8]: dis_it('x[:3]==y')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_CONST               0 (3)
              6 SLICE+2             
              7 LOAD_NAME                1 (y)
             10 COMPARE_OP               2 (==)
             13 RETURN_VALUE        

In [9]: dis_it('x.startswith(y)')
  1           0 LOAD_NAME                0 (x)
              3 LOAD_ATTR                1 (startswith)
              6 LOAD_NAME                2 (y)
              9 CALL_FUNCTION            1
             12 RETURN_VALUE

- Andy Hayden

3

切片操作在C层面可能会被更加优化。 - Jakob Bowyer

4

这不是等价的代码。你可以将元组传递给 startswith，你也可以传递 start 和 end 参数，但要注意这需要一些时间来处理。 - SilentGhost

当谈到优化时，底层操作系统和CPU架构非常重要。这是x86（英特尔或AMD）还是x86-64位？还是ARM核心，在Linux或Windows上...？ - Louis Ricci

@LastCoder 这是一台运行osx的Intel x86-64计算机。我曾经认为（大体上）C（因此也包括CPython）在不同平台上的实现应该是可比较的（即如果一个架构运行代码较慢，则其他架构也会如此）...？ - Andy Hayden

1

@hayden - 操作系统标准库的版本差异，使用的编译器版本，运行测试时可用内存量，正在运行的后台任务/进程，处理器是Intel还是AMD等等都可能导致基准测试结果的变化。特别是在处理重复测试（数值相同且进行纳秒测量比较）时，您可能会遇到架构级别上的缓存和分支预测优化，这可能会扭曲您的结果。 - Louis Ricci

@LastCoder 我同意数字本身会不同 *(如果我运行它们两次，它们就会不同！)*，但是考虑到C编译器的成熟度，如果在所有平台上没有反映出相当大的差异（即一个比另一个更快），我会感到惊讶。我之前没有考虑过架构层面的“缓存和分支预测”...很有趣。 - Andy Hayden

5个回答

36

这个比较并不公平，因为你只测量了startswith返回True的情况。

>>> x = 'foobar'
>>> y = 'fool'
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 221 ns per loop
>>> %timeit x[:3] == y  # note: length mismatch
10000000 loops, best of 3: 122 ns per loop
>>> %timeit x[:4] == y
10000000 loops, best of 3: 158 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 210 ns per loop
>>> sw = x.startswith
>>> %timeit sw(y)
10000000 loops, best of 3: 176 ns per loop

此外，对于更长的字符串，startswith 的速度要快得多：

>>> import random
>>> import string
>>> x = '%030x' % random.randrange(256**10000)
>>> len(x)
20000
>>> y = r[:4000]
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 211 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 469 ns per loop
>>> sw = x.startswith
>>> %timeit sw(y)
10000000 loops, best of 3: 168 ns per loop

即使没有匹配，这仍然是正确的。

# change last character of y
>>> y = y[:-1] + chr((ord(y[-1]) + 1) % 256)
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 210 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 470 ns per loop
>>> %timeit sw(y)
10000000 loops, best of 3: 168 ns per loop
# change first character of y
>>> y = chr((ord(y[0]) + 1) % 256) + y[1:]
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 210 ns per loop
>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 442 ns per loop
>>> %timeit sw(y)
10000000 loops, best of 3: 168 ns per loop

所以，startswith 在短字符串上可能会比较慢，因为它是为长字符串优化的。

（从这个答案中获取随机字符串的技巧。）

- Fred Foo

9

startswith比切片复杂...

2924 result = _string_tailmatch(self,
2925 PyTuple_GET_ITEM(subobj, i),
2926 start, end, -1);

这不是一个简单的字符比较循环，用于在Haystack的开头找到Needle。我们正在查看一个for循环，它正在迭代一个向量/元组(subobj)，并在其上调用另一个函数(_string_tailmatch)。多个函数调用与堆栈、参数检查等方面有关的开销...

startswith是一个库函数，而切片似乎是内置于语言中的。

2919 if (!stringlib_parse_args_finds("startswith", args, &subobj, &start, &end))
2920 return NULL;

- Louis Ricci

3

"StartsWith可以处理向量。" - 我不清楚您的意思。 - Eric

@senderle - 没有C#...哈哈...无论如何，你在技术上是正确的，所以我已经适当地将其大小写转换了。 - Louis Ricci

@Eric - 看看源代码，它可以处理多个参数，而切片是固有的。 - Louis Ricci

好的，+1，现在更有意义了。 - senderle

@LastCoder：多参数使用是什么样子的？ - Eric

@Eric - 我的措辞不太恰当，我的意思只是由于额外的处理过程，可选参数 str.startswith(prefix[, start[, end]]) 增加了更多的复杂性。源代码使人们看起来好像 startsWith 能够处理多个“前缀”，但文档没有概述这种情况。 - Louis Ricci

8

引用文档的说法，startswith的功能超出你的想象：

str.startswith(prefix[, start[, end]])

返回True如果字符串以前缀开头，否则返回False。前缀也可以是要查找的前缀元组。使用可选的start参数，从该位置开始测试字符串。使用可选的end参数，在该位置停止比较字符串。

- Eric

0

调用函数是非常昂贵的。然而，我不知道对于用C语言编写的内置函数是否也是如此。

请注意，根据所使用的对象，切片可能也涉及到函数调用。

- glglgl

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- senderle · Accepted Answer

部分性能差异可以通过考虑.运算符所需的时间来解释：

>>> x = 'foobar'
>>> y = 'foo'
>>> sw = x.startswith
>>> %timeit x.startswith(y)
1000000 loops, best of 3: 316 ns per loop
>>> %timeit sw(y)
1000000 loops, best of 3: 267 ns per loop
>>> %timeit x[:3] == y
10000000 loops, best of 3: 151 ns per loop

差异的另一部分可以解释为startswith是一个函数，即使是没有操作的函数调用也需要一点时间：

>>> def f():
...     pass
... 
>>> %timeit f()
10000000 loops, best of 3: 105 ns per loop

这并不完全解释了两种方法的差异，因为使用切片和 len 的版本仍然调用了一个函数并且速度更快（与上面的 sw(y) 相比 - 267 纳秒）：

这并不能完全说明两种方法的差异，因为使用切片和 len 的版本仍然调用一个函数且速度更快（与上面的 sw(y) 比较 -- 267 ns）：

>>> %timeit x[:len(y)] == y
1000000 loops, best of 3: 213 ns per loop

我唯一的猜测是Python可能针对内建函数优化了查找时间，或者len调用已经被严重优化(这很可能是真的)。也许可以通过自定义的len函数来测试。或者可能是由LastCoder指出的差异造成的。请注意larsmans的结果表明，对于长字符串，startswith实际上更快。上述推理仅适用于那些实际影响性能的情况。