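I have been working on a project that manages large lists of words and runs many tests to validate each word in a list. Oddly, every time I use "faster" tools such as the `itertools` module, they seem to end up slower.
I finally decided to ask about it, because I may be doing something wrong. The following code tries to measure the performance difference between the `any()`
function and a plain loop.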
```python
#!/usr/bin/python3
#
import time
from unicodedata import normalize

file_path = './tests'

start = time.time()
with open(file_path, encoding='utf-8', mode='rt') as f:
    tests_list = f.read()
print('File reading done in {} seconds'.format(time.time() - start))

start = time.time()
# NFC-normalize the whole text, then split it into stripped lines
tests_list = [line.strip() for line in normalize('NFC', tests_list).splitlines()]
print('String formalization, and list strip done in {} seconds'.format(time.time() - start))
print('{} strings'.format(len(tests_list)))

# Substrings that make a word invalid ('af' and 'ae' appear twice)
unallowed_combinations = ['ab', 'ac', 'ad', 'ae', 'af', 'ag', 'ah', 'ai', 'af', 'ax',
                          'ae', 'rt', 'rz', 'bt', 'du', 'iz', 'ip', 'uy', 'io', 'ik',
                          'il', 'iw', 'ww', 'wp']

def combination_is_valid(string):
    # Version 1: any() over a generator expression
    if any(combination in string for combination in unallowed_combinations):
        return False
    return True

def combination_is_valid2(string):
    # Version 2: explicit for loop with an early return
    for combination in unallowed_combinations:
        if combination in string:
            return False
    return True

print('Testing the performance of any()')
start = time.time()
for string in tests_list:
    combination_is_valid(string)
print('combination_is_valid ended in {} seconds'.format(time.time() - start))

start = time.time()
for string in tests_list:
    combination_is_valid2(string)
print('combination_is_valid2 ended in {} seconds'.format(time.time() - start))
```
The code above is quite representative of the kind of tests I run. Here are the results of two runs:
```
File reading done in 0.22988605499267578 seconds
String formalization, and list strip done in 6.803032875061035 seconds
38709922 strings
Testing the performance of any()
combination_is_valid ended in 80.74802565574646 seconds
combination_is_valid2 ended in 41.69514226913452 seconds

File reading done in 0.24268722534179688 seconds
String formalization, and list strip done in 6.720442771911621 seconds
38709922 strings
Testing the performance of any()
combination_is_valid ended in 79.05265760421753 seconds
combination_is_valid2 ended in 42.24800777435303 seconds
```
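I find it really surprising that the loop takes about half the time of `any()`. Is there an explanation for this? Am I doing something wrong?
(I am using Python 3.4 on GNU/Linux.)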
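One commonly cited explanation, offered here as an assumption rather than something measured in this post, is that `any()` fed by a generator expression pays for creating a generator object on every call and for resuming its frame once per combination checked, whereas the plain `for` loop runs each `in` test directly in the function's own frame. The sketch below compares three variants with `timeit`; the function names `valid_any`, `valid_loop` and `valid_any_map`, and the small word list, are hypothetical and only for illustration. Timings depend on the machine and Python version, so no numbers are claimed.

```python
import timeit

# Same forbidden substrings as in the question, with the duplicates dropped.
UNALLOWED = ['ab', 'ac', 'ad', 'ae', 'af', 'ag', 'ah', 'ai', 'ax',
             'rt', 'rz', 'bt', 'du', 'iz', 'ip', 'uy', 'io', 'ik',
             'il', 'iw', 'ww', 'wp']

def valid_any(s):
    # any() over a generator expression: a new generator per call,
    # resumed once for each combination that gets checked.
    return not any(c in s for c in UNALLOWED)

def valid_loop(s):
    # Plain loop: the `in` test runs inline in this function's frame.
    for c in UNALLOWED:
        if c in s:
            return False
    return True

def valid_any_map(s):
    # any() over map() with the bound method s.__contains__:
    # no generator frame is involved in the iteration.
    return not any(map(s.__contains__, UNALLOWED))

# Hypothetical sample data, not the question's 38-million-line file.
WORDS = ['hello', 'world', 'python', 'zzzz', 'abacus', 'wwwp'] * 1000

def run(check):
    # Count the valid words so every variant does the same amount of work.
    return sum(1 for w in WORDS if check(w))

if __name__ == '__main__':
    for fn in (valid_any, valid_loop, valid_any_map):
        t = timeit.timeit(lambda: run(fn), number=20)
        print('{:15} {:.3f} s'.format(fn.__name__, t))
```

If `valid_any_map` lands close to `valid_loop` while `valid_any` trails both, that would be consistent with the generator overhead being the cost rather than `any()` itself.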
`True`? - Ignacio Vazquez-Abrams

`any` also exits early (it only iterates up to the first truthy value), so that is not the difference. - interjay
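The short-circuit behaviour interjay describes can be checked directly. This minimal sketch counts how many items `any()` actually pulls from a generator expression; `count_checks` and its inner helper `checked` are hypothetical names written only for this demonstration.

```python
def count_checks(string, combinations):
    """Return (any() result, number of combinations any() examined)."""
    seen = []

    def checked(c):
        # Record every combination that any() requests from the generator.
        seen.append(c)
        return c in string

    result = any(checked(c) for c in combinations)
    return result, len(seen)

if __name__ == '__main__':
    combos = ['ab', 'rt', 'ww', 'wp']
    # 'rt' is found at the second item, so any() stops there.
    print(count_checks('start', combos))   # (True, 2)
    # No match: every combination has to be examined.
    print(count_checks('hello', combos))   # (False, 4)
```

Both versions in the question therefore stop at the first match, which suggests the gap in the timings comes from per-item overhead rather than from `any()` doing extra comparisons.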