如何优化使用Python字典？

Question

如何优化使用Python字典？

pythonoptimizationdictionarysparse-matrix

60

问题：

我对我的Python程序进行了详细的分析，有一个函数使得整个程序变得缓慢。它大量使用了Python字典，所以我可能没有最好地使用它们。如果我无法让它运行得更快，我将不得不用C++重新编写它，那么有谁能帮助我优化Python代码呢？

我希望我已经给出了正确的解释，并且您可以理解我的代码！非常感谢您提供任何帮助。

我的代码：

这是有问题的函数，使用line_profiler和kernprof进行了分析。我正在运行Python 2.7

我特别困惑于像363、389和405行这样的地方，其中if语句与两个变量的比较似乎需要很长时间。

我考虑过使用NumPy（因为它可以处理稀疏矩阵），但我认为它不适用于以下原因：（1）我没有使用整数索引矩阵（我使用对象实例）；（2）我没有在矩阵中存储简单的数据类型（我存储了一个浮点数和一个对象实例的元组）。但我愿意接受关于NumPy的说服。如果有人了解NumPy的稀疏矩阵性能与Python哈希表的比较，我会很感兴趣。

对不起，我没有提供一个简单的示例供您测试，因为这个函数涉及到一个更大的项目，而且我无法想出如何设置一个简单的示例来测试它，而不给您我的代码库的一半！

Timer unit: 3.33366e-10 s
File: routing_distances.py
Function: propagate_distances_node at line 328
Total time: 807.234 s

Line #   Hits         Time  Per Hit   % Time  Line Contents
328                                               @profile
329                                               def propagate_distances_node(self, node_a, cutoff_distance=200):
330                                                       
331                                                   # a makes sure its immediate neighbours are correctly in its distance table
332                                                   # because its immediate neighbours may change as binds/folding change
333    737753   3733642341   5060.8      0.2          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems():
334    512120   2077788924   4057.2      0.1              use_neighbour_link = False
335                                                       
336    512120   2465798454   4814.9      0.1              if(node_b not in self.node_distances[node_a]): # a doesn't know distance to b
337     15857     66075687   4167.0      0.0                  use_neighbour_link = True
338                                                       else: # a does know distance to b
339    496263   2390534838   4817.1      0.1                  (node_distance_b_a, next_node) = self.node_distances[node_a][node_b]
340    496263   2058112872   4147.2      0.1                  if(node_distance_b_a > neighbour_distance_b_a): # neighbour distance is shorter
341        81       331794   4096.2      0.0                      use_neighbour_link = True
342    496182   2665644192   5372.3      0.1                  elif((None == next_node) and (float('+inf') == neighbour_distance_b_a)): # direct route that has just broken
343        75       313623   4181.6      0.0                      use_neighbour_link = True
344                                                               
345    512120   1992514932   3890.7      0.1              if(use_neighbour_link):
346     16013     78149007   4880.3      0.0                  self.node_distances[node_a][node_b] = (neighbour_distance_b_a, None)
347     16013     83489949   5213.9      0.0                  self.nodes_changed.add(node_a)
348                                                           
349                                                           ## Affinity distances update
350     16013     86020794   5371.9      0.0                  if((node_a.type == Atom.BINDING_SITE) and (node_b.type == Atom.BINDING_SITE)):
351       164      3950487  24088.3      0.0                      self.add_affinityDistance(node_a, node_b, self.chemistry.affinity(node_a.data, node_b.data))     
352                                                   
353                                                   # a sends its table to all its immediate neighbours
354    737753   3549685140   4811.5      0.1          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems():
355    512120   2129343210   4157.9      0.1              node_b_changed = False
356                                               
357                                                       # b integrates a's distance table with its own
358    512120   2203821081   4303.3      0.1              node_b_chemical = node_b.chemical
359    512120   2409257898   4704.5      0.1              node_b_distances = node_b_chemical.node_distances[node_b]
360                                                       
361                                                       # For all b's routes (to c) that go to a first, update their distances
362  41756882 183992040153   4406.3      7.6              for node_c, (distance_b_c, node_after_b) in node_b_distances.iteritems(): # Think it's ok to modify items while iterating over them (just not insert/delete) (seems to work ok)
363  41244762 172425596985   4180.5      7.1                  if(node_after_b == node_a):
364                                                               
365  16673654  64255631616   3853.7      2.7                      try:
366  16673654  88781802534   5324.7      3.7                          distance_b_a_c = neighbour_distance_b_a + self.node_distances[node_a][node_c][0]
367    187083    929898684   4970.5      0.0                      except KeyError:
368    187083   1056787479   5648.8      0.0                          distance_b_a_c = float('+inf')
369                                                                   
370  16673654  69374705256   4160.7      2.9                      if(distance_b_c != distance_b_a_c): # a's distance to c has changed
371    710083   3136751361   4417.4      0.1                          node_b_distances[node_c] = (distance_b_a_c, node_a)
372    710083   2848845276   4012.0      0.1                          node_b_changed = True
373                                                                   
374                                                                   ## Affinity distances update
375    710083   3484577241   4907.3      0.1                          if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)):
376     99592   1591029009  15975.5      0.1                              node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data))
377                                                                   
378                                                               # If distance got longer, then ask b's neighbours to update
379                                                               ## TODO: document this!
380  16673654  70998570837   4258.1      2.9                      if(distance_b_a_c > distance_b_c):
381                                                                   #for (node, neighbour_distance) in node_b_chemical.neighbours[node_b].iteritems():
382   1702852   7413182064   4353.4      0.3                          for node in node_b_chemical.neighbours[node_b]:
383   1204903   5912053272   4906.7      0.2                              node.chemical.nodes_changed.add(node)
384                                                       
385                                                       # Look for routes from a to c that are quicker than ones b knows already
386  42076729 184216680432   4378.1      7.6              for node_c, (distance_a_c, node_after_a) in self.node_distances[node_a].iteritems():
387                                                           
388  41564609 171150289218   4117.7      7.1                  node_b_update = False
389  41564609 172040284089   4139.1      7.1                  if(node_c == node_b): # a-b path
390    512120   2040112548   3983.7      0.1                      pass
391  41052489 169406668962   4126.6      7.0                  elif(node_after_a == node_b): # a-b-a-b path
392  16251407  63918804600   3933.1      2.6                      pass
393  24801082 101577038778   4095.7      4.2                  elif(node_c in node_b_distances): # b can already get to c
394  24004846 103404357180   4307.6      4.3                      (distance_b_c, node_after_b) = node_b_distances[node_c]
395  24004846 102717271836   4279.0      4.2                      if(node_after_b != node_a): # b doesn't already go to a first
396   7518275  31858204500   4237.4      1.3                          distance_b_a_c = neighbour_distance_b_a + distance_a_c
397   7518275  33470022717   4451.8      1.4                          if(distance_b_a_c < distance_b_c): # quicker to go via a
398    225357    956440656   4244.1      0.0                              node_b_update = True
399                                                           else: # b can't already get to c
400    796236   3415455549   4289.5      0.1                      distance_b_a_c = neighbour_distance_b_a + distance_a_c
401    796236   3412145520   4285.3      0.1                      if(distance_b_a_c < cutoff_distance): # not too for to go
402    593352   2514800052   4238.3      0.1                          node_b_update = True
403                                                                   
404                                                           ## Affinity distances update
405  41564609 164585250189   3959.7      6.8                  if node_b_update:
406    818709   3933555120   4804.6      0.2                      node_b_distances[node_c] = (distance_b_a_c, node_a)
407    818709   4151464335   5070.7      0.2                      if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)):
408    104293   1704446289  16342.9      0.1                          node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data))
409    818709   3557529531   4345.3      0.1                      node_b_changed = True
410                                                       
411                                                       # If any of node b's rows have exceeded the cutoff distance, then remove them
412  42350234 197075504439   4653.5      8.1              for node_c, (distance_b_c, node_after_b) in node_b_distances.items(): # Can't use iteritems() here, as deleting from the dictionary
413  41838114 180297579789   4309.4      7.4                  if(distance_b_c > cutoff_distance):
414    206296    894881754   4337.9      0.0                      del node_b_distances[node_c]
415    206296    860508045   4171.2      0.0                      node_b_changed = True
416                                                               
417                                                               ## Affinity distances update
418    206296   4698692217  22776.5      0.2                      node_b_chemical.del_affinityDistance(node_b, node_c)
419                                                       
420                                                       # If we've modified node_b's distance table, tell its chemical to update accordingly
421    512120   2130466347   4160.1      0.1              if(node_b_changed):
422    217858   1201064454   5513.1      0.0                  node_b_chemical.nodes_changed.add(node_b)
423                                                   
424                                                   # Remove any neighbours that have infinite distance (have just unbound)
425                                                   ## TODO: not sure what difference it makes to do this here rather than above (after updating self.node_distances for neighbours)
426                                                   ##       but doing it above seems to break the walker's movement
427    737753   3830386968   5192.0      0.2          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].items(): # Can't use iteritems() here, as deleting from the dictionary
428    512120   2249770068   4393.1      0.1              if(neighbour_distance_b_a > cutoff_distance):
429       150       747747   4985.0      0.0                  del self.neighbours[node_a][node_b]
430                                                           
431                                                           ## Affinity distances update
432       150      2148813  14325.4      0.0                  self.del_affinityDistance(node_a, node_b)

我的代码解释：

这个函数维护了一个稀疏距离矩阵，表示（最短路径上的边权重之和）节点之间的网络距离在一个（非常大的）网络中。使用Floyd-Warshall算法处理完整表格会非常慢。（我先尝试了这个方法，但比当前版本慢了数个数量级。）因此，我的代码使用稀疏矩阵来表示完整距离矩阵的阈值版本（忽略距离大于200个单位的任何路径）。网络拓扑随时间变化，因此这个距离矩阵需要随时间更新。为了实现这一点，我使用了一个距离向量路由协议的简单实现：网络中的每个节点都知道到每个其他节点的距离和路径上的下一个节点。当发生拓扑变化时，与该变化相关联的节点相应地更新其距离表，并告诉它们的直接邻居。信息通过节点将它们的距离表发送给邻居而在网络中传播，邻居更新它们的距离表并将它们传播给它们的邻居。

有一个代表距离矩阵的对象：self.node_distances。这是一个将节点映射到路由表的字典。节点是我定义的一个对象。路由表是一个将节点映射到元组（距离，下一个节点）的字典。距离是从节点a到节点b的图距离，下一个节点是节点a和节点b之间路径上必须先到达的邻居节点。下一个节点为None表示节点a和节点b是图中的邻居。例如，距离矩阵的示例可能如下：

self.node_distances = { node_1 : { node_2 : (2.0, None),
                                   node_3 : (5.7, node_2),
                                   node_5 : (22.9, node_2) },
                        node_2 : { node_1 : (2.0, None),
                                   node_3 : (3.7, None),
                                   node_5 : (20.9, node_7)},
                        ...etc...

由于拓扑变化，原本相距较远的两个节点（或根本未连接）可能会靠近。当发生这种情况时，此矩阵中会添加条目。由于阈值，两个节点可能会相距太远而不再重要。当发生这种情况时，此矩阵中的条目将被删除。

self.neighbours矩阵类似于self.node_distances，但包含有关网络中直接链接（边缘）的信息。self.neighbours在该函数外部不断进行修改，由化学反应引起网络拓扑变化。

实际上我遇到问题的函数：propagate_distances_node()执行距离向量路由协议的一步。给定一个节点node_a，该函数确保node_a的邻居在距离矩阵中正确地表示（拓扑变化）。然后，该函数将node_a的路由表发送到其在网络中的所有直接邻居。它将node_a的路由表与每个邻居自己的路由表集成。

在我的程序的其余部分中，将重复调用propagate_distances_node()函数，直到距离矩阵收敛。维护一个节点集self.nodes_changed，其中包含自它们上次更新后已更改其路由表的节点。在算法的每次迭代中，会选择这些节点的随机子集，并对它们调用propagate_distances_node()。这意味着节点异步和随机地传播其路由表。当集合self.nodes_changed变为空时，此算法收敛于真实的距离矩阵。

"亲和距离"部分（add_affinityDistance和del_affinityDistance）是距离矩阵的一个（小）子矩阵的缓存，被程序的另一部分使用。

The reason I'm doing this is that I'm simulating computational analogues of chemicals participating in reactions, as part of my PhD. A "chemical" is a graph of "atoms" (nodes in the graph). Two chemicals binding together is simulated as their two graphs being joined by new edges. A chemical reaction happens (by a complicated process that isn't relevant here), changing the topology of the graph. But what happens in the reaction depends on how far apart the different atoms are that make up the chemicals. So for each atom in the simulation, I want to know which other atoms it is close to. A sparse, thresholded distance matrix is the most efficient way to store this information. Since the topology of the network changes as the reaction happens, I need to update the matrix. A 距离向量路由协议 is the fastest way I could come up with of doing this. I don't need a more compliacted routing protocol, because things like routing loops don't happen in my particular application (because of how my chemicals are structured). The reason I'm doing it stochastically is so that I can interleve the chemical reaction processes with the distance spreading, and simulate a chemical gradually changing shape over time as the reaction happens (rather than changing shape instantly).

The `self` in this function is an object representing a chemical. The nodes in `self.node_distances.keys()` are the atoms that make up the chemical. The nodes in `self.node_distances[node_x].keys()` are nodes from the chemical and potentially nodes from any chemicals that the chemical is bound to (and reacting with).

Update:

我尝试将每个node_x == node_y的实例替换为node_x is node_y（根据@Sven Marnach的评论），但是它减缓了速度！（我没有预料到这一点！）我的原始配置文件运行需要807.234秒，但是使用此修改后，时间增加到895.895秒。很抱歉，我对性能分析做错了！我正在逐行查看，这在我的代码中有太多差异（约90秒的差异都在噪声中）。正确进行性能分析时，is明显比==更快。使用CProfile，我的代码使用==需要34.394秒，但是使用is只需要33.535秒（我可以确认这已经超出了噪声范围）。 更新： 现有库

我不确定是否有现有的库可以满足我的要求，因为我的需求很特殊：我需要计算加权、无向图中所有节点对之间的最短路径长度。我只关心低于阈值的路径长度。计算路径长度后，我会对网络拓扑进行微小的更改（添加或删除边），然后我想重新计算路径长度。与阈值相比，我的图形巨大（从给定节点开始，大部分图形距离阈值更远），所以拓扑变化不会影响大部分最短路径长度。这就是我使用路由算法的原因：因为它通过图形结构传播拓扑变化信息，所以当其传播超出阈值时，我可以停止传播。也就是说，我不需要每次都重新计算所有路径。我可以使用先前的路径信息（在拓扑变化之前）来加速计算。这就是为什么我认为我的算法将比任何最短路径算法的库实现更快的原因。我从未见过路由算法在物理网络之外使用（但如果有人使用过，我会感兴趣）。

NetworkX被@Thomas K.建议使用。它有很多算法可以计算最短路径。它有一种算法可以计算带有截止点的所有对最短路径长度（这就是我想要的），但它只适用于无权图（我的是加权的）。不幸的是，它的加权图算法不允许使用截止点（这可能会使它们在我的图上运行缓慢）。而且它的算法似乎都不支持在非常相似的网络上使用预先计算的路径（即路由部分）。

igraph是我知道的另一个图形库，但是看了它的文档后，我没有找到关于最短路径的任何信息。但我可能错过了它 - 它的文档似乎不太全面。

NumPy也许是可能的，感谢@9000的评论。如果我给每个节点实例分配一个唯一的整数，我就可以将稀疏矩阵存储在NumPy数组中。然后，我可以使用整数而不是节点实例来索引NumPy数组。我还需要两个NumPy数组：一个用于距离，一个用于“next_node”引用。这可能比使用Python字典更快（我还不知道）。

还有其他有用的库吗？

更新：内存使用情况

我正在运行Windows（XP），因此这里有一些有关内存使用情况的信息，来自Process Explorer。CPU使用率为50％，因为我有一台双核机器。

global memory usage my program's memory usage

我的程序没有耗尽RAM并开始使用交换空间。从数字和IO图中可以看出这一点。IO图上的尖峰是程序打印到屏幕上以显示其运行情况的地方。

然而，我的程序随着时间的推移不断占用更多的RAM，这可能不是一个好现象（但总体上它没有占用太多的RAM，这就是为什么我直到现在才注意到增加）。

IO图上尖峰之间的距离随着时间的推移而增加。这是不好的——我的程序每100,000次迭代打印一次屏幕，这意味着随着时间的推移，每次迭代的执行时间都会变长……我通过运行程序并测量打印语句之间的时间（程序每10,000次迭代之间的时间）来确认这一点。这应该是恒定的，但是从图表中可以看出，它线性增加了……所以这里有些问题。（此图表的噪声是因为我的程序使用大量随机数，因此每次迭代的时间不同。）

lag between print statements increasing over time

我的程序运行了很长时间后，内存使用情况如下（因此肯定不会用完RAM）：

global memory usage - after a long run my program's memory usage - after a long run

- Adam Nellis

11

我希望所有问题都有这样的内容，让我感到难过的是我实际上无法帮助你。 - Leigh

7

关于您困惑的比较问题：运算符 == 实际上调用了被比较对象的方法 __eq__()。如果您只想知道对象标识，可以使用 is -- 这会更快。 - Sven Marnach

2

你是否经常重新创建存储在矩阵中的对象实例？如果不是，你可以为它们分配一系列整数，将它们全部放入带有相应索引的列表中，然后就能够将识别索引存储在NumPy数组中并将其用作索引。 - 9000

1

@Adam：我刚在不同版本的Python中进行了一些测试。我定义了几个类（其中没有一个实现了 __eq __()），并计算了像 a == b，a is b 和 a is a 的时间。对我而言，is 比 == 速度快大约25％。我无法想象任何理由会导致自定义类的实例反过来操作。这两个操作符之间唯一的区别是 == 首先在实例字典中查找 __eq __()，然后回退到与 is 执行相同的操作，而查找需要一些时间。 - Sven Marnach

1

在networkx.dijkstra_path()算法中，实际上有一个加权无向网络的截止参数。它还计算路径以及路径长度，因此可能比您需要的内存开销更大。修改该代码以不存储路径将非常简单。同时，networkx.dijkstra_predecessor_and_distance()也可能对您有趣，因为它跟踪了最短路径中的邻居（前任）。 - Aric

显示剩余11条评论

5个回答

6

你有考虑过使用 Pyrex / Cython 吗？

它可以自动将Python编译为C并转换为.pyd文件，因此可以在不需要太多工作的情况下大大加快速度。

- Macke

4

就性能而言，我认为你的代码没有问题（不考虑算法），只是遭受了大量迭代次数的影响。你的部分代码被执行了4000万次！

注意到80%的时间都花费在了20%的代码上——这13行代码被执行了2400万次以上。顺便提一下，这段代码很好地诠释了帕累托原理（或者说“20%的啤酒饮用者喝掉了80%的啤酒”）。

首要任务：你试过Psycho吗？它是一个JIT编译器，可以极大地加速你的代码——考虑到大量的迭代次数——比如说4-5倍的速度，你所需要做的就是（当然，在下载和安装之后）在开头插入这段代码：

import psyco
psyco.full()

这就是为什么我喜欢《Psycho》并在GCJ中使用它的原因，时间至关重要——没有需要编写的内容，也不会出错，只需添加两行代码即可获得突然的提升。

回到挑剔的问题上（例如将 == 替换为 is 等微小的更改，因为可以提高少量的执行时间）。这里有13行“有问题”的代码：

Line    #   Hits    Time    Per Hit % Time  Line Contents
412 42350234    197075504439    4653.5  8.1 for node_c, (distance_b_c, node_after_b) in node_b_distances.items(): # Can't use iteritems() here, as deleting from the dictionary
386 42076729    184216680432    4378.1  7.6 for node_c, (distance_a_c, node_after_a) in self.node_distances[node_a].iteritems():
362 41756882    183992040153    4406.3  7.6 for node_c, (distance_b_c, node_after_b) in node_b_distances.iteritems(): # Think it's ok to modify items while iterating over them (just not insert/delete) (seems to work ok)
413 41838114    180297579789    4309.4  7.4 if(distance_b_c > cutoff_distance):
363 41244762    172425596985    4180.5  7.1 if(node_after_b == node_a):
389 41564609    172040284089    4139.1  7.1 if(node_c == node_b): # a-b path
388 41564609    171150289218    4117.7  7.1 node_b_update = False
391 41052489    169406668962    4126.6  7   elif(node_after_a == node_b): # a-b-a-b path
405 41564609    164585250189    3959.7  6.8 if node_b_update:
394 24004846    103404357180    4307.6  4.3 (distance_b_c, node_after_b) = node_b_distances[node_c]
395 24004846    102717271836    4279    4.2 if(node_after_b != node_a): # b doesn't already go to a first
393 24801082    101577038778    4095.7  4.2 elif(node_c in node_b_distances): # b can already get to c

A) 除了你提到的代码行，我注意到 #388 在平凡时拥有相对较高的时间，它所做的只是 node_b_update = False。哦，等等——每次执行时，False 都会在全局作用域中查找！为了避免这种情况，在方法开始时分配 F, T = False, True 并将后续使用的 False 和 True 替换为局部变量 F 和 T。这应该会降低总体时间，尽管只有一点点（3%？）。

B) 我注意到 #389 中的条件仅出现了 512,120 次（基于 #390 的执行次数），而 #391 中的条件则出现了 16,251,407 次。由于没有依赖关系，反转这些检查的顺序是有意义的——因为早期的“削减”应该会给予一点提升（2%？）。我不确定完全避免使用 pass 语句是否有帮助，但如果它不影响可读性：

if (node_after_a is not node_b) and (node_c is not node_b):
   # neither a-b-a-b nor a-b path
   if (node_c in node_b_distances): # b can already get to c
       (distance_b_c, node_after_b) = node_b_distances[node_c]
       if (node_after_b is not node_a): # b doesn't already go to a first
           distance_b_a_c = neighbour_distance_b_a + distance_a_c
           if (distance_b_a_c < distance_b_c): # quicker to go via a
               node_b_update = T
   else: # b can't already get to c
       distance_b_a_c = neighbour_distance_b_a + distance_a_c
       if (distance_b_a_c < cutoff_distance): # not too for to go
           node_b_update = T

C) 我刚注意到你在一个案例中（＃365-367），使用了try-except，而你只需要从字典中获取默认值 - 可以尝试使用.get(key, defaultValue)或创建带有collections.defaultdict(itertools.repeat(float('+inf')))的字典。使用try-except是有代价的 - 见＃365报告3.5％的时间，那就是设置堆栈框架等。

D) 尽可能避免索引访问（无论是用 obj.field 还是 obj[idx]）。例如，我发现你在多个地方（＃336、339、346、366、386）使用self.node_distances[node_a]，这意味着每次使用都要使用两次索引（一次用于.，一次用于[]） - 当执行数千万次时，这会变得很昂贵。在方法开始处似乎可以这样做：node_a_distances = self.node_distances[node_a]，然后进一步使用它。

- Nas Banov

是的，我尝试过Psyco。它没有帮助我的代码，但感谢你提到它。我正在运行2.7版本，因为它有一个更好的哈希函数，用于在字典中存储对象实例（详见我的先前问题）。 (A) 这真的很有趣 - 有些在某些语言中超级高效的东西（分配文字）在其他语言中变得很慢。 (B) 这是好东西。我进一步采纳了你的建议（请参见我的答案 - 我本来想将其发布为更新，但SO不允许我这样做！）。 - Adam Nellis

@Adam Nellis：啊，但是事实证明True和False并不是字面常量！ :) 我必须进行一些测试来说服自己，比如 False, True = True, False 和 def f(): return False 然后 print f(); False = 7; print f(); del False; print f()。 - Nas Banov

@Adam：我很惊讶你没有从使用psyco中获得任何显著的速度提升。也许psyco和@profile不兼容。请参见我上面添加的项目（c）和（d）。 - Nas Banov

4

这需要相当多的工作，但是你可以考虑在GPU上使用Floyd-Warshall算法。已经有很多关于如何让Floyd-Warshall在GPU上高效运行的工作了。快速的谷歌搜索结果如下： http://cvit.iiit.ac.in/papers/Pawan07accelerating.pdf http://my.safaribooksonline.com/book/programming/graphics/9780321545411/gpu-computing-for-protein-structure-prediction/ch43lev1sec2#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODAzMjE1NDU0MTEvNDg3 http://www.gpucomputing.net/?q=node/1203 http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter43.html 虽然在Python中实现的Floyd-Warshall比较慢，但是在强大的GPU上运行良好的版本可能仍然显著优于你的新Python代码。

这里有一个轶事。我有一段简短、简单、计算密集型的代码，它做了类似于霍夫累积的事情。在Python中，尽我所能地优化它，它需要约7秒才能在快速的i7上运行。然后我编写了一个完全没有优化过的GPU版本；它在Nvidia GTX 480上只需要约0.002秒。你可能会有不同的经验，但对于任何明显并行的任务，GPU很可能是长期的赢家，并且由于它是一个经过充分研究的算法，你应该能够利用现有的高度调试的代码。

对于Python/GPU桥接，我建议使用PyCUDA或PyOpenCL。

- Josh Bleecher Snyder

1

我本来想把这个作为我的问题的更新发布，但是 Stack Overflow 只允许在问题中使用 30000 个字符，所以我把它作为答案发布。

更新：到目前为止，我最好的优化方案

我采纳了大家的建议，现在我的代码运行速度比之前快了约 21%，这很不错 - 谢谢大家！

这是我迄今为止做到的最好的优化。我用 is 替换了所有的 == 测试节点，禁用了垃圾回收，并按照 @Nas Banov 的建议重新编写了第 388 行的大型 if 语句部分。我添加了众所周知的 try/except 技巧来避免测试（第 390 行 - 删除测试 node_c in node_b_distances），这非常有帮助，因为它几乎从不抛出异常。我尝试交换第 391 和第 392 行，并将 node_b_distances[node_c] 分配给一个变量，但这种方式是最快的。

然而，我仍然没有找到内存泄漏的原因（请参见我的问题中的图表）。但我认为这可能是在我的代码的另一个部分（我没有在这里发布）中。如果我能够修复内存泄漏，那么这个程序将运行得足够快，让我可以使用它 :)

Timer unit: 3.33366e-10 s
File: routing_distances.py
Function: propagate_distances_node at line 328
Total time: 760.74 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
328                                               @profile
329                                               def propagate_distances_node(self, node_a, cutoff_distance=200):
330                                                       
331                                                   # a makes sure its immediate neighbours are correctly in its distance table
332                                                   # because its immediate neighbours may change as binds/folding change
333    791349   4158169713   5254.5      0.2          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems():
334    550522   2331886050   4235.8      0.1              use_neighbour_link = False
335                                                       
336    550522   2935995237   5333.1      0.1              if(node_b not in self.node_distances[node_a]): # a doesn't know distance to b
337     15931     68829156   4320.5      0.0                  use_neighbour_link = True
338                                                       else: # a does know distance to b
339    534591   2728134153   5103.2      0.1                  (node_distance_b_a, next_node) = self.node_distances[node_a][node_b]
340    534591   2376374859   4445.2      0.1                  if(node_distance_b_a > neighbour_distance_b_a): # neighbour distance is shorter
341        78       347355   4453.3      0.0                      use_neighbour_link = True
342    534513   3145889079   5885.5      0.1                  elif((None is next_node) and (float('+inf') == neighbour_distance_b_a)): # direct route that has just broken
343        74       327600   4427.0      0.0                      use_neighbour_link = True
344                                                               
345    550522   2414669022   4386.1      0.1              if(use_neighbour_link):
346     16083     81850626   5089.3      0.0                  self.node_distances[node_a][node_b] = (neighbour_distance_b_a, None)
347     16083     87064200   5413.4      0.0                  self.nodes_changed.add(node_a)
348                                                           
349                                                           ## Affinity distances update
350     16083     86580603   5383.4      0.0                  if((node_a.type == Atom.BINDING_SITE) and (node_b.type == Atom.BINDING_SITE)):
351       234      6656868  28448.2      0.0                      self.add_affinityDistance(node_a, node_b, self.chemistry.affinity(node_a.data, node_b.data))     
352                                                   
353                                                   # a sends its table to all its immediate neighbours
354    791349   4034651958   5098.4      0.2          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems():
355    550522   2392248546   4345.4      0.1              node_b_changed = False
356                                               
357                                                       # b integrates a's distance table with its own
358    550522   2520330696   4578.1      0.1              node_b_chemical = node_b.chemical
359    550522   2734341975   4966.8      0.1              node_b_distances = node_b_chemical.node_distances[node_b]
360                                                       
361                                                       # For all b's routes (to c) that go to a first, update their distances
362  46679347 222161837193   4759.3      9.7              for node_c, (distance_b_c, node_after_b) in node_b_distances.iteritems(): # Think it's ok to modify items while iterating over them (just not insert/delete) (seems to work ok)
363  46128825 211963639122   4595.0      9.3                  if(node_after_b is node_a):
364                                                               
365  18677439  79225517916   4241.8      3.5                      try:
366  18677439 101527287264   5435.8      4.4                          distance_b_a_c = neighbour_distance_b_a + self.node_distances[node_a][node_c][0]
367    181510    985441680   5429.1      0.0                      except KeyError:
368    181510   1166118921   6424.5      0.1                          distance_b_a_c = float('+inf')
369                                                                   
370  18677439  89626381965   4798.6      3.9                      if(distance_b_c != distance_b_a_c): # a's distance to c has changed
371    692131   3352970709   4844.4      0.1                          node_b_distances[node_c] = (distance_b_a_c, node_a)
372    692131   3066946866   4431.2      0.1                          node_b_changed = True
373                                                                   
374                                                                   ## Affinity distances update
375    692131   3808548270   5502.6      0.2                          if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)):
376     96794   1655818011  17106.6      0.1                              node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data))
377                                                                   
378                                                               # If distance got longer, then ask b's neighbours to update
379                                                               ## TODO: document this!
380  18677439  88838493705   4756.5      3.9                      if(distance_b_a_c > distance_b_c):
381                                                                   #for (node, neighbour_distance) in node_b_chemical.neighbours[node_b].iteritems():
382   1656796   7949850642   4798.3      0.3                          for node in node_b_chemical.neighbours[node_b]:
383   1172486   6307264854   5379.4      0.3                              node.chemical.nodes_changed.add(node)
384                                                       
385                                                       # Look for routes from a to c that are quicker than ones b knows already
386  46999631 227198060532   4834.0     10.0              for node_c, (distance_a_c, node_after_a) in self.node_distances[node_a].iteritems():
387                                                           
388  46449109 218024862372   4693.8      9.6                  if((node_after_a is not node_b) and # not a-b-a-b path
389  28049321 126269403795   4501.7      5.5                     (node_c is not node_b)):         # not a-b path
390  27768341 121588366824   4378.7      5.3                      try: # Assume node_c in node_b_distances ('try' block will raise KeyError if not)
391  27768341 159413637753   5740.8      7.0                          if((node_b_distances[node_c][1] is not node_a) and # b doesn't already go to a first
392   8462467  51890478453   6131.8      2.3                             ((neighbour_distance_b_a + distance_a_c) < node_b_distances[node_c][0])):
393                                                               
394                                                                       # Found a route
395    224593   1168129548   5201.1      0.1                              node_b_distances[node_c] = (neighbour_distance_b_a + distance_a_c, node_a)
396                                                                       ## Affinity distances update
397    224593   1274631354   5675.3      0.1                              if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)):
398     32108    551523249  17177.1      0.0                                  node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data))
399    224593   1165878108   5191.1      0.1                              node_b_changed = True
400                                                                       
401    809945   4449080808   5493.1      0.2                      except KeyError:
402                                                                   # b can't already get to c (node_c not in node_b_distances)
403    809945   4208032422   5195.5      0.2                          if((neighbour_distance_b_a + distance_a_c) < cutoff_distance): # not too for to go
404                                                                       
405                                                                       # These lines of code copied, for efficiency 
406                                                                       #  (most of the time, the 'try' block succeeds, so don't bother testing for (node_c in node_b_distances))
407                                                                       # Found a route
408    587726   3162939543   5381.7      0.1                              node_b_distances[node_c] = (neighbour_distance_b_a + distance_a_c, node_a)
409                                                                       ## Affinity distances update
410    587726   3363869061   5723.5      0.1                              if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)):
411     71659   1258910784  17568.1      0.1                                  node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data))
412    587726   2706161481   4604.5      0.1                              node_b_changed = True
413                                                                   
414                                                               
415                                                       
416                                                       # If any of node b's rows have exceeded the cutoff distance, then remove them
417  47267073 239847142446   5074.3     10.5              for node_c, (distance_b_c, node_after_b) in node_b_distances.items(): # Can't use iteritems() here, as deleting from the dictionary
418  46716551 242694352980   5195.0     10.6                  if(distance_b_c > cutoff_distance):
419    200755    967443975   4819.0      0.0                      del node_b_distances[node_c]
420    200755    930470616   4634.9      0.0                      node_b_changed = True
421                                                               
422                                                               ## Affinity distances update
423    200755   4717125063  23496.9      0.2                      node_b_chemical.del_affinityDistance(node_b, node_c)
424                                                       
425                                                       # If we've modified node_b's distance table, tell its chemical to update accordingly
426    550522   2684634615   4876.5      0.1              if(node_b_changed):
427    235034   1383213780   5885.2      0.1                  node_b_chemical.nodes_changed.add(node_b)
428                                                   
429                                                   # Remove any neighbours that have infinite distance (have just unbound)
430                                                   ## TODO: not sure what difference it makes to do this here rather than above (after updating self.node_distances for neighbours)
431                                                   ##       but doing it above seems to break the walker's movement
432    791349   4367879451   5519.5      0.2          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].items(): # Can't use iteritems() here, as deleting from the dictionary
433    550522   2968919613   5392.9      0.1              if(neighbour_distance_b_a > cutoff_distance):
434       148       775638   5240.8      0.0                  del self.neighbours[node_a][node_b]
435                                                           
436                                                           ## Affinity distances update
437       148      2096343  14164.5      0.0                  self.del_affinityDistance(node_a, node_b)

- Adam Nellis

你尝试过Python 3.2吗？我测试了通过字典迭代，使用3.2版本的 "for ... in x.items()" 比 "for ... in x.iteritems()" 快了约30%，比 "for ... x.items()" 更快。 - casevh

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Paulo Scardine · Accepted Answer

node_after_b == node_a会尝试调用node_after_b.__eq__(node_a)：

>>> class B(object):
...     def __eq__(self, other):
...         print "B.__eq__()"
...         return False
... 
>>> class A(object):
...     def __eq__(self, other):
...         print "A.__eq__()"
...         return False
... 
>>> a = A()
>>> b = B()
>>> a == b
A.__eq__()
False
>>> b == a
B.__eq__()
False
>>>

在转向C之前，尝试使用优化版本覆盖Node.__eq__()。

更新

我进行了这个小实验（Python 2.6.6）：

#!/usr/bin/env python
# test.py
class A(object):
    def __init__(self, id):
        self.id = id

class B(A):
    def __eq__(self, other):
        return self.id == other.id

@profile
def main():
    list_a = []
    list_b = []
    for x in range(100000):
        list_a.append(A(x))
        list_b.append(B(x))

    ob_a = A(1)
    ob_b = B(1)
    for ob in list_a:
        if ob == ob_a:
            x = True
        if ob is ob_a:
            x = True
        if ob.id == ob_a.id:
            x = True
        if ob.id == 1:
            x = True
    for ob in list_b:
        if ob == ob_b:
            x = True
        if ob is ob_b:
            x = True
        if ob.id == ob_b.id:
            x = True
        if ob.id == 1:
            x = True

if __name__ == '__main__':
    main()

结果：

Timer unit: 1e-06 s

File: test.py Function: main at line 10 Total time: 5.52964 s

Line #      Hits         Time  Per Hit % Time  Line Contents
==============================================================
    10                                           @profile
    11                                           def main():
    12         1            5      5.0      0.0      list_a = []
    13         1            3      3.0      0.0      list_b = []
    14    100001       360677      3.6      6.5      for x in range(100000):
    15    100000       763593      7.6     13.8          list_a.append(A(x))
    16    100000       924822      9.2     16.7          list_b.append(B(x))
    17
    18         1           14     14.0      0.0      ob_a = A(1)
    19         1            5      5.0      0.0      ob_b = B(1)
    20    100001       500454      5.0      9.1      for ob in list_a:
    21    100000       267252      2.7      4.8          if ob == ob_a:
    22                                                       x = True
    23    100000       259075      2.6      4.7          if ob is ob_a:
    24                                                       x = True
    25    100000       539683      5.4      9.8          if ob.id == ob_a.id:
    26         1            3      3.0      0.0              x = True
    27    100000       271519      2.7      4.9          if ob.id == 1:
    28         1            3      3.0      0.0              x = True
    29    100001       296736      3.0      5.4      for ob in list_b:
    30    100000       472204      4.7      8.5          if ob == ob_b:
    31         1            4      4.0      0.0              x = True
    32    100000       283165      2.8      5.1          if ob is ob_b:
    33                                                       x = True
    34    100000       298839      3.0      5.4          if ob.id == ob_b.id:
    35         1            3      3.0      0.0              x = True
    36    100000       291576      2.9      5.3          if ob.id == 1:
    37         1            3      3.0      0.0              x = True

我很惊讶：

点运算符（ob.property）似乎非常昂贵（第25行与第27行相比）。
对于简单对象，is 和 '==' 没有太大差别。

然后我尝试使用更复杂的对象，结果与第一个实验一致。

你是否频繁进行交换操作？如果你的数据集太大而无法适应可用的RAM，我猜你可能会遇到某种与虚拟内存获取相关的I/O争用。

你正在运行Linux吗？如果是这样，请在运行程序时发布您机器上的vmstat。发送类似以下内容的输出：

vmstat 10 100

祝你好运！

更新（来自OP的评论）：

我建议尝试使用sys.setcheckinterval并启用/禁用GC。其原理是对于这种特殊情况（大量实例），默认的GC引用计数检查有些昂贵，并且其默认间隔太长。

是的，我之前尝试过 sys.setcheckinterval。我将它从默认值100改为了 1000，但没有任何明显的差异。禁用垃圾回收有所帮助 - 谢谢。这是目前为止最大的加速 - 节省了约20%的时间（整个运行需要171分钟，减少到135分钟）- 我不确定误差范围是多少，但肯定是统计学上显著的增加。- Adam Nellis 2月9日15:10

我的猜测：

我认为Python的GC是基于引用计数的。它会时不时地检查每个实例的引用计数；由于你正在遍历这些巨大的内存结构，在你的特定情况下，GC的默认频率（1000个周期？）太低了 - 是一种巨大的浪费。- Yours Truly 2月10日2:06