How do I allocate more memory to a Python program? On a machine with 4 GB of RAM, it never uses more than 64 MB.

4
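I have a program written in Python that processes input data on 32-bit Ubuntu 12.04 with 4 GB of RAM. The program's time and space complexity are both O(n). With an input of about 100 kB, it finishes in roughly 4 seconds with a peak RAM usage of 0.5% (as reported by the Linux "top" command). However, with inputs of 500 kB, 2.5 MB, and 16 MB, the process did not finish within an hour (in each case I had to cancel it with Ctrl+C), and memory consumption stayed stuck at 1.6% (about 64 MB in each case). Is there some way to allocate more RAM to this Python process?
Note: I am using Python's "mrjob" library to implement a Map Reduce job.
Here is the log of a successful run with a 100 kB input csv file.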
ankit@ubuntu:~/mrj/mrjo/mrjob/examples$ python mt1.py as.txt > asop.txt
using configs in /home/ankit/.mrjob.conf
creating tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00000
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00001
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00001
Counters from step 1:
  (no counters found)
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper-sorted
> sort /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00000 /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-mapper_part-00001
> /usr/bin/python mt1.py --step-num=0 --reducer /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
> /usr/bin/python mt1.py --step-num=1 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-1-mapper_part-00000
Counters from step 2:
  (no counters found)
Moving /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/step-1-mapper_part-00000 -> /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/output/part-00000
Streaming final output from /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269/output
removing tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.094809.251269

Here is the execution log and traceback for a 2.5 MB input csv file.

ankit@ubuntu:~/mrj/mrjo/mrjob/examples$ python mt1.py matlabsample.csv > matsamop.txt
using configs in /home/ankit/.mrjob.conf
creating tmp directory /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00000
> /usr/bin/python mt1.py --step-num=0 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00001
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00001
Counters from step 1:
  (no counters found)
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper-sorted
> sort /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00000 /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-mapper_part-00001
> /usr/bin/python mt1.py --step-num=0 --reducer /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-0-reducer_part-00000
Counters from step 1:
  (no counters found)
> /usr/bin/python mt1.py --step-num=1 --mapper /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/input_part-00000
writing to /home/ankit/mrj/mrjo/examples/mt1.ankit.20121224.065246.700221/step-1-mapper_part-00000
^CTraceback (most recent call last):


  File "mt1.py", line 311, in <module>
    Motion_Tagging.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 545, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 561, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/job.py", line 631, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/runner.py", line 490, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 193, in _run
    combiner_args=combiner_args)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 488, in _invoke_step
    self._wait_for_process(proc_dict, step_num)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 657, in _wait_for_process
    tb_lines = find_python_traceback(stderr_lines)
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/parse.py", line 171, in find_python_traceback
    for line in lines:
  File "/usr/local/lib/python2.7/dist-packages/mrjob-0.3.5-py2.7.egg/mrjob/local.py", line 680, in _process_stderr_from_script
    for line in stderr:
KeyboardInterrupt

2
I suspect the problem lies in your code; 64 MB is nothing in terms of memory use. - Martijn Pieters
2
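I don't think there is anything wrong with your RAM allocation. In general, the interpreter takes only as much memory as it needs. I don't think your program actually runs in O(n) time. If you post your code, we can take a look. - inspectorG4dget
@inspectorG4dget: Thanks for the reply. Unfortunately, for certain reasons I cannot post the code, but I assure you that its time and space complexity are both O(n). And yes, it is a Map Reduce job. - user1403483
@Martijn Pieters: That is exactly what I thought when I watched it in top. - user1403483
@AnkitAgrawal: Clearly, it stops working somewhere. Add logging output to verify that it is still making progress. The traceback you get when you press CTRL-C should also give a clue about what it is doing. - Martijn Pieters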
4
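If you cannot share the code, then this question is too localized for us to help you, and it will never be useful to anyone else either. For those reasons, I have voted to close it. - Martijn Pieters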
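The logging suggestion in the comments above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: `process_with_heartbeat`, `handle`, and `report_every` are hypothetical names. It writes to stderr on the assumption that stdout is reserved for the job's own output, as it is in a streaming-style Map Reduce script.

```python
import logging
import sys
import time

# Log to stderr so progress messages do not mix with the job's stdout output.
logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                    format="%(asctime)s %(message)s")
log = logging.getLogger("progress")


def process_with_heartbeat(records, handle, report_every=1000):
    """Apply handle() to each record and log progress periodically.

    `handle` and `report_every` are hypothetical names used for illustration.
    Returns the number of records processed.
    """
    started = time.time()
    count = 0
    for count, record in enumerate(records, 1):
        handle(record)
        if count % report_every == 0:
            elapsed = time.time() - started
            log.info("processed %d records in %.1f s", count, elapsed)
    log.info("done: %d records", count)
    return count
```

If the time between successive heartbeat lines keeps growing as the run goes on, the work per record is not constant, which would point at the algorithm rather than at available memory. If the heartbeats stop entirely, the step is stuck.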
2 Answers

1
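You don't "allocate memory to a Python process"; instead, you use larger structures in your Python program. Fundamentally, your algorithm is probably flawed in a way that keeps it from making use of the available memory.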
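To illustrate the point that a Python process takes memory as its data structures grow, rather than being granted a fixed allowance up front, here is a small sketch. It uses the standard-library `tracemalloc` module, which exists only in Python 3.4 and later, so it would not run on the Python 2.7 shown in the question. The function name `peak_bytes_for` and the row shape are assumptions made for the example.

```python
import tracemalloc


def peak_bytes_for(n_rows, n_cols=10):
    """Build a list of lists and return the peak traced allocation in bytes."""
    tracemalloc.start()
    # A list of n_rows inner lists, each holding n_cols distinct floats.
    rows = [[float(i * n_cols + j) for j in range(n_cols)]
            for i in range(n_rows)]
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak
```

Calling `peak_bytes_for(1000)` and then `peak_bytes_for(20000)` should report a much larger peak for the second call, without any setting having been changed. A process that stays at a small, flat memory figure is therefore not being held back by an allocation limit; it simply is not building anything larger.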

2
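The algorithm need not be flawed just because it does not use more of the available memory. It is more likely that the algorithm has entered an infinite loop or some other state in which it does not need any more memory. - phant0m
@Ignacio: The Python script I wrote takes a csv file as input and stores each transaction/tuple as a list inside a main list. The rest of the script performs computations on the elements of those lists. In other words, I work only with the list data structure. Could you elaborate on what you mean by using larger structures? - user1403483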
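Since the code itself was never posted, the following is only a hypothetical sketch of the layout described in the comment above (each csv row stored as a list inside a main list), together with one common way a step that looks linear ends up quadratic on lists. All names here (`make_csv`, `load_rows`, `count_unique_slow`, `count_unique_fast`, `seconds_for`) are invented for the example and say nothing about what the asker's script actually does.

```python
import csv
import io
import time


def make_csv(n):
    """Generate n synthetic two-column csv lines (illustrative data only)."""
    return "\n".join("%d,%d" % (i, i * 2) for i in range(n))


def load_rows(text):
    """Store each csv row as a list inside one main list."""
    return [row for row in csv.reader(io.StringIO(text))]


def count_unique_slow(rows):
    """Looks linear, but `key not in seen` scans a list: O(n^2) overall."""
    seen = []
    for row in rows:
        key = row[0]
        if key not in seen:
            seen.append(key)
    return len(seen)


def count_unique_fast(rows):
    """Same result with a set, whose membership test is O(1) on average."""
    return len({row[0] for row in rows})


def seconds_for(func, rows):
    """Return the wall-clock time of one call to func(rows)."""
    start = time.perf_counter()
    func(rows)
    return time.perf_counter() - start
```

One way to test an O(n) claim empirically is to time the same step on inputs of size n and 2n. A linear step should take roughly twice as long, while a quadratic one takes roughly four times as long, and neither needs noticeably more memory to do so.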

0

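Please note that this is not a code-level solution. However, the link below gives an in-depth look at how Python's memory allocator is implemented and how the problem could be addressed. It also discusses other areas where Python's memory management could be improved. I hope this is useful.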

http://www.evanjones.ca/memoryallocator/


1
No, memory usage is a red herring here. The OP is not suffering from a problem with unreleased memory. - Martijn Pieters
