注意:此问题最初是在这里提出的,但赏金时间已过期,即使没有找到可接受的答案。我重新提出这个问题,包括原始问题中提供的所有细节。
一个Python脚本正在使用sched模块每60秒运行一组类函数:
# sc is a sched.scheduler instance
sc.enter(60, 1, self.doChecks, (sc, False))
脚本正在使用代码here作为守护进程运行。
一些被称为doChecks的类方法使用subprocess模块调用系统函数以获取系统统计信息:
ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]
在一段时间内,这个程序可以正常运行,但最终会崩溃,并显示以下错误信息:
File "/home/admin/sd-agent/checks.py", line 436, in getProcesses
File "/usr/lib/python2.4/subprocess.py", line 533, in __init__
File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles
OSError: [Errno 12] Cannot allocate memory
脚本崩溃后服务器上的 free -m 命令输出为:
$ free -m
total used free shared buffers cached
Mem: 894 345 549 0 0 0
-/+ buffers/cache: 345 549
Swap: 0 0 0
这个服务器正在运行CentOS 5.3。我自己的CentOS盒子上无法复现,也没有其他用户报告同样的问题。
我尝试了原始问题中建议的许多调试方法:
在调用Popen之前,记录free -m的输出。内存使用量没有显著变化,即脚本运行时内存不会逐渐被使用完。
我在Popen调用中添加了close_fds=True,但这并没有改变什么——脚本仍然因为同样的错误而崩溃。建议参见此处和此处。
我检查了rlimits,发现RLIMIT_DATA和RLIMIT_AS都显示为(-1,-1),如此处所建议。
一篇文章认为没有交换空间可能是原因,但实际上根据Web主机的说法,交换空间是按需提供的,并且此处也认为这是一个虚假的原因。
进程被关闭是因为使用.communicate()的行为,这得到了Python源代码和注释的支持此处。
整个检查过程可以在GitHub这里找到,其中getProcesses函数定义在442行。该函数由doChecks()在520行开始调用。
在崩溃之前,使用strace运行脚本并输出如下:
recv(4, "Total Accesses: 516662\nTotal kBy"..., 234, 0) = 234
gettimeofday({1250893252, 887805}, NULL) = 0
write(3, "2009-08-21 17:20:52,887 - checks"..., 91) = 91
gettimeofday({1250893252, 888362}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 74) = 74
gettimeofday({1250893252, 888897}, NULL) = 0
write(3, "2009-08-21 17:20:52,888 - checks"..., 67) = 67
gettimeofday({1250893252, 889184}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 81) = 81
close(4) = 0
gettimeofday({1250893252, 889591}, NULL) = 0
write(3, "2009-08-21 17:20:52,889 - checks"..., 63) = 63
pipe([4, 5]) = 0
pipe([6, 7]) = 0
fcntl64(7, F_GETFD) = 0
fcntl64(7, F_SETFD, FD_CLOEXEC) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)
write(2, "Traceback (most recent call last"..., 35) = 35
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/agent."..., 52) = 52
open("/home/admin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/home/admin/sd-agent/dae"..., 60) = 60
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/agent."..., 54) = 54
open("/usr/lib/python2.4/sched.py", O_RDONLY|O_LARGEFILE) = 8
write(2, " File \"/usr/lib/python2.4/sched"..., 55) = 55
fstat64(8, {st_mode=S_IFREG|0644, st_size=4054, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "\"\"\"A generally useful event sche"..., 4096) = 4054
write(2, " ", 4) = 4
write(2, "void = action(*argument)\n", 25) = 25
close(8) = 0
munmap(0xb7d28000, 4096) = 0
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/checks"..., 60) = 60
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, " File \"/usr/bin/sd-agent/checks"..., 64) = 64
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, " File \"/usr/lib/python2.4/subpr"..., 65) = 65
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n # c2pread <-"..., 4096) = 4096
write(2, " ", 4) = 4
write(2, "errread, errwrite)\n", 19) = 19
close(8) = 0
munmap(0xb7d28000, 4096) = 0
open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
write(2, " File \"/usr/lib/python2.4/subpr"..., 71) = 71
fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
read(8, "lso, the newlines attribute of t"..., 4096) = 4096
read(8, "code < 0:\n print >>sys.st"..., 4096) = 4096
read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
read(8, " p2cread\n # c2pread <-"..., 4096) = 4096
read(8, "table(self, handle):\n "..., 4096) = 4096
read(8, "rrno using _sys_errlist (or siml"..., 4096) = 4096
read(8, " p2cwrite = None, None\n "..., 4096) = 4096
write(2, " ", 4) = 4
write(2, "self.pid = os.fork()\n", 21) = 21
close(8) = 0
munmap(0xb7d28000, 4096) = 0
write(2, "OSError", 7) = 7
write(2, ": ", 2) = 2
write(2, "[Errno 12] Cannot allocate memor"..., 33) = 33
write(2, "\n", 1) = 1
unlink("/var/run/sd-agent.pid") = 0
close(3) = 0
munmap(0xb7e0d000, 4096) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x589978}, {0xb89a60, [], SA_RESTORER, 0x589978}, 8) = 0
brk(0xa022000) = 0xa022000
exit_group(1) = ?
/var/log/messages
或dmesg
命令。 - mark4ofork
一个大型进程如python
,因为它“承诺”会紧随fork
之后立即执行一个更小的进程),或者(3)减少脚本的内存占用,因此fork
就不必代表您请求那么多的内存(追踪内存泄漏和膨胀等),或者**(4)生成辅助进程,使用posix_spawn
/vfork
等**。 - vladr