How do I use `setrlimit` to limit memory usage? RLIMIT_AS kills the process too soon; RLIMIT_DATA, RLIMIT_RSS, and RLIMIT_STACK do not kill it at all.

Question

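I am trying to use setrlimit to limit my memory usage on a Linux system, so that my process cannot crash the machine (my code was crashing nodes on a high-performance cluster because a bug led to memory consumption in excess of 100 GiB). I cannot seem to find the correct resource to pass to setrlimit. I think it should be resident memory, which cannot be limited with setrlimit, but I am confused by the distinction between resident, heap, and stack. If I uncomment only RLIMIT_AS in the code below, it fails with MemoryError at numpy.arange(1000*1000*10, dtype="u8"), even though that array is only 80 MB. If I uncomment only RLIMIT_DATA, RLIMIT_RSS, or RLIMIT_STACK, every array is allocated successfully, even though total memory usage is then about 2 GB, twice the desired maximum.

I would like my program to fail (no matter how) as soon as it tries to allocate too much memory. Why do none of RLIMIT_DATA, RLIMIT_RSS, RLIMIT_STACK, and RLIMIT_AS do what I mean, and what is the correct resource to pass to setrlimit?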

$ cat mwe.py 
#!/usr/bin/env python3.5

import resource
import numpy

#rsrc = resource.RLIMIT_AS  # uncomment exactly one of these four lines
#rsrc = resource.RLIMIT_DATA
#rsrc = resource.RLIMIT_RSS
#rsrc = resource.RLIMIT_STACK

soft, hard = resource.getrlimit(rsrc)
print("Limit starts as:", soft, hard)

resource.setrlimit(rsrc, (int(1e9), hard))  # 1 GB soft limit; hard limit left unchanged

soft, hard = resource.getrlimit(rsrc)
print("Limit is now:", soft, hard)
print("Allocating 80 KB, should certainly work")
M1 = numpy.arange(100*100, dtype="u8")

print("Allocating 80 MB, should work")
M2 = numpy.arange(1000*1000*10, dtype="u8")

print("Allocating 2 GB, should fail")
M3 = numpy.arange(1000*1000*250, dtype="u8")

input("Still here…")

Output with the RLIMIT_AS line uncommented:

$ ./mwe.py 
Limit starts as: -1 -1
Limit is now: 1000000000 -1
Allocating 80 KB, should certainly work
Allocating 80 MB, should work
Traceback (most recent call last):
  File "./mwe.py", line 22, in <module>
    M2 = numpy.arange(1000*1000*10, dtype="u8")
MemoryError

Output when I uncomment any one of the other lines instead:

$ ./mwe.py 
Limit starts as: -1 -1
Limit is now: 1000000000 -1
Allocating 80 KB, should certainly work
Allocating 80 MB, should work
Allocating 2 GB, should fail
Still here…

At the last line, top reports that my process is using 379 GB of VIRT and 2.0 GB of RES.

System details:

$ uname -a
Linux host.somewhere.ac.uk 2.6.32-573.3.1.el6.x86_64 #1 SMP Mon Aug 10 09:44:54 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 6.7 (Santiago)

$ free -h
             total       used       free     shared    buffers     cached
Mem:          2.0T       1.9T        37G       1.6G       3.4G       1.8T
-/+ buffers/cache:        88G       1.9T 
Swap:         464G       4.8M       464G 

$ python3.5 --version
Python 3.5.0

$ python3.5 -c "import numpy; print(numpy.__version__)"
1.11.1

- gerrit

Possible duplicate of "Setting stacksize in a python script". - jww

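Trying to set rlimit_stack after the Stack Clash remediations may result in failure or related problems. Also see Red Hat Issue 1463241. - jww

I also ran your script and it works fine. The problem must be in your individual configuration. - user1038445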

1 Answer

- Ilia Barahovsky · Accepted Answer

Sorry, I cannot answer your question directly, but I hope the following helps:

Your script works as expected on my system. Please share the exact specs of yours; there might be a known problem with the Linux distro, the kernel, or even numpy...
You should be OK with RLIMIT_AS. As explained here, it should limit the entire virtual memory used by the process, and virtual memory includes everything: swap, shared libraries, code, and data. More details here.
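
To illustrate the RLIMIT_AS behaviour described above, here is a minimal sketch that mirrors the question's script but uses `bytearray` in place of numpy so that it stays self-contained. The names `CHILD_CODE` and `run_limited_child` are made up for this example. The limit is applied in a fresh child interpreter so that it does not affect the calling process. For background, the Linux getrlimit(2) man page states that RLIMIT_RSS has no effect on kernels from 2.4.30 onwards, that RLIMIT_STACK covers only the stack, and that RLIMIT_DATA did not count mmap-based allocations before Linux 4.7, which would be consistent with those three limits not stopping large array allocations on a 2.6.32 kernel.

```python
import subprocess
import sys

# Code run in a fresh interpreter so the limit does not affect the caller.
CHILD_CODE = """
import resource

# Keep the existing hard limit and lower only the soft limit to 1 GB.
_, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (10**9, hard))

small = bytearray(80 * 10**6)      # 80 MB, fits under the limit
try:
    big = bytearray(2 * 10**9)     # 2 GB, exceeds the limit
except MemoryError:
    print("2 GB allocation refused")
else:
    print("2 GB allocation succeeded")
"""


def run_limited_child():
    """Run CHILD_CODE in a separate Python process and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout.strip()
```

On a Linux system where a fresh interpreter starts with far less than 1 GB of virtual memory, `run_limited_child()` should report that the 2 GB allocation was refused while the 80 MB one went through. If the 80 MB step already fails, that suggests the process has mapped much more address space than expected before the limit was set.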

You may add the following function (adapted from this answer) to your script to check actual virtual memory usage at any point:

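def peak_virtual_memory_mb():
    """Return this process's peak virtual memory size (VmPeak) in MiB."""
    with open('/proc/self/status') as f:
        # The relevant line looks like "VmPeak:   123456 kB".
        vmpeak = next(line for line in f if line.startswith("VmPeak:"))
    # The kernel reports the value in KiB (labelled "kB").
    return int(vmpeak.split()[1]) / 1024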
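
If a check like this shows that the process already has a large virtual size before the limit is set (the question mentions top reporting a very large VIRT), a fixed RLIMIT_AS value may be exceeded immediately. One possible workaround, offered only as a sketch and not something confirmed to fix the asker's case, is to set the limit relative to the current `VmSize`. The helper names `read_vm_kib` and `limit_address_space_headroom` are hypothetical; the code assumes Linux with `/proc` mounted.

```python
import resource


def read_vm_kib(field):
    """Return a /proc/self/status field such as 'VmSize' or 'VmPeak', in KiB."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                # Lines look like "VmSize:    123456 kB".
                return int(line.split()[1])
    raise KeyError(field)


def limit_address_space_headroom(headroom_bytes):
    """Set the soft RLIMIT_AS to the current VmSize plus headroom_bytes.

    Returns the previous (soft, hard) pair so the caller can restore it.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    new_soft = read_vm_kib("VmSize") * 1024 + headroom_bytes
    if hard != resource.RLIM_INFINITY:
        # The soft limit may not exceed the hard limit.
        new_soft = min(new_soft, hard)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return old_soft, hard
```

Calling `limit_address_space_headroom(10**9)` early in the script would allow roughly 1 GB of further growth beyond whatever is already mapped. This still limits virtual address space, not resident memory, so it is only an approximation of a cap on RAM.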

One piece of general advice: disable swap. In my experience with high-performance servers, it causes more problems than it solves.