Use the resource module to limit the resources available to your process and its child processes, e.g. resource.RLIMIT_AS (or RLIMIT_DATA, RLIMIT_STACK), for example with a context manager that restores the limit to its previous value automatically:

import contextlib
import resource
@contextlib.contextmanager
def limit(limit, type=resource.RLIMIT_AS):
    soft_limit, hard_limit = resource.getrlimit(type)
    resource.setrlimit(type, (limit, hard_limit))  # set soft limit
    try:
        yield
    finally:
        resource.setrlimit(type, (soft_limit, hard_limit))  # restore

with limit(1 << 30):  # 1GB
    # do the thing that might try to consume all memory
    pass
If the limit is reached, a MemoryError will be raised.
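For instance, a minimal sketch (assuming a Unix system where the resource module is available; the oversized bytearray allocation and the figures are only stand-ins for a decompression call that produces too much output):

try:
    with limit(1 << 30):                 # 1 GB address-space cap, as above
        data = bytearray(2 * 1024 ** 3)  # pretend decompression needs 2 GB
except MemoryError:
    print("refusing to continue: memory limit exceeded")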
#!/usr/bin/python
import sys
import zlib
f = open(sys.argv[1], "rb")
z = zlib.decompressobj(15+16)     # 15+16 tells zlib to expect a gzip header
total = 0
while True:
    buf = z.unconsumed_tail
    if buf == "":
        buf = f.read(1024)        # read more input only when needed
        if buf == "":
            break
    got = z.decompress(buf, 4096) # never produce more than 4096 bytes per call
    if got == "":
        break
    total += len(got)
print total
if z.unused_data != "" or f.read(1024) != "":
    print "warning: more input after end of gzip stream"
That will slightly overestimate the space needed when all of the files in the tar archive are extracted, because the total includes the tar directory information as well as the file contents.

The gzip.py code does not control the amount of decompressed data except by virtue of the size of the input data. gzip.py reads 1024 compressed bytes at a time, so you can use it if you are ok with up to 1,056,768 bytes of memory being used for the uncompressed data (1032 * 1024, where 1032:1 is the maximum compression ratio of deflate). The solution here instead uses zlib's decompress with a second argument, which limits the amount of uncompressed data produced per call. gzip.py does not have that feature.
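A minimal, self-contained sketch of what that second argument does (separate from the script above): decompressobj().decompress(data, max_length) returns at most max_length bytes and leaves the input it did not consume in unconsumed_tail.

import zlib

bomb = zlib.compress(b"\0" * 1000000)   # ~1 MB of zeros shrinks to roughly 1 KB
d = zlib.decompressobj()
out = d.decompress(bomb, 4096)          # no more than 4096 bytes come back,
                                        # no matter how much the input expands
print(len(out))
print(len(d.unconsumed_tail))           # the rest of the input, still compressed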
This will determine the total size of the extracted tar entries exactly, by decoding the tar format:
#!/usr/bin/python
import sys
import zlib

def decompn(f, z, n):
    """Return n uncompressed bytes, or fewer if at the end of the compressed
    stream. This only decompresses as much as necessary, in order to
    avoid excessive memory usage for highly compressed input.
    """
    blk = ""
    while len(blk) < n:
        buf = z.unconsumed_tail
        if buf == "":
            buf = f.read(1024)
        got = z.decompress(buf, n - len(blk))
        blk += got
        if got == "":
            break
    return blk

f = open(sys.argv[1], "rb")
z = zlib.decompressobj(15+16)
total = 0
left = 0
while True:
    blk = decompn(f, z, 512)        # tar headers and data come in 512-byte blocks
    if len(blk) < 512:
        break
    if left == 0:
        if blk == "\0"*512:
            continue
        # typeflags 1-6 (links, devices, directories, FIFOs) carry no data blocks
        if blk[156] in ["1", "2", "3", "4", "5", "6"]:
            continue
        # size field at offset 124: base-256 if the high bit is set, octal otherwise
        if ord(blk[124]) == 0x80:
            size = 0
            for i in range(125, 136):
                size <<= 8
                size += ord(blk[i])
        else:
            size = int(blk[124:136].split()[0].split("\0")[0], 8)
        # extended/pax/GNU-longname headers occupy blocks but are not counted
        if blk[156] not in ["x", "g", "X", "L", "K"]:
            total += size
        left = (size + 511) // 512
    else:
        left -= 1
print total
if blk != "":
    print "warning: partial final block"
if left != 0:
    print "warning: tar file ended in the middle of an entry"
if z.unused_data != "" or f.read(1024) != "":
    print "warning: more input after end of gzip stream"
… the output of bz2.decompress, which can be done even without the second argument. The lack of a way in the Python interface to limit the size of the decompressed output is a fundamental flaw.

… a .tar.gz file, the result is a 10 MB file. Running your code with ulimit -v 200000 set fails, so it uses far more memory than the 10 MB of input and is therefore vulnerable to zip bombs. – Joachim Breitner

… z.decompress is safe, but it is not for bzip2 (unless I am missing something), and because of a flaw in the bzip library API it cannot easily be adopted there. Besides, it seems more complex and error-prone than the code in my solution. – Joachim Breitner

class SafeUncompressor(object):
"""Small proxy class that enables external file object
support for uncompressed, bzip2 and gzip files. Works transparently, and
supports a maximum size to avoid zipbombs.
"""
blocksize = 16 * 1024
class FileTooLarge(Exception):
pass
def __init__(self, fileobj, maxsize=10*1024*1024):
self.fileobj = fileobj
self.name = getattr(self.fileobj, "name", None)
self.maxsize = maxsize
self.init()
def init(self):
import bz2
import gzip
self.pos = 0
self.fileobj.seek(0)
self.buf = ""
self.format = "plain"
magic = self.fileobj.read(2)
if magic == '\037\213':
self.format = "gzip"
self.gzipobj = gzip.GzipFile(fileobj = self.fileobj, mode = 'r')
elif magic == 'BZ':
raise IOError, "bzip2 support in SafeUncompressor disabled, as self.bz2obj.decompress is not safe"
self.format = "bz2"
self.bz2obj = bz2.BZ2Decompressor()
self.fileobj.seek(0)
def read(self, size):
b = [self.buf]
x = len(self.buf)
while x < size:
if self.format == 'gzip':
data = self.gzipobj.read(self.blocksize)
if not data:
break
elif self.format == 'bz2':
raw = self.fileobj.read(self.blocksize)
if not raw:
break
# this can already bomb here, to some extend.
# so disable bzip support until resolved.
# Also monitor https://dev59.com/QmYr5IYBdhLWcg3wg6b9 for ideas
data = self.bz2obj.decompress(raw)
else:
data = self.fileobj.read(self.blocksize)
if not data:
break
b.append(data)
x += len(data)
if self.pos + x > self.maxsize:
self.buf = ""
self.pos = 0
raise SafeUncompressor.FileTooLarge, "Compressed file too large"
self.buf = "".join(b)
buf = self.buf[:size]
self.buf = self.buf[size:]
self.pos += len(buf)
return buf
def seek(self, pos, whence=0):
if whence != 0:
raise IOError, "SafeUncompressor only supports whence=0"
if pos < self.pos:
self.init()
self.read(pos - self.pos)
def tell(self):
return self.pos
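A usage sketch (the file name, the 1 MB limit and the chunk handling are placeholders, not part of the class):

f = SafeUncompressor(open("upload.gz", "rb"), maxsize=1024 * 1024)
try:
    while True:
        chunk = f.read(SafeUncompressor.blocksize)
        if not chunk:
            break
        # ... hand chunk to whatever consumes the data ...
except SafeUncompressor.FileTooLarge:
    print("rejected: uncompressed data exceeds maxsize")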
It does not work well for bzip2, though, so that part of the code is disabled. The reason is that a single call to bz2.BZ2Decompressor.decompress can already produce an unwantedly large chunk of data.
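For reference, Python 3.5 and later added a max_length argument to bz2.BZ2Decompressor.decompress, which makes the same output-capping pattern possible for bzip2 as well; a minimal sketch assuming such a Python version:

import bz2

bomb = bz2.compress(b"\0" * 1000000)         # ~1 MB of zeros, tiny once compressed
d = bz2.BZ2Decompressor()
out = d.decompress(bomb, max_length=4096)    # Python 3.5+: at most 4096 bytes per call
print(len(out))
# the remainder stays buffered inside d; keep calling
# d.decompress(b"", max_length=4096) to drain it in bounded pieces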
If you develop for Linux, you can run the decompression in a separate process and use ulimit to limit its memory usage.

import subprocess
subprocess.Popen("ulimit -v %d; ./decompression_script.py %s" % (LIMIT, FILE),
                 shell=True)
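The same effect can be had without going through a shell by installing the limit in the child process via preexec_fn and resource.setrlimit (a sketch; the script name and the numbers are placeholders, and note the limit is given in bytes here, whereas ulimit -v takes kilobytes):

import resource
import subprocess

LIMIT = 200 * 1024 * 1024        # 200 MB, in bytes
FILE = "upload.gz"               # hypothetical input file

def set_memory_limit():
    # runs in the child just before exec, so the parent is unaffected
    resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

subprocess.check_call(["./decompression_script.py", FILE],
                      preexec_fn=set_memory_limit)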
I also needed to handle zip bombs in uploaded zip files.

I do this by creating a fixed-size tmpfs and unzipping into it. If the extracted data is too large, the tmpfs runs out of space and the extraction fails with an error.

Here are the Linux commands to create a 200M tmpfs to unzip into:

sudo mkdir -p /mnt/ziptmpfs
echo 'tmpfs /mnt/ziptmpfs tmpfs rw,nodev,nosuid,size=200M 0 0' | sudo tee -a /etc/fstab
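Once the filesystem is mounted (sudo mount /mnt/ziptmpfs), extracting into it from Python might look like the sketch below; upload.zip is a placeholder, and the write simply fails with "No space left on device" when the archive expands past the 200M:

import zipfile

try:
    with zipfile.ZipFile("upload.zip") as zf:
        zf.extractall("/mnt/ziptmpfs")   # writes fail once the 200M tmpfs is full
except (IOError, OSError):
    print("rejected: archive expands past the tmpfs size")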