How do I create a zip file in S3 using Boto3 and Python?

I'm trying to create a zip file from a subfolder in an S3 bucket, and then save that zip file to another subfolder in the same bucket.
When I run my Flask app locally I can create the zip file from the S3 subfolder, but on Heroku it fails, since Heroku doesn't persist anything to disk.
I was looking at this example, but it seems outdated and uses local files: https://www.botreetechnologies.com/blog/create-and-download-zip-file-in-django-via-amazon-s3 Here is the snippet I'm working with.
from flask import Response
import io, os
import boto3, zipfile

AWS_ACCESS_KEY_ID = "some access key"
AWS_SECRET_ACCESS_KEY = "some secret key"
AWS_STORAGE_BUCKET_NAME = "some bucket"

aws_session = boto3.Session(aws_access_key_id = AWS_ACCESS_KEY_ID,
                            aws_secret_access_key = AWS_SECRET_ACCESS_KEY)

s3 = aws_session.client("s3", region_name = "some region")
s3_resource = aws_session.resource("s3")
blog_folder = "blog_1"

paginator = s3.get_paginator("list_objects")

file_list = [page for page in paginator.paginate(Bucket=AWS_STORAGE_BUCKET_NAME)\
             .search("Contents[?Size >`0`][]")
             if blog_folder in page["Key"]]

# build the zip in memory rather than on disk (nothing persists on Heroku)
zip_buffer = io.BytesIO()
zf = zipfile.ZipFile(zip_buffer, "w")

zip_filename = "download_files.zip"

my_bucket = s3_resource.Bucket(AWS_STORAGE_BUCKET_NAME)

for key in file_list:

    file_name = key["Key"].split("/")[-1]

    file_obj = my_bucket.Object(key["Key"]).get()

    zf.writestr(file_name, file_obj["Body"].read())

zf.close()

Any ideas how to fix this? Downloading one zip file is much more convenient for users than downloading individual files. Thanks a lot for your help.

If you can't save local files, the only option is to hold everything in memory and stream it. That is somewhat risky if the files are large. - John Rotenstein
I think I may have solved it by using temporary files on Heroku. For some reason it just works. Really confusing, so it may be unstable! Solution posted below. - MichaelRSF
@MichaelRSF Please post the solution. I'm also trying to create a zip file in S3 with Python. - steve-o
Hi Steve, I just logged in and saw your comment. The code is below, but it hasn't been tested yet. In theory it should work. You can ignore blog = Blog.query.filter_by(id = blog_id).first() - MichaelRSF
2 Answers


Python's in-memory zip facilities work great for this. Here's an example from one of my projects:

import io
import zipfile

# `s3` is a boto3 S3 client; `bucket`, `object_key`, `file_name`,
# `PREFIX` and `zip_name` are defined elsewhere in the project
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "a", zipfile.ZIP_DEFLATED, False) as zipper:
    infile_object = s3.get_object(Bucket=bucket, Key=object_key)
    infile_content = infile_object["Body"].read()
    zipper.writestr(file_name, infile_content)

s3.put_object(Bucket=bucket, Key=PREFIX + zip_name, Body=zip_buffer.getvalue())

This doesn't put anything into S3; you get a 200 response, but the content never appears at the specified location. - zaf187
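When extending the snippet above to many keys, it can help to separate the pure zip-building step from the S3 calls, so the part that can go wrong locally is testable without AWS. A minimal sketch, assuming the same paginator-based listing as the question; `build_zip_bytes` is my own helper name, not from the answer:

```python
import io
import zipfile

def build_zip_bytes(named_blobs):
    """Pack (archive_name, bytes) pairs into an in-memory zip and
    return the raw zip bytes, ready to pass as Body= to put_object."""
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as zipper:
        for name, blob in named_blobs:
            zipper.writestr(name, blob)
    return zip_buffer.getvalue()

# Hypothetical S3 wiring, using the same boto3 calls as the answer above:
# s3 = boto3.client("s3")
# pairs = []
# for page in s3.get_paginator("list_objects_v2").paginate(
#         Bucket=bucket, Prefix="blog_1/resources/"):
#     for obj in page.get("Contents", []):
#         body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
#         pairs.append((obj["Key"].split("/")[-1], body))
# s3.put_object(Bucket=bucket, Key="blog_1/zipped/download_files.zip",
#               Body=build_zip_bytes(pairs))
```

If `put_object` returns 200 but nothing shows up, double-check that the `Key` you write matches the prefix you later list, since S3 has no real folders, only key prefixes.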


I managed to get this working in my Heroku Flask app. Hope it helps anyone who is stuck. PS subfolder = blog folder, so the structure is Bucket / blog_folder / resources and Bucket / blog_folder / zipped

import tempfile, zipfile, os, shutil, boto3
from flask import send_from_directory

AWS_ACCESS_KEY_ID = "some access key"
AWS_SECRET_ACCESS_KEY = "some secret key"
AWS_STORAGE_BUCKET_NAME = "some bucket"
REGION_NAME = "some region"
BLOG_FOLDER = "BLOG_FOLDER"  # local scratch folder for downloaded resources


def make_zipfile(output_filename, source_dir):
    relroot = os.path.abspath(os.path.join(source_dir, os.pardir))
    with zipfile.ZipFile(output_filename, "w", zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(source_dir):
            # add directory (needed for empty dirs)
            zipf.write(root, os.path.relpath(root, relroot))
            for file in files:
                filename = os.path.join(root, file)
                if os.path.isfile(filename):  # regular files only
                    arcname = os.path.join(os.path.relpath(root, relroot), file)
                    zipf.write(filename, arcname)


aws_session = boto3.Session(aws_access_key_id = AWS_ACCESS_KEY_ID,
                            aws_secret_access_key = AWS_SECRET_ACCESS_KEY)

s3 = aws_session.resource("s3")

current_path = os.getcwd()
temp = tempfile.TemporaryDirectory(suffix="_tmp", prefix="basic_", dir=current_path)

### AT TOP OF YOUR APP.PY file ^^^^^^^^^^

@app_blog.route("/download_blog_res_zipfile/<int:blog_id>", methods = ["GET", "POST"])
def download_blog_res_zipfile(blog_id):

    current_path = os.getcwd()

    blog = Blog.query.filter_by(id = blog_id).first()
    blog.download_count += 1
    db.session.commit()

    # clear out zip folders left over from earlier requests
    for folder in os.listdir(current_path + "/BLOG_ZIPPED_FOLDER"):
        shutil.rmtree(current_path + "/BLOG_ZIPPED_FOLDER/" + folder)

    temp_zipp = tempfile.TemporaryDirectory(suffix="_tmp", prefix="zipping_",
                                            dir=current_path + "/BLOG_ZIPPED_FOLDER")

    s3 = boto3.client("s3", region_name = REGION_NAME)
    s3_resource = boto3.resource("s3")
    my_bucket = s3_resource.Bucket(AWS_STORAGE_BUCKET_NAME)

    paginator = s3.get_paginator("list_objects")

    folder = "blogs/blog_{}/resources".format(blog.id)

    file_list = [page for page in paginator.paginate(Bucket = AWS_STORAGE_BUCKET_NAME)\
                 .search("Contents[?Size >`0`][]")
                 if folder in page["Key"]]

    # download each resource into the local scratch folder
    for key in file_list:
        file_name = key["Key"].split("/")[-1]
        file_obj = my_bucket.Object(key["Key"]).get()["Body"]
        with open(current_path + "/" + BLOG_FOLDER + "/" + file_name, "wb") as w:
            w.write(file_obj.read())

    make_zipfile(temp_zipp.name + "/blog_res_{}.zip".format(blog_id),
                 current_path + "/" + BLOG_FOLDER)

    # remove the downloaded copies now that they are zipped
    for key in file_list:
        file_name = key["Key"].split("/")[-1]
        os.remove(current_path + "/" + BLOG_FOLDER + "/" + file_name)

    return send_from_directory(temp_zipp.name, "blog_res_{}.zip".format(blog_id),
                               as_attachment = True)
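The route above streams the zip to the browser but never writes it back to the bucket's `zipped` subfolder, which was part of the original question. A minimal sketch of that last step; the `zipped_key` helper and the exact key layout are my assumptions, not part of the answer:

```python
def zipped_key(blog_id):
    """S3 key for a blog's zip inside the bucket's `zipped` subfolder
    (mirrors the blogs/blog_<id>/resources layout used above)."""
    return "blogs/blog_{}/zipped/blog_res_{}.zip".format(blog_id, blog_id)

# Hypothetical wiring, reusing names from the answer above:
# s3 = boto3.client("s3", region_name = REGION_NAME)
# zip_path = temp_zipp.name + "/blog_res_{}.zip".format(blog_id)
# s3.upload_file(zip_path, AWS_STORAGE_BUCKET_NAME, zipped_key(blog_id))
```

Uploading before `send_from_directory` returns means the zip survives the Heroku dyno's ephemeral filesystem, and subsequent requests could serve it straight from S3 instead of rebuilding it.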

Page content provided by Stack Overflow; the original English post is available via the link above.