Python: downloading zip files from S3


I have uploaded zip files to S3 and now want to download and process them. I don't need to keep the files permanently, I only need to process them temporarily. How can I do this?


If you just want to download the file without unzipping anything, you can also use the `download_file` method, as shown in this answer: https://dev59.com/_rX3oIgBc1ULPQZFwZVW#71474927 - Aelius
6 Answers


Because working software > comprehensive documentation

Boto2

import zipfile
import boto
import io

# Connect to s3
# This will need your s3 credentials to be set up 
# with `aws configure` using the aws CLI.
#
# See: https://aws.amazon.com/cli/
conn = boto.connect_s3()

# get hold of the bucket
bucket = conn.get_bucket("my_bucket_name")

# Get hold of a given file
key = boto.s3.key.Key(bucket)
key.key = "my_s3_object_key"

# Create an in-memory bytes IO buffer
with io.BytesIO() as b:

    # Read the file into it
    key.get_file(b)

    # Reset the file pointer to the beginning
    b.seek(0)

    # Read the file as a zipfile and process the members
    with zipfile.ZipFile(b, mode='r') as zipf:
        for subfile in zipf.namelist():
            do_stuff_with_subfile()  # placeholder: process each member here

Boto3

import zipfile
import boto3
import io

# this is just to demo. real use should use the config 
# environment variables or config file.
#
# See: http://boto3.readthedocs.org/en/latest/guide/configuration.html

session = boto3.session.Session(
    aws_access_key_id="ACCESSKEY", 
    aws_secret_access_key="SECRETKEY"
)

s3 = session.resource("s3")
bucket = s3.Bucket('stackoverflow-brice-test')
obj = bucket.Object('smsspamcollection.zip')

with io.BytesIO(obj.get()["Body"].read()) as tf:

    # rewind the file
    tf.seek(0)

    # Read the file as a zipfile and process the members
    with zipfile.ZipFile(tf, mode='r') as zipf:
        for subfile in zipf.namelist():
            print(subfile)

Tested with Python 3 on MacOSX.


Thanks for your answer. Do you know how to do this with boto3? - jaycode
@brice When I actually try to run `with open(subfile, 'r') as file:` I get a "No such file or directory" error. - partydog
I don't believe this approach works for very large (~>2GB) zip files. When you try to read the zip with the line `with io.BytesIO(obj.get()["Body"].read()) as tf:`, you get the error "Python int too large to convert to C long". I haven't been able to find a reliable way to open an S3 zip file larger than 2GB. - Doug Bower
@partydog That's because it only prints the names of the files inside the zip. - Binx
How can we read this file into pandas? I tried passing the subfile as an argument, but it throws the following error - FileNotFoundError: [Errno 2] No such file or directory: - Mohseen Mulla
Found the answer to my own question, in case it helps anyone: `with zipfile.ZipFile(tf, mode='r') as zipf: for line in zipf.read("xyz.csv").split(b"\n"): print(line)` - Mohseen Mulla
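One way around the ~2GB `BytesIO` failure mentioned in the comments is to spool the object to a temporary file on disk instead of holding it in memory. A minimal sketch; the bucket and key names in the commented-out boto3 call are placeholders, and the demo below builds a local archive in place of the S3 download:

```python
import tempfile
import zipfile

def list_zip_members(fileobj):
    """Return the member names of a zip archive, given any seekable file object."""
    with zipfile.ZipFile(fileobj, mode="r") as zipf:
        return zipf.namelist()

# For archives too large for an in-memory BytesIO, stream the object to disk:
#
#   import boto3
#   s3 = boto3.client("s3")
#   with tempfile.TemporaryFile() as tmp:
#       s3.download_fileobj("my-bucket", "big-archive.zip", tmp)  # placeholder names
#       tmp.seek(0)
#       print(list_zip_members(tmp))
#
# Demo: a locally built archive stands in for the downloaded object.
with tempfile.TemporaryFile() as tmp:
    with zipfile.ZipFile(tmp, mode="w") as zipf:
        zipf.writestr("a.txt", "hello")
        zipf.writestr("b.txt", "world")
    tmp.seek(0)
    print(list_zip_members(tmp))  # ['a.txt', 'b.txt']
```

`TemporaryFile` gives you a disk-backed file object that is cleaned up automatically when the context manager exits, so nothing is kept permanently.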

Pandas provides a shortcut that eliminates most of the code in the top answer and lets you not worry about whether the file path is on s3, gcp, or your local machine. (Note: `get_filepath_or_buffer` is a pandas-internal helper and has been deprecated in recent pandas releases.)
import io
import zipfile

import pandas as pd

obj = pd.io.parsers.get_filepath_or_buffer(file_path)[0]
with io.BytesIO(obj.read()) as byte_stream:
    # Use your byte stream, to, for example, print file names...
    with zipfile.ZipFile(byte_stream, mode='r') as zipf:
        for subfile in zipf.namelist():
            print(subfile)
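For getting a member into pandas (asked in the comments on the accepted answer): a member doesn't have to be extracted to disk first, since `ZipFile.open()` returns a file object that CSV parsers accept directly. A minimal sketch using the stdlib `csv` module; `pd.read_csv(zipf.open(member))` works the same way. The in-memory archive here stands in for the S3 download:

```python
import csv
import io
import zipfile

def read_csv_member(zip_fileobj, member):
    """Parse one CSV member of a zip archive into a list of rows."""
    with zipfile.ZipFile(zip_fileobj, mode="r") as zipf:
        # zipf.open() yields a binary file object; wrap it for text-mode csv.
        # pandas users can pass the same object: pd.read_csv(zipf.open(member))
        with io.TextIOWrapper(zipf.open(member), encoding="utf-8") as f:
            return list(csv.reader(f))

# Demo: an in-memory archive stands in for the S3 download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zipf:
    zipf.writestr("xyz.csv", "a,b\n1,2\n")
buf.seek(0)
print(read_csv_member(buf, "xyz.csv"))  # [['a', 'b'], ['1', '2']]
```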

If speed is a concern, a good approach is to pick an EC2 instance reasonably close to your S3 bucket (in the same region) and use that instance to unzip/process your zip files. This reduces latency and lets you process them quite efficiently. You can delete each extracted file once the work is done.
Note: this only works if you're OK with using EC2 instances.
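The extract, process, delete cycle described above can be sketched like this; the commented boto3 call uses placeholder bucket/key names, and the demo runs against a locally built archive:

```python
import os
import tempfile
import zipfile

def extract_process_cleanup(zip_path, workdir):
    """Extract a local zip, process each member, then delete the extracted files."""
    processed = []
    with zipfile.ZipFile(zip_path, mode="r") as zipf:
        zipf.extractall(workdir)
        for name in zipf.namelist():
            processed.append(name)  # real processing of the extracted file goes here
            os.unlink(os.path.join(workdir, name))  # delete once done
    return processed

# On the EC2 instance, the archive would first be fetched, e.g.:
#   boto3.client("s3").download_file("my-bucket", "archive.zip", "/tmp/archive.zip")
# ("my-bucket" / "archive.zip" are placeholder names.)
#
# Demo with a locally built archive:
with tempfile.TemporaryDirectory() as workdir:
    zip_path = os.path.join(workdir, "archive.zip")
    with zipfile.ZipFile(zip_path, mode="w") as zipf:
        zipf.writestr("a.txt", "A")
    print(extract_process_cleanup(zip_path, workdir))  # ['a.txt']
```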


Read a specific file from inside a zip file in an S3 bucket.

import boto3
import os
import zipfile
import io


'''
When you configure awscli, you'll set up a credentials file located at
~/.aws/credentials. By default, this file will be used by Boto3 to authenticate.
'''
os.environ['AWS_PROFILE'] = "<profile_name>"
os.environ['AWS_DEFAULT_REGION'] = "<region_name>"

# Let's use Amazon S3
s3_name = "<bucket_name>"
zip_file_name = "<zip_file_name>"
file_to_open = "<file_to_open>"
s3 = boto3.resource('s3')
obj = s3.Object(s3_name, zip_file_name )

with io.BytesIO(obj.get()["Body"].read()) as tf:
    # rewind the file
    tf.seek(0)
    # Read the file as a zipfile and process the members
    with zipfile.ZipFile(tf, mode='r') as zipf:
        file_contents = zipf.read(file_to_open).decode("utf-8")
        print(file_contents)


Adapted from @brice's answer.

I'm sure you've heard of boto, the Python interface to Amazon Web Services.
You can fetch the key (the file) from s3 into a local file:
import os
import boto
from zipfile import ZipFile

s3 = boto.connect_s3() # connect
bucket = s3.get_bucket(bucket_name) # get bucket
key = bucket.get_key(key_name) # get key (the file in s3)
key.get_file(local_name) # download to a temporary local file

with ZipFile(local_name, 'r') as myzip:
    pass # do something with myzip

os.unlink(local_name) # delete it

You can also use tempfile. For more details, see Create and read temporary files.
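The tempfile route can be sketched as follows, assuming the boto2 `key` object from the answer above; `NamedTemporaryFile` removes the file automatically when the context manager exits, so the explicit `os.unlink` is no longer needed. The demo writes a local archive in place of the S3 download:

```python
import os
import tempfile
import zipfile

# With boto2 as above, the download would replace the local write, e.g.:
#   key.get_file(tmp); tmp.seek(0)
# Here a locally built archive stands in for the S3 download.
with tempfile.NamedTemporaryFile(suffix=".zip") as tmp:
    with zipfile.ZipFile(tmp, mode="w") as zipf:
        zipf.writestr("data.txt", "payload")
    tmp.seek(0)
    with zipfile.ZipFile(tmp, mode="r") as myzip:
        print(myzip.namelist())  # ['data.txt']
    saved_path = tmp.name
print(os.path.exists(saved_path))  # False: the temp file is gone, no os.unlink needed
```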


Adding to @brice's answer


If you want to read any particular data from the file line by line, here is the code:

with zipfile.ZipFile(tf, mode='r') as zipf:
    for line in zipf.read("xyz.csv").split(b"\n"):
        print(line)
        break # to break off after the first line

Hope this helps!

