How do I copy a large file from one AWS S3 bucket to another using the boto3 Python API?

I need to copy a large file from one AWS S3 bucket to another S3 bucket with the boto3 Python API. When I use client.copy(), it throws the error "An error occurred (InvalidArgument) when calling the UploadPartCopy operation: Range specified is not valid for source object of size:".
1 Answer

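According to the AWS S3 boto3 API documentation, we should use a multipart upload. I searched but could not find a clear, precise answer, so I read through the boto3 API and worked one out. The code below also works when run from multiple threads.
If you use multiple threads, create the s3_client inside each thread. I tested this approach and successfully copied a large amount of data from one S3 bucket to a different S3 bucket.
The code to get the s3_client is as follows: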
import boto3


def get_session_client():
    # To use a named profile instead: boto3.session.Session(profile_name="default")
    session = boto3.session.Session()
    client = session.client("s3")
    return session, client
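
Since each thread should create its own client, one way to arrange that is to cache a client per thread in `threading.local`. The sketch below is an assumption about how the threads might be organized, not part of the original answer: `get_thread_client`, `run_in_threads`, `client_factory`, and `worker` are hypothetical names, and `client_factory` would be something like `lambda: get_session_client()[1]`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Per-thread storage: each thread sees its own "client" attribute.
_thread_state = threading.local()


def get_thread_client(client_factory):
    """Return this thread's client, creating it on first use."""
    client = getattr(_thread_state, "client", None)
    if client is None:
        client = client_factory()
        _thread_state.client = client
    return client


def run_in_threads(client_factory, worker, items, max_workers=4):
    """Call worker(client, item) for each item; return results in input order."""
    def task(item):
        return worker(get_thread_client(client_factory), item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))
```

Here `worker` could be a small function that calls the copy routine for one key, so every key handled by a given thread reuses that thread's client.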



import logging
import threading

logger = logging.getLogger(__name__)

# S3 requires every part except the last to be at least 5 MiB and allows at
# most 10,000 parts, so raise part_size for objects larger than about 48.8 GiB.
PART_SIZE = 5 * 1024 * 1024


def copy_with_multipart(local_s3_client, src_bucket, target_bucket, key, object_size, part_size=PART_SIZE):
    thread_name = threading.current_thread().name
    # Start the multipart upload on the target bucket.
    upload_id = local_s3_client.create_multipart_upload(
        Bucket=target_bucket,
        Key=key
    )['UploadId']
    parts_etags = []
    byte_position = 0
    part_num = 1
    try:
        while byte_position < object_size:
            # The last part may be smaller than part_size, so make sure
            # last_byte does not run past the end of the object.
            last_byte = min(byte_position + part_size - 1, object_size - 1)
            copy_source_range = f"bytes={byte_position}-{last_byte}"
            logger.info("%s copying part %d, source range %s", thread_name, part_num, copy_source_range)
            # Server-side copy of this byte range into the multipart upload.
            response = local_s3_client.upload_part_copy(
                Bucket=target_bucket,
                CopySource={'Bucket': src_bucket, 'Key': key},
                CopySourceRange=copy_source_range,
                Key=key,
                PartNumber=part_num,
                UploadId=upload_id
            )
            parts_etags.append({"ETag": response["CopyPartResult"]["ETag"], "PartNumber": part_num})
            part_num += 1
            byte_position += part_size
        # Stitch the copied parts together into the final object.
        response = local_s3_client.complete_multipart_upload(
            Bucket=target_bucket,
            Key=key,
            MultipartUpload={'Parts': parts_etags},
            UploadId=upload_id
        )
        logger.info("%s %s multipart copy completed, response=%s", thread_name, key, response)
    except Exception:
        # Abort so that parts already copied do not linger in the target bucket.
        logger.exception("%s multipart copy failed for key %s", thread_name, key)
        local_s3_client.abort_multipart_upload(Bucket=target_bucket, Key=key, UploadId=upload_id)
        raise
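
The range arithmetic in the loop can be checked without touching S3. The sketch below is an illustration rather than part of the original answer; `choose_part_size` and `iter_part_ranges` are hypothetical helper names. It relies on the documented S3 multipart limits (every part except the last must be at least 5 MiB, and an upload may have at most 10,000 parts), which means a fixed 5 MiB part size only covers objects up to roughly 48.8 GiB.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def choose_part_size(object_size, min_part_size=MIN_PART_SIZE, max_parts=MAX_PARTS):
    """Return the smallest part size >= min_part_size that fits in max_parts parts."""
    needed = -(-object_size // max_parts)  # ceiling division
    return max(min_part_size, needed)


def iter_part_ranges(object_size, part_size):
    """Yield (part_number, CopySourceRange string) pairs covering the whole object."""
    byte_position = 0
    part_num = 1
    while byte_position < object_size:
        # Clamp the final range to the last byte of the object.
        last_byte = min(byte_position + part_size - 1, object_size - 1)
        yield part_num, f"bytes={byte_position}-{last_byte}"
        part_num += 1
        byte_position += part_size
```

For a 12-byte object and a part size of 5, `list(iter_part_ranges(12, 5))` gives `[(1, 'bytes=0-4'), (2, 'bytes=5-9'), (3, 'bytes=10-11')]`, which matches the ranges the loop would pass as CopySourceRange.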

Calling the multipart method:

_, local_s3_client = get_session_client()
copy_with_multipart(local_s3_client, src_bucket_name, target_bucket_name, key, src_object_size)
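
The call above needs `src_object_size`, which the answer does not show how to obtain. One way to get it, assuming the caller is allowed to read the source object's metadata, is the S3 client's `head_object` call, whose response includes `ContentLength` in bytes. `get_object_size` is a hypothetical helper name.

```python
def get_object_size(s3_client, bucket, key):
    """Return the size in bytes of the object at bucket/key via a HEAD request."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["ContentLength"]


# Usage with the names from the answer (requires a real S3 client):
# src_object_size = get_object_size(local_s3_client, src_bucket_name, key)
```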
