Multipart upload to S3 with the AWS SDK for Node.js


I am trying to upload large files to an S3 bucket using the aws-sdk for Node.js.

The v2 upload method uploads the file in chunks as a multipart upload.
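For context, a minimal sketch of the v2 call I mean (Bucket, Key, and Body stand in for real values):

const AWS = require('aws-sdk')
const s3 = new AWS.S3()

// v2's managed upload splits large bodies into parts automatically
s3.upload({ Bucket, Key, Body }, (err, data) => {
  if (err) throw err
  console.log('Uploaded to', data.Location)
})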

I want to use the new version of the aws-sdk (v3). Is there a way to upload large files with it? The PutObjectCommand method doesn't seem to do it.

I see there are methods such as CreateMultiPartUpload, but I can't find a full working example.

Thanks in advance for your help.

3 Answers


As of 2021, I would suggest using the lib-storage package, which abstracts away a lot of the implementation details.

Sample code:

import { Upload } from "@aws-sdk/lib-storage";
import { S3Client, S3 } from "@aws-sdk/client-s3";

// Bucket, Key, and Body are assumed to be defined by the caller
const target = { Bucket, Key, Body };
try {
  const parallelUploads3 = new Upload({
    client: new S3Client({}), // a plain S3 instance works here as well
    tags: [], // optional tags
    queueSize: 4, // optional concurrency configuration
    partSize: 1024 * 1024 * 5, // optional size of each part, in bytes; 5MB is the S3 minimum
    leavePartsOnError: false, // optional manually handle dropped parts
    params: target,
  });

  parallelUploads3.on("httpUploadProgress", (progress) => {
    console.log(progress);
  });

  await parallelUploads3.done();
} catch (e) {
  console.log(e);
}

Source: https://github.com/aws/aws-sdk-js-v3/blob/main/lib/lib-storage/README.md


For some reason, my application's memory only keeps growing with this approach. I couldn't figure out how to use v3 in a good, efficient way. - Rhadamez Gindri Hercilio
My experience with this approach has been good so far. Thanks for the answer. - badsyntax
@RhadamezGindriHercilio I agree with you here; when uploading in bulk I also see memory grow over time. Did you ever find an alternative? - Jordan Lewallen
@JordanLewallen Actually yes; I ended up finding another, better solution that I can show you. - Rhadamez Gindri Hercilio
@RhadamezGindriHercilio Would really love to see a gist or a link to the solution. I've been struggling with this. Thanks! - Jordan Lewallen
The docs mention that it buffers queueSize * partSize in memory, which might be why you're seeing memory usage grow. - Melodie
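
Following up on the memory discussion in these comments: since lib-storage buffers roughly queueSize * partSize bytes at once (4 * 5MB = 20MB with the settings shown above), one hedged way to bound memory is to shrink both knobs and stream the Body from disk instead of holding it in a Buffer. A minimal sketch, with the file name and the Bucket/Key values as placeholders:

import { Upload } from "@aws-sdk/lib-storage";
import { S3Client } from "@aws-sdk/client-s3";
import { createReadStream } from "fs";

// At most about queueSize * partSize (1 * 5MB here) is buffered at a time
const upload = new Upload({
  client: new S3Client({}),
  queueSize: 1, // one part in flight at a time
  partSize: 1024 * 1024 * 5, // 5MB, the S3 minimum part size
  params: { Bucket, Key, Body: createReadStream("large-file.bin") },
});

await upload.done();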


Here is my code for uploading a Buffer as a multipart upload, using aws-sdk v3, Node.js, and TypeScript.

Error handling still needs some work (you may want to abort/retry on errors), but it should be a good starting point... I have tested it with XML files of up to 15MB and so far it looks fine. No 100% guarantees, though! ;)

import {
  CompleteMultipartUploadCommand,
  CompleteMultipartUploadCommandInput,
  CreateMultipartUploadCommand,
  CreateMultipartUploadCommandInput,
  S3Client,
  UploadPartCommand,
  UploadPartCommandInput
} from '@aws-sdk/client-s3'

const client = new S3Client({ region: 'us-west-2' })

export const uploadMultiPartObject = async (file: Buffer, createParams: CreateMultipartUploadCommandInput): Promise<void> => {
  try {
    const createUploadResponse = await client.send(
      new CreateMultipartUploadCommand(createParams)
    )
    const { Bucket, Key } = createParams
    const { UploadId } = createUploadResponse
    console.log('Upload initiated. Upload ID: ', UploadId)

    // 5MB is the minimum part size
    // Last part can be any size (no min.)
    // Single part is treated as last part (no min.)
    const partSize = (1024 * 1024) * 5 // 5MB
    const fileSize = file.length
    const numParts = Math.ceil(fileSize / partSize)

    const uploadedParts = []

    for (let i = 1; i <= numParts; i++) {
      // Buffer.slice's end index is exclusive, so each part covers
      // [startOfPart, endOfPart), and the last part is simply clamped
      // to the end of the buffer
      const startOfPart = (i - 1) * partSize
      const endOfPart = Math.min(startOfPart + partSize, fileSize)

      const uploadParams: UploadPartCommandInput = {
        Body: file.slice(startOfPart, endOfPart),
        Bucket,
        Key,
        UploadId,
        PartNumber: i
      }
      const uploadPartResponse = await client.send(new UploadPartCommand(uploadParams))
      console.log(`Part #${i} uploaded. ETag: `, uploadPartResponse.ETag)

      // For each part upload, you must record the part number and the ETag value.
      // You must include these values in the subsequent request to complete the multipart upload.
      // https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html
      uploadedParts.push({ PartNumber: i, ETag: uploadPartResponse.ETag })
    }

    const completeParams: CompleteMultipartUploadCommandInput = {
      Bucket,
      Key,
      UploadId,
      MultipartUpload: {
        Parts: uploadedParts
      }
    }
    console.log('Completing upload...')
    const completeData = await client.send(new CompleteMultipartUploadCommand(completeParams))
    console.log('Upload complete: ', completeData.Key, '\n---')
  } catch (e) {
    // As noted above, you will probably want to abort and/or retry here
    // instead of just rethrowing; see the sketch below
    throw e
  }
}
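
The error handling above just rethrows. As a sketch of one cleanup option (AbortMultipartUploadCommand is part of @aws-sdk/client-s3, but wiring it in this way is my assumption, not part of the original answer), you can abort the upload in the catch block so S3 discards the already-uploaded parts instead of silently keeping, and billing for, them:

import { AbortMultipartUploadCommand } from '@aws-sdk/client-s3'

// Call this from the catch block above once UploadId is known, so the
// partially uploaded parts are removed from the bucket
const abortUpload = async (Bucket: string, Key: string, UploadId: string): Promise<void> => {
  await client.send(new AbortMultipartUploadCommand({ Bucket, Key, UploadId }))
}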


Thanks for the reply. Unfortunately I still get the same MalformedXML error. Also, when I run your code, I still get the same ETag for every part. - Adi Fuchs
Actually, all the ETags are the same except the first and the last one. - Adi Fuchs
I got an undefined ETag... which makes CompleteMultipartUploadCommand fail with a "The XML you provided was not well-formed" error. Other than that, it seems to work well. - Otani Shuzo
Here is where the issue lies: https://github.com/aws/aws-sdk-js-v3/issues/3177#issuecomment-1062967707 - Otani Shuzo
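
If you hit the undefined ETag described in these comments, failing fast inside the part-upload loop gives a much clearer error than the MalformedXML that CompleteMultipartUploadCommand eventually raises. A minimal guard, my addition, to place right after the UploadPartCommand call in the answer above:

if (!uploadPartResponse.ETag) {
  // A Parts entry without an ETag makes the completion request invalid,
  // which S3 reports as "The XML you provided was not well-formed"
  throw new Error(`No ETag returned for part #${i}`)
}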

Here is complete working code using AWS SDK v3.

import { Upload } from "@aws-sdk/lib-storage";
import { S3Client, S3 } from "@aws-sdk/client-s3";
import { createReadStream } from 'fs';

const inputStream = createReadStream('clamav_db.zip');
const Bucket = process.env.DB_BUCKET
const Key = process.env.FILE_NAME
const Body = inputStream

const target = { Bucket, Key, Body };
try {
  const parallelUploads3 = new Upload({
    client: new S3Client({
      region: process.env.AWS_REGION,
      credentials: { accessKeyId: process.env.AWS_ACCESS_KEY, secretAccessKey: process.env.AWS_SECRET_KEY }
    }),
    queueSize: 4, // optional concurrency configuration
    partSize: 5242880, // optional size of each part in bytes (5242880 = 5MB, the S3 minimum)
    leavePartsOnError: false, // optional manually handle dropped parts
    params: target,
  });

  parallelUploads3.on("httpUploadProgress", (progress) => {
    console.log(progress);
  });

  await parallelUploads3.done();
} catch (e) {
  console.log(e);
}
