我需要从S3存储中提取一堆zip文件,并将它们添加到tar存档中并将该存档存储在S3上。很可能这些zip文件的总和将大于Lambda函数允许的512mb本地存储。我有一个部分解决方案,可以获取来自S3的对象并在不使用Lambda本地存储的情况下将它们提取并放置在S3对象中。
提取对象线程
public class ExtractObject implements Runnable{
private String objectName;
private String uuid;
private final byte[] buffer = new byte[1024];
public ExtractAdvert(String name, String uuid) {
this.objectName= name;
this.uuid= uuid;
}
@Override
public void run() {
final String srcBucket = "my-bucket-name";
final AmazonS3 s3Client = new AmazonS3Client();
try {
S3Object s3Object = s3Client.getObject(new GetObjectRequest(srcBucket, objectName));
ZipInputStream zis = new ZipInputStream(s3Object.getObjectContent());
ZipEntry entry = zis.getNextEntry();
while(entry != null) {
String fileName = entry.getName();
String mimeType = FileMimeType.fromExtension(FilenameUtils.getExtension(fileName)).mimeType();
System.out.println("Extracting " + fileName + ", compressed: " + entry.getCompressedSize() + " bytes, extracted: " + entry.getSize() + " bytes, mimetype: " + mimeType);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
int len;
while ((len = zis.read(buffer)) > 0) {
outputStream.write(buffer, 0, len);
}
InputStream is = new ByteArrayInputStream(outputStream.toByteArray());
ObjectMetadata meta = new ObjectMetadata();
meta.setContentLength(outputStream.size());
meta.setContentType(mimeType);
System.out.println("##### " + srcBucket + ", " + FilenameUtils.getFullPath(objectName) + "tmp" + File.separator + uuid + File.separator + fileName);
// Add this to tar archive instead of putting back to s3
s3Client.putObject(srcBucket, FilenameUtils.getFullPath(objectName) + "tmp" + File.separator + uuid + File.separator + fileName, is, meta);
is.close();
outputStream.close();
entry = zis.getNextEntry();
}
zis.closeEntry();
zis.close();
} catch (IOException ioe) {
System.out.println(ioe.getMessage());
}
}
}
这段代码针对需要提取的每个对象运行,并按所需结构将它们保存在一个 S3 对象中,以便于制作 tar 文件。我认为我需要做的是将对象保留在内存中并添加到 tar 存档中,然后上传。但是在反复尝试后,我没有成功创建 tar 文件。主要问题是我无法在 Lambda 中使用 tmp 目录。
编辑:我应该在进行处理时创建 tar 文件,而不是将对象放回 S3 吗?(请参见评论 // Add this to tar archive instead of putting back to s3)如果是这样的话,我如何创建 tar 流而不将其存储在本地?
编辑2:尝试打包文件
ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
ListObjectsV2Result result;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
TarArchiveOutputStream tarOut = new TarArchiveOutputStream(baos);
do {
result = s3Client.listObjectsV2(req);
for (S3ObjectSummary objectSummary : result.getObjectSummaries()) {
if(objectSummary.getKey().startsWith("tmp/") ) {
System.out.printf(" - %s (size: %d)\n", objectSummary.getKey(), objectSummary.getSize());
S3Object s3Object = s3Client.getObject(new GetObjectRequest(bucketName, objectSummary.getKey()));
InputStream is = s3Object.getObjectContent();
System.out.println("Pre Create entry");
TarArchiveEntry archiveEntry = new TarArchiveEntry(IOUtils.toByteArray(is));
// Getting following exception above
// IllegalArgumentException: Invalid byte 111 at offset 7 in ' positio' len=8
System.out.println("Pre put entry");
tarOut.putArchiveEntry(archiveEntry);
System.out.println("Post put entry");
}
}
String token = result.getNextContinuationToken();
System.out.println("Next Continuation Token: " + token);
req.setContinuationToken(token);
} while (result.isTruncated());
ObjectMetadata metadata = new ObjectMetadata();
InputStream is = new ByteArrayInputStream(baos.toByteArray());
s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + "tar-file", is, metadata));