AWS Lambda and S3 and Pandas - Load CSV into S3, trigger Lambda, load into pandas, put back in bucket?

Question

python pandas amazon-web-services amazon-s3 aws-lambda

6

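I'm new to AWS and Lambda, so apologies if this is a stupid question. What I want to do is load a spreadsheet into an S3 bucket, trigger a Lambda from that upload, have the Lambda load the CSV into pandas and work on it, then write the DataFrame back out to a second S3 bucket.

I've read a lot about zipping up a Python script with all its libraries and dependencies and uploading that, and that's a separate question. I've also figured out how to trigger a Lambda when a file is uploaded to an S3 bucket, and how to automatically copy that file to a second S3 bucket.

The part I'm stuck on is the "middle bit": I can't find anything about loading the file into pandas and manipulating it inside the Lambda function.

First question: is something like this even possible? Second question: how do I "grab" the file from the S3 bucket and load it into pandas? Would it be something like this?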

import pandas as pd
import boto3
import json
s3 = boto3.resource('s3')

def handler(event, context):
     dest_bucket = s3.Bucket('my-destination-bucket')
     df = pd.read_csv(event['Records'][0]['s3']['object']['key'])
     # stuff to do with dataframe goes here

     s3.Object(dest_bucket.name, <code for file key>).copy_from(CopySource = df)

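I really have no idea whether this is even close; it's just a guess. Any help would be appreciated, because I obviously don't know much about this!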

- Tkelly

1

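It should be possible; see this question on reading a file from S3 into pandas: https://dev59.com/AloU5IYBdhLWcg3wM1C9 - Usman Azhar

Thanks for the reply. That answer seems to be about accessing a file in an S3 bucket, but Lambda isn't used at all; it looks like just a regular Python script. How would I adapt it to run inside an AWS Lambda function, as in my question? - Tkelly

You can put that Python code in the handler method, or write a separate method and call it from the handler. The linked answer explains how to do the read; in your case you need to put it inside the Lambda function, and since you've already configured the Lambda trigger it should work. - Usman Azhar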

1

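It looks like you're passing the S3 object key to the pandas read_csv() method. An S3 key has the form dir1/dir2/file.csv. What you need is the object's S3 URI, which has the form s3://bucket/dir1/dir2/file.csv. So build the correct URI from the bucket and key in the event object, then pass that to pandas read_csv(). - jarmod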
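
To make jarmod's suggestion concrete, here is a minimal sketch of building the URI from the event. The helper name s3_uri_from_event is made up for illustration, and the handler assumes the deployment package includes pandas plus s3fs, which pandas relies on to open s3:// URLs. Object keys arrive URL-encoded in S3 event notifications, so the sketch decodes them first.

```python
from urllib.parse import unquote_plus


def s3_uri_from_event(event):
    """Build s3://bucket/key from the first record of an S3 event notification.

    Hypothetical helper for illustration; not part of boto3 or pandas.
    """
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Keys are URL-encoded in the event payload (a space arrives as '+').
    key = unquote_plus(record["object"]["key"])
    return f"s3://{bucket}/{key}"


def handler(event, context):
    # Imported here so the helper above can be used without pandas installed.
    # Assumption: pandas and s3fs are bundled with the function.
    import pandas as pd

    df = pd.read_csv(s3_uri_from_event(event))
    # ... manipulate df here ...
    return {"rows": len(df)}
```

The function's execution role also needs read permission on the source bucket for read_csv() to succeed.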

1 Answer

- Nicholas Martinez · Accepted Answer

This code runs in a Lambda function that is triggered on PUTs; it then GETs the object and PUTs it into another bucket.

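import boto3
from urllib.parse import unquote_plus

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Bucket and key of the object whose PUT triggered this invocation.
    bucket = event['Records'][0]['s3']['bucket']['name']
    # Keys arrive URL-encoded in S3 event notifications, so decode them.
    key = unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    # Placeholder destination key: end_path was not defined in the original snippet.
    # Writing to a bucket/prefix that the trigger also watches would re-invoke this function.
    end_path = 'processed/' + key
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
        # get_object returns a dict; the content is the stream under 'Body'.
        # The source bucket is reused here as in the original; pass the destination bucket name instead.
        s3_upload_article(response['Body'].read(), bucket, end_path)
        return response['ContentType']
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

def s3_upload_article(html, bucket, end_path):
    # ACL='public-read' makes the uploaded object world-readable; drop it if that is not wanted.
    s3.put_object(Body=html, Bucket=bucket, Key=end_path, ContentType='text/html', ACL='public-read')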

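I cut this out of a more complex Lambda script, but I hope it shows some of what you need to do. The PUT of the object only triggers the script. Anything else that happens after the event fires is up to you to code into the script.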

bucket = event['Records'][0]['s3']['bucket']['name']
key = unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')  # from urllib.parse import unquote_plus

The bucket and key in these first lines are the bucket and key of the object that triggered the event. Everything else is up to you.
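
For the pandas "middle bit" the question asks about, a rough sketch along the same lines might look like the following. It is not taken from the answer above: transform() is a hypothetical stand-in for the real manipulation, my-destination-bucket is the name used in the question, and the optional s3_client argument exists only so the handler can be exercised locally with a stub. It assumes pandas is packaged with the function and that the execution role can read the source bucket and write the destination bucket.

```python
import io
from urllib.parse import unquote_plus

import pandas as pd

DEST_BUCKET = "my-destination-bucket"  # name from the question; assumed to exist

_s3 = None


def get_s3_client():
    """Create the boto3 client lazily so warm invocations reuse it."""
    global _s3
    if _s3 is None:
        import boto3  # available by default in the Lambda Python runtime
        _s3 = boto3.client("s3")
    return _s3


def transform(df):
    """Hypothetical placeholder for the real DataFrame manipulation."""
    return df.dropna(how="all")


def handler(event, context, s3_client=None):
    s3 = s3_client or get_s3_client()
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Keys are URL-encoded in S3 event notifications.
    key = unquote_plus(record["object"]["key"])

    # Download the uploaded CSV and parse it from memory.
    obj = s3.get_object(Bucket=bucket, Key=key)
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))

    df = transform(df)

    # Serialize back to CSV and write it to the second bucket under the same key.
    body = df.to_csv(index=False).encode("utf-8")
    s3.put_object(Bucket=DEST_BUCKET, Key=key, Body=body, ContentType="text/csv")
    return {"rows": len(df), "destination": f"s3://{DEST_BUCKET}/{key}"}
```

Writing to a separate destination bucket, as here, keeps the output from firing the same trigger again.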