How do I write a pandas DataFrame to S3 in JSON format?

Question

I have an AWS Lambda function that creates a DataFrame, and I need to write it to an S3 bucket as a JSON file.

import datetime
import io

import boto3
import pandas as pd

# code to get the df

destination = "output_" + datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S') + '.json'

df.to_json(destination) # this file should be written to S3 bucket

- mellifluous

2 answers

You can also use the following code.

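# Imports this snippet needs (Spark is only used here to build a sample frame)
import io

import boto3
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

# Create a session using boto3. Hard-coded keys are shown for illustration;
# inside Lambda you can call boto3.Session() with no arguments and the
# execution role's credentials will be used instead.
session = boto3.Session(
    aws_access_key_id='<key ID>',
    aws_secret_access_key='<secret_key>'
)

# Create an S3 resource from the session
s3 = session.resource('s3')

json_buffer = io.StringIO()

# Create a Spark dataframe and convert it to pandas
spark = SparkSession.builder.getOrCreate()
df = spark.range(4).withColumn("organisation", lit("stackoverflow"))
df_p = df.toPandas()
df_p.to_json(json_buffer, orient='records')

# Create the S3 object (named s3_object to avoid shadowing the built-in object)
s3_object = s3.Object('<bucket-name>', '<JSON file name>')

# Put the object into the bucket
result = s3_object.put(Body=json_buffer.getvalue())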

- Aman Sehgal
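The answer above goes through Spark only to build a sample frame and hard-codes credentials. A minimal pandas-only sketch of the same upload, assuming boto3 is installed and credentials come from the default chain (in Lambda, the execution role), could look like this. It relies on `DataFrame.to_json()` returning the JSON text as a string when no path is given, so the `StringIO` buffer is optional. The helper name `upload_df_as_json` and its parameters are hypothetical.

```python
import pandas as pd


def upload_df_as_json(df: pd.DataFrame, bucket: str, key: str, s3_client=None) -> str:
    """Serialize df as JSON records and upload it as one S3 object.

    Hypothetical helper: bucket and key are whatever you pass in.
    """
    if s3_client is None:
        # Imported lazily so the serialization logic works without boto3;
        # boto3.client() uses the default credential chain (env vars,
        # config files, or the Lambda execution role).
        import boto3
        s3_client = boto3.client("s3")

    # With no path argument, to_json returns the JSON text as a str.
    body = df.to_json(orient="records")
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body.encode("utf-8"),
        ContentType="application/json",
    )
    return body
```

For example, `upload_df_as_json(df, "my-bucket-name", "output.json")`. Passing `s3_client` explicitly is only there so the function can be exercised with a stub object instead of real S3.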

Content provided by Stack Overflow.

- mellifluous · Accepted Answer

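The following code runs in AWS Lambda and uploads the JSON file to S3.

The Lambda execution role must have permission to write to the bucket (s3:PutObject).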

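import datetime
import io

import boto3
import pandas as pd

# code to get the df

# Timestamped object key, e.g. output_2021_03_04_05_06_07.json
destination = "output_" + datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S') + '.json'

# Serialize the dataframe into an in-memory buffer instead of a local file
json_buffer = io.StringIO()

df.to_json(json_buffer)

# Upload the buffer contents as an object in the bucket
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my-bucket-name')

my_bucket.put_object(Key=destination, Body=json_buffer.getvalue())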
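Packaged as a Lambda entry point, the accepted answer might look roughly like the sketch below. It assumes the handler is named `lambda_handler`, reuses the hypothetical bucket name `my-bucket-name`, and uses a placeholder `build_dataframe()` standing in for "code to get the df"; `make_key` reproduces the `output_%Y_%m_%d_%H_%M_%S.json` key format used above.

```python
import datetime
import io

import pandas as pd

BUCKET_NAME = "my-bucket-name"  # hypothetical, replace with your bucket


def build_dataframe() -> pd.DataFrame:
    # Hypothetical stand-in for "# code to get the df".
    return pd.DataFrame({"id": [0, 1, 2], "value": ["a", "b", "c"]})


def make_key(now=None) -> str:
    # Timestamped key in the same format the question and answer use.
    if now is None:
        now = datetime.datetime.now()
    return "output_" + now.strftime("%Y_%m_%d_%H_%M_%S") + ".json"


def lambda_handler(event, context, bucket=None):
    # In Lambda, bucket is None, so a boto3 Bucket resource is created with
    # the execution role's credentials; tests can pass any object that
    # has a put_object(Key=..., Body=...) method.
    if bucket is None:
        import boto3
        bucket = boto3.resource("s3").Bucket(BUCKET_NAME)

    df = build_dataframe()
    destination = make_key()

    # Serialize into an in-memory text buffer instead of a local file.
    json_buffer = io.StringIO()
    df.to_json(json_buffer)

    bucket.put_object(Key=destination, Body=json_buffer.getvalue())
    return {"key": destination}
```

Lambda calls the handler with only `event` and `context`, so in production `bucket` stays `None` and the real bucket is used; the extra parameter exists only to make the function testable offline.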