Parsing a multipart request string in Python

Question


python, amazon-web-services, aws-lambda, aws-api-gateway

7

I have a string like the following:

"--5b34210d81fb44c5a0fdc1a1e5ce42c3\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--5b34210d81fb44c5a0fdc1a1e5ce42c3\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--5b34210d81fb44c5a0fdc1a1e5ce42c3--\r\n"

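I can also get the request headers in separate variables.

How can I easily parse this with Python 3?

I am handling file uploads in AWS Lambda through API Gateway, and the request body and headers are available as Python dictionaries.

There are other similar questions on Stack Overflow, but most of them assume the requests module or some other module, and expect the request details to be provided in a specific object or format.

Note: I know the user could upload to S3 and have that trigger the Lambda, but in this case I have deliberately chosen not to.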

- Sam Anthony

6 answers

5

Expanding on sam-anthony's answer (I made a few fixes to get it working on Python 3.6.8):

from requests_toolbelt.multipart import decoder

multipart_string = b"--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--ce560532019a77d83195f9e9873e16a1--\r\n"
content_type = "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1"

for part in decoder.MultipartDecoder(multipart_string, content_type).parts:
  print(part.text)

John Smith
Hello World

All you need to do is install the library with pip install requests-toolbelt --target=. and then upload it together with your Lambda script.

Here is a working example:

from requests_toolbelt.multipart import decoder

def lambda_handler(event, context):

    content_type_header = event['headers']['Content-Type']

    body = event["body"].encode()

    response = ''
    for part in decoder.MultipartDecoder(body, content_type_header).parts:
      response += part.text + "\n"

    return {
        'statusCode': 200,
        'body': response
    }

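That should be enough for your dependencies to be picked up. If they are not, try using a "/python/lib/python3.6/site-packages" directory structure inside the zip file, with your Python script at the root.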
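The handler above indexes event['headers']['Content-Type'] directly, while another answer on this page reads event['headers']['content-type']; which spelling actually arrives depends on how API Gateway is set up. Since HTTP header names are case-insensitive, a small lookup helper avoids a KeyError either way. This is a sketch; the name header_value is hypothetical and not part of any library.

```python
from typing import Mapping, Optional


def header_value(headers: Optional[Mapping[str, str]], name: str) -> Optional[str]:
    """Return the value of header `name`, ignoring case; None if it is absent."""
    # Guard against a missing or empty headers mapping in the event
    if not headers:
        return None
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None
```

In the handler you would then write content_type_header = header_value(event['headers'], 'Content-Type') instead of indexing the dictionary directly.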

- cesartalves

4

If you want to use Python's CGI functionality:

import base64
from cgi import parse_multipart, parse_header  # cgi: deprecated in 3.11, removed in 3.13
from io import BytesIO

c_type, c_data = parse_header(event['headers']['Content-Type'])
assert c_type == 'multipart/form-data'
decoded_string = base64.b64decode(event['body'])
#For Python 3: these two lines of bugfixing are mandatory
#see also: https://dev59.com/YVwZ5IYBdhLWcg3waft6
c_data['boundary'] = bytes(c_data['boundary'], "utf-8")
c_data['CONTENT-LENGTH'] = str(len(decoded_string))  # length of the decoded body
form_data = parse_multipart(BytesIO(decoded_string), c_data)

# each value in the list is the uploaded file's content as bytes
for image_str in form_data['file']:
    ...
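Because the cgi module no longer exists in Python 3.13, a dependency-free alternative is the standard library's email package, which can parse a MIME multipart body once a Content-Type header is prepended to it. This is a sketch rather than a hardened form parser: the function name parse_form_parts is hypothetical, and it assumes body is already raw bytes (base64-decoded if API Gateway encoded it).

```python
from email.parser import BytesParser
from email.policy import HTTP
from typing import List, Optional, Tuple


def parse_form_parts(body: bytes, content_type: str) -> List[Tuple[Optional[str], Optional[str], bytes]]:
    """Split a multipart/form-data body into (field name, filename, data) tuples."""
    # The email parser expects a full MIME message, so prepend a synthetic
    # Content-Type header (carrying the boundary) followed by a blank line.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=HTTP).parsebytes(raw)
    if not message.is_multipart():
        raise ValueError("body is not a multipart message")

    parts = []
    for part in message.iter_parts():
        # name="..." parameter of the part's Content-Disposition header
        name = part.get_param("name", header="content-disposition")
        # None for ordinary (non-file) form fields
        filename = part.get_filename()
        # decode=True returns bytes, so binary uploads are not mangled into text
        data = part.get_payload(decode=True)
        parts.append((name, filename, data))
    return parts
```

For the string in the question (as bytes) and the content type "multipart/form-data; boundary=5b34210d81fb44c5a0fdc1a1e5ce42c3", this should yield one tuple for the author field and one for example2.txt. The email package is a general MIME parser, so it is worth testing it against real uploads from your clients before relying on it.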

- Ye Min Htut

4

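I ran into some strange encoding issues, and API Gateway behaved oddly too: at first the request body arrived as raw bytes, and then after a redeployment it started arriving base64-encoded. In any case, this is the code that finally worked for me.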

import base64
import boto3
from requests_toolbelt.multipart import decoder

s3client = boto3.client("s3")

def lambda_handler(event, context):
    content_type_header = event['headers']['content-type']
    # The body arrives base64-encoded; ISO-8859-1 maps each byte to exactly one character
    postdata = base64.b64decode(event['body']).decode('iso-8859-1')
    lst = []
    # part.text decodes each part as UTF-8, giving back the ISO-8859-1 string,
    # so encoding it as ISO-8859-1 again restores the original bytes
    # (part.content on the base64-decoded bytes would give the raw bytes directly)
    for part in decoder.MultipartDecoder(postdata.encode('utf-8'), content_type_header).parts:
        lst.append(part.text)
    response = s3client.put_object(Body=lst[0].encode('iso-8859-1'), Bucket='test', Key='mypicturefinal.jpg')
    return {'statusCode': '200', 'body': 'Success', 'headers': {'Content-Type': 'text/html'}}
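The switch between a raw body and a base64 body described above can be handled without guessing: API Gateway proxy events carry an isBase64Encoded flag next to body. A minimal sketch, assuming that flag is present in your event format (the helper name decode_body is hypothetical):

```python
import base64


def decode_body(event: dict) -> bytes:
    """Return the raw request body bytes from an API Gateway proxy event."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        # Binary payloads are delivered base64-encoded
        return base64.b64decode(body)
    # Otherwise the body is a text string; encode it for the multipart parsers
    return body.encode("utf-8")
```

Deciding from the flag means the same handler keeps working if a redeployment changes how the body is delivered; the bytes it returns can be passed straight to MultipartDecoder without the ISO-8859-1 round trip.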

- Jeffrey DeMuth

2

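Unfortunately, the cgi module has been deprecated since Python 3.11 (and it was removed in Python 3.13).

If you can use the multipart library (which the cgi module documentation mentions as a possible replacement), you can use its parse_form_data() function in an AWS Lambda function like this: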

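import base64
import logging
from io import BytesIO

from multipart import parse_form_data

# the example below logs through this module-level logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)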


def lambda_handler(event, context):
    """
    Process a HTTP POST request of encoding type "multipart/form-data".
    """

    # HTTP headers are case-insensitive
    headers = {k.lower():v for k,v in event['headers'].items()}

    # AWS API Gateway applies base64 encoding on binary data
    body = base64.b64decode(event['body'])

    # Parse the multipart form data
    environ = {
        'CONTENT_LENGTH': headers['content-length'],
        'CONTENT_TYPE': headers['content-type'],
        'REQUEST_METHOD': 'POST',
        'wsgi.input': BytesIO(body)
    }
    form, files = parse_form_data(environ)

    # Example usage...
    form_data = dict(form)
    logger.info(form_data)

    attachments = {key:{
            'filename': file.filename,
            'content_type': file.content_type,
            'size': file.size,
            'data': file.raw
        } for key,file in files.items()}
    logger.info(attachments)
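The environ dictionary above takes CONTENT_LENGTH from the content-length header, which raises a KeyError if that header is absent (for example in a hand-written test event). A small variant computes the length from the decoded body instead. This is a sketch: the helper name make_environ is hypothetical, and it assumes body is already raw bytes.

```python
from io import BytesIO
from typing import Mapping, Optional


def make_environ(headers: Optional[Mapping[str, str]], body: bytes) -> dict:
    """Build the minimal WSGI-style environ that parse_form_data() is given above."""
    # Header names are case-insensitive, so normalize them first
    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    return {
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": lowered.get("content-type", ""),
        # WSGI environ values are strings; derive the length from the body itself
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": BytesIO(body),
    }
```

With it, the parsing step in the handler becomes form, files = parse_form_data(make_environ(event['headers'], body)).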

- jrc

0

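If you are using CGI, I recommend FieldStorage:

from cgi import FieldStorage
from io import BytesIO

# FieldStorage needs a binary file object, not a str ("fp must be file pointer");
# for a base64-encoded body use base64.b64decode(event['body']) instead of .encode()
body = event['body'].encode()
# headers are built from environ (FieldStorage looks up 'content-type' in lowercase)
fs = FieldStorage(fp=BytesIO(body), environ={'REQUEST_METHOD': 'POST', 'CONTENT_TYPE': event['headers']['Content-Type'], 'CONTENT_LENGTH': str(len(body))})['file']
originalFileName = fs.filename
binaryFileData = fs.file.read()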

See also: https://dev59.com/21DZs4cB2Jgan1znZOZT#38718958. If the event body contains multiple files, the same FieldStorage(...)['file'] call returns a list of FieldStorage objects, so you can do the following:

for f in fs:
    originalFileName = f.filename
    binaryFileData = f.file.read()

Altogether, my solution handles a single file, multiple files, and a body with no file, and checks that the body is multipart/form-data:

import base64
from cgi import parse_header, FieldStorage  # cgi: deprecated in 3.11, removed in 3.13
from io import BytesIO

#see also: https://dev59.com/-6zka4cB1Zd3GeqP50NK#56405982
content_type = event['headers']['Content-Type']
c_type, c_data = parse_header(content_type)
assert c_type == 'multipart/form-data'

# FieldStorage needs a binary file object, not a str (otherwise: "TypeError: fp must be file pointer")
if event.get('isBase64Encoded'):
    body = base64.b64decode(event['body'])
else:
    body = event['body'].encode()

#see also: https://dev59.com/21DZs4cB2Jgan1znZOZT#38718958
fs = FieldStorage(fp=BytesIO(body), environ={'REQUEST_METHOD': 'POST', 'CONTENT_TYPE': content_type, 'CONTENT_LENGTH': str(len(body))})['file']

#If fs contains a single file or no file: wrap the FieldStorage object in a list so it is iterable
if not isinstance(fs, list):
    fs = [fs]

for f in fs:
    originalFileName = f.filename
    #no file:
    if originalFileName == '':
        continue
    binaryFileData = f.file.read()
    #Do something with the data
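The same normalization (always iterate over a list, skip empty file inputs) does not depend on cgi. Assuming the body has already been parsed into hypothetical (field name, filename, data) tuples by whatever parser you use, a sketch looks like this (the function name group_uploads is also hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_uploads(
    parts: Iterable[Tuple[str, Optional[str], bytes]]
) -> Dict[str, List[Tuple[str, bytes]]]:
    """Group uploaded files by form field name, always as lists."""
    grouped: Dict[str, List[Tuple[str, bytes]]] = defaultdict(list)
    for name, filename, data in parts:
        # None: a plain form field; "": a file input the user left empty
        if not filename:
            continue
        grouped[name].append((filename, data))
    return dict(grouped)
```

Because every field maps to a list, one file, several files and no file at all are handled by the same loop.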

- Lukas

5

This fails with: TypeError: fp must be file pointer Traceback (most recent call last). - user3821178

Content provided by Stack Overflow; see the original link for the English original.

- Sam Anthony · Accepted Answer

You can parse it with something like the following:

from requests_toolbelt.multipart import decoder

# MultipartDecoder expects bytes in Python 3, hence the b"" prefix
multipart_string = b"--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--ce560532019a77d83195f9e9873e16a1--\r\n"
content_type = "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1"
for part in decoder.MultipartDecoder(multipart_string, content_type).parts:
    print(part.text)