Manually adding rows to StreamingHttpResponse (Django)

Question (8 votes)
I'm using Django's StreamingHttpResponse to stream a large CSV file on the fly. According to the docs, you pass an iterator to the response's streaming_content parameter:
import csv
from django.http import StreamingHttpResponse

def get_headers():
    return ['field1', 'field2', 'field3']

def get_data(item):
    return {
        'field1': item.field1,
        'field2': item.field2,
        'field3': item.field3,
    }

# csv.writer needs a file-like object with a 'write' method; Echo just returns what it is given
class Echo(object):
    def write(self, value):
        return value


def get_response(queryset):
    writer = csv.DictWriter(Echo(), fieldnames=get_headers())
    writer.writeheader() # this line does not work

    response = StreamingHttpResponse(
        # the iterator
        streaming_content=(writer.writerow(get_data(item)) for item in queryset),
        content_type='text/csv',
    )
    response['Content-Disposition'] = 'attachment;filename=items.csv'

    return response

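My question is: how can I manually write a row with the CSV writer? Calling writer.writerow(data) or writer.writeheader() (which also calls writerow() internally) by hand doesn't seem to write anything to the output; only the rows yielded from streaming_content end up in the output file.
3 Answers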
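Why the manual call seems to do nothing can be seen without Django: `Echo.write()` only hands back what it receives, so `DictWriter.writerow()` returns the formatted line instead of storing it anywhere. A minimal stdlib-only sketch, reusing the question's `Echo` class and field names:

```python
import csv


class Echo:
    """Pseudo-buffer from the question: write() returns its argument, stores nothing."""

    def write(self, value):
        return value


fieldnames = ["field1", "field2", "field3"]
writer = csv.DictWriter(Echo(), fieldnames=fieldnames)

# The formatted CSV line exists only as writerow()'s return value.
line = writer.writerow({"field1": 1, "field2": 2, "field3": 3})
print(repr(line))  # '1,2,3\r\n'

# Called as a bare statement, the return value is thrown away, so nothing
# reaches the response; only values that the iterator yields are streamed.
writer.writeheader()
```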

Answer 1 (16 votes)

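The answer is to use a generator function instead of computing the results inline in StreamingHttpResponse's streaming_content argument, and to write the extra row through the pseudo-buffer we created (the Echo class):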

import csv
from django.http import StreamingHttpResponse

def get_headers():
    return ['field1', 'field2', 'field3']

def get_data(item):
    return {
        'field1': item.field1,
        'field2': item.field2,
        'field3': item.field3,
    }

# csv.writer needs a file-like object with a 'write' method; Echo just returns what it is given
class Echo(object):
    def write(self, value):
        return value

def iter_items(items, pseudo_buffer):
    writer = csv.DictWriter(pseudo_buffer, fieldnames=get_headers())
    yield pseudo_buffer.write(get_headers())

    for item in items:
        yield writer.writerow(get_data(item))

def get_response(queryset):
    response = StreamingHttpResponse(
        streaming_content=(iter_items(queryset, Echo())),
        content_type='text/csv',
    )
    response['Content-Disposition'] = 'attachment;filename=items.csv'
    return response

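As strange as it looks, this is the one correct answer among the many given on this topic. Odd that building a streaming download in Django should be this hard to figure out... - rubmz
Nice solution, but I would change: yield pseudo_buffer.write(get_headers()) to: yield pseudo_buffer.writerow(get_headers()) so that the header is written on its own line in the file, and to avoid writing ['field1', 'field2', 'field3'] as the header when the correct header is field1,field2,field3. You can see that pseudo_buffer.write(...) writes the raw argument it is passed. - Reidel
Thanks a lot for the reply, very helpful :) - davjfish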
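To see what the answer's generator actually yields, it can be run without Django against stand-in objects. A minimal sketch, assuming a hypothetical `Item` namedtuple in place of the queryset (the other functions are copied from the answer):

```python
import csv
from collections import namedtuple

# Hypothetical stand-in for a queryset row; only the attribute names matter.
Item = namedtuple("Item", ["field1", "field2", "field3"])


def get_headers():
    return ["field1", "field2", "field3"]


def get_data(item):
    return {"field1": item.field1, "field2": item.field2, "field3": item.field3}


class Echo:
    def write(self, value):
        return value


def iter_items(items, pseudo_buffer):
    """The generator from the answer, unchanged."""
    writer = csv.DictWriter(pseudo_buffer, fieldnames=get_headers())
    yield pseudo_buffer.write(get_headers())
    for item in items:
        yield writer.writerow(get_data(item))


chunks = list(iter_items([Item("a", "b", "c")], Echo()))
print(chunks)  # [['field1', 'field2', 'field3'], 'a,b,c\r\n']
```

The first chunk is the Python list itself rather than a CSV line ending in a newline, which is the problem Reidel's comment describes; the header has to go through the csv writer, as the next answer does.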

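Answer 2 (4 votes)
The proposed solution can actually produce an incorrect CSV (header not matching the data). You need to replace the affected part with something like this instead: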
header = dict(zip(writer.fieldnames, writer.fieldnames))
yield writer.writerow(header)

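This is taken from the implementation of writeheader: https://github.com/python/cpython/blob/08045391a7aa87d4fbd3e8ef4c852c2fa4e81a8a/Lib/csv.py#L141:L143

yield writer.writeheader() doesn't work here because, before Python 3.8, writeheader() calls writerow() but returns None instead of its result.

Hope this helps someone in the future :)

Also note that no fix is needed on Python 3.8+, where yield writer.writeheader() works thanks to this change: https://bugs.python.org/issue27497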
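Putting that fix into a complete generator gives something like the following sketch (stdlib only; `iter_csv` and the sample row are hypothetical names for illustration). It builds the header with the `dict(zip(...))` trick, so it works on any Python 3 version, and on 3.8+ it checks that `writer.writeheader()` returns the same line:

```python
import csv
import sys


class Echo:
    def write(self, value):
        return value


def iter_csv(rows, fieldnames):
    """Yield CSV lines: the header first, then one line per dict in rows."""
    writer = csv.DictWriter(Echo(), fieldnames=fieldnames)
    # Same mapping DictWriter.writeheader() builds internally; yielding
    # writerow()'s result works on every Python 3 version.
    yield writer.writerow(dict(zip(writer.fieldnames, writer.fieldnames)))
    for row in rows:
        yield writer.writerow(row)


rows = [{"field1": 1, "field2": 2, "field3": 3}]
output = "".join(iter_csv(rows, ["field1", "field2", "field3"]))
print(repr(output))  # 'field1,field2,field3\r\n1,2,3\r\n'

if sys.version_info >= (3, 8):
    # Since bpo-27497, writeheader() returns writerow()'s result.
    w = csv.DictWriter(Echo(), fieldnames=["field1", "field2", "field3"])
    assert w.writeheader() == "field1,field2,field3\r\n"
```

In a Django view, the generator would be passed as streaming_content, exactly like iter_items in the first answer.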


Answer 3 (0 votes)

In Python you can use itertools.chain to join generators, putting the header row in front of the queryset rows.

Here's how:

import csv
import itertools

from django.http import StreamingHttpResponse

from .models import Item  # assumption: Item is a model defined in this app


class Echo:
    """Pseudo-buffer: write() returns the value instead of storing it."""
    def write(self, value):
        return value


def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    # Header row(s), wrapped in a generator so they can be chained with the data rows.
    headers = [["title 1", "title 2"], ]
    row_titles = (header for header in headers)  # title generator

    items = Item.objects.all()
    rows = (["Row {}".format(item.pk), str(item.pk)] for item in items)
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    rows = itertools.chain(row_titles, rows)  # header first, then the queryset rows
    return StreamingHttpResponse(
        (writer.writerow(row) for row in rows),
        content_type="text/csv",
        # The headers= argument requires Django 3.2+.
        headers={'Content-Disposition': 'attachment; filename="somefilename.csv"'},
    )

You'll then get a CSV file with the header row followed by the queryset rows (here for items with pk 1, 2, ...):

title 1,title 2
Row 1,1
Row 2,2
...
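The chaining itself can be tried without Django or a database. This sketch replaces the queryset with a hypothetical list of primary keys and joins the chunks the view would stream:

```python
import csv
import itertools


class Echo:
    def write(self, value):
        return value


# Hypothetical stand-in for Item.objects.all(): just the primary keys.
pks = [1, 2]

headers = [["title 1", "title 2"]]
rows = (["Row {}".format(pk), str(pk)] for pk in pks)

writer = csv.writer(Echo())
# chain() yields the header row first, then every data row.
chunks = [writer.writerow(row) for row in itertools.chain(headers, rows)]
print(repr("".join(chunks)))  # 'title 1,title 2\r\nRow 1,1\r\nRow 2,2\r\n'
```

Since itertools.chain accepts any iterables, the header list can be passed directly without first wrapping it in a generator. Note also that csv.writer puts no space after commas and ends lines with \r\n by default.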

Content provided by Stack Overflow.