Streaming a CSV file in Django

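I'm trying to stream a CSV file as an attachment download. The CSV files are 4MB or larger, and I need a way for the user to actively download the file without waiting for all of the data to be created and held in memory first.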
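I first used my own file wrapper based on Django's FileWrapper class. That failed. Then I saw a method here for using a generator to stream the response: How to stream an HttpResponse with Django

When I raise an error within the generator, I can see that I am creating the correct data with the get_row_data() function, but when I try to return the response, it comes back empty. I've also disabled Django's GZipMiddleware. Does anyone know what I'm doing wrong?

Edit: The issue I was having was caused by ConditionalGetMiddleware. I had to replace it, using the code in one of the answers below.

Here is the view: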
from django.db import models
from django.http import HttpResponse
from django.views.decorators.http import condition

@condition(etag_func=None)
def csv_view(request, app_label, model_name):
    """ Based on the filters in the query, return a csv file for the given model """

    #Get the model
    model = models.get_model(app_label, model_name)

    #if there are filters in the query (QUERY_STRING is '' when empty, never None)
    if request.method == 'GET' and request.GET:
        keyword_arg_dict = {}
        for key, value in request.GET.items():
            #get the query filters
            keyword_arg_dict[str(key)] = str(value)
        #generate a list of row objects, based on the filters
        objects_list = model.objects.filter(**keyword_arg_dict)
    else:
        #get all the model's objects
        objects_list = model.objects.all()
    #create the response object with a csv content type
    response = HttpResponse(
        stream_response_generator(model, objects_list),
        content_type='text/csv',
        )
    response['Content-Disposition'] = "attachment; filename=foo.csv"
    return response

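This is the generator I use to stream the response:
import time

def stream_response_generator(model, objects_list):
    """Streaming function to return data iteratively """
    for row_item in objects_list:
        yield get_row_data(model, row_item)
        # Pause one second per row (slows the download; likely only useful for testing)
        time.sleep(1)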

And here is how I create the CSV row data:

import csv
import cStringIO

from django.db.models.fields.related import RelatedField

def get_row_data(model, row):
    """Get a row of csv data from an object"""
    #Create a temporary in-memory csv handle
    csv_handle = cStringIO.StringIO()
    #create the csv output object
    csv_output = csv.writer(csv_handle)
    value_list = []
    for field in model._meta.fields:
        #reset per field so a failed lookup can't reuse the previous field's value
        value = None
        #if the field is a related field (ForeignKey, OneToOne; ManyToMany fields are not in _meta.fields)
        if isinstance(field, RelatedField):
            #follow the relation; Django fetches the related row by its key
            try:
                entry = getattr(row, field.name)
            except field.rel.to.DoesNotExist:
                entry = None
            if entry is not None:
                #use the unicode version of the related object
                value = unicode(entry).encode("utf-8")
        #if it isn't a related field
        else:
            #get the value of the field (the Python 2 csv module needs byte strings)
            value = getattr(row, field.name)
            if isinstance(value, basestring):
                value = value.encode("utf-8")
        value_list.append(value)
    #add the row of csv values to the csv file
    csv_output.writerow(value_list)
    #Return the string value of the csv output
    return csv_handle.getvalue()
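The empty response is consistent with the diagnosis in the edit: a generator can only be iterated once, so if anything (the edit names ConditionalGetMiddleware) reads the whole body before it is sent, for example to compute a length, nothing is left for the client. The sketch below is a simplified, Django-free illustration in Python 3; `csv_rows` and `content_length_middleware` are hypothetical names, and the stand-in only imitates the idea of reading the body, not Django's actual middleware code.

```python
import csv
import io


def csv_rows(rows):
    """Yield one CSV-formatted line per input row (hypothetical helper)."""
    for row in rows:
        buffer = io.StringIO()
        csv.writer(buffer).writerow(row)
        yield buffer.getvalue()


def content_length_middleware(body_iter):
    """Hypothetical stand-in for middleware that reads the body to measure it.

    It joins the iterator to compute a length, but hands the same
    (now exhausted) iterator back, so nothing is left to send.
    """
    length = len("".join(body_iter))
    return length, body_iter


body = csv_rows([[1, "a"], [2, "b"]])
length, body = content_length_middleware(body)
sent = "".join(body)

print(length)      # 10
print(repr(sent))  # ''
```

The length is computed correctly, yet the body that would actually be sent is empty, which matches the 0-byte response reported in the comments below.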


Here is some simple code that streams a CSV file. You can adapt it to whatever you need:

import cStringIO as StringIO
import csv

from django.http import HttpResponse

# Named csv_view so the view doesn't shadow the csv module
def csv_view(request):
    def data():
        for i in xrange(10):
            # A fresh in-memory file for each row
            csvfile = StringIO.StringIO()
            csvwriter = csv.writer(csvfile)
            csvwriter.writerow([i,"a","b","c"])
            yield csvfile.getvalue()

    response = HttpResponse(data(), content_type="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response

This writes each row to an in-memory file, reads that row back, and yields it, one row at a time.
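To see that this yields rows lazily, here is a Python 3 translation of `data()` (`io.StringIO` and `range` in place of `cStringIO` and `xrange`), driven by a hypothetical `numbered_rows` source that records when each row is built. Pulling the first chunk builds only the first row, which is what lets a streaming response start sending before the whole file exists.

```python
import csv
import io

# Records which rows have been built so far
produced = []


def numbered_rows(n):
    """Hypothetical row source that records when each row is produced."""
    for i in range(n):
        produced.append(i)
        yield [i, "a", "b", "c"]


def csv_lines(rows):
    """Yield each row as its own CSV-formatted string."""
    for row in rows:
        # A fresh in-memory file per row, as in the answer above
        buffer = io.StringIO()
        csv.writer(buffer).writerow(row)
        yield buffer.getvalue()


stream = csv_lines(numbered_rows(10))
first = next(stream)
print(repr(first))  # '0,a,b,c\r\n'
print(produced)     # [0]
```

Only one row has been built when the first chunk is ready; the rest are built as the consumer keeps iterating.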

This version is more efficient for generating large amounts of data, but make sure you understand the version above before using it:

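import cStringIO as StringIO
import csv

from django.http import HttpResponse

# Named csv_view so the view doesn't shadow the csv module
def csv_view(request):
    # One in-memory file, reused for every row
    csvfile = StringIO.StringIO()
    csvwriter = csv.writer(csvfile)

    def read_and_flush():
        # Return what has been written so far, then empty the buffer
        csvfile.seek(0)
        data = csvfile.read()
        csvfile.seek(0)
        csvfile.truncate()
        return data

    def data():
        for i in xrange(10):
            csvwriter.writerow([i,"a","b","c"])
            # Yield inside the loop so each row is sent as it is produced
            yield read_and_flush()

    response = HttpResponse(data(), content_type="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response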
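A Python 3 sketch of the reused-buffer approach follows. The `rows_per_chunk` parameter is a hypothetical addition, not part of the answer: with the default of 1 it flushes after every row, while a larger value batches several rows into each chunk, trading a little memory for fewer, larger writes. Joining the chunks reproduces exactly what `csv.writer` produces when writing all rows at once.

```python
import csv
import io


def stream_csv(rows, rows_per_chunk=1):
    """Yield CSV text in chunks, reusing one in-memory buffer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)

    def read_and_flush():
        # Return everything written so far, then empty the buffer for reuse
        buffer.seek(0)
        chunk = buffer.read()
        buffer.seek(0)
        buffer.truncate()
        return chunk

    for count, row in enumerate(rows, 1):
        writer.writerow(row)
        if count % rows_per_chunk == 0:
            yield read_and_flush()
    # Emit whatever is left over after the last full batch
    tail = read_and_flush()
    if tail:
        yield tail


rows = [[i, "a", "b", "c"] for i in range(3)]
chunks = list(stream_csv(rows))

# The same rows written in one go, for comparison
whole = io.StringIO()
csv.writer(whole).writerows(rows)
print("".join(chunks) == whole.getvalue())  # True
```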

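I haven't needed to stream data yet, but it's nice to know how quickly you can get something simple and elegant working. - Filip Dupanović
While I really like this answer, it turns out it isn't my problem. I used exactly the code you wrote, just to see whether it would generate a response, but the response comes back as 0 bytes. So I'm still stuck in the same place. - frederix
It looks like disabling ConditionalGetMiddleware actually allows the response to be sent. But I'd really prefer to keep that middleware enabled. Is there a way to use a generator and keep the middleware enabled? - frederix
OK, I figured it out. I'll edit my original question and accept this answer, since it's a great solution for streaming a CSV file (or any file produced by a generator). - frederix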
The updated way to do this is with StreamingHttpResponse, introduced in Django 1.5. :) - krak3n


Since Django 1.5, the middleware problem has been dealt with and StreamingHttpResponse has been introduced. The following should do the trick:

import cStringIO as StringIO
import csv

from django.http import StreamingHttpResponse

def csv_view(request):
    ...
    # Assume `rows` is an iterator of lists
    def stream():
        buffer_ = StringIO.StringIO()
        writer = csv.writer(buffer_)
        for row in rows:
            writer.writerow(row)
            # Read back the row just written, then empty the buffer for reuse
            buffer_.seek(0)
            data = buffer_.read()
            buffer_.seek(0)
            buffer_.truncate()
            yield data
    response = StreamingHttpResponse(
        stream(), content_type='text/csv'
    )
    disposition = "attachment; filename=file.csv"
    response['Content-Disposition'] = disposition
    return response

There is documentation on outputting CSV from Django here, but it doesn't take advantage of StreamingHttpResponse, so I went ahead and opened a ticket to track it.
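Django's CSV how-to has since gained a simpler pattern for `StreamingHttpResponse` that needs no buffer at all: an `Echo` object whose `write()` returns its argument. In Python 3, `csv.writer`'s `writerow()` returns whatever the underlying `write()` returned, so each call hands back the formatted line directly. The sketch below shows the idea without Django (`csv_stream` is a hypothetical name); the commented lines show how it might plug into a view.

```python
import csv


class Echo:
    """File-like object whose write() just returns what it is given."""

    def write(self, value):
        return value


def csv_stream(rows):
    """Yield each row as a CSV line without any intermediate buffer."""
    writer = csv.writer(Echo())
    for row in rows:
        # writerow() returns the return value of Echo.write(), i.e. the line
        yield writer.writerow(row)


lines = list(csv_stream([["id", "name"], [1, "a,b"]]))
print(lines)  # ['id,name\r\n', '1,"a,b"\r\n']

# In a Django view (not executed here):
# from django.http import StreamingHttpResponse
# response = StreamingHttpResponse(csv_stream(rows), content_type="text/csv")
# response["Content-Disposition"] = 'attachment; filename="file.csv"'
```

The csv module still handles quoting (note the `"a,b"` field); the `Echo` object just removes the write-then-read-back step.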



The problem I was having was with ConditionalGetMiddleware. I saw that django-piston provides a replacement middleware that allows streaming:

from django.middleware.http import ConditionalGetMiddleware

def compat_middleware_factory(klass):
    """
    Class wrapper that only executes `process_response`
    if `streaming` is not set (or is false) on the `HttpResponse` object.
    Django has a bad habit of looking at the content,
    which will prematurely exhaust the data source if we're
    using generators or buffers.
    """
    class compatwrapper(klass):
        def process_response(self, req, resp):
            # getattr rather than hasattr: since Django 1.5 every response
            # has a `streaming` attribute (False by default)
            if not getattr(resp, 'streaming', False):
                return klass.process_response(self, req, resp)
            return resp
    return compatwrapper

ConditionalMiddlewareCompatProxy = compat_middleware_factory(ConditionalGetMiddleware)

Then replace ConditionalGetMiddleware with your ConditionalMiddlewareCompatProxy middleware, and in your view (borrowing code from a clever answer to this question):
import cStringIO as StringIO
import csv

from django.http import HttpResponse

def csv_view(request):
    def data():
        for i in xrange(10):
            csvfile = StringIO.StringIO()
            csvwriter = csv.writer(csvfile)
            csvwriter.writerow([i,"a","b","c"])
            yield csvfile.getvalue()

    #create the response object with a csv content type
    response = HttpResponse(
        data(),
        content_type='text/csv',
        )
    #Set the response as an attachment with a filename
    response['Content-Disposition'] = "attachment; filename=test.csv"
    #Flag the response so the wrapped middleware leaves it alone
    response.streaming = True
    return response
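The effect of the wrapper can be checked without Django using hypothetical stand-ins: `FakeConditionalGetMiddleware` imitates a middleware that reads the whole body, and `FakeResponse` holds an iterator as its body. This Python 3 sketch checks the flag with `getattr(resp, "streaming", False)` rather than `hasattr`, so a response whose `streaming` attribute exists but is false still goes through the middleware. Only the flagged response keeps its body intact.

```python
class FakeConditionalGetMiddleware:
    """Hypothetical stand-in for a middleware that reads the whole body."""

    def process_response(self, req, resp):
        resp.length = len("".join(resp.body))
        return resp


class FakeResponse:
    """Hypothetical minimal response holding an iterator as its body."""

    def __init__(self, body):
        self.body = body


def compat_middleware_factory(klass):
    """Skip klass.process_response for responses flagged as streaming."""
    class CompatWrapper(klass):
        def process_response(self, req, resp):
            if not getattr(resp, "streaming", False):
                return klass.process_response(self, req, resp)
            return resp
    return CompatWrapper


middleware = compat_middleware_factory(FakeConditionalGetMiddleware)()

# Unflagged: the middleware measures the body and exhausts it
plain = FakeResponse(iter(["a", "b"]))
middleware.process_response(None, plain)
plain_sent = list(plain.body)
print(plain.length, plain_sent)  # 2 []

# Flagged: the middleware is skipped and the body survives
streamed = FakeResponse(iter(["a", "b"]))
streamed.streaming = True
middleware.process_response(None, streamed)
streamed_sent = list(streamed.body)
print(streamed_sent)  # ['a', 'b']
```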

Content from Stack Overflow (translated from the original English post).