Writing a UTF-8 text file in Python

3

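My Django application takes a document from the user, generates some reports about it, and writes them to a txt file. The interesting problem is that everything works fine on my Mac OS, but on Windows it fails to read some letters and converts them into symbols like é™ä±. Here is my code: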

views.py:

def result(request):
    last_uploaded = OriginalDocument.objects.latest('id')
    original = open(str(last_uploaded.document), 'r')
    original_words = original.read().lower().split()
    words_count = len(original_words)
    open_original = open(str(last_uploaded.document), "r")
    read_original = open_original.read()
    characters_count = len(read_original)
    report_fives = open("static/report_documents/" + str(last_uploaded.student_name) + 
    "-" + str(last_uploaded.document_title) + "-5.txt", 'w', encoding="utf-8")
    # Path to the documents with which original doc is comparing
    path = 'static/other_documents/doc*.txt'
    files = glob.glob(path)
    #endregion

    rows, found_count, fives_count, rounded_percentage_five, percentage_for_chart_five, fives_for_report, founded_docs_for_report = search_by_five(last_uploaded, 5, original_words, report_fives, files)


    context = {
        ...
    }

    return render(request, 'result.html', context)

Report text file:
['universitetindé™', 'té™hsili', 'alä±ram.', 'mé™n'] was found in static/other_documents\doc1.txt.
...
1 Answer

2
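The problem is that you call open() on the file without specifying an encoding. As the Python documentation notes, the default encoding is platform dependent. That is probably why you see different results on Windows and macOS.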
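To see which encoding a given machine would use by default, a minimal check along these lines may help. It assumes the standard-library locale module reflects the default that open() falls back to when no encoding is given; the variable name default_encoding is only illustrative.

```python
import locale

# Encoding that open() typically falls back to when encoding= is omitted.
# Often a legacy code page such as cp1252 on Windows, and UTF-8 on macOS.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

If the two machines print different values here, the same open() call without encoding= will decode the same bytes differently on each.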
Assuming the file itself is actually encoded in UTF-8, just specify that encoding when reading it:
original = open(str(last_uploaded.document), 'r', encoding="utf-8")
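The effect can be reproduced in isolation. The sketch below is not the asker's code: it writes a small UTF-8 sample to a temporary file (a hypothetical stand-in for the uploaded document), then reads it back once with cp1252, assumed here to be the Windows default, and once with utf-8. The helper name read_words is made up for illustration.

```python
import os
import tempfile


def read_words(path, encoding):
    """Read a text file with an explicit encoding and return its lowercased words."""
    # The with-block closes the file even if reading fails.
    with open(path, "r", encoding=encoding) as f:
        return f.read().lower().split()


# Create a temporary UTF-8 file standing in for the uploaded document.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("universitetində təhsili alıram. mən")

wrong = read_words(sample_path, "cp1252")  # decoded with the wrong code page
right = read_words(sample_path, "utf-8")   # decoded as the file was written
os.remove(sample_path)

print(wrong)  # ['universitetindé™', 'té™hsili', 'alä±ram.', 'mé™n']
print(right)  # ['universitetində', 'təhsili', 'alıram.', 'mən']
```

The cp1252 result matches the garbled words in the report above, which suggests the damage happens at read time, before anything is written to the report file.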

After adding this argument to that line, the results disappeared :(
The report file can't even be opened in MS Word because of an unknown character error.
I found the problem! Thank you very much!
