Google App Engine: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 48: ordinal not in range(128)

Question

python google-app-engine unicode jinja2 python-unicode

29

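I'm building a small application on Google App Engine that uses the Quora RSS feed. The application has a form, and based on what the user enters it outputs a list of links related to that input. Right now it works fine for one-word queries and for most two-word queries, provided the words are separated by a "-". However, for three-word queries and some two-word queries, I get the following error: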

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 48: ordinal not in range(128)

Here is my Python code:

import os
import webapp2
import jinja2
from google.appengine.ext import db
import urllib2
import re

template_dir = os.path.join(os.path.dirname(__file__), 'templates')
jinja_env = jinja2.Environment(loader = jinja2.FileSystemLoader(template_dir), autoescape=True)

class Handler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        self.response.out.write(*a, **kw)
    def render_str(self, template, **params):
        t = jinja_env.get_template(template)
        return t.render(params)
    def render(self, template, **kw):
        self.write(self.render_str(template, **kw))

class MainPage(Handler):
    def get(self):
        self.render("formrss.html")
    def post(self):
        x = self.request.get("rssquery")
        url = "http://www.quora.com/" + x + "/rss"
        content = urllib2.urlopen(url).read()
        allTitles =  re.compile('<title>(.*?)</title>')
        allLinks = re.compile('<link>(.*?)</link>')
        list = re.findall(allTitles,content)
        linklist = re.findall(allLinks,content)
        self.render("frontrss.html", list = list, linklist = linklist)



app = webapp2.WSGIApplication([('/', MainPage)], debug=True)

Here is the HTML code:

<h1>Quora Live Feed</h1><br><br><br>

{% extends "rssbase.html" %}

{% block content %}
    {% for e in range(1, 19) %}
        {{ (list[e]) }} <br>
        <a href="{{ linklist[e] }}">{{ linklist[e] }}</a>
        <br><br>
    {% endfor %}
{% endblock %}

- Manas Chaturvedi

1

Can you give us the full traceback? The exception message alone doesn't tell us where the exception was raised or how Python got there. - Martijn Pieters

1

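This is a dreaded error that sometimes comes up in Python and is really confusing, and I see I'm not the only one who runs into it. I've hit it many times, and it isn't even always clear whether the fix is to encode or to decode. In Java, for example, this error never happens because "everything is Unicode", so why does Python force this confusion on us when Java never has the problem? I've run into it many times while writing internationalized web apps on Google App Engine, and even when things were working it was never clear what the right approach was. - Niklas Rosencrantz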

2 Answers

0

In my App Engine app, I do the conversion like this:

content = unicode(content)

I think this is cleaner and easier to use.
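
One caveat worth keeping in mind: in Python 2, calling `unicode()` on a byte string without an encoding argument uses the default codec, which is normally ascii, so non-ASCII bytes would raise the same UnicodeDecodeError. A minimal sketch that passes the encoding explicitly follows. The sample bytes and the `text_type` alias are assumptions made for illustration; the alias exists only so the snippet also runs on Python 3, where `str` plays the role of `unicode`.

```python
import sys

# Pick the text type for the running interpreter: unicode on Python 2,
# str on Python 3. Only the selected branch is evaluated.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821

# Hypothetical response body containing UTF-8 encoded non-ASCII characters.
content = b"Caf\xc3\xa9 \xe2\x80\x93 menu"

# Passing the encoding avoids the implicit ascii default.
text = text_type(content, "utf-8")
```

This is equivalent to `content.decode("utf-8")` and assumes the feed really is UTF-8 encoded.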

- KimKha

- Jon Wayne Parrott · Accepted Answer

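Python is most likely trying to convert between a byte string (str) and a unicode string implicitly using the ascii codec, and failing on a non-ASCII byte. When you work with Unicode data, you need to decode it explicitly: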

content = content.decode('utf-8')
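
As a rough illustration of the explicit decode step, here is a minimal sketch. The sample bytes and the helper name `extract_titles` are made up for this example and are not from the question. It also assumes the feed is UTF-8 encoded, which the 0xe2 byte suggests but does not prove. The code is written to run on both Python 2.7 and Python 3.

```python
import re

# Hypothetical feed fragment. b"\xe2\x80\x99" is the UTF-8 encoding of
# U+2019 (right single quotation mark), so it starts with byte 0xe2.
raw = b"<title>What\xe2\x80\x99s new?</title><title>Plain ASCII</title>"

TITLE_RE = re.compile(u"<title>(.*?)</title>")


def extract_titles(raw_bytes, encoding="utf-8"):
    """Decode the raw bytes to text once, then match on the text."""
    text = raw_bytes.decode(encoding)
    return TITLE_RE.findall(text)


titles = extract_titles(raw)

# Decoding the same bytes as ASCII fails in the way the question describes.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

Decoding right after `urllib2.urlopen(url).read()` means the lists passed to the template hold unicode objects, so Jinja2 should not need to fall back on an implicit ascii decode when rendering them.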