>>UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)
How can I solve this? Thanks!
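I ran into the same issue when using the Python MySQLdb module. Because MySQL lets you store almost any binary data in a text field regardless of character set, I found my solution here:
Edit: quoting from the above URL to satisfy the request in the first comment...
"UnicodeEncodeError: 'latin-1' codec can't encode character ..."
This happens because MySQLdb normally tries to encode everything to Latin-1. It can be fixed by executing the following commands right after you have established the connection: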
db.set_character_set('utf8')
dbc.execute('SET NAMES utf8;')
dbc.execute('SET CHARACTER SET utf8;')
dbc.execute('SET character_set_connection=utf8;')
"db" is the result of MySQLdb.connect(), and "dbc" is the result of db.cursor().
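The character U+201C LEFT DOUBLE QUOTATION MARK does not exist in the Latin-1 (ISO-8859-1) encoding.
It does exist in code page 1252 (Western European). This is a Windows-specific encoding based on ISO-8859-1 that puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it is an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings: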
>>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
UnicodeEncodeError
>>> u'He said \u201CHello\u201D'.encode('cp1252')
'He said \x93Hello\x94'
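The session above uses Python 2 syntax. As a minimal sketch, assuming Python 3 (where every str is Unicode and encode() returns bytes), the same difference can be reproduced and the reverse direction checked as well. The variable names are illustrative only.

```python
text = "He said \u201cHello\u201d"

# cp1252 assigns the curly quotes to the bytes 0x93 and 0x94.
cp1252_bytes = text.encode("cp1252")
print(cp1252_bytes)

# ISO-8859-1 has no byte for U+201C, so encoding raises UnicodeEncodeError.
try:
    text.encode("iso-8859-1")
    latin1_failed = False
except UnicodeEncodeError as exc:
    latin1_failed = True
    print(exc)

# The same bytes decode differently under the two encodings:
# cp1252 gives the curly quotes back, while ISO-8859-1 yields the
# C1 control characters U+0093 and U+0094.
as_cp1252 = cp1252_bytes.decode("cp1252")
as_latin1 = cp1252_bytes.decode("iso-8859-1")
print(repr(as_cp1252), repr(as_latin1))
```

The two decodings disagree only for bytes in the range 0x80-0x9F, which is exactly where cp1252 departs from ISO-8859-1.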
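If you are using your database only as a byte store, you can use cp1252 to encode “ and the other characters present in the Windows Western code page. However, any other Unicode characters that are not present in cp1252 will still cause errors.
You can use encode(..., 'ignore') to suppress the errors by dropping those characters, but really, in this century you should be using UTF-8 in both your database and your pages. That encoding allows any character to be used. Ideally, you should also tell MySQL that you are using UTF-8 strings (by setting the database connection and the collation on string columns), so that it can do case-insensitive comparison and sorting correctly.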
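To make the cost of suppressing errors concrete, here is a small sketch, assuming Python 3. The sample string is made up for illustration: it contains curly quotes (present in cp1252) and a snowman, U+2603 (absent from cp1252).

```python
text = "\u201cna\u00efve\u201d \u2603"

# Strict encoding (the default) fails on the snowman.
try:
    text.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'ignore' silently drops anything cp1252 cannot represent.
dropped = text.encode("cp1252", errors="ignore")

# 'replace' substitutes a question mark, which at least leaves a trace.
replaced = text.encode("cp1252", errors="replace")

print(strict_failed, dropped, replaced)
```

Both handlers lose data permanently: decoding the result never brings the snowman back, which is the practical argument for storing UTF-8 instead.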
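Isn't cp1252 a strict superset of ISO-8859-1? That is, when browsers receive an ISO-8859-1 page, they can render it as CP1252, because the page will not contain any characters from the range 0x80-0x9F anyway. - MSalters
The best solution is: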
Follow the comment below (add use_unicode=True and charset="utf8"):
db = MySQLdb.connect(host="localhost", user = "root", passwd = "", db = "testdb", use_unicode=True, charset="utf8") – KyungHoon Kim Mar 13 '14 at 17:04
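One way to keep those two settings from being forgotten is to build the connection arguments in one place. The sketch below is only an illustration: utf8_connect_kwargs and connect_utf8 are hypothetical helper names, the host, user, and database values are the placeholders from the comment above, and MySQLdb is imported lazily so the helper can be inspected without a database driver installed.

```python
def utf8_connect_kwargs(**overrides):
    """Hypothetical helper: connection arguments that request UTF-8."""
    kwargs = {
        "host": "localhost",   # placeholder values from the comment above
        "user": "root",
        "passwd": "",
        "db": "testdb",
        "use_unicode": True,   # return text columns as Unicode strings
        "charset": "utf8",     # connection character set
    }
    kwargs.update(overrides)
    return kwargs


def connect_utf8(**overrides):
    """Hypothetical wrapper around MySQLdb.connect()."""
    import MySQLdb  # third-party driver, imported only when actually connecting
    return MySQLdb.connect(**utf8_connect_kwargs(**overrides))


# Inspect the arguments without opening a connection.
print(utf8_connect_kwargs(charset="utf8mb4")["charset"])
```

Passing charset="utf8mb4" as an override is shown only as an option; whether the server and schema are set up for it has to be checked separately.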
For details, see:
class Connection(_mysql.connection):

    """MySQL Database Connection Object"""

    default_cursor = cursors.Cursor

    def __init__(self, *args, **kwargs):
        """
        Create a connection to the database. It is strongly recommended
        that you only use keyword parameters. Consult the MySQL C API
        documentation for more information.

        host
          string, host to connect

        user
          string, user to connect as

        passwd
          string, password to use

        db
          string, database to use

        port
          integer, TCP/IP port to connect to

        unix_socket
          string, location of unix_socket to use

        conv
          conversion dictionary, see MySQLdb.converters

        connect_timeout
          number of seconds to wait before the connection attempt
          fails.

        compress
          if set, compression is enabled

        named_pipe
          if set, a named pipe is used to connect (Windows only)

        init_command
          command which is run once the connection is created

        read_default_file
          file from which default client values are read

        read_default_group
          configuration group to use from the default file

        cursorclass
          class object, used to create cursors (keyword only)

        use_unicode
          If True, text-like columns are returned as unicode objects
          using the connection's character set. Otherwise, text-like
          columns are returned as strings. columns are returned as
          normal strings. Unicode objects will always be encoded to
          the connection's character set regardless of this setting.

        charset
          If supplied, the connection character set will be changed
          to this character set (MySQL-4.1 and newer). This implies
          use_unicode=True.

        sql_mode
          If supplied, the session SQL mode will be changed to this
          setting (MySQL-4.1 and newer). For more details and legal
          values, see the MySQL documentation.

        client_flag
          integer, flags to use or 0
          (see MySQL docs or constants/CLIENTS.py)

        ssl
          dictionary or mapping, contains SSL connection parameters;
          see the MySQL documentation for more details
          (mysql_ssl_set()). If this is set, and the client does not
          support SSL, NotSupportedError will be raised.

        local_infile
          integer, non-zero enables LOAD LOCAL INFILE; zero disables

        autocommit
          If False (default), autocommit is disabled.
          If True, autocommit is enabled.
          If None, autocommit isn't set and server default is used.

        There are a number of undocumented, non-standard methods. See the
        documentation for the MySQL C API for some hints on what they do.
        """
utf8mb4. See what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql. - Cheney
I hope your database is at least UTF-8. You will then need to run yourstring.encode('utf-8') before you put the string into the database.
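As a quick illustration of what that call produces, here is a sketch assuming Python 3. The name yourstring comes from the answer; the sample text is made up. UTF-8 spends three bytes on each curly quote, and decoding restores the original string exactly.

```python
yourstring = "\u201cHello\u201d"

# UTF-8 can represent every Unicode code point; U+201C becomes E2 80 9C.
encoded = yourstring.encode("utf-8")
print(encoded)

# The round trip is lossless.
decoded = encoded.decode("utf-8")
print(decoded == yourstring, len(yourstring), len(encoded))
```

Whether the explicit encode() is still needed depends on the driver: when the connection character set is configured, the driver may already encode Unicode strings itself, so treat this step as something to verify against your setup.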
import unicodedata

def strip_accents(text):
    # Decompose characters (NFKD), then drop the combining marks (category Mn).
    return "".join(char for char in
                   unicodedata.normalize('NFKD', text)
                   if unicodedata.category(char) != 'Mn')

strip_accents('áéíñóúü')
Output:
'aeinouu'
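\u201c, but that encoding cannot represent that code point. You may need to change the database to use UTF-8 and store the string data with an appropriate encoding, or sanitize your input before storing it, for example using something like Sam Ruby's excellent i18n guide. It discusses the problems that windows-1252 can cause, suggests how to handle them, and links to sample code!
Users of SQLAlchemy can simply specify their field with convert_unicode=True.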
Example:
sqlalchemy.String(1000, convert_unicode=True)
SQLAlchemy will accept Unicode objects directly and hand them back, taking care of the encoding itself.
Latin-1 (also known as ISO 8859-1) is a single-byte character encoding scheme, and you cannot fit \u201c (“) into one byte.
Did you mean to use UTF-8 encoding?
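The arithmetic behind that statement can be checked directly. A minimal sketch, assuming Python 3: Latin-1 maps code points 0 through 255 to the byte with the same value, so any character whose code point is 256 or higher has nowhere to go.

```python
ch = "\u201c"

# The code point of the left curly quote is far above 255.
code_point = ord(ch)
print(code_point, hex(code_point))

# Latin-1 is the identity mapping on code points 0..255.
identity = all(chr(i).encode("latin-1") == bytes([i]) for i in range(256))

# So a character fits in Latin-1 exactly when its code point is below 256.
fits_in_latin1 = code_point < 256
print(identity, fits_in_latin1)
```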
u'He said \u201CHello\u201D'.encode('cp1252')
The latest version of mysql.connector only has
db.set_charset_collation('utf8', 'utf8_general_ci')
and not
db.set_character_set('utf8')  # this method is not available