sqlite3.ProgrammingError: 除非您使用能够解释8位字节字符串的text_factory，否则不得使用8位字节字符串。

Question

sqlite3.ProgrammingError: 除非您使用能够解释8位字节字符串的text_factory，否则不得使用8位字节字符串。

92

我正在使用Python中的SQLite3，尝试存储一个UTF-8 HTML代码片段的压缩版本。

代码如下：

...
c = connection.cursor()
c.execute('create table blah (cid integer primary key,html blob)')
...
c.execute('insert or ignore into blah values (?, ?)',(cid, zlib.compress(html)))

你在哪个步骤遇到了错误：

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

如果我使用“text”而不是“blob”，并且不压缩HTML片段，那么一切都正常（尽管数据库太大）。当我使用“blob”并通过Python zlib库进行压缩时，就会出现上述错误消息。我查找了一下，但没有找到简单的答案。

- R. Hill

5个回答

35

找到了解决方案，我应该花更多的时间搜索。

解决方案是将值转换为Python的“缓冲区”，像这样：

c.execute('insert or ignore into blah values (?, ?)',(cid, buffer(zlib.compress(html))))

希望这能帮到其他人。

- R. Hill

1

当我这样做时，我的数据库中充满了base36文本，这会使数据库比直接存储blob更大。 - Brian Minton

3

文档中提到了，你应该使用sqlite3.Binary而不是现在这个。 - MarioVilas

看起来sqlite3.Binary()只是buffer()的别名，至少在https://github.com/ghaering/pysqlite/blob/master/lib/dbapi2.py#L54中是这样。 - stevegt

哦，看起来pysqlite文档的这一部分实际上鼓励使用buffer()函数：“因此，以下Python类型可以毫无问题地发送到SQLite：... '[Python类型] buffer ... [SQLite类型] BLOB'” https://docs.python.org/2/library/sqlite3.html#introduction - stevegt

35

要使用BLOB类型，您必须首先将zlib压缩的字符串转换为二进制数据，否则sqlite将尝试将其处理为文本字符串。可以通过使用sqlite3.Binary()来完成此操作。例如：

c.execute('insert or ignore into blah values (?, ?)',(cid, 
sqlite3.Binary(zlib.compress(html))))

- MarioVilas

这个可以工作。但是，我想知道为什么需要这样做。类型“BLOB”不已经表示该列中的数据是二进制的吗？请注意，在Python 2中，字符串可以是文本或二进制的。sqlite3难道不应该将对象（zlib压缩字符串）视为BLOB类型的二进制吗？ - user1783732

我认为Python没有将整个数据库模式存储在内存中以便查询正确的数据类型 - 很可能它只是根据您传递的内容在运行时猜测类型，因此无法区分二进制字符串和文本字符串。 - MarioVilas

由于SQLite使用动态类型：https://www.sqlite.org/datatype3.html @user1783732 - Lester Cheung

1

语法：

有五种可能的存储类型：NULL、INTEGER、TEXT、REAL和BLOB

BLOB通常用于存储pickle模型或dill pickle模型

> cur.execute('''INSERT INTO Tablename(Col1, Col2, Col3, Col4) VALUES(?,?,?,?)''', 
                                      [TextValue, Real_Value, Buffer(model), sqlite3.Binary(model2)])
> conn.commit()

> # Read Data:
> df = pd.read_sql('SELECT * FROM Model, con=conn) 
> model1 = str(df['Col3'].values[0]))
> model2 = str(df['Col'].values[0]))

- Pranzell

0

你可以使用 repr(html) 存储值，而不是使用原始输出，然后在检索值以供使用时使用 eval(html)。

c.execute('insert or ignore into blah values (?, ?)',(1, repr(zlib.compress(html))))

- zwalker

1

使用 eval 和 repr 是非常不规范的。无论您信任数据源有多少。 - Jason Fried

我同意，在这里任何东西都比eval()更好。正确的解决方案是使用sqlite3.Binary，但如果由于某些原因无法使用，则最好以更安全的方式对数据进行编码 - 例如使用base64。 - MarioVilas

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- zag · Accepted Answer

如果你想在 sqlite3 中使用 8 位字符串而不是 Unicode 字符串，请为 sqlite 连接设置适当的 text_factory:

connection = sqlite3.connect(...)
connection.text_factory = str