SQLAlchemy with PostgreSQL and full-text search

Question

11

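I'm using Flask, SQLAlchemy, and Flask-SQLAlchemy. I want to create a full-text search index in Postgres using GIN and to_tsvector. At the moment I'm trying the following. I think it is close to what I'm trying to express, but it doesn't work.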

from sqlalchemy.ext.declarative import declared_attr
from sqlalchemy.schema import Index
from sqlalchemy.sql.expression import func

from app import db


class Post(db.Model):

    id = db.Column(db.Integer, primary_key=True)
    added = db.Column(db.DateTime, nullable=False)
    pub_date = db.Column(db.DateTime, nullable=True)
    content = db.Column(db.Text)

    @declared_attr
    def __table_args__(cls):
        return (Index('idx_content', func.to_tsvector("english", "content"), postgresql_using="gin"), )

This throws the following error...

Traceback (most recent call last):
  File "./manage.py", line 5, in <module>
    from app import app, db
  File "/vagrant/app/__init__.py", line 36, in <module>
    from pep.models import *
  File "/vagrant/pep/models.py", line 8, in <module>
    class Post(db.Model):
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/flask_sqlalchemy.py", line 477, in __init__
    DeclarativeMeta.__init__(self, name, bases, d)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/ext/declarative/api.py", line 48, in __init__
    _as_declarative(cls, classname, cls.__dict__)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/ext/declarative/base.py", line 222, in _as_declarative
    **table_kw)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 326, in __new__
    table._init(name, metadata, *args, **kw)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 393, in _init
    self._init_items(*args)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 63, in _init_items
    item._set_parent_with_dispatch(self)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/events.py", line 235, in _set_parent_with_dispatch
    self._set_parent(parent)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 2321, in _set_parent
    ColumnCollectionMixin._set_parent(self, table)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/schema.py", line 1978, in _set_parent
    self.columns.add(col)
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2391, in add
    self[column.key] = column
  File "/home/vagrant/.virtualenvs/pep/local/lib/python2.7/site-packages/sqlalchemy/sql/expression.py", line 2211, in __getattr__
    key)
AttributeError: Neither 'Function' object nor 'Comparator' object has an attribute 'key'

I also tried

return (Index('idx_content', "content", postgresql_using="gin"), )

However, that doesn't work either, because Postgres (at least 9.1, which is the version I run) expects to_tsvector to be called. This line generates the SQL:

CREATE INDEX content_index ON post USING gin (content)

rather than what I want:

CREATE INDEX content_index ON post USING gin(to_tsvector('english', content))

I opened a ticket because I think this may be a bug/limitation. http://www.sqlalchemy.org/trac/ticket/2605

- d0ugal

1

Which SQLAlchemy version is this? - plaes

0.8.0 beta. However, I also tried 0.7.2 (which I think is the latest stable release). - d0ugal

1

After 0.8.0beta, a fix related to the use of column.key versus column.name was applied. - plaes

OK, thanks for letting me know. I'll keep an eye out for the next beta/release. - d0ugal

4 Answers

3

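I ran into this old question while creating some single-column and multi-column tsvector GIN indexes. For anyone who wants to create these indexes using string representations of the column names, here is a simple approach using the SQLAlchemy text() construct.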

from sqlalchemy import Column, Index, Integer, String, text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func


Base = declarative_base()

def to_tsvector_ix(*columns):
    s = " || ' ' || ".join(columns)
    return func.to_tsvector('english', text(s))

class Example(Base):
    __tablename__ = 'examples'

    id = Column(Integer, primary_key=True)
    atext = Column(String)
    btext = Column(String)

    __table_args__ = (
        Index(
            'ix_examples_tsv',
            to_tsvector_ix('atext', 'btext'),
            postgresql_using='gin'
            ),
        )

- benvc

2

A simplified version, using the ORM approach and the TSVectorType helper from sqlalchemy-utils (https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html):

Define a `TSVECTOR` column (`TSVectorType`) in the ORM model (declarative) that is automatically populated from the source text field

import sqlalchemy as sa
from sqlalchemy_utils.types.ts_vector import TSVectorType
# ^-- https://sqlalchemy-utils.readthedocs.io/en/latest/_modules/sqlalchemy_utils/types/ts_vector.html


class MyModel(Base):
    __tablename__ = 'mymodel'
    id = sa.Column(sa.Integer, primary_key=True)
    content = sa.Column(sa.String, nullable=False)
    content_tsv = sa.Column(
        TSVectorType("content", regconfig="english"),
        sa.Computed("to_tsvector('english', \"content\")", persisted=True))
    #      ^-- equivalent for SQL:
    #   COLUMN content_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', "content")) STORED;

    __table_args__ = (
        # Indexing the TSVector column
        sa.Index("idx_mymodel_content_tsv", content_tsv, postgresql_using="gin"), 
    )

How do you query on the TSVector column? (using `ts_rank` to return results ordered by TS rank)


tsquery = sa.func.plainto_tsquery("english", "multiple words query here")

stmt = sa.select(MyModel).where(
    MyModel.content_tsv.bool_op("@@")(tsquery)
).order_by(
    sa.func.ts_rank(MyModel.content_tsv, tsquery).desc()
).limit(5)

t, = db.session.execute(stmt).first()
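
To see what the ranked query above sends to PostgreSQL without connecting to a database, the statement can be compiled against the PostgreSQL dialect. This is a minimal sketch, assuming SQLAlchemy 1.4 or later; the Core table `mymodel` below is a hypothetical stand-in for `MyModel`, not part of the original answer.

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql

metadata = sa.MetaData()

# Hypothetical Core table standing in for the MyModel ORM class.
mymodel = sa.Table(
    "mymodel",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("content", sa.String, nullable=False),
    sa.Column("content_tsv", postgresql.TSVECTOR),
)

# plainto_tsquery accepts free-form text, so several words are fine here.
tsquery = sa.func.plainto_tsquery("english", "multiple words query here")

stmt = (
    sa.select(mymodel.c.id)
    .where(mymodel.c.content_tsv.bool_op("@@")(tsquery))
    .order_by(sa.func.ts_rank(mymodel.c.content_tsv, tsquery).desc())
    .limit(5)
)

# Compile only; no connection is opened and nothing is executed.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

The printed statement should contain the `@@` match operator in the WHERE clause and `ts_rank(...)` in the ORDER BY clause, which is a quick way to check the query shape before running it.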

Note an important difference between querying with `SQLAlchemy 1.4` and `SQLAlchemy 2.0`:

If you try to query in SQLAlchemy 1.4 with:

stmt = sa.select(MyModel).where(
    MyModel.content_tsv.match("multiple words query here", postgresql_regconfig="english")
)

(as shown here, for version 1.4), instead of:

tsquery = sa.func.plainto_tsquery("english", "multiple words query here")
#                 ^^^^^^^^^^^^^^^
stmt = sa.select(MyModel).where(
    MyModel.content_tsv.bool_op("@@")(tsquery)
)

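it will translate to to_tsquery("english", "multiple words query here"). But to_tsquery expects the words to be joined by tsquery operators (such as &), so a plain multi-word string raises SyntaxError: ProgrammingError: (psycopg2.errors.SyntaxError) syntax error in tsquery: "multiple words query here".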
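
If you have to stay on `to_tsquery` (for example because `.match()` on 1.4 is already used throughout a codebase), one possible workaround is to turn the free-form text into an operator-joined string first. The helper below is a hypothetical, deliberately naive sketch and not part of SQLAlchemy: it keeps only word characters and joins the words with `&`. Using `plainto_tsquery`, as this answer does, is likely the simpler option.

```python
import re


def to_tsquery_terms(text: str) -> str:
    """Join the words of `text` with '&' so to_tsquery can parse them.

    Hypothetical helper: punctuation is discarded, and no stemming,
    prefix matching, or phrase handling is attempted.
    """
    words = re.findall(r"\w+", text)
    return " & ".join(words)


terms = to_tsquery_terms("multiple words query here")
print(terms)
```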

This behavior changed in SQLAlchemy 2.0, as described here: https://docs.sqlalchemy.org/en/20/dialects/postgresql.html#full-text-search.

Starting with `SQLAlchemy 2.0`, `TSVColumn.match(...)` uses `plainto_tsquery(...)`:

MyModel.content_tsv.match("multiple words query here", postgresql_regconfig="english")

will translate to:

plainto_tsquery("english", "multiple words query here")
^^^^^^^^^^^^^^^
# instead of
# to_tsquery("english", "multiple words query here")
# which is NOT the same

- Jean Monet

1

So, in SQLAlchemy 0.9 and later, this works:

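import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base, declared_attr

Base = declarative_base()


class Content(Base):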
    __tablename__ = 'content'

    id = sa.Column(sa.Integer, primary_key=True)

    description = sa.Column(sa.UnicodeText, nullable=False, server_default='')
    @declared_attr
    def __table_args__(cls):
        return (sa.Index('idx_content',
                     sa.sql.func.to_tsvector("english", cls.description),
                     postgresql_using="gin"), )

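Note that the difference from the first example is that the column is referenced directly, rather than passed as a quoted column name, which does not work.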
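
One way to confirm that the column-object form produces the DDL the question asks for is to compile the index without a database. This is a minimal sketch, assuming a reasonably recent SQLAlchemy (1.4 or later); the Core `Table` mirrors the `Content` model above and is only there to keep the example self-contained.

```python
from sqlalchemy import Column, Index, Integer, MetaData, Table, UnicodeText, func
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex

metadata = MetaData()

# Core equivalent of the Content model, for illustration only.
content = Table(
    "content",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("description", UnicodeText, nullable=False, server_default=""),
)

# Pass the Column object, not the string "description", to to_tsvector.
idx = Index(
    "idx_content",
    func.to_tsvector("english", content.c.description),
    postgresql_using="gin",
)

# Render the CREATE INDEX statement for PostgreSQL without executing it.
ddl = str(CreateIndex(idx).compile(dialect=postgresql.dialect()))
print(ddl)
```

The printed statement should be a `CREATE INDEX ... USING gin (...)` whose indexed expression is the `to_tsvector` call on the `description` column.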

- Damian

I'm trying to run this on 0.9.8, but I get the error: sqlalchemy.exc.ArgumentError: Can't add unnamed column to column collection. - Kiran Jonnalagadda

- d0ugal · Accepted Answer

For now I have manually added the following lines to do this, but I would much rather use the "correct" SQLAlchemy approach if there is one.

from sqlalchemy import DDL, event

create_index = DDL("CREATE INDEX idx_content ON pep USING gin(to_tsvector('english', content));")
event.listen(Pep.__table__, 'after_create', create_index.execute_if(dialect='postgresql'))

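There was some interesting discussion on the SQLAlchemy bug tracker. It looks like this is a limitation of the current index definition. Basically, my requirement is to allow indexes to be expressions rather than just column names, but that isn't currently supported. This ticket is tracking the feature request: http://www.sqlalchemy.org/trac/ticket/695. However, it needs a developer to take it forward and do the work (it has been waiting for a while).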
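
The `execute_if(dialect='postgresql')` guard in the listener above can be checked without a PostgreSQL server. The sketch below is illustrative only: it assumes SQLAlchemy 1.4 or later and a hypothetical `post` table, and it creates the schema on an in-memory SQLite database, where the PostgreSQL-only statement should be skipped instead of failing.

```python
from sqlalchemy import (
    DDL, Column, Integer, MetaData, Table, Text, create_engine, event, inspect,
)

metadata = MetaData()

# Hypothetical table used only for this demonstration.
post = Table(
    "post",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("content", Text),
)

create_index = DDL(
    "CREATE INDEX idx_content ON post USING gin(to_tsvector('english', content))"
)

# Emit the raw DDL after the table is created, but only on PostgreSQL.
event.listen(post, "after_create", create_index.execute_if(dialect="postgresql"))

# SQLite does not understand the statement above; the guard should skip it.
engine = create_engine("sqlite://")
metadata.create_all(engine)

inspector = inspect(engine)
table_names = inspector.get_table_names()
index_names = [ix["name"] for ix in inspector.get_indexes("post")]
print(table_names, index_names)
```

On SQLite the table is created and no index is reported. Against a PostgreSQL engine the same listener would run the `CREATE INDEX` statement right after the table is created.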
