Django MPTT Postgres更新查询运行缓慢

8

我在一个模型中使用mptt来管理标签系统(每个标签都有一个可选的TreeForeignKey到“父”标签)

每当我需要保存标签模型时,以下查询运行异常缓慢(超过45秒)

 UPDATE "taxonomy_taxonomy" SET "tree_id" = ("taxonomy_taxonomy"."tree_id" + %s) WHERE "taxonomy_taxonomy"."tree_id" > %s 

我会通过自动化标记系统来发送文章内容,该系统可以生成多达20个标签。显然,这是不可取的 :)

我添加了db_index=False,希望能够改变写入时间(读取似乎不是问题),但问题仍然存在。

以下是相关模型:

class Taxonomy(MPTTModel):

parent = TreeForeignKey('self',blank=True,null=True,related_name='children',verbose_name='Parent', db_index=False)
parent_name = models.CharField(max_length=64, blank=True, null=True, editable=False)
name = models.CharField(verbose_name='Title', max_length=100, db_index=True)
slug = models.SlugField(verbose_name='Slug', blank=True)
primary = models.BooleanField(
    verbose_name='Is Primary',
    default=False,
    db_index=True,
)
type = models.CharField(max_length=30, db_index=True)
created_date = models.DateTimeField(auto_now_add=True, null=True)
updated_date = models.DateTimeField(auto_now=True, null=True)
publication_date = models.DateTimeField(null=True, blank=True)
scheduled_date = models.DateTimeField(null=True, blank=True)
workflowstate = models.CharField(max_length=30, default='draft')
created_by = models.ForeignKey(User, null=True)

paid_content = models.BooleanField(verbose_name='Is Behind the Paywall', default=False, blank=True)
publish_now = True
show_preview = False
temporary = models.BooleanField(default=False)

def save(self, *args, **kwargs):

    if self.slug is None:
        self.slug = self.name

    if not self.slug:
        self.slug = slugify(self.name)[:50]

    if self.parent:
        self.parent_name = self.parent.name

    self.slug = slugify(self.slug)
    self.workflowstate = "published"

    super(Taxonomy, self).save(*args, **kwargs)

    store_to_backend_mongo(self)
    publish_to_frontend(self)

查询计划(由New Relic报告):

1) Update on taxonomy_taxonomy (cost=0.00..133833.19 rows=90515 width=139)

2) -> Seq Scan on taxonomy_taxonomy (cost=0.00..133833.19 rows=90515 width=139)

3) Filter: ?

最后,是在这样的查询中的回溯信息:

    Traceback (most recent call last):
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/api/web_transaction.py", line 711, in __iter__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/api/web_transaction.py", line 1087, in __call__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/core/handlers/wsgi.py", line 189, in __call__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/core/handlers/base.py", line 132, in get_response
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/hooks/framework_django.py", line 499, in wrapper
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/contrib/auth/decorators.py", line 22, in _wrapped_view
File "./editorial/views.py", line 242, in calculate_queryly
File "./editorial/views.py", line 292, in queryly_function
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/manager.py", line 127, in manager_method
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/query.py", line 348, in create
File "./taxonomy/models.py", line 179, in save
File "./taxonomy/models.py", line 58, in save
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/models.py", line 946, in save
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/models.py", line 702, in insert_at
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 467, in insert_node
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 491, in insert_node
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 726, in _create_tree_space
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 364, in _mptt_update
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/query.py", line 563, in update
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 1062, in execute_sql
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/backends/utils.py", line 79, in execute
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/hooks/database_dbapi2.py", line 22, in execute

有什么办法可以让这些模型保存的速度更快吗?

附加信息: 这是在Postgres中使用psycopg2引擎。 'ENGINE': 'django.db.backends.postgresql_psycopg2'。

第二次编辑: 按照要求,我使用EXPLAIN ANALYZE运行了查询,结果如下:

nj=# EXPLAIN ANALYZE UPDATE "taxonomy_taxonomy" SET "tree_id" = ("taxonomy_taxonomy"."tree_id" + 1) WHERE "taxonomy_taxonomy"."tree_id" > 1;

                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Update on taxonomy_taxonomy  (cost=0.00..9588.75 rows=24582 width=132) (actual time=258718.550..258718.550 rows=0 loops=1)
   ->  Seq Scan on taxonomy_taxonomy  (cost=0.00..9588.75 rows=24582 width=132) (actual time=59.956..8271.209 rows=24582 loops=1)
         Filter: (tree_id > 1)
         Rows Removed by Filter: 2
 Planning time: 28.763 ms
 Execution time: 258718.661 ms
(6 rows)

你使用的是哪个数据库?如果是MySQL,使用的是什么引擎,InnoDB还是MyISAM? - Andrei Avram
啊,我应该包含那个的。我现在会编辑它。我正在使用Postgres。从settings.py文件中:'ENGINE': 'django.db.backends.postgresql_psycopg2'。 - Robert Townley
你应该在psql上运行相同的查询,使用EXPLAIN ANALYZE,并将输出添加到这里(new relic的输出不完整)。 - Tommaso Barbugli
谢谢Tommaso,我已经做了。 我是否正确地理解,这是在遍历表格的25k行时飞快地执行,迭代所有行?我明白为什么会这样做(tree_id将相关元素设置为彼此相邻),但这是否意味着tree_id字段不适合我的具有25k行的表格的目的? - Robert Townley
你确定 db_index=False 能够提高性能吗?数据库仍然需要通过 WHERE 子句查找要更新的行。 - Adrián
2个回答

9

django mptt使用嵌套集模型进行操作,具体信息可参考这篇文章

因此,如果你的保存方法导致插入操作,django-mptt需要重新计算大量数据。这在大型表上是行不通的。

您必须放弃使用django-mptt并发明自己的数据库模式。


这还是真的吗?MPTT仍在使用嵌套集模型吗? - lingxiao
MPTT是一种在数据库中存储分层数据的技术。其目的是使检索操作非常高效。这种效率的权衡是,执行插入和移动树中的项目更加复杂,因为需要进行一些额外的工作来始终保持树结构处于良好状态。 - Yevgeniy Shchemelev

1

当您修改树时,整个表格似乎会有很多更新。在postgres上,这将导致许多已删除的行实际上并没有真正删除,除非您执行vacuum full。我们经历了一张表格被吹起来后,在进行清理操作后,其大小缩小到了0.3%。相应地,性能大大提高。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接