Optimizing a Postgres query

3
                                 QUERY PLAN                                   
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=32164.87..32164.89 rows=1 width=44) (actual time=221552.831..221552.831 rows=0 loops=1)
   ->  Sort  (cost=32164.87..32164.87 rows=1 width=44) (actual time=221552.827..221552.827 rows=0 loops=1)
         Sort Key: t.date_effective, t.acct_account_transaction_id, p.method, t.amount, c.business_name, t.amount
         ->  Nested Loop  (cost=22871.67..32164.86 rows=1 width=44) (actual time=221552.808..221552.808 rows=0 loops=1)
               ->  Nested Loop  (cost=22871.67..32160.37 rows=1 width=52) (actual time=221431.071..221546.619 rows=670 loops=1)
                     ->  Nested Loop  (cost=22871.67..32157.33 rows=1 width=43) (actual time=221421.218..221525.056 rows=2571 loops=1)
                           ->  Hash Join  (cost=22871.67..32152.80 rows=1 width=16) (actual time=221307.382..221491.019 rows=2593 loops=1)
                                 Hash Cond: ("outer".acct_account_id = "inner".acct_account_fk)
                                 ->  Seq Scan on acct_account a  (cost=0.00..7456.08 rows=365008 width=8) (actual time=0.032..118.369 rows=61295 loops=1)
                                 ->  Hash  (cost=22871.67..22871.67 rows=1 width=16) (actual time=221286.733..221286.733 rows=2593 loops=1)
                                       ->  Nested Loop Left Join  (cost=0.00..22871.67 rows=1 width=16) (actual time=1025.396..221266.357 rows=2593 loops=1)
                                             Join Filter: ("inner".orig_acct_payment_fk = "outer".acct_account_transaction_id)
                                             Filter: ("inner".link_type IS NULL)
                                             ->  Seq Scan on acct_account_transaction t  (cost=0.00..18222.98 rows=1 width=16) (actual time=949.081..976.432 rows=2596 loops=1)
                                                   Filter: ((("type")::text = 'debit'::text) AND ((transaction_status)::text = 'active'::text) AND (date_effective >= '2012-03-01'::date) AND (date_effective < '2012-04-01 00:00:00'::timestamp without time zone))
                                             ->  Seq Scan on acct_payment_link l  (cost=0.00..4648.68 rows=1 width=15) (actual time=1.073..84.610 rows=169 loops=2596)
                                                   Filter: ((link_type)::text ~~ 'return_%'::text)
                           ->  Index Scan using contact_pk on contact c  (cost=0.00..4.52 rows=1 width=27) (actual time=0.007..0.008 rows=1 loops=2593)
                                 Index Cond: (c.contact_id = "outer".contact_fk)
                     ->  Index Scan using acct_payment_transaction_fk on acct_payment p  (cost=0.00..3.02 rows=1 width=13) (actual time=0.005..0.005 rows=0 loops=2571)
                           Index Cond: (p.acct_account_transaction_fk = "outer".acct_account_transaction_id)
                           Filter: ((method)::text <> 'trade'::text)
               ->  Index Scan using contact_role_pk on contact_role  (cost=0.00..4.48 rows=1 width=4) (actual time=0.007..0.007 rows=0 loops=670)
                     Index Cond: ("outer".contact_id = contact_role.contact_fk)
                     Filter: (exchange_fk = 74)
Total runtime: 221553.019 ms

1
Here is the plan in a more readable format: http://explain.depesz.com/s/12r - user330315
1
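You should rewrite the SQL query to use explicit JOIN syntax only. Mixing implicit and explicit joins is bad practice. - user330315
FROM ..., acct_account a, acct_payment p: I don't see any join columns for these tables. That could produce a Cartesian product. - wildplasser
It is not a Cartesian product unless the tables are empty (actual rows=0). - vyegorov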
4 Answers

4
Your problem is here:
->  Nested Loop Left Join  (cost=0.00..22871.67 rows=1 width=16) (actual time=1025.396..221266.357 rows=2593 loops=1)
    Join Filter: ("inner".orig_acct_payment_fk = "outer".acct_account_transaction_id)
    Filter: ("inner".link_type IS NULL)
        ->  Seq Scan on acct_account_transaction t  (cost=0.00..18222.98 rows=1 width=16) (actual time=949.081..976.432 rows=2596 loops=1)
                Filter: ((("type")::text = 'debit'::text) AND ((transaction_status)::text = 'active'::text) AND (date_effective >= '2012-03-01'::date) AND (date_effective < '2012-04-01 00:00:00'::timestamp without time zone))
        ->  Seq Scan on acct_payment_link l  (cost=0.00..4648.68 rows=1 width=15) (actual time=1.073..84.610 rows=169 loops=2596)
                Filter: ((link_type)::text ~~ 'return_%'::text)

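The planner expects to find 1 row in acct_account_transaction but actually finds 2596, and the same mismatch shows up on the other table (1 row estimated, 169 actual). With an estimate of 1 row, it picks a nested loop that rescans acct_payment_link sequentially 2596 times, which is where almost all of the 221 seconds go.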

You didn't mention your Postgres version (could you?), but this should do the trick:

SELECT DISTINCT
    t.date_effective,
    t.acct_account_transaction_id,
    p.method,
    t.amount,
    c.business_name,
    t.amount
FROM
    contact c inner join contact_role on (c.contact_id=contact_role.contact_fk and contact_role.exchange_fk=74),
    acct_account a, acct_payment p,
    acct_account_transaction t
WHERE
    p.acct_account_transaction_fk=t.acct_account_transaction_id
    and t.type = 'debit'
    and transaction_status = 'active'
    and p.method != 'trade'
    and t.date_effective >= '2012-03-01'
    and t.date_effective < (date '2012-03-01' + interval '1 month')
    and c.contact_id=a.contact_fk and a.acct_account_id = t.acct_account_fk
    and not exists(
         select * from acct_payment_link l 
           where l.orig_acct_payment_fk = t.acct_account_transaction_id 
           and l.link_type like 'return_%'
    )
ORDER BY
    t.date_effective DESC

Also, try setting appropriate statistics targets for the relevant columns. Link to the fine manual: http://www.postgresql.org/docs/current/static/sql-altertable.html
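
As a rough sketch of what raising the statistics targets might look like: the choice of columns and the target value of 1000 are assumptions made for illustration (they are taken from the filters visible in the plan, not from the asker's schema), so adjust both to your data.

```sql
-- Illustrative only: the columns and the target value 1000 are assumptions.
ALTER TABLE acct_account_transaction ALTER COLUMN date_effective SET STATISTICS 1000;
ALTER TABLE acct_account_transaction ALTER COLUMN "type" SET STATISTICS 1000;
ALTER TABLE acct_payment_link ALTER COLUMN link_type SET STATISTICS 1000;

-- A new target only takes effect once the table is analyzed again.
ANALYZE acct_account_transaction;
ANALYZE acct_payment_link;
```

After that, rerun EXPLAIN ANALYZE and compare the estimated rows with the actual rows on the two sequential scans.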

0

I withdraw my first suggestion, since it changes the nature of the query.

I see that far too much time is spent in the LEFT JOIN.

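The first thing to try is to scan the `acct_payment_link` table only once. You could try rewriting the query as:
``` ... LEFT JOIN (SELECT * FROM acct_payment_link WHERE link_type LIKE 'return_%') AS l ... ```
You should check your statistics, since the planned row counts differ from the rows actually returned.
You did not include the table and index definitions; it would be good to see those.
You might also want to use the contrib/pg_trgm extension to build an index on `acct_payment_link.link_type`, but I would suggest that only as a last resort.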
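
For reference, a trigram index of the kind mentioned above could be set up roughly as follows. This is a sketch under assumptions: the index name is made up, and CREATE EXTENSION plus LIKE support in trigram indexes need PostgreSQL 9.1 or later, which is newer than the server that produced the plan above appears to be.

```sql
-- Assumes PostgreSQL 9.1+; on older releases contrib modules are installed from their SQL scripts.
CREATE EXTENSION pg_trgm;

-- Hypothetical index name; gin_trgm_ops is the operator class shipped with pg_trgm.
CREATE INDEX acct_payment_link_link_type_trgm_idx
    ON acct_payment_link
    USING gin (link_type gin_trgm_ops);

ANALYZE acct_payment_link;
```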

By the way, which PostgreSQL version are you using?


1
But that would effectively turn the left join into an inner join. - user330315

0

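What are your indexes, and have you run ANALYZE recently? Even though there are several conditions on that table:

  • type
  • date_effective

it is still doing a table scan on acct_account_transaction. If there are no indexes on those columns, a composite index on (type, date_effective) could help (assuming many rows fail these conditions).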
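
A minimal sketch of such a composite index, assuming none exists yet; the index name is hypothetical, and the equality column is listed first so that the date range can be scanned within a single type value.

```sql
-- Hypothetical index name; "type" is quoted as it appears in the plan.
CREATE INDEX acct_account_transaction_type_date_idx
    ON acct_account_transaction ("type", date_effective);

-- Refresh the statistics so the planner can cost the new index sensibly.
ANALYZE acct_account_transaction;
```

Whether the planner actually uses it depends on how selective the two conditions are, so check the plan again afterwards.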


0

Your statement, rewritten and formatted:

SELECT DISTINCT
       t.date_effective,
       t.acct_account_transaction_id,
       p.method,
       t.amount,
       c.business_name,
       t.amount
FROM   contact                  c
JOIN   contact_role            cr ON cr.contact_fk = c.contact_id
JOIN   acct_account             a ON a.contact_fk = c.contact_id 
JOIN   acct_account_transaction t ON t.acct_account_fk = a.acct_account_id 
JOIN   acct_payment             p ON p.acct_account_transaction_fk
                                   = t.acct_account_transaction_id
LEFT   JOIN acct_payment_link   l ON orig_acct_payment_fk
                                   = acct_account_transaction_id
                                        -- missing table-qualification!
                                 AND link_type like 'return_%'
                                        -- missing table-qualification!
WHERE  transaction_status = 'active'    -- missing table-qualification!
AND    cr.exchange_fk = 74
AND    t.type = 'debit'
AND    t.date_effective >= '2012-03-01'
AND    t.date_effective <  (date '2012-03-01' + interval '1 month')
AND    p.method != 'trade'
AND    l.link_type IS NULL
ORDER  BY t.date_effective DESC;
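  • Explicit JOIN syntax is preferable. I reordered your tables to follow your JOIN logic.

  • Why (date '2012-03-01' + interval '1 month') instead of date '2012-04-01'?

  • Some table qualifications are missing. In a statement as complex as this one, that is bad style and may hide mistakes.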
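
To see what the date arithmetic in the second point evaluates to, a quick check along these lines may help. It assumes a release that provides pg_typeof (8.4 or later); the column aliases are made up.

```sql
-- date + interval yields a timestamp, not a date.
SELECT date '2012-03-01' + interval '1 month'            AS upper_bound,
       pg_typeof(date '2012-03-01' + interval '1 month') AS upper_type;

-- The plain literal stays a date.
SELECT pg_typeof(date '2012-04-01') AS literal_type;
```

The first query reports a `timestamp without time zone`, which matches the `'2012-04-01 00:00:00'::timestamp without time zone` cast visible in the plan's filter; the second reports `date`.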

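The keys to performance are proper indexes, a sound PostgreSQL configuration, and accurate statistics.

General advice on performance tuning is available in the PostgreSQL Wiki.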

