有没有可能提高查询速度？

Question

有没有可能提高查询速度？

3

我是SQL新手，需要帮助。我有4个表：

helmet                                  arm
+------+---------+-----+--------+       +------+---------+-----+--------+
|  id  |   name  | def | weight |       |  id  |   name  | def | weight |
+------+---------+-----+--------+       +------+---------+-----+--------+
|   1  |  head1  |  5  |   2.2  |       |   1  |   arm1  |  4  |   2.7  |
|   2  |  head2  |  6  |   2.9  |       |   2  |   arm2  |  5  |   3.1  |
|   3  |  head3  |  7  |   3.5  |       |   3  |   arm3  |  2  |   1.8  |
+------+---------+-----+--------+       +------+---------+-----+--------+

body                                    leg
+------+---------+-----+--------+       +------+---------+-----+--------+
|  id  |   name  | def | weight |       |  id  |   name  | def | weight |
+------+---------+-----+--------+       +------+---------+-----+--------+
|   1  |  body1  |  10  |  5.5  |       |   1  |   leg1  |  8  |   3.5  |
|   2  |  body2  |  5   |  2.4  |       |   2  |   leg2  |  5  |   2.0  |
|   3  |  body3  |  17  |  6.9  |       |   3  |   leg3  |  8  |   1.8  |
+------+---------+-----+--------+       +------+---------+-----+--------+`

我希望找到总重量小于等于输入值的最高 totaldef
例如：totalweight <= 10

查询：

select 
    helmet.name as hname, body.name as bname, 
    arm.name as aname, leg.name as lname,
    helmet.poise + body.poise + arm.poise + leg.poise as totalpoise, 
    helmet.weight + body.weight + arm.weight + leg.weight as totalweight 
from 
    helmet 
inner join 
    body on 1=1
inner join 
    arm on 1=1
inner join 
    leg on 1=1 
where 
    helmet.weight + body.weight + arm.weight + leg.weight <= 10
order by 
    totalpoise desc 
limit 5

结果：

+-------+-------+-------+-------+----------+-------------+
| hname | bname | aname | lname | totaldef | totalweight |
+-------+-------+------ +-------+----------+-------------+
| head2 | body2 |  arm1 |  leg3 |    23    |     9.8     |
| head1 | body2 |  arm2 |  leg3 |    23    |     9.5     |
| head3 | body2 |  arm3 |  leg3 |    22    |     9.5     |
| head1 | body2 |  arm1 |  leg3 |    22    |     9.1     |
| head2 | body2 |  arm3 |  leg3 |    21    |     8.9     |
+-------+-------+-------+-------+----------+-------------+

每个表格大约有100行，因此可能的结果是1亿多行。查询需要很长时间。我不确定这与我的硬件、数据库类型或查询有关。

附注：我使用HDD硬盘并拥有8GB的内存。我已在MySQL和PostgreSQL上进行了测试。

更新：我还没有创建索引。

这是解释计划吗？ explain plan 需要多长时间？这取决于输入。在MySQL上，需要几分钟到几小时。在PostgreSQL上，需要约30秒至2分钟。

更新2：我的表格从不更改。那么我可以将所有结果存储在一个表格中吗？这有帮助吗？

更新3：我考虑分区。这可能会快得多，但问题是如果某个下分区中的[装甲套装]的总防御值高于上分区中的[装甲套装]。例如：

[head1,arm1,body1,leg1][totaldef 25][totalweight 9.9]
[head2,arm2,body2,leg2][totaldef 20][totalweight 11.0]

因为它位于其他分区，所以分区总重量>10会错过那个[装甲套装]。

这是CSV文件，供任何想要测试的人使用。CSV文件

更新4 我认为最快的方法是创建物化视图，但我猜性能的关键是对其进行排序。我不知道哪种排序可以帮助物化视图或索引，但我将它们都排序了，这很有帮助。

我没想到会得到这么多帮助。谢谢。

- slowbs

2

你错过了最重要的信息：索引？？？ - juergen d

1

如果是我，我可能只会有一个表，而不是四个。 - Strawberry

如果你这样做，那么你用来查找这四个名字的查询语句是什么？ - devoh

1

这个表是否经常变化？如果不是，那么您可以在其笛卡尔积上创建具有适当索引的物化视图，并使用它来替代原表。 - Abelisto

1

如果性能对您最重要，请将结果存储在新表中，该表将包含类似于 helm_id,body_id,arm_id,leg_id,weight_sum,def_sum 的结构。 - Lemjur

显示剩余7条评论

4个回答

2

使用适当索引的物化视图在我的老旧SSD台式机上表现相当不错，仅需1.8秒，使用Postgresql默认配置：

一个物化视图与适当索引结合，性能表现良好。

create materialized view v as
select
    h.name as hname, b.name as bname, a.name as aname, l.name as lname,
    total_poise, total_weight
from
    helmet h
    cross join
    body b
    cross join
    arm a
    cross join
    leg l
    cross join lateral (
        select
            h.weight + b.weight + l.weight + a.weight as total_weight,
            h.poise + b.poise + l.poise + a.poise as total_poise
    ) total
order by total_poise desc, total_weight
;

create index v_index on v (total_poise desc, total_weight);

执行和分析：

select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
         hname         |          bname           |         aname          |          lname           | total_poise | total_weight 
-----------------------+--------------------------+------------------------+--------------------------+-------------+--------------
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.4
 Fume Sorcerer Mask+10 | Lion Warrior Cape+10     | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.5
 Fume Sorcerer Mask+10 | Red Lion Warrior Cape+10 | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.5
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Lion Warrior Skirt+10    |          20 |          9.6
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Moon Butterfly Skirt+10  |          20 |          9.6


explain analyze
select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..11.71 rows=5 width=88) (actual time=1847.680..1847.694 rows=5 loops=1)
   ->  Index Scan using v_index on v  (cost=0.57..11191615.70 rows=5020071 width=88) (actual time=1847.678..1847.691 rows=5 loops=1)
         Index Cond: (total_weight <= '10'::double precision)
 Planning time: 0.126 ms
 Execution time: 1847.722 ms

- Clodoaldo Neto

你忘记了主要条件 'totalweight <= 10'。 - devoh

@devoh 这行代码包含了所有的条件: where 10 >= all (array[total_weight, h.weight, b.weight, l.weight, a.weight]) - Clodoaldo Neto

你说得对。我的错。我会将你的查询添加到我的基准测试中。204.438秒。 - devoh

这似乎是我尝试过的最快的。当输入在0-10范围内时仍然很慢，但只需要15-20秒，它就会使我的硬盘驱动器使用率达到100％。因此，我认为如果我更换SSD，它可以更快。 - slowbs

2

因为您的表不会发生变化，所以可以缓存中间数据。对于PostgreSQL来说，可以使用物化视图：

create materialized view equipments as
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def+a.def+b.def+l.def) as total_def,
    (h.weight+a.weight+b.weight+l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;
create index i_def on equipments(total_def);
create index i_weight on equipments(total_weight);

这是一次性的繁重操作，但之后可以查询类似以下内容的信息：

select *
from equipments
where total_weight <= 10
order by total_def desc
limit 5;

将会更快。当然，您可以将您的表格连接到上述查询中以获取有关设备的详细信息。

如果表格已更改，则可以调用REFRESH MATERIALIZED VIEW。

我不熟悉MySQL，但是您可以搜索mysql materialized view或仅使用常规表格。

另一种尝试：分区。

(如果在之前的尝试中创建了物化视图equipments，则使用drop materialized view equipments命令删除它)

create table equipments(
  helmet_id int, arm_id int, body_id int, leg_id int,
  total_weight float, total_def float);

这是基本表格。接下来我们将创建分区。例如，如果最大总重量为40，则有四个分区用于0-10、10-20、20-30和30-40的总重量：

create table equipments_10 (check (total_weight>0 and total_weight<=10))
  inherits (equipment); 
create table equipments_20 (check (total_weight>10 and total_weight<=20))
  inherits (equipment); 
create table equipments_30 (check (total_weight>20 and total_weight<=30))
  inherits (equipment); 
create table equipments_40 (check (total_weight>30))
  inherits (equipment);

填充我们的表格：

insert into equipments
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def+a.def+b.def+l.def) as total_def,
    (h.weight+a.weight+b.weight+l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;

并创建大量索引，以便让PostgreSQL有机会选择最有效的执行计划：

create index i_equip_total_def on equipments(total_def);
create index i_equip_total_weight on equipments(total_weight); 
create index i_equip_10_total_def on equipments_10(total_def);
create index i_equip_10_total_weight on equipments_10(total_weight); 
create index i_equip_20_total_def on equipments_20(total_def);
create index i_equip_20_total_weight on equipments_20(total_weight); 
create index i_equip_30_total_def on equipments_30(total_def);
create index i_equip_30_total_weight on equipments_30(total_weight); 
create index i_equip_40_total_def on equipments_40(total_def);
create index i_equip_40_total_weight on equipments_40(total_weight);

最后对数据进行统计：

analyze equipments;
analyze equipments_10;
analyze equipments_20;
analyze equipments_30;
analyze equipments_40;

查询与之前尝试的类似。

PS：如果有人想尝试，这里是我的测试。 PPS：在我的测试中，每个查询，无论参数如何，都不到0.5毫秒（在我史前的硬件上）。

- Abelisto

我尝试了一下，按 total_def 升序排序速度确实更快。但是按降序排序时，只有在输入高值（如30或以上）时才很快，如果输入小于30，则非常慢。 - slowbs

@slowbs 是的，我看到了问题。已经向答案中添加了另一个解决方案。 - Abelisto

1

仅供娱乐和完整性：在统一表上进行递归解决方案。这可能不是最快的，但如果表格变得更大并且可以使用索引，则可能获胜。（像3*3*3*3这样的简单示例通常会产生哈希连接计划，甚至嵌套表扫描）

-- the data
CREATE TABLE helmet(id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO helmet(id, name, poise, weight) VALUES
(   1, 'head1', 5, 2.2) ,(   2, 'head2', 6, 2.9) ,(   3, 'head3', 7, 3.5) ;

CREATE TABLE body (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO body(id, name, poise, weight) VALUES
 (   1, 'body1', 10, 5.5) ,(   2, 'body2', 5 , 2.4) ,(   3, 'body3', 17, 6.9) ;

CREATE TABLE arm (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO arm(id, name, poise, weight) VALUES
 (   1, 'arm1', 4, 2.7) ,(   2, 'arm2', 5, 3.1) ,(   3, 'arm3', 2, 1.8) ;

CREATE TABLE leg (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO leg(id, name, poise, weight) VALUES
 (   1, 'leg1', 8, 3.5) ,(   2, 'leg2', 5, 2.0) ,(   3, 'leg3', 8, 1.8) ;


-- combine the four tables into one
CREATE table allgear AS
SELECT 1 AS gid, 'helmet' AS gear, h.id, h.name, h.poise, h.weight FROM helmet h
UNION ALL
SELECT 2 AS gid, 'body' AS gear, b.id, b.name, b.poise, b.weight FROM body b
UNION ALL
SELECT 3 AS gid, 'arm' AS gear, a.id, a.name, a.poise, a.weight FROM arm a
UNION ALL
SELECT 4 AS gid, 'leg' AS gear, l.id, l.name, l.poise, l.weight FROM leg l
        ;

-- add som structure ...
ALTER TABLE allgear ADD PRIMARY KEY(gid, id);
CREATE INDEX ON allgear(gid, weight);
VACUUM ANALYZE allgear;

-- SELECT * FROM allgear ORDER by gid, id;


-- Recursive query with some pruning on the partial results.
-- EXPLAIN ANALYZE
WITH recursive rrr AS (
        SELECT gid AS gid
                , ARRAY[ name] AS arr
                , poise AS totpoise
                , weight AS totweight
        FROM allgear
        WHERE gid = 1
        UNION ALL
        SELECT ag.gid
                , rrr.arr || ARRAY[ag.name] AS arr
                , rrr.totpoise +ag.poise AS totpoise
                , (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
        FROM allgear ag
        JOIN rrr ON ag.gid = rrr.gid +1 AND (rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)
        )
SELECT * FROM rrr
WHERE gid = 4 -- the gid of the final one
ORDER BY totweight DESC
LIMIT 5
        ;

结果：

 gid |           arr           | totpoise | totweight 
-----+-------------------------+----------+-----------
   4 | {head2,body2,arm1,leg2} |       20 |     10.00
   4 | {head1,body2,arm3,leg1} |       20 |      9.90
   4 | {head2,body2,arm1,leg3} |       23 |      9.80
   4 | {head3,body2,arm3,leg2} |       19 |      9.70
   4 | {head1,body2,arm2,leg2} |       20 |      9.70
(5 rows)

注意：我得到了更多的组合，可能是因为我使用了 DECIMAL(4,2) 而不是浮点类型。

额外信息：如果我们知道剩余层级（齿轮类型）将添加的最小重量，甚至在较低层级中，我们可以添加一些额外的修剪。我为此添加了一个额外的表格。

CREATE TABLE minima AS
SELECT gid, MIN(weight) AS mimi
FROM allgear
GROUP BY gid;
-- add an extra level ...
INSERT INTO minima(gid, mimi) VALUES (5, 0.0);

-- EXPLAIN ANALYZE
WITH recursive rrr AS (
        SELECT gid AS gid
                , ARRAY[ name] AS arr
                , poise AS totpoise
                , weight AS totweight
        FROM allgear
        WHERE gid = 1
        UNION ALL
        SELECT ag.gid
                , rrr.arr || ARRAY[ag.name] AS arr
                , rrr.totpoise +ag.poise AS totpoise
                , (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
        FROM allgear ag
        JOIN rrr ON ag.gid = rrr.gid+1
        -- Do some extra pruning: Partial sum + the missing parts should not sum up to more than 10
        JOIN LATERAL ( SELECT SUM(mimi) AS debt
                FROM minima
                WHERE gid > ag.gid
                ) susu ON (susu.debt +rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)

        )
SELECT * FROM rrr
WHERE gid = 4
ORDER BY totweight DESC
LIMIT 5
        ;

- wildplasser

我尝试了这个，它让我的硬盘运行到了100%。当输入在0-10范围内时速度更快，但在其他情况下仍然很慢。我认为如果我换成SSD可能会更快。我还没有尝试过额外的东西。 - slowbs

递归方法依赖于（早期）剪枝。如果上层不能剪枝，它仍需要探索几乎整个NNN*N树。 - wildplasser

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- devoh · Accepted Answer

非常有趣的问题。我不知道针对您的情况是否有特殊的方法。如果我是您，我会尝试以下方法：身体似乎比头盔、手臂和腿重。因此，我将首先查询该表，然后在每个连接上进行查询，并确保重量总和不超过您输入的值。如下所示：

SELECT helmet.name AS hname, body.name AS bname, arm.name AS aname, leg.name AS lname,
helmet.poise + body.poise + arm.poise + leg.poise AS totalpoise, 
helmet.weight + body.weight + arm.weight + leg.weight AS totalweight 
FROM body 
    INNER JOIN helmet 
    ON 1=1 
        AND body.weight + helmet.weight <= 10
    INNER JOIN arm 
    ON 1=1 
        AND body.weight + helmet.weight + arm.weight <= 10
    INNER JOIN leg 
    ON 1=1 
        AND body.weight + helmet.weight + arm.weight + leg.weight <= 10
WHERE body.weight <= 10
ORDER BY totalpoise DESC limit 5

正如@juergen-d在评论中提到的那样，索引会影响性能。您可以对每个权重列测试有无索引的差异。

对于PostgreSQL：

CREATE INDEX index_body_on_weight ON body(weight);

在与zerkms和Laurenz Albe的讨论后，他们一致认为这三个索引是无用的，不应使用：（如果我有时间，我会做一个基准测试）。

CREATE INDEX index_helmet_on_weight ON helmet(weight);
CREATE INDEX index_arm_on_weight ON arm(weight);
CREATE INDEX index_leg_on_weight ON leg(weight);

在PostgreSQL 9.3.5上的基准测试：

 slowbs's Query : 107.628 second
 my proposition Query : 12.066 second
 my proposition Query : 16.257 second (with only index_body_on_weight)
 my proposition Query : 13.217 second (with 4 indexes)

基准测试结论：在这种情况下，索引是低效的。@zerkms和@Laurenz Albe是正确的。

最后，请分享您的结果。