有没有可能提高查询速度?

3

我是SQL新手,需要帮助。我有4个表:

helmet                                  arm
+------+---------+-----+--------+       +------+---------+-----+--------+
|  id  |   name  | def | weight |       |  id  |   name  | def | weight |
+------+---------+-----+--------+       +------+---------+-----+--------+
|   1  |  head1  |  5  |   2.2  |       |   1  |   arm1  |  4  |   2.7  |
|   2  |  head2  |  6  |   2.9  |       |   2  |   arm2  |  5  |   3.1  |
|   3  |  head3  |  7  |   3.5  |       |   3  |   arm3  |  2  |   1.8  |
+------+---------+-----+--------+       +------+---------+-----+--------+

body                                    leg
+------+---------+-----+--------+       +------+---------+-----+--------+
|  id  |   name  | def | weight |       |  id  |   name  | def | weight |
+------+---------+-----+--------+       +------+---------+-----+--------+
|   1  |  body1  |  10  |  5.5  |       |   1  |   leg1  |  8  |   3.5  |
|   2  |  body2  |  5   |  2.4  |       |   2  |   leg2  |  5  |   2.0  |
|   3  |  body3  |  17  |  6.9  |       |   3  |   leg3  |  8  |   1.8  |
+------+---------+-----+--------+       +------+---------+-----+--------+`

我希望找到总重量小于等于输入值的最高 totaldef
例如:totalweight <= 10

查询:

select 
    helmet.name as hname, body.name as bname, 
    arm.name as aname, leg.name as lname,
    helmet.poise + body.poise + arm.poise + leg.poise as totalpoise, 
    helmet.weight + body.weight + arm.weight + leg.weight as totalweight 
from 
    helmet 
inner join 
    body on 1=1
inner join 
    arm on 1=1
inner join 
    leg on 1=1 
where 
    helmet.weight + body.weight + arm.weight + leg.weight <= 10
order by 
    totalpoise desc 
limit 5

结果:

+-------+-------+-------+-------+----------+-------------+
| hname | bname | aname | lname | totaldef | totalweight |
+-------+-------+------ +-------+----------+-------------+
| head2 | body2 |  arm1 |  leg3 |    23    |     9.8     |
| head1 | body2 |  arm2 |  leg3 |    23    |     9.5     |
| head3 | body2 |  arm3 |  leg3 |    22    |     9.5     |
| head1 | body2 |  arm1 |  leg3 |    22    |     9.1     |
| head2 | body2 |  arm3 |  leg3 |    21    |     8.9     |
+-------+-------+-------+-------+----------+-------------+

每个表格大约有100行,因此可能的结果是1亿多行。查询需要很长时间。我不确定这与我的硬件、数据库类型或查询有关。
附注:我使用HDD硬盘并拥有8GB的内存。我已在MySQL和PostgreSQL上进行了测试。
更新:我还没有创建索引。
这是解释计划吗? explain plan 需要多长时间? 这取决于输入。 在MySQL上,需要几分钟到几小时。 在PostgreSQL上,需要约30秒至2分钟。
更新2:我的表格从不更改。那么我可以将所有结果存储在一个表格中吗?这有帮助吗?
更新3:我考虑分区。这可能会快得多,但问题是如果某个下分区中的[装甲套装]的总防御值高于上分区中的[装甲套装]。 例如:
[head1,arm1,body1,leg1][totaldef 25][totalweight 9.9]
[head2,arm2,body2,leg2][totaldef 20][totalweight 11.0]

因为它位于其他分区,所以分区总重量>10会错过那个[装甲套装]。

这是CSV文件,供任何想要测试的人使用。CSV文件

更新4 我认为最快的方法是创建物化视图,但我猜性能的关键是对其进行排序。我不知道哪种排序可以帮助物化视图或索引,但我将它们都排序了,这很有帮助。

我没想到会得到这么多帮助。谢谢。


2
你错过了最重要的信息:索引??? - juergen d
1
如果是我,我可能只会有一个表,而不是四个。 - Strawberry
如果你这样做,那么你用来查找这四个名字的查询语句是什么? - devoh
1
这个表是否经常变化?如果不是,那么您可以在其笛卡尔积上创建具有适当索引的物化视图,并使用它来替代原表。 - Abelisto
1
如果性能对您最重要,请将结果存储在新表中,该表将包含类似于 helm_id,body_id,arm_id,leg_id,weight_sum,def_sum 的结构。 - Lemjur
显示剩余7条评论
4个回答

2
非常有趣的问题。我不知道针对您的情况是否有特殊的方法。如果我是您,我会尝试以下方法:身体似乎比头盔、手臂和腿重。因此,我将首先查询该表,然后在每个连接上进行查询,并确保重量总和不超过您输入的值。如下所示:
SELECT helmet.name AS hname, body.name AS bname, arm.name AS aname, leg.name AS lname,
helmet.poise + body.poise + arm.poise + leg.poise AS totalpoise, 
helmet.weight + body.weight + arm.weight + leg.weight AS totalweight 
FROM body 
    INNER JOIN helmet 
    ON 1=1 
        AND body.weight + helmet.weight <= 10
    INNER JOIN arm 
    ON 1=1 
        AND body.weight + helmet.weight + arm.weight <= 10
    INNER JOIN leg 
    ON 1=1 
        AND body.weight + helmet.weight + arm.weight + leg.weight <= 10
WHERE body.weight <= 10
ORDER BY totalpoise DESC limit 5

正如@juergen-d在评论中提到的那样,索引会影响性能。您可以对每个权重列测试有无索引的差异。

对于PostgreSQL:

CREATE INDEX index_body_on_weight ON body(weight);

在与zerkms和Laurenz Albe的讨论后,他们一致认为这三个索引是无用的,不应使用:(如果我有时间,我会做一个基准测试)。
CREATE INDEX index_helmet_on_weight ON helmet(weight);
CREATE INDEX index_arm_on_weight ON arm(weight);
CREATE INDEX index_leg_on_weight ON leg(weight);

在PostgreSQL 9.3.5上的基准测试:

 slowbs's Query : 107.628 second
 my proposition Query : 12.066 second
 my proposition Query : 16.257 second (with only index_body_on_weight)
 my proposition Query : 13.217 second (with 4 indexes)

基准测试结论:在这种情况下,索引是低效的。@zerkms和@Laurenz Albe是正确的。

最后,请分享您的结果。


@zerkms "每个权重列上的索引"。我没有写语法,因为slowbs至少使用了2个RDBMS。 - devoh
1
你觉得为每一列单独创建索引有何帮助? - zerkms
1
除了开玩笑之外:拥有冗余(即无用)索引会带来伤害,而不是好处。 - zerkms
1
我认为答案很好,但除了第一个索引以外都是无用的,应该将它们删除。多余的索引会浪费空间并影响DML性能。 - Laurenz Albe
1
你需要在那里放一个粗体声明,说明这只是你的猜测,你不确定它是否有任何帮助(实际上没有),以便其他人不会从中学习。 - zerkms
显示剩余11条评论

2
使用适当索引的物化视图在我的老旧SSD台式机上表现相当不错,仅需1.8秒,使用Postgresql默认配置:

一个物化视图与适当索引结合,性能表现良好。

create materialized view v as
select
    h.name as hname, b.name as bname, a.name as aname, l.name as lname,
    total_poise, total_weight
from
    helmet h
    cross join
    body b
    cross join
    arm a
    cross join
    leg l
    cross join lateral (
        select
            h.weight + b.weight + l.weight + a.weight as total_weight,
            h.poise + b.poise + l.poise + a.poise as total_poise
    ) total
order by total_poise desc, total_weight
;

create index v_index on v (total_poise desc, total_weight);

执行和分析:

select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
         hname         |          bname           |         aname          |          lname           | total_poise | total_weight 
-----------------------+--------------------------+------------------------+--------------------------+-------------+--------------
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.4
 Fume Sorcerer Mask+10 | Lion Warrior Cape+10     | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.5
 Fume Sorcerer Mask+10 | Red Lion Warrior Cape+10 | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.5
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Lion Warrior Skirt+10    |          20 |          9.6
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Moon Butterfly Skirt+10  |          20 |          9.6


explain analyze
select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..11.71 rows=5 width=88) (actual time=1847.680..1847.694 rows=5 loops=1)
   ->  Index Scan using v_index on v  (cost=0.57..11191615.70 rows=5020071 width=88) (actual time=1847.678..1847.691 rows=5 loops=1)
         Index Cond: (total_weight <= '10'::double precision)
 Planning time: 0.126 ms
 Execution time: 1847.722 ms

你忘记了主要条件 'totalweight <= 10'。 - devoh
@devoh 这行代码包含了所有的条件: where 10 >= all (array[total_weight, h.weight, b.weight, l.weight, a.weight]) - Clodoaldo Neto
你说得对。我的错。我会将你的查询添加到我的基准测试中。204.438秒。 - devoh
这似乎是我尝试过的最快的。当输入在0-10范围内时仍然很慢,但只需要15-20秒,它就会使我的硬盘驱动器使用率达到100%。因此,我认为如果我更换SSD,它可以更快。 - slowbs

2

因为您的表不会发生变化,所以可以缓存中间数据。对于PostgreSQL来说,可以使用物化视图

create materialized view equipments as
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def+a.def+b.def+l.def) as total_def,
    (h.weight+a.weight+b.weight+l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;
create index i_def on equipments(total_def);
create index i_weight on equipments(total_weight);

这是一次性的繁重操作,但之后可以查询类似以下内容的信息:
select *
from equipments
where total_weight <= 10
order by total_def desc
limit 5;

将会更快。当然,您可以将您的表格连接到上述查询中以获取有关设备的详细信息。

如果表格已更改,则可以调用REFRESH MATERIALIZED VIEW

我不熟悉MySQL,但是您可以搜索mysql materialized view或仅使用常规表格。


另一种尝试:分区

(如果在之前的尝试中创建了物化视图equipments,则使用drop materialized view equipments命令删除它)

create table equipments(
  helmet_id int, arm_id int, body_id int, leg_id int,
  total_weight float, total_def float);

这是基本表格。接下来我们将创建分区。例如,如果最大总重量为40,则有四个分区用于0-10、10-20、20-30和30-40的总重量:

create table equipments_10 (check (total_weight>0 and total_weight<=10))
  inherits (equipment); 
create table equipments_20 (check (total_weight>10 and total_weight<=20))
  inherits (equipment); 
create table equipments_30 (check (total_weight>20 and total_weight<=30))
  inherits (equipment); 
create table equipments_40 (check (total_weight>30))
  inherits (equipment);

填充我们的表格:

insert into equipments
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def+a.def+b.def+l.def) as total_def,
    (h.weight+a.weight+b.weight+l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;

并创建大量索引,以便让PostgreSQL有机会选择最有效的执行计划:

create index i_equip_total_def on equipments(total_def);
create index i_equip_total_weight on equipments(total_weight); 
create index i_equip_10_total_def on equipments_10(total_def);
create index i_equip_10_total_weight on equipments_10(total_weight); 
create index i_equip_20_total_def on equipments_20(total_def);
create index i_equip_20_total_weight on equipments_20(total_weight); 
create index i_equip_30_total_def on equipments_30(total_def);
create index i_equip_30_total_weight on equipments_30(total_weight); 
create index i_equip_40_total_def on equipments_40(total_def);
create index i_equip_40_total_weight on equipments_40(total_weight);

最后对数据进行统计:

analyze equipments;
analyze equipments_10;
analyze equipments_20;
analyze equipments_30;
analyze equipments_40;

查询与之前尝试的类似。
PS:如果有人想尝试,这里是我的测试。 PPS:在我的测试中,每个查询,无论参数如何,都不到0.5毫秒(在我史前的硬件上)。

我尝试了一下,按 total_def 升序排序速度确实更快。但是按降序排序时,只有在输入高值(如30或以上)时才很快,如果输入小于30,则非常慢。 - slowbs
@slowbs 是的,我看到了问题。已经向答案中添加了另一个解决方案。 - Abelisto

1

仅供娱乐和完整性:在统一表上进行递归解决方案。这可能不是最快的,但如果表格变得更大并且可以使用索引,则可能获胜。(像3*3*3*3这样的简单示例通常会产生哈希连接计划,甚至嵌套表扫描)


-- the data
CREATE TABLE helmet(id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO helmet(id, name, poise, weight) VALUES
(   1, 'head1', 5, 2.2) ,(   2, 'head2', 6, 2.9) ,(   3, 'head3', 7, 3.5) ;

CREATE TABLE body (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO body(id, name, poise, weight) VALUES
 (   1, 'body1', 10, 5.5) ,(   2, 'body2', 5 , 2.4) ,(   3, 'body3', 17, 6.9) ;

CREATE TABLE arm (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO arm(id, name, poise, weight) VALUES
 (   1, 'arm1', 4, 2.7) ,(   2, 'arm2', 5, 3.1) ,(   3, 'arm3', 2, 1.8) ;

CREATE TABLE leg (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO leg(id, name, poise, weight) VALUES
 (   1, 'leg1', 8, 3.5) ,(   2, 'leg2', 5, 2.0) ,(   3, 'leg3', 8, 1.8) ;


-- combine the four tables into one
CREATE table allgear AS
SELECT 1 AS gid, 'helmet' AS gear, h.id, h.name, h.poise, h.weight FROM helmet h
UNION ALL
SELECT 2 AS gid, 'body' AS gear, b.id, b.name, b.poise, b.weight FROM body b
UNION ALL
SELECT 3 AS gid, 'arm' AS gear, a.id, a.name, a.poise, a.weight FROM arm a
UNION ALL
SELECT 4 AS gid, 'leg' AS gear, l.id, l.name, l.poise, l.weight FROM leg l
        ;

-- add som structure ...
ALTER TABLE allgear ADD PRIMARY KEY(gid, id);
CREATE INDEX ON allgear(gid, weight);
VACUUM ANALYZE allgear;

-- SELECT * FROM allgear ORDER by gid, id;


-- Recursive query with some pruning on the partial results.
-- EXPLAIN ANALYZE
WITH recursive rrr AS (
        SELECT gid AS gid
                , ARRAY[ name] AS arr
                , poise AS totpoise
                , weight AS totweight
        FROM allgear
        WHERE gid = 1
        UNION ALL
        SELECT ag.gid
                , rrr.arr || ARRAY[ag.name] AS arr
                , rrr.totpoise +ag.poise AS totpoise
                , (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
        FROM allgear ag
        JOIN rrr ON ag.gid = rrr.gid +1 AND (rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)
        )
SELECT * FROM rrr
WHERE gid = 4 -- the gid of the final one
ORDER BY totweight DESC
LIMIT 5
        ;

结果:

 gid |           arr           | totpoise | totweight 
-----+-------------------------+----------+-----------
   4 | {head2,body2,arm1,leg2} |       20 |     10.00
   4 | {head1,body2,arm3,leg1} |       20 |      9.90
   4 | {head2,body2,arm1,leg3} |       23 |      9.80
   4 | {head3,body2,arm3,leg2} |       19 |      9.70
   4 | {head1,body2,arm2,leg2} |       20 |      9.70
(5 rows)

注意:我得到了更多的组合,可能是因为我使用了 DECIMAL(4,2) 而不是浮点类型。

额外信息:如果我们知道剩余层级(齿轮类型)将添加的最小重量,甚至在较低层级中,我们可以添加一些额外的修剪。我为此添加了一个额外的表格。


CREATE TABLE minima AS
SELECT gid, MIN(weight) AS mimi
FROM allgear
GROUP BY gid;
-- add an extra level ...
INSERT INTO minima(gid, mimi) VALUES (5, 0.0);

-- EXPLAIN ANALYZE
WITH recursive rrr AS (
        SELECT gid AS gid
                , ARRAY[ name] AS arr
                , poise AS totpoise
                , weight AS totweight
        FROM allgear
        WHERE gid = 1
        UNION ALL
        SELECT ag.gid
                , rrr.arr || ARRAY[ag.name] AS arr
                , rrr.totpoise +ag.poise AS totpoise
                , (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
        FROM allgear ag
        JOIN rrr ON ag.gid = rrr.gid+1
        -- Do some extra pruning: Partial sum + the missing parts should not sum up to more than 10
        JOIN LATERAL ( SELECT SUM(mimi) AS debt
                FROM minima
                WHERE gid > ag.gid
                ) susu ON (susu.debt +rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)

        )
SELECT * FROM rrr
WHERE gid = 4
ORDER BY totweight DESC
LIMIT 5
        ;

我尝试了这个,它让我的硬盘运行到了100%。当输入在0-10范围内时速度更快,但在其他情况下仍然很慢。我认为如果我换成SSD可能会更快。我还没有尝试过额外的东西。 - slowbs
递归方法依赖于(早期)剪枝。如果上层不能剪枝,它仍需要探索几乎整个NNN*N树。 - wildplasser

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接