MySQL query is very slow

Question

My table has the following columns:

gamelogs_id (auto_increment primary key)
player_id (int)
player_name (varchar)
game_id (int)
season_id (int)
points (int)

The table has the following indexes:

+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name           | Seq_in_index | Column_name        | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_gamelogs |          0 | PRIMARY            |            1 | player_gamelogs_id | A         |      371330 |     NULL | NULL   |      | BTREE      |         |               |
| player_gamelogs |          1 | player_name        |            1 | player_name        | A         |        3375 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | points             |            1 | points             | A         |         506 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_id            |            1 | game_id            | A         |       37133 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | season             |            1 | season             | A         |          30 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | team_abbreviation  |            1 | team_abbreviation  | A         |          70 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            1 | game_id            | A         |       41258 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            2 | player_id          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            3 | dk_points          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            1 | game_id            | A         |       41258 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            2 | player_id          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            3 | season_id          | A         |      371330 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

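I am trying to compute a player's average points within a season prior to each game being played. So for the third game, the average would be taken over the first and second games. Game IDs are sequential, so an earlier game has a smaller ID than a later one. I could also use a date field, but I assumed a numeric comparison would be faster.

My query is as follows: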

SELECT game_id, 
       player_id, 
       player_name, 
       (SELECT avg(points) 
          FROM player_gamelogs t2
         WHERE t2.game_id < t1.game_id 
           AND t1.player_id = t2.player_id 
           AND t1.season_id = t2.season_id) AS avg_points
  FROM player_gamelogs t1
 ORDER BY player_name, game_id;
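
To make the intended result concrete, here is a minimal sketch that runs the query above on three made-up rows. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the data and the simplified table definition are assumptions for illustration, not the real schema.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_gamelogs ("
    " player_gamelogs_id INTEGER PRIMARY KEY,"
    " player_id INT, player_name TEXT, game_id INT, season_id INT, points INT)"
)

# Hypothetical toy data: one player, one season, three games.
conn.executemany(
    "INSERT INTO player_gamelogs"
    " (player_id, player_name, game_id, season_id, points)"
    " VALUES (?, ?, ?, ?, ?)",
    [(1, "Ann", 101, 2014, 10),
     (1, "Ann", 102, 2014, 20),
     (1, "Ann", 103, 2014, 30)],
)

# The query from the question, unchanged apart from layout.
QUERY = """
SELECT game_id, player_id, player_name,
       (SELECT avg(points)
          FROM player_gamelogs t2
         WHERE t2.game_id < t1.game_id
           AND t1.player_id = t2.player_id
           AND t1.season_id = t2.season_id) AS avg_points
  FROM player_gamelogs t1
 ORDER BY player_name, game_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The first game has no earlier games, so its average is NULL (None in Python); the second game averages to 10.0 and the third to 15.0.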

The EXPLAIN output is as follows:

| id | select_type        | table | type | possible_keys                        | key  | key_len | ref  | rows   | Extra                                           |
+----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
|  1 | PRIMARY            | t1    | ALL  | NULL                                 | NULL | NULL    | NULL | 371330 | Using filesort                                  |
|  2 | DEPENDENT SUBQUERY | t2    | ALL  | game_id,player_id,game_player_season | NULL | NULL    | NULL | 371330 | Range checked for each record (index map: 0xC8) |

I am not sure whether this is due to the nature of the task or because my query is inefficient. Thanks for any suggestions!

- Eric Truett

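The avg_points was a mistake made when posting the question to Stack Overflow; it is not in the actual query. - Eric Truett

Could you explain in the question what data you are trying to get? The condition t2.game_id < t1.game_id does not seem to make sense to me. - Jorge Campos

Is game_id unique? - Sebas

game_id is unique within each group (i.e. a single player), but not across the whole table, because multiple players appear in the same game. - Eric Truett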

t2.game_id < t1.game_id restricts the games to those played before the current game. - Eric Truett

3 Answers

Your query is fine as written:

SELECT game_id, player_id, player_name, 
       (SELECT avg(t2.points) 
        FROM player_gamelogs t2
        WHERE t2.game_id < t1.game_id AND
              t1.player_id = t2.player_id AND
              t1.season_id = t2.season_id
      ) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;

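However, for best performance you need two composite indexes on the table: (player_id, season_id, game_id, points) and (player_name, game_id, season_id).

The first index should speed up the subquery. The second is for the outer ORDER BY.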
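
As a rough sketch of what creating these two indexes could look like, the snippet below builds them on an empty copy of the table and asks the planner which index the subquery would use. It runs against an in-memory SQLite database through Python's sqlite3 module, which is an assumption made so the example is self-contained; the index names are made up. The CREATE INDEX statements use syntax that MySQL also accepts, but EXPLAIN QUERY PLAN is SQLite-specific (on MySQL you would use EXPLAIN), and the two planners can make different choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player_gamelogs (
    player_gamelogs_id INTEGER PRIMARY KEY,
    player_id INT, player_name TEXT, game_id INT, season_id INT, points INT
);
-- Hypothetical index names; the column lists follow the answer above.
CREATE INDEX idx_player_season_game_points
    ON player_gamelogs (player_id, season_id, game_id, points);
CREATE INDEX idx_name_game_season
    ON player_gamelogs (player_name, game_id, season_id);
""")

QUERY = """
SELECT game_id, player_id, player_name,
       (SELECT avg(t2.points)
          FROM player_gamelogs t2
         WHERE t2.game_id < t1.game_id
           AND t1.player_id = t2.player_id
           AND t1.season_id = t2.season_id) AS avg_points
  FROM player_gamelogs t1
 ORDER BY player_name, game_id
"""

# Each plan row ends with a human-readable description of one step.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
details = [row[3] for row in plan]
for line in details:
    print(line)
```

The equality columns (player_id, season_id) come first in the index, followed by the range column (game_id), which lets the subquery seek directly to one player's games in one season; points at the end makes the index covering for the subquery.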

- Gordon Linoff

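Despite the general "covering index" theory, I wonder whether including points in the index makes sense for this query. I would try it without first... - Sebas

@Sebas... Clearly points is not as important as player_id and season_id. However, a covering index lets the engine avoid reading the data pages, which saves I/O. - Gordon Linoff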

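As your query stands, for every player you are going through every game and, for each one, all the games before it. So, for example, if each person has 10 games, you would get the following results per season/person: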

Game 10, Game 10 points, avg of games 1-9
Game 9, Game 9 points, avg of games 1-8...
...
...
Game 2, Game 2 points, avg of game 1 only.

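You say you want the most recent game and the average of everything before it. So I am assuming you do not care about the earlier games for each person.

You are also querying all seasons. If a season is over, do you care about old seasons, or only the current one? Otherwise you are going through all seasons and all players...

That said, I offer the following. First, restrict the query to the latest season with a WHERE clause; I have intentionally left the season in the query and grouping in case you do want other seasons. Then, for a given person/season, I take the max game as the single final row (per person per season) and compute the average of everything before it. So in the scenario with games 10 down to 2, I do not return the underlying rows for games 9 through 2, only the row for the 10th game.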

select
      pgMax.player_id,
      pgMax.season_id,
      pgMax.mostRecentGameID,
      pgl3.points as mostRecentGamePoints,
      pgl3.player_name,
      coalesce( avg( pgl2.points ), 0 ) as AvgPointsPriorToCurrentGame
   from
      -- most recent game per player within the chosen season
      ( select pgl1.player_id,
               pgl1.season_id,
               max( pgl1.game_id ) as mostRecentGameID
           from
              player_gamelogs pgl1
           where
               pgl1.season_id = JustOneSeason  -- placeholder: the season id you want
           group by
              pgl1.player_id,
              pgl1.season_id ) pgMax

         -- all earlier games of that player/season; LEFT JOIN keeps players
         -- with only one game, for whom coalesce() then yields 0
         LEFT JOIN player_gamelogs pgl2
            on pgMax.player_id = pgl2.player_id
           AND pgMax.season_id = pgl2.season_id
           AND pgMax.mostRecentGameID > pgl2.game_id

         -- the most recent game itself, for its points and the player name
         JOIN player_gamelogs pgl3
            on pgMax.player_id = pgl3.player_id
           AND pgMax.season_id = pgl3.season_id
           AND pgMax.mostRecentGameID = pgl3.game_id
   group by
      pgMax.player_id,
      pgMax.season_id,
      pgMax.mostRecentGameID,
      pgl3.points,
      pgl3.player_name
   order by
      pgMax.player_id

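Now, to optimize the query, a composite index on (player_id, season_id, game_id, points) would be best. However, if you are only looking for the current season, put the index on (season_id, player_id, game_id, points) instead, with season_id in the first position so the WHERE clause can be satisfied from the index up front.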
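
The "most recent game per player" approach can be tried on toy data with the sketch below. It again assumes an in-memory SQLite database via Python's sqlite3 module in place of MySQL, uses made-up rows, binds the season through a `?` parameter, and uses a LEFT JOIN for the earlier games so that a player with a single game is still returned; treat all of these as illustrative choices rather than the answer's exact query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_gamelogs ("
    " player_gamelogs_id INTEGER PRIMARY KEY,"
    " player_id INT, player_name TEXT, game_id INT, season_id INT, points INT)"
)
# Hypothetical rows: Ann has three games in 2014 and one in 2013, Bob has one.
conn.executemany(
    "INSERT INTO player_gamelogs"
    " (player_id, player_name, game_id, season_id, points)"
    " VALUES (?, ?, ?, ?, ?)",
    [(1, "Ann", 50, 2013, 99),
     (1, "Ann", 101, 2014, 10),
     (1, "Ann", 102, 2014, 20),
     (1, "Ann", 103, 2014, 30),
     (2, "Bob", 101, 2014, 7)],
)

QUERY = """
SELECT pgMax.player_id, pgMax.season_id, pgMax.mostRecentGameID,
       pgl3.points AS mostRecentGamePoints, pgl3.player_name,
       COALESCE(AVG(pgl2.points), 0) AS AvgPointsPriorToCurrentGame
  FROM (SELECT player_id, season_id, MAX(game_id) AS mostRecentGameID
          FROM player_gamelogs
         WHERE season_id = ?
         GROUP BY player_id, season_id) pgMax
  LEFT JOIN player_gamelogs pgl2          -- earlier games, if any
    ON pgMax.player_id = pgl2.player_id
   AND pgMax.season_id = pgl2.season_id
   AND pgMax.mostRecentGameID > pgl2.game_id
  JOIN player_gamelogs pgl3               -- the most recent game itself
    ON pgMax.player_id = pgl3.player_id
   AND pgMax.season_id = pgl3.season_id
   AND pgMax.mostRecentGameID = pgl3.game_id
 GROUP BY pgMax.player_id, pgMax.season_id, pgMax.mostRecentGameID,
          pgl3.points, pgl3.player_name
 ORDER BY pgMax.player_id
"""

rows = conn.execute(QUERY, (2014,)).fetchall()
for row in rows:
    print(row)
```

Only one row per player comes back for the chosen season: Ann's game 103 with a prior average of 15.0, and Bob's single game with a prior average of 0. The 2013 row is filtered out before any joining happens, which is the point of putting season_id first in the index.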

- DRapp

- Sebas · Accepted Answer

Consider the following query:

SELECT t1.season_id, t1.game_id, t1.player_id, t1.player_name, AVG(COALESCE(t2.points, 0)) AS average_player_points
FROM player_gamelogs t1
        LEFT JOIN player_gamelogs t2 ON 
                t1.game_id > t2.game_id 
            AND t1.player_id = t2.player_id
            AND t1.season_id = t2.season_id 
GROUP BY
    t1.season_id, t1.game_id, t1.player_id, t1.player_name
ORDER BY t1.player_name, t1.game_id;
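
One behavioral detail of the join form is easier to see on data. The sketch below runs the query above on three made-up rows, using an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL (an assumption for illustration only).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_gamelogs ("
    " player_gamelogs_id INTEGER PRIMARY KEY,"
    " player_id INT, player_name TEXT, game_id INT, season_id INT, points INT)"
)
# Hypothetical toy data: one player, one season, three games.
conn.executemany(
    "INSERT INTO player_gamelogs"
    " (player_id, player_name, game_id, season_id, points)"
    " VALUES (?, ?, ?, ?, ?)",
    [(1, "Ann", 101, 2014, 10),
     (1, "Ann", 102, 2014, 20),
     (1, "Ann", 103, 2014, 30)],
)

JOIN_QUERY = """
SELECT t1.season_id, t1.game_id, t1.player_id, t1.player_name,
       AVG(COALESCE(t2.points, 0)) AS average_player_points
  FROM player_gamelogs t1
  LEFT JOIN player_gamelogs t2
    ON t1.game_id > t2.game_id
   AND t1.player_id = t2.player_id
   AND t1.season_id = t2.season_id
 GROUP BY t1.season_id, t1.game_id, t1.player_id, t1.player_name
 ORDER BY t1.player_name, t1.game_id
"""

join_rows = conn.execute(JOIN_QUERY).fetchall()
for row in join_rows:
    print(row)
```

Because COALESCE sits inside AVG, the first game (which has no earlier rows to join) reports an average of 0.0, whereas the correlated-subquery form in the question reports NULL for it. For the same reason, an earlier game whose points value is NULL would be counted as 0 in the average rather than skipped.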

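Notes:

For best performance, you need an additional index on (season_id, game_id, player_id, player_name).
A better design would be to have a players table from which the name is looked up by id. Having to take the player name from the log table seems redundant to me, especially if it has to be part of an index.
GROUP BY already sorts by the grouped columns. If you can, avoid sorting afterwards, since it adds needless overhead. As noted in the comments, this is not official behavior, and the convenience of assuming it stays consistent over time should be weighed against the risk of suddenly losing the ordering. In particular, MySQL 8.0 no longer sorts GROUP BY results implicitly, so the ORDER BY is needed there.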