MySQL query is very slow

5

My table has the following columns:

gamelogs_id (auto_increment primary key)
player_id (int)
player_name (varchar)
game_id (int)
season_id (int)
points (int)
The table has the following indexes:
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table           | Non_unique | Key_name           | Seq_in_index | Column_name        | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_gamelogs |          0 | PRIMARY            |            1 | player_gamelogs_id | A         |      371330 |     NULL | NULL   |      | BTREE      |         |               |
| player_gamelogs |          1 | player_name        |            1 | player_name        | A         |        3375 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | points             |            1 | points             | A         |         506 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_id            |            1 | game_id            | A         |       37133 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | season             |            1 | season             | A         |          30 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | team_abbreviation  |            1 | team_abbreviation  | A         |          70 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            1 | game_id            | A         |       41258 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            2 | player_id          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | player_id          |            3 | dk_points          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            1 | game_id            | A         |       41258 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            2 | player_id          | A         |      371330 |     NULL | NULL   | YES  | BTREE      |         |               |
| player_gamelogs |          1 | game_player_season |            3 | season_id          | A         |      371330 |     NULL | NULL   |      | BTREE      |         |               |
+-----------------+------------+--------------------+--------------+--------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

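I'm trying to compute each player's average points within a season prior to each game. So for game 3, the average would be the mean of games 1 and 2. Game IDs are assigned sequentially, so an earlier game always has a smaller ID than a later one. I could also use a date field, but I assumed a numeric comparison would be faster?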

My query is:

SELECT game_id, 
       player_id, 
       player_name, 
       (SELECT avg(points) 
          FROM player_gamelogs t2
         WHERE t2.game_id < t1.game_id 
           AND t1.player_id = t2.player_id 
           AND t1.season_id = t2.season_id) AS avg_points
  FROM player_gamelogs t1
 ORDER BY player_name, game_id;

The EXPLAIN output is:

| id | select_type        | table | type | possible_keys                        | key  | key_len | ref  | rows   | Extra                                           |
+----+--------------------+-------+------+--------------------------------------+------+---------+------+--------+-------------------------------------------------+
|  1 | PRIMARY            | t1    | ALL  | NULL                                 | NULL | NULL    | NULL | 371330 | Using filesort                                  |
|  2 | DEPENDENT SUBQUERY | t2    | ALL  | game_id,player_id,game_player_season | NULL | NULL    | NULL | 371330 | Range checked for each record (index map: 0xC8) |

I'm not sure whether this is down to the nature of the task or because my query is inefficient. Thanks for any suggestions!


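The avg_points was a mistake made when posting the question to Stack Overflow - it is not in the actual query. - Eric Truett
Could you explain in the question what data you are trying to get? The t2.game_id < t1.game_id condition doesn't make much sense to me. - Jorge Campos
Is game_id unique? - Sebas
game_id is unique within each group (i.e. a single player), but not across the whole table, because multiple players appear in the same game. - Eric Truett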
1
t2.game_id < t1.game_id restricts the games to those played before the current game. - Eric Truett
3 Answers

7
Consider the following query:
SELECT t1.season_id, t1.game_id, t1.player_id, t1.player_name, AVG(COALESCE(t2.points, 0)) AS average_player_points
FROM player_gamelogs t1
        LEFT JOIN player_gamelogs t2 ON 
                t1.game_id > t2.game_id 
            AND t1.player_id = t2.player_id
            AND t1.season_id = t2.season_id 
GROUP BY
    t1.season_id, t1.game_id, t1.player_id, t1.player_name
ORDER BY t1.player_name, t1.game_id;

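Notes:

  • For best performance, you need an additional index on (season_id, game_id, player_id, player_name).
  • A better approach would be to have a players table and look up the name from the id. Having to read the player name from the log table seems redundant to me, especially if it then has to be part of an index.
  • GROUP BY already sorts by the grouping columns. If you can, avoid the ORDER BY afterwards, since it adds needless overhead. As noted in the comments, this sorting is not official behavior, so weigh the gain from relying on it against the risk of the ordering suddenly disappearing.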
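The first two notes could be sketched roughly as follows, assuming MySQL with InnoDB. The index name `idx_season_game_player_name`, the `players` table, and the `VARCHAR(100)` length are assumptions made for illustration and are not part of the original schema:

```sql
-- Hypothetical index name; the columns are the ones suggested in the first note.
ALTER TABLE player_gamelogs
  ADD INDEX idx_season_game_player_name (season_id, game_id, player_id, player_name);

-- Hypothetical lookup table, so names no longer have to be read from the log table.
-- VARCHAR(100) is a guess; match it to the real player_name definition.
CREATE TABLE players (
  player_id   INT          NOT NULL PRIMARY KEY,
  player_name VARCHAR(100) NOT NULL
);

-- Populate it with one name per player_id (MAX picks one if spellings differ).
INSERT INTO players (player_id, player_name)
SELECT player_id, MAX(player_name)
  FROM player_gamelogs
 WHERE player_id IS NOT NULL
   AND player_name IS NOT NULL
 GROUP BY player_id;
```

With a table like this in place, the aggregate query could group on ids only and join to `players` once at the end to fetch the name.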

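How did it turn out in the end? - Sebas
1
"GROUP BY already sorts by the grouping columns" is a risky assumption that can lead to wrong results. The only clause guaranteed to order the result is ORDER BY. - Paul Maxwell
@Used_By_Already True. I still think it is valuable feedback, since it does affect performance. - Sebas
As long as the person knows the risk exists (which is why I mentioned it :), I'm fine with that. - Paul Maxwell
Thanks. I should have mentioned it myself, since it is only a side effect and not official behavior. Let me update my answer. - Sebas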

2

Your query is fine as written:

SELECT game_id, player_id, player_name, 
       (SELECT avg(t2.points) 
        FROM player_gamelogs t2
        WHERE t2.game_id < t1.game_id AND
              t1.player_id = t2.player_id AND
              t1.season_id = t2.season_id
      ) AS avg_points
FROM player_gamelogs t1
ORDER BY player_name, game_id;

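However, for best performance you need two composite indexes on the table: (player_id, season_id, game_id, points) and (player_name, game_id, season_id).

The first index should speed up the subquery. The second is for the outer ORDER BY.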
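As a rough sketch, the two indexes could be created like this. The index names are hypothetical; only the column lists come from the answer:

```sql
-- Index for the correlated subquery: equality columns first, then the
-- range column (game_id), with points appended so the index is covering.
CREATE INDEX idx_player_season_game_points
    ON player_gamelogs (player_id, season_id, game_id, points);

-- Index intended for the outer ORDER BY player_name, game_id.
CREATE INDEX idx_name_game_season
    ON player_gamelogs (player_name, game_id, season_id);

-- Re-run EXPLAIN afterwards to see whether the optimizer picks them up.
EXPLAIN
SELECT game_id, player_id, player_name,
       (SELECT AVG(t2.points)
          FROM player_gamelogs t2
         WHERE t2.game_id < t1.game_id
           AND t1.player_id = t2.player_id
           AND t1.season_id = t2.season_id) AS avg_points
  FROM player_gamelogs t1
 ORDER BY player_name, game_id;
```

If the first index is chosen, the t2 row should name it in the key column instead of showing type ALL. Whether that actually happens depends on your data and MySQL version, so it is worth checking rather than assuming.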


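The general "covering index" theory aside, I wonder whether including points makes sense for this query. I would try it without first... - Sebas
@Sebas...Clearly, "points" is not as important as "player_id" and "season_id". However, a covering index lets the engine avoid reading the data pages, which saves I/O. - Gordon Linoff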

1

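With the query as you have it now, for each player you are going through every game and, for each one, all of the earlier games... So, for example, if each person has 10 games, you get the following per season/person: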

Game 10, Game 10 points, avg of games 1-9
Game 9, Game 9 points, avg of games 1-8...
...
...
Game 2, Game 2 points, avg of game 1 only.

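You said you want the most recent game and the average of everything before it. So I am assuming you don't care about each person's earlier games individually.
You are also querying ALL seasons. If a season is finished, do you care about old seasons, or only the current one? Otherwise you are going through all seasons and all players...
That said, I offer the following. First, restrict the query to the latest season with a WHERE clause; I deliberately left the season in the select list and GROUP BY in case you do want other seasons. Then I take the maximum game for a given person/season as the single final row (one per person per season), and compute the average of everything before it. So in the 10-games-down-to-2 example above, I do not return the underlying rows for games 9 through 2, only game 10.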
select
      pgMax.player_id,
      pgMax.season_id,
      pgMax.mostRecentGameID,
      pgl3.points as mostRecentGamePoints,
      pgl3.player_name,
      coalesce( avg( pgl2.points ), 0 ) as AvgPointsPriorToCurrentGame
   from
      -- one row per player/season: that player's most recent game
      ( select pgl1.player_id,
               pgl1.season_id,
               max( pgl1.game_id ) as mostRecentGameID
           from
              player_gamelogs pgl1
           where
               pgl1.season_id = JustOneSeason  -- placeholder: substitute the season id you want
           group by
              pgl1.player_id,
              pgl1.season_id ) pgMax

         -- all earlier games of that player in that season;
         -- LEFT JOIN so a player with a single game still appears (average falls back to 0)
         LEFT JOIN player_gamelogs pgl2
            on pgMax.player_id = pgl2.player_id
           AND pgMax.season_id = pgl2.season_id
           AND pgMax.mostRecentGameID > pgl2.game_id

         -- the most recent game itself, for its points and the player name
         JOIN player_gamelogs pgl3
            on pgMax.player_id = pgl3.player_id
           AND pgMax.season_id = pgl3.season_id
           AND pgMax.mostRecentGameID = pgl3.game_id
   group by
      pgMax.player_id,
      pgMax.season_id,
      pgMax.mostRecentGameID,
      pgl3.points,
      pgl3.player_name
   order by
      pgMax.player_id

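Now, to optimize the query, a composite index on (player_id, season_id, game_id, points) is best. However, if you are only looking for the current season, put the index on (season_id, player_id, game_id, points) instead, with season_id in the first position so that the WHERE clause is satisfied first.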
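A minimal sketch of the season-first variant, assuming the query is always filtered to one season. The index name and the literal 2015 are hypothetical stand-ins:

```sql
-- Hypothetical index name; season_id leads so the WHERE filter can use it,
-- and the remaining columns serve the grouping and the MAX(game_id).
CREATE INDEX idx_season_player_game_points
    ON player_gamelogs (season_id, player_id, game_id, points);

-- Check the inner "latest game per player" step on its own.
EXPLAIN
SELECT player_id, season_id, MAX(game_id) AS mostRecentGameID
  FROM player_gamelogs
 WHERE season_id = 2015   -- hypothetical season id
 GROUP BY player_id, season_id;
```

If you also run the query across all seasons, the (player_id, season_id, game_id, points) ordering is probably the better fit, since there is then no season filter for a leading season_id to serve.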
