Average of the latest N records per group

11

My current application calculates the average points from all of each user's records:

SELECT `user_id`, AVG(`points`) AS pts 
FROM `players` 
WHERE `points` != 0 
GROUP BY `user_id`

The business requirements have changed, and I now need to calculate the average from the last 30 records for each user.

The relevant tables have the following structure:

Table: players; columns: player_id, user_id, match_id, points

Table: users; columns: user_id

The following query does not work, but it shows the logic I am trying to implement.

SELECT @user_id := u.`id`, (
    -- Calculate the average for last 30 records
    SELECT AVG(plr.`points`) 
    FROM (
        -- Select the last 30 records for evaluation
        SELECT p.`points` 
        FROM `players` AS p 
        WHERE p.`user_id`=@user_id 
        ORDER BY `match_id` DESC 
        LIMIT 30
    ) AS plr
) AS avg_points 
FROM `users` AS u

Is there a reasonably efficient way to calculate the average from the latest 30 records for each user?


http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ - YK1
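Good thing the business requirements changed, because "WHERE points != 0" in your first query is incorrect. For an average, you should also count the attempts in which the player scored zero. - YK1
YK1, excluding records with 0 points is a specific requirement. Records with 0 points have a special meaning in the application. - JV-
5 Answers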

11

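There is no reason to reinvent the wheel and risk buggy, inefficient code. Your problem is a simple extension of the common per-group limit problem. Tested and optimized solutions to that problem already exist, and I recommend choosing one of the following two. These queries produce the latest 30 records for each player (rewritten for your tables):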

select user_id, points
from players
where (
   select count(*) from players as p
   where p.user_id = players.user_id and p.player_id >= players.player_id
) <= 30;

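Just to make sure I understand your structure: I assume player_id is a unique key in the players table, and that one user can appear in that table as multiple players.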

The second tested and optimized solution uses MySQL variables:

set @num := 0, @user_id := -1;

select user_id, points,
      @num := if(@user_id = user_id, @num + 1, 1) as row_number,
      @user_id := user_id as dummy
from players force index(user_id) /* optimization */
group by user_id, points, player_id /* player_id should be necessary here */
having row_number <= 30;

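The first query is not optimal (it is quadratic), while the second is optimal (a single pass) but works only on MySQL. The choice is yours. If you choose the second technique, be careful and test it properly against your keys and database settings; reportedly it can stop working in some situations.

Your final query is then trivial: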
select user_id, avg(points)
from ( /* here goes one of the above solutions; 
          the "set" commands should go before this big query */ ) as t
group by user_id

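Note that I did not incorporate the condition from your first query (points != 0), because I did not understand the requirement (you did not describe it). I also think the answer should be general enough to help others with a similar problem.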
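As a way to check the first (correlated-count) approach end to end, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite stands in for MySQL here, and the helper name `avg_latest_n`, the alias `o`, the sample rows, and the small limit of 2 are all assumptions made for illustration.

```python
import sqlite3

def avg_latest_n(conn, n):
    """Average points over each user's n rows with the highest player_id."""
    sql = """
        SELECT user_id, AVG(points) AS pts
        FROM (
            -- keep a row only if at most n rows of the same user
            -- have a player_id greater than or equal to its own
            SELECT o.user_id, o.points
            FROM players AS o
            WHERE (
                SELECT COUNT(*)
                FROM players AS p
                WHERE p.user_id = o.user_id
                  AND p.player_id >= o.player_id
            ) <= ?
        ) AS t
        GROUP BY user_id
        ORDER BY user_id
    """
    return conn.execute(sql, (n,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players ("
    "player_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "match_id INTEGER, points INTEGER)"
)
# Hypothetical sample data: (player_id, user_id, match_id, points)
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 10), (2, 1, 2, 20), (3, 1, 3, 30), (4, 2, 1, 5), (5, 2, 2, 15)],
)

count_result = avg_latest_n(conn, 2)
print(count_result)
```

With a limit of 2, user 1 is averaged over the rows with player_id 3 and 2, and user 2 over the rows with player_id 5 and 4. Passing 30 instead gives the behaviour asked for in the question.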

8

Try this:

SELECT user_id, AVG(points) AS pts 
FROM (SELECT user_id, IF(@uid = (@uid := user_id), @auto:=@auto + 1, @auto := 1) autoNo, points
      FROM players, (SELECT @uid := 0, @auto:= 1) A 
      WHERE points != 0 
      ORDER BY user_id, match_id DESC
     ) AS A 
WHERE autoNo <= 30
GROUP BY user_id;
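On engines that support window functions, the same per-user numbering can be written with ROW_NUMBER() instead of user variables. This is a different technique from the query above, sketched here against SQLite (3.25 or later) from Python rather than MySQL; the sample rows, the name `WINDOW_QUERY`, and the limit of 2 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players ("
    "player_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "match_id INTEGER, points INTEGER)"
)
# Hypothetical sample data: (user_id, match_id, points)
conn.executemany(
    "INSERT INTO players (user_id, match_id, points) VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 0), (1, 3, 20), (1, 4, 40), (2, 1, 7)],
)

WINDOW_QUERY = """
SELECT user_id, AVG(points) AS pts
FROM (
    -- number each user's non-zero rows, newest match first
    SELECT user_id, points,
           ROW_NUMBER() OVER (
               PARTITION BY user_id ORDER BY match_id DESC
           ) AS rn
    FROM players
    WHERE points != 0
) AS ranked
WHERE rn <= :n
GROUP BY user_id
ORDER BY user_id
"""

window_result = conn.execute(WINDOW_QUERY, {"n": 2}).fetchall()
print(window_result)
```

The zero-point row for user 1 is filtered out before numbering, so it does not use up one of the N slots.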

1
@JV- I am surprised you are not interested in the optimized solution... well, it is up to you. - Tomas

0

This should work:

SELECT p1.user_id, avg(points) as pts
  FROM players p1, (
    SELECT u.user_id, (
         SELECT match_id
           FROM players p2
          WHERE p2.user_id = u.user_id
          ORDER BY match_id DESC
          LIMIT 29, 1 ) mid
      FROM users u
    HAVING mid IS NOT NULL) m
 WHERE p1.user_id = m.user_id
   AND p1.match_id >= m.mid
 GROUP BY p1.user_id

 UNION ALL

SELECT user_id, avg(points) AS pts 
  FROM players
 GROUP BY user_id
HAVING count(*) < 30

The part after UNION ALL is only needed if you want to include users with fewer than 30 records.


0
SELECT 
u.`id`, 
-- Caveat: AVG() collapses the rows first, so ORDER BY ... LIMIT 30 acts on the single aggregated row
(SELECT AVG(p.`points`) FROM `players` AS p WHERE p.`user_id`=u.`id` 
ORDER BY p.`user_id` DESC LIMIT 30) AS AVG
FROM `users` AS u Group by u.`id`

You can also try this...


0
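If I understand your logic correctly, you need to calculate each user's average score from their last 30 non-zero records (ordered by match_id).
First, you need to return the last 30 records for each user, which you can do with the following query: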
SELECT p.user_id, p.match_id, p.points
FROM
  players p INNER JOIN players c
  ON p.user_id=c.user_id AND p.match_id<=c.match_id
     AND p.points!=0 and c.points!=0
GROUP BY
  p.user_id, p.match_id, p.points
HAVING
  COUNT(c.user_id)<=30

Then you need to calculate the average over the previous query:

SELECT user_id, AVG(points)
FROM (
  SELECT p.user_id, p.match_id, p.points
  FROM
    players p INNER JOIN players c
    ON p.user_id=c.user_id AND p.match_id<=c.match_id
       AND p.points!=0 and c.points!=0
  GROUP BY
    p.user_id, p.match_id, p.points
  HAVING
    COUNT(c.user_id)<=30
  ) l
GROUP BY user_id
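To see the self-join counting logic at work on a small data set, here is a rough sketch that runs an equivalent query against in-memory SQLite from Python. SQLite replaces MySQL here, the GROUP BY columns are fully qualified, and the sample rows, the name `JOIN_QUERY`, and the limit of 2 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players ("
    "player_id INTEGER PRIMARY KEY, user_id INTEGER, "
    "match_id INTEGER, points INTEGER)"
)
# Hypothetical sample data: (user_id, match_id, points)
conn.executemany(
    "INSERT INTO players (user_id, match_id, points) VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 2, 0), (1, 3, 20), (1, 4, 40), (2, 1, 7)],
)

JOIN_QUERY = """
SELECT user_id, AVG(points) AS pts
FROM (
    -- pair each non-zero row p with every non-zero row c of the same
    -- user that is at least as recent; COUNT gives p's recency rank
    SELECT p.user_id, p.match_id, p.points
    FROM players p
    INNER JOIN players c
        ON p.user_id = c.user_id AND p.match_id <= c.match_id
       AND p.points != 0 AND c.points != 0
    GROUP BY p.user_id, p.match_id, p.points
    HAVING COUNT(c.user_id) <= ?
) l
GROUP BY user_id
ORDER BY user_id
"""

join_result = conn.execute(JOIN_QUERY, (2,)).fetchall()
print(join_result)
```

For user 1 the rows for matches 4 and 3 get ranks 1 and 2 and are kept, while match 1 gets rank 3 and is dropped. The ranking relies on match_id being unique per user; ties would share the same count.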
