Calculating the average length of time intervals

3

I have prepared a simple SQL Fiddle demonstrating my problem.

In PostgreSQL 10.3 I store information about users, two-player games and moves in the following 3 tables:

CREATE TABLE players (
    uid SERIAL PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE games (
    gid SERIAL PRIMARY KEY,
    player1 integer NOT NULL REFERENCES players ON DELETE CASCADE,
    player2 integer NOT NULL REFERENCES players ON DELETE CASCADE
);

CREATE TABLE moves (
    mid BIGSERIAL PRIMARY KEY,
    uid integer NOT NULL REFERENCES players ON DELETE CASCADE,
    gid integer NOT NULL REFERENCES games ON DELETE CASCADE,
    played timestamptz NOT NULL
);

Let's assume that two players, Alice and Bob, have played 3 games with each other:

INSERT INTO players (name) VALUES ('Alice'), ('Bob');
INSERT INTO games (player1, player2) VALUES (1, 2);
INSERT INTO games (player1, player2) VALUES (1, 2);
INSERT INTO games (player1, player2) VALUES (1, 2);

Let's assume that the first game was played quickly, with a move every minute.

But then they chilled :-) and played 2 slow games, with a move every 10 minutes:

INSERT INTO moves (uid, gid, played) VALUES
(1, 1, now() + interval '1 min'),
(2, 1, now() + interval '2 min'),
(1, 1, now() + interval '3 min'),
(2, 1, now() + interval '4 min'),
(1, 1, now() + interval '5 min'),
(2, 1, now() + interval '6 min'),

(1, 2, now() + interval '10 min'),
(2, 2, now() + interval '20 min'),
(1, 2, now() + interval '30 min'),
(2, 2, now() + interval '40 min'),
(1, 2, now() + interval '50 min'),
(2, 2, now() + interval '60 min'),

(1, 3, now() + interval '110 min'),
(2, 3, now() + interval '120 min'),
(1, 3, now() + interval '130 min'),
(2, 3, now() + interval '140 min'),
(1, 3, now() + interval '150 min'),
(2, 3, now() + interval '160 min');

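On a game statistics web page I would like to display the average time between moves for each player, so I am trying to use the PostgreSQL LAG window function. Since several games can be played simultaneously, I am trying to PARTITION BY gid (the "game id"). Unfortunately, my SQL query fails with the syntax error "window function calls cannot be nested":
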
SELECT AVG(played - LAG(played) OVER (PARTITION BY gid order by played))
OVER (PARTITION BY gid order by played)
FROM moves
-- trying to calculate average thinking time for player Alice
WHERE uid = 1;

UPDATE:

Since the number of games in my database is large and grows daily, I have tried (here is a new SQL Fiddle) adding a condition to the inner SELECT query:

SELECT AVG(played - prev_played)
FROM (SELECT m.*,
      LAG(m.played) OVER (PARTITION BY m.gid ORDER BY played) AS prev_played
      FROM moves m
      JOIN games g ON (m.uid in (g.player1, g.player2))
      WHERE m.played > now() - interval '1 month'
     ) m
WHERE uid = 1;

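However, for some reason this changes the returned value considerably, to 1 min 45 sec.

I wonder why the inner SELECT query suddenly returns many more rows. Is a condition missing in my JOIN?

UPDATE 2:

Oh OK, I see why the average decreases: there are now multiple rows with the same timestamp (i.e. played - prev_played = 0). But how do I fix the JOIN?

UPDATE 3:

Never mind, I was missing the m.gid = g.gid AND condition in my SQL JOIN. Now it works: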

SELECT AVG(played - prev_played)
FROM (SELECT m.*,
      LAG(m.played) OVER (PARTITION BY m.gid ORDER BY played) AS prev_played
      FROM moves m
      JOIN games g ON (m.gid = g.gid AND m.uid in (g.player1, g.player2))
      WHERE m.played > now() - interval '1 month'
     ) m
WHERE uid = 1;
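
The effect described in the updates can be reproduced outside PostgreSQL. The following is only a sketch under stated assumptions: it uses Python's sqlite3 module (SQLite 3.25 or newer is needed for window functions) as a stand-in for PostgreSQL, stores `played` as an integer number of minutes instead of a timestamptz, and omits the one-month filter. Without the gid condition every move is joined to all three games, so each row appears three times and two of the three copies get a difference of 0.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL and `played` is whole minutes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (gid INTEGER PRIMARY KEY, player1 INTEGER, player2 INTEGER);
    CREATE TABLE moves (mid INTEGER PRIMARY KEY, uid INTEGER, gid INTEGER, played INTEGER);
    INSERT INTO games (player1, player2) VALUES (1, 2), (1, 2), (1, 2);
""")

# Same schedule as in the question: game 1 has a move every minute,
# games 2 and 3 have a move every 10 minutes.
schedule = {1: range(1, 7), 2: range(10, 70, 10), 3: range(110, 170, 10)}
for gid, minutes in schedule.items():
    for i, played in enumerate(minutes):
        uid = 1 if i % 2 == 0 else 2  # Alice (1) and Bob (2) alternate
        conn.execute(
            "INSERT INTO moves (uid, gid, played) VALUES (?, ?, ?)",
            (uid, gid, played),
        )

# Returns (number of Alice's rows, average gap in minutes).
QUERY = """
    SELECT COUNT(*), AVG(played - prev_played)
    FROM (SELECT m.uid, m.played,
                 LAG(m.played) OVER (PARTITION BY m.gid ORDER BY m.played) AS prev_played
          FROM moves m
          JOIN games g ON ({join_condition})
         ) AS t
    WHERE uid = 1
"""

buggy = conn.execute(
    QUERY.format(join_condition="m.uid IN (g.player1, g.player2)")
).fetchone()
fixed = conn.execute(
    QUERY.format(join_condition="m.gid = g.gid AND m.uid IN (g.player1, g.player2)")
).fetchone()

print("without gid condition:", buggy)  # 27 rows, average 1.75 minutes
print("with gid condition:   ", fixed)  # 9 rows, average 7.0 minutes
```

An average of 1.75 minutes is the 1 min 45 sec reported above: the sum of the gaps is still 42 minutes, but it is divided by 24 non-NULL rows instead of 6.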

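Desired results would help. - Gordon Linoff
The desired result is 7 minutes for both players: (1 + 10 + 10) / 3. On my web page I would like to list how fast or slow each player moves. - Alexander Farber
So the average move time for the second and third games would be 10? - Juan Carlos Oropeza
Yes, but I only want to print a single value: the average thinking time over all moves made by a player, i.e. 7 minutes. - Alexander Farber
You should remove the updates. They belong in a separate question, and it looks like you have already solved it? - Juan Carlos Oropeza
I don't think m.uid filters much. The bigger filter now is m.played > now()... I think you want this version, http://sqlfiddle.com/#!17/73a57/32 , which only joins the games where uid = 1 is player1 or player2. - Juan Carlos Oropeza
2 Answers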

2

You need a subquery to nest the window functions. I think the following does what you want:

select avg(played - prev_played)
from (select m.*,
             lag(m.played) over (partition by gid order by played) as prev_played
      from moves m
     ) m
where uid = 1;

Note: the where clause needs to go in the outer query, so that it does not affect the result of lag().
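
To illustrate why the position of the filter matters, here is a small sketch under assumptions that are not part of the answer: Python's sqlite3 module (SQLite 3.25 or newer) stands in for PostgreSQL, and `played` is an integer number of minutes. With the filter inside the subquery, lag() only sees Alice's own moves, so each gap spans Bob's thinking time as well.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL and `played` is whole minutes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moves (mid INTEGER PRIMARY KEY, uid INTEGER, gid INTEGER, played INTEGER)"
)

# The question's data: (gid, first move, step); players 1 and 2 alternate.
rows = [
    (1 + i % 2, gid, start + i * step)
    for gid, start, step in [(1, 1, 1), (2, 10, 10), (3, 110, 10)]
    for i in range(6)
]
conn.executemany("INSERT INTO moves (uid, gid, played) VALUES (?, ?, ?)", rows)

# Filter in the outer query: lag() sees every move of the game.
outer_filter = conn.execute("""
    SELECT AVG(played - prev_played)
    FROM (SELECT uid, played,
                 LAG(played) OVER (PARTITION BY gid ORDER BY played) AS prev_played
          FROM moves
         ) AS m
    WHERE uid = 1
""").fetchone()[0]

# Filter in the inner query: lag() only sees Alice's moves.
inner_filter = conn.execute("""
    SELECT AVG(played - prev_played)
    FROM (SELECT uid, played,
                 LAG(played) OVER (PARTITION BY gid ORDER BY played) AS prev_played
          FROM moves
          WHERE uid = 1
         ) AS m
""").fetchone()[0]

print(outer_filter)  # 7.0  (time since the opponent's previous move)
print(inner_filter)  # 14.0 (time since Alice's own previous move)
```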

1

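@gordon's answer is probably good enough, but it does not produce the result you asked for in your comment. The per-game average equals the overall average only because every game in your data has the same number of rows. If you want the average of the per-game averages, you need one more level.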

WITH cte AS (
    SELECT gid, AVG(played - prev_played) AS play_avg
    FROM (SELECT m.*,
                 LAG(m.played) OVER (PARTITION BY gid ORDER BY played) AS prev_played
          FROM moves m
         ) m
    WHERE uid = 1
    GROUP BY gid
)
SELECT AVG(play_avg)
FROM cte;
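
The difference between the two averages only shows up when games have different numbers of moves. The sketch below uses made-up data for that case (a 10-move game with a move every minute and a 4-move game with a move every 10 minutes); as further assumptions, Python's sqlite3 module (SQLite 3.25 or newer) stands in for PostgreSQL and `played` is an integer number of minutes.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL and `played` is whole minutes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moves (mid INTEGER PRIMARY KEY, uid INTEGER, gid INTEGER, played INTEGER)"
)

# Hypothetical data: game 1 has 10 fast moves, game 2 has 4 slow moves.
rows = [(1 + i % 2, 1, 1 + i) for i in range(10)]
rows += [(1 + i % 2, 2, 10 + 10 * i) for i in range(4)]
conn.executemany("INSERT INTO moves (uid, gid, played) VALUES (?, ?, ?)", rows)

# Subquery shared by both variants: previous move within the same game.
LAGGED = """
    SELECT uid, gid, played,
           LAG(played) OVER (PARTITION BY gid ORDER BY played) AS prev_played
    FROM moves
"""

# Average over all of Alice's moves, regardless of game.
overall = conn.execute(
    f"SELECT AVG(played - prev_played) FROM ({LAGGED}) AS m WHERE uid = 1"
).fetchone()[0]

# Average of the per-game averages, using one more level.
per_game = conn.execute(f"""
    WITH cte AS (
        SELECT gid, AVG(played - prev_played) AS play_avg
        FROM ({LAGGED}) AS m
        WHERE uid = 1
        GROUP BY gid
    )
    SELECT AVG(play_avg) FROM cte
""").fetchone()[0]

print(overall)   # (1 + 1 + 1 + 1 + 10) / 5 = 2.8
print(per_game)  # (1 + 10) / 2 = 5.5
```

Which of the two is appropriate depends on whether each move or each game should carry equal weight.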

Thanks, but Gordon's suggestion actually already does what I need. - Alexander Farber
