POSTGRESQL: How do I select the first row of each group?

4

With the following query:

WITH responsesNew AS
(
  SELECT DISTINCT responses."studentId", notation, responses."givenHeart", 
  SUM(notation + responses."givenHeart") OVER (partition BY responses."studentId" 
  ORDER BY responses."createdAt") AS total, responses."createdAt"
  FROM responses
)
SELECT responsesNew."studentId", notation, responsesNew."givenHeart", total, 
responsesNew."createdAt"
FROM responsesNew
WHERE total = 3
GROUP BY responsesNew."studentId", notation, responsesNew."givenHeart", total, 
responsesNew."createdAt"
ORDER BY responsesNew."studentId" ASC

I get this result table:

studentId | notation | givenHeart | total |      createdAt     |
----------+----------+------------+-------+--------------------+
 374      | 1        | 0          | 3     | 2017-02-13 12:43:03   
 374      | null     | 0          | 3     | 2017-02-15 22:22:17
 639      | 1        | 2          | 3     | 2017-04-03 17:21:30 
 790      | 1        | 0          | 3     | 2017-02-12 21:12:23
 ...

My goal is to keep only the earliest row of each group in the table, like this:

studentId | notation | givenHeart | total |      createdAt     |
----------+----------+------------+-------+--------------------+
 374      | 1        | 0          | 3     | 2017-02-13 12:43:03 
 639      | 1        | 2          | 3     | 2017-04-03 17:21:30 
 790      | 1        | 0          | 3     | 2017-02-12 21:12:23
 ...

How can I get there?

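I have read many related topics, but nothing I tried with DISTINCT, DISTINCT ON, subqueries in WHERE, LIMIT, etc. worked (probably because of my poor understanding). I ran into errors related to window functions, a missing column in ORDER BY, and a few others I can't remember.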


Use a window function in your WITH statement to add a row number and filter on it: Row_number() over (partition by studentID order by createdAt) RN, then add where RN = 1 to your query. - xQbert
2 Answers

6

You can do this with DISTINCT ON. The query would look like this:

WITH responsesNew AS (
      SELECT DISTINCT r."studentId", notation, r."givenHeart", 
             SUM(notation + r."givenHeart") OVER (partition BY r."studentId" 
                                                  ORDER BY r."createdAt") AS total,
             r."createdAt" 
      FROM responses r
     )
SELECT DISTINCT ON (r."studentId") r."studentId", notation, r."givenHeart", total, 
r."createdAt"
FROM responsesNew r
WHERE total = 3
ORDER BY r."studentId" ASC, r."createdAt";

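I'm pretty sure this can be simplified. I just don't understand the purpose of the common table expression (CTE). Using SELECT DISTINCT in this way is very curious.
If you want a simplified query, ask another question with sample data, the desired results, and an explanation of what you are doing, and include the query or a link to this question.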
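
To see how DISTINCT ON picks one row per group, here is a minimal sketch that runs against the question's four result rows, inlined with VALUES so that no table is needed. The CTE name sample is made up for illustration. DISTINCT ON ("studentId") keeps the first row of each "studentId" group in ORDER BY order, so the sort has to start with "studentId" and then use "createdAt" to decide which row counts as first.

```sql
-- Hypothetical inline data: the rows from the question's result (total = 3).
WITH sample ("studentId", notation, "givenHeart", total, "createdAt") AS (
  VALUES
    (374, 1,    0, 3, TIMESTAMP '2017-02-13 12:43:03'),
    (374, NULL, 0, 3, TIMESTAMP '2017-02-15 22:22:17'),
    (639, 1,    2, 3, TIMESTAMP '2017-04-03 17:21:30'),
    (790, 1,    0, 3, TIMESTAMP '2017-02-12 21:12:23')
)
-- Keep one row per "studentId": the earliest by "createdAt".
SELECT DISTINCT ON ("studentId")
       "studentId", notation, "givenHeart", total, "createdAt"
FROM sample
ORDER BY "studentId", "createdAt";
```

With this data the query should return the three rows listed as the desired result in the question. PostgreSQL requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions, which may be related to the ORDER BY errors mentioned in the question.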

3
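Use the Row_number() window function to add a row number within each partition, then show only row 1.
There is no need to fully qualify names if only one table is involved. When you do qualify, use aliases to improve readability.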
WITH responsesNew AS
(
  SELECT "studentId"
       , notation
       , "givenHeart"
       , SUM(notation + "givenHeart") OVER (partition BY "studentId" ORDER BY "createdAt") AS total
       , "createdAt"
       , Row_number() OVER (PARTITION BY "studentId" ORDER BY "createdAt") AS RNum
  FROM responses r
)
SELECT RN."studentId"
     , notation, RN."givenHeart"
     , total
     , RN."createdAt"
FROM responsesNew RN
WHERE total = 3
  AND RNum = 1
GROUP BY RN."studentId"
       , notation
       , RN."givenHeart", total
       , RN."createdAt"
ORDER BY RN."studentId" ASC
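
One point worth checking in the query above: the row number is computed inside the CTE, before the total = 3 filter is applied, so it numbers all of a student's rows. If a student's running total only reaches 3 on a later row, the combination of total = 3 and RNum = 1 would drop that student entirely. A possible variant, assuming the responses table and columns from the question, filters first and numbers afterwards. The names running, numbered and rnum are made up for this sketch.

```sql
-- Sketch only: assumes the responses table and columns from the question.
WITH running AS (
  -- Running total per student, ordered by creation time.
  SELECT "studentId"
       , notation
       , "givenHeart"
       , SUM(notation + "givenHeart") OVER (PARTITION BY "studentId" ORDER BY "createdAt") AS total
       , "createdAt"
  FROM responses
),
numbered AS (
  -- WHERE is applied before the window function at this query level,
  -- so only rows with total = 3 get numbered.
  SELECT running.*
       , ROW_NUMBER() OVER (PARTITION BY "studentId" ORDER BY "createdAt") AS rnum
  FROM running
  WHERE total = 3
)
SELECT "studentId", notation, "givenHeart", total, "createdAt"
FROM numbered
WHERE rnum = 1
ORDER BY "studentId";
```

Because each student now yields at most one row, the GROUP BY from the original query is not needed in this variant.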

1
You have to appreciate the power of window functions. - AguThadeus

Page content provided by Stack Overflow.