PostgreSQL - select count(*) for rows where a condition holds

3

I have the following table with some sample records:

  id  | attr1_id | attr2_id |      user_id      | rating_id | override_comment
------+----------+----------+-------------------+-----------+------------------
 1    |      188 |      201 | user_1@domain.com |         3 |
 2    |      193 |      201 | user_2@domain.com |         2 |
 3    |      193 |      201 | user_2@domain.com |         1 |
 4    |      194 |      201 | user_2@domain.com |         1 |
 5    |      194 |      201 | user_1@domain.com |         1 |
 6    |      192 |      201 | user_2@domain.com |         1 |

The combination of (attr1_id, attr2_id, user_id) is UNIQUE, meaning each user can create only one record for a given pair of attribute IDs.
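
For anyone who wants to reproduce the examples, here is a minimal setup sketch. The table name compatibility comes from the query further down; the column types are assumptions, since the question does not show the table definition. The UNIQUE constraint is left out on purpose, because sample rows 2 and 3 as printed share the same (attr1_id, attr2_id, user_id) and would be rejected by it.

```sql
-- Hypothetical setup for experimenting; column types are assumed, not taken from the question.
CREATE TABLE compatibility (
    id               integer PRIMARY KEY,
    attr1_id         integer NOT NULL,
    attr2_id         integer NOT NULL,
    user_id          text    NOT NULL,
    rating_id        integer NOT NULL,
    override_comment text
    -- UNIQUE (attr1_id, attr2_id, user_id) is deliberately omitted:
    -- sample rows 2 and 3 as printed carry the same triple.
);

-- The six sample rows from the question; override_comment stays NULL.
INSERT INTO compatibility (id, attr1_id, attr2_id, user_id, rating_id) VALUES
    (1, 188, 201, 'user_1@domain.com', 3),
    (2, 193, 201, 'user_2@domain.com', 2),
    (3, 193, 201, 'user_2@domain.com', 1),
    (4, 194, 201, 'user_2@domain.com', 1),
    (5, 194, 201, 'user_1@domain.com', 1),
    (6, 192, 201, 'user_2@domain.com', 1);
```

With these rows, the pairs (194, 201) and (192, 201) are the only ones rated 1 throughout, so the desired count is 2.
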
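My goal is to count the rows where rating_id = 1, but to count each combination of attr1_id and attr2_id only once, and only if no other row (by another user) refers to the same attr1_id and attr2_id with rating_id > 1. Note that attr1_id and attr2_id can be swapped, so given the following two records: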
  id  | attr1_id | attr2_id |      user_id       | rating_id | override_comment
------+----------+----------+--------------------+-----------+------------------
  20  |       5  |       2  | user_1@domain.com  |         3 |
  21  |       2  |       5  | user_2@domain.com  |         1 |

neither row should be counted, because both rows refer to the same combination of attribute IDs and one of them has rating_id > 1.

However, if the following rows exist:

  id  | attr1_id | attr2_id |      user_id       | rating_id | override_comment
------+----------+----------+--------------------+-----------+------------------
  20  |       5  |       2  | user_1@domain.com  |         1 |
  21  |       2  |       5  | user_2@domain.com  |         1 |
  22  |       2  |       5  | user_3@domain.com  |         1 |

all of them should be counted as a single row, because they share the same combination of attr1_id and attr2_id and all have rating_id = 1.

My approach so far is this, but it selects no rows at all:

SELECT *
FROM compatibility c
WHERE rating_id > 1
  AND NOT EXISTs
    (SELECT *
     FROM compatibility c2
     WHERE c.rating_id > 1
       AND (
             (c.attr1_id = c2.attr1_id) AND (c.attr2_id = c2.attr2_id)
             OR
             (c.attr1_id = c2.attr2_id) AND (c.attr2_id = c2.attr1_id)
           )
    )

How can I achieve this?

1
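Why does your title ask for Postgres while you used the MsSQL tag? - Erwin Brandstetter
@a_horse_with_no_name: The answer Gordon currently provides has some unnecessary parts, but no window functions. It works in MySQL as well. - Erwin Brandstetter
@ErwinBrandstetter: Sorry, I thought I saw a first_value in there - a_horse_with_no_name
4 Answers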

3
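My goal is to count the rows where rating_id = 1, but to count each combination of attr1_id and attr2_id only once, and only if no other row (by another user) exists with rating_id > 1.

Building on your original query

Your original query was already heading toward excluding the offending rows. It returns nothing because the outer filter keeps only rows with rating_id > 1, and the subquery tests c.rating_id instead of c2.rating_id, so each of those rows matches itself in c2 and NOT EXISTS is never true. Use = instead of > in the outer filter, test c2.rating_id > 1 in the subquery, and count the result: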

SELECT count(*) AS ct
FROM  (
   SELECT 1
   FROM   compatibility c
   WHERE  rating_id = 1
   AND    NOT EXISTS (
      SELECT 1
      FROM   compatibility c2
      WHERE  c2.rating_id > 1
      AND   (c2.attr1_id = c.attr1_id AND c2.attr2_id = c.attr2_id OR
             c2.attr1_id = c.attr2_id AND c2.attr2_id = c.attr1_id))
   GROUP  BY least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
   ) sub;

Shorter and faster

SELECT count(*) AS ct
FROM  (
   SELECT FROM compatibility  -- empty SELECT list is enough for count(*)
   GROUP  BY least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
   HAVING max(rating_id) <= 1
   ) sub;

Similar to this earlier answer, which has more detail.
HAVING max(rating_id) <= 1 implements your requirement exactly.
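
Both queries above depend on least() and greatest() mapping a pair and its swapped counterpart to the same grouping key. A small sketch of that normalization, using the two example pairs from the question as literal VALUES (the names t, a and b are made up for illustration):

```sql
-- Normalize each pair so the smaller ID always comes first.
SELECT a, b,
       least(a, b)    AS a1,   -- smaller of the two IDs
       greatest(a, b) AS a2    -- larger of the two IDs
FROM  (VALUES (5, 2), (2, 5)) AS t(a, b);
```

Both input rows yield a1 = 2 and a2 = 5, so (5, 2) and (2, 5) fall into one group.
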

fiddle
Old sqlfiddle


1

If I understand correctly, you want the attribute pairs whose rating is always "1".

This query returns those pairs:

select least(attr1_id, attr2_id) as a1, greatest(attr1_id, attr2_id) as a2,
       min(rating_id) as minri, max(rating_id) as maxri
from compatibility c
group by least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
having min(rating_id) = 1 and max(rating_id) = 1;

To get the count, just use it as a subquery:

select count(*)
from (select least(attr1_id, attr2_id) as a1, greatest(attr1_id, attr2_id) as a2,
             min(rating_id) as minri, max(rating_id) as maxri
      from compatibility c
      group by least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
      having min(rating_id) = 1 and max(rating_id) = 1
     ) c

Awesome, thanks. How could I select not just count(*) but the c.id of the compatibilities that need rating? - doque
1
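Just put min(c.id) in the subquery and run it on its own, without the outer query. - Gordon Linoff
I wasn't clear - I meant how to select all the distinct (attr1_id, attr2_id) pairs that still need rating. Is that possible? - doque
@doque . . . I think you should ask another question. - Gordon Linoff
So I have: http://stackoverflow.com/questions/26908868/postgresql-select-distinctcolumn1-column2-where-a-condition-holds - doque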
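
The min(c.id) suggestion from the comments could look roughly like the sketch below. It is an interpretation of the comment rather than code posted in the thread; the output alias id is chosen here for illustration.

```sql
-- One representative id per attribute pair that is rated 1 throughout.
SELECT least(attr1_id, attr2_id)    AS a1,
       greatest(attr1_id, attr2_id) AS a2,
       min(c.id)                    AS id   -- lowest id within the pair
FROM   compatibility c
GROUP  BY least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
HAVING min(rating_id) = 1 AND max(rating_id) = 1;
```

On the six sample rows from the question this returns the pairs (192, 201) with id 6 and (194, 201) with id 4. The row order is not guaranteed without an ORDER BY.
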

1
In PostgreSQL this can be done as follows. SQLFiddle is not working at the moment.
select count(*)
from (
    select least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
    from compatibility
    group by 1, 2
    having not bool_or(rating_id > 1)
) s
;
 count 
-------
     2
(1 row)
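
A possible variation states the condition positively with bool_and. This is a sketch, not part of the answer above, and it matches the bool_or version only under the assumption that rating_id is never NULL and never below 1.

```sql
-- Variation of the query above; assumes rating_id is NOT NULL and never below 1,
-- so "no rating above 1" and "every rating equals 1" select the same groups.
SELECT count(*)
FROM (
    SELECT least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
    FROM   compatibility
    GROUP  BY 1, 2
    HAVING bool_and(rating_id = 1)
) s;
```

On the sample rows from the question it also yields a count of 2.
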

0
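I would use CASE .. WHEN to rearrange the attributes so that the smaller one always comes first, and then group by the reordered pair. Here is a sample query.
SELECT attrSmall,
       attrLarge,
       MAX(rating_id) AS ratingMax
  FROM (
   -- put the smaller attribute ID first so that (5, 2) and (2, 5) land in the same group
   SELECT CASE WHEN c.attr1_id < c.attr2_id
               THEN c.attr1_id
               ELSE c.attr2_id END AS attrSmall,
          CASE WHEN c.attr1_id < c.attr2_id
               THEN c.attr2_id
               ELSE c.attr1_id END AS attrLarge,
          c.rating_id
    FROM compatibility c) AS c1
  GROUP BY attrSmall, attrLarge
  -- PostgreSQL does not accept the output alias ratingMax in HAVING, so the aggregate is repeated
  HAVING MAX(rating_id) = 1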
