SQL using GROUP BY and INTERSECT

3
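Given these two sets of item groups, how can I find the groups in set1 that span more than one group in set2? In other words, how can I find the groups in set1 that cannot be covered by a single group in set2?
For example, in the tables below, A (1, 2, 5) is the only group that spans both s1 (1, 2, 3) and s2 (2, 3, 4, 5). B and C are not answers, because each of them is covered by the single group s2.
I would prefer to use SQL (SQL Server 2008 R2 is available).
Thanks.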
set1                            set2
 +---------+----------+          +---------+----------+
 | group   |  item    |          | group   |  item    |
 +---------+----------+          +---------+----------+
 |   A     |    1     |          |   s1    |    1     |
 |   A     |    2     |          |   s1    |    2     |
 |   A     |    5     |          |   s1    |    3     |
 |   B     |    4     |          |   s2    |    2     |
 |   B     |    5     |          |   s2    |    3     |
 |   C     |    3     |          |   s2    |    4     |
 |   C     |    5     |          |   s2    |    5     |
 +---------+----------+          +---------+----------+

Please try it with this sqlfiddle: http://sqlfiddle.com/#!6/fac8a/3 or use the script below to create temp tables for testing answers:
create table #set1 (grp varchar(5),item int)
create table #set2 (grp varchar(5),item int)

insert into #set1 select 'a',1 union select 'a',2 union select 'a',5 union select 'b',4 union select 'b',5 union select 'c',3 union select 'c',5
insert into #set2 select 's1',1 union select 's1',2 union select 's1',3 union select 's2',2 union select 's2',3 union select 's2',4 union select 's2',5

select * from #set1
select * from #set2

--drop table #set1
--drop table #set2

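So what result do you expect? A? Or... - MrSimpleMind
That's correct. B and C can each be satisfied by s2 alone. - S2L
Doesn't C span s1 and s2? 3 is also in s1. - Giorgi Nakeuri
I see what you mean. I should rephrase it as "find the set1 groups that cannot be satisfied by a single group in set2". Thanks. - S2L
7 Answers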

3

Select the groups from set1 for which no group in set2 contains all of that group's items:

select s1.grp from set1 s1
where not exists(
  select * from set2 s2 where not exists(
    select item from set1 s11 
    where s11.grp = s1.grp 
    except 
    select item from set2 s22
    where s22.grp = s2.grp))
group by s1.grp
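
The covering test that this query expresses can also be restated outside SQL as a subset check. The following is a minimal Python sketch and not part of the original answer: the dictionaries and the function name `uncovered_groups` are illustrative assumptions, and the data is copied from the sample script in the question.

```python
# Sample data from the question, as group -> set of items.
set1 = {"a": {1, 2, 5}, "b": {4, 5}, "c": {3, 5}}
set2 = {"s1": {1, 2, 3}, "s2": {2, 3, 4, 5}}


def uncovered_groups(groups, covers):
    """Return the groups whose items are not all inside any single cover."""
    result = []
    for name, items in sorted(groups.items()):
        # items - cover_items is empty exactly when the cover holds every
        # item; this mirrors the EXCEPT inside the inner NOT EXISTS.
        covered = any(not (items - cover_items) for cover_items in covers.values())
        if not covered:
            result.append(name)
    return result


print(uncovered_groups(set1, set2))  # ['a']
```

Here `b` and `c` drop out because both are subsets of `s2`, while `a` is missing 5 from `s1` and 1 from `s2`.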

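I guess yours would return A and C, but S2L only wants A? - MrSimpleMind
@MrSimpleMind, no, because for C, count(distinct set2.group) would be only 1. - Giorgi Nakeuri
Sorry, you are right. It would also return C. I guess the OP made a mistake in the test data, because C also spans 2 groups in set2. - Giorgi Nakeuri
The test data is correct. My question was not clear enough, and I have edited it. - S2L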

2

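OK. It's not pretty, but it should work. I tried it on the fiddle. I think it could be done with windowing, but I need to think about that some more.

For now, here is the not-so-pretty solution.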

WITH d1 AS (
SELECT set1.grp
     , COUNT(*) cnt
  FROM set1
 GROUP BY set1.grp
), d2 AS (  
SELECT set1.grp grp1
     , set2.grp grp2
     , COUNT(set1.item) cnt
  FROM set1
 INNER JOIN set2
    ON set1.item = set2.item
 GROUP BY set1.grp
     , set2.grp
 )
SELECT grp
  FROM d1
EXCEPT  
SELECT d1.grp 
  FROM d1
 INNER JOIN d2
    ON d2.grp1 = d1.grp
   AND d2.cnt = d1.cnt

I don't think that's ugly at all. I like it better than the accepted answer. - paparazzo

0
You can find the solution with the following query:
SELECT A.grp AS G1, A.item AS T1, B.grp, B.item
FROM SET1 A RIGHT JOIN SET2 B ON A.item=B.item
WHERE A.grp IS NULL

That didn't work. Please try it with the sample tables I added to the question. - S2L

0

Can you check this?

SELECT DISTINCT a.Group1, a.Item, b.CNT
FROM SET1 a
INNER JOIN
(SELECT GroupA, COUNT(*) CNT
 FROM
 (
    SELECT DISTINCT a.Group1 GroupA, b.Group1 GroupB
    FROM SET1 a
        INNER JOIN SET2 b ON a.Item = b.Item
 ) a GROUP BY GroupA
) b ON a.Group1 = b.GroupA
 WHERE b.CNT > 1

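This returns A and C, which is incorrect. I put sample data in the SQL to make it easier to try out. - S2L
C has 3 and 5. 3 is contained in S1, and 5 is contained in S2. - ps_prakash02
I hope the edit makes clear what I mean (find the groups in set1 that cannot be covered by a single group in set2). - S2L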

0

Thanks for your comments. I believe the edited query below will work:

Select distinct grp1, initialRows, max(MatchedRows) from 
  (
  select a.grp as grp1, b.grp as grp2
  , count(distinct case when b.item is not null then a.item end) as MatchedRows
  , d.InitialRows
  from set1 a
  left join set2 b
  on a.item = b.item
  left join 
    (select grp, count(distinct Item) as InitialRows from set1
     group by grp) d
  on a.grp = d.grp
  group by a.grp, b.grp, InitialRows
) c
group by grp1, InitialRows
having max(MatchedRows) < InitialRows

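I tried fixing up your query a bit, but it doesn't give the right answer. Here is the SQLFiddle link: http://sqlfiddle.com/#!6/90511/2 - S2L
Thanks @S2L! With your edit I managed to get the SqlFiddle working, and I think I've made the necessary corrections! This is always harder to do on a weekend without access to a SQL Server instance... - APH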

0

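I think this does the trick. For each set1 group, the subquery returns the set2 groups that match every item in that set1 group, by counting the matched items and comparing that count with the number of items in the set1 group.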

select s.grp from #set1 s
group by s.grp
having not exists ( 
    select s2.grp from #set2 s2 inner join #set1 s1 on s2.item = s1.item
    where s1.grp = s.grp
    group by s2.grp
    having count(s.item) = count(s2.item)   
    )
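
The counting idea can also be tried without SQL Server. The sketch below is a variant under stated assumptions, not the query above: it uses Python's standard `sqlite3` module, ordinary tables instead of `#temp` tables, and a join with `GROUP BY` in place of the correlated `HAVING` (so it ends up close to the CTE answer earlier in the thread). The CTE names `grp_size` and `pair_hits` are made up for illustration. It also assumes each (grp, item) pair appears at most once per table, since duplicate rows would distort the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table set1 (grp varchar(5), item int);
    create table set2 (grp varchar(5), item int);
    insert into set1 values ('a',1),('a',2),('a',5),('b',4),('b',5),('c',3),('c',5);
    insert into set2 values ('s1',1),('s1',2),('s1',3),('s2',2),('s2',3),('s2',4),('s2',5);
""")

query = """
    with grp_size as (         -- number of items in each set1 group
        select grp, count(*) as cnt from set1 group by grp
    ), pair_hits as (          -- items shared by each (set1 group, set2 group) pair
        select s1.grp as grp1, s2.grp as grp2, count(*) as cnt
          from set1 s1 join set2 s2 on s2.item = s1.item
         group by s1.grp, s2.grp
    )
    select grp from grp_size
    except                     -- drop groups fully matched by one set2 group
    select g.grp from grp_size g join pair_hits p
        on p.grp1 = g.grp and p.cnt = g.cnt
"""
uncovered = [row[0] for row in conn.execute(query)]
print(uncovered)  # ['a']
```

A set1 group is covered exactly when some set2 group shares as many items with it as the group contains, which is why comparing the two counts is enough once duplicates are ruled out.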

0

Basically the same as Robert Co's.
I didn't get this from his answer; I came up with it independently.

    select set1.grp
      from set1
except
    select set1count.grp
      from ( select set1.grp, count(*) as [count]
               from set1
              group by set1.grp -- item count per set1 group
           ) as set1count
      join ( select set1.grp as [group1], count(*) as [count]
               from set1
               join set2
                 on set2.item = set1.item
              group by set1.grp, set2.grp -- this is the magic
           ) as set12count
        on set1count.grp = set12count.[group1] -- note no set2.grp match
       and set1count.[count] = set12count.[count] -- all items of the set1 group are in at least one set2 group

Page content provided by Stack Overflow.