Intersect select statements on specific columns

I have a table called SalesDetails that looks like this:

InvoiceID, LineID, Product

1,1,Apple
1,2,Banana
2,1,Apple
2,2,Mango
3,1,Apple
3,2,Banana
3,3,Mango

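My requirement is to return the rows of invoices on which both Apple and Banana were sold, but not the rows for any other products on those invoices.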

So the result should be:

1,1,Apple
1,2,Banana
3,1,Apple
3,2,Banana

I tried the following:

Select * from SalesDetails where Product = 'Apple'
Intersect
Select * from SalesDetails where Product = 'Banana'

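It doesn't work, because INTERSECT compares entire rows: every selected column has to match, and the Product values never do.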
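Here is a minimal sketch of the difference. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, on the assumption that SQLite's INTERSECT compares whole rows the same way for these queries. The table holds the sample rows from the question, and the helper names are illustrative only.

```python
import sqlite3

# Sample rows from the question: (InvoiceID, LineID, Product).
ROWS = [
    (1, 1, "Apple"), (1, 2, "Banana"),
    (2, 1, "Apple"), (2, 2, "Mango"),
    (3, 1, "Apple"), (3, 2, "Banana"), (3, 3, "Mango"),
]


def make_db() -> sqlite3.Connection:
    """Create an in-memory SalesDetails table holding ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesDetails (InvoiceID INT, LineID INT, Product TEXT)")
    conn.executemany("INSERT INTO SalesDetails VALUES (?, ?, ?)", ROWS)
    return conn


def intersect_whole_rows(conn: sqlite3.Connection) -> list:
    # Every selected column takes part in the comparison, and an 'Apple'
    # row never equals a 'Banana' row, so nothing survives.
    return conn.execute(
        "SELECT * FROM SalesDetails WHERE Product = 'Apple' "
        "INTERSECT "
        "SELECT * FROM SalesDetails WHERE Product = 'Banana'"
    ).fetchall()


def intersect_invoice_ids(conn: sqlite3.Connection) -> list:
    # Projecting only InvoiceID makes INTERSECT compare that single column.
    rows = conn.execute(
        "SELECT InvoiceID FROM SalesDetails WHERE Product = 'Apple' "
        "INTERSECT "
        "SELECT InvoiceID FROM SalesDetails WHERE Product = 'Banana'"
    ).fetchall()
    return sorted(invoice_id for (invoice_id,) in rows)


conn = make_db()
print(intersect_whole_rows(conn))   # []
print(intersect_invoice_ids(conn))  # [1, 3]
```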

What I'd like to do is:

Select * from SalesDetails where Product = 'Apple'
Intersect ----On InvoiceID-----
Select * from SalesDetails where Product = 'Banana'

Is there a way to do this?

Or do I have to first intersect on InvoiceID alone using my criteria, and then select the rows whose InvoiceID is in that result, i.e.:

Select * From SalesDetails
Where Product In ('Apple', 'Banana') And InvoiceID In 
  (
  Select InvoiceID from SalesDetails where Product = 'Apple'
  Intersect
  Select InvoiceID from SalesDetails where Product = 'Banana'
  )

This seems somewhat wasteful, since it checks the criteria twice.
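The two-step query does return exactly the requested rows. The sketch below runs it unchanged in SQLite through Python's sqlite3 module, which is assumed here to behave like SQL Server for this query. The ORDER BY is an addition so that the output order is deterministic. It prints the four expected rows for invoices 1 and 3.

```python
import sqlite3

# Sample rows from the question: (InvoiceID, LineID, Product).
ROWS = [
    (1, 1, "Apple"), (1, 2, "Banana"),
    (2, 1, "Apple"), (2, 2, "Mango"),
    (3, 1, "Apple"), (3, 2, "Banana"), (3, 3, "Mango"),
]

# The question's query: intersect on InvoiceID only, then fetch the
# Apple/Banana lines of the invoices that survive.
QUERY = """
SELECT * FROM SalesDetails
WHERE Product IN ('Apple', 'Banana') AND InvoiceID IN (
    SELECT InvoiceID FROM SalesDetails WHERE Product = 'Apple'
    INTERSECT
    SELECT InvoiceID FROM SalesDetails WHERE Product = 'Banana'
)
ORDER BY InvoiceID, LineID
"""


def run(rows):
    """Load `rows` into an in-memory SalesDetails table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesDetails (InvoiceID INT, LineID INT, Product TEXT)")
    conn.executemany("INSERT INTO SalesDetails VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


for row in run(ROWS):
    print(row)
```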


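Which version of SQL Server are you using? - Arion
SQL Server 2014, Developer Edition - Xinneh
What about union? @Xinneh - nevra
@nevra A union would also return invoices that sold Apple but not Banana (or vice versa); I need only the rows of invoices on which both Apple and Banana were sold. - Xinneh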
11 answers

This time I used a CTE (common table expression), so the Apple/Banana filter is written once and reused.

with sd as (
Select * from SalesDetails 
where (Product in ('Apple', 'Banana'))
)
Select * from sd where invoiceid in (Select invoiceid from 
  sd group by invoiceid having Count(distinct product) = 2)

SQL Fiddle


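I like this approach. However, you should use in rather than or, and I prefer exists to in (though I think the latter is personal preference). - Gordon Linoff
Agreed, "in" is better. It didn't occur to me until I had saved my answer and seen yours. I'll update my answer to use "in" instead of "or". - Leon Bambrick
@LeonBambrick Thanks, this code does produce the result, but like my earlier code it checks the criteria twice. That's not a problem, but I'd like to know whether there is a better way to do it. - Xinneh
@Xinneh, I found a way to state the criteria only once... if that helps. - Leon Bambrick
@LeonBambrick, I like it, thanks. I'll tweak it slightly to support criteria with more products and then use it. Performance-wise it was 1 second faster than my original code, but hey, faster is better. - Xinneh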
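The OP mentions extending this to criteria with more products. One way is sketched here in Python, with sqlite3 standing in for SQL Server. The CTE filters once, and the query then requires COUNT(DISTINCT Product) to equal the number of products requested. `rows_with_all_products` is a hypothetical helper, and the ORDER BY is added only for deterministic output.

```python
import sqlite3

# Sample rows from the question: (InvoiceID, LineID, Product).
ROWS = [
    (1, 1, "Apple"), (1, 2, "Banana"),
    (2, 1, "Apple"), (2, 2, "Mango"),
    (3, 1, "Apple"), (3, 2, "Banana"), (3, 3, "Mango"),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesDetails (InvoiceID INT, LineID INT, Product TEXT)")
    conn.executemany("INSERT INTO SalesDetails VALUES (?, ?, ?)", ROWS)
    return conn


def rows_with_all_products(conn, products):
    """Hypothetical helper: the lines for `products` on invoices that
    contain every one of them (the CTE answer, generalised)."""
    wanted = list(dict.fromkeys(products))  # drop repeated names, keep order
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        WITH sd AS (
            SELECT * FROM SalesDetails WHERE Product IN ({placeholders})
        )
        SELECT * FROM sd
        WHERE InvoiceID IN (
            SELECT InvoiceID FROM sd
            GROUP BY InvoiceID
            HAVING COUNT(DISTINCT Product) = ?
        )
        ORDER BY InvoiceID, LineID
    """
    # Only "?" markers are spliced into the SQL; the product names are bound.
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()


conn = make_db()
print(rows_with_all_products(conn, ["Apple", "Banana"]))
print(rows_with_all_products(conn, ["Apple", "Banana", "Mango"]))
```

With `["Apple", "Banana"]` it returns the four rows from the question. With all three products, only invoice 3 qualifies.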

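Use conditional aggregation (the lowercase literals assume a case-insensitive collation, as is typical for SQL Server):
select * 
from SalesDetails
where product in ('apple', 'banana') and invoiceid in(
select invoiceid
from SalesDetails
group by invoiceid
-- count distinct products rather than rows, so two Apple lines alone do not qualify
having count(distinct case when product in('apple', 'banana') then product end) = 2)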

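I think the OP's suggestion is the best option. The following might be faster, but I expect the difference to be minor, and I haven't done any benchmarking.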

Select * From SalesDetails
Where Product ='Apple' And InvoiceID In
(
Select InvoiceID from SalesDetails where Product = 'Banana'
)
union all
select * from SalesDetails
Where Product ='Banana' And InvoiceID In
(
Select InvoiceID from SalesDetails where Product = 'Apple'
)

It returns the expected result, but it is about 3 times slower. Thanks anyway; I guess that's about as optimized as it gets. - Xinneh

A self-join can solve this:
SELECT T1.*
FROM SalesDetails T1 
INNER JOIN SalesDetails T2 ON T1.InvoiceID = T2.InvoiceID
   AND ((T1.Product = 'Apple' AND T2.Product = 'Banana')
     OR (T1.Product = 'Banana' AND T2.Product = 'Apple'))

Select * should be changed to select T1.* - Taemyr
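One caveat involves a case that the question's data does not exercise. If an invoice could hold the same product on more than one line, the self-join returns an Apple line once for every matching Banana line. An EXISTS semi-join returns each line only once; this is the form Gordon Linoff says he prefers in a comment on the first answer. The sketch below uses Python's sqlite3 as a stand-in for SQL Server and adds a hypothetical invoice 4 with two Banana lines to show the difference.

```python
import sqlite3

ROWS = [
    (1, 1, "Apple"), (1, 2, "Banana"),
    (2, 1, "Apple"), (2, 2, "Mango"),
    (3, 1, "Apple"), (3, 2, "Banana"), (3, 3, "Mango"),
    (4, 1, "Apple"), (4, 2, "Banana"), (4, 3, "Banana"),  # hypothetical invoice
]

# The self-join answer: the Apple line of invoice 4 pairs with two Banana lines.
SELF_JOIN = """
SELECT T1.* FROM SalesDetails T1
INNER JOIN SalesDetails T2 ON T1.InvoiceID = T2.InvoiceID
   AND ((T1.Product = 'Apple'  AND T2.Product = 'Banana')
     OR (T1.Product = 'Banana' AND T2.Product = 'Apple'))
ORDER BY T1.InvoiceID, T1.LineID
"""

# Semi-join: each line appears at most once, however many partner lines exist.
EXISTS_VERSION = """
SELECT T1.* FROM SalesDetails T1
WHERE (T1.Product = 'Apple' AND EXISTS (
          SELECT 1 FROM SalesDetails T2
          WHERE T2.InvoiceID = T1.InvoiceID AND T2.Product = 'Banana'))
   OR (T1.Product = 'Banana' AND EXISTS (
          SELECT 1 FROM SalesDetails T2
          WHERE T2.InvoiceID = T1.InvoiceID AND T2.Product = 'Apple'))
ORDER BY T1.InvoiceID, T1.LineID
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesDetails (InvoiceID INT, LineID INT, Product TEXT)")
    conn.executemany("INSERT INTO SalesDetails VALUES (?, ?, ?)", ROWS)
    return conn.execute(sql).fetchall()


print(len(run(SELF_JOIN)), len(run(EXISTS_VERSION)))  # 8 7
```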

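First, COUNT the rows for each InvoiceID that satisfy Product = 'Apple' or 'Banana'. Then join that result back to SalesDetails and keep only the rows where the COUNT is >= 2 (or, in general, the number of products in your criteria).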

SQL Fiddle

SELECT sd.*
FROM (
    SELECT InvoiceID, CC = COUNT(*)
    FROM SalesDetails
    WHERE Product IN('Apple', 'Banana')
    GROUP BY InvoiceID
)t
INNER JOIN SalesDetails sd
    ON sd.InvoiceID = t.InvoiceID
WHERE
    t.CC >= 2
    AND sd.Product IN('Apple', 'Banana')

Doesn't it return invoice 3? I thought it would. =) - Felix Pamittan
Though I'm not sure this is any faster than the OP's suggestion. - Taemyr
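Counting rows works here because each product appears at most once per invoice, which is the same assumption the final answer states explicitly. The sketch below uses Python with sqlite3 standing in for SQL Server. It adds a hypothetical invoice 4 with two Apple lines and no Banana: COUNT(*) lets that invoice through, while COUNT(DISTINCT Product), as used in the first answer, does not. T-SQL's `CC = COUNT(*)` alias form is written with AS, since SQLite does not accept it.

```python
import sqlite3

ROWS = [
    (1, 1, "Apple"), (1, 2, "Banana"),
    (2, 1, "Apple"), (2, 2, "Mango"),
    (3, 1, "Apple"), (3, 2, "Banana"), (3, 3, "Mango"),
    (4, 1, "Apple"), (4, 2, "Apple"),  # hypothetical: two Apple lines, no Banana
]


def matching_invoices(count_expr):
    """Run the COUNT-and-join answer with the given aggregate and return the
    invoices it keeps. count_expr only ever comes from the literals below."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesDetails (InvoiceID INT, LineID INT, Product TEXT)")
    conn.executemany("INSERT INTO SalesDetails VALUES (?, ?, ?)", ROWS)
    sql = f"""
        SELECT DISTINCT sd.InvoiceID
        FROM (
            SELECT InvoiceID, {count_expr} AS CC
            FROM SalesDetails
            WHERE Product IN ('Apple', 'Banana')
            GROUP BY InvoiceID
        ) t
        INNER JOIN SalesDetails sd ON sd.InvoiceID = t.InvoiceID
        WHERE t.CC >= 2 AND sd.Product IN ('Apple', 'Banana')
        ORDER BY sd.InvoiceID
    """
    return [invoice_id for (invoice_id,) in conn.execute(sql)]


print(matching_invoices("COUNT(*)"))                 # [1, 3, 4]
print(matching_invoices("COUNT(DISTINCT Product)"))  # [1, 3]
```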

Another approach is to use PIVOT, like this:

DECLARE @DataSource TABLE
(
    [InvoiceID] TINYINT
   ,[LineID] TINYINT
   ,[Product] VARCHAR(12)
);

INSERT INTO @DataSource ([InvoiceID], [LineID], [Product])
VALUES (1,1,'Apple')
      ,(1,2,'Banana')
      ,(2,1,'Apple')
      ,(2,2,'Mango')
      ,(3,1,'Apple')
      ,(3,2,'Banana')
      ,(3,3,'Mango');

SELECT *
FROM @DataSource
PIVOT
(
    MAX([LineID]) FOR [Product] IN ([Apple], [Banana])
) PVT
WHERE [Apple] IS NOT NULL 
    AND [Banana] IS NOT NULL;

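This returns one row per invoice, with the LineID of each product in its own Apple and Banana column. If you want the original row format back, you can convert the result with UNPIVOT.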
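SQLite has no PIVOT or UNPIVOT. In this sketch it is used through Python's sqlite3 module as a stand-in for SQL Server, and the usual portable equivalents are swapped in: one MAX(CASE ...) column per product for PIVOT, and UNION ALL for UNPIVOT. These are substitutes, not the T-SQL operators themselves. The sketch reproduces the pivoted result and then turns it back into the requested rows.

```python
import sqlite3

# Sample rows from the question: (InvoiceID, LineID, Product).
ROWS = [
    (1, 1, "Apple"), (1, 2, "Banana"),
    (2, 1, "Apple"), (2, 2, "Mango"),
    (3, 1, "Apple"), (3, 2, "Banana"), (3, 3, "Mango"),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesDetails (InvoiceID INT, LineID INT, Product TEXT)")
    conn.executemany("INSERT INTO SalesDetails VALUES (?, ?, ?)", ROWS)
    return conn


# Stand-in for PIVOT (MAX([LineID]) FOR [Product] IN ([Apple], [Banana])):
# group by the remaining column, InvoiceID, as PIVOT does implicitly.
PIVOT_SQL = """
    SELECT InvoiceID, Apple, Banana
    FROM (
        SELECT InvoiceID,
               MAX(CASE WHEN Product = 'Apple'  THEN LineID END) AS Apple,
               MAX(CASE WHEN Product = 'Banana' THEN LineID END) AS Banana
        FROM SalesDetails
        GROUP BY InvoiceID
    ) p
    WHERE Apple IS NOT NULL AND Banana IS NOT NULL
"""


def pivoted(conn):
    return conn.execute(PIVOT_SQL + " ORDER BY InvoiceID").fetchall()


def unpivoted(conn):
    # UNION ALL stands in for UNPIVOT: one output row per product column.
    sql = f"""
        WITH pvt AS ({PIVOT_SQL})
        SELECT InvoiceID, Apple AS LineID, 'Apple' AS Product FROM pvt
        UNION ALL
        SELECT InvoiceID, Banana, 'Banana' FROM pvt
        ORDER BY InvoiceID, LineID
    """
    return conn.execute(sql).fetchall()


conn = make_db()
print(pivoted(conn))    # [(1, 1, 2), (3, 1, 2)]
print(unpivoted(conn))
```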

Or you can use a window function, like this:

;WITH DataSource AS
(
    SELECT  *
            ,SUM(1) OVER (PARTITION BY [InvoiceID]) AS [Match]
    FROM @DataSource
    WHERE [Product] = 'Apple' OR [Product] = 'Banana'
)
SELECT *
FROM DataSource
WHERE [Match] =2 

Here is an approach using window functions:

select sd.*
from (select sd.*,
             max(case when product = 'Apple' then 1 else 0 end) over (partition by invoiceid) as HasApple,
             max(case when product = 'Banana' then 1 else 0 end) over (partition by invoiceid) as HasBanana
      from salesdetails sd
     ) sd
where (product = 'Apple' and HasBanana > 0) or
      (product = 'Banana' and HasApple > 0);
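Each flag here is a MAX over the whole invoice, so this version does not rely on each product appearing only once. The sketch runs the query in SQLite through Python's sqlite3 module as a stand-in for SQL Server; it assumes the bundled SQLite is 3.25 or later, which window functions require. A hypothetical invoice 4 with two Apple lines and no Banana is added, and it stays out of the result.

```python
import sqlite3

ROWS = [
    (1, 1, "Apple"), (1, 2, "Banana"),
    (2, 1, "Apple"), (2, 2, "Mango"),
    (3, 1, "Apple"), (3, 2, "Banana"), (3, 3, "Mango"),
    (4, 1, "Apple"), (4, 2, "Apple"),  # hypothetical: two Apple lines, no Banana
]

# Same shape as the answer: per-invoice HasApple/HasBanana flags via window MAX,
# then keep Apple lines on invoices with a Banana and vice versa.
WINDOW_SQL = """
SELECT InvoiceID, LineID, Product
FROM (
    SELECT sd.*,
           MAX(CASE WHEN Product = 'Apple'  THEN 1 ELSE 0 END)
               OVER (PARTITION BY InvoiceID) AS HasApple,
           MAX(CASE WHEN Product = 'Banana' THEN 1 ELSE 0 END)
               OVER (PARTITION BY InvoiceID) AS HasBanana
    FROM SalesDetails sd
) sd
WHERE (Product = 'Apple'  AND HasBanana > 0)
   OR (Product = 'Banana' AND HasApple > 0)
ORDER BY InvoiceID, LineID
"""


def run():
    # Window functions need SQLite 3.25+ (see sqlite3.sqlite_version).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesDetails (InvoiceID INT, LineID INT, Product TEXT)")
    conn.executemany("INSERT INTO SalesDetails VALUES (?, ?, ?)", ROWS)
    return conn.execute(WINDOW_SQL).fetchall()


print(run())
```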

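declare @t table (Id int, val int, name varchar(10))
insert into @t (id, val, name) values
(1,1,'Apple'),
(1,2,'Banana'),
(2,1,'Apple'),
(2,2,'Mango'),
(3,1,'Apple'),
(3,2,'Banana'),
(3,3,'Mango')
-- cte: number the lines of each Id
;with cte as (
select ID, val, name, ROW_NUMBER() OVER (PARTITION BY id ORDER BY val) RN from @t)
-- cte2: one RN = 1 row (TOP 1 has no ORDER BY, so which row is not guaranteed)
-- plus every row whose val differs from its Id; name is not tested here
,cte2 AS (
select TOP 1 c.Id, c.val, c.name, c.RN from cte c
WHERE RN = 1
UNION ALL
select c.Id, c.val, c.name, c.RN from cte c
WHERE c.Id <> c.val)
-- keep the Ids that contribute exactly two rows to cte2
select Id, val, name from (
select Id, val, name, COUNT(RN) OVER (PARTITION BY Id) R from cte2) R
WHERE R = 2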

WITH cte
AS
(
SELECT  *
FROM [dbo].[SalesDetails]
WHERE [Product]='banana')
,cte1
AS
(SELECT  *
FROM [dbo].[SalesDetails]
WHERE [Product]='apple')

SELECT *
FROM cte c INNER JOIN cte1 c1
ON  c.[InvoiceID]=c1.[InvoiceID]


If you only want to write the criteria once, and you are sure that each product appears only once in any invoice, you can use the following:

SELECT * FROM (
  SELECT InvoiceID, LineID, Product
        ,COUNT(*) OVER (PARTITION BY InvoiceID) matchcount
  FROM SalesDetails
  WHERE Product IN ('Apple','Banana') ) t   -- SQL Server requires a derived-table alias
WHERE matchcount = 2;

Content provided by Stack Overflow.