Aggregating a column based on the distinct values of another column

4

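This scenario is based on the schema from another question; I'm not interested in discussing the validity of the schema!

I'd like to know whether there are any good techniques in SQL Server for aggregating one column (amount1) over the distinct values of another column (id1).

Plan 1 scans table1 twice, aggregates by p_id twice, and then joins the results. It seems this could be improved. Query 2 can return wrong results in some circumstances, and its plan is worse as well!

Any ideas?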

DDL

IF OBJECT_ID('tempdb..#table1') IS NOT NULL DROP TABLE #table1;
IF OBJECT_ID('tempdb..#table2') IS NOT NULL DROP TABLE #table2;

CREATE TABLE #table1 (id1 int primary key nonclustered, amount1 int, p_id int);
CREATE CLUSTERED INDEX ix ON #table1 (p_id,id1);
INSERT INTO #table1
SELECT 1,500,10 UNION ALL
SELECT 2,700,20 UNION ALL
SELECT 3,500,10 UNION ALL
SELECT 4,450,20 UNION ALL
SELECT 5,300,10;

CREATE TABLE #table2 (id2 int primary key, amount2 int, id1 int);
INSERT INTO #table2
SELECT 1,300,1 UNION ALL
SELECT 2,200,1 UNION ALL
SELECT 3,200,2 UNION ALL
SELECT 4,500,2 UNION ALL
SELECT 5,400,3 UNION ALL
SELECT 6,150,4 UNION ALL
SELECT 7,300,4 UNION ALL
SELECT 8,300,5;

Query 1

WITH t1
     AS (SELECT p_id,SUM(amount1) AS total1
         FROM   #table1
         GROUP  BY p_id),
     t2
     AS (SELECT p_id,SUM(amount2) AS total2
         FROM   #table2 table2
                JOIN #table1 table1
                  ON table1.id1 = table2.id1
         GROUP  BY p_id)
SELECT t1.p_id,total1,total2
FROM   t1
       JOIN t2
         ON t1.p_id = t2.p_id  

Plan 1

Execution plan 1

Query 2

SELECT table1.p_id, 
       FLOOR(SUM(DISTINCT amount1 + table1.id1/100000000.0)) AS total1, 
       SUM(amount2) AS total2
FROM #table1 table1 JOIN #table2 table2 ON table1.id1=table2.id1
GROUP BY table1.p_id

Plan 2

Execution plan 2
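
As an illustration of how Query 2 can go wrong, here is a minimal sketch using hypothetical tables #demo1 and #demo2 (they are not part of the question). It assumes id1 values large enough that the fractions added by id1/100000000.0 sum to 1 or more within a group, so FLOOR no longer strips them cleanly. This is only one possible failure mode, not necessarily the only one.

```sql
-- Hypothetical demo tables; names and data are assumptions for illustration only.
IF OBJECT_ID('tempdb..#demo1') IS NOT NULL DROP TABLE #demo1;
IF OBJECT_ID('tempdb..#demo2') IS NOT NULL DROP TABLE #demo2;

CREATE TABLE #demo1 (id1 int PRIMARY KEY, amount1 int, p_id int);
INSERT INTO #demo1
SELECT 60000000,500,10 UNION ALL
SELECT 70000000,500,10;

CREATE TABLE #demo2 (id2 int PRIMARY KEY, amount2 int, id1 int);
INSERT INTO #demo2
SELECT 1,100,60000000 UNION ALL
SELECT 2,100,60000000 UNION ALL
SELECT 3,100,70000000;

-- Same shape as Query 2. The distinct values are 500.6 and 500.7,
-- their sum is 1001.3, so FLOOR yields total1 = 1001 instead of 1000.
SELECT d1.p_id,
       FLOOR(SUM(DISTINCT d1.amount1 + d1.id1/100000000.0)) AS total1,
       SUM(d2.amount2) AS total2
FROM   #demo1 d1
       JOIN #demo2 d2
         ON d1.id1 = d2.id1
GROUP  BY d1.p_id;
```

With the small ids in the question's sample data the fractions stay far below 1, which is why the problem does not show up there.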

2 Answers

2

This approach references each table only once:

SELECT  p_id, SUM(amount1) AS total1, SUM(s_amount2) AS total2
FROM    #table1 t1
CROSS APPLY
        (
        SELECT  SUM(amount2) AS s_amount2
        FROM    #table2 t2
        WHERE   t2.id1 = t1.id1
        ) t2
GROUP BY
        p_id

  |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1026]=(0) THEN NULL ELSE [Expr1027] END, [Expr1007]=CASE WHEN [Expr1028]=(0) THEN NULL ELSE [Expr1029] END))
       |--Stream Aggregate(GROUP BY:([t1].[p_id]) DEFINE:([Expr1026]=COUNT_BIG([tempdb].[dbo].[#table1].[amount1] as [t1].[amount1]), [Expr1027]=SUM([tempdb].[dbo].[#table1].[amount1] as [t1].[amount1]), [Expr1028]=COUNT_BIG([Expr1005]), [Expr1029]=SUM([Expr1005])))
            |--Nested Loops(Left Outer Join, OUTER REFERENCES:([t1].[id1]))
                 |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#table1] AS [t1]), ORDERED FORWARD)
                 |--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1024]=(0) THEN NULL ELSE [Expr1025] END))
                      |--Stream Aggregate(DEFINE:([Expr1024]=COUNT_BIG([tempdb].[dbo].[#table2].[amount2] as [t2].[amount2]), [Expr1025]=SUM([tempdb].[dbo].[#table2].[amount2] as [t2].[amount2])))
                           |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#table2] AS [t2]), WHERE:([tempdb].[dbo].[#table2].[id1] as [t2].[id1]=[tempdb].[dbo].[#table1].[id1] as [t1].[id1]))

This is not necessarily more efficient, though.
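
Whether it is actually more efficient has to be measured. One possible way, sketched here, is to compare logical reads and CPU time for the candidates with SET STATISTICS IO and SET STATISTICS TIME. The sketch assumes #table1 and #table2 from the question's DDL already exist in the current session; the "Candidate A/B" labels are only for illustration.

```sql
-- Assumes #table1 and #table2 from the question's DDL exist in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate A: Query 1 from the question (two aggregations joined on p_id).
WITH t1
     AS (SELECT p_id, SUM(amount1) AS total1
         FROM   #table1
         GROUP  BY p_id),
     t2
     AS (SELECT p_id, SUM(amount2) AS total2
         FROM   #table2 table2
                JOIN #table1 table1
                  ON table1.id1 = table2.id1
         GROUP  BY p_id)
SELECT t1.p_id, total1, total2
FROM   t1
       JOIN t2
         ON t1.p_id = t2.p_id;

-- Candidate B: the CROSS APPLY version from this answer.
SELECT  p_id, SUM(amount1) AS total1, SUM(s_amount2) AS total2
FROM    #table1 t1
CROSS APPLY
        (
        SELECT  SUM(amount2) AS s_amount2
        FROM    #table2 t2
        WHERE   t2.id1 = t1.id1
        ) t2
GROUP BY
        p_id;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The per-table scan counts and logical reads appear in the messages output. In the plan shown above, the scan of #table2 sits on the inner side of the nested loops join, so it can be executed once per row of #table1, which is worth checking on realistic data volumes.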

This one:

SELECT  p_id, SUM(amount1) AS total1, SUM(s_amount2) AS total2
FROM    #table1 t1
JOIN    (
        SELECT  id1, SUM(amount2) AS s_amount2
        FROM    #table2
        GROUP BY
                id1
        ) t2
ON      t2.id1 = t1.id1
GROUP BY
        p_id

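gives the optimizer more options for performing the same join, but the plan may include an additional spool for t2, depending on which join method is chosen.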
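
Note that the two queries in this answer are not strictly interchangeable. The plan for the CROSS APPLY version shows a Left Outer Join, because the scalar aggregate returns a row (with NULL) even when no #table2 rows match, whereas the derived-table version uses an inner join and drops such #table1 rows. A minimal sketch of a LEFT JOIN variant, assuming rows of #table1 without matches in #table2 should still count toward total1:

```sql
-- Assumes #table1 and #table2 from the question's DDL exist in this session.
-- LEFT JOIN keeps #table1 rows that have no rows in #table2;
-- their s_amount2 is NULL and is ignored by SUM.
SELECT  t1.p_id, SUM(t1.amount1) AS total1, SUM(t2.s_amount2) AS total2
FROM    #table1 t1
LEFT JOIN
        (
        SELECT  id1, SUM(amount2) AS s_amount2
        FROM    #table2
        GROUP BY
                id1
        ) t2
ON      t2.id1 = t1.id1
GROUP BY
        t1.p_id;
```

With the question's sample data every id1 has at least one match in #table2, so the inner-join and outer-join forms return the same rows there.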


+1 That's the plan I was after. I take your point that the efficiency needs to be evaluated. - Martin Smith

2

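Well, @Quassnoi's solution looks very good. Anyway, for SQL Server 2005+ you can try to simplify the query with the PARTITION BY clause. The execution plan is no better, though that doesn't necessarily mean the query is less efficient.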

SELECT A.p_id, MIN(amount1) total1, SUM(amount2) total2
FROM (SELECT p_id, id1, SUM(amount1) OVER(PARTITION BY p_id) amount1 FROM #table1) A
JOIN #table2 B
ON A.id1 = B.id1
GROUP BY A.p_id

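SUM() OVER implies two passes over #table1, which I believe is exactly what the @op was trying to avoid. It may well be better, though. - Quassnoi
@Quassnoi, yes, I know, and I like your solution better. I just wanted to give the @op another option that might be faster. - Lamak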

Page content provided by Stack Overflow.