使用递归SQL(CTE)导致性能变慢

3

非常抱歉,这是一个有点长的问题,但没有简单的表达方式。

我有以下查询:

SELECT 
    S.*
FROM 
    Stock S
LEFT JOIN 
    Stock_Category SC ON SC.StockId = S.Id
WHERE 
    S.Published = 1 
    AND (@CategoryId IS NULL OR 
         (SELECT COUNT(*) 
          FROM GetParentCategoriesByCategoryId(SC.CategoryId) 
          WHERE Id = @CategoryId) > 0) 

GetParentCategoriesByCategoryId() 函数中,我使用了以下通用表达式 (CTE):

DECLARE @TableOutput TABLE(Id UNIQUEIDENTIFIER, 
                            PosDissectionId INT,
                            PosFamilyClassId INT,
                            ParentId UNIQUEIDENTIFIER,
                            Code NVARCHAR(25),
                            [Name] NVARCHAR(100),
                            Description NVARCHAR(1000),
                            AzureId UNIQUEIDENTIFIER,
                            Extension NVARCHAR(10),
                            Visible BIT,
                            OrderIndex INT,
                            StockCount INT,
                            Depth INT)
BEGIN
    DECLARE @TotalVisible INT,
            @TotalRows INT

    ;WITH CategoryStructure (Id, ParentId, ParentName, Name, Depth, Visible)
    As 
    ( 
        SELECT 
            C.Id, 
            C.ParentId, 
            CAST('' AS NVARCHAR(500)) AS ParentName, 
            C.Name, 
            0 AS Depth, 
            C.Visible
        FROM 
            Category C
        WHERE 
            Id = @LocalCategoryId

        UNION ALL

        SELECT 
            ParentCategory.Id, 
            ParentCategory.ParentId, 
            CategoryStructure.Name AS ParentName, 
            ParentCategory.Name, 
            CategoryStructure.Depth + 1,
            ParentCategory.Visible
        FROM 
            Category ParentCategory
        INNER JOIN 
            CategoryStructure ON ParentCategory.Id = CategoryStructure.ParentId
    )
    INSERT INTO @TableOutput
        SELECT          
            C.*,
            SC.StockCount,
            CS.Depth
        FROM 
            CategoryStructure CS 
        INNER JOIN 
            Category C ON  C.Id = CS.Id
        LEFT JOIN 
            (SELECT CategoryId, COUNT(*) AS StockCount 
             FROM Stock_Category SC
             INNER JOIN Stock S ON S.Id = SC.StockId
             WHERE S.Published = 1 AND 
                 ((S.WidthMM IS NOT NULL AND 
                   S.HeightMM IS NOT NULL AND 
                   S.DepthMM IS NOT NULL AND
                    S.WeightG IS NOT NULL)) AND
                CategoryId IN(SELECT CategoryId FROM CategoryStructure)
        GROUP BY CategoryId

    ) SC ON SC.CategoryId = CS.Id

    WHERE (@IncludeSelf = 1 OR CS.Id != @CategoryId) 

    SELECT 
        @TotalVisible = SUM(CONVERT(INT, Visible)),
        @TotalRows = COUNT(*) 
    FROM @TableOutput

    IF @TotalVisible <> @TotalRows
        DELETE FROM @TableOutput    

    RETURN
END

我的查询执行计划看起来是这样的。
不幸的是,我对于2000行查询需要超过7秒的时间。我相信我已经添加了正确的索引(并且似乎显示查询正在使用它们)。
我已经能够将问题缩小到CTE中的LEFT JOIN。
   SELECT CategoryId, COUNT(*) AS StockCount 
   FROM Stock_Category SC
   INNER JOIN Stock S ON S.Id = SC.StockId
   WHERE S.Published = 1 AND blah blah blah....

因为当我删除它时,性能显著提高,但这是我目前能推断出的全部。
我并不期望得到解决方案,因为我知道这基于许多因素,但我离SQL专家还很远,希望有人能提供任何关于我需要查找什么的指导?
表模式可以在此处找到:https://www.dropbox.com/s/tpetq6fky58fhti/schemas.sql?dl=0

1
你能提供涉及表的样本数据和架构吗? - TheGameiswar
1
请使用Paste The Plan @ brentozar.com分享您的执行计划,以下是使用说明:如何使用Paste the Plan - SqlZim
3
你可能不想使用 (SELECT COUNT(*) FROM GetParentCategoriesByCategoryId(SC.CategoryId) WHERE Id = @CategoryId) > 0),而是使用 EXISTS (SELECT 1 from GetParentCategoriesByCategoryId(SC.CategoryId) WHERE Id = @CategoryId。此外,如果可以的话,最好消除该函数调用,因为递归CTE不太快。 - SQLMason
2
你的多语句表值函数在这里起到了反效果,我会将其重写为内联表值函数。"何时一个SQL函数不是函数?如果它不是内联的,那就是垃圾。" - Rob Farley - SqlZim
3
你能否在不为每个单独的类别ID调用多语句TVF的情况下,提前将整个层次结构实体化以获得满足你条件的所有类别ID的结果?然后使用类似于 SELECT S.* FROM Stock S LEFT JOIN Stock_Category SC ON SC.StockId = S.Id WHERE S.Published = 1 AND (@CategoryId IS NULL OR SC.CategoryId IN (SELECT CategoryId FROM #Cats)) OPTION (RECOMPILE) 的语句进行查询。 - Martin Smith
显示剩余8条评论
3个回答

3

如果有人感兴趣,最终的解决方案包括重新制作索引,利用上面评论中的一些建议,重要的是删除了临时表。

最终,我成功将查询时间减少到不到1秒,这也是目标。但我对Group By不是很确定,想知道是否有更好的方法?还有其他人有什么进一步的改进吗?

    WITH categorystructure (id, parentid, parentname, NAME, depth, visible) 
     AS (SELECT C.id, 
                C.parentid, 
                Cast('' AS NVARCHAR(500)) AS ParentName, 
                C.NAME, 
                0                         AS Depth, 
                C.visible 
         FROM   category C 
         WHERE  id = @CategoryId 
         UNION ALL 
         SELECT ParentCategory.id, 
                ParentCategory.parentid, 
                categorystructure.NAME      AS ParentName, 
                ParentCategory.NAME, 
                categorystructure.depth + 1 AS Depth, 
                ParentCategory.visible 
         FROM   category ParentCategory 
                INNER JOIN categorystructure 
                        ON ParentCategory.id = categorystructure.parentid) 
SELECT C.*, 
       Isnull(SC.stockcount, 0) AS StockCount, 
       CS.depth 
FROM   categorystructure CS 
       INNER JOIN category C 
               ON C.id = CS.id 
       LEFT JOIN (SELECT categoryid, 
                         Count(*) AS StockCount 
                  FROM   stock_category SC 
                         INNER JOIN stock S 
                                 ON S.id = SC.stockid 
                  WHERE  S.published = 1 
                         AND ( @AustPostShippingEnabled = 0 
                                OR ( S.widthmm IS NOT NULL 
                                     AND S.heightmm IS NOT NULL 
                                     AND S.depthmm IS NOT NULL 
                                     AND S.weightg IS NOT NULL ) ) 
                  GROUP  BY categoryid) SC 
              ON SC.categoryid = CS.id 
WHERE  ( @IncludeSelf = 1 
          OR CS.id != @CategoryId ) 
GROUP  BY C.id, 
          C.posdissectionid, 
          C.posfamilyclassid, 
          C.parentid, 
          C.code, 
          C.NAME, 
          C.description, 
          C.azureid, 
          C.extension, 
          C.visible, 
          C.orderindex, 
          SC.stockcount, 
          CS.depth 
HAVING Sum(CONVERT(INT, CS.visible)) = Count(*) 

HAVING Sum(CONVERT(INT, CS.visible)) = Count(*) 等同于 WHERE CS.visible = cast(1 as bit),只是这么说而已:) 这将让您摆脱分组。 - Anand
@Anand 感谢您的建议,但我不认为您是正确的。WHERE CS.visible = cast(1 as bit) 如何实现与 Sum(CONVERT(INT, CS.visible)) = Count(*) 相同的功能?您确定您理解意图了吗? - Maxim Gershkovich
如果您将布尔值转换为整数并对其进行求和,然后将其与记录数进行比较,只有当所有行都为真时它们才会匹配。这不难理解;另外,如果您从上面的建议中获得了有用的答案,请给它们点赞。谢谢 :) - Anand
你使用什么格式化程序?格式非常好。这不是手动的,对吧? - Paul-Sebastian Manole

1

我已经做出了两个更改:

1)使函数内联(HAVING子句)

2)使用outer apply替换LEFT JOIN。

WITH CategoryStructure (Id, ParentId, ParentName, Name, Depth, Visible)
As 
( 
    SELECT 
        C.Id, 
        C.ParentId, 
        CAST('' AS NVARCHAR(500)) AS ParentName, 
        C.Name, 
        0 AS Depth, 
        C.Visible
    FROM 
        Category C
    WHERE 
        Id = @LocalCategoryId

    UNION ALL

    SELECT 
        ParentCategory.Id, 
        ParentCategory.ParentId, 
        CategoryStructure.Name AS ParentName, 
        ParentCategory.Name, 
        CategoryStructure.Depth + 1,
        ParentCategory.Visible
    FROM 
        Category ParentCategory
    INNER JOIN 
        CategoryStructure ON ParentCategory.Id = CategoryStructure.ParentId
)
INSERT INTO @TableOutput
    SELECT          
        C.*,
        SC.StockCount,
        CS.Depth
    FROM 
        CategoryStructure CS 
    INNER JOIN 
        Category C ON  C.Id = CS.Id
    OUTER APPLY
        (SELECT CategoryId, COUNT(*) AS StockCount 
         FROM Stock_Category SC
         INNER JOIN Stock S ON S.Id = SC.StockId
         WHERE S.Published = 1 AND 
             ((S.WidthMM IS NOT NULL AND 
               S.HeightMM IS NOT NULL AND 
               S.DepthMM IS NOT NULL AND
                S.WeightG IS NOT NULL)) AND
            CategoryId = CS.Id
        ) SC

WHERE (@IncludeSelf = 1 OR CS.Id != @CategoryId) 
HAVING SUM(CONVERT(INT, Visible)) = COUNT(*)

P.S. 第一个查询看起来很奇怪(你有@CategoryId参数,但没有用它来查找。你先构建所有可能的树,然后再进行过滤)。我认为你的算法有误,是否可以编写GetParentCategoriesByCategoryId(@CategoryId)


-1

两件事:

将in子句更改为内连接:
SELECT CategoryId, COUNT(*) AS StockCount 
FROM Stock_Category SC
INNER JOIN Stock S ON S.Id = SC.StockId
WHERE S.Published = 1 AND 
((S.WidthMM IS NOT NULL AND 
S.HeightMM IS NOT NULL AND 
S.DepthMM IS NOT NULL AND
S.WeightG IS NOT NULL)) AND
CategoryId IN(SELECT CategoryId FROM CategoryStructure)
GROUP BY CategoryId

到 -

SELECT CategoryId, COUNT(*) AS StockCount 
FROM Stock_Category SC
INNER JOIN Stock S ON S.Id = SC.StockId
inner join CategoryStructure as CS
on CS.CategoryId = SC.CategoryId
WHERE S.Published = 1 AND 
((S.WidthMM IS NOT NULL AND 
S.HeightMM IS NOT NULL AND 
S.DepthMM IS NOT NULL AND
S.WeightG IS NOT NULL)) AND
GROUP BY CategoryId
  1. 你的查询主要花费时间在 IX_StockAllColumns 的索引查找上。如果它确实是所有列的非聚集索引,请创建一个新的非聚集索引,包括 Published、WidthMM、HeightMM、DepthMM 和 WeightG 列。

我认为你并没有完全理解原始查询、问题、模式或表值函数。 - SqlZim
INNER JOIN替换IN子句不会影响查询计划。 - Mikhail Lobanov

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接