根据相关节点选择层次结构中的行

3

我有一张自引用的表格 Foo

[Id] int NOT NULL,
[ParentId] int NULL,   --Foreign key to [Id]
[Type] char(1) NOT NULL

[Id]是集群化主键,[ParentId][Type]列上有索引。

假设层级的最大深度为1(子节点不能有子节点)。


我想要获取所有满足以下条件的Foo行:

  • Type为A
  • 其家族树中有B
  • 其家族树中有CD

下面使用JOIN的查询返回所需结果,但性能非常差。

SELECT DISTINCT [Main].*

FROM Foo AS [Main]

--[Main] may not be root node
LEFT OUTER JOIN Foo AS [Parent]
    ON [Parent].[Id] = [Main].[ParentId]

--Must have a B in tree
INNER JOIN Foo AS [NodeB]
    ON (
        [NodeB].[Pid] = [Main].[Pid]            --Sibling
            OR [NodeB].[ParentId] = [Main].[Id] --Child
            OR [NodeB].[Id] = [Parent].[Id]     --Parent
    )
        AND [NodeB].[Type] = 'B'

--Must have a C or D in tree
INNER JOIN Foo AS [NodeCD]
    ON (
        [NodeCD].[Pid] = [Main].[Pid]            --Sibling
            OR [NodeCD].[ParentId] = [Main].[Id] --Child
            OR [NodeCD].[Id] = [Parent].[Id]     --Parent
    )
        AND [NodeCD].[Type] IN ('C', 'D')

WHERE [Main].[Type] = 'A'

从实际执行计划来看,仅查看了65万行中的前1万行。 查询执行计划


如果我从查询中删除--Parent行。

OR [NodeB].[Id] = [Parent].[Id]  --Parent
OR [NodeCD].[Id] = [Parent].[Id] --Parent

那么执行就几乎瞬间完成了,但它错过了A是一个仅有一个兄弟的子节点的情况。

Misses this:    Catches this:
B               B
├A              ├A
└C              ├B
                └C

我尝试使用CTE来实现这个功能,因为从性能方面来看它似乎更有前途,但我无法想出如何排除不符合条件的树。

目前的CTE:

WITH [Parent] AS 
(
SELECT  *
FROM    [Foo]
WHERE   [ParentId] IS NULL

UNION ALL
SELECT  [Child].*
FROM    Foo AS [Child]
JOIN    [Parent]
ON      [Child].[ParentId] = [Parent].Id
WHERE   [Child].[Type] = 'P'

UNION ALL
SELECT  [ChildCD].*
FROM    Foo AS [ChildCD]
JOIN    [Parent]
ON      [ChildCD].[ParentId] = [Parent].Id
WHERE   [ChildCD].[Type] IN ('C', 'D')
)

SELECT  *
FROM [Parent]
WHERE [Type] = 'I';

然而,如果我尝试添加兄弟-子代-父代OR语句,我会达到100次递归的最大限制。


带有测试数据的SQL Fiddle


根据所需的结果(在sqlfiddle上),是的,它确实可以。 - Lamak
6个回答

4
像这样的东西怎么样?
select
    [F].[Id]
from
    [Foo] [F]
where
    [F].[Type] = 'A' and
    (
        (
            [F].[ParentId] is null and
            exists (select 1 from [Foo] [Child] where [F].[Id] = [Child].[ParentId] and [Child].[Type] = 'B') and
            exists (select 1 from [Foo] [Child] where [F].[Id] = [Child].[ParentId] and [Child].[Type] in ('C', 'D'))
        ) or
        (
            [F].[ParentId] is not null and
            exists (select 1 from [Foo] [ParentOrSibling] where [F].[ParentId] in ([ParentOrSibling].[Id], [ParentOrSibling].[ParentId]) and [ParentOrSibling].[Type] = 'B') and
            exists (select 1 from [Foo] [ParentOrSibling] where [F].[ParentId] in ([ParentOrSibling].[Id], [ParentOrSibling].[ParentId]) and [ParentOrSibling].[Type] in ('C', 'D'))
        )
    );

哇!这个运行瞬间完成了! - Aaroninus

4

如果要考察的节点是根节点,则与其作为子节点时有很大不同,因此最好分别查询这两种情况并形成两个集合的UNION ALL。但是,你可以使用一个公共表达式来简化这个过程,并确定包含所需节点的树形结构。总体上看起来可能像这样:

WITH [TargetFamilies] AS (
    SELECT
      COALESCE(ParentId, Id) AS FamilyId
    FROM Foo
    GROUP BY COALESCE(ParentId, Id)
    HAVING 
      COUNT(CASE Type WHEN 'B' THEN 1 END) > 0
      AND COUNT(CASE Type WHEN 'C' THEN 1 WHEN 'D' THEN 1 END) > 0
)

-- root nodes
SELECT [Main].*
FROM
  Foo AS [Main]
  JOIN [TargetFamilies] ON [Main].Id = [TargetFamilies].FamilyId
WHERE
  [Main].Type = 'A'

UNION ALL

-- child nodes
SELECT 
  [Main].*
FROM
  Foo AS [Main]
  JOIN [TargetFamilies] ON [Main].ParentId = [TargetFamilies].FamilyId
WHERE
  [Main].Type = 'A'

这是非常快的!它在1.5秒内完成了我整个650,000行测试表。 - Aaroninus
1
非常聪明!合并ID以获取家庭ID是我无法实现的部分。 - Paul Griffin

4
我无法预测效率,但这里有另一种解决方案:
SELECT *
FROM Foo AS f
WHERE Type = 'A'
  AND ParentId IS NULL
  AND EXISTS 
      ( SELECT *
        FROM Foo AS ch
        WHERE ch.ParentId = f.Id
          AND ch.Type = 'B'
      )
  AND EXISTS 
      ( SELECT *
        FROM Foo AS ch
        WHERE ch.ParentId = f.Id
          AND ch.Type IN ('C', 'D')
      ) 
UNION ALL

SELECT *
FROM Foo AS f
WHERE Type = 'A'
  AND ParentId IS NOT NULL
  AND EXISTS
    ( SELECT 1
      FROM
          ( SELECT *
            FROM (VALUES ('B'), ('C'), ('D')) AS x (Type)
          EXCEPT
            SELECT p.Type
            FROM Foo AS p
            WHERE f.ParentId = p.Id
          EXCEPT
            SELECT sib.Type
            FROM Foo AS sib
            WHERE f.ParentId = sib.ParentId
          ) AS x
       HAVING MIN(Type) = MAX(Type) AND MIN(Type) <> 'B'
           OR MIN(Type) IS NULL
    ) ; 

SQLfiddle中进行了测试。


效率非常出色。在我的650K测试表上只花了2秒钟。 - Aaroninus

1
这组数据很难进行优化,但也许可以尝试一下。LEFT OUTER JOIN 似乎是多余的。此外,执行计划在内部循环中没有显示出96%的命中率。
SELECT DISTINCT [Main].*
FROM Foo AS [Main]


--Must have a B in tree
INNER JOIN Foo AS [NodeB]
    ON (
        [NodeB].[ParentId] = [Main].[ParentId]            --Sibling
            OR [NodeB].[ParentId] = [Main].[Id] --Child
            OR [NodeB].[Id] = [Main].[ParentId]     --Parent
    )
        AND [NodeB].[Type] = 'B'

--Must have a C or D in tree
INNER JOIN Foo AS [NodeCD]
    ON (
        [NodeCD].[ParentId] = [Main].[ParentId]            --Sibling
            OR [NodeCD].[ParentId] = [Main].[Id] --Child
            OR [NodeCD].[Id] = [Main].[ParentId]     --Parent
    )
        AND [NodeCD].[Type] IN ('C', 'D')

WHERE [Main].[Type] = 'A'

请发布您的结果。希望这能有所帮助。

嗯,根据 sqlfiddle 的说法,这确实运行正常。+1 - Lamak
不幸的是,这实际上性能更差。执行时间从1:20变成了13:20,实际行数(如问题中的图片)从约2亿增加到约30亿。 - Aaroninus
我应该澄清一下,这是针对在650k行表格上运行10k行测试,而不是在fiddle上运行的。 - Aaroninus

1

天啊,这比我想象的要花费更长时间,肯定有更好的方法,但这是我的看法:

WITH CTE AS
(
    SELECT Id, ParentId FamilyId, [Type]
    FROM dbo.Foo
    UNION
    SELECT A.Id, B.Id, B.[Type]
    FROM dbo.Foo A
    INNER JOIN dbo.Foo B
        ON A.ParentId = B.Id
)
SELECT DISTINCT B.Id
FROM CTE A
INNER JOIN dbo.Foo B
    ON A.Id = B.Id
    OR A.FamilyId = B.Id
WHERE B.[Type] = 'A'
AND EXISTS( SELECT 1 FROM CTE
            WHERE FamilyId = A.FamilyId
            AND [Type] = 'B')
AND EXISTS( SELECT 1 FROM CTE
            WHERE FamilyId = A.FamilyId
            AND [Type] IN ('C','D'));

这里是修改后的 sqlfiddle。


0

使用递归CTE。这对于任何多级层次结构都可以工作:

DECLARE @t TABLE
    (
      ID INT ,
      ParentID INT ,
      Type CHAR(1)
    )

INSERT  INTO @t
VALUES  ( 1, NULL, 'A' ),
        ( 2, 1, 'B' ),
        ( 3, NULL, 'C' ),
        ( 4, NULL, 'A' ),
        ( 5, 4, 'B' ),
        ( 6, 4, 'C' ),
        ( 7, NULL, 'A' ),
        ( 8, 7, 'B' ),
        ( 9, 8, 'D' ),
        ( 10, NULL, 'D' ),
        ( 11, 10, 'A' ),
        ( 12, 11, 'B' ),
        ( 13, 8, 'D' );

WITH    cte1
          AS ( SELECT   ID ,
                        ParentID ,
                        Type ,
                        ID AS GroupID ,
                        0 AS B ,
                        0 AS CD
               FROM     @t
               WHERE    Type = 'A'
               UNION ALL
               SELECT   t.ID ,
                        t.ParentID ,
                        t.Type ,
                        c.GroupID ,
                        CASE WHEN t.Type = 'B' THEN 1
                             ELSE 0
                        END ,
                        CASE WHEN t.Type IN ( 'C', 'D' ) THEN 1
                             ELSE 0
                        END
               FROM     @t t
                        JOIN cte1 c ON t.ParentID = c.ID
             ),
        cte2
          AS ( SELECT   ID ,
                        ParentID ,
                        Type ,
                        ID AS GroupID ,
                        0 AS B ,
                        0 AS CD
               FROM     @t
               WHERE    Type = 'A'
               UNION ALL
               SELECT   t.ID ,
                        t.ParentID ,
                        t.Type ,
                        c.GroupID ,
                        CASE WHEN t.Type = 'B' THEN 1
                             ELSE 0
                        END ,
                        CASE WHEN t.Type IN ( 'C', 'D' ) THEN 1
                             ELSE 0
                        END
               FROM     @t t
                        JOIN cte2 c ON t.ID = c.ParentID
             ),
        filter
          AS ( SELECT   ID ,
                        Type ,
                        SUM(B) OVER ( PARTITION BY GroupID ) AS B ,
                        SUM(CD) OVER ( PARTITION BY GroupID ) AS CD
               FROM     ( SELECT    *
                          FROM      cte1
                          UNION
                          SELECT    *
                          FROM      cte2
                        ) t
             )
    SELECT  t.*
    FROM    filter f
            JOIN @t t ON t.ID = f.ID
    WHERE   f.Type = 'A'
            AND B > 0
            AND cd > 0

输出:

ID  ParentID    Type
4   NULL        A
7   NULL        A
11  10          A

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接