查找CTE中的无限递归循环

20

我不是SQL专家,但如果有人能帮我解决问题。

我使用递归CTE获取以下值。

Child1 --> Parent 1

Parent1 --> Parent 2

Parent2 --> NULL

如果数据填充错误,那么由于此原因,CTE可能会进入无限递归循环并引发最大递归错误。由于数据量很大,我无法手动检查这个坏数据。请告诉我是否有一种方法可以找出它。

Child1 --> Parent 1

Parent1 --> Child1

或者

Child1 --> Parent 1

Parent1 --> Parent2

Parent2 --> Child1


1
你正在使用哪种数据库管理系统?Postgres?Oracle? - user330315
7个回答

45
使用Postgres很容易通过将所有访问过的节点收集到一个数组中来防止这种情况发生。
设置:
create table hierarchy (id integer, parent_id integer);

insert into hierarchy
values
(1, null), -- root element
(2, 1), -- first child
(3, 1), -- second child
(4, 3), 
(5, 4), 
(3, 5); -- endless loop

递归查询:
with recursive tree as (
  select id, 
         parent_id, 
         array[id] as all_parents
  from hierarchy
  where parent_id is null
  
  union all
  
  select c.id, 
         c.parent_id,
         p.all_parents||c.id
  from hierarchy c
     join tree p
      on c.parent_id = p.id 
     and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
)
select *
from tree;

要同时对多棵树进行此操作,您需要将根节点的ID传递给子节点。
with recursive tree as (
  select id, 
         parent_id, 
         array[id] as all_parents, 
         id as root_id
  from hierarchy
  where parent_id is null
  
  union all
  
  select c.id, 
         c.parent_id,
         p.all_parents||c.id, 
         p.root_id
  from hierarchy c
     join tree p
      on c.parent_id = p.id 
     and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
     and c.root_id = p.root_id
)
select *
from tree;

Postgres 14 更新

Postgres 14 引入了(符合标准的)CYCLE 选项来检测循环:

with recursive tree as (
  select id, 
         parent_id
  from hierarchy
  where parent_id is null

  union all

  select c.id, 
         c.parent_id
  from hierarchy c
     join tree p
      on c.parent_id = p.id 
)
cycle id -- track cycles for this column
   set is_cycle -- adds a boolean column is_cycle
   using path -- adds a column that contains all parents for the id
select *
from tree
where not is_cycle

文档中所述, 这个语法是一个快捷方式,相当于手动添加is_cyclepath数组行。

最简洁的解决方案。 - Adrian Mitev
同意@adrian-mitev的观点:这是优雅的简洁,我认为应该被接受为答案。 - Victoria Stuart
1
@VictoriaStuart:被接受的答案是针对SQL Server的,虽然Interstellar从未确认过,但可以安全地假设她/他正在使用它,因此这个答案可能不适用。 - user330315
鉴于我有多个父层次结构(https://dwbi1.wordpress.com/2017/10/18/hierarchy-with-multiple-parents/),是否有任何选项可以使用类似的解决方案?显然,我不能将“id”添加到“all_parents”,因为这可能仍然是我的多个父层次结构中不同“parent”的有效“id”。 - NeverEndingQueue
@NeverEndingQueue:你需要在检查中包含根节点的ID。请参考我的编辑(未经测试)。 - user330315

10

您没有指定方言或列名,因此很难提供完美的示例...

-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
    DROP TABLE #MyTable

CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2 
(3, 2, 'SubChild')
-- End random data

;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
(
    SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
    UNION ALL
    SELECT R.StartingID, R.Level + 1, 
        R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
        CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
        MT.*
        FROM #MyTable MT
        INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)

SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description 
    FROM RecursiveCTE 
    ORDER BY StartingID, Level

如果/在递归CTE中存在循环,类似这样的内容将显示循环。查看列Loop。按照数据的方式,没有循环。在注释中,有关于如何更改值以引起循环的示例。

最终,递归CTE创建了一个形式为|id1|id2|id3|(称为Parents)的ids的VARCHAR(MAX),然后检查当前ID是否已经在该“列表”中。如果是,则将Loop列设置为1。此列在递归连接中进行检查(ABD R.Loop = 0)。

结束查询使用MAX() OVER (PARTITION BY ...)来为整个“块”的链设置Loop列为1。

稍微复杂一些,可以生成“更好”的报告:

-- Some random data
IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
    DROP TABLE #MyTable

CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
INSERT INTO #MyTable (ID, ParentID, Description) VALUES
(1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
(2, 1, 'Child'), -- Try changing the second value (1) to 2 
(3, 3, 'SubChild')
-- End random data

-- The "terminal" childrens (that are elements that don't have childrens
-- connected to them)
;WITH WithoutChildren AS
(
    SELECT MT1.* FROM #MyTable MT1
        WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
)

, RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
(
    SELECT ID, -- StartingID 
        1, -- Level
        '|' + CAST(ID AS VARCHAR(MAX)) + '|', 
        '|' + CAST(Description AS VARCHAR(MAX)) + '|', 
        0, -- Loop
        ParentID
        FROM WithoutChildren
    UNION ALL
    SELECT R.StartingID, -- StartingID
        R.Level + 1, -- Level
        R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
        R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|', 
        CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
        MT.ParentID
        FROM #MyTable MT
        INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
)

SELECT * FROM RecursiveCTE 
    WHERE ParentID IS NULL OR Loop = 1

这个查询应该返回所有“最后一个子节点”的行,并带有完整的父级链。如果没有循环,则Loop列为0,如果存在循环,则为1


2
如果问题本身存在很大的缺陷,那就不要试图给出完美的答案。我们也会感激你所做出的努力。 - Radu Gheorghiu

6

下面介绍一种检测邻接表中循环(父/子关系)的替代方法,其中节点只能有一个父节点,并可以通过对子列(在下面的表格中为id)强制执行唯一约束来实现。该方法通过递归查询计算邻接表的闭包表来实现。它首先将每个节点作为其自身在第0级的祖先添加到闭包表中,然后迭代地遍历邻接表以扩展闭包表。当新记录的子项和祖先在原始级别零(0)以外的任何级别上相同时,便会检测到循环:

-- For PostgreSQL and MySQL 8 use the Recursive key word in the CTE code:
-- with RECURSIVE cte(ancestor, child, lev, cycle) as (

with cte(ancestor, child, lev, cycle) as (
  select id, id, 0, 0 from Table1
  union all
  select cte.ancestor
       , Table1.id
       , case when cte.ancestor = Table1.id then 0 else cte.lev + 1 end
       , case when cte.ancestor = Table1.id then cte.lev + 1 else 0 end
    from Table1
    join cte
      on cte.child = Table1.PARENT_ID
   where cte.cycle = 0
) -- In oracle uncomment the next line
-- cycle child set isCycle to 'Y' default 'N'
select distinct
       ancestor
     , child
     , lev
     , max(cycle) over (partition by ancestor) cycle
  from cte

给定Table1的以下邻接表:

| parent_id | id |
|-----------|----|
|    (null) |  1 |
|    (null) |  2 |
|         1 |  3 |
|         3 |  4 |
|         1 |  5 |
|         2 |  6 |
|         6 |  7 |
|         7 |  8 |
|         9 | 10 |
|        10 | 11 |
|        11 |  9 |

上述查询在SQL Server(和Oracle、PostgreSQL以及MySQL 8作为指导进行修改时)正确检测到节点9、10和11参与了一个长度为3的循环。

可以在以下链接中找到演示不同数据库中此功能的SQL(/DB)代码片段:


5
你可以使用Knuth描述的相同方法来检测链表中的循环。在一个列中,跟踪子项,子项的子项,子项的子项的子项等。在另一列中,跟踪孙子,孙子的孙子,孙子的孙子的孙子等。
对于初始选择,“Child”和“Grandchild”列之间的距离为1。每次从“union all”进行选择时,“Child”的深度增加1,“Grandchild”的深度增加2。它们之间的距离增加1。
如果存在任何循环,则由于距离每次仅增加1,因此在“Child”进入循环后,距离将在某个点上是循环长度的倍数。当这种情况发生时,“Child”和“Grandchild”列将是相同的。使用它作为额外条件停止递归,并在代码的其余部分中将其检测为错误。
SQL Server示例:
declare @LinkTable table (Parent int, Child int);
insert into @LinkTable values (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (7, 1);

with cte as (
    select lt1.Parent, lt1.Child, lt2.Child as Grandchild
    from @LinkTable lt1
    inner join @LinkTable lt2 on lt2.Parent = lt1.Child
    union all
    select cte.Parent, lt1.Child, lt3.Child as Grandchild
    from cte
    inner join @LinkTable lt1 on lt1.Parent = cte.Child
    inner join @LinkTable lt2 on lt2.Parent = cte.Grandchild
    inner join @LinkTable lt3 on lt3.Parent = lt2.Child
    where cte.Child <> cte.Grandchild
)
select Parent, Child
from cte
where Child = Grandchild;

移除导致循环的LinkTable记录之一,你会发现select不再返回任何数据。


如果循环的情况是孩子节点永远不等于孙子节点,即循环长度大于2个节点,那么这个解决方案还有效吗? - avl_sweden
1
@avl_sweden 是的。已编辑以包含更清晰的解释。 - user743382

2

尝试限制递归结果

WITH EMP_CTE AS
( 

    SELECT 
        0 AS [LEVEL],   
        ManagerId, EmployeeId, Name
    FROM Employees
    WHERE ManagerId IS NULL

    UNION ALL

    SELECT 
        [LEVEL] + 1 AS [LEVEL],
        ManagerId, EmployeeId, Name
    FROM Employees e
    INNER JOIN EMP_CTE c ON e.ManagerId = c.EmployeeId 
 AND s.LEVEL < 100 --RECURSION LIMIT
) 

    SELECT  * FROM EMP_CTE WHERE [Level] = 100

这个解决方案利用了一个“level”计数器,也适用于那些没有“ARRAY”函数的SQL方言(不像Postgres)。例如,对我来说,它可以在#snowflake上工作。 +1 谢谢! - user2148414

1
这里是SQL Server的解决方案:
表插入脚本:
CREATE TABLE MyTable
(
    [ID] INT,
    [ParentID] INT,
    [Name] NVARCHAR(255)
);

INSERT INTO MyTable
(
    [ID],
    [ParentID],
    [Name]
)
VALUES
(1, NULL, 'A root'),
(2, NULL, 'Another root'),
(3, 1, 'Child of 1'),
(4, 3, 'Grandchild of 1'),
(5, 4, 'Great grandchild of 1'),
(6, 1, 'Child of 1'),
(7, 8, 'Child of 8'),
(8, 7, 'Child of 7'), -- This will cause infinite recursion
(9, 1, 'Child of 1');

查找确切的记录是罪魁祸首的脚本:

;WITH RecursiveCTE
AS (
   -- Get all parents: 
   -- Any record in MyTable table could be an Parent
   -- We don't know here yet which record can involve in an infinite recursion.
   SELECT ParentID AS StartID,
          ID,
          CAST(Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
   FROM MyTable
   UNION ALL

   -- Recursively try finding all the childrens of above parents
   -- Keep on finding it until this child become parent of above parent.
   -- This will bring us back in the circle to parent record which is being
   -- keep in the StartID column in recursion
   SELECT RecursiveCTE.StartID,
          t.ID,
          CAST(RecursiveCTE.[ParentChildRelationPath] + ' -> ' + t.Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
   FROM RecursiveCTE
       INNER JOIN MyTable AS t
           ON t.ParentID = RecursiveCTE.ID
   WHERE RecursiveCTE.StartID != RecursiveCTE.ID)

-- FInd the ones which causes the infinite recursion
SELECT StartID,
       [ParentChildRelationPath],
       RecursiveCTE.ID
FROM RecursiveCTE
WHERE StartID = ID
OPTION (MAXRECURSION 0);

以上查询的输出:

enter image description here


0
在你的例子中,看起来你可以只使用UNION而不是UNION ALL来防止无限循环。这里是为什么https://dev59.com/mFUK5IYBdhLWcg3w_z20#77165059。然而,如果返回的数据比仅仅是子节点和父节点的ID更复杂,无限循环仍然可能发生。

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接