在T-SQL中实现嵌套集合层次结构的路径材料化

3

我有一张包含公司会计科目表详细信息的表格 - 这些数据基本上是以嵌套集合的形式存储的(在SQL Server 2014上),每条记录都有左和右锚定点 - 没有父ID。

示例数据:

ID  LeftAnchor  RightAnchor  Name     
 1           0           25  Root     
 2           1           16  Group 1  
 3           2            9  Group 1.1
 4           3            4  Account 1
 5           5            6  Account 2
 6           7            8  Account 3
 7          10           15  Group 1.2
 8          11           12  Account 4
 9          13           14  Account 5
10          17           24  Group 2  
11          18           23  Group 2.1
12          19           20  Account 1
13          21           22  Account 1

我需要为每个记录实现路径的具体化,以便我的输出看起来像这样:
ID  LeftAnchor  RightAnchor  Name       MaterializedPath
 1           0           25  Root       Root
 2           1           16  Group 1    Root > Group 1
 3           2            9  Group 1.1  Root > Group 1 > Group 1.1
 4           3            4  Account 1  Root > Group 1 > Group 1.1 > Account 1
 5           5            6  Account 2  Root > Group 1 > Group 1.1 > Account 2
 6           7            8  Account 3  Root > Group 1 > Group 1.1 > Account 3
 7          10           15  Group 1.2  Root > Group 1 > Group 1.2
 8          11           12  Account 4  Root > Group 1 > Group 1.2 > Acount 4
 9          13           14  Account 5  Root > Group 1 > Group 1.2 > Account 5
10          17           24  Group 2    Root > Group 2
11          18           23  Group 2.1  Root > Group 2 > Group 2.1
12          19           20  Account 1  Root > Group 2 > Group 2.1 > Account 10
13          21           22  Account 1  Root > Group 2 > Group 2.1 > Account 11

虽然我已经使用CTE实现了这一点,但查询速度非常慢。当输出大约有1200条记录时,运行时间接近两分钟。

以下是我代码的简化版本:

;with accounts as
(
    -- Chart of Accounts
    select AccountId, LeftAnchor, RightAnchor, Name
    from ChartOfAccounts
    -- dirty great where clause snipped
)
, parents as
(
    -- Work out the Parent Nodes
    select c.AccountId, p.AccountId [ParentId]
    from accounts c
    left join accounts p on (p.LeftAnchor = (
        select max(i.LeftAnchor)
        from accounts i
        where i.LeftAnchor<c.LeftAnchor
        and i.RightAnchor>c.RightAnchor
    ))
)
, path as
(
    -- Calculate the Account path for each node

    -- Root Node
    select c.AccountId, c.LeftAnchor, c.RightAnchor, 0 [Level], convert(varchar(max), c.name) [MaterializedPath]
    from accounts c
    where c.LeftAnchor = (select min(LeftAnchor) from chart)

    union all

    -- Children
    select n.AccountId, n.LeftAnchor, n.RightAnchor, p.level+1, p.path + ' > ' + n.name
    from accounts n
    inner join parents x on (n.AccountId=x.AccountId)
    inner join path p on (x.ParentId=p.AccountId)
)
select * from path order by LeftAnchor

理想情况下,这个查询只需要几秒钟(最多)就可以运行。由于数据库连接是只读的,我无法对数据库本身进行任何更改,所以有没有人能想出更好的方法来编写这个查询?
3个回答

3

在你的评论之后,我意识到不需要使用CTE了...因为已经有了范围键。

例如

Select A.*
      ,Path = Replace(Path,'&gt;','>')
 From YourTable A
 Cross Apply (
                Select Path = Stuff((Select ' > ' +Name 
                                      From (
                                            Select LeftAnchor,Name
                                             From  YourTable
                                             Where A.LeftAnchor between LeftAnchor and RightAnchor 
                                           ) B1
                                      Order By LeftAnchor
                                      For XML Path (''))
                                    ,1,6,'')
             ) B
 Order By LeftAnchor

返回值

enter image description here


1
优秀 - 使用XML路径将执行时间从近2分钟降至300毫秒。这就是我所说的结果! - Pete
我在我的数据上运行了这个脚本(mssql 2014),但出现了一些奇怪的问题,根文件夹的输出路径像“root > root > root > root”(在此示例中有4个级别的深度)。 - barbara.post
@barbara.post,您难道没有明确的父/子关系吗? - John Cappelletti
JohnCappelletti,不好意思,我不确定我理解你的问题。除了LeftAnchor和RighAnchor之外,我还有一个ParentId。我刚刚尝试了@BogdanSahlean下面的建议,它完全可以。它可能会慢一些,但似乎很方便。 - barbara.post

0

对我来说似乎很奇怪你没有父ID,但是借助于初始的OUTER APPLY,我们可以生成一个父ID,然后运行标准的递归CTE。

示例

Declare @Top int = null  --<<  Sets top of Hier Try 12 (Just for Fun)

;with cte0 as (
                Select A.* 
                      ,B.*
                 From  YourTable A
                 Outer Apply ( 
                                Select Top 1 Pt=ID 
                                 From  YourTable 
                                 Where A.LeftAnchor between LeftAnchor and RightAnchor and LeftAnchor<A.LeftAnchor 
                                 Order By LeftAnchor Desc
                              ) B
              ) 
     ,cteP as (
                  Select ID
                        ,Pt
                        ,LeftAnchor
                        ,RightAnchor
                        ,Lvl=1
                        ,Name 
                        ,Path = cast(Name as varchar(max))
                  From   cte0
                  Where  IsNull(@Top,-1) = case when @Top is null then isnull(Pt ,-1) else ID end
                  Union  All
                  Select r.ID
                        ,r.Pt
                        ,r.LeftAnchor
                        ,r.RightAnchor
                        ,p.Lvl+1
                        ,r.Name 
                        ,cast(p.path + ' > '+r.Name as varchar(max))
                  From   cte0 r
                  Join   cteP p on r.Pt  = p.ID
              )
Select *
 From  cteP
 Order By LeftAnchor

返回

enter image description here


哎呀,数据结构是由应用程序供应商定义的,由于某种原因,他们似乎不关心邻接表。谢谢你的SQL - 我会试一下,看看性能如何。 - Pete
@Pete 递归CTE非常好,但是在较大的层次结构中性能可能会受到影响。例如,使用临时表方法可以在4秒内构建一个200K点层次结构,而递归方法可能需要数分钟。 - John Cappelletti
是的 - 我看到了。查询将在几秒钟内确定父ID,关键是递归CTE会实现路径。不幸的是,使用Outer Apply生成Parent ID的方法对总体执行时间没有产生影响。 - Pete

0

首先,您可以尝试重新排列准备好的CTE(账户和父级),使得每个CTE都包含前一个CTE中的所有数据,因此您只需要在路径CTE中使用最后一个CTE - 无需多次连接:

;with accounts as
(
    -- Chart of Accounts
    select AccountId, LeftAnchor, RightAnchor, Name
    from ChartOfAccounts
    -- dirty great where clause snipped
)
, parents as
(
    -- Work out the Parent Nodes
    select c.*, p.AccountId [ParentId]
    from accounts c
    left join accounts p on (p.LeftAnchor = (
        select max(i.LeftAnchor)
        from accounts i
        where i.LeftAnchor<c.LeftAnchor
        and i.RightAnchor>c.RightAnchor
    ))
)
, path as
(
    -- Calculate the Account path for each node

    -- Root Node
    select c.AccountId, c.LeftAnchor, c.RightAnchor, 0 [Level], convert(varchar(max), c.name) [MaterializedPath]
    from parents c
    where c.ParentID IS NULL

    union all

    -- Children
    select n.AccountId, n.LeftAnchor, n.RightAnchor, p.level+1, p.[MaterializedPath] + ' > ' + n.name
    from parents n
    inner join path p on (n.ParentId=p.AccountId)
)
select * from path order by LeftAnchor

这应该会有所改善(在我的测试中达到了50%),但要真正让它变得更好,您可以将准备数据的前半部分拆分到 #temp 表中,在 #temp 表的 ParentID 列上放置聚集索引,并在第二部分中使用它。

if (Object_ID('tempdb..#tmp') IS NOT NULL) DROP TABLE #tmp;
with accounts as
(
    -- Chart of Accounts
    select AccountId, LeftAnchor, RightAnchor, Name
    from ChartOfAccounts
    -- dirty great where clause snipped
)
, parents as
(
    -- Work out the Parent Nodes
    select c.*, p.AccountId [ParentId]
    from accounts c
    left join accounts p on (p.LeftAnchor = (
        select max(i.LeftAnchor)
        from accounts i
        where i.LeftAnchor<c.LeftAnchor
        and i.RightAnchor>c.RightAnchor
    ))
)
select * into #tmp
from parents;

CREATE CLUSTERED INDEX IX_tmp1 ON #tmp (ParentID);

With path as
(
    -- Calculate the Account path for each node

    -- Root Node
    select c.AccountId, c.LeftAnchor, c.RightAnchor, 0 [Level], convert(varchar(max), c.name) [MaterializedPath]
    from #tmp c
    where c.ParentID IS NULL

    union all

    -- Children
    select n.AccountId, n.LeftAnchor, n.RightAnchor, p.level+1, p.[MaterializedPath] + ' > ' + n.name
    from #tmp n
    inner join path p on (n.ParentId=p.AccountId)
)
select * from path order by LeftAnchor

很难在小样本数据上确定,但应该会有所改善。如果您尝试了,请告诉我。


谢谢您的回答,但不幸的是我只有一个只读连接,所以无法创建临时表或索引。虽然我可以在代码中声明@temp表,但性能问题似乎更多地涉及在这个大小的数据集上使用递归CTE。 - Pete
1
@Pete 你确定吗?创建临时表不需要特殊权限。任何能够连接到数据库的人都应该可以做到。 - Nenad Zivkovic
@Pete 添加了第二个答案。 - John Cappelletti

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接