PostgreSQL递归CTE向函数传递数据

3
我有以下问题:我正在尝试发现从源节点(node_s)到目标节点(node_t)的所有可能路径。
原始图边表的格式很简单:| node_x | node_y | strength |,其中“node_x” ->“node_y”是具有权重“weight”的直接边。
思路是,如果在探索路径的任何点上,我们发现其子节点中有目标node_t,则记录此路径并停止从此节点探索路径,否则继续探索。
简单的解决方案是使用PostgreSQL的递归CTE构建图的传递闭包:
  WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
  ( 
   --non-recurive term
   SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
   FROM best_path b
   WHERE b.node_x = node_s --source_node

   UNION

   --recurive term
   SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
   FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
   WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
  )
  SELECT * 
  FROM transitive_closure tc
  WHERE tc.target = node_t --target_node

上述代码将找出源节点node_s到所有可能路径。只有在传递闭包构建完成后,我才能从源节点选择到目标节点的所需路径行(请参见最后一个SELECT语句)。
示例:
best_path表具有以下数据:
 node_x | node_y | strength 
--------+--------+----------
      1 |      2 |        1
      1 |      3 |        1
      2 |      4 |        1
      2 |      5 |        1
      3 |      6 |        1
      3 |      7 |        1
      4 |      8 |        1
      4 |      9 |        1
      5 |     10 |        1
      5 |     11 |        1

查询:

查找从源节点=1到目标节点=4的路径

结果:

 source | target | strength | path_string
--------+--------+----------+------------
      1 |      2 |        1 | 1.2.
      1 |      3 |        1 | 1.3.
      1 |      4 |        1 | 1.2.4.
      1 |      5 |        1 | 1.2.5.
      1 |      6 |        1 | 1.3.6.
      1 |      7 |        1 | 1.3.7.
      1 |      8 |        1 | 1.2.4.8.
      1 |      9 |        1 | 1.2.4.9.
      1 |     10 |        1 | 1.2.5.10.
      1 |     11 |        1 | 1.2.5.11.

这不是我需要的。由于节点2到节点4(目标)已经有直接边了,所以我不需要路径1.2.5、1.2.4.8、1.2.4.9、1.2.5.10和1.2.5.11,当发现从2到4的路径时,对节点2的路径探索应该停止。
总之,如果节点已经有直接边连接目标节点,我不想发现该节点的路径。这意味着在CTE的递归术语中,我希望有一些条件来表示以下内容,伪代码如下:
  WITH RECURSIVE transitive_closure (source, target, weight, path_string) AS
  ( 
   --non-recurive term (as before)
   SELECT b.node_x, b.node_y, b.strength, b.node_x || '.' || b.node_y || '.' AS path_string
   FROM best_path b
   WHERE b.node_x = node_s --source_node

   UNION

   --recurive term
   SELECT tc.source, b.node_y, least(tc.weight,b.strength), tc.path_string || b.node_y || '.' AS path_string
   FROM best_path AS b JOIN transitive_closure AS tc ON b.node_x = tc.target
   WHERE tc.path_string NOT LIKE '%' || b.node_y || '.%'
     AND b.node_y = node_t 
     --will select only rows with direct edge to target

   UNION (second union is not allowed in CTE)

   SELECT those rows which do not have direct edge to target 
   AND which parents did not contribute to constructing the query above. 
   i.e. b.node_x = tc.target where there is no direct edge between b.node_x to node_t
  )

为了查询从源节点=1到目标节点=4的路径,我希望有以下内容:

 source | target | strength | path_string
--------+--------+----------+------------
      1 |      2 |        1 | 1.2.
      1 |      3 |        1 | 1.3.
      1 |      4 |        1 | 1.2.4.
      1 |      6 |        1 | 1.3.6.
      1 |      7 |        1 | 1.3.7.

感谢你的帮助!
我已经尝试了很多方法:例如在FROM / WHERE子句中使用条件,尝试将CTE传递到函数中,但是没有成功。
任何建议都将不胜感激。
我有自己的递归函数可以实现我想要的目标,但是在大量数据上非常缓慢;而PostgreSQL的CTE显然经过了良好的优化,因此我想更深入地研究它。

如果Source = 2且Target = 5,结果是什么?请在您的问题中附加它。我正在尝试完善这个查询:https://dev59.com/C2PVa4cB1Zd3GeqP4VDr#10395130 - Michael Buen
第一个CTE将输出不希望的结果: 2 - > 4(2.4.),2 - > 5(2.5.),2 - > 8(2.4.8.),2 - > 9(2.4.9.),2 - > 10(2.5 .10.),2 - > 11(2.5.11.)。期望输出是: 2 - > 5(2.5.) - aza07
@MichaelBuen 谢谢,目前正在测试,测试结果出来后会回来! - aza07
另一种查询优化方法,虽然不适用于其他数据库,但仅适用于Postgresql,它使用DISTINCT ON: https://dev59.com/C2PVa4cB1Zd3GeqP4VDr#10401572 - Michael Buen
4个回答

2

如果您从底部开始搜索路径,可以使搜索更加高效。从孩子节点开始搜索,而不是从父节点开始搜索。如果从父节点开始搜索,需要遍历所有的子节点;而如果从子节点开始搜索,它只有一个父节点,因此在查找源节点和目标节点之间的路径时不会浪费时间。

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9

    union all

    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source
),
construct_path(source, target, recentness, path) as
(
  select source, target, recentness, source || '.' || target
  from find_parent 
  where recentness = (select max(recentness) from find_parent)

  union

  select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
  from find_parent dd
  join construct_path cp on dd.recentness = cp.recentness - 1  
)
select source, target, path 
from construct_path
order by recentness desc

输出:

SOURCE   TARGET   PATH
1        2        1.2
2        4        1.2.4
4        9        1.2.4.9

在线测试: http://www.sqlfiddle.com/#!1/13e6b/1

与此类似: 如何在SQL SERVER 2005中找到一个子节点的父节点


该方法进行了优化,如果已经找到特定的源,则将递归到其父级(parent)。

Source = 2

Target = 9

with recursive find_parent(source, target, recentness) as
(
    select source, target, 0 
    from tbl
    where target = 9

    union all

    select i.source, i.target, fp.recentness + 1
    from tbl i
    join find_parent fp on i.target = fp.source 
         -- despite the name, this target is another one's source
         and i.target <> 2
)
,construct_path(source, target, recentness, path) as
(
    select source, target, recentness, source || '.' || target
    from find_parent 
    where recentness = (select max(recentness) from find_parent)

    union

    select dd.source, dd.target, dd.recentness, cp.path || '.' || dd.target
    from find_parent dd
    join construct_path cp on dd.recentness = cp.recentness - 1  

)
select source, target, path
from construct_path
order by recentness desc

输出:

SOURCE   TARGET  PATH
2        4       2.4
4        9       2.4.9

现场测试:http://www.sqlfiddle.com/#!1/13e6b/16


你不觉得在最终查询的WHERE子句中使用限制条件而不是将它们推入WITH源中会更好吗? - vyegorov
最好将限制条件放在根源处,这样更有效率。如果将限制条件放在最终查询处,最终查询将不得不处理几乎所有的seq scan'd结果,这会很慢。 - Michael Buen
你是怎么想到这是一个分层结构的?OP提到了“source”和“target”。 - Erwin Brandstetter
它不是分层的,我直接使用了原始数据。 - Michael Buen

1

用于测试的临时表:

CREATE TEMP TABLE best_path (node_x int, node_y int, strength int);
INSERT INTO best_path  VALUES
 (1, 2,  1)
,(1, 3,  1)
,(2, 4,  1)
,(2, 5,  1)
,(3, 6,  1)
,(3, 7,  1)
,(4, 8,  1)
,(4, 9,  1)
,(5, 10, 1)
,(5, 11, 1);

查询:
修改以适应有关2-5的评论

WITH RECURSIVE t AS (    -- for readability and convenience:
    SELECT 1 AS node_s   -- source_node
         , 4 AS node_t   -- target_node
    )
    , x AS (
    SELECT node_x
    FROM   t, best_path
    WHERE  node_y = node_t
    )
    , a AS  ( 
    SELECT b.node_x
         , b.node_y
         , b.strength AS weight
         , b.node_x || '.' || b.node_y || '.' AS path
    FROM   t, best_path b
    LEFT   JOIN x ON x.node_x = b.node_x
    WHERE  b.node_x = node_s
    AND    (x.node_x IS NULL                    -- no point with edge to target
            OR b.node_y = node_t)               -- except with target_node

    UNION ALL
    SELECT a.node_x
         , b.node_y
         , least(a.weight, b.strength)
         , a.path || b.node_y || '.' AS path
    FROM   t, a
    JOIN   best_path AS b ON b.node_x = a.node_y
    LEFT   JOIN x ON x.node_x = b.node_x
    WHERE  a.node_y <> node_t                   -- arrived at target
    AND    a.path !~~ ('%' || b.node_y || '.%') -- not visited yet
    AND    (x.node_x IS NULL                    -- no point with edge to target
            OR b.node_y = node_t)               -- except with target_node
    )
TABLE a;

能够准确地产生所需结果。

我也在初始查询中添加了断点条件,因为我们可以在一步中到达目标。

这很像我对以前类似问题的回答。所有的解释和链接都适用。这里的主要附加技巧是包括一个额外的 CTE x 来收集那些...

直接通向目标的节点。

用于递归 CTE 的断开条件的重复使用。人们不广泛知道即使在关键字 RECURSIVE 之外,您也可以在递归 CTE 之上添加其他(非递归)CTE。


谢谢,Erwin,现在正在进行测试!稍后会对结果进行评论! - aza07

1
在重新阅读OP的问题后,我想到了这个解决方案:
源代码
with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 4 as d_target )
,traverse_path(filter, source, target, path, bingo) as
(
  select bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y,

    max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  

  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter

  union

  select tp.filter, bp.node_x, bp.node_y, path || '.' || node_y,   

      max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool   

  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
)    
select tp.*
from traverse_path tp cross join inputs i    
-- if Postgresql has Oracle KEEP windowing, 
-- we don't need to use WHERE clause here     
where 
  not tp.bingo   
  or
  (
    tp.bingo and tp.target = d_target
  )

上述WHERE子句可以简写为:

  WHERE not bingo or target = 4

输出:

FILTER  SOURCE  TARGET  PATH    BINGO
1       1       2       1.2     0
1       1       3       1.3     0
1       2       4       1.2.4   1
1       3       6       1.3.6   0
1       3       7       1.3.7   0

实时测试:http://www.sqlfiddle.com/#!1/cdde6/55


这是源 = 2,目标 = 5 的结果:

FILTER  SOURCE  TARGET  PATH    BINGO
2       2       5       2.5     1

数据:

CREATE TABLE best_path
    (node_x int, node_y int, strength int);

INSERT INTO best_path
    (node_x, node_y, strength)
VALUES
    (1, 2, 1),
    (1, 3, 1),
    (2, 4, 1),
    (2, 5, 1),
    (3, 6, 1),
    (3, 7, 1),
    (4, 8, 1),
    (4, 9, 1),
    (5, 10, 1),
    (5, 11, 1);

1

优化的解决方案,不再需要在最终结果上使用WHERE子句;虽然这是Postgresql特定的解决方案,即我们使用DISTINCT ON来选择特定的行:

给定这些数据:

CREATE TABLE best_path
    (node_x int, node_y int, strength int);

INSERT INTO best_path
    (node_x, node_y, strength)
VALUES
    (1, 2, 1),
    (1, 3, 1),
    (2, 4, 1),
    (2, 5, 1),
    (3, 6, 1),
    (3, 7, 1),
    (4, 8, 1),
    (4, 9, 1),
    (5, 10, 1),
    (5, 11, 1),
    (5, 12, 1);

查询,第一阶段,显示幕后情况(源 = 1,目标 = 11):

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo, distincter) as
(
  select                      
      bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool            
      ,coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)                           
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select 
      tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool        
      ,coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)                       
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
)    
select tp.*
from traverse_path tp 

当源为1,目标为11时的输出:http://www.sqlfiddle.com/#!1/db290/56

FILTER   SOURCE   TARGET   PATH     BINGO    DISTINCTER
1        1        2        1.2      0        2
1        1        3        1.3      0        3
1        2        4        1.2.4    0        4
1        2        5        1.2.5    0        5
1        3        6        1.3.6    0        6
1        3        7        1.3.7    0        7
1        4        8        1.2.4.8  0        8
1        4        9        1.2.4.9  0        9
1        5        11       1.2.5.11 1        1
1        5        10       1.2.5.10 1        1
1        5        12       1.2.5.12 1        1

正如我们所看到的,目标11是源数据中的第一行。这是怎么发生的呢?在我们的DISTINCTER列上,我们使用了MAX进行ORDER BY排序,这是一个很少出现的情况,因为在MAX窗口上进行ORDER排序是有意义的。我们只是用它来对结果进行排序。我尝试将ORDER BY放在查询的末尾,但数据库抱怨无法在CTE上使用ORDER。CTE禁止放置ORDER BY子句。但众所周知,我们可以影响OVER()子句的排序,这就是我们的结果被排序的方式。在上面的结果中,在源数据中的5个数字中,数字11排在10和12之前,因为11是我们的目标。因此,当我们使用DISTINCT ON(Postgresql特定功能)时,Postgres会选择第一行,因此它会选择目标11。


这是我们最终的查询,经过优化,尽管它是针对Postgresql特定的:

with recursive -- this denotes that at least one CTE is using recursion    
inputs as ( select 1 as d_source, 11 as d_target )
,traverse_path(filter, source, target, path, bingo) as (
  select 
      distinct on (
        bp.node_x,            
        coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)
      )          
      bp.node_x, bp.node_x,  bp.node_y, bp.node_x || '.' || bp.node_y      
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter
  union
  select       
      distinct on (
        bp.node_x,            
        coalesce(
               nullif(
                   max((bp.node_y = i.d_target)::int) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , 0)          
               ,bp.node_y)
      )    
      tp.filter, bp.node_x, bp.node_y, path || '.' || node_y       
      ,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
      and not tp.bingo
) select tp.* from traverse_path tp

当源=1,目标=11时的输出 http://www.sqlfiddle.com/#!1/db290/55

FILTER   SOURCE   TARGET   PATH     BINGO
1        1        2        1.2      0
1        1        3        1.3      0
1        2        4        1.2.4    0
1        2        5        1.2.5    0
1        3        6        1.3.6    0
1        3        7        1.3.7    0
1        4        8        1.2.4.8  0
1        4        9        1.2.4.9  0
1        5        11       1.2.5.11 1

当源为1,目标为4时的输出:http://www.sqlfiddle.com/#!1/db290/53

FILTER  SOURCE  TARGET  PATH    BINGO
1       1       2       1.2     0
1       1       3       1.3     0
1       2       4       1.2.4   1
1       3       6       1.3.6   0
1       3       7       1.3.7   0

源码为2,目标为5的输出:http://www.sqlfiddle.com/#!1/db290/54

FILTER  SOURCE  TARGET  PATH    BINGO
2       2       5       2.5     1


另一种方法,而不是BINGO方法。使用continue_search作为继续遍历的逻辑,并使用EVERY聚合函数来确定是否需要继续遍历图形。

它的工作原理如下:http://www.sqlfiddle.com/#!1/db290/84

最终查询:http://www.sqlfiddle.com/#!1/db290/85

有趣的是,EVERY非常像英语:

与此形成对比:

,max((bp.node_y = i.d_target)::int) over(partition by bp.node_x) ::bool  

使用 EVERY:

,every(bp.node_y <> i.d_target) over(partition by bp.node_x)

哪一个更容易阅读?
以下是输出,说明使用EVERY原则来促进DISTINCT ON的方法:
源=1,目标=5。请注意,在属于同一源的其他数字之前,5会首先进行排序,这将稍后被DISTINCT ON选中。
FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH     DISTINCTER
1         1         2         1.2       1                   2
1         1         3         1.3       1                   3
1         2         5         1.2.5     0                   0
1         2         4         1.2.4     0                   0
1         3         6         1.3.6     1                   6
1         3         7         1.3.7     1                   7

这是执行该原则的查询:

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search, distincter) as
(
  select                      
      bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
      ,coalesce(        
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                           
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select 
      tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)    
      ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
     and tp.continue_search
)    
select tp.*
from traverse_path tp 

最终查询:

with recursive -- this denotes that at least one CTE is using recursion

inputs as ( select 1 as d_source, 5 as d_target )
,traverse_path(filter, source, target, path, continue_search) as
(
  select                      
      distinct on
      (
        bp.node_x
        ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
      )    
      bp.node_x, bp.node_x,  bp.node_y, concat(bp.node_x , '.' , bp.node_y )
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
  from best_path bp cross join inputs i
  where bp.node_x = i.d_source -- filter      
  union      
  select   
      distinct on
      (
        bp.node_x
        ,coalesce(
              cast(nullif(
                   every(bp.node_y <> i.d_target) 
                   over(partition by bp.node_x order by bp.node_x, bp.node_y = d_target desc, bp.node_y)
                   , true) as int)
               ,bp.node_y)                       
      )  
      tp.filter, bp.node_x, bp.node_y, concat(path , '.' , node_y)
      ,every(bp.node_y <> i.d_target) over(partition by bp.node_x)          
  from best_path bp cross join inputs i
  join traverse_path tp on bp.node_x = tp.target
     and tp.continue_search
)    
select tp.*
from traverse_path tp 

输出:

FILTER    SOURCE    TARGET    PATH      CONTINUE_SEARCH
1         1         2         1.2       1
1         1         3         1.3       1
1         2         5         1.2.5     0
1         3         6         1.3.6     1
1         3         7         1.3.7     1

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接