无向图中最小的ID

Question

无向图中最小的ID

4

我正在使用Oracle 11g，尝试找到由ID对组成的无向图中最小的ID。

使用以下ID对：

create table unique_pairs ( ID1 INT, ID2 INT );
insert all 
    into unique_pairs (ID1,ID2) VALUES ( 1, 2 )  
    into unique_pairs (ID1,ID2) VALUES ( 1, 3 )
    into unique_pairs (ID1,ID2) VALUES ( 4, 2 )
    into unique_pairs (ID1,ID2) VALUES ( 4, 5 )
    into unique_pairs (ID1,ID2) VALUES ( 7, 6 )
    into unique_pairs (ID1,ID2) VALUES ( 8, 10 )
    into unique_pairs (ID1,ID2) VALUES ( 10, 7 )
select * from dual;
commit;

我正在使用两个Connect By查询的联合，试图在地图的两个方向上旅行：

select distinct a.ID1, a.ID2, min(a.ultimate_parent_id ) as ultimate_parent_id
from
(SELECT distinct ID1, ID2,
    connect_by_root(ID2) ultimate_parent_id
  FROM unique_pairs
CONNECT BY ID2 = prior ID1 AND ID2 != PRIOR ID2 
union
SELECT distinct ID1, ID2,
    connect_by_root(ID1) ultimate_parent_id
  FROM unique_pairs
CONNECT BY ID1 = prior ID2 AND ID1 != PRIOR ID1) a
group by a.ID1, a.ID2
order by 3;

I get this:

   ID1        ID2 ULTIMATE_PARENT_ID

     1          2                  1
     1          3                  1
     4          2                  2
     4          5                  4
    10          7                  6
     7          6                  6
     8         10                  6

应该是这样的：

   ID1        ID2 ULTIMATE_PARENT_ID

     1          2                  1
     1          3                  1
     4          2                  1
     4          5                  1
    10          7                  6
     7          6                  6
     8         10                  6

我认为我的问题与使用connect by函数时暗示了层次结构有关，而我的数据是非定向的。希望能得到帮助，谢谢！

- Tim

2个回答

1

概念

你的问题基本上包括以下3个步骤：

识别无向图 G 的连通分量
用最小 id 的顶点表示每个连通分量
将图的每条边映射到它所属的连通分量的代表。

该任务通过以下方式解决：

考虑基于 G 的顶点 id 的自然数序列的 '等效' 有向图 D_1。
通过抽象掉由 D_1 表示的排序中的链，将 D_1 缩减为包含所有候选 id 的二分图 B。
确定 B 中的最小代表 id。
建立从 G 的边到 B 的边及其弱连通分量 (WCCs) 的映射。

详细信息

G的边集恰好由unique_pairs的记录给出的一组对(id1, id2)组成。G的顶点集与unique_pairs的两个id列的值的集合相匹配。由于G是无向的，因此无论使用记录(id1，id2)或(id2，id1)表示边都没有关系；因此我们假设对于unique_pairs中的每个记录(id1，id2)，都有id1 < id2成立。

我们通过构建一个基于有向图D_1的辅助结构来完成任务，该有向图源自于根据其关联顶点的id的数值顺序定向每条无向图的边。让D_2成为所有边方向相反的D_1。识别原始图的连通分量相当于在D_1或D_2中识别WCCs。注意，按照构造，D_1是无环的。

辅助辅助结构是将G中的边集映射到D_1中的极大链幂集(=不包含在另一条路径中的路径)上，使图像中每个边e的所有链都包含e并最大化其终端(=起点和终点顶点)的数字ID之间的差异。使用幂集是为了避免当两个极大链共享相同的端点但遵循“替代路线”时，映射可能不是明确定义的情况（通常，图像将只包含单个元素）。

接下来，我们粗略地映射：图像将是原始映射图像中任何极大链的终端ID对。

给定此映射，我们建立一个辅助二分图B，其边缘恰好表示终端ID的有序对。该图是二分的，因为最大链的任何结束顶点都不能是另一个最大链的起始顶点，反之亦然 - 如果发生这种情况，则这些链将不是最大的。

然而，根据构造，B 是 D_1 的传递闭包的一部分。因此，B 的WCC的顶点集是 D_1 的WCC的顶点集的子集。我们最终感兴趣的是每个后者WCC中具有最小id的顶点。由于不包含在最大链中的任何终端id都不能小于起始顶点id，所有这些顶点必须包含在 B 的顶点集中。因此，我们只考虑 B 的 WCC，并且对它们的每个最小终端ID 进行取值。通过将所有 G 边缘映射到 B 边缘，从而映射到 B 的 WCC，我们解决了原始问题。

SQL实现

在预处理步骤中，将 G 的边规范化为其关联顶点的自然排序。假设边缘的方向为 id1 -> id2，test_unique_pairs 也表示 D_1。

update test_unique_pairs
   set id1 = id2
     , id2 = id1
 where id1 > id2
     ;

第一个查询将D_1的边缘映射到B的边缘，即在D_1中最大链的终端ID对应于B中的边缘。这些链由关于test_unique_pairs的分层查询计算，并且将由其起始和结束顶点表示，这些对也表示B中的边缘。如果它们共享两个终端，则此映射发生在替代链模数上 - 如果它们共享两个终端，则认为不同的最大路径等效。 Oracle中的分层查询允许轻松提取最大元素。然而，最小元素是具有父子关系反转的层次结构的最大元素（考虑到这个计算是考虑到两个有向图D_1，D_2）。为了简化第二个处理阶段，查询ID被包装到视图定义中：

  create or replace view test_up_chains as
           select *
             from ( -- collect minimal/maximal reachable id for each edge in G
                          select distinct
                                 inter.id1
                               , inter.id2
                               , min(minelt)   chainstart
                               , max(maxelt)   chainend
                            from ( -- collect minimum/maximum terminal id for chains starting/ending with any of G's edges
                                         select distinct
                                                inter_min.id1
                                              , inter_min.id2
                                              , inter_min.minelt
                                              , inter_max.maxelt
                                           from ( -- hierarchical query for maximal chains in D_1
                                                        select distinct
                                                               hup.ID1
                                                             , hup.ID2
                                                             , CONNECT_BY_ROOT hup.id2   maxelt
                                                          from test_unique_pairs hup
                                                    connect by nocycle prior hup.id1 = hup.id2
                                                    start with hup.id2 in (
                                                                 select ref1.id2
                                                                   from test_unique_pairs ref1
                                                                  where not exists (
                                                                           select 1
                                                                             from test_unique_pairs ref2
                                                                            where ref2.id1 = ref1.id2
                                                                        )
                                                               )
                                                 ) inter_max
                                            join ( -- hierarchical query for maximal chains in D_2 ( = D_1 with reversed edge orientations )
                                                        select distinct
                                                               hup.ID1
                                                             , hup.ID2
                                                             , CONNECT_BY_ROOT hup.id1   minelt
                                                          from test_unique_pairs hup
                                                    connect by nocycle prior hup.id2 = hup.id1
                                                    start with hup.id1 in (
                                                                 select ref1.id1
                                                                   from test_unique_pairs ref1
                                                                  where not exists (
                                                                           select 1
                                                                             from test_unique_pairs ref2
                                                                            where ref2.id2 = ref1.id1
                                                                        )
                                                               )
                                                 ) inter_min
                                              on ( inter_min.id1 = inter_max.id1 AND inter_min.id2 = inter_max.id2 )
                                  ) inter
                         group by grouping sets ( (inter.id1, inter.id2) )
                   ) base
          order by base.id1
                 , base.id2
;

接下来，将 B 的边缘合并成 WCC，并提取它们各自的最小 ID。下面的分层查询由于缺少 STARTS WITH 子句，基于边缘关系构建完整的可传递闭包，并为 B 中的每个边缘生成到同一 WCC 中所有其他边缘的根路径。因此，在所有层次结构的 'root' 中最小化给定 G 中的一条边的起始顶点，可以找到在 G 中达到的最小 ID。通过 NOCYCLE 指令压制循环。

                 select *
                   from (
                          select distinct
                                 id1
                               , id2
                               , min(chainlim)   chainrep
                            from (
                                         select distinct
                                                id1
                                              , id2
                                              , connect_by_root chainstart chainlim
                                           from test_up_chains
                                     connect by nocycle ( ( prior chainstart = chainstart and prior chainend != chainend ) OR ( prior chainstart != chainstart and prior chainend = chainend ) )
                                 ) base
                          group by grouping sets ( ( base.id1, base.id2 ) )
                        )
               order by id1
                      , id2
     ;

- collapsar

非常详细的回复，非常感谢！我从代码和解释中学到了很多，真的很感激您抽出时间将其详细地阐述。这个解决方案完美地符合了我的请求。再次感谢！ - Tim

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Sentinel · Accepted Answer

Collapsar有非常冗长的描述，但是以下是一个非常简洁的解决方案，符合您的要求。我已经远离图论的日子太久了，无法像他那样解释它，但简单的解释是，由于unique_pairs表示无向图中的边，因此它可以表示为ID1、ID2和ID2、ID1的并集，在有向图中，这就是我的第一个公共表达式。通过这个有向图，你可以像我的第二个公共表达式一样计算所有连接的节点对。最后，你可以将原始的unique_pairs表与我的连接对CTE在ID1上进行连接，并选择最小的根节点作为你的ULTIMATE_PARENT_ID。

with dg1 as (
  select id1, id2 from unique_pairs
  union all
  select id2, id1 from unique_pairs
), cp as (
select id1
     , CONNECT_BY_ROOT id1 root
  from dg1
connect by nocycle prior id1 = id2 
)
select dg1.*
     , min(root) ULTIMATE_PARENT_ID
  from unique_pairs dg1
  join cp
    on cp.id1 = dg1.id1
  group by dg1.id1, dg1.id2
order by 1,2

请查看SQL Fiddle。

| ID1 | ID2 | ULTIMATE_PARENT_ID |
|-----|-----|--------------------|
|   1 |   2 |                  1 |
|   1 |   3 |                  1 |
|   4 |   2 |                  1 |
|   4 |   5 |                  1 |
|   7 |   6 |                  6 |
|   8 |  10 |                  6 |
|  10 |   7 |                  6 |