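I have a table of identities (i.e. aliases) for an arbitrary number of people. Each row has a previous name and a new name. In production there are about 1 million rows. For example: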
id, old, new
---
1, 'Albert', 'Bob'
2, 'Bob', 'Charles'
3, 'Mary', 'Nancy'
4, 'Charles', 'Albert'
5, 'Lydia', 'Nancy'
6, 'Zoe', 'Zoe'
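I want to generate a list of "users", each referencing its own identities. This is akin to finding all the nodes in each connected component of the identity graph, or to finding a spanning forest: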
User 1: Albert, Bob, Charles (identities: 1,2,4)
User 2: Mary, Nancy, Lydia (identities: 3,5)
User 3: Zoe (identities: 6)
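For reference, the grouping above is what a union-find (disjoint-set) pass over the rows would produce. The following is a minimal Python sketch of that expected result, assuming the rows fit in memory; the names ROWS and group_identities are hypothetical, and this only illustrates the target output, not the PostgreSQL query I am looking for:

```python
from collections import defaultdict

# Sample rows copied from the table above: (id, old, new)
ROWS = [
    (1, "Albert", "Bob"),
    (2, "Bob", "Charles"),
    (3, "Mary", "Nancy"),
    (4, "Charles", "Albert"),
    (5, "Lydia", "Nancy"),
    (6, "Zoe", "Zoe"),
]


def group_identities(rows):
    """Group row ids into connected components of the name graph."""
    parent = {}  # name -> parent name in the disjoint-set forest

    def find(name):
        # Register unseen names as their own root.
        parent.setdefault(name, name)
        root = name
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every visited name at the root.
        while parent[name] != root:
            next_name = parent[name]
            parent[name] = root
            name = next_name
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each row is an edge between its old and new name.
    for _row_id, old, new in rows:
        union(old, new)

    # Collect row ids under the root of the component they belong to.
    groups = defaultdict(list)
    for row_id, old, _new in rows:
        groups[find(old)].append(row_id)

    return sorted(sorted(ids) for ids in groups.values())


if __name__ == "__main__":
    for ids in group_identities(ROWS):
        print(ids)
```

Each component comes out exactly once, with no subsets, which is the behaviour I would like from the SQL version.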
I have been experimenting with PostgreSQL's WITH RECURSIVE, but it yields every set and its subsets. For example:
1,2,4 <-- spanning tree: good
2 <-- subset: discard
3,5 <-- spanning tree: good
4 <-- subset: discard
5 <-- subset: discard
6 <-- spanning tree: good
What do I need to do to produce only the full set of identities (i.e. the spanning tree) for each user?
SQLFiddle: http://sqlfiddle.com/#!15/9eaed/4 is my latest attempt. Here is the code:
WITH RECURSIVE search_graph AS (
SELECT id
, id AS min_id
, ARRAY[id] AS path
, ARRAY[old,new] AS emails
FROM identities
UNION
SELECT identities.id
, LEAST(identities.id, sg.min_id)
, (sg.path || identities.id)
, (sg.emails || identities.old || identities.new)
FROM search_graph sg
JOIN identities ON (identities.old = ANY(sg.emails) OR identities.new = ANY(sg.emails))
WHERE identities.id <> ALL(sg.path)
)
SELECT array_agg(DISTINCT(p)) from search_graph, unnest(path) p GROUP BY min_id;
And these are the results:
1,2,4
2
3,5
4
5
6