如何将扁平表转换为树形结构的最高效/最优雅的方法是什么？

Question

如何将扁平表转换为树形结构的最高效/最优雅的方法是什么？

sqlalgorithmrecursiontreehierarchical-data

574

假设您有一张扁平的表格，用于存储一个有序的树形层次结构：

Id   Name         ParentId   Order
 1   'Node 1'            0      10
 2   'Node 1.1'          1      10
 3   'Node 2'            0      20
 4   'Node 1.1.1'        2      10
 5   'Node 2.1'          3      10
 6   'Node 1.2'          1      20

这是一个图表，其中我们有[id] Name。根节点0是虚构的。

                       [0] ROOT
                          /    \ 
              [1] Node 1          [3] Node 2
              /       \                   \
    [2] Node 1.1     [6] Node 1.2      [5] Node 2.1
          /          
 [4] Node 1.1.1

你会使用什么最简主义方法将其输出到HTML（或文本），作为正确排序的正确缩进的树？

进一步假设您只有基本数据结构（数组和哈希映射），没有具有父/子引用的花哨对象，没有ORM，没有框架，只有您的双手。该表格表示为结果集，可以随机访问。

伪代码或纯英语都可以，这只是一个概念性问题。

奖励问题：在RDBMS中存储此类树形结构是否有根本上更好的方法？

编辑和添加

回答一个评论者（Mark Bessey）的问题：根节点并不是必需的，因为它永远不会被显示出来。ParentId = 0 是表示“这些是顶层”的约定。Order列定义了具有相同父级的节点如何排序。

我所说的“结果集”可以被想象成一个哈希映射数组（为了保持这个术语）。对于我的例子来说，它已经存在了。有些答案会走额外的路线先构建它，但那也没关系。

树可以任意深入。每个节点可以有N个孩子。虽然我没有想到一个“数百万条目”的树。

不要把我的节点命名选择（“节点1.1.1”）误认为是可以依赖的东西。节点也可以叫做“Frank”或“Bob”，没有暗示任何命名结构，这只是为了使其可读性更强。

我已经发布了自己的解决方案，这样你们就可以对它进行分析了。

- Tomalak

2

“不使用带有父/子引用的花哨对象” - 为什么？创建一个基本的Node对象，使用.addChild()和.getParent()方法，可以很好地建模节点关系。 - matt b

2

这是一个普通的树（其中n个孩子，n可以大于2），还是二叉树（节点可以有0、1或2个孩子）？ - BKimmel

由于您可以使用哈希表实现适当的节点数据结构，因此在这里没有实际限制，只需要更多的工作。 - Svante

...这正是你所做的。 - Svante

1

@dreftymac，从技术上讲，传递闭包表是非规范化的。避免数据异常比传统的邻接列表设计更困难。但是，像非规范化设计一样，它使某些类型的查询更快。 - Bill Karwin

显示剩余5条评论

15个回答

65

如果您使用嵌套集（有时称为修改的前序树遍历），则可以使用单个查询以树顺序提取整个树结构或其中任何子树，但插入操作的代价更高，因为您需要管理描述树结构中按顺序路径的列。

对于django-mptt，我使用了这样的结构：

id  parent_id  tree_id  level  lft  rght
--  ---------  -------  -----  ---  ----
 1       null        1      0    1    14
 2          1        1      1    2     7
 3          2        1      2    3     4
 4          2        1      2    5     6
 5          1        1      1    8    13
 6          5        1      2    9    10
 7          5        1      2    11   12

它描述了一个如下所示的树（其中id表示每个项）：

 1
 +-- 2
 |   +-- 3
 |   +-- 4
 |
 +-- 5
     +-- 6
     +-- 7

或者，作为一个嵌套集图表，使lft和rght值更加明显：

____________________________________________________________________________
|  根节点1                                                                 |
|   ________________________________    ________________________________   |
|  |  子节点1.1                     |  |  子节点1.2                     |  |
|  |   ___________    ___________   |  |   ___________    ___________   |  |
|  |  |  C 1.1.1  |  |  C 1.1.2  |  |  |  |  C 1.2.1  |  |  C 1.2.2  |  |  |
1  2  3___________4  5___________6  7  8  9___________10 11__________12 13 14
|  |________________________________|  |________________________________|  |
|____________________________________________________________________________|

如您所见，要按树顺序获取给定节点的整个子树，只需选择所有具有在其lft和rght值之间的lft和rght值的行。检索给定节点的祖先树也很简单。

level列只是为了方便而进行的一些去规范化，而tree_id列允许您重新启动每个顶级节点的lft和rght编号，从而减少受插入、移动和删除影响的列数，因为当这些操作发生时，必须相应地调整lft和rght列以创建或关闭间隙。我在当时尝试理解每个操作所需的查询时做了一些开发注释。

就实际使用此数据来显示树而言，我创建了一个tree_item_iterator实用程序函数，对于每个节点，它都应该提供足够的信息，以生成您想要的任何类型的显示。

更多关于MPTT的信息：

- insin

14

我希望我们能停止使用像“lft”和“rght”这样的缩写作为列名称，我的意思是我们省了多少字符不用打？只有一个！ - orustammanapov

6

因为在 SQL 中，“left” 和 “right” 是保留字。 - meecect

1

@meecect 这就是为什么总是要用双引号“EVERYTHING”。 - Ciro Santilli OurBigBook.com

1

我想知道左/右表示法相对于左/大小表示法的优势是什么，来自 https://dev59.com/BnVC5IYBdhLWcg3ww0Hr#42781302。左/大小似乎更容易更新兄弟节点。 - Ciro Santilli OurBigBook.com

我发现了一篇有趣的文章，比较了Postgres 8中嵌套集和邻接列表的性能：https://explainextended.com/2009/09/24/adjacency-list-vs-nested-sets-postgresql/。简而言之，邻接列表在某些操作上表现更快，而当它们表现较差时，差距是两倍。 - Zach Bloomquist

32

这是一个相当古老的问题，但由于它有很多观点，我认为提供一种替代方案非常值得，并且在我看来非常优雅。

为了读取树形结构，您可以使用递归公共表达式（CTE）。它可以一次获取整个树形结构，获得节点级别、父节点和父节点子节点中的顺序信息。

让我向您展示如何在PostgreSQL 9.1中实现这一点。

Create a structure

CREATE TABLE tree (
    id int  NOT NULL,
    name varchar(32)  NOT NULL,
    parent_id int  NULL,
    node_order int  NOT NULL,
    CONSTRAINT tree_pk PRIMARY KEY (id),
    CONSTRAINT tree_tree_fk FOREIGN KEY (parent_id) 
      REFERENCES tree (id) NOT DEFERRABLE
);


insert into tree values
  (0, 'ROOT', NULL, 0),
  (1, 'Node 1', 0, 10),
  (2, 'Node 1.1', 1, 10),
  (3, 'Node 2', 0, 20),
  (4, 'Node 1.1.1', 2, 10),
  (5, 'Node 2.1', 3, 10),
  (6, 'Node 1.2', 1, 20);

Write a query

WITH RECURSIVE 
tree_search (id, name, level, parent_id, node_order) AS (
  SELECT 
    id, 
    name,
    0,
    parent_id, 
    1 
  FROM tree
  WHERE parent_id is NULL

  UNION ALL 
  SELECT 
    t.id, 
    t.name,
    ts.level + 1, 
    ts.id, 
    t.node_order 
  FROM tree t, tree_search ts 
  WHERE t.parent_id = ts.id 
) 
SELECT * FROM tree_search 
WHERE level > 0 
ORDER BY level, parent_id, node_order;

这里是结果：

     id |    name    | level | parent_id | node_order 
    ----+------------+-------+-----------+------------
      1 | Node 1     |     1 |         0 |         10
      3 | Node 2     |     1 |         0 |         20
      2 | Node 1.1   |     2 |         1 |         10
      6 | Node 1.2   |     2 |         1 |         20
      5 | Node 2.1   |     2 |         3 |         10
      4 | Node 1.1.1 |     3 |         2 |         10
    (6 rows)

树节点按深度级别排序。在最终输出中，我们会将它们呈现在随后的行中。

对于每个级别，它们按parent_id和父节点内的node_order排序。这告诉我们如何按此顺序在输出中将节点链接到父节点。

有了这样的结构，制作一个非常漂亮的HTML演示并不困难。

递归CTE可在PostgreSQL、IBM DB2、MS SQL Server、Oracle和SQLite中使用。

如果您想了解更多有关递归SQL查询的信息，可以查看您喜欢的数据库管理系统的文档，或阅读我的两篇涵盖此主题的文章：

- Michał Kołodziejski

我喜欢这种方法的简单性，以及它如何通过最终的ORDER BY生成确定性输出（PostgreSQL 14.5指出递归查询不支持ORDER BY：error: ORDER BY in a recursive query is not implemented）。不幸的是，它产生了广度优先遍历，并且书的树形表示需要先序遍历。 - Ciro Santilli OurBigBook.com

最后两个链接显示“404 - 未找到”。 - Shadi

请注意，如果您想要添加分页功能，构建整个递归树的成本非常高昂。因为即使您只选择树的一小部分，每次都需要构建整个树。 - undefined

19

从Oracle 9i开始，您可以使用CONNECT BY。

SELECT LPAD(' ', (LEVEL - 1) * 4) || "Name" AS "Name"
FROM (SELECT * FROM TMP_NODE ORDER BY "Order")
CONNECT BY PRIOR "Id" = "ParentId"
START WITH "Id" IN (SELECT "Id" FROM TMP_NODE WHERE "ParentId" = 0)

从SQL Server 2005开始，您可以使用递归公共表达式（CTE）。

WITH [NodeList] (
  [Id]
  , [ParentId]
  , [Level]
  , [Order]
) AS (
  SELECT [Node].[Id]
    , [Node].[ParentId]
    , 0 AS [Level]
    , CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
  FROM [Node]
  WHERE [Node].[ParentId] = 0
  UNION ALL
  SELECT [Node].[Id]
    , [Node].[ParentId]
    , [NodeList].[Level] + 1 AS [Level]
    , [NodeList].[Order] + '|'
      + CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
  FROM [Node]
    INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[ParentId]
) SELECT REPLICATE(' ', [NodeList].[Level] * 4) + [Node].[Name] AS [Name]
FROM [Node]
  INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[Id]
ORDER BY [NodeList].[Order]

两个代码块的输出结果相同。

名称
'节点1'
'    节点1.1'
'        节点1.1.1'
'    节点1.2'
'节点2'
'    节点2.1'

- Eric Weilnau

cte 可以在 sqlserver 和 oracle 中使用 @Eric Weilnau - Nisar

10

Bill的回答非常好，但是我还想支持树形结构和Order属性。我在每个节点中包含了一个名为leftSibling的单一属性，它与原始问题中的Order完成相同的工作（维护从左到右的顺序）。

mysql> desc nodes ;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| name        | varchar(255) | YES  |     | NULL    |                |
| leftSibling | int(11)      | NO   |     | 0       |                |
+-------------+--------------+------+-----+---------+----------------+
3行记录（0.00秒）
mysql> desc adjacencies;
+------------+---------+------+-----+---------+----------------+
| Field      | Type    | Null | Key | Default | Extra          |
+------------+---------+------+-----+---------+----------------+
| relationId | int(11) | NO   | PRI | NULL    | auto_increment |
| parent     | int(11) | NO   |     | NULL    |                |
| child      | int(11) | NO   |     | NULL    |                |
| pathLen    | int(11) | NO   |     | NULL    |                |
+------------+---------+------+-----+---------+----------------+
4行记录（0.00秒）

更多细节和SQL代码请参见我的博客。

感谢Bill的回答，它帮助我入门了！

- bobobobo

8

有一些非常好的解决方案利用了SQL索引内部B树表示法。这是基于1998年左右进行的一些重要研究。

以下是一个示例表（在MySQL中）。

CREATE TABLE `node` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `tw` int(10) unsigned NOT NULL,
  `pa` int(10) unsigned DEFAULT NULL,
  `sz` int(10) unsigned DEFAULT NULL,
  `nc` int(11) GENERATED ALWAYS AS (tw+sz) STORED,
  PRIMARY KEY (`id`),
  KEY `node_tw_index` (`tw`),
  KEY `node_pa_index` (`pa`),
  KEY `node_nc_index` (`nc`),
  CONSTRAINT `node_pa_fk` FOREIGN KEY (`pa`) REFERENCES `node` (`tw`) ON DELETE CASCADE
)

树形表示所需的唯一字段为：

tw: 左到右DFS先序索引，其中根=1。
pa: 引用（使用tw）到父节点，根节点为空。
sz: 节点分支的大小，包括自身。
nc: 用作语法糖。它是tw+sz，表示节点的“下一个子节点”的tw。

以下是一个按tw排序的24个节点示例：

+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|   1 | Root    |  1 | NULL |   24 |   25 |
|   2 | A       |  2 |    1 |   14 |   16 |
|   3 | AA      |  3 |    2 |    1 |    4 |
|   4 | AB      |  4 |    2 |    7 |   11 |
|   5 | ABA     |  5 |    4 |    1 |    6 |
|   6 | ABB     |  6 |    4 |    3 |    9 |
|   7 | ABBA    |  7 |    6 |    1 |    8 |
|   8 | ABBB    |  8 |    6 |    1 |    9 |
|   9 | ABC     |  9 |    4 |    2 |   11 |
|  10 | ABCD    | 10 |    9 |    1 |   11 |
|  11 | AC      | 11 |    2 |    4 |   15 |
|  12 | ACA     | 12 |   11 |    2 |   14 |
|  13 | ACAA    | 13 |   12 |    1 |   14 |
|  14 | ACB     | 14 |   11 |    1 |   15 |
|  15 | AD      | 15 |    2 |    1 |   16 |
|  16 | B       | 16 |    1 |    1 |   17 |
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
|  18 | D       | 23 |    1 |    1 |   24 |
|  19 | E       | 24 |    1 |    1 |   25 |
+-----+---------+----+------+------+------+

每个树结果都可以非递归地完成。例如，要获取 tw='22' 节点的祖先列表：祖先

select anc.* from node me,node anc 
where me.tw=22 and anc.nc >= me.tw and anc.tw <= me.tw 
order by anc.tw;
+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|   1 | Root    |  1 | NULL |   24 |   25 |
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
+-----+---------+----+------+------+------+

兄弟姐妹和子节点都很琐碎 - 只需使用按tw字段顺序排列的pa。

后代节点

例如，根节点为tw = 17的节点集合（分支）。

select des.* from node me,node des 
where me.tw=17 and des.tw < me.nc and des.tw >= me.tw 
order by des.tw;
+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
+-----+---------+----+------+------+------+

附加说明

当读取的数量远大于插入或更新时，这种方法非常有用。

由于在树中插入、移动或更新节点需要调整树，因此在开始操作之前必须锁定表。

插入/删除成本很高，因为 tw 索引和 sz（分支大小）值需要在插入点后的所有节点以及所有祖先中进行更新。

分支移动涉及将分支的 tw 值移出范围，因此在移动分支时还需要禁用外键约束。移动分支需要四个查询：

将分支移出范围。
关闭它留下的间隙。（剩余的树现在已经规范化了）。
打开它要去的间隙。
将分支移动到其新位置。

调整树查询

在 create/update/delete 方法中使用的树中间隙的打开/关闭是一个重要的子函数，因此我在此处包含它。

我们需要两个参数——表示我们是否正在缩小或扩大的标志，以及节点的 tw 索引。例如，tw=18（其分支大小为5）。假设我们正在缩小（删除 tw），这意味着我们在以下示例的更新中使用“-”而不是“+”。

我们首先使用一个（稍作修改的）祖先函数来更新 sz 值。

update node me, node anc set anc.sz = anc.sz - me.sz from 
node me, node anc where me.tw=18 
and ((anc.nc >= me.tw and anc.tw < me.pa) or (anc.tw=me.pa));

然后我们需要调整那些tw值高于要移除分支的人的tw值。

update node me, node anc set anc.tw = anc.tw - me.sz from 
node me, node anc where me.tw=18 and anc.tw >= me.tw;

接下来，我们需要调整父级元素，对于那些父级 tw 值高于要移除的分支的元素。

update node me, node anc set anc.pa = anc.pa - me.sz from 
node me, node anc where me.tw=18 and anc.pa >= me.tw;

- Konchog

7

如果可以选择，我会使用对象。我会为每个记录创建一个对象，其中每个对象都有一个children集合，并将它们全部存储在一个关联数组（/哈希表）中，其中Id是键。然后遍历整个集合一次，将子元素添加到相应的子元素字段中。简单明了。

但是因为您限制了某些好的面向对象编程，让人感觉不太愉快，所以我可能会基于以下迭代：

function PrintLine(int pID, int level)
    foreach record where ParentID == pID
        print level*tabs + record-data
        PrintLine(record.ID, level + 1)

PrintLine(0, 0)

编辑：这与其他几个条目类似，但我认为它更加简洁。我要补充一点：这非常依赖SQL，非常糟糕。如果可以选择，最好采用面向对象编程（OOP）的方法。

- Oli

这就是我所说的“没有框架”的意思 - 你在使用LINQ，对吧？关于你的第一段：结果集已经存在了，为什么要先将所有信息复制到一个新的对象结构中呢？（我对这个事实表述不够清楚，抱歉） - Tomalak

Tomalak - 不，这不是真正的代码，而是伪代码。当然，你需要将其拆分为适当的选择器和迭代器……以及实际的语法！为什么要使用OOP？因为你可以精确地反映出结构。它可以让事情保持良好的结构并且恰好更有效率（只有一个选择器）。 - Oli

我也没有考虑重复选择。关于OOP：Mark Bessey在他的答案中说：“你可以用哈希图模拟任何其他数据结构，所以这并不是一个可怕的限制。”你的解决方案是正确的，但我认为即使没有OOP，还有一些改进的空间。 - Tomalak

5

这篇文章写得比较快，既不美观也不高效（而且它自动装箱很多，转换 int 和 Integer 很麻烦！），但它能够工作。

由于我正在从真正的工作中分心，所以这可能违反了一些规则，因为我正在创建自己的对象。

这还假设 resultSet/table 在开始构建节点之前已经完全读入某种结构中，如果你有成千上万行数据，这并不是最好的解决方案。

public class Node {

    private Node parent = null;

    private List<Node> children;

    private String name;

    private int id = -1;

    public Node(Node parent, int id, String name) {
        this.parent = parent;
        this.children = new ArrayList<Node>();
        this.name = name;
        this.id = id;
    }

    public int getId() {
        return this.id;
    }

    public String getName() {
        return this.name;
    }

    public void addChild(Node child) {
        children.add(child);
    }

    public List<Node> getChildren() {
        return children;
    }

    public boolean isRoot() {
        return (this.parent == null);
    }

    @Override
    public String toString() {
        return "id=" + id + ", name=" + name + ", parent=" + parent;
    }
}

public class NodeBuilder {

    public static Node build(List<Map<String, String>> input) {

        // maps id of a node to it's Node object
        Map<Integer, Node> nodeMap = new HashMap<Integer, Node>();

        // maps id of a node to the id of it's parent
        Map<Integer, Integer> childParentMap = new HashMap<Integer, Integer>();

        // create special 'root' Node with id=0
        Node root = new Node(null, 0, "root");
        nodeMap.put(root.getId(), root);

        // iterate thru the input
        for (Map<String, String> map : input) {

            // expect each Map to have keys for "id", "name", "parent" ... a
            // real implementation would read from a SQL object or resultset
            int id = Integer.parseInt(map.get("id"));
            String name = map.get("name");
            int parent = Integer.parseInt(map.get("parent"));

            Node node = new Node(null, id, name);
            nodeMap.put(id, node);

            childParentMap.put(id, parent);
        }

        // now that each Node is created, setup the child-parent relationships
        for (Map.Entry<Integer, Integer> entry : childParentMap.entrySet()) {
            int nodeId = entry.getKey();
            int parentId = entry.getValue();

            Node child = nodeMap.get(nodeId);
            Node parent = nodeMap.get(parentId);
            parent.addChild(child);
        }

        return root;
    }
}

public class NodePrinter {

    static void printRootNode(Node root) {
        printNodes(root, 0);
    }

    static void printNodes(Node node, int indentLevel) {

        printNode(node, indentLevel);
        // recurse
        for (Node child : node.getChildren()) {
            printNodes(child, indentLevel + 1);
        }
    }

    static void printNode(Node node, int indentLevel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indentLevel; i++) {
            sb.append("\t");
        }
        sb.append(node);

        System.out.println(sb.toString());
    }

    public static void main(String[] args) {

        // setup dummy data
        List<Map<String, String>> resultSet = new ArrayList<Map<String, String>>();
        resultSet.add(newMap("1", "Node 1", "0"));
        resultSet.add(newMap("2", "Node 1.1", "1"));
        resultSet.add(newMap("3", "Node 2", "0"));
        resultSet.add(newMap("4", "Node 1.1.1", "2"));
        resultSet.add(newMap("5", "Node 2.1", "3"));
        resultSet.add(newMap("6", "Node 1.2", "1"));

        Node root = NodeBuilder.build(resultSet);
        printRootNode(root);

    }

    //convenience method for creating our dummy data
    private static Map<String, String> newMap(String id, String name, String parentId) {
        Map<String, String> row = new HashMap<String, String>();
        row.put("id", id);
        row.put("name", name);
        row.put("parent", parentId);
        return row;
    }
}

- matt b

当我看到大量源代码时，我总是很难将算法特定部分与实现特定部分区分开来。这就是为什么我一开始就要求一个不特定于语言的解决方案。但它确实完成了工作，所以感谢您的时间！ - Tomalak

我现在明白你的意思了，如果不明显的话，主算法在NodeBuilder.build()中 - 我可能本可以更好地总结一下。 - matt b

4

假设您已经了解根元素的编号为零，以下是将输出文本的伪代码:

function PrintLevel (int curr, int level)
    //print the indents
    for (i=1; i<=level; i++)
        print a tab
    print curr \n;
    for each child in the table with a parent of curr
        PrintLevel (child, level+1)


for each elementID where the parentid is zero
    PrintLevel(elementID, 0)

- wcm

3

您可以使用哈希映射模拟任何其他数据结构，因此这并不是一个可怕的限制。从上到下扫描，为数据库的每一行创建一个哈希映射，其中包含每个列的条目。将这些哈希映射添加到“主”哈希映射中，以id为键。如果任何节点有一个您尚未看到的“父级”，请在主哈希映射中创建一个占位符条目，并在看到实际节点时填充它。

要打印它，只需通过数据进行简单的深度优先遍历，同时跟踪缩进级别。您可以通过为每一行保留一个“子项”条目并在扫描数据时填充它来使此过程更加容易。

至于是否有更好的方法在数据库中存储树形结构，这取决于您如何使用数据。我见过一些系统，其具有已知的最大深度，这些系统在层次结构的每个级别中使用不同的表格。如果树中的级别毕竟不相等（顶级类别与叶子不同），那么这是很有意义的。

- Mark Bessey

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Bill Karwin · Accepted Answer

现在MySQL 8.0支持递归查询, 我们可以说所有流行的SQL数据库都支持标准语法下的递归查询。

WITH RECURSIVE MyTree AS (
    SELECT * FROM MyTable WHERE ParentId IS NULL
    UNION ALL
    SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;

我在2017年的演示中测试了MySQL 8.0中的递归查询Recursive Query Throwdown。

以下是我2008年的原始答案：

在关系数据库中存储树形数据有几种方法。你在示例中展示了两种方法：

相邻列表（“parent”列）和
路径枚举（名称列中的点号数字）。

另一种解决方案称为嵌套集，它也可以存储在同一张表中。阅读Joe Celko的《SQL for Smarties中的树和层次结构》以获取更多关于这些设计的信息。

我通常更喜欢使用称为闭包表（也称为“相邻关系”）的设计来存储树形数据。它需要另一张表，但查询树很容易。

我在我的演示文稿使用SQL和PHP模型处理分层数据和我的书籍 SQL Antipatterns Volume 1：避免数据库编程陷阱中介绍了闭包表。

CREATE TABLE ClosureTable (
  ancestor_id   INT NOT NULL REFERENCES FlatTable(id),
  descendant_id INT NOT NULL REFERENCES FlatTable(id),
  PRIMARY KEY (ancestor_id, descendant_id)
);

在闭包表中存储所有路径，其中一个节点直接从另一个节点继承。为每个节点包括一行引用自身的记录。例如，使用您在问题中展示的数据集：

INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
  (1,1), (1,2), (1,4), (1,6),
  (2,2), (2,4),
  (3,3), (3,5),
  (4,4),
  (5,5),
  (6,6);

现在您可以像这样从节点1开始获取一棵树：

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;

输出结果（在MySQL客户端中）如下所示：

+----+
| id |
+----+
|  1 | 
|  2 | 
|  4 | 
|  6 | 
+----+

换句话说，节点3和5被排除在外，因为它们属于一个单独的层级结构，不是从节点1下降的。

关于直接子代（或直接父代）的评论，您可以在 ClosureTable 中添加一个 "path_length" 列，以便更轻松地查询特定的直接子代或父代（或任何其他距离）。

INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
  (1,1,0), (1,2,1), (1,4,2), (1,6,1),
  (2,2,0), (2,4,1),
  (3,3,0), (3,5,1),
  (4,4,0),
  (5,5,0),
  (6,6,0);

然后，您可以在搜索中添加一个术语，以查询给定节点的直接子代。这些是路径长度为1的后代。

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
  AND path_length = 1;

+----+
| id |
+----+
|  2 | 
|  6 | 
+----+

针对@ashraf的评论："如何按名称对整个树进行排序？"

以下是一个示例查询，返回所有作为节点1的后代的节点，将它们与包含其他节点属性（例如name）的FlatTable连接，并按名称排序。

SELECT f.name
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;

来自@Nate的评论：

SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id) 
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id) 
WHERE a.ancestor_id = 1 
GROUP BY a.descendant_id 
ORDER BY f.name

+------------+-------------+
| name       | breadcrumbs |
+------------+-------------+
| Node 1     | 1           |
| Node 1.1   | 1,2         |
| Node 1.1.1 | 1,2,4       |
| Node 1.2   | 1,6         |
+------------+-------------+

今天有一个用户提出了一个编辑建议。SO主管批准了这个编辑，但我正在撤销它。

该编辑建议上面的最后一个查询中的ORDER BY应该是ORDER BY b.path_length, f.name，可能是为了确保排序与层次结构匹配。但这行不通，因为它会将“节点1.1.1”排在“节点1.2”之后。

如果您想让排序方式以合理的方式匹配层次结构，则有可能实现，但不能仅通过路径长度排序。例如，请参见我的答案MySQL Closure Table hierarchical database - How to pull information out in the correct order。