如何使用R中的igraph获取有向子树中的所有叶节点?

4
给定一个由包含符号边缘列表的数据框构建的有向树(igraph图形):
library(igraph)
library(ggraph)

#Create a symbolic edge list.
edgelist_df <- data.frame("from" = c("A", "A", "A", "B", "B", "B", "C", "D", "D", "E", 
                                     "E", "F", "G", "G", "H", "I", "I", "J", "J", "J"),
                          "to"   = c("B", "C", "D", "E", "F", "G", "H", "I", "J", "K", 
                                     "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U"))

#Create a directed tree from this edgelist. 
graph <- graph_from_data_frame(d = edgelist_df, directed = TRUE)

绘制树形图。这里我使用ggraph包中的ggraph函数。

ggraph(graph = graph, 
       layout = 'dendrogram', 
       circular = FALSE) +
  geom_edge_diagonal() +
  geom_node_point() +
  geom_node_text(aes(label = name),
                 angle = 0,
                 hjust = 1.5,
                 nudge_y = 0,
                 size = 5) +
  theme_void()

enter image description here

问题是如何返回一个字符向量,其中包含由一个节点指定的子树中所有叶节点的名称,该节点表示该子树的根节点。例如:
  • 如果节点为“B”,则属于以“B”为根节点的子树的所有叶节点为:“K”,“L”,“M”,“N”和“O”。
  • 如果节点为“H”,则属于以“H”为根节点的子树的所有叶节点为:“P”。
  • 如果节点为“A”,则属于以“A”为根节点(即原始树)的子树中的所有叶节点为:“K”,“L”,“M”,“N”,“O”,“P”,“Q”,“R”,“S”,“T”和“U”。
3个回答

3
您可以按以下方式使用distances()degree定义函数f
f <- function(g, r) {
  names(V(g))[is.finite(distances(g, r, mode = "out")) & degree(g) == 1]
}

它提供了

> f(g, "B")
[1] "K" "L" "M" "N" "O"

1
假设对象 graph 确实是一棵树,则下面的函数会给出所需的结果。
determine_leaf_nodes_subtree <- function(graph, vertex){

#Determine the name of the root vertex.
root               <- V(graph)$name[degree(graph = graph, v = V(graph), mode = "in") == 0]

#Determine the name(s) of the leaf vertex/vertices.
leafs              <- V(graph)$name[degree(graph = graph, v = V(graph), mode = "out") == 0]

#Calculate the tree depth. That is, the largest path length of all shortest paths from root 
#to leaf vertices.
sh_paths           <- shortest_paths(graph = graph, from = root, to = leafs)$vpath 
max_sh_path_length <- max(sapply(X = sh_paths, FUN = length))

#If 'vertex' is a leaf node itself, then return the name of that vertex.
if(vertex %in% leafs){

  return(vertex)

} else {
  
  #If 'vertex' is not a leaf node, then determine the subset of all vertices that are in the 
  #neighborhood of 'vertex', excluding 'vertex' itself ('mindist' = 1). The maximum order or 
  #'depth' is specified by ('max_sh_path_length')
  vertices_subset <- neighborhood(graph = graph, 
                                  order = max_sh_path_length - 1, 
                                  nodes = V(graph)[V(graph)$name == vertex], 
                                  mode = "out",
                                  mindist = 1)
  
  #Extract the names of the vertices in 'vertices_subset'
  vertices_subset_names <- names(unlist(vertices_subset))
  
  #The overlap/intersection of vertex names between 'vertices_subset_names' and 'leafs' gives 
  #all leaf nodes that are part of the subtree with root vertex 'vertex'.
  result <- intersect(x = vertices_subset_names, leafs)

  return(result)
 }
}

关于问题中提到的示例,此函数给出以下输出:

determine_leaf_nodes_subtree(graph = graph, vertex = "B")
[1] "K" "L" "M" "N" "O"

determine_leaf_nodes_subtree(graph = graph, vertex = "H")
[1] "P"

determine_leaf_nodes_subtree(graph = graph, vertex = "A")
[1] "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U"

1
你可以对ego进行两次调用:第一次是获取从父节点到子节点可达的所有节点(mode="out"),第二次是查看这些选定的节点是否有子节点(如果没有,则它们是叶节点)。
fun <- function(graph, node="B"){
  path <- ego(graph, order=length(V(graph)), nodes=node, mode="out")
  nms <- names(path[[1]])
  nms[ego_size(graph, order=1, nodes=nms, mode="out", mindist=1) == 0]
}

这个产生了什么。
fun(graph, "B")
# [1] "K" "L" "M" "N" "O"
fun(graph, "H")
# [1] "P"
fun(graph, "A")
# [1] "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U"

实际上,可以借鉴彼得的回答,将对ego_size的调用替换为。
nms[degree(graph, v=nms, mode="out") == 0]

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接