How to make a googleVis multiple Sankey from a data frame?


Aim

I am aiming to make a multiple Sankey in R using the googleVis package. The output should look similar to this:

[image]

Data

I've created some dummy data in R:

set.seed(1)

source <- sample(c("North","South","East","West"),100,replace=TRUE)
mid <- sample(c("North ","South ","East ","West "),100,replace=TRUE)
# N.B. The trailing spaces (one for mid, two for destination) keep the labels of
# each level distinct, which avoids a cycle
destination <- sample(c("North  ","South  ","East  ","West  "),100,replace=TRUE)
dummy <- rep(1,100) # For aggregation

dat <- data.frame(source,mid,destination,dummy)
aggdat <- aggregate(dummy~source+mid+destination,dat,sum)

What I've tried

I can build a Sankey with two variables fine if there is only a source and a destination, but not once a mid point is involved:

aggdat <- aggregate(dummy~source+destination,dat,sum)

library(googleVis)

p <- gvisSankey(aggdat,from="source",to="destination",weight="dummy")
plot(p)

This code produces the following:

[image]

Question

How do I modify

p <- gvisSankey(aggdat,from="source",to="destination",weight="dummy")

so that it accepts the mid variable as well?

1 Answer


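The function gvisSankey does accept mid-levels directly. These levels have to be encoded in the underlying data, so that every node name is unique to its level.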

 source <- sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), 100, replace=TRUE)
 mid <- sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), 100, replace=TRUE)
 destination <- sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), 100, replace=TRUE)
 dummy <- rep(1,100) # For aggregation

 # Rebuild dat so that the steps below use the relabelled columns
 dat <- data.frame(source, mid, destination, dummy)
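
If the data already exists with the same labels in every column, the levels could also be encoded by appending a suffix per column instead of resampling. The following is only a minimal sketch of that idea: the names `raw`, `labelled` and `suffixes` are hypothetical, and the suffixes simply mirror the ones used above.

```r
# Hypothetical starting point: the same four labels reused in every column.
set.seed(1)
regions <- c("North", "South", "East", "West")
raw <- data.frame(
  source      = sample(regions, 100, replace = TRUE),
  mid         = sample(regions, 100, replace = TRUE),
  destination = sample(regions, 100, replace = TRUE),
  dummy       = 1,
  stringsAsFactors = FALSE
)

# Append a level-specific suffix so that each node name belongs to one level only.
suffixes <- c(source = "Src", mid = "Mid", destination = "Des")
labelled <- raw
for (col in names(suffixes)) {
  labelled[[col]] <- paste0(trimws(raw[[col]]), suffixes[[col]])
}

# Collect the distinct labels of each level and check that none is shared.
all_labels <- unlist(lapply(labelled[names(suffixes)], unique))
any(duplicated(all_labels))
```

The final check should come back FALSE when no label is shared between levels, which is the condition the Sankey chart needs in order to avoid cycles.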

Now, we'll reshape the original data:

 library(dplyr)

 datSM <- dat %>%
  group_by(source, mid) %>%
  summarise(toMid = sum(dummy) ) %>%
  ungroup()

Data frame datSM summarises the number of units going from source to mid.

  datMD <- dat %>%
   group_by(mid, destination) %>%
   summarise(toDes = sum(dummy) ) %>%
   ungroup()

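Data frame datMD summarises the number of units going from mid to destination. It will be appended to datSM to form the final data frame. Both data frames need to be ungrouped and must have the same colnames: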

  colnames(datSM) <- colnames(datMD) <- c("From", "To", "Dummy")
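
For anyone who would rather not load dplyr, the same two link tables can probably be built with base R's aggregate(), as the question already does for the two-column case. This is a sketch under the assumption that dat has the columns shown above; it is recreated here only so the snippet runs on its own.

```r
# Recreate the example data (same structure as dat above).
set.seed(1)
dat <- data.frame(
  source      = sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), 100, replace = TRUE),
  mid         = sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), 100, replace = TRUE),
  destination = sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), 100, replace = TRUE),
  dummy       = 1
)

# One aggregated table per step: source -> mid and mid -> destination.
datSM <- aggregate(dummy ~ source + mid, data = dat, FUN = sum)
datMD <- aggregate(dummy ~ mid + destination, data = dat, FUN = sum)

# Same column names in both tables so they can be stacked with rbind().
colnames(datSM) <- colnames(datMD) <- c("From", "To", "Dummy")

# Sanity check: each step should account for all 100 units.
c(sum(datSM$Dummy), sum(datMD$Dummy))
```

aggregate() returns plain data frames, so no ungroup step is needed in this variant.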

Because datMD is appended last and its From values match the To values of datSM, gvisSankey will automatically recognise the mid step.

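  library(googleVis)

  datVis <- rbind(datSM, datMD)

  # The weight column was renamed to "Dummy" above, so it must be referenced with that name
  p <- gvisSankey(datVis, from="From", to="To", weight="Dummy")
  plot(p)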

Here is the plot: [image: multi-level Sankey]
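
The same pattern should extend to more than three levels: aggregate each pair of adjacent columns and stack the results. Below is a rough sketch of that generalisation. `sankey_links` is a hypothetical helper written for illustration, not part of googleVis, and it assumes the labels are already unique per level.

```r
# Hypothetical helper: build one From/To/Weight table from an ordered set of level columns.
sankey_links <- function(df, levels, weight) {
  stopifnot(length(levels) >= 2)
  steps <- lapply(seq_len(length(levels) - 1), function(i) {
    # Sum the weight for every (from, to) pair of this step.
    agg <- aggregate(df[[weight]],
                     by = list(From = df[[levels[i]]], To = df[[levels[i + 1]]]),
                     FUN = sum)
    names(agg)[3] <- "Weight"
    agg
  })
  # Stack the steps in order, first step first.
  do.call(rbind, steps)
}

# Example data with the same structure as dat above.
set.seed(1)
dat <- data.frame(
  source      = sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), 100, replace = TRUE),
  mid         = sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), 100, replace = TRUE),
  destination = sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), 100, replace = TRUE),
  dummy       = 1
)

links <- sankey_links(dat, c("source", "mid", "destination"), "dummy")

# Draw the chart only in an interactive session where googleVis is installed.
if (interactive() && requireNamespace("googleVis", quietly = TRUE)) {
  plot(googleVis::gvisSankey(links, from = "From", to = "To", weight = "Weight"))
}
```

Adding a fourth level would then only mean passing one more column name in `levels`, provided its labels do not clash with those of the other levels.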
