Determining the ideal number of clusters for sequence-based (distance) clustering

Question

r cluster-analysis traminer sequence-alignment

I have written the following functions to cluster sequence-based data:

library(TraMineR)
library(cluster)

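# Build a Ward hierarchical clustering from optimal matching (OM) distances
clustering <- function(data){
  # Create a state sequence object, deleting missing states (left, gaps, right)
  data <- seqdef(data, left = "DEL", gaps = "DEL", right = "DEL")
  # Constant substitution-cost matrix
  couts <- seqsubm(data, method = "CONSTANT")
  # Pairwise OM dissimilarities between the sequences
  data.om <- seqdist(data, method = "OM", indel = 3, sm = couts)
  clusterward <- agnes(data.om, diss = TRUE, method = "ward")
  clusterward
}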

rc <- clustering(rubinius_sequences)

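cluster_cut <- function(data, clusterward, n_clusters, name_clusters){
  data <- seqdef(data, left = "DEL", gaps = "DEL", right = "DEL")
  # Cut the tree into n_clusters groups
  clusters <- cutree(clusterward, k = n_clusters)
  # One label per cluster, so the function works for any n_clusters
  clusters <- factor(clusters, labels = paste("Type", seq_len(n_clusters)))
  # Return the sequences that belong to the requested cluster
  data[clusters == name_clusters, ]
}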

# Pass the same data that produced rc, so cluster memberships line up with the rows
rc1 <- cluster_cut(rubinius_sequences, rc, 4, "Type 1")

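However, the number of clusters here is chosen arbitrarily. Is there a way to show that the amount of variance captured (or a similar measure) starts to hit diminishing returns at a certain number of clusters? I imagine something similar to a scree plot in factor analysis.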

- histelheim

1 Answer

- histelheim · Accepted Answer

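library(WeightedCluster)
# rubinius.dist is assumed to be the dissimilarity matrix returned by seqdist()
# wcKMedRange() runs k-medoids (PAM) for each k in 2:10 and computes quality measures
(agnesRange <- wcKMedRange(rubinius.dist, 2:10))
# Plot selected quality measures against the number of clusters
plot(agnesRange, stat = c("ASW", "HG", "PBC"), lwd = 5)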

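This gives you several quality measures for finding the ideal number of clusters (here ASW, the average silhouette width; HG, Hubert's Gamma; and PBC, the point biserial correlation), along with a plot. More information about these measures can be found here (under "Cluster quality"): http://mephisto.unige.ch/weightedcluster/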
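To see what one of these measures is doing, here is a minimal sketch that computes the average silhouette width by hand for a Ward tree, using only the cluster package. The data are synthetic (three groups of points, an assumption made purely so the example is self-contained); with real sequences, `d` would be the dissimilarity matrix from `seqdist()`. The names `points`, `d`, `tree`, `asw`, and `best_k` are illustrative and not taken from the question.

```r
library(cluster)

# Synthetic stand-in for the sequence dissimilarities (assumption):
# three well-separated groups of 20 points each.
set.seed(42)
centers <- rep(c(0, 5, 10), each = 20)
points <- cbind(rnorm(60, mean = centers, sd = 0.5),
                rnorm(60, mean = centers, sd = 0.5))
d <- dist(points)

# Ward clustering on the dissimilarities, as in the question
tree <- agnes(d, diss = TRUE, method = "ward")

# Average silhouette width (ASW) for each candidate number of clusters
k_values <- 2:10
asw <- sapply(k_values, function(k) {
  membership <- cutree(as.hclust(tree), k = k)
  sil <- silhouette(membership, d)
  mean(sil[, "sil_width"])
})

# Candidate with the highest ASW
best_k <- k_values[which.max(asw)]

# Scree-like plot: look for the peak or the point where the curve flattens
plot(k_values, asw, type = "b",
     xlab = "Number of clusters", ylab = "Average silhouette width")
```

A higher ASW means tighter, better-separated clusters, so the peak of this curve plays roughly the role of the elbow in a scree plot. It is usually worth checking whether the other measures agree before settling on a number.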