Can't get SciPy hierarchical clustering to work

Question


python scipy cluster-analysis hierarchical-clustering


I wrote a simple script intended to perform hierarchical clustering on a simple test dataset.

[Figure: the test data used]

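I found that the function fclusterdata is a candidate for clustering my data into two clusters. It takes two mandatory arguments: the data set and a threshold.

The problem is that I can't find a threshold that yields the expected two clusters.

I would be glad if someone could tell me what I am doing wrong. I would also be glad if someone could point out other clustering approaches that would suit my case better (I explicitly want to avoid having to specify the number of clusters in advance).

Here is my code: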

import time
import scipy.cluster.hierarchy as hcluster
import numpy.random as random
import numpy

import pylab
pylab.ion()

data = random.randn(2,200)

# shift the first 100 points by 10 in both coordinates
data[:,:100] += 10

for i in range(5,15):
    thresh = i/10.
    clusters = hcluster.fclusterdata(numpy.transpose(data), thresh)
    pylab.scatter(*data[:,:], c=clusters)
    pylab.axis("equal")
    title = "threshold: %f, number of clusters: %d" % (thresh, len(set(clusters)))
    print(title)
    pylab.title(title)
    pylab.draw()
    time.sleep(0.5)
    pylab.clf()

Here is the output:

threshold: 0.500000, number of clusters: 129
threshold: 0.600000, number of clusters: 129
threshold: 0.700000, number of clusters: 129
threshold: 0.800000, number of clusters: 75
threshold: 0.900000, number of clusters: 75
threshold: 1.000000, number of clusters: 73
threshold: 1.100000, number of clusters: 58
threshold: 1.200000, number of clusters: 1
threshold: 1.300000, number of clusters: 1
threshold: 1.400000, number of clusters: 1

- moooeeeep

1 Answer


- Diego · Accepted Answer

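Note that there is an error in the function reference. The correct definition of the t parameter is: "The cut-off threshold for the cluster function or the maximum number of clusters (criterion='maxclust')".

So try this: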

clusters = hcluster.fclusterdata(numpy.transpose(data), 2, criterion='maxclust', metric='euclidean', depth=1, method='centroid')
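
For reference, here is a minimal self-contained sketch of that call without the plotting. The seeded generator, the variable names, and the distance cut of 5.0 are my own assumptions for illustration and are not from the question. As far as I understand the SciPy documentation, fclusterdata defaults to criterion='inconsistent', so the thresholds in the question's loop are compared against inconsistency coefficients rather than distances. If you want to avoid fixing the number of clusters, criterion='distance' with a cut somewhere between the within-blob and between-blob distances may be closer to what you are after.

```python
import numpy as np
import scipy.cluster.hierarchy as hcluster

# Synthetic data similar to the question: 200 two-dimensional points,
# one point per row, with the first 100 shifted by 10 in both coordinates.
# The seed is arbitrary and only makes the run reproducible.
rng = np.random.default_rng(0)
points = rng.standard_normal((200, 2))
points[:100] += 10

# Option 1 (from the answer): treat t as the maximum number of flat clusters.
by_count = hcluster.fclusterdata(
    points, 2, criterion="maxclust", metric="euclidean", method="centroid"
)

# Option 2 (assumption): treat t as a distance cut. 5.0 is a guess that lies
# well above the gaps inside each blob and below the gap between the blobs.
by_distance = hcluster.fclusterdata(points, 5.0, criterion="distance")

n_by_count = len(set(by_count))
n_by_distance = len(set(by_distance))
print("maxclust:", n_by_count, "clusters; distance:", n_by_distance, "clusters")
```

The distance cut has to be chosen for the data at hand, so the value 5.0 only makes sense for blobs separated as widely as these.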