How to visualize cluster boundaries

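I generated several datasets and used a classifier to predict the distribution of the clusters. I need to draw the boundaries between the clusters on the plot, either as lines or as filled regions; it doesn't matter which. Please let me know if there is any way to do this.
My code: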
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_moons, make_circles
from sklearn.model_selection import train_test_split

n_sample = 2000

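def make_square(n_sample):
    # Uniform points in the unit square, labelled in a checkerboard pattern:
    # 1 when both coordinates are on the same side of 0.5, otherwise 0.
    # Returns (X, y) like make_circles/make_moons; the original
    # np.array([0, []]) container raises an error on recent NumPy versions.
    X = np.random.sample((n_sample, 2))
    upper = (X[:, 0] > 0.5) & (X[:, 1] > 0.5)
    lower = (X[:, 0] < 0.5) & (X[:, 1] < 0.5)
    y = (upper | lower).astype(int)
    return X, y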

datasets = [
    make_circles(n_samples=n_sample, noise=0.09, factor=0.5),
    make_square(n_sample),
    make_moons(n_samples=n_sample, noise=0.12),
]

ks=[]
for data in datasets:
    X,y = data[0],data[1]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=33) 
    classifier = KNeighborsClassifier(n_neighbors=1) 
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    acc =  classifier.score(X_test, y_test)
    accs = []
    for i in range(1, 8):
        knn = KNeighborsClassifier(n_neighbors=i)
        knn.fit(X_train, y_train)
        pred_i = knn.predict(X_test)
        acc0 =  knn.score(X_test, y_test)
        accs.append(acc0)
    plt.figure(figsize=(12, 6))
    plt.plot(range(1, 8), accs, color='red', linestyle='dashed', marker='o',
            markerfacecolor='blue', markersize=10)
    plt.title('accs Score K Value')
    plt.xlabel('K Value')
    plt.ylabel('accs Score')
    print("Max Score:", max(accs), "k=",accs.index(max(accs))+1)
    ks.append(accs.index(max(accs))+1)

for i in range(3):
    data = datasets[i]
    k = ks[i]
    X,y = data[0],data[1]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=33) 
    classifier = KNeighborsClassifier(n_neighbors=k) 
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    plt.figure(figsize=(9,9))
    plt.title("Test")
    plt.scatter(X_test[:,0], X_test[:,1], c=y_test)
    plt.figure(figsize=(9,9))
    plt.title("Predict")
    plt.scatter(X_test[:,0], X_test[:,1], c=y_pred)

Example output:

[image] [image]


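Do you need the decision boundary, i.e. the line or lines along which a sample is equally likely to be assigned to either cluster, or "just" a contour based on all the samples of each cluster? - Raketenolli
1 Answer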


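scikit-learn 1.1 introduced DecisionBoundaryDisplay to help with this kind of task.

After importing make_moons and KNeighborsClassifier, we can fit the classifier to the dataset, call the DecisionBoundaryDisplay.from_estimator() method, and then scatter the X data on the returned axes: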

import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.inspection import DecisionBoundaryDisplay

X, y = make_moons(noise=0.2)
clf = KNeighborsClassifier().fit(X, y)

disp = DecisionBoundaryDisplay.from_estimator(clf, X, response_method="predict", alpha=0.3)
disp.ax_.scatter(X[:, 0], X[:, 1], c=y)
plt.show()

This produces something like the following:

[Image: noisy moons dataset showing a two-class classification problem and a margin that roughly separates the purple points from the yellow points.]
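
If DecisionBoundaryDisplay is not available (scikit-learn older than 1.1), or if you want the boundary drawn as a line in addition to the filled regions, a rough manual equivalent is to predict the class on a regular grid and pass the result to matplotlib's contourf and contour. The sketch below is only an illustration under some assumptions: the helper names plot_decision_regions and demo and the parameters n_grid and pad are made up for this example, and the boundary line at level 0.5 assumes binary labels 0 and 1.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier


def plot_decision_regions(clf, X, y, ax, n_grid=200, pad=0.5):
    """Hypothetical helper: shade predicted regions and outline the boundary."""
    # Build a regular grid that covers the data plus some padding.
    x_min, x_max = X[:, 0].min() - pad, X[:, 0].max() + pad
    y_min, y_max = X[:, 1].min() - pad, X[:, 1].max() + pad
    xx, yy = np.meshgrid(
        np.linspace(x_min, x_max, n_grid),
        np.linspace(y_min, y_max, n_grid),
    )
    # Predict a class for every grid point, then reshape back to the grid.
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    # Filled regions, one colour per predicted class.
    ax.contourf(xx, yy, Z, alpha=0.3)
    # Boundary as a line: level 0.5 lies between the labels 0 and 1.
    ax.contour(xx, yy, Z, levels=[0.5], colors="k")
    ax.scatter(X[:, 0], X[:, 1], c=y, s=10)
    return Z


def demo():
    # Same kind of data as in the question; the values here are arbitrary.
    X, y = make_moons(n_samples=500, noise=0.12, random_state=0)
    clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
    fig, ax = plt.subplots(figsize=(6, 6))
    Z = plot_decision_regions(clf, X, y, ax)
    return fig, Z


if __name__ == "__main__":
    demo()
    plt.show()
```

The same helper can be called once per dataset from the question (circles, squares, moons) with the best k found for each. A finer grid (larger n_grid) gives a smoother boundary at the cost of more predictions.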

