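I want to use the silhouette score in my script so that the number of clusters for k-means clustering (from sklearn) is determined automatically.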
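For reference, the per-sample silhouette coefficient is defined as follows. Here a(i) is the mean distance from sample i to the other points in its own cluster, and b(i) is the mean distance from i to the points of the nearest other cluster. sklearn's `silhouette_score` returns the mean of s(i) over all n samples, so the score lies in [-1, 1] and higher means better-separated clusters:

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\text{silhouette\_score} = \frac{1}{n}\sum_{i=1}^{n} s(i)
```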
import numpy as np
import pandas as pd
import csv
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
filename = "CSV_BIG.csv"
# Read the CSV file with the Pandas lib.
path_dir = ".\\"
dataframe = pd.read_csv(path_dir + filename, encoding = "utf-8", sep = ';' ) # "ISO-8859-1")
df = dataframe.copy(deep=True)
# Use the silhouette score
range_n_clusters = list(range(2, 10))
print("Number of clusters from 2 to 9: \n", range_n_clusters)
for n_clusters in range_n_clusters:
    clusterer = KMeans(n_clusters=n_clusters).fit(?)
    preds = clusterer.predict(?)
    centers = clusterer.cluster_centers_
    score = silhouette_score(?, preds, metric='euclidean')
    print("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))
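Can someone help me with the question marks? I don't understand what I should put in their place. I copied the code from an example. The commented-out part is the previous version, in which I ran k-means with a fixed number of 4 clusters. The code worked that way, but in my project I need the number of clusters to be chosen automatically.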
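A minimal sketch of the loop with the question marks filled in. All three `?` take the same thing: the feature matrix being clustered, i.e. the data read from the CSV (assumed here to consist of numeric feature columns). Since CSV_BIG.csv is not available, synthetic data from `make_blobs` stands in for it, and the helper name `choose_n_clusters` is hypothetical. Picking the k with the highest mean silhouette score is one common selection rule, not the only one.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def choose_n_clusters(X, range_n_clusters):
    """Hypothetical helper: fit k-means for each candidate k, return (best_k, scores).

    X is the feature matrix, i.e. what goes in place of each "?".
    """
    scores = {}
    for n_clusters in range_n_clusters:
        # Fit on the data, then assign each row to its nearest cluster center.
        clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
        preds = clusterer.predict(X)
        # The silhouette score is computed on the same data and its labels.
        score = silhouette_score(X, preds, metric="euclidean")
        scores[n_clusters] = score
        print("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))
    # Take the k with the highest mean silhouette score as the "best" k.
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for CSV_BIG.csv (assumption): 3 well-separated blobs, 2 features.
X_demo, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)
df = pd.DataFrame(X_demo, columns=["x", "y"])

# Keep only numeric columns; KMeans needs numeric input, not text columns.
X = df.select_dtypes(include=[np.number]).to_numpy()

best_k, scores = choose_n_clusters(X, list(range(2, 10)))
print("Best number of clusters:", best_k)
```

In the real script, `df` would come from `pd.read_csv(...)` as in the question rather than from `make_blobs`.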