objective
KMeans clustering and its MiniBatch variant for large datasets.
minimal code
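from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
# Synthetic data: 200 points drawn around 3 centers
X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
# inertia_ is the within-cluster sum of squared distances, so it is never negative
print(KMeans(n_clusters=3, n_init=10, random_state=0).fit(X).inertia_ >= 0.0)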
usage
from sklearn.cluster import MiniBatchKMeans
# cluster_centers_ has one row per cluster; X is the array built in the minimal code
print(MiniBatchKMeans(n_clusters=3, random_state=0).fit(X).cluster_centers_.shape[0] == 3)
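For data that does not fit in memory, MiniBatchKMeans can also be trained incrementally with partial_fit. The sketch below is only an illustration: X_big, the chunk count of 10, and n_init=3 are assumptions, and the array is generated in memory as a stand-in for chunks that would normally be read from disk or a stream.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Stand-in for a large dataset (assumption: real chunks would be loaded lazily)
X_big, _ = make_blobs(n_samples=10_000, centers=3, random_state=0)

mbk = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0)

# Update the centers one chunk at a time; 10 chunks of 1000 rows is an arbitrary choice
for chunk in np.array_split(X_big, 10):
    mbk.partial_fit(chunk)

centers = mbk.cluster_centers_      # one row per cluster, one column per feature
labels = mbk.predict(X_big[:5])     # cluster index for the first 5 points
print(centers.shape)
```

Each partial_fit call only needs the current chunk, so peak memory depends on the chunk size rather than on the full dataset.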
useful variant(s)
from sklearn.metrics import silhouette_score
# The silhouette score lies in [-1, 1]; higher values mean better-separated clusters
print(silhouette_score(X, KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)) <= 1.0)
notes
- Try several values of k; standardize the features if their scales differ.
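Both notes can be combined in one short sketch: standardize first, then compare several k by silhouette score. The range 2 to 6 and the choice of silhouette as the selection criterion are assumptions made for illustration; other criteria (for example an inertia elbow) may suit a given dataset better.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# Standardize so that every feature has zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

scores = {}
for k in range(2, 7):  # silhouette_score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# Keep the k with the highest silhouette score
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

Standardizing matters because KMeans relies on Euclidean distances, so a feature with a much larger scale would otherwise dominate the clustering.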