geomexp.evaluation package
Submodules
geomexp.evaluation.metrics module
Evaluation metrics for clustering experiments.
Provides wrappers around scikit-learn metrics plus custom diagnostics for bootstrap stability and cluster shape analysis.
- geomexp.evaluation.metrics.adjusted_rand_index(y_true, y_pred)
Adjusted Rand Index between two partitions.
- geomexp.evaluation.metrics.normalized_mutual_info(y_true, y_pred)
Normalised Mutual Information between two partitions.
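Both agreement scores compare partitions rather than label values, so renaming the clusters does not change the result. The sketch below illustrates this with the scikit-learn functions directly; that the geomexp wrappers delegate to exactly these two functions is an assumption based on the module description, not something this page confirms.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

y_true = [0, 0, 1, 1, 2, 2]
y_perm = [2, 2, 0, 0, 1, 1]    # same partition, cluster labels renamed
y_other = [0, 0, 0, 1, 1, 1]   # a genuinely different partition

# Identical partitions score 1.0 regardless of how the labels are named.
ari_same = adjusted_rand_score(y_true, y_perm)
nmi_same = normalized_mutual_info_score(y_true, y_perm)

# A different partition scores strictly below 1.0.
ari_diff = adjusted_rand_score(y_true, y_other)
```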
- geomexp.evaluation.metrics.davies_bouldin(X, labels)
Davies–Bouldin index (lower is better).
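To make "lower is better" concrete, the following sketch scores a sensible labelling and a scrambled labelling of the same six points with scikit-learn's davies_bouldin_score. The toy data is invented for illustration, and using the scikit-learn function in place of the geomexp wrapper is an assumption.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Two tight, well-separated groups of three points each (illustrative data).
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)

good_labels = np.array([0, 0, 0, 1, 1, 1])  # matches the two groups
bad_labels = np.array([0, 1, 0, 1, 0, 1])   # mixes the two groups

db_good = davies_bouldin_score(X, good_labels)
db_bad = davies_bouldin_score(X, bad_labels)
# Compact, well-separated clusters give the smaller index.
```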
- geomexp.evaluation.metrics.stability_score(X, clusterer_factory, n_resamples=20, subsample_frac=0.8, rng=None)
Bootstrap stability of a clustering method.
Repeatedly draws sub-samples, fits the clusterer on each, and measures the mean pairwise ARI on the intersection of each pair of sub-samples.
- Parameters:
X (ndarray[tuple[Any, ...], dtype[double]]) – Data array of shape (n_samples, n_features).
clusterer_factory (Callable[[], object]) – Zero-argument callable returning a fresh clusterer instance (must have a .fit(X) method returning a ClusterResult).
n_resamples (int) – Number of bootstrap resamples.
subsample_frac (float) – Fraction of data to draw per resample.
- Return type:
- Returns:
Mean pairwise ARI across resamples (higher = more stable).
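The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the ClusterResult stand-in with a labels attribute, the ThresholdClusterer toy class, and the choice to sample without replacement are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations

import numpy as np
from sklearn.metrics import adjusted_rand_score


@dataclass
class ClusterResult:
    """Hypothetical stand-in; only a `labels` array is assumed."""
    labels: np.ndarray


class ThresholdClusterer:
    """Hypothetical toy clusterer: splits on the mean of the first feature."""

    def fit(self, X):
        return ClusterResult(labels=(X[:, 0] > X[:, 0].mean()).astype(int))


def stability_sketch(X, clusterer_factory, n_resamples=20,
                     subsample_frac=0.8, rng=None):
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    m = max(2, int(round(subsample_frac * n)))
    runs = []
    for _ in range(n_resamples):
        # Assumption: sub-samples are drawn without replacement.
        idx = rng.choice(n, size=m, replace=False)
        full = np.full(n, -1)              # -1 marks "not in this sub-sample"
        full[idx] = clusterer_factory().fit(X[idx]).labels
        runs.append(full)
    scores = []
    for a, b in combinations(runs, 2):
        common = (a >= 0) & (b >= 0)       # points present in both sub-samples
        if common.sum() >= 2:
            scores.append(adjusted_rand_score(a[common], b[common]))
    return float(np.mean(scores))


# Two clearly separated groups: every sub-sample recovers the same split.
X_demo = np.concatenate([np.zeros((10, 2)), np.full((10, 2), 10.0)])
score_demo = stability_sketch(X_demo, ThresholdClusterer, n_resamples=5, rng=0)
```

Restricting each comparison to the shared points is what lets two runs on different sub-samples be compared at all, since ARI needs both labellings defined on the same items.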
- geomexp.evaluation.metrics.radius_ratio(X, centers, assignments)
Per-cluster radius ratio: max distance / median distance to centroid.
A large ratio indicates elongated or “tendril”-like cluster capture regions.
- Parameters:
- Return type:
- Returns:
Array of shape (n_clusters,) with the radius ratio for each cluster.
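The ratio itself is simple to compute. The sketch below assumes Euclidean distance, assumes that assignments holds integer cluster indices into centers, and returns NaN for empty clusters or a zero median; none of these choices is confirmed by this page.

```python
import numpy as np


def radius_ratio_sketch(X, centers, assignments):
    """Max / median distance to the center, per cluster (illustrative)."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    assignments = np.asarray(assignments)
    ratios = np.full(len(centers), np.nan)
    for k, center in enumerate(centers):
        dists = np.linalg.norm(X[assignments == k] - center, axis=1)
        if dists.size and np.median(dists) > 0:
            ratios[k] = dists.max() / np.median(dists)
    return ratios


# Cluster 0 is round (all points at distance 1); cluster 1 has one far point.
X_demo = [[1, 0], [0, 1], [-1, 0], [11, 0], [10, 1], [15, 0]]
centers_demo = [[0, 0], [10, 0]]
ratios_demo = radius_ratio_sketch(X_demo, centers_demo, [0, 0, 0, 1, 1, 1])
# ratios_demo is [1.0, 5.0]
```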
- geomexp.evaluation.metrics.variation_of_information(y_true, y_pred)
Variation of Information between two partitions.
Defined as \(\mathrm{VI}(U, V) = H(U) + H(V) - 2\,I(U, V)\), using natural logs.
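The definition above translates directly into a few lines of NumPy. The following is a sketch under the stated formula with natural logs; the function name and the contingency-table approach are illustrative choices, not necessarily how the library computes it.

```python
import numpy as np


def variation_of_information_sketch(y_true, y_pred):
    """VI(U, V) = H(U) + H(V) - 2 I(U, V), in nats (illustrative)."""
    u = np.unique(np.asarray(y_true).ravel(), return_inverse=True)[1]
    v = np.unique(np.asarray(y_pred).ravel(), return_inverse=True)[1]
    joint = np.zeros((u.max() + 1, v.max() + 1))
    np.add.at(joint, (u, v), 1.0)      # contingency counts
    joint /= u.size                    # joint distribution p(u, v)
    pu, pv = joint.sum(axis=1), joint.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    nz = joint > 0
    mutual_info = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pu, pv)[nz]))
    return entropy(pu) + entropy(pv) - 2.0 * mutual_info


# Same partition with renamed labels: VI is 0.
vi_same = variation_of_information_sketch([0, 0, 1, 1], [1, 1, 0, 0])
# Independent balanced partitions: I = 0, so VI = 2 ln 2.
vi_indep = variation_of_information_sketch([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike ARI and NMI, VI is a distance: 0 means identical partitions and larger values mean less agreement.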
- geomexp.evaluation.metrics.misclassification_error(y_true, y_pred)
Minimum misclassification error under optimal label permutation.
Uses the Hungarian algorithm to find the permutation of predicted labels that maximises agreement with the ground truth.
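The optimal label matching can be sketched with SciPy's linear_sum_assignment, which solves the same assignment problem as the Hungarian algorithm. Whether the library calls this exact function is an assumption, as is the function name below.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def misclassification_error_sketch(y_true, y_pred):
    """Fraction of points still mislabelled after the best label matching."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    true_ids, t = np.unique(y_true, return_inverse=True)
    pred_ids, p = np.unique(y_pred, return_inverse=True)
    # contingency[i, j] = number of points with true label i, predicted label j
    contingency = np.zeros((true_ids.size, pred_ids.size), dtype=int)
    np.add.at(contingency, (t, p), 1)
    # Pick the one-to-one matching that maximises the number of agreements.
    rows, cols = linear_sum_assignment(contingency, maximize=True)
    return 1.0 - contingency[rows, cols].sum() / y_true.size


# A pure relabelling has zero error.
err_perm = misclassification_error_sketch([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 0, 0])
# One point out of six cannot be matched under any permutation.
err_one = misclassification_error_sketch([0, 0, 1, 1, 2, 2], [1, 1, 2, 0, 0, 0])
```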
- geomexp.evaluation.metrics.run_methods(X, methods, n_inits=20, base_seed=0)
Fit several clustering methods, keeping the best-of-n_inits run.
Each entry in methods is a dict with keys "name", "cls", and "kwargs" (passed to the constructor). For each method the algorithm is re-initialised n_inits times (via random_state) and the run with the lowest objective is kept.
- Parameters:
- Return type:
- Returns:
Dict mapping method name to its best ClusterResult.
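The best-of-n_inits loop can be sketched as below. Several details are assumptions made for illustration only: the seed for run i is taken to be base_seed + i, ClusterResult is a stand-in exposing an objective attribute, and RandomCenters is a hypothetical toy method rather than anything shipped with the package.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ClusterResult:
    """Hypothetical stand-in; `labels` and `objective` are assumed fields."""
    labels: np.ndarray
    objective: float


class RandomCenters:
    """Hypothetical toy method: uses k randomly chosen data points as centers."""

    def __init__(self, k=2, random_state=None):
        self.k = k
        self.random_state = random_state

    def fit(self, X):
        rng = np.random.default_rng(self.random_state)
        centers = X[rng.choice(len(X), size=self.k, replace=False)]
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return ClusterResult(labels=d2.argmin(axis=1),
                             objective=float(d2.min(axis=1).sum()))


def run_methods_sketch(X, methods, n_inits=20, base_seed=0):
    best = {}
    for spec in methods:
        for i in range(n_inits):
            # Assumption: each restart differs only in its random_state.
            model = spec["cls"](random_state=base_seed + i, **spec["kwargs"])
            result = model.fit(X)
            current = best.get(spec["name"])
            if current is None or result.objective < current.objective:
                best[spec["name"]] = result
    return best


X_demo = np.array([[0.0], [1.0], [10.0], [11.0]])
methods_demo = [{"name": "random-2", "cls": RandomCenters, "kwargs": {"k": 2}}]
best_demo = run_methods_sketch(X_demo, methods_demo, n_inits=20, base_seed=0)
single_demo = RandomCenters(k=2, random_state=0).fit(X_demo)
```

Because the seed-0 run is one of the twenty restarts, the kept result can never have a higher objective than that single run.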
Module contents
Evaluation utilities for clustering experiments.
- geomexp.evaluation.adjusted_rand_index(y_true, y_pred)
Adjusted Rand Index between two partitions.
- geomexp.evaluation.misclassification_error(y_true, y_pred)
Minimum misclassification error under optimal label permutation.
Uses the Hungarian algorithm to find the permutation of predicted labels that maximises agreement with the ground truth.
- geomexp.evaluation.normalized_mutual_info(y_true, y_pred)
Normalised Mutual Information between two partitions.
- geomexp.evaluation.radius_ratio(X, centers, assignments)
Per-cluster radius ratio: max distance / median distance to centroid.
A large ratio indicates elongated or “tendril”-like cluster capture regions.
- Parameters:
- Return type:
- Returns:
Array of shape (n_clusters,) with the radius ratio for each cluster.
- geomexp.evaluation.run_methods(X, methods, n_inits=20, base_seed=0)
Fit several clustering methods, keeping the best-of-n_inits run.
Each entry in methods is a dict with keys "name", "cls", and "kwargs" (passed to the constructor). For each method the algorithm is re-initialised n_inits times (via random_state) and the run with the lowest objective is kept.
- Parameters:
- Return type:
- Returns:
Dict mapping method name to its best ClusterResult.
- geomexp.evaluation.stability_score(X, clusterer_factory, n_resamples=20, subsample_frac=0.8, rng=None)
Bootstrap stability of a clustering method.
Repeatedly draws sub-samples, fits the clusterer on each, and measures the mean pairwise ARI on the intersection of each pair of sub-samples.
- Parameters:
X (ndarray[tuple[Any, ...], dtype[double]]) – Data array of shape (n_samples, n_features).
clusterer_factory (Callable[[], object]) – Zero-argument callable returning a fresh clusterer instance (must have a .fit(X) method returning a ClusterResult).
n_resamples (int) – Number of bootstrap resamples.
subsample_frac (float) – Fraction of data to draw per resample.
- Return type:
- Returns:
Mean pairwise ARI across resamples (higher = more stable).