geomexp.evaluation package

Submodules

geomexp.evaluation.metrics module

Evaluation metrics for clustering experiments.

Provides wrappers around scikit-learn metrics plus custom diagnostics for bootstrap stability and cluster shape analysis.

geomexp.evaluation.metrics.adjusted_rand_index(y_true, y_pred)

Adjusted Rand Index between two partitions.

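Parameters:
  • y_true – Reference (ground-truth) cluster labels, one per sample.

  • y_pred – Predicted cluster labels, same length as y_true.

Return type: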

float

Returns:

ARI in \([-1, 1]\) (1 = perfect agreement; values near 0 indicate chance-level agreement).
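
To make the quantity concrete, here is a minimal, dependency-free sketch of the ARI computed from the contingency table of the two labelings. It is not this package's implementation (the module description says the standard metrics wrap scikit-learn); the name ari_sketch and the handling of degenerate inputs are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def ari_sketch(y_true, y_pred):
    """Adjusted Rand Index from the contingency table of two labelings."""
    n = len(y_true)
    if n < 2:
        return 1.0  # assumed convention: no pairs to disagree on
    # Number of sample pairs grouped together in both, in y_true, in y_pred
    sum_cells = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Label names do not matter, only the grouping of samples does.
print(ari_sketch([0, 0, 1, 1], [1, 1, 0, 0]))
# A partition that cuts across the reference scores below zero.
print(ari_sketch([0, 0, 1, 1], [0, 1, 0, 1]))
```

The second call illustrates that negative values are possible when two partitions agree less than chance would predict.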

geomexp.evaluation.metrics.normalized_mutual_info(y_true, y_pred)

Normalised Mutual Information between two partitions.

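Parameters:
  • y_true – Reference (ground-truth) cluster labels, one per sample.

  • y_pred – Predicted cluster labels, same length as y_true.

Return type: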

float

Returns:

NMI in \([0, 1]\) (1 = perfect agreement).
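
The following sketch shows one common way to compute NMI: mutual information divided by the arithmetic mean of the two entropies. The choice of normalisation is an assumption (this reference does not state which one the package uses), and nmi_sketch and its helpers are hypothetical names.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_info(a, b):
    """Mutual information (in nats) between two labelings."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum(
        c / n * log(c * n / (count_a[i] * count_b[j]))
        for (i, j), c in Counter(zip(a, b)).items()
    )


def nmi_sketch(y_true, y_pred):
    h_true, h_pred = entropy(y_true), entropy(y_pred)
    if h_true == 0.0 and h_pred == 0.0:
        return 1.0  # assumed convention when both partitions are trivial
    # Assumption: normalise by the arithmetic mean of the entropies
    return mutual_info(y_true, y_pred) / ((h_true + h_pred) / 2)


print(nmi_sketch([0, 0, 1, 1], [1, 1, 0, 0]))  # same grouping, renamed labels
print(nmi_sketch([0, 0, 1, 1], [0, 1, 0, 1]))  # statistically independent
```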

geomexp.evaluation.metrics.silhouette(X, labels)

Mean silhouette coefficient.

Parameters:
  • X – Data array of shape (n_samples, n_features).

  • labels – Cluster label for each sample.

Return type:

float

Returns:

Silhouette score in \([-1, 1]\).
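
As an illustration of what the score measures, the sketch below computes the mean silhouette coefficient by hand. It assumes Euclidean distance, at least two clusters, and a score of 0 for singleton clusters; silhouette_sketch is a hypothetical name and not the package's implementation.

```python
from math import dist
from statistics import mean


def silhouette_sketch(X, labels):
    """Mean silhouette coefficient with Euclidean distance."""
    clusters = {}
    for point, label in zip(X, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(X, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            mean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


X = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_sketch(X, [0, 0, 1, 1]))  # tight, well-separated clusters
print(silhouette_sketch(X, [0, 1, 0, 1]))  # clusters that straddle the gap
```

The first labeling yields a score close to 1, while the second is negative because each point is closer to the other cluster than to its own.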

geomexp.evaluation.metrics.davies_bouldin(X, labels)

Davies–Bouldin index (lower is better).

Parameters:
  • X – Data array of shape (n_samples, n_features).

  • labels – Cluster label for each sample.

Return type:

float

Returns:

Davies–Bouldin score (non-negative; 0 is ideal).
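
A hand-rolled sketch of the index may help when interpreting values. It assumes Euclidean distance, at least two clusters, and distinct centroids; davies_bouldin_sketch is a hypothetical name and not the package's implementation.

```python
from math import dist
from statistics import mean


def davies_bouldin_sketch(X, labels):
    """Davies-Bouldin index with Euclidean distance."""
    clusters = {}
    for point, label in zip(X, labels):
        clusters.setdefault(label, []).append(point)
    # Centroid and mean distance-to-centroid (scatter) of each cluster
    centroids = {
        k: tuple(mean(col) for col in zip(*pts)) for k, pts in clusters.items()
    }
    scatter = {
        k: mean(dist(p, centroids[k]) for p in pts) for k, pts in clusters.items()
    }
    # For each cluster, keep its worst (largest) similarity to another cluster
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return mean(worst)


X = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(davies_bouldin_sketch(X, [0, 0, 1, 1]))  # small value: compact, far apart
```

Here each cluster has scatter 0.5 and the centroids are 10 apart, so the index is (0.5 + 0.5) / 10 = 0.1.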

geomexp.evaluation.metrics.stability_score(X, clusterer_factory, n_resamples=20, subsample_frac=0.8, rng=None)

Bootstrap stability of a clustering method.

Repeatedly draws sub-samples, fits a fresh clusterer on each, and computes the ARI for every pair of sub-samples on the points they share. The score is the mean of these pairwise ARIs.

Parameters:
  • X (ndarray[tuple[Any, ...], dtype[double]]) – Data array of shape (n_samples, n_features).

  • clusterer_factory (Callable[[], object]) – Zero-argument callable returning a fresh clusterer instance (must have a .fit(X) method returning a ClusterResult).

  • n_resamples (int) – Number of bootstrap resamples.

  • subsample_frac (float) – Fraction of data to draw per resample.

  • rng (Generator | None) – Optional random generator.

Return type:

float

Returns:

Mean pairwise ARI across resamples (higher = more stable).
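
The procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: sub-samples are drawn without replacement, fit returns a plain list of labels instead of a ClusterResult, a seed replaces the rng argument, and ThresholdClusterer and stability_sketch are hypothetical names.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb


def ari(a, b):
    """Adjusted Rand Index of two equal-length label lists."""
    cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    rows = sum(comb(c, 2) for c in Counter(a).values())
    cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = rows * cols / comb(len(a), 2)
    maximum = (rows + cols) / 2
    return 1.0 if maximum == expected else (cells - expected) / (maximum - expected)


class ThresholdClusterer:
    """Hypothetical stand-in clusterer: splits 1-D data at its mean."""

    def fit(self, X):
        cut = sum(X) / len(X)
        return [int(x > cut) for x in X]  # the real API returns a ClusterResult


def stability_sketch(X, clusterer_factory, n_resamples=20, subsample_frac=0.8, seed=0):
    rng = random.Random(seed)
    size = int(subsample_frac * len(X))
    runs = []
    for _ in range(n_resamples):
        # Assumption: sub-sample indices without replacement
        idx = rng.sample(range(len(X)), size)
        labels = clusterer_factory().fit([X[i] for i in idx])
        runs.append(dict(zip(idx, labels)))  # original index -> label
    scores = []
    for run_a, run_b in combinations(runs, 2):
        shared = sorted(run_a.keys() & run_b.keys())  # points in both sub-samples
        if len(shared) >= 2:
            scores.append(ari([run_a[i] for i in shared], [run_b[i] for i in shared]))
    return sum(scores) / len(scores)


# Two well-separated 1-D groups: every sub-sample recovers the same split.
X = [i * 0.1 for i in range(10)] + [10 + i * 0.1 for i in range(10)]
print(stability_sketch(X, ThresholdClusterer))
```

Passing a factory rather than an instance ensures that each sub-sample is fitted by a fresh clusterer, so no state leaks between resamples.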

geomexp.evaluation.metrics.radius_ratio(X, centers, assignments)

Per-cluster radius ratio: max distance / median distance to centroid.

A large ratio indicates elongated or “tendril”-like cluster capture regions.

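Parameters:
  • X – Data array of shape (n_samples, n_features).

  • centers – Cluster centroids, one row per cluster.

  • assignments – Index of the cluster assigned to each sample.

Return type: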

ndarray[tuple[Any, ...], dtype[double]]

Returns:

Array of shape (n_clusters,) with the radius ratio for each cluster.
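
A small sketch of the ratio, assuming Euclidean distance, assignments given as integer indices into centers, and non-empty clusters with a non-zero median distance. It returns a plain list rather than an ndarray, and radius_ratio_sketch is a hypothetical name.

```python
from math import dist
from statistics import median


def radius_ratio_sketch(X, centers, assignments):
    """Max / median distance to the centroid, one value per cluster."""
    ratios = []
    for k, center in enumerate(centers):
        d = [dist(x, center) for x, a in zip(X, assignments) if a == k]
        ratios.append(max(d) / median(d))
    return ratios


X = [
    (1, 0), (-1, 0), (0, 1), (0, -1),      # cluster 0: all points at distance 1
    (10, 1), (10, -1), (11, 0), (15, 0),   # cluster 1: one far-away point
]
centers = [(0, 0), (10, 0)]
assignments = [0, 0, 0, 0, 1, 1, 1, 1]
print(radius_ratio_sketch(X, centers, assignments))  # → [1.0, 5.0]
```

The round cluster has ratio 1, whereas the single outlying point in the second cluster stretches its maximum distance to five times the median.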

geomexp.evaluation.metrics.variation_of_information(y_true, y_pred)

Variation of Information between two partitions.

Defined as \(\mathrm{VI}(U, V) = H(U) + H(V) - 2\,I(U, V)\), using natural logs.

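Parameters:
  • y_true – Reference (ground-truth) cluster labels, one per sample.

  • y_pred – Predicted cluster labels, same length as y_true.

Return type: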

float

Returns:

Non-negative VI, with 0 indicating identical partitions.
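
The definition above translates directly into code. The sketch below evaluates each term of the formula with natural logarithms; vi_sketch is a hypothetical name, and clamping at zero to absorb floating-point rounding is an assumption.

```python
from collections import Counter
from math import log


def vi_sketch(y_true, y_pred):
    """VI(U, V) = H(U) + H(V) - 2 I(U, V), in nats."""
    n = len(y_true)
    count_u, count_v = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    h_u = -sum(c / n * log(c / n) for c in count_u.values())
    h_v = -sum(c / n * log(c / n) for c in count_v.values())
    mi = sum(
        c / n * log(c * n / (count_u[i] * count_v[j]))
        for (i, j), c in joint.items()
    )
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, h_u + h_v - 2 * mi)


print(vi_sketch([0, 0, 1, 1], [1, 1, 0, 0]))  # identical partitions
print(vi_sketch([0, 0, 1, 1], [0, 1, 0, 1]))  # independent: H(U) + H(V) = 2 ln 2
```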

geomexp.evaluation.metrics.misclassification_error(y_true, y_pred)

Minimum misclassification error under optimal label permutation.

Uses the Hungarian algorithm to find the permutation of predicted labels that maximises agreement with the ground truth.

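Parameters:
  • y_true – Reference (ground-truth) cluster labels, one per sample.

  • y_pred – Predicted cluster labels, same length as y_true.

Return type: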

float

Returns:

Misclassification rate in \([0, 1]\) (0 = perfect agreement).
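
To show what "optimal label permutation" means, the sketch below searches all label matchings by brute force. This is a deliberately simple stand-in for the Hungarian algorithm named above: it gives the same optimum but is only practical for a handful of clusters. The name misclassification_sketch and the padding for unequal label counts are assumptions.

```python
from collections import Counter
from itertools import permutations


def misclassification_sketch(y_true, y_pred):
    """Error rate under the best one-to-one matching of predicted to true labels."""
    table = Counter(zip(y_true, y_pred))  # contingency counts
    true_labels = sorted(set(y_true))
    pred_labels = sorted(set(y_pred))
    # Pad the shorter side; a None label never matches any sample
    k = max(len(true_labels), len(pred_labels))
    true_labels += [None] * (k - len(true_labels))
    pred_labels += [None] * (k - len(pred_labels))
    # Brute force over matchings (the Hungarian algorithm avoids this k! search)
    best = max(
        sum(table[(t, p)] for t, p in zip(perm, pred_labels))
        for perm in permutations(true_labels)
    )
    return 1 - best / len(y_true)


y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]
print(misclassification_sketch(y_true, y_pred))  # one of six samples is misplaced
```

With predicted label 1 matched to true label 0 and predicted label 0 matched to true label 1, five of the six samples agree, so the error is 1/6.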

geomexp.evaluation.metrics.run_methods(X, methods, n_inits=20, base_seed=0)

Fit several clustering methods, keeping the best-of-n_inits run.

Each entry in methods is a dict with keys "name", "cls", and "kwargs" (passed to the constructor). For each method the algorithm is re-initialised n_inits times (via random_state) and the run with the lowest objective is kept.

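Parameters:
  • X – Data array of shape (n_samples, n_features).

  • methods – List of method specifications, each a dict with keys "name", "cls", and "kwargs".

  • n_inits – Number of re-initialisations per method.

  • base_seed – Base random seed for the re-initialisations.

Return type: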

dict[str, ClusterResult]

Returns:

Dict mapping method name to its best ClusterResult.
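
The best-of-n_inits pattern can be sketched as below. Everything except the shape of the methods entries is an assumption made for illustration: ToyResult stands in for ClusterResult (including its objective attribute), RandomSplit is a hypothetical 1-D method, and deriving each run's random_state as base_seed + i is a guess at how seeds are assigned.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyResult:  # hypothetical stand-in for ClusterResult
    labels: list
    objective: float


class RandomSplit:  # hypothetical method: split 1-D data at a random cut point
    def __init__(self, random_state=0):
        self.random_state = random_state

    def fit(self, X):
        cut = random.Random(self.random_state).uniform(min(X), max(X))
        labels = [int(x > cut) for x in X]
        objective = 0.0  # within-cluster sum of squares
        for k in (0, 1):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:
                center = sum(members) / len(members)
                objective += sum((x - center) ** 2 for x in members)
        return ToyResult(labels, objective)


def run_methods_sketch(X, methods, n_inits=20, base_seed=0):
    best = {}
    for spec in methods:
        # Assumption: run i uses random_state = base_seed + i
        runs = (
            spec["cls"](random_state=base_seed + i, **spec["kwargs"]).fit(X)
            for i in range(n_inits)
        )
        best[spec["name"]] = min(runs, key=lambda r: r.objective)
    return best


X = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
methods = [{"name": "random_split", "cls": RandomSplit, "kwargs": {}}]
results = run_methods_sketch(X, methods)
print(results["random_split"].objective)
```

Keeping only the lowest-objective run reduces the influence of unlucky initialisations when methods are compared against each other.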

Module contents

Evaluation utilities for clustering experiments.

geomexp.evaluation.adjusted_rand_index(y_true, y_pred)

Adjusted Rand Index between two partitions.

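Parameters:
  • y_true – Reference (ground-truth) cluster labels, one per sample.

  • y_pred – Predicted cluster labels, same length as y_true.

Return type: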

float

Returns:

ARI in \([-1, 1]\) (1 = perfect agreement; values near 0 indicate chance-level agreement).

geomexp.evaluation.davies_bouldin(X, labels)

Davies–Bouldin index (lower is better).

Parameters:
  • X – Data array of shape (n_samples, n_features).

  • labels – Cluster label for each sample.

Return type:

float

Returns:

Davies–Bouldin score (non-negative; 0 is ideal).

geomexp.evaluation.misclassification_error(y_true, y_pred)

Minimum misclassification error under optimal label permutation.

Uses the Hungarian algorithm to find the permutation of predicted labels that maximises agreement with the ground truth.

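Parameters:
  • y_true – Reference (ground-truth) cluster labels, one per sample.

  • y_pred – Predicted cluster labels, same length as y_true.

Return type: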

float

Returns:

Misclassification rate in \([0, 1]\) (0 = perfect agreement).

geomexp.evaluation.normalized_mutual_info(y_true, y_pred)

Normalised Mutual Information between two partitions.

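Parameters:
  • y_true – Reference (ground-truth) cluster labels, one per sample.

  • y_pred – Predicted cluster labels, same length as y_true.

Return type: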

float

Returns:

NMI in \([0, 1]\) (1 = perfect agreement).

geomexp.evaluation.radius_ratio(X, centers, assignments)

Per-cluster radius ratio: max distance / median distance to centroid.

A large ratio indicates elongated or “tendril”-like cluster capture regions.

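Parameters:
  • X – Data array of shape (n_samples, n_features).

  • centers – Cluster centroids, one row per cluster.

  • assignments – Index of the cluster assigned to each sample.

Return type: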

ndarray[tuple[Any, ...], dtype[double]]

Returns:

Array of shape (n_clusters,) with the radius ratio for each cluster.

geomexp.evaluation.run_methods(X, methods, n_inits=20, base_seed=0)

Fit several clustering methods, keeping the best-of-n_inits run.

Each entry in methods is a dict with keys "name", "cls", and "kwargs" (passed to the constructor). For each method the algorithm is re-initialised n_inits times (via random_state) and the run with the lowest objective is kept.

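Parameters:
  • X – Data array of shape (n_samples, n_features).

  • methods – List of method specifications, each a dict with keys "name", "cls", and "kwargs".

  • n_inits – Number of re-initialisations per method.

  • base_seed – Base random seed for the re-initialisations.

Return type: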

dict[str, ClusterResult]

Returns:

Dict mapping method name to its best ClusterResult.

geomexp.evaluation.silhouette(X, labels)

Mean silhouette coefficient.

Parameters:
  • X – Data array of shape (n_samples, n_features).

  • labels – Cluster label for each sample.

Return type:

float

Returns:

Silhouette score in \([-1, 1]\).

geomexp.evaluation.stability_score(X, clusterer_factory, n_resamples=20, subsample_frac=0.8, rng=None)

Bootstrap stability of a clustering method.

Repeatedly draws sub-samples, fits a fresh clusterer on each, and computes the ARI for every pair of sub-samples on the points they share. The score is the mean of these pairwise ARIs.

Parameters:
  • X (ndarray[tuple[Any, ...], dtype[double]]) – Data array of shape (n_samples, n_features).

  • clusterer_factory (Callable[[], object]) – Zero-argument callable returning a fresh clusterer instance (must have a .fit(X) method returning a ClusterResult).

  • n_resamples (int) – Number of bootstrap resamples.

  • subsample_frac (float) – Fraction of data to draw per resample.

  • rng (Generator | None) – Optional random generator.

Return type:

float

Returns:

Mean pairwise ARI across resamples (higher = more stable).

geomexp.evaluation.variation_of_information(y_true, y_pred)

Variation of Information between two partitions.

Defined as \(\mathrm{VI}(U, V) = H(U) + H(V) - 2\,I(U, V)\), using natural logs.

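Parameters:
  • y_true – Reference (ground-truth) cluster labels, one per sample.

  • y_pred – Predicted cluster labels, same length as y_true.

Return type: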

float

Returns:

Non-negative VI, with 0 indicating identical partitions.