geomexp package¶
Subpackages¶
- geomexp.clustering package
- Submodules
- geomexp.clustering.clustering_algorithms module
KMeans, GeometricExpectileClustering (n_clusters, index_radius, index_strategy, tie_break_rule, empty_cluster_rule, n_init, max_iter, tol, center_lr, center_steps, random_state, fit())
- geomexp.clustering.clustering_base module
- geomexp.clustering.geometry module
- geomexp.clustering.index_strategies module
- geomexp.clustering.policies module
- Module contents
BaseClusterer, BestResponseIndexStrategy, ClusterResult, ClusterSpecificIndexStrategy, CustomIndexStrategy, EmptyClusterRule, EuclideanGeometry, FarthestPointRule, GeometricExpectileClustering (n_clusters, index_radius, index_strategy, tie_break_rule, empty_cluster_rule, n_init, max_iter, tol, center_lr, center_steps, random_state, fit()), GlobalIndexStrategy, GramGeometry, HilbertGeometry, IndexStrategy, IterativeClusterer, KMeans, LowestIndexTieBreak, RandomReinitRule, RandomTieBreak, TieBreakRule, WeightedEuclideanGeometry, expectile_loss()
- geomexp.evaluation package
- geomexp.kernels package
- Submodules
- geomexp.kernels.kernels module
Kernel, LinearKernel, RBFKernel, PolynomialKernel, MaternKernel, KernelKMeans, KernelGeometricExpectileClustering (n_clusters, kernel, index_radius, n_init, max_iter, tol, center_lr, center_steps, random_state, fit())
- Module contents
Kernel, KernelGeometricExpectileClustering (n_clusters, kernel, index_radius, n_init, max_iter, tol, center_lr, center_steps, random_state, fit()), KernelKMeans, LinearKernel, MaternKernel, PolynomialKernel, RBFKernel
- geomexp.utils package
- geomexp.visualization package
- Submodules
- geomexp.visualization.visualization module
PlotStyle (figsize, dpi, point_size, center_size, center_linewidth, alpha_data, alpha_contour, color_palette, center_color, center_marker, arrow_color, arrow_linewidth, arrow_head_width, arrow_head_length, arrow_scale, contour_color, contour_linewidth, contour_alpha, curve_linewidth, curve_alpha, kde_color, kde_linewidth, kde_alpha, loss_linewidth, decision_boundary_resolution, decision_boundary_show_points, decision_boundary_show_contour_lines, decision_boundary_show_centers, decision_boundary_show_regions, decision_boundary_color, axis_linewidth, grid_alpha, grid_linewidth, fontsize, use_latex)
ClusterVisualizer (plot_cluster_assignments(), plot_decision_boundaries(), plot_density_contours(), plot_density_with_boundaries(), plot_expectile_curves(), plot_loss_comparison(), create_comparison_figure(), save_figure())
- Module contents
ClusterVisualizer (plot_cluster_assignments(), plot_decision_boundaries(), plot_density_contours(), plot_density_with_boundaries(), plot_expectile_curves(), plot_loss_comparison(), create_comparison_figure(), save_figure())
PlotStyle (figsize, dpi, point_size, center_size, center_linewidth, alpha_data, alpha_contour, color_palette, center_color, center_marker, arrow_color, arrow_linewidth, arrow_head_width, arrow_head_length, arrow_scale, contour_color, contour_linewidth, contour_alpha, curve_linewidth, curve_alpha, kde_color, kde_linewidth, kde_alpha, loss_linewidth, decision_boundary_resolution, decision_boundary_show_points, decision_boundary_show_contour_lines, decision_boundary_show_centers, decision_boundary_show_regions, decision_boundary_color, axis_linewidth, grid_alpha, grid_linewidth, fontsize, use_latex)
Module contents¶
Geometric Expectile Clustering package.
- class geomexp.BaseClusterer(n_clusters, max_iter=100, tol=1e-07, random_state=None)[source]¶
Bases: ABC
Abstract base class for clustering algorithms.
Defines the interface that all clustering algorithms must implement. Subclasses provide _initialize, _fit_iteration, _compute_objective, and _extract_result. The fit method contains the outer optimisation loop with convergence checking based on objective change and an optional partition stability hook.
- n_clusters¶
Number of clusters to form.
- max_iter¶
Maximum number of iterations to perform.
- tol¶
Convergence tolerance for objective function change.
- random_state¶
Random seed for reproducibility.
- fit(X)[source]¶
Fit the clustering algorithm to data.
The outer loop follows Algorithm 1 from the thesis: iterate until the objective decrease falls below tol, an additional convergence criterion is met, or max_iter iterations are reached.
- Parameters:
X (ndarray) – Data array of shape (n_samples, n_features).
- Return type:
ClusterResult
- Returns:
ClusterResult containing assignments, centers, and convergence info.
- Raises:
ValueError – If X has fewer samples than n_clusters or has an invalid shape.
- class geomexp.BestResponseIndexStrategy(geometry=None)[source]¶
Bases: IndexStrategy
Adaptive best-response index vectors (equation 2.2).
Each index vector is set to minimise the cluster-wise risk for fixed partition and centroids. The closed-form solution in a Hilbert space \(H\) is:
\[u_k^* = -r \frac{s_k}{\|s_k\|_H}, \quad s_k = \sum_{i : X_i \in C_k} \|X_i - c_k\|_H \, (X_i - c_k).\]
This captures the length-weighted residual direction (skew) of the cluster. When \(s_k = 0\), the current index is retained.
- Parameters:
geometry (HilbertGeometry | None)
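For orientation, the closed-form update above can be sketched in NumPy under the default Euclidean geometry (the function name best_response_indices is illustrative, not part of the package API; unlike the package, this sketch leaves the index at zero when \(s_k = 0\) rather than retaining the previous value):

```python
import numpy as np

def best_response_indices(X, assignments, centers, index_radius):
    """Closed-form best-response index vectors under Euclidean geometry.

    For each cluster k:
        s_k = sum_{i in C_k} ||X_i - c_k|| * (X_i - c_k)
        u_k = -r * s_k / ||s_k||   (left at zero here when s_k = 0)
    """
    n_clusters, n_features = centers.shape
    indices = np.zeros((n_clusters, n_features))
    for k in range(n_clusters):
        resid = X[assignments == k] - centers[k]      # residuals X_i - c_k
        norms = np.linalg.norm(resid, axis=1)         # ||X_i - c_k||
        s_k = (norms[:, None] * resid).sum(axis=0)    # length-weighted skew
        s_norm = np.linalg.norm(s_k)
        if s_norm > 0:
            indices[k] = -index_radius * s_k / s_norm
    return indices
```

Each returned row satisfies \(\|u_k\| = r\) (or is zero), pointing away from the cluster's skew direction.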
- initialize(n_clusters, n_features, index_radius, rng)[source]¶
Create initial index vectors.
- Parameters:
n_clusters (int) – Number of clusters.
n_features (int) – Dimensionality of the data.
index_radius (float) – Radius constraint \(r\) with \(\|u_k\|_H = r\).
rng (RandomState) – Random state for stochastic initialization.
- Return type:
ndarray
- Returns:
Array of shape (n_clusters, n_features).
- update(X, assignments, centers, indices, index_radius)[source]¶
Update index vectors given the current clustering state.
- Parameters:
X (ndarray) – Data array of shape (n_samples, n_features).
assignments (ndarray) – Cluster assignments of shape (n_samples,).
centers (ndarray) – Cluster centers of shape (n_clusters, n_features).
indices (ndarray) – Current index vectors of shape (n_clusters, n_features).
index_radius (float) – Radius constraint \(r\).
- Return type:
ndarray
- Returns:
Updated index vectors of shape (n_clusters, n_features).
- class geomexp.ClusterResult(assignments, centers, objective, n_iterations, converged=True, metadata=None)[source]¶
Bases: object
Container for clustering results.
- assignments¶
Array of shape (n_samples,) containing cluster indices for each point.
- centers¶
Array of shape (n_clusters, n_features) containing cluster centers.
- objective¶
Final objective function value achieved by the algorithm.
- n_iterations¶
Number of iterations performed until convergence.
- converged¶
Whether the algorithm converged within max_iter iterations.
- metadata¶
Optional dictionary containing algorithm-specific information.
- class geomexp.ClusterSpecificIndexStrategy(directions, geometry=None)[source]¶
Bases: IndexStrategy
Fixed per-cluster index vectors.
Distinct fixed directions are prescribed for different clusters, each normalised to \(\|u_k\|_H = r\). Appropriate when prior information about cluster-specific asymmetry directions is available.
Zero-norm rows in the provided directions result in \(u_k = 0\).
- Parameters:
directions (np.ndarray)
geometry (HilbertGeometry | None)
- initialize(n_clusters, n_features, index_radius, rng)[source]¶
Create initial index vectors.
- Parameters:
n_clusters (int) – Number of clusters.
n_features (int) – Dimensionality of the data.
index_radius (float) – Radius constraint \(r\) with \(\|u_k\|_H = r\).
rng (RandomState) – Random state for stochastic initialization.
- Return type:
ndarray
- Returns:
Array of shape (n_clusters, n_features).
- update(X, assignments, centers, indices, index_radius)[source]¶
Update index vectors given the current clustering state.
- Parameters:
X (ndarray) – Data array of shape (n_samples, n_features).
assignments (ndarray) – Cluster assignments of shape (n_samples,).
centers (ndarray) – Cluster centers of shape (n_clusters, n_features).
indices (ndarray) – Current index vectors of shape (n_clusters, n_features).
index_radius (float) – Radius constraint \(r\).
- Return type:
ndarray
- Returns:
Updated index vectors of shape (n_clusters, n_features).
- class geomexp.CustomIndexStrategy(update_fn, init_fn=None)[source]¶
Bases: IndexStrategy
User-provided callable for custom index vector updates.
Allows full flexibility by accepting arbitrary callables for initialisation and update of index vectors.
The update callable should have the signature:
def update_fn(
    X: np.ndarray,            # (n_samples, n_features)
    assignments: np.ndarray,  # (n_samples,)
    centers: np.ndarray,      # (n_clusters, n_features)
    indices: np.ndarray,      # (n_clusters, n_features)
    index_radius: float,
) -> np.ndarray:              # (n_clusters, n_features)
    ...
The optional init callable should have the signature:
def init_fn(
    n_clusters: int,
    n_features: int,
    index_radius: float,
    rng: np.random.RandomState,
) -> np.ndarray:  # (n_clusters, n_features)
    ...
- Parameters:
update_fn (Callable[..., np.ndarray])
init_fn (Callable[..., np.ndarray] | None)
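As an illustration of these signatures, here are two hypothetical callables (neither ships with the package): an update that ignores the clustering state and merely renormalises the current indices to radius \(r\), and a random initialiser:

```python
import numpy as np

def fixed_direction_update(X, assignments, centers, indices, index_radius):
    """Keep current index directions, renormalised to Euclidean norm r."""
    norms = np.linalg.norm(indices, axis=1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)   # avoid division by zero
    return index_radius * indices / safe

def random_init(n_clusters, n_features, index_radius, rng):
    """Random directions of Euclidean norm r."""
    u = rng.randn(n_clusters, n_features)
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    return index_radius * u
```

These could then be passed as CustomIndexStrategy(update_fn=fixed_direction_update, init_fn=random_init), assuming the default Euclidean geometry.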
- initialize(n_clusters, n_features, index_radius, rng)[source]¶
Create initial index vectors.
- Parameters:
n_clusters (int) – Number of clusters.
n_features (int) – Dimensionality of the data.
index_radius (float) – Radius constraint \(r\) with \(\|u_k\|_H = r\).
rng (RandomState) – Random state for stochastic initialization.
- Return type:
ndarray
- Returns:
Array of shape (n_clusters, n_features).
- update(X, assignments, centers, indices, index_radius)[source]¶
Update index vectors given the current clustering state.
- Parameters:
X (ndarray) – Data array of shape (n_samples, n_features).
assignments (ndarray) – Cluster assignments of shape (n_samples,).
centers (ndarray) – Cluster centers of shape (n_clusters, n_features).
indices (ndarray) – Current index vectors of shape (n_clusters, n_features).
index_radius (float) – Radius constraint \(r\).
- Return type:
ndarray
- Returns:
Updated index vectors of shape (n_clusters, n_features).
- class geomexp.EuclideanGeometry[source]¶
Bases: HilbertGeometry
Standard Euclidean geometry: \(\langle a, b \rangle = \sum_i a_i b_i\).
Equivalent to identity weights. This is the default geometry when none is specified.
- class geomexp.GeometricExpectileClustering(n_clusters, index_radius=0.5, index_strategy=None, tie_break_rule=None, empty_cluster_rule=None, geometry=None, n_init=10, max_iter=100, tol=1e-07, center_lr=0.1, center_steps=50, random_state=None)[source]¶
Bases: BaseClusterer
Geometric expectile clustering with directional asymmetry (Algorithm 1).
A Lloyd-type block coordinate descent algorithm that minimises the empirical risk based on the geometric expectile loss:
\[\hat{\mathcal{R}}(\Pi, C, U) = \frac{1}{n} \sum_{k=1}^K \sum_{i=1}^n \ell_{u_k}(X_i - c_k)\,\mathbb{1}_{\{X_i \in C_k\}}, \quad \|u_k\| = r \;\forall k.\]
Each iteration consists of:
Assignment – assign each point to the cluster minimising \(\ell_{u_k}(X_i - c_k)\).
Empty-cluster handling – reinitialise empty clusters, then re-assign.
Centroid update – for each cluster, minimise the cluster-wise risk over \(c\) via gradient descent.
Index update – update \(U\) via the chosen IndexStrategy.
Setting \(r = 0\) recovers standard K-means.
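The assignment step can be sketched in standalone NumPy (an illustration of the loss-based rule, not the package's implementation): each point goes to the cluster minimising \(\ell_{u_k}(X_i - c_k)\) with \(\ell_u(t) = \tfrac{1}{2}(\|t\|^2 + \|t\|\langle u, t\rangle)\).

```python
import numpy as np

def expectile_assign(X, centers, indices):
    """Assignment step: point i -> argmin_k l_{u_k}(X_i - c_k)."""
    # residuals for every (point, cluster) pair: shape (n, K, d)
    R = X[:, None, :] - centers[None, :, :]
    norms = np.linalg.norm(R, axis=2)            # ||t||, shape (n, K)
    inner = np.einsum("nkd,kd->nk", R, indices)  # <u_k, t>
    losses = 0.5 * (norms**2 + norms * inner)
    return losses.argmin(axis=1)
```

With all index vectors zero (\(r = 0\)), the rule reduces to nearest-center assignment, consistent with the K-means special case above.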
- Parameters:
n_clusters (int)
index_radius (float)
index_strategy (IndexStrategy | None)
tie_break_rule (TieBreakRule | None)
empty_cluster_rule (EmptyClusterRule | None)
geometry (HilbertGeometry | None)
n_init (int)
max_iter (int)
tol (float)
center_lr (float)
center_steps (int)
random_state (int | None)
- n_clusters¶
Number of clusters.
- index_radius¶
Radius constraint \(r \in [0, 1)\).
- index_strategy¶
Strategy for index vector selection.
- tie_break_rule¶
Rule for breaking ties in assignment.
- empty_cluster_rule¶
Policy for handling empty clusters.
- n_init¶
Number of random restarts (best result is kept).
- max_iter¶
Maximum number of outer iterations.
- tol¶
Convergence tolerance on objective change.
- center_lr¶
Learning rate for gradient descent on centers.
- center_steps¶
Number of gradient steps per centroid update.
- random_state¶
Random seed.
Example
>>> import numpy as np
>>> X = np.random.randn(100, 2)
>>> model = GeometricExpectileClustering(n_clusters=3, index_radius=0.5)
>>> result = model.fit(X)
>>> print(result.metadata["indices"].shape)
(3, 2)
- fit(X)[source]¶
Fit geometric expectile clustering with multiple random restarts.
Runs n_init independent trials with different random seeds and returns the result with the lowest objective.
- Parameters:
X (ndarray) – Data array of shape (n_samples, n_features).
- Return type:
ClusterResult
- Returns:
ClusterResult from the best restart.
- class geomexp.GlobalIndexStrategy(direction, geometry=None)[source]¶
Bases: IndexStrategy
Fixed single global index vector shared by all clusters.
Sets \(u_1 = \cdots = u_K = u\) for a given direction \(u\). Appropriate when a meaningful asymmetry direction is known a priori; it implicitly assumes a common anisotropy mechanism across clusters.
The provided direction is normalised to have Hilbert norm equal to \(r\).
- Parameters:
direction (np.ndarray)
geometry (HilbertGeometry | None)
- initialize(n_clusters, n_features, index_radius, rng)[source]¶
Create initial index vectors.
- Parameters:
n_clusters (int) – Number of clusters.
n_features (int) – Dimensionality of the data.
index_radius (float) – Radius constraint \(r\) with \(\|u_k\|_H = r\).
rng (RandomState) – Random state for stochastic initialization.
- Return type:
ndarray
- Returns:
Array of shape (n_clusters, n_features).
- update(X, assignments, centers, indices, index_radius)[source]¶
Update index vectors given the current clustering state.
- Parameters:
X (ndarray) – Data array of shape (n_samples, n_features).
assignments (ndarray) – Cluster assignments of shape (n_samples,).
centers (ndarray) – Cluster centers of shape (n_clusters, n_features).
indices (ndarray) – Current index vectors of shape (n_clusters, n_features).
index_radius (float) – Radius constraint \(r\).
- Return type:
ndarray
- Returns:
Updated index vectors of shape (n_clusters, n_features).
- class geomexp.GramGeometry(gram_matrix)[source]¶
Bases: HilbertGeometry
General Gram-matrix geometry: \(\langle a, b \rangle_G = a^\top G\, b\).
Suitable for basis coefficient representations where \(G_{ij} = \langle \varphi_i, \varphi_j \rangle\).
- Parameters:
gram_matrix (ndarray) – Symmetric positive-definite matrix of shape (d, d).
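A quick NumPy illustration of the bilinear form \(a^\top G\, b\) (the matrix values are arbitrary):

```python
import numpy as np

G = np.array([[2.0, 0.5],
              [0.5, 1.0]])  # symmetric positive-definite Gram matrix

def gram_inner(a, b, G):
    """<a, b>_G = a^T G b, computed along the last axis."""
    return np.einsum("...i,ij,...j->...", a, G, b)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])
# gram_inner(a, b, G) picks out G[0, 1] = 0.5; gram_inner(a, a, G) = G[0, 0] = 2.0
```

The induced norm is \(\|a\|_G = \sqrt{a^\top G\, a}\), which reduces to the Euclidean norm when \(G\) is the identity.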
- class geomexp.HilbertGeometry[source]¶
Bases: ABC
Abstract Hilbert space geometry defining inner product and norm.
All clustering geometry (assignment costs, center gradients, index strategies) flows through this interface, ensuring that no part of the algorithm accidentally falls back to Euclidean operations.
- abstractmethod inner(a, b)[source]¶
Compute the inner product \(\langle a, b \rangle_H\) along the last axis.
- class geomexp.IndexStrategy[source]¶
Bases: ABC
Abstract base class for index vector selection strategies.
Index vectors \(u_k\) with \(\|u_k\|_H = r\) determine the directional asymmetry of the geometric expectile loss. Different strategies offer different trade-offs between flexibility and the need for prior knowledge.
- abstractmethod initialize(n_clusters, n_features, index_radius, rng)[source]¶
Create initial index vectors.
- Parameters:
n_clusters (int) – Number of clusters.
n_features (int) – Dimensionality of the data.
index_radius (float) – Radius constraint \(r\) with \(\|u_k\|_H = r\).
rng (RandomState) – Random state for stochastic initialization.
- Return type:
ndarray
- Returns:
Array of shape (n_clusters, n_features).
- abstractmethod update(X, assignments, centers, indices, index_radius)[source]¶
Update index vectors given the current clustering state.
- Parameters:
X (ndarray) – Data array of shape (n_samples, n_features).
assignments (ndarray) – Cluster assignments of shape (n_samples,).
centers (ndarray) – Cluster centers of shape (n_clusters, n_features).
indices (ndarray) – Current index vectors of shape (n_clusters, n_features).
index_radius (float) – Radius constraint \(r\).
- Return type:
ndarray
- Returns:
Updated index vectors of shape (n_clusters, n_features).
- class geomexp.KMeans(n_clusters, max_iter=100, tol=0.0001, random_state=None)[source]¶
Bases: IterativeClusterer
K-means clustering algorithm.
Partitions data into \(K\) clusters by minimising the within-cluster sum of squared distances via Lloyd’s alternating minimisation:
\[J = \sum_{k=1}^K \sum_{i \in C_k} \|x_i - m_k\|^2\]
where \(C_k\) denotes the set of points assigned to cluster \(k\), and \(m_k\) is the cluster mean.
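The objective \(J\) can be computed directly from data, assignments, and centers; a one-line NumPy sketch (illustrative, not the package's internal code):

```python
import numpy as np

def kmeans_objective(X, assignments, centers):
    """Within-cluster sum of squared distances J."""
    return float(((X - centers[assignments]) ** 2).sum())
```

Lloyd's algorithm alternates assignment (nearest center) and update (cluster mean) steps, each of which can only decrease this value.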
- n_clusters¶
Number of clusters to form.
- max_iter¶
Maximum number of iterations.
- tol¶
Convergence tolerance on objective change.
- random_state¶
Random seed for reproducibility.
Example
>>> import numpy as np
>>> X = np.random.randn(100, 2)
>>> kmeans = KMeans(n_clusters=3, random_state=42)
>>> result = kmeans.fit(X)
>>> print(result.assignments.shape)
(100,)
- class geomexp.Kernel[source]¶
Bases: ABC
Abstract base class for kernel functions.
A kernel \(k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}\) implicitly defines a feature map \(\varphi\) into a reproducing kernel Hilbert space \(\mathcal{H}\) such that \(k(x, y) = \langle \varphi(x), \varphi(y) \rangle_{\mathcal{H}}\).
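The feature-map identity can be made concrete with a homogeneous degree-2 polynomial kernel in two dimensions, whose feature map is explicit (a self-contained illustration, independent of this package's Kernel classes):

```python
import numpy as np

def k_poly2(x, y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = <x, y>^2."""
    return float(np.dot(x, y)) ** 2

def phi(x):
    """Explicit feature map for k_poly2 in 2-D: <phi(x), phi(y)> = <x, y>^2."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])
# k_poly2(x, y) == np.dot(phi(x), phi(y)) == (1*3 + 2*1)**2 == 25.0
```

For most kernels (e.g. RBF) the feature space is infinite-dimensional and \(\varphi\) has no such closed form, which is why the clustering algorithms below work entirely through the Gram matrix.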
- class geomexp.KernelGeometricExpectileClustering(n_clusters, kernel, index_radius=0.5, n_init=10, max_iter=100, tol=1e-07, center_lr=0.1, center_steps=50, random_state=None)[source]¶
Bases: BaseClusterer
Kernelised geometric expectile clustering (Algorithm 2).
In the feature space \(\mathcal{H}\) induced by a kernel \(k\), centroids and index vectors lie in the span of the data (Proposition 3.1):
\[c_k = \sum_{j=1}^n \beta_{kj}\,\varphi(X_j), \quad u_k = \sum_{j=1}^n \alpha_{kj}\,\varphi(X_j).\]
All norms and inner products reduce to operations on the Gram matrix \(K_{ij} = k(X_i, X_j)\):
Squared residual norm: \(d_{ik}^2 = K_{ii} - 2(K\beta_k)_i + \beta_k^\top K \beta_k\)
Index-residual inner product: \(p_{ik} = (K\alpha_k)_i - \alpha_k^\top K \beta_k\)
Setting \(r = 0\) recovers kernel K-means.
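Both Gram-matrix reductions can be sketched in NumPy (names are illustrative; beta and alpha are the coefficient matrices of the centers and index vectors in the span of the data):

```python
import numpy as np

def gram_quantities(K, beta, alpha):
    """Squared residual norms d2[i, k] and index-residual inner
    products p[i, k], computed purely from the Gram matrix K.

    beta, alpha: (n_clusters, n) coefficient matrices.
    """
    Kb = K @ beta.T                                 # (K beta_k)_i, shape (n, n_clusters)
    Ka = K @ alpha.T                                # (K alpha_k)_i
    bKb = np.einsum("kn,nm,km->k", beta, K, beta)   # beta_k^T K beta_k
    aKb = np.einsum("kn,nm,km->k", alpha, K, beta)  # alpha_k^T K beta_k
    d2 = np.diag(K)[:, None] - 2.0 * Kb + bKb[None, :]
    p = Ka - aKb[None, :]
    return d2, p
```

With a linear kernel \(K = X X^\top\), these quantities coincide with their explicit Euclidean counterparts \(\|X_i - c_k\|^2\) and \(\langle u_k, X_i - c_k \rangle\).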
- n_clusters¶
Number of clusters.
- kernel¶
Kernel function.
- index_radius¶
Radius constraint \(r \in [0, 1)\).
- n_init¶
Number of random restarts (best result is kept).
- max_iter¶
Maximum number of outer iterations.
- tol¶
Convergence tolerance on objective change.
- center_lr¶
Learning rate for gradient descent on centroid weights.
- center_steps¶
Number of gradient steps per centroid update.
- random_state¶
Random seed.
Example
>>> import numpy as np
>>> from geomexp.kernels import KernelGeometricExpectileClustering, RBFKernel
>>> X = np.random.randn(100, 2)
>>> model = KernelGeometricExpectileClustering(
...     n_clusters=3, kernel=RBFKernel(gamma=0.5), index_radius=0.5
... )
>>> result = model.fit(X)
>>> print(result.metadata["index_weights"].shape)
(3, 100)
- class geomexp.KernelKMeans(n_clusters, kernel, n_init=10, max_iter=100, tol=1e-07, random_state=None)[source]¶
Bases: BaseClusterer
Kernel K-means clustering (dual representation).
Minimises the within-cluster sum of squared distances in the feature space \(\mathcal{H}\) induced by a kernel \(k\):
\[J = \sum_{k=1}^K \sum_{i \in C_k} \|\varphi(X_i) - c_k\|_{\mathcal{H}}^2\]
where \(c_k = \frac{1}{|C_k|} \sum_{j \in C_k} \varphi(X_j)\) is the cluster centroid in feature space. Since \(\varphi\) may not have an explicit form, all computations are expressed through the Gram matrix \(K_{ij} = k(X_i, X_j)\):
\[\|\varphi(X_i) - c_k\|^2 = K_{ii} - \frac{2}{|C_k|} \sum_{j \in C_k} K_{ij} + \frac{1}{|C_k|^2} \sum_{j,l \in C_k} K_{jl}\]
With a linear kernel this recovers standard K-means.
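This distance formula can be sketched in NumPy using only the Gram matrix and a membership indicator (an illustration, not the package's implementation):

```python
import numpy as np

def kernel_distances(K, assignments, n_clusters):
    """||phi(X_i) - c_k||^2 for all i, k from the Gram matrix alone."""
    n = K.shape[0]
    # normalised one-hot membership: M[j, k] = 1/|C_k| if j in C_k else 0
    M = np.zeros((n, n_clusters))
    M[np.arange(n), assignments] = 1.0
    M /= M.sum(axis=0, keepdims=True)
    cross = K @ M                               # (1/|C_k|) sum_{j in C_k} K_ij
    within = np.einsum("jk,jl,lk->k", M, K, M)  # (1/|C_k|^2) sum_{j,l} K_jl
    return np.diag(K)[:, None] - 2.0 * cross + within[None, :]
```

With a linear kernel \(K = X X^\top\), the returned distances equal the squared Euclidean distances to the cluster means, confirming the K-means special case.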
- n_clusters¶
Number of clusters.
- kernel¶
Kernel function.
- n_init¶
Number of random restarts (best result is kept).
- max_iter¶
Maximum number of iterations.
- tol¶
Convergence tolerance.
- random_state¶
Random seed.
Example
>>> import numpy as np
>>> from geomexp.kernels import KernelKMeans, RBFKernel
>>> X = np.random.randn(100, 2)
>>> model = KernelKMeans(n_clusters=3, kernel=RBFKernel(gamma=0.5))
>>> result = model.fit(X)
>>> print(result.assignments.shape)
(100,)
- class geomexp.LinearKernel[source]¶
Bases: Kernel
Linear (inner product) kernel: \(k(x, y) = \langle x, y \rangle\).
- class geomexp.MaternKernel(nu=1.5, lengthscale=1)[source]¶
Bases: Kernel
Matern kernel implementing the Matern covariance function.
\[k(x, y) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\|x - y\|}{\ell}\right)^{\nu} K_{\nu}\!\left(\frac{\|x - y\|}{\ell}\right)\]where \(K_{\nu}\) is the modified Bessel function of the second kind. Special cases: \(\nu = 0.5\) gives the exponential kernel, \(\nu = 1.5\) and \(\nu = 2.5\) have closed-form expressions, and \(\nu \to \infty\) recovers the RBF (squared-exponential) kernel.
- nu¶
Smoothness parameter \(\nu > 0\).
- lengthscale¶
Lengthscale parameter \(\ell > 0\).
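Under the plain \(d/\ell\) scaling in the formula above, the \(\nu = 0.5\) and \(\nu = 1.5\) special cases simplify to elementary expressions (note that some libraries, e.g. scikit-learn, scale distances by an extra \(\sqrt{2\nu}\), so their closed forms look different). A standalone NumPy sketch:

```python
import numpy as np

def matern_half(d, ell=1.0):
    """nu = 1/2: the Matern formula reduces to the exponential kernel."""
    return np.exp(-d / ell)

def matern_three_halves(d, ell=1.0):
    """nu = 3/2 closed form: substituting
    K_{3/2}(z) = sqrt(pi/(2z)) e^{-z} (1 + 1/z) into the Matern formula
    with z = d/ell simplifies to (1 + z) e^{-z}."""
    z = d / ell
    return (1.0 + z) * np.exp(-z)
```

Both reach their maximum value 1 at zero distance and decay monotonically, with the \(\nu = 1.5\) kernel producing smoother sample paths.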
- class geomexp.PolynomialKernel(degree=3, coef0=1)[source]¶
Bases: Kernel
Polynomial kernel.
\[k(x, y) = (\langle x, y \rangle + c)^p\]
- degree¶
Polynomial degree \(p\).
- coef0¶
Additive constant \(c\).
- class geomexp.RBFKernel(gamma=1)[source]¶
Bases: Kernel
Radial basis function (Gaussian) kernel.
\[k(x, y) = \exp\!\bigl(-\gamma \|x - y\|^2\bigr)\]
- Parameters:
gamma (float)
- gamma¶
Bandwidth parameter \(\gamma > 0\).
- class geomexp.WeightedEuclideanGeometry(weights)[source]¶
Bases: HilbertGeometry
Diagonal-weighted geometry: \(\langle a, b \rangle_w = \sum_i w_i\, a_i\, b_i\).
Suitable for functional data discretised on a grid with quadrature weights \(w_i\), approximating
\[\langle f, g \rangle_{L^2} = \int f(t)\, g(t)\, \mathrm{d}t \;\approx\; \sum_i w_i\, f(t_i)\, g(t_i).\]
- Parameters:
weights (ndarray) – Positive quadrature weight array of shape (d,).
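As a sketch of the quadrature approximation, trapezoidal weights on a uniform grid over \([0, 1]\) recover \(\int_0^1 t \,\mathrm{d}t = 0.5\) for the linear integrand below (the grid and functions are arbitrary illustration choices):

```python
import numpy as np

# uniform grid and trapezoidal quadrature weights on [0, 1]
t = np.linspace(0.0, 1.0, 101)
w = np.full_like(t, t[1] - t[0])
w[[0, -1]] *= 0.5  # endpoint weights are halved in the trapezoid rule

f = t                 # f(t) = t, discretised on the grid
g = np.ones_like(t)   # g(t) = 1

approx = np.sum(w * f * g)  # weighted inner product <f, g>_w
# the trapezoid rule is exact for this linear integrand, so approx ~ 0.5
```

Passing such weights to WeightedEuclideanGeometry makes all clustering distances approximate \(L^2\) distances between the underlying functions.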
- geomexp.expectile_loss(residuals, index_vectors, geometry=None)[source]¶
Compute the geometric expectile loss.
The loss function is:
\[\ell_u(t) = \frac{1}{2}\bigl(\|t\|_H^2 + \|t\|_H\,\langle u, t \rangle_H\bigr)\]
for \((t, u) \in \mathcal{X} \times \mathcal{B}\), as introduced by Herrmann et al. (2018). When geometry is None, the standard Euclidean inner product is used.
- Parameters:
residuals (ndarray) – Residual vectors \(t = x - c\), shape (n, d) or (d,).
index_vectors (ndarray) – Index vector \(u\), broadcastable to residuals. Shape (d,) for a single shared index, or (n, d) for per-sample indices.
geometry (HilbertGeometry | None) – Hilbert space geometry. None (default) uses Euclidean.
- Return type:
ndarray
- Returns:
Loss values of shape (n,) (or scalar if input was 1-D).
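A standalone NumPy version of this loss under the default Euclidean geometry (illustrative, not the package's implementation):

```python
import numpy as np

def expectile_loss_np(residuals, index_vectors):
    """l_u(t) = 0.5 * (||t||^2 + ||t|| * <u, t>), Euclidean inner product."""
    t = np.atleast_2d(residuals)
    u = np.broadcast_to(index_vectors, t.shape)
    norms = np.linalg.norm(t, axis=1)
    inner = np.einsum("nd,nd->n", u, t)
    out = 0.5 * (norms ** 2 + norms * inner)
    return out if np.ndim(residuals) > 1 else float(out[0])
```

With \(u = (0.5, 0)\), the residual \((1, 0)\) costs 0.75 while \((-1, 0)\) costs 0.25: deviations aligned with \(u\) are penalised more than opposing ones, which is the directional asymmetry the index vector encodes. Setting \(u = 0\) recovers half the squared norm.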