com.eignex.kumulant/schema/OptimizerSpec

OptimizerSpec

@Serializable

Wire-portable optimizer strategy. Sealed root of Sgd / Adagrad / Rmsprop / Adam; consumed by the online linear-model stats (com.eignex.kumulant.stat.regression.glm.StochasticRegressionStat, com.eignex.kumulant.stat.regression.SoftmaxRegressionStat) to pick the per-coordinate update rule.

A single OptimizerSpec materialises into one live com.eignex.kumulant.stat.regression.Optimizer per stat. For multi-output stats like com.eignex.kumulant.stat.regression.SoftmaxRegressionStat, the stat creates one optimizer per output class; each gets its own per-coordinate aux state but they all share the same spec configuration.

Pick by need:

Sgd when the per-coordinate update rate is stable and you don't need adaptive learning rates. The cheapest path; stateless.
Adagrad when feature occurrence is sparse / power-law and rare features should learn faster than common ones. Per-coordinate adaptive rate.
Rmsprop when Adagrad's monotonically decreasing learning rate decays too aggressively. Uses an exponential moving average of squared gradients instead of a running sum.
Adam as the general-purpose default in modern online learning. Bias-corrected first / second moments; the closest thing to "just works on most problems."
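
The four choices above correspond to well-known per-coordinate update rules. A minimal sketch of their textbook forms for a single coordinate follows. The function names, state classes, and hyperparameter names (lr, decay, beta1, beta2, eps) are hypothetical and are not taken from the kumulant API, whose actual parameters and defaults are not shown on this page.

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// Hypothetical stand-alone helpers, NOT the kumulant types.
// Each function takes one weight w and its gradient g and returns the updated weight.

class AdagradState(var sumSq: Double = 0.0)
class RmspropState(var avgSq: Double = 0.0)
class AdamState(var m: Double = 0.0, var v: Double = 0.0, var t: Int = 0)

// Plain SGD: no aux state, fixed rate.
fun sgdStep(w: Double, g: Double, lr: Double): Double = w - lr * g

// Adagrad: running sum of squared gradients, so the effective rate only shrinks.
fun adagradStep(w: Double, g: Double, lr: Double, s: AdagradState, eps: Double = 1e-8): Double {
    s.sumSq += g * g
    return w - lr * g / (sqrt(s.sumSq) + eps)
}

// RMSprop: exponential moving average of squared gradients, so old gradients fade out.
fun rmspropStep(
    w: Double, g: Double, lr: Double, s: RmspropState,
    decay: Double = 0.9, eps: Double = 1e-8,
): Double {
    s.avgSq = decay * s.avgSq + (1 - decay) * g * g
    return w - lr * g / (sqrt(s.avgSq) + eps)
}

// Adam: moving averages of the gradient and its square, both bias-corrected by step count.
fun adamStep(
    w: Double, g: Double, lr: Double, s: AdamState,
    beta1: Double = 0.9, beta2: Double = 0.999, eps: Double = 1e-8,
): Double {
    s.t += 1
    s.m = beta1 * s.m + (1 - beta1) * g
    s.v = beta2 * s.v + (1 - beta2) * g * g
    val mHat = s.m / (1 - beta1.pow(s.t))
    val vHat = s.v / (1 - beta2.pow(s.t))
    return w - lr * mHat / (sqrt(vHat) + eps)
}
```

In this sketch the first Adagrad or Adam step moves the weight by roughly lr regardless of the gradient's magnitude, whereas SGD moves it by lr * g; that normalisation is what lets rarely seen coordinates take larger steps.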

Penalties (com.eignex.kumulant.stat.regression.glm.Penalty) attach to com.eignex.kumulant.stat.regression.glm.StochasticRegressionStat only when paired with Sgd; the lazy-update tricks (Bottou multiplicative scaling for L2, cumulative truncated gradient for L1) are SGD-specific and don't extend cleanly to adaptive optimizers.
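
To illustrate why the L2 trick is tied to SGD, here is a minimal sketch of multiplicative scaling. It assumes the weight vector is stored as w = scale * v, so the L2 shrink w ← (1 - lr·λ)·w costs O(1) and only the coordinates present in a sparse example are touched. The class name ScaledWeights, its methods, and the renormalisation threshold are hypothetical; this is not the library's implementation.

```kotlin
// Hypothetical sketch of lazy L2 regularisation via a shared scale factor.
class ScaledWeights(size: Int) {
    private val v = DoubleArray(size)
    private var scale = 1.0

    // The true weight is the stored value times the shared scale.
    fun weight(i: Int): Double = scale * v[i]

    // One sparse SGD step: w <- (1 - lr * lambda) * w - lr * g on the active indices.
    fun step(indices: IntArray, grads: DoubleArray, lr: Double, lambda: Double) {
        require(lr * lambda < 1.0) { "lr * lambda must be below 1 to keep the scale positive" }
        scale *= (1.0 - lr * lambda)          // shrinks every coordinate in O(1)
        for (k in indices.indices) {
            v[indices[k]] -= lr * grads[k] / scale  // divide so that scale * v moves by lr * g
        }
        if (scale < 1e-9) renormalize()       // guard against underflow of the scale
    }

    // Fold the scale back into the stored values.
    private fun renormalize() {
        for (i in v.indices) v[i] *= scale
        scale = 1.0
    }
}
```

The shortcut works because plain SGD shrinks every coordinate by the same factor. With an adaptive optimizer the effective rate differs per coordinate, so a single shared scalar can no longer represent the shrink.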

Inheritors

Functions

materialize

abstract fun materialize(featureSize: Int, concurrency: Concurrency = Concurrency.None): Optimizer

Build a live optimizer instance over featureSize coordinates at the requested Concurrency. Each call returns a fresh optimizer with empty aux state; stats call this for each weight vector they want to track (one per output class for com.eignex.kumulant.stat.regression.SoftmaxRegressionStat).
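
The spec-versus-instance split can be sketched with stand-in types. Everything below (SketchSpec, SketchOptimizer, SketchAdagrad, perClassOptimizers, the rate and eps parameters) is hypothetical and only mirrors the shape of materialize described above; the Concurrency argument is omitted, and the real Optimizer interface is not shown on this page.

```kotlin
import kotlin.math.sqrt

// Hypothetical stand-ins, NOT the kumulant types.
interface SketchOptimizer {
    fun step(weights: DoubleArray, gradient: DoubleArray)
}

// Immutable configuration; holds no per-coordinate state itself.
sealed interface SketchSpec {
    fun materialize(featureSize: Int): SketchOptimizer
}

data class SketchAdagrad(val rate: Double, val eps: Double = 1e-8) : SketchSpec {
    // Every call allocates a fresh aux array, so instances never share state.
    override fun materialize(featureSize: Int): SketchOptimizer = object : SketchOptimizer {
        private val sumSq = DoubleArray(featureSize)

        override fun step(weights: DoubleArray, gradient: DoubleArray) {
            for (i in weights.indices) {
                sumSq[i] += gradient[i] * gradient[i]
                weights[i] -= rate * gradient[i] / (sqrt(sumSq[i]) + eps)
            }
        }
    }
}

// A multi-output stat would hold one optimizer per class, all built from the same spec.
fun perClassOptimizers(spec: SketchSpec, classes: Int, featureSize: Int): List<SketchOptimizer> =
    List(classes) { spec.materialize(featureSize) }
```

Under these assumptions, stepping the optimizer for one class leaves the accumulated state of the others untouched, while changing the spec's configuration would affect every optimizer materialised from it afterwards.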