Sgd
Plain stochastic gradient descent. The default and the cheapest entry; stateless apart from the global step counter feeding the learning-rate schedule. Per-coordinate update: w[i] -= lr(step) * weight * grad[i].
Reach for Sgd when:
You're using com.eignex.kumulant.stat.regression.glm.Penalty.L1 or com.eignex.kumulant.stat.regression.glm.Penalty.L2 (other optimizers don't support penalties).
The per-coordinate gradient scale is roughly uniform across features (no power-law sparsity).
Convergence speed matters less than memory: Sgd's aux state is one global step counter.
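The per-coordinate update above can be sketched in plain Kotlin. This is a minimal stand-alone illustration, not the library's implementation: ToySgd, its update method, and the lr lambda are hypothetical names, weight is assumed to be a per-observation weight, and the point at which the step counter advances is also an assumption.

```kotlin
// Hypothetical stand-alone sketch; not the library's Sgd class.
// `lr` stands in for the learning-rate schedule and `weight` is assumed
// to be a per-observation weight that scales the gradient.
class ToySgd(featureSize: Int, private val lr: (Long) -> Double) {
    val w = DoubleArray(featureSize)

    // The only auxiliary state: one global step counter.
    var step = 0L
        private set

    fun update(grad: DoubleArray, weight: Double = 1.0) {
        require(grad.size == w.size) { "gradient size must match weight size" }
        val rate = lr(step)
        for (i in w.indices) {
            w[i] -= rate * weight * grad[i]
        }
        step++ // assumed to advance once per update
    }
}

fun main() {
    val sgd = ToySgd(featureSize = 2) { 0.1 } // constant rate
    sgd.update(doubleArrayOf(1.0, -2.0))
    println(sgd.w.toList())
    println(sgd.step)
}
```

Every coordinate shares the same rate, which is why this rule suits features with roughly uniform gradient scale and needs no per-coordinate state.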
Properties
learningRate
Per-step learning-rate schedule. The expression is evaluated with the step counter as its x input; the standard schedules (com.eignex.kumulant.stat.regression.glm.ConstantRate, com.eignex.kumulant.stat.regression.glm.StepDecay, com.eignex.kumulant.stat.regression.glm.ExponentialDecay) live in the GLM package. Any ScalarExpr works, for instance 1.0 / (1.0 + Const(0.01) * X) for an inverse-time decay.
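To make the inverse-time decay concrete, the sketch below evaluates the same formula as a plain Kotlin function rather than a ScalarExpr. The function name inverseTimeDecay and the parameter k are hypothetical; only the formula 1.0 / (1.0 + 0.01 * x) comes from the text above.

```kotlin
// Plain-Kotlin stand-in for the expression 1.0 / (1.0 + Const(0.01) * X).
// `x` plays the role of the step counter; `k` is the decay constant.
fun inverseTimeDecay(x: Long, k: Double = 0.01): Double = 1.0 / (1.0 + k * x)

fun main() {
    // Print the rate at a few step counts to show how it shrinks.
    for (step in listOf(0L, 100L, 1000L)) {
        println("step=$step lr=${inverseTimeDecay(step)}")
    }
}
```

With k = 0.01 the rate starts at 1.0, is halved by step 100, and falls to about one eleventh by step 1000.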
Functions
materialize
Build a live optimizer instance over featureSize coordinates at the requested Concurrency. Each call returns a fresh optimizer with empty aux state; stats call this once for each weight vector they track (for example, one per output class for com.eignex.kumulant.stat.regression.SoftmaxRegressionStat).
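The pattern of one immutable optimizer description producing an independent live instance per weight vector might look roughly like the sketch below. All names here (ToySgdSpec, ToyOptimizer) are hypothetical, and the Concurrency argument is omitted for brevity; the real signature may differ.

```kotlin
// Hypothetical sketch of the "spec produces fresh instances" pattern.
// Not the library's API: names and signatures are illustrative only.
class ToyOptimizer(val featureSize: Int) {
    val weights = DoubleArray(featureSize) // starts at zero
    var step = 0L                          // empty aux state
}

class ToySgdSpec(val learningRate: (Long) -> Double) {
    // Each call returns a new optimizer that shares no state with earlier ones.
    fun materialize(featureSize: Int): ToyOptimizer = ToyOptimizer(featureSize)
}

fun main() {
    val spec = ToySgdSpec { 0.05 }
    val numClasses = 3
    // One optimizer per output class, as a softmax-style stat would need.
    val perClass = List(numClasses) { spec.materialize(featureSize = 4) }
    perClass[0].step++
    println(perClass.map { it.step })
}
```

Because each instance owns its own counter and weights, advancing one class's optimizer leaves the others untouched.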