Sgd

@Serializable

@SerialName(value = "Sgd")

data class Sgd(val learningRate: ScalarExpr = ConstantRate(1e-3)) : OptimizerSpec

Plain stochastic gradient descent. This is the default optimizer and the cheapest one: it is stateless apart from the global step counter that feeds the learning-rate schedule. Per-coordinate update: w[i] -= lr(step) * weight * grad[i].
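The update rule above can be sketched in stand-alone Kotlin. This is a minimal illustration, not the library's implementation: `Schedule` and `PlainSgd` are hypothetical names, `weight` is assumed to be a per-observation weight, and advancing the step counter after the update (rather than before) is also an assumption.

```kotlin
// Hypothetical stand-alone sketch; none of these names come from the library.

/** Learning-rate schedule: maps the global step counter to a step size. */
fun interface Schedule {
    fun rate(step: Long): Double
}

/** Minimal SGD holder whose only state is the global step counter. */
class PlainSgd(private val schedule: Schedule) {
    var step: Long = 0L
        private set

    /**
     * Applies w[i] -= lr(step) * weight * grad[i] in place, then advances the step.
     * `weight` is assumed here to be a per-observation weight.
     */
    fun update(w: DoubleArray, grad: DoubleArray, weight: Double = 1.0) {
        require(w.size == grad.size) { "w and grad must have the same length" }
        val lr = schedule.rate(step)
        for (i in w.indices) {
            w[i] -= lr * weight * grad[i]
        }
        step++
    }
}
```

Note that the sketch keeps no per-coordinate buffers, which is the sense in which the only auxiliary state is the step counter.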

Reach for Sgd when:

You're using com.eignex.kumulant.stat.regression.glm.Penalty.L1 or com.eignex.kumulant.stat.regression.glm.Penalty.L2 (other optimizers don't support penalties).
The per-coordinate gradient scale is roughly uniform across features (no power-law sparsity).
Convergence speed matters less than memory: Sgd's aux state is one global step counter.

Constructors

Sgd

constructor(learningRate: ScalarExpr = ConstantRate(1e-3))

Properties

learningRate

val learningRate: ScalarExpr

Per-step learning-rate schedule. The expression is evaluated with the step counter as its x input. Standard schedules (com.eignex.kumulant.stat.regression.glm.ConstantRate, com.eignex.kumulant.stat.regression.glm.StepDecay, com.eignex.kumulant.stat.regression.glm.ExponentialDecay) live in the GLM package, but any ScalarExpr works: for instance, 1.0 / (1.0 + Const(0.01) * X) gives an inverse-time decay.
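To see how the inverse-time decay behaves numerically, the same formula can be written as an ordinary Kotlin function. This is a sketch only: `inverseTimeDecay` and its parameters are hypothetical and it deliberately avoids the library's ScalarExpr types.

```kotlin
// Plain-Kotlin stand-in for the schedule 1.0 / (1.0 + 0.01 * x);
// it does not use the library's ScalarExpr types.
fun inverseTimeDecay(step: Long, base: Double = 1.0, decay: Double = 0.01): Double =
    base / (1.0 + decay * step)
```

With these defaults the rate starts at `base` on step 0 and has halved by step 100, since the denominator reaches 2 there.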

Functions

materialize

open override fun materialize(featureSize: Int, concurrency: Concurrency = Concurrency.None): Optimizer

Builds a live optimizer instance over featureSize coordinates at the requested Concurrency. Each call returns a fresh optimizer with empty aux state. Stats call this once for each weight vector they track (for example, one per output class in com.eignex.kumulant.stat.regression.SoftmaxRegressionStat).
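The split between an immutable spec and the fresh instances it materializes can be illustrated with a toy model. Everything below is hypothetical (`ToySpec`, `ToyOptimizer`, `optimizersPerClass`), uses a constant learning rate, and omits the Concurrency parameter; it shows only the pattern, not the library's classes.

```kotlin
// Hypothetical miniature of the spec/instance split; not the library's classes.

/** Live optimizer: owns its own step counter, sized for one weight vector. */
class ToyOptimizer(val featureSize: Int, private val lr: Double) {
    var step: Long = 0L
        private set

    fun update(w: DoubleArray, grad: DoubleArray) {
        require(w.size == featureSize && grad.size == featureSize) { "size mismatch" }
        for (i in w.indices) w[i] -= lr * grad[i]
        step++
    }
}

/** Immutable configuration; every materialize call yields an independent optimizer. */
data class ToySpec(val lr: Double = 1e-3) {
    fun materialize(featureSize: Int): ToyOptimizer = ToyOptimizer(featureSize, lr)
}

/** One optimizer per output class, as a softmax-style stat might request. */
fun optimizersPerClass(spec: ToySpec, featureSize: Int, classes: Int): List<ToyOptimizer> =
    List(classes) { spec.materialize(featureSize) }
```

Because each instance carries its own counter, stepping the optimizer for one class leaves the others untouched.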