kumulant

Adam

@Serializable
@SerialName(value = "Adam")
data class Adam(val learningRate: ScalarExpr = ConstantRate(0.001), val beta1: Double = 0.9, val beta2: Double = 0.999, val epsilon: Double = 1.0E-8) : OptimizerSpec

Adam optimizer. Maintains bias-corrected first- and second-moment estimates of the gradient per coordinate (Kingma & Ba 2015); the general-purpose default in modern online learning. Per-coordinate update:

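    m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
    v[i] = beta2 * v[i] + (1 - beta2) * grad[i]^2
    mHat = m[i] / (1 - beta1^t)
    vHat = v[i] / (1 - beta2^t)
    w[i] -= lr * mHat / (sqrt(vHat) + epsilon)

Here t is the number of updates applied so far (1 on the first update) and lr is the current value of learningRate.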
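The per-coordinate update above can be sketched as a small stand-alone Kotlin class. This is a minimal illustration under stated assumptions, not the library's Optimizer implementation: the name AdamSketch, the step method, and the plain Double learning rate (standing in for a ScalarExpr) are all hypothetical.

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// Hypothetical stand-alone sketch; not the library's Optimizer implementation.
class AdamSketch(
    featureSize: Int,
    private val lr: Double = 0.001,   // assumption: a constant rate instead of a ScalarExpr
    private val beta1: Double = 0.9,
    private val beta2: Double = 0.999,
    private val epsilon: Double = 1e-8,
) {
    private val m = DoubleArray(featureSize) // first-moment EMA, starts at 0
    private val v = DoubleArray(featureSize) // second-moment EMA, starts at 0
    private var t = 0                        // number of updates applied so far

    /** Applies one Adam update to [w] in place using gradient [grad]. */
    fun step(w: DoubleArray, grad: DoubleArray) {
        require(w.size == m.size && grad.size == m.size) { "size mismatch" }
        t += 1
        val c1 = 1.0 - beta1.pow(t) // bias-correction denominator for m
        val c2 = 1.0 - beta2.pow(t) // bias-correction denominator for v
        for (i in w.indices) {
            m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
            v[i] = beta2 * v[i] + (1 - beta2) * grad[i] * grad[i]
            val mHat = m[i] / c1
            val vHat = v[i] / c2
            w[i] -= lr * mHat / (sqrt(vHat) + epsilon)
        }
    }
}
```

Calling AdamSketch(featureSize = 2, lr = 0.1).step(w, grad) once moves every coordinate with a non-zero gradient by roughly 0.1 against the sign of that gradient.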

Reach for Adam as the default. Defaults of beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8 are the standard published values; the only knob most callers touch is learningRate.
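One way to see why learningRate is usually the only knob worth touching: on the first update (t = 1) the bias correction gives mHat = grad and vHat = grad^2, so the step is approximately lr * sign(grad) whatever the gradient's scale. A minimal sketch of that arithmetic follows; the helper name firstAdamStep is hypothetical and not part of the library.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper: size of the first Adam update (t = 1) for a single coordinate.
fun firstAdamStep(
    grad: Double,
    lr: Double = 0.001,
    beta1: Double = 0.9,
    beta2: Double = 0.999,
    epsilon: Double = 1e-8,
): Double {
    val m = (1 - beta1) * grad        // m starts at 0, so only the new gradient contributes
    val v = (1 - beta2) * grad * grad // v starts at 0 as well
    val mHat = m / (1 - beta1)        // bias correction at t = 1 recovers grad
    val vHat = v / (1 - beta2)        // bias correction at t = 1 recovers grad^2
    return lr * mHat / (sqrt(vHat) + epsilon)
}
```

With the default lr of 0.001, gradients of 0.001 and 1000.0 both yield a first step close to 0.001, and a gradient of -5.0 yields a step close to -0.001.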

Memory cost is two state arrays of featureSize doubles per optimizer (m and v), about twice the size of the weight vector being optimized; heavier than Sgd / Adagrad / Rmsprop but usually small in absolute terms.

Adam

constructor(learningRate: ScalarExpr = ConstantRate(0.001), beta1: Double = 0.9, beta2: Double = 0.999, epsilon: Double = 1.0E-8)

beta1

First-moment EMA decay; standard published value.

beta2

Second-moment EMA decay; standard published value.

epsilon

Numerical-stability epsilon added to the denominator, outside the square root: sqrt(vHat) + epsilon.

learningRate

Base learning rate applied after bias-corrected moment normalisation.

materialize

open override fun materialize(featureSize: Int, concurrency: Concurrency = Concurrency.None): Optimizer

Build a live optimizer instance over featureSize coordinates at the requested Concurrency. Each call returns a fresh optimizer with empty aux state; stats call this for each weight vector they want to track (one per output class for com.eignex.kumulant.stat.regression.SoftmaxRegressionStat).
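The spec-to-instance pattern described here can be sketched with stand-in types. Everything below is hypothetical and for illustration only: OptimizerSketch and MomentumSpecSketch are not library types, the concurrency parameter is left out, and plain momentum SGD is swapped in for Adam to keep the example short. The point is that each materialize call owns its own auxiliary state.

```kotlin
// Hypothetical stand-ins; the library's real OptimizerSpec / Optimizer types differ.
interface OptimizerSketch {
    fun step(w: DoubleArray, grad: DoubleArray)
}

// Immutable description of an optimizer (momentum SGD here, not Adam).
class MomentumSpecSketch(val lr: Double = 0.1, val momentum: Double = 0.9) {
    // Every call returns a new instance whose aux state starts empty.
    fun materialize(featureSize: Int): OptimizerSketch = object : OptimizerSketch {
        private val velocity = DoubleArray(featureSize) // per-instance state

        override fun step(w: DoubleArray, grad: DoubleArray) {
            require(w.size == featureSize && grad.size == featureSize) { "size mismatch" }
            for (i in w.indices) {
                velocity[i] = momentum * velocity[i] + grad[i]
                w[i] -= lr * velocity[i]
            }
        }
    }
}

// One optimizer per weight vector, for example one per output class.
fun optimizersPerClass(
    spec: MomentumSpecSketch,
    numClasses: Int,
    featureSize: Int,
): List<OptimizerSketch> = List(numClasses) { spec.materialize(featureSize) }
```

Because each instance carries its own state, stepping the optimizer for one class leaves the accumulated state of the others untouched.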