Adam

@Serializable

@SerialName(value = "Adam")

data class Adam(val learningRate: ScalarExpr = ConstantRate(0.001), val beta1: Double = 0.9, val beta2: Double = 0.999, val epsilon: Double = 1.0E-8) : OptimizerSpec

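Adam. Bias-corrected first- and second-moment estimates per coordinate (Kingma & Ba 2015); the general-purpose default in modern online learning. Per-coordinate update at step t (counting from 1), where grad is the gradient and lr is the current value of learningRate: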

m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
v[i] = beta2 * v[i] + (1 - beta2) * grad[i]^2
mHat = m[i] / (1 - beta1^t)
vHat = v[i] / (1 - beta2^t)
w[i] -= lr * mHat / (sqrt(vHat) + epsilon)
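
The update above can be sketched as a small standalone class. This is a minimal illustration, not the library's implementation: AdamSketch and its step function are hypothetical names, and the learning rate is assumed to be a plain constant rather than a ScalarExpr.

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// Hypothetical stand-in for illustration; not the library's Optimizer.
class AdamSketch(
    featureSize: Int,
    private val lr: Double = 0.001,
    private val beta1: Double = 0.9,
    private val beta2: Double = 0.999,
    private val epsilon: Double = 1e-8,
) {
    private val m = DoubleArray(featureSize) // first-moment estimate per coordinate
    private val v = DoubleArray(featureSize) // second-moment estimate per coordinate
    private var t = 0                        // number of steps taken so far

    /** Applies one Adam update to [w] in place, given the gradient [grad]. */
    fun step(w: DoubleArray, grad: DoubleArray) {
        require(w.size == m.size && grad.size == m.size) { "size mismatch" }
        t += 1
        val c1 = 1.0 - beta1.pow(t) // bias-correction denominator for m
        val c2 = 1.0 - beta2.pow(t) // bias-correction denominator for v
        for (i in w.indices) {
            m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
            v[i] = beta2 * v[i] + (1 - beta2) * grad[i] * grad[i]
            val mHat = m[i] / c1
            val vHat = v[i] / c2
            w[i] -= lr * mHat / (sqrt(vHat) + epsilon)
        }
    }
}

fun main() {
    val opt = AdamSketch(featureSize = 2)
    val w = doubleArrayOf(0.0, 0.0)
    opt.step(w, doubleArrayOf(2.0, -0.5))
    println(w.joinToString())
}
```

On the first step the bias correction cancels the (1 - beta) factors, so each weight moves by roughly lr against the sign of its gradient, regardless of the gradient's magnitude.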

Reach for Adam as the default. Defaults of beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8 are the standard published values; the only knob most callers touch is learningRate.

Memory cost is two state arrays (m and v) of featureSize doubles per optimizer, about 16 * featureSize bytes. That is heavier than Sgd / Adagrad / Rmsprop and twice the size of a featureSize weight vector, though typically small in absolute terms.
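
As a rough check on the footprint described above, the state size can be estimated with a one-line helper. adamStateBytes and its weightVectors parameter are hypothetical names for this sketch; it assumes 8 bytes per double and ignores array headers.

```kotlin
// Two DoubleArray(featureSize) buffers (m and v) per optimizer, 8 bytes per
// double, array headers ignored. `weightVectors` is a hypothetical parameter
// counting how many optimizers are materialized (e.g. one per output class).
fun adamStateBytes(featureSize: Int, weightVectors: Int = 1): Long =
    2L * 8L * featureSize * weightVectors

fun main() {
    println(adamStateBytes(featureSize = 1_000))                     // 16000
    println(adamStateBytes(featureSize = 1_000, weightVectors = 10)) // 160000
}
```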

Constructors

Adam

constructor(learningRate: ScalarExpr = ConstantRate(0.001), beta1: Double = 0.9, beta2: Double = 0.999, epsilon: Double = 1.0E-8)

Functions

materialize

open override fun materialize(featureSize: Int, concurrency: Concurrency = Concurrency.None): Optimizer

Build a live optimizer instance over featureSize coordinates at the requested Concurrency. Each call returns a fresh optimizer with empty auxiliary state; stats call this once for each weight vector they track (for example, one per output class in com.eignex.kumulant.stat.regression.SoftmaxRegressionStat).
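
The split between an immutable spec and the live instances it materializes can be sketched with stand-in types. Everything below is hypothetical (OptimizerSketch, OptimizerSpecSketch, MomentumSpecSketch); the library's real Optimizer and OptimizerSpec interfaces may differ, and a simple momentum rule is used in place of Adam to keep the example short.

```kotlin
// Stand-in types for illustration only; not the library's interfaces.
interface OptimizerSketch {
    fun step(w: DoubleArray, grad: DoubleArray)
}

interface OptimizerSpecSketch {
    fun materialize(featureSize: Int): OptimizerSketch
}

// Immutable configuration; each materialize call builds independent state.
data class MomentumSpecSketch(val lr: Double = 0.1, val beta: Double = 0.9) : OptimizerSpecSketch {
    override fun materialize(featureSize: Int): OptimizerSketch =
        object : OptimizerSketch {
            private val velocity = DoubleArray(featureSize) // fresh, zeroed state
            override fun step(w: DoubleArray, grad: DoubleArray) {
                for (i in w.indices) {
                    velocity[i] = beta * velocity[i] + grad[i]
                    w[i] -= lr * velocity[i]
                }
            }
        }
}

fun main() {
    val spec = MomentumSpecSketch()
    val numClasses = 3
    val featureSize = 2
    // One weight vector and one optimizer per output class.
    val weights = Array(numClasses) { DoubleArray(featureSize) }
    val optimizers = List(numClasses) { spec.materialize(featureSize) }

    // Two steps for class 0, one step for class 1, same gradient each time.
    repeat(2) { optimizers[0].step(weights[0], doubleArrayOf(1.0, 1.0)) }
    optimizers[1].step(weights[1], doubleArrayOf(1.0, 1.0))

    println(weights[0].joinToString())
    println(weights[1].joinToString())
}
```

Class 1's single step is unaffected by the velocity accumulated for class 0, which is the point of materializing one optimizer per weight vector rather than sharing one instance.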
