Adam
Adam. Bias-corrected first and second moments per coordinate (Kingma & Ba 2015); the general-purpose default in modern online learning. Per-coordinate update:
    m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
    v[i] = beta2 * v[i] + (1 - beta2) * grad[i]^2
    mHat = m[i] / (1 - beta1^t)
    vHat = v[i] / (1 - beta2^t)
    w[i] -= lr * mHat / (sqrt(vHat) + epsilon)

Here t is the step count (starting at 1) and lr is learningRate.
Reach for Adam as the default. The defaults beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8 are the standard published values; the only knob most callers touch is learningRate.
Memory cost is two state arrays of featureSize doubles per optimizer; heavier than Sgd / Adagrad / Rmsprop but typically negligible relative to the parameter vector itself.
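The per-coordinate update and its two state arrays can be sketched in a few lines of Kotlin. This is a minimal illustration under stated assumptions, not the library's implementation: the class name AdamSketch, the step function, and the learningRate value of 0.001 are hypothetical, and concurrency is ignored entirely.

```kotlin
import kotlin.math.pow
import kotlin.math.sqrt

// Hypothetical illustration class; not the library's Adam implementation.
class AdamSketch(
    featureSize: Int,
    private val learningRate: Double = 0.001, // assumed value, not taken from the source
    private val beta1: Double = 0.9,
    private val beta2: Double = 0.999,
    private val epsilon: Double = 1e-8,
) {
    // The two per-coordinate state arrays that make up Adam's memory cost.
    private val m = DoubleArray(featureSize)
    private val v = DoubleArray(featureSize)
    private var t = 0

    /** Applies one Adam step to [w] in place using the gradient [grad]. */
    fun step(w: DoubleArray, grad: DoubleArray) {
        require(w.size == m.size && grad.size == m.size) { "size mismatch" }
        t += 1
        // Bias-correction denominators depend only on the step count.
        val c1 = 1.0 - beta1.pow(t)
        val c2 = 1.0 - beta2.pow(t)
        for (i in w.indices) {
            m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
            v[i] = beta2 * v[i] + (1 - beta2) * grad[i] * grad[i]
            val mHat = m[i] / c1
            val vHat = v[i] / c2
            w[i] -= learningRate * mHat / (sqrt(vHat) + epsilon)
        }
    }
}

fun main() {
    val opt = AdamSketch(featureSize = 2, learningRate = 0.1)
    val w = doubleArrayOf(1.0, -1.0)
    // The gradient of 0.5 * ||w||^2 is w itself.
    opt.step(w, w.copyOf())
    println(w.joinToString())
}
```

On the very first step the bias correction makes mHat equal to the gradient and vHat equal to its square, so each coordinate moves by roughly learningRate in the direction opposite its gradient sign, regardless of gradient magnitude.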
Constructors
Adam
Properties
beta1
beta2
epsilon
learningRate
Base learning rate applied after bias-corrected moment normalisation.
Functions
materialize
Build a live optimizer instance over featureSize coordinates at the requested Concurrency. Each call returns a fresh optimizer with empty aux state; stats call this for each weight vector they want to track (one per output class for com.eignex.kumulant.stat.regression.SoftmaxRegressionStat).
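The "fresh optimizer per weight vector" pattern might look roughly like the sketch below. All names here (AdamSpecSketch, AdamStateSketch) are hypothetical, the materialize signature is simplified and omits the Concurrency argument, and the learningRate value is an assumption; the real API may differ.

```kotlin
// Hypothetical stand-in for a live optimizer: only the aux state Adam carries.
class AdamStateSketch(featureSize: Int) {
    val m = DoubleArray(featureSize) // first-moment estimates, start at zero
    val v = DoubleArray(featureSize) // second-moment estimates, start at zero
    var t = 0                        // step counter used for bias correction
}

// Hypothetical immutable spec that holds only hyperparameters.
data class AdamSpecSketch(
    val learningRate: Double = 0.001, // assumed value, not taken from the source
    val beta1: Double = 0.9,
    val beta2: Double = 0.999,
    val epsilon: Double = 1e-8,
) {
    // Every call allocates new, zeroed state; nothing is shared between calls.
    fun materialize(featureSize: Int): AdamStateSketch = AdamStateSketch(featureSize)
}

fun main() {
    val spec = AdamSpecSketch(learningRate = 0.01)
    val numClasses = 3
    val featureSize = 4
    // One optimizer state per output class, as a softmax-style model would need.
    val perClass = List(numClasses) { spec.materialize(featureSize) }
    perClass[0].m[0] = 1.0    // mutate one instance
    println(perClass[1].m[0]) // prints 0.0: the other instances are unaffected
}
```

Keeping hyperparameters in an immutable spec and state in separate instances means one configuration can safely serve many weight vectors without their moment estimates interfering.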