kumulant

Rmsprop

@Serializable
@SerialName(value = "Rmsprop")
data class Rmsprop(val learningRate: ScalarExpr = ConstantRate(0.01), val rho: Double = 0.9, val epsilon: Double = 1.0E-8) : OptimizerSpec

RMSProp. Per-coordinate adaptive learning rate via an exponential moving average of squared gradients: the same shape as Adagrad but with a sliding window instead of a monotone accumulator.
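The update described above can be sketched in a few lines. This is a minimal illustration of the textbook RMSProp step, not the kumulant implementation: the class name `RmspropSketch` and its `step` method are hypothetical, and the learning rate is simplified to a constant `Double` rather than a `ScalarExpr`. The epsilon is placed under the square root, as the property documentation states.

```kotlin
import kotlin.math.sqrt

// Hypothetical stand-in for illustration only; not the kumulant Optimizer API.
class RmspropSketch(
    size: Int,
    private val learningRate: Double = 0.01,
    private val rho: Double = 0.9,
    private val epsilon: Double = 1.0E-8,
) {
    // Per-coordinate EMA of squared gradients, starting empty (all zeros).
    val emaG2 = DoubleArray(size)

    // Apply one descent step in place to `weights` given `gradient`.
    fun step(weights: DoubleArray, gradient: DoubleArray) {
        for (i in weights.indices) {
            val g = gradient[i]
            // Sliding-window accumulator: old history decays by rho each step.
            emaG2[i] = rho * emaG2[i] + (1.0 - rho) * g * g
            // Base rate scaled by the per-coordinate 1 / sqrt(emaG2 + eps) factor.
            weights[i] -= learningRate * g / sqrt(emaG2[i] + epsilon)
        }
    }
}

fun main() {
    val w = doubleArrayOf(1.0, -2.0)
    val opt = RmspropSketch(size = 2)
    // Gradients differ by a factor of 8 in magnitude across coordinates.
    opt.step(w, doubleArrayOf(0.5, -4.0))
    println(w.joinToString())
}
```

In this sketch both coordinates move by roughly the same distance (about 0.0316 on the first step) even though their gradients differ by a factor of 8, which is the per-coordinate normalization at work.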

Reach for Rmsprop when Adagrad's effective learning rate decays faster than you want: non-stationary streams, or online problems where the data distribution drifts over the lifetime of the optimizer. rho near 1 gives a long memory (close to Adagrad); rho near 0 gives a short memory.
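To make the decay difference concrete, the following sketch feeds a constant gradient to both accumulators and compares the resulting scale factors. It assumes the textbook Adagrad form (a running sum of squared gradients) and, for comparability, puts epsilon under the square root in both cases; the function `scaleFactors` is hypothetical and not part of kumulant.

```kotlin
import kotlin.math.sqrt

// Scale factors after `steps` updates with a constant gradient `g`.
// Returns (adagradFactor, rmspropFactor). Illustration only.
fun scaleFactors(
    steps: Int,
    g: Double = 1.0,
    rho: Double = 0.9,
    eps: Double = 1.0E-8,
): Pair<Double, Double> {
    var sumG2 = 0.0 // Adagrad: monotone sum of squared gradients
    var emaG2 = 0.0 // RMSProp: exponential moving average of squared gradients
    repeat(steps) {
        sumG2 += g * g
        emaG2 = rho * emaG2 + (1.0 - rho) * g * g
    }
    return Pair(1.0 / sqrt(sumG2 + eps), 1.0 / sqrt(emaG2 + eps))
}

fun main() {
    for (n in listOf(10, 100, 1000)) {
        val (ada, rms) = scaleFactors(n)
        println("steps=$n adagrad=$ada rmsprop=$rms")
    }
}
```

Under these assumptions the Adagrad factor keeps shrinking like 1 / sqrt(steps), while the RMSProp factor levels off near 1 / |g| once the average has warmed up.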

Constructors

constructor(learningRate: ScalarExpr = ConstantRate(0.01), rho: Double = 0.9, epsilon: Double = 1.0E-8)

Properties

val epsilon: Double

Numerical-stability epsilon added under the square root.

val learningRate: ScalarExpr

Base learning rate, multiplied by the per-coordinate 1 / sqrt(emaG2 + eps) factor.
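A small worked example of the scaling factor may help show what epsilon buys. The helper `perCoordFactor` is a hypothetical name used only here; it evaluates 1 / sqrt(emaG2 + eps) with the default epsilon from the constructor.

```kotlin
import kotlin.math.sqrt

// Multiplier applied to the base learning rate for one coordinate.
// Illustration only; not a kumulant function.
fun perCoordFactor(emaG2: Double, eps: Double = 1.0E-8): Double =
    1.0 / sqrt(emaG2 + eps)

fun main() {
    println(perCoordFactor(0.0))    // empty state: capped at 1 / sqrt(eps), about 1e4
    println(perCoordFactor(1.0E-4)) // small gradients: about 100
    println(perCoordFactor(1.0))    // unit-scale gradients: about 1
}
```

Without epsilon the first case would divide by zero; with the default of 1.0E-8 the multiplier is bounded at about 10,000.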

val rho: Double

EMA decay for the squared gradient; the memory horizon is roughly 1 / (1 - rho).
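The horizon rule of thumb can be checked numerically. In this sketch the function names `memoryHorizon`, `emaWeight`, and `weightWithin` are hypothetical helpers, not kumulant API; they follow directly from the EMA recurrence emaG2 = rho * emaG2 + (1 - rho) * g^2.

```kotlin
import kotlin.math.pow

// Approximate number of recent steps that dominate the average.
fun memoryHorizon(rho: Double): Double = 1.0 / (1.0 - rho)

// Weight the EMA gives to the squared gradient seen `age` steps ago (age 0 = latest).
fun emaWeight(rho: Double, age: Int): Double = (1.0 - rho) * rho.pow(age)

// Total weight carried by the most recent `n` squared gradients.
fun weightWithin(rho: Double, n: Int): Double = 1.0 - rho.pow(n)

fun main() {
    for (rho in listOf(0.5, 0.9, 0.99)) {
        println("rho=$rho horizon=${memoryHorizon(rho)}")
    }
    // Share of the average contributed by the last 10 steps at the default rho.
    println(weightWithin(0.9, 10))
}
```

The horizons come out at roughly 2, 10, and 100 steps. At the default rho of 0.9, the most recent 10 squared gradients account for about 65% of the average, so "horizon" is a soft cutoff rather than a hard window.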

Functions

open override fun materialize(featureSize: Int, concurrency: Concurrency = Concurrency.None): Optimizer

Build a live optimizer instance over featureSize coordinates at the requested Concurrency. Each call returns a fresh optimizer with empty auxiliary state; stats call this once for each weight vector they want to track (one per output class for com.eignex.kumulant.stat.regression.SoftmaxRegressionStat).
