Rmsprop
RMSProp. Per-coordinate adaptive learning rate via an exponential moving average of squared gradients: the same shape as Adagrad but with an exponentially decaying window instead of a monotone accumulator.
Reach for Rmsprop when Adagrad's effective learning rate decays faster than you want, for example on non-stationary streams or online problems where the data distribution drifts over the lifetime of the optimizer. rho near 1 gives a long memory (close to Adagrad); rho near 0 gives a short memory.
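The update described above can be sketched in a few lines of Kotlin. This is a minimal standalone illustration, not the library's implementation: the class name RmspropSketch, the step function, and the default values are assumptions made for the example, and epsilon is placed inside the square root to match the 1 / sqrt(emaG2 + eps) factor documented for learningRate.

```kotlin
import kotlin.math.sqrt

// Hypothetical standalone sketch of the RMSProp update; not the library's class.
class RmspropSketch(
    featureSize: Int,
    private val learningRate: Double = 0.01, // assumed default
    private val rho: Double = 0.9,           // assumed default
    private val epsilon: Double = 1e-8,      // assumed default
) {
    // Per-coordinate exponential moving average of squared gradients, initially zero.
    private val emaG2 = DoubleArray(featureSize)

    // Applies one gradient step to `weights` in place.
    fun step(weights: DoubleArray, gradient: DoubleArray) {
        for (i in weights.indices) {
            val g = gradient[i]
            // Old history decays by rho; the new squared gradient enters with weight 1 - rho.
            emaG2[i] = rho * emaG2[i] + (1.0 - rho) * g * g
            // Base learning rate scaled by the per-coordinate 1 / sqrt(emaG2 + eps) factor.
            weights[i] -= learningRate * g / sqrt(emaG2[i] + epsilon)
        }
    }
}

fun main() {
    val weights = doubleArrayOf(1.0, -2.0)
    val opt = RmspropSketch(featureSize = 2, learningRate = 0.1)
    // Gradient of f(w) = sum(w_i^2) at the starting point is 2 * w.
    opt.step(weights, doubleArrayOf(2.0, -4.0))
    println(weights.joinToString())
}
```

On the very first step both coordinates move by roughly learningRate / sqrt(1 - rho), about 0.316 here, even though one gradient is twice the other. That is the per-coordinate normalisation at work: the step size tracks the recent gradient scale of each coordinate rather than its raw magnitude.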
Constructors
Properties
Functions
Rmsprop
epsilon
learningRate
Base learning rate, multiplied by the per-coordinate 1 / sqrt(emaG2 + eps) factor, where emaG2 is the exponential moving average of squared gradients and eps is epsilon.
materialize
Build a live optimizer instance over featureSize coordinates at the requested Concurrency. Each call returns a fresh optimizer with empty aux state; stats call this for each weight vector they want to track (one per output class for com.eignex.kumulant.stat.regression.SoftmaxRegressionStat).
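The split between an immutable configuration and the live instances it produces can be illustrated as follows. Every name in this sketch (RmspropSpecSketch, LiveOptimizerSketch, the single-argument materialize) is hypothetical; the real signature also takes a Concurrency argument, which is omitted here because its type is not shown on this page.

```kotlin
// Hypothetical live instance: owns the mutable aux state for one weight vector.
class LiveOptimizerSketch(featureSize: Int) {
    // Aux state starts empty (all zeros) for every new instance.
    val emaG2 = DoubleArray(featureSize)
}

// Hypothetical immutable configuration; holds hyperparameters only, no state.
data class RmspropSpecSketch(
    val learningRate: Double,
    val rho: Double,
    val epsilon: Double,
) {
    // Each call allocates a fresh, independent live optimizer.
    fun materialize(featureSize: Int): LiveOptimizerSketch =
        LiveOptimizerSketch(featureSize)
}

fun main() {
    val spec = RmspropSpecSketch(learningRate = 0.01, rho = 0.9, epsilon = 1e-8)
    // One live optimizer per output class, as a softmax-style stat might request.
    val perClass = List(3) { spec.materialize(featureSize = 4) }
    perClass[0].emaG2[0] = 1.0
    // State is not shared between instances.
    println(perClass[1].emaG2[0]) // prints 0.0
}
```

Keeping the configuration stateless means the same Rmsprop value can be reused across stats and weight vectors without one vector's gradient history leaking into another's.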