kumulant

EpsilonDecreasing

class EpsilonDecreasing(val epsilon: Double = 2.0, val decay: Double = 0.5, priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02) : BanditPolicy<WeightedVarianceResult>

Annealed epsilon-greedy. Effective exploration probability decreases as sample count accumulates: eps(t) = min(1, epsilon / totalSamples^decay).

Solves EpsilonGreedy's fixed-epsilon trade-off: explore aggressively early, then converge to mostly-greedy once the per-arm posteriors are well-separated. Theoretical defaults give decay = 0.5 (Auer et al.) for a sqrt(T) regret bound; lower decay keeps exploring longer, higher decay converges to greedy faster.
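As a rough illustration of the schedule eps(t) = min(1, epsilon / totalSamples^decay), the standalone sketch below evaluates it for the default parameters. The helper name effectiveEpsilon is hypothetical and not part of kumulant, and the handling of a zero sample count is an assumption.

```kotlin
import kotlin.math.min
import kotlin.math.pow

// Hypothetical helper, not part of the library:
// eps(t) = min(1, epsilon / totalSamples^decay).
fun effectiveEpsilon(epsilon: Double, decay: Double, totalSamples: Double): Double {
    // Assumption: with no samples yet, always explore.
    if (totalSamples <= 0.0) return 1.0
    return min(1.0, epsilon / totalSamples.pow(decay))
}

fun main() {
    // Print the schedule for the documented defaults epsilon = 2.0, decay = 0.5.
    for (n in listOf(1.0, 4.0, 100.0, 10_000.0)) {
        println("samples=$n eps=${effectiveEpsilon(2.0, 0.5, n)}")
    }
}
```

With the defaults (epsilon = 2.0, decay = 0.5) the clamp keeps eps at 1 through the first four samples; after 100 samples it is 0.2.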

Constructors

constructor(epsilon: Double = 2.0, decay: Double = 0.5, priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02)

Properties

open override val arm: NormalArm

Per-arm cumulator spec; determines the prior pseudo-counts, value encoding, and result shape that evaluate consumes.

val decay: Double

Decay exponent applied to the running sample count.

val epsilon: Double

Initial exploration scale; effective epsilon decays as samples accumulate.

Functions

open override fun addArm(snapshot: WeightedVarianceResult)

Hook called when a new arm joins the population. Lets stateful policies fold the new arm's snapshot into their global counters (UCB's total-samples, UCB1Normal's arm count). Default no-op.


Allocate a fresh per-arm accumulator from the arm spec. Default delegates to arm.createStat(); override only if the policy needs a non-standard variant.

open override fun evaluate(snapshot: WeightedVarianceResult, step: Long, rng: Random): Double

Score an arm given its current snapshot. Higher scores are preferred by the bandit. step is the global update count (for time-dependent exploration schedules); rng is the bandit's shared com.eignex.kumulant.bandit.Bandit.random (consumed by sampling policies).
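The scoring contract above is per arm. To show what an annealed epsilon-greedy scheme amounts to at the selection level, here is a minimal standalone sketch that uses plain lists instead of kumulant types. The function selectArm and its parameters are hypothetical; how the library actually combines step, rng, and the snapshot inside evaluate is not confirmed by this page.

```kotlin
import kotlin.math.min
import kotlin.math.pow
import kotlin.random.Random

// Hypothetical stand-alone selection rule; does not use kumulant types.
fun selectArm(
    means: List<Double>,
    totalSamples: Long,
    rng: Random,
    epsilon: Double = 2.0,
    decay: Double = 0.5
): Int {
    require(means.isNotEmpty()) { "need at least one arm" }
    // Assumption: with no samples yet, always explore.
    val eps = if (totalSamples <= 0L) {
        1.0
    } else {
        min(1.0, epsilon / totalSamples.toDouble().pow(decay))
    }
    // Explore: pick a uniformly random arm with probability eps.
    if (rng.nextDouble() < eps) return rng.nextInt(means.size)
    // Exploit: pick the arm with the highest estimated mean (first index wins ties).
    return means.indices.maxByOrNull { means[it] }!!
}
```

Passing a seeded Random makes runs reproducible, which is convenient when comparing decay settings.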

open override fun removeArm(snapshot: WeightedVarianceResult)

Hook called when an arm leaves the population. Inverse of addArm; lets stateful policies remove the departing arm's contribution from their global counters. Default no-op.
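The addArm and removeArm hooks can be pictured as symmetric adjustments to a global counter. The sketch below is an assumption about the general pattern, not the library's implementation: the class TotalSamplesCounter is hypothetical, and it takes a bare weight where the real hooks take a WeightedVarianceResult snapshot.

```kotlin
// Hypothetical global counter kept by a stateful policy.
class TotalSamplesCounter {
    var totalSamples: Double = 0.0
        private set

    // Fold a joining arm's accumulated weight into the global count.
    fun addArm(armWeight: Double) {
        totalSamples += armWeight
    }

    // Inverse of addArm: remove a departing arm's contribution.
    fun removeArm(armWeight: Double) {
        totalSamples -= armWeight
    }
}

fun main() {
    val counter = TotalSamplesCounter()
    counter.addArm(10.0)
    counter.addArm(5.0)
    counter.removeArm(10.0)
    println(counter.totalSamples)
}
```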

open override fun update(stat: SeriesStat<WeightedVarianceResult>, value: Double, weight: Double = 1.0)

Fold an observed reward value (with optional weight) into the per-arm stat. Default applies arm.encode first; policies with global counters (UCB families) override to update their counter alongside the stat update.
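To make "folding a weighted value into a per-arm stat" concrete, the following sketch uses a standard incremental weighted mean and variance update, seeded with prior pseudo-counts. It is an illustration under assumptions: the class WeightedVarianceSketch is hypothetical, the parameter names only mirror the constructor's priorMean, priorWeight, and priorSquaredDeviations, and the page does not confirm that kumulant's accumulator works this way.

```kotlin
// Hypothetical accumulator; not the library's SeriesStat implementation.
class WeightedVarianceSketch(
    priorMean: Double = 0.0,
    priorWeight: Double = 0.02,
    priorSquaredDeviations: Double = 0.02
) {
    var totalWeight: Double = priorWeight
        private set
    var mean: Double = priorMean
        private set
    var squaredDeviations: Double = priorSquaredDeviations
        private set

    // Weighted sum of squared deviations divided by total weight.
    val variance: Double
        get() = squaredDeviations / totalWeight

    // Incremental weighted update of the mean and squared deviations.
    fun update(value: Double, weight: Double = 1.0) {
        require(weight > 0.0) { "weight must be positive" }
        totalWeight += weight
        val delta = value - mean
        mean += delta * weight / totalWeight
        squaredDeviations += weight * delta * (value - mean)
    }
}
```

A small prior weight, as in the documented defaults, keeps the variance defined before the first observation while letting real samples dominate quickly.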
