EpsilonDecreasing
Annealed epsilon-greedy. Effective exploration probability decreases as sample count accumulates: eps(t) = min(1, epsilon / totalSamples^decay).
Solves EpsilonGreedy's fixed-epsilon trade-off: explore aggressively early, then converge to mostly-greedy once the per-arm posteriors are well-separated. Theoretical defaults give decay = 0.5 (Auer et al.) for a sqrt(T) regret bound; lower decay keeps exploring longer, higher decay converges to greedy faster.
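To make the effect of decay concrete, the schedule can be evaluated directly. The following is a minimal sketch of the formula eps(t) = min(1, epsilon / totalSamples^decay); the helper name effectiveEpsilon and the choice to return 1 when no samples exist yet are assumptions for illustration, not part of the library.

```kotlin
import kotlin.math.min
import kotlin.math.pow

// Hypothetical helper, not a library function: evaluates
// eps(t) = min(1, epsilon / totalSamples^decay) for a given sample count.
fun effectiveEpsilon(epsilon: Double, decay: Double, totalSamples: Long): Double {
    // Assumption: before any sample is seen, always explore.
    if (totalSamples <= 0L) return 1.0
    return min(1.0, epsilon / totalSamples.toDouble().pow(decay))
}

fun main() {
    val epsilon = 1.0
    // Compare a slow, a default, and a fast schedule at growing sample counts.
    for (n in listOf(1L, 100L, 10_000L)) {
        val slow = effectiveEpsilon(epsilon, decay = 0.25, totalSamples = n)
        val base = effectiveEpsilon(epsilon, decay = 0.5, totalSamples = n)
        val fast = effectiveEpsilon(epsilon, decay = 1.0, totalSamples = n)
        println("n=$n  decay=0.25: $slow  decay=0.5: $base  decay=1.0: $fast")
    }
}
```

With epsilon = 1 and 100 samples, decay = 0.5 gives an exploration probability of about 0.1, while decay = 1.0 gives about 0.01.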
Constructors
Functions
Allocate a fresh per-arm accumulator from the arm spec. Default delegates to arm.createStat(); override only if the policy needs a non-standard variant.
EpsilonDecreasing
addArm
Hook called when a new arm joins the population. Lets stateful policies fold the new arm's snapshot into their global counters (UCB's total-samples, UCB1Normal's arm count). Default no-op.
arm
decay
epsilon
evaluate
Score an arm given its current snapshot. Higher scores are preferred by the bandit. step is the global update count (for time-dependent exploration schedules); rng is the bandit's shared com.eignex.kumulant.bandit.Bandit.random (consumed by sampling policies).
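As a rough illustration of how an annealed epsilon-greedy choice behaves end to end, here is a standalone sketch. It does not use or reproduce the library's evaluate signature; the class AnnealedEpsilonGreedy, its members, and the simulated Bernoulli environment are all hypothetical, and the policy is assumed to explore with probability 1 before any sample arrives.

```kotlin
import kotlin.math.min
import kotlin.math.pow
import kotlin.random.Random

// Standalone sketch; none of these names come from the library.
class AnnealedEpsilonGreedy(
    private val armCount: Int,
    private val epsilon: Double = 1.0,
    private val decay: Double = 0.5,
    private val rng: Random = Random(42),
) {
    private val counts = LongArray(armCount)
    private val means = DoubleArray(armCount)
    var totalSamples = 0L
        private set

    // eps(t) = min(1, epsilon / totalSamples^decay); assumed to be 1 with no samples.
    fun explorationProbability(): Double =
        if (totalSamples == 0L) 1.0
        else min(1.0, epsilon / totalSamples.toDouble().pow(decay))

    // Explore uniformly with probability eps(t), otherwise pick the best mean so far.
    fun select(): Int =
        if (rng.nextDouble() < explorationProbability()) rng.nextInt(armCount)
        else means.indices.maxByOrNull { means[it] } ?: 0

    // Incremental mean update for the chosen arm, plus the global counter.
    fun update(arm: Int, reward: Double) {
        counts[arm]++
        totalSamples++
        means[arm] += (reward - means[arm]) / counts[arm]
    }
}

fun main() {
    val trueRates = doubleArrayOf(0.2, 0.5, 0.8) // simulated Bernoulli arms
    val env = Random(7)
    val policy = AnnealedEpsilonGreedy(armCount = trueRates.size)
    val pulls = IntArray(trueRates.size)
    repeat(5_000) {
        val arm = policy.select()
        val reward = if (env.nextDouble() < trueRates[arm]) 1.0 else 0.0
        policy.update(arm, reward)
        pulls[arm]++
    }
    println("pulls per arm: ${pulls.toList()}, eps now: ${policy.explorationProbability()}")
}
```

The design point is that the exploration probability depends only on the global sample count, so early rounds are close to uniform sampling and later rounds are dominated by the greedy choice.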
removeArm
Hook called when an arm leaves the population. Inverse of addArm; lets stateful policies remove the departing arm's contribution from their global counters. Default no-op.
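The relationship between the two hooks can be sketched with a global sample counter. The types below (ArmSnapshot, TotalSamplesCounter, recordUpdate) are hypothetical stand-ins, since the library's actual snapshot and policy interfaces are not shown on this page and may differ.

```kotlin
// Hypothetical types for illustration only; not the library's API.
data class ArmSnapshot(val sampleCount: Long)

class TotalSamplesCounter {
    var totalSamples = 0L
        private set

    // A joining arm may already carry samples; fold them into the global count.
    fun addArm(snapshot: ArmSnapshot) {
        totalSamples += snapshot.sampleCount
    }

    // Inverse of addArm: subtract the departing arm's contribution.
    fun removeArm(snapshot: ArmSnapshot) {
        totalSamples -= snapshot.sampleCount
    }

    // Each observed reward advances the global count by one.
    fun recordUpdate() {
        totalSamples++
    }
}

fun main() {
    val counter = TotalSamplesCounter()
    val warmArm = ArmSnapshot(sampleCount = 40)
    counter.addArm(warmArm)    // total is now 40
    counter.recordUpdate()     // an update on some other arm: total is 41
    counter.removeArm(warmArm) // the warm arm leaves: total is 1
    println(counter.totalSamples)
}
```

Keeping the two hooks symmetric means a schedule driven by the total sample count reflects only the arms currently in the population.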
update
Fold an observed reward value (with optional weight) into the per-arm stat. Default applies arm.encode first; policies with global counters (UCB families) override to update their counter alongside the stat update.