EpsilonGreedy
Epsilon-greedy: with probability epsilon, pick a uniformly random arm (explore); otherwise pick the arm with the highest estimated mean (exploit). The simplest exploration scheme that works in practice: no mathematical machinery and only one knob to tune.
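The selection rule can be sketched in a few lines. This is a minimal illustration, not the library's implementation: selectArm and its parameters are hypothetical names, and the arms are reduced to a plain list of estimated means.

```kotlin
import kotlin.random.Random

// Hypothetical helper, not part of the library: picks an arm index
// from a list of estimated mean rewards.
fun selectArm(means: List<Double>, epsilon: Double, rng: Random): Int {
    require(means.isNotEmpty()) { "need at least one arm" }
    require(epsilon in 0.0..1.0) { "epsilon must be in [0, 1]" }
    return if (rng.nextDouble() < epsilon) {
        // Explore: uniformly random arm.
        rng.nextInt(means.size)
    } else {
        // Exploit: arm with the highest estimated mean (first one wins ties).
        means.indices.maxByOrNull { means[it] }!!
    }
}
```

With epsilon = 0.0 the helper is purely greedy; with epsilon = 1.0 it always picks uniformly at random.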
Sensitive to the value of epsilon: too low and you under-explore, so the policy can lock onto the wrong arm, and regret then grows linearly with the horizon; too high and you waste pulls on known-bad arms. Typical good values are 0.05–0.2. For automatic tuning use EpsilonDecreasing, which anneals epsilon toward zero as samples accumulate.
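To illustrate what annealing epsilon can look like, here is one possible schedule. It is an assumption made for this example only: the formula EpsilonDecreasing actually uses is not given in this section, and decayedEpsilon and its decay parameter are hypothetical.

```kotlin
// Hypothetical schedule, not the library's EpsilonDecreasing formula:
// epsilon shrinks hyperbolically with the number of updates seen so far.
fun decayedEpsilon(initialEpsilon: Double, step: Long, decay: Double = 0.01): Double {
    require(initialEpsilon in 0.0..1.0) { "initialEpsilon must be in [0, 1]" }
    require(step >= 0) { "step must be non-negative" }
    require(decay >= 0.0) { "decay must be non-negative" }
    // At step 0 this returns initialEpsilon; it tends to 0 as step grows.
    return initialEpsilon / (1.0 + decay * step)
}
```

Under this schedule, early pulls explore at close to the initial rate while later pulls are almost always greedy, which removes the need to hand-pick a single fixed epsilon.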
Functions
Hook called when a new arm joins the population. Lets stateful policies fold the new arm's snapshot into their global counters (UCB's total-samples, UCB1Normal's arm count). Default no-op.
Allocate a fresh per-arm accumulator from the arm spec. Default delegates to arm.createStat(); override only if the policy needs a non-standard variant.
Score an arm given its current snapshot. Higher scores are preferred by the bandit. step is the global update count (for time-dependent exploration schedules); rng is the bandit's shared com.eignex.kumulant.bandit.Bandit.random (consumed by sampling policies).
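As a rough sketch of how per-arm scores could drive arm selection, the example below scores every arm and picks the highest. ArmSnapshot, Scorer, greedy, and pickBest are hypothetical stand-ins written for this example; the library's real snapshot and policy types are not shown in this section.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in for an arm snapshot.
data class ArmSnapshot(val mean: Double, val pulls: Long)

// Hypothetical scoring contract mirroring the description: snapshot, step, rng in; score out.
fun interface Scorer {
    fun evaluate(snapshot: ArmSnapshot, step: Long, rng: Random): Double
}

// Purely greedy scorer: ignores step and rng and scores by estimated mean.
val greedy = Scorer { snapshot, _, _ -> snapshot.mean }

// Scores every arm and returns the index of the highest score (first one wins ties).
fun pickBest(arms: List<ArmSnapshot>, scorer: Scorer, step: Long, rng: Random): Int {
    require(arms.isNotEmpty()) { "need at least one arm" }
    val scores = arms.map { scorer.evaluate(it, step, rng) }
    return scores.indices.maxByOrNull { scores[it] }!!
}
```

A time-dependent policy would read step inside evaluate, and a sampling policy would draw from rng; the greedy scorer here uses neither.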
Hook called when an arm leaves the population. Inverse of addArm; lets stateful policies remove the departing arm's contribution from their global counters. Default no-op.
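The pairing of the two hooks can be sketched with a policy that tracks a global pull count. TotalPullsCounter and its method signatures are hypothetical and exist only to show the add/remove symmetry; the library's actual hook signatures are not reproduced here.

```kotlin
// Hypothetical policy state; names are illustrative, not the library's API.
class TotalPullsCounter {
    var totalPulls: Long = 0L
        private set

    // Fold a joining arm's existing pull count into the global total.
    fun addArm(armPulls: Long) {
        require(armPulls >= 0L) { "armPulls must be non-negative" }
        totalPulls += armPulls
    }

    // Inverse of addArm: subtract the departing arm's contribution.
    fun removeArm(armPulls: Long) {
        require(armPulls in 0L..totalPulls) { "armPulls exceeds the tracked total" }
        totalPulls -= armPulls
    }
}
```

Adding an arm and then removing it with the same count leaves the total unchanged, which is the invariant a stateful policy would want to keep.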
EpsilonGreedy
arm
epsilon
evaluate