ThompsonSampling
Thompson sampling: score each arm by a draw from its conjugate posterior given the arm's current snapshot. The bandit then picks the arm with the highest sample. There is no explicit exploration knob; exploration falls out of posterior variance, which shrinks as data accumulates.
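As a rough illustration of the sampling step, the sketch below scores Bernoulli arms with a draw from a Beta posterior and picks the largest draw. It is a standalone sketch, not the library's implementation: BernoulliSnapshot, sampleGammaInt, sampleBetaInt, thompsonScore and selectArm are hypothetical names, the prior is assumed to be uniform Beta(1, 1), and the Beta sampler only handles integer parameters.

```kotlin
import kotlin.math.ln
import kotlin.random.Random

// Hypothetical snapshot type for this sketch; not a kumulant class.
data class BernoulliSnapshot(val successes: Int, val failures: Int)

// Gamma(shape, scale = 1) for a positive integer shape,
// drawn as a sum of `shape` independent Exponential(1) variables.
fun sampleGammaInt(shape: Int, rng: Random): Double {
    require(shape >= 1) { "shape must be a positive integer" }
    var sum = 0.0
    // 1 - U lies in (0, 1], so the logarithm is always finite.
    repeat(shape) { sum -= ln(1.0 - rng.nextDouble()) }
    return sum
}

// Beta(a, b) for positive integers a, b via X / (X + Y)
// with X ~ Gamma(a, 1) and Y ~ Gamma(b, 1).
fun sampleBetaInt(a: Int, b: Int, rng: Random): Double {
    val x = sampleGammaInt(a, rng)
    val y = sampleGammaInt(b, rng)
    return x / (x + y)
}

// Thompson score: one draw from the Beta(1 + successes, 1 + failures)
// posterior, assuming a uniform Beta(1, 1) prior.
fun thompsonScore(snapshot: BernoulliSnapshot, rng: Random): Double =
    sampleBetaInt(1 + snapshot.successes, 1 + snapshot.failures, rng)

// Select the arm whose posterior draw is largest.
fun selectArm(snapshots: List<BernoulliSnapshot>, rng: Random): Int {
    require(snapshots.isNotEmpty()) { "at least one arm is required" }
    var best = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for ((index, snapshot) in snapshots.withIndex()) {
        val score = thompsonScore(snapshot, rng)
        if (score > bestScore) {
            bestScore = score
            best = index
        }
    }
    return best
}
```

An arm with few observations has a wide posterior, so its draw occasionally exceeds that of a better-estimated arm; this is the only source of exploration in the sketch.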
Pair an Arm with a Posterior of the same result type R:
BernoulliArm + BetaPosterior: see BetaBernoulliTS.
NormalArm + NormalGammaPosterior: see NormalTS.
MeanArm + PoissonGammaPosterior / GeometricBetaPosterior / ExponentialGammaPosterior / GammaScalePosterior: see PoissonTS, GeometricTS, ExponentialTS, GammaScaleTS.
Stateless across arms; addArm / removeArm are no-ops because no global counter is involved.
Functions
arm
Allocate a fresh per-arm accumulator from the arm spec. Default delegates to arm.createStat(); override only if the policy needs a non-standard variant.
evaluate
Score an arm given its current snapshot. Higher scores are preferred by the bandit. step is the global update count (for time-dependent exploration schedules); rng is the bandit's shared com.eignex.kumulant.bandit.Bandit.random (consumed by sampling policies).
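The evaluate contract described above (a snapshot, a step and a shared rng in; a score out, higher preferred) might look roughly like the sketch below. Everything here is an assumption made for illustration: ScoringPolicy, MeanSnapshot, GaussianSampling and pickBest are hypothetical names, the rng is taken to be a java.util.Random, and the real kumulant signatures may differ. The posterior is a Gaussian with unit observation noise and a standard normal prior, a simpler model than the conjugate pairs listed on this page.

```kotlin
import java.util.Random
import kotlin.math.sqrt

// Hypothetical snapshot: number of observations and their running mean.
data class MeanSnapshot(val count: Long, val mean: Double)

// Hypothetical policy interface mirroring the documented parameters.
interface ScoringPolicy<S> {
    fun evaluate(snapshot: S, step: Long, rng: Random): Double
}

// Gaussian Thompson sampling, assuming rewards ~ N(mu, 1) and prior mu ~ N(0, 1).
// The posterior is then N(n * mean / (n + 1), 1 / (n + 1)).
class GaussianSampling : ScoringPolicy<MeanSnapshot> {
    override fun evaluate(snapshot: MeanSnapshot, step: Long, rng: Random): Double {
        // `step` is unused: a sampling policy needs no time-dependent schedule.
        val n = snapshot.count.toDouble()
        val posteriorMean = n * snapshot.mean / (n + 1.0)
        val posteriorStd = 1.0 / sqrt(n + 1.0)
        return posteriorMean + posteriorStd * rng.nextGaussian()
    }
}

// Score every arm with the shared rng and return the index of the highest score.
fun <S> pickBest(policy: ScoringPolicy<S>, snapshots: List<S>, step: Long, rng: Random): Int {
    val scores = snapshots.map { policy.evaluate(it, step, rng) }
    return scores.indices.maxByOrNull { scores[it] } ?: error("no arms to score")
}
```

Passing the same rng to every call keeps a run reproducible from a single seed, which is one plausible reason for the bandit to own the generator rather than each policy.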