TopTwoThompsonBandit
Top-Two Thompson Sampling (Russo 2020); pure-exploration variant of Thompson sampling for best-arm identification: sample every arm's posterior, take the argmax arm1, play it with probability beta, or else resample until the argmax differs from arm1 and play that runner-up.
The forced resampling keeps a fraction of the budget on the runner-up, so the gap to the best arm is identified at an asymptotically optimal rate. beta = 0.5 is the optimal exploration fraction for two-armed problems; tune it higher to bias toward exploitation when running in the regret-minimisation regime (beta = 1 always plays arm1, which reduces to plain Thompson sampling).
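The selection rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's implementation: GaussianArm is a hypothetical stand-in for one posterior cell (Gaussian mean, N(0, 1) prior, unit-variance observations), chooseTopTwo is a hypothetical name, and falling back to arm1 when the resample cap is hit is an assumption, since the source does not say what happens at the cap.

```kotlin
import java.util.Random
import kotlin.math.sqrt

// Hypothetical stand-in for one posterior cell: Gaussian mean with a N(0, 1)
// prior and unit-variance observations. Not the library's arm type.
class GaussianArm {
    private var weightedSum = 0.0
    private var totalWeight = 0.0

    fun update(value: Double, weight: Double = 1.0) {
        weightedSum += value * weight
        totalWeight += weight
    }

    // Draw one sample from the posterior N(sum / (n + 1), 1 / (n + 1)).
    fun sample(rng: Random): Double {
        val precision = totalWeight + 1.0
        return weightedSum / precision + rng.nextGaussian() / sqrt(precision)
    }
}

// One posterior sample per arm; return the index of the largest sample.
fun sampleArgmax(arms: List<GaussianArm>, rng: Random): Int {
    var best = 0
    var bestValue = Double.NEGATIVE_INFINITY
    for (i in arms.indices) {
        val v = arms[i].sample(rng)
        if (v > bestValue) {
            bestValue = v
            best = i
        }
    }
    return best
}

// Top-two selection: play the leader with probability beta, otherwise
// resample until a different argmax appears and play that challenger.
fun chooseTopTwo(arms: List<GaussianArm>, beta: Double, maxResamples: Int, rng: Random): Int {
    val arm1 = sampleArgmax(arms, rng)
    if (rng.nextDouble() < beta) return arm1
    repeat(maxResamples) {
        val arm2 = sampleArgmax(arms, rng)
        if (arm2 != arm1) return arm2
    }
    return arm1 // Assumption: fall back to the leader when the cap is hit.
}
```

With beta = 1.0 the coin flip always succeeds and the sketch behaves like plain Thompson sampling. Because the arms are only read during selection, two calls with identically seeded Random instances return the same arm.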
Doesn't expose com.eignex.kumulant.bandit.Scorable: arm selection samples jointly and conditionally resamples, so there's no per-arm score callers can read in isolation. The per-arm posterior state still fits PerArmBandit.
Use cases: best-arm identification and pure-exploration problems where identifying the best arm quickly matters more than minimising cumulative regret; works with any posterior expressible as a ThompsonSampling policy.
Arms: indexless, nbrArms ≥ 2 fixed at construction; each arm owns one posterior cell from policy.createArm().
Memory: O(nbrArms · arm-state); per-arm posterior plus a step counter.
Choose: O(nbrArms) expected; O(maxResamples · nbrArms) worst case when the second-arm resample loop spins to the cap.
Update: O(1) on the targeted arm, delegated to policy.update.
Randomness: every posterior sample and the beta coin flip use the caller-supplied random; reproducible under a fixed seed if the policy is.
Concurrency: per-arm com.eignex.kumulant.core.SeriesStat carries its own concurrency. The step counter is non-atomic; concurrent choose calls race on it. Cross-arm snapshot consistency during choose is best-effort under racing updates.
TopTwoThompsonBandit
armResult
beta
choose
create
Spawn a fresh bandit with the same configuration; state resets to the prior seed. The random source is replaced: pass the source the new bandit should use for exploration. That choice is independent of any snapshot state merged in afterwards.
Useful when a worker accepts a stream of snapshots to apply sequentially: create(random).also { it.merge(snapshot) }.
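The create-then-merge pattern can be illustrated with a toy type. Everything in this sketch is hypothetical (CountingBandit, CountSnapshot, applyAll, and a merge that adds per-arm counts); it only mirrors the shape of create, merge, and snapshot described here and makes no claim about the library's actual signatures or merge semantics.

```kotlin
import kotlin.random.Random

// Hypothetical snapshot shape: per-arm pull counts only.
data class CountSnapshot(val pulls: List<Long>)

// Hypothetical bandit that tracks pull counts; a stand-in for real posterior state.
class CountingBandit(val nbrArms: Int, val random: Random) {
    private val pulls = LongArray(nbrArms)

    fun update(armIndex: Int) {
        pulls[armIndex]++
    }

    // Non-mutating read of the current state.
    fun snapshot() = CountSnapshot(pulls.toList())

    // Assumed merge semantics for this toy: add the snapshot's counts arm by arm.
    fun merge(s: CountSnapshot) {
        for (i in pulls.indices) pulls[i] += s.pulls[i]
    }

    // Same configuration, empty state, caller-chosen random source.
    fun create(random: Random) = CountingBandit(nbrArms, random)
}

// Apply a stream of snapshots to a fresh bandit without touching the template.
fun applyAll(
    template: CountingBandit,
    snapshots: List<CountSnapshot>,
    random: Random,
): CountingBandit = template.create(random).also { fresh -> snapshots.forEach(fresh::merge) }
```

The template bandit is left untouched, so a worker can keep one configured instance around and build a fresh accumulator per batch of snapshots.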
maxResamples
Cap on the resample loop when searching for a different second arm.
merge
nbrArms
policy
Per-arm posterior + arm spec.
random
reset
Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.
sampleArgmax
Single posterior sample per arm; return the argmax.
snapshot
Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.
update
Fold a single observed reward value into the arm at armIndex with the given weight. Weight is the same observation-weight that runs through the rest of the library; typically 1.0, occasionally importance-weighted for off-policy correction.
An out-of-range armIndex throws; some bandits also bounds-check the value (e.g. Bernoulli arms require value in {0.0, 1.0}).
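As a rough sketch of what a weighted update with these checks might look like for Bernoulli arms: the BernoulliArms class, its Beta(1, 1) prior, and the specific exception types below are assumptions made for illustration and are not taken from the library.

```kotlin
// Hypothetical Beta-Bernoulli arm state; names and prior are illustrative.
class BernoulliArms(nbrArms: Int) {
    // Assumed Beta(1, 1) prior: one pseudo-success and one pseudo-failure per arm.
    private val successes = DoubleArray(nbrArms) { 1.0 }
    private val failures = DoubleArray(nbrArms) { 1.0 }

    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        // Index check: reject arms outside 0 until nbrArms.
        if (armIndex !in successes.indices) {
            throw IndexOutOfBoundsException("armIndex $armIndex out of range")
        }
        // Value check: Bernoulli rewards must be exactly 0.0 or 1.0.
        require(value == 0.0 || value == 1.0) { "Bernoulli value must be 0.0 or 1.0, got $value" }
        // The weight scales how much this observation counts.
        successes[armIndex] += weight * value
        failures[armIndex] += weight * (1.0 - value)
    }

    // Posterior mean of the arm's success probability.
    fun mean(armIndex: Int): Double =
        successes[armIndex] / (successes[armIndex] + failures[armIndex])
}
```

A weight of 2.0 has the same effect on this toy posterior as two unit-weight updates with the same value, which is the sense in which an importance weight rescales an observation.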