MultiArmedBandit
Univariate bandit with a fixed number of independent arms, each backed by a kumulant SeriesStat. On every choose call, the bandit asks the policy to score a fresh snapshot of each arm and picks the argmax.
The selection rule and the arm accumulator both live in BanditPolicy, so swapping Thompson sampling for UCB1 is a policy swap, not a bandit swap.
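The policy swap can be pictured with a minimal, self-contained sketch. ArmSnapshot, ScorePolicy, ucb1, greedy and chooseArm are hypothetical stand-ins, not the library's BanditPolicy API, and a plain greedy rule stands in for Thompson sampling to keep the example short. The step t is assumed to be at least 1.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt
import kotlin.random.Random

// Hypothetical stand-ins; these are NOT the library's real types.
data class ArmSnapshot(val count: Double, val mean: Double)

fun interface ScorePolicy {
    // Score one arm from its snapshot; t is the global step (>= 1), rng the shared random.
    fun evaluate(arm: ArmSnapshot, t: Long, rng: Random): Double
}

// UCB1: empirical mean plus an exploration bonus; unplayed arms score +infinity.
val ucb1 = ScorePolicy { arm, t, _ ->
    if (arm.count == 0.0) Double.POSITIVE_INFINITY
    else arm.mean + sqrt(2.0 * ln(t.toDouble()) / arm.count)
}

// Greedy: empirical mean only (used here in place of Thompson sampling for brevity).
val greedy = ScorePolicy { arm, _, _ -> arm.mean }

// The bandit side is identical whichever policy is plugged in:
// score every arm independently, then take the argmax.
fun chooseArm(arms: List<ArmSnapshot>, policy: ScorePolicy, t: Long, rng: Random): Int {
    var best = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for (i in arms.indices) {
        val score = policy.evaluate(arms[i], t, rng)
        if (score > bestScore) {
            best = i
            bestScore = score
        }
    }
    return best
}
```

With the same two arms, chooseArm(arms, greedy, t, rng) and chooseArm(arms, ucb1, t, rng) can disagree: only the scoring rule changes, not the selection loop.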
Use cases: stationary multi-armed problems with scalar rewards; any policy expressible as "score each arm independently, pick the max".
Arms: indexless, nbrArms fixed at construction; each arm owns one SeriesStat from policy.createArm().
Memory: O(nbrArms · arm-state); per-arm SeriesStat plus a shared atomic step counter.
Choose: O(nbrArms); one policy.evaluate per arm, argmax.
Update: O(1) on the targeted arm, delegated to policy.update.
Randomness: every policy.evaluate and policy.update receives the caller-supplied random; reproducible under a fixed seed if the policy is.
Concurrency: per-arm SeriesStat carries its own concurrency. The step counter is atomic, so concurrent choose calls see distinct t values; racing updates on different arms never block each other. Cross-arm snapshot consistency is best-effort; a concurrent update may interleave between per-arm reads.
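The distinct-t guarantee mentioned under Concurrency can be illustrated with a small JVM-only sketch. StepCounter and collectSteps are hypothetical names, and the use of java.util.concurrent.atomic.AtomicLong is an assumption about how such a counter might be built, not a statement about the library's internals.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong
import kotlin.concurrent.thread

// Hypothetical stand-in for a shared step counter.
class StepCounter {
    private val step = AtomicLong(0)

    // incrementAndGet is one atomic read-modify-write, so no two callers receive the same t.
    fun next(): Long = step.incrementAndGet()
}

// Draw perThread steps from each of several threads and collect every t that was handed out.
fun collectSteps(threads: Int, perThread: Int): Set<Long> {
    val counter = StepCounter()
    val seen = ConcurrentHashMap.newKeySet<Long>()
    val workers = List(threads) {
        thread { repeat(perThread) { seen.add(counter.next()) } }
    }
    workers.forEach { it.join() }
    return seen
}
```

If every caller received a distinct t, the returned set has exactly threads * perThread elements.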
MultiArmedBandit
armResult
armStat
Live per-arm accumulator owned by this bandit. Exposed so callers can compose with the stat ecosystem, e.g. inspect the running snapshot, plug into a com.eignex.kumulant.schema.StatGroup, or apply ops via the live-stat extensions. Writes flow through the policy's BanditPolicy.update (use MultiArmedBandit.update for that); the returned reference is intended for read-side and composition, not for bypassing the policy.
choose
Pick an arm to play next. Uses Bandit.random for any sampling. The returned index is in [0, nbrArms). Repeated calls without intervening updates may return different arms (for randomised selection) or the same arm (for argmax-style policies once the leading arm is well-separated).
create
Spawn a fresh bandit with the same configuration; state resets to the prior-seeded baseline. The random source is replaced: pass the source the new bandit should use for exploration. That choice is independent of any state later merged in from another snapshot.
Useful when a worker accepts a stream of snapshots to apply sequentially: create(random).also { it.merge(snapshot) }.
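The create-then-merge pattern can be sketched with toy types. ToyBandit, ToySnapshot and replay are hypothetical and track only per-arm counts and sums; the real bandit and snapshot classes will differ.

```kotlin
import kotlin.random.Random

// Hypothetical toy types; not the library's classes.
data class ToySnapshot(val counts: List<Double>, val sums: List<Double>)

class ToyBandit(val nbrArms: Int, val random: Random) {
    private val counts = DoubleArray(nbrArms)
    private val sums = DoubleArray(nbrArms)

    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        counts[armIndex] += weight
        sums[armIndex] += weight * value
    }

    // Same configuration, empty state, caller-chosen random source.
    fun create(random: Random): ToyBandit = ToyBandit(nbrArms, random)

    // Fold another bandit's snapshot into this one, arm by arm.
    fun merge(snapshot: ToySnapshot) {
        require(snapshot.counts.size == nbrArms) { "arm count mismatch" }
        for (i in 0 until nbrArms) {
            counts[i] += snapshot.counts[i]
            sums[i] += snapshot.sums[i]
        }
    }

    fun snapshot(): ToySnapshot = ToySnapshot(counts.toList(), sums.toList())
}

// Apply a stream of snapshots sequentially to a fresh bandit; the template is left untouched.
fun replay(template: ToyBandit, snapshots: List<ToySnapshot>, random: Random): ToyBandit =
    template.create(random).also { fresh -> snapshots.forEach { fresh.merge(it) } }
```

In this sketch the fresh bandit starts empty, so its state after replay is exactly the sum of the merged snapshots, regardless of what the template had accumulated.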
evaluate
merge
nbrArms
policy
Policy that owns the per-arm accumulators and the arm-selection rule.
random
Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.
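A small sketch of why threading one seeded instance through every decision gives reproducible exploration. noisyChoose and decisions are hypothetical helpers with a made-up randomised rule; only kotlin.random.Random(seed) is a real API here.

```kotlin
import kotlin.random.Random

// Hypothetical randomised rule: score each arm as mean plus uniform noise, take the argmax.
fun noisyChoose(means: DoubleArray, random: Random): Int {
    var best = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for (i in means.indices) {
        val score = means[i] + random.nextDouble()
        if (score > bestScore) {
            best = i
            bestScore = score
        }
    }
    return best
}

// One Random instance, created from the seed, is threaded through every decision of the run.
fun decisions(seed: Int, steps: Int): List<Int> {
    val random = Random(seed)
    val means = doubleArrayOf(0.2, 0.5, 0.4)
    return List(steps) { noisyChoose(means, random) }
}
```

Two runs with the same seed produce the same sequence of arms on the same Kotlin runtime version; sharing the instance across unrelated callers would interleave draws and break that property.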
reset
Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.
snapshot
Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.
update
Fold a single observed reward value into the arm at armIndex with the given weight. Weight is the same observation-weight that runs through the rest of the library; typically 1.0, occasionally importance-weighted for off-policy correction.
An out-of-range armIndex throws; some bandits also bounds-check the value (e.g. Bernoulli arms require value in {0.0, 1.0}).
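A minimal sketch of a weighted update with both checks, assuming a Bernoulli-style arm store. BernoulliArms is hypothetical, and the exception types (IndexOutOfBoundsException, IllegalArgumentException) are assumptions; the text above only says that the call throws.

```kotlin
// Hypothetical Bernoulli arm store; not the library's implementation.
class BernoulliArms(val nbrArms: Int) {
    private val weightSum = DoubleArray(nbrArms)
    private val successSum = DoubleArray(nbrArms)

    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        // Assumed exception type for an out-of-range arm index.
        if (armIndex !in 0 until nbrArms) {
            throw IndexOutOfBoundsException("armIndex $armIndex not in [0, $nbrArms)")
        }
        // Assumed exception type for a reward outside {0.0, 1.0}.
        require(value == 0.0 || value == 1.0) { "Bernoulli reward must be 0.0 or 1.0, got $value" }
        weightSum[armIndex] += weight
        successSum[armIndex] += weight * value
    }

    // Weighted success rate; NaN until the arm has received some weight.
    fun mean(armIndex: Int): Double = successSum[armIndex] / weightSum[armIndex]
}
```

An importance-weighted observation simply counts for more than one: a success with weight 1.0 followed by a failure with weight 3.0 gives a weighted mean of 0.25.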