com.eignex.kumulant/bandit/univariate/MultiArmedBandit

MultiArmedBandit

class MultiArmedBandit<R : Result>(val nbrArms: Int, val policy: BanditPolicy<R>, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<R>, Scorable

Univariate bandit with a fixed number of independent arms, each backed by a kumulant SeriesStat. On every choose call the bandit asks the policy to score a fresh snapshot of each arm and picks the argmax.

The selection rule and the arm accumulator both live in BanditPolicy, so swapping Thompson sampling for UCB1 is a policy swap, not a bandit swap.

Use cases: stationary multi-armed problems with scalar rewards; any policy expressible as "score each arm independently, pick the max".

Arms: indexless, nbrArms fixed at construction; each arm owns one SeriesStat from policy.createArm().

Memory: O(nbrArms · arm-state); per-arm SeriesStat plus a shared atomic step counter.

Choose: O(nbrArms); one policy.evaluate per arm, argmax.

Update: O(1) on the targeted arm, delegated to policy.update.

Randomness: every policy.evaluate and policy.update receives the caller-supplied random; reproducible under a fixed seed if the policy is.

Concurrency: per-arm SeriesStat carries its own concurrency. The step counter is an atomic so concurrent chooses see distinct t values; racing updates on different arms never block. Cross-arm snapshot consistency is best-effort; a concurrent update may interleave between per-arm reads.
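The choose/update cycle described above can be sketched without the library. This is a minimal single-threaded illustration, not the kumulant implementation: ArmSketch, ScoreSketch, BanditSketch and greedySketch are hypothetical names, the arm state is reduced to a running mean, and the atomic step counter and per-arm concurrency are not modelled.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in for a per-arm accumulator: a running mean and a count.
class ArmSketch {
    var count = 0L
        private set
    var mean = 0.0
        private set

    fun add(value: Double) {
        count += 1
        mean += (value - mean) / count
    }
}

// Hypothetical stand-in for the scoring half of a policy.
fun interface ScoreSketch {
    fun evaluate(arm: ArmSketch, t: Long, random: Random): Double
}

// Scores every arm independently and returns the index of the highest score.
class BanditSketch(nbrArms: Int, private val score: ScoreSketch, private val random: Random) {
    private val arms = List(nbrArms) { ArmSketch() }
    private var step = 0L

    // O(nbrArms): one evaluate per arm, then argmax (ties go to the lowest index).
    fun choose(): Int {
        step += 1
        var best = 0
        var bestScore = Double.NEGATIVE_INFINITY
        for (i in arms.indices) {
            val s = score.evaluate(arms[i], step, random)
            if (s > bestScore) {
                bestScore = s
                best = i
            }
        }
        return best
    }

    // O(1): only the targeted arm is touched.
    fun update(armIndex: Int, value: Double) = arms[armIndex].add(value)
}

// Greedy scoring: unplayed arms score +infinity so that each one is tried once.
val greedySketch = ScoreSketch { arm, _, _ ->
    if (arm.count == 0L) Double.POSITIVE_INFINITY else arm.mean
}
```

Swapping greedySketch for another ScoreSketch changes the selection rule without touching BanditSketch, which mirrors the policy-swap design described above.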

Constructors

MultiArmedBandit

constructor(nbrArms: Int, policy: BanditPolicy<R>, random: Random = Random.Default)

Properties

nbrArms

open override val nbrArms: Int

Number of arms in the population.

policy

val policy: BanditPolicy<R>

Policy that owns the per-arm cumulators and the arm-selection rule.

random

open override val random: Random

Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.
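The reproducibility claim rests on kotlin.random.Random(seed) producing the same sequence for the same seed within one Kotlin runtime version. A small sketch of that property, using a hypothetical helper drawArms that draws uniform arm indices (it stands in for any randomised decision and is not part of the library):

```kotlin
import kotlin.random.Random

// Hypothetical helper: draw n arm indices uniformly from a freshly seeded source.
fun drawArms(seed: Int, nbrArms: Int, n: Int): List<Int> {
    val random = Random(seed)
    return List(n) { random.nextInt(nbrArms) }
}
```

Two calls with the same seed return the same list, so a bandit constructed with Random(seed) replays the same exploration as long as the policy itself adds no other source of nondeterminism.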

Functions

armResult

open override fun armResult(armIndex: Int): R

Per-arm snapshot at armIndex. Default implementation reads from the full snapshot; implementations may override to avoid building the entire list when only one arm is needed.

armStat

fun armStat(armIndex: Int): SeriesStat<R>

Live per-arm accumulator owned by this bandit. Exposed so callers can compose with the stat ecosystem - e.g. inspect the running snapshot, plug into a com.eignex.kumulant.schema.StatGroup, or apply ops via the live-stat extensions. Writes flow through the policy's BanditPolicy.update (use MultiArmedBandit.update for that); the returned reference is intended for read-side and composition, not for bypassing the policy.

choose

open override fun choose(): Int

Pick an arm to play next. Uses Bandit.random for any sampling. The returned index is in [0, nbrArms). Repeated calls without intervening updates may return different arms (for randomised selection) or the same arm (for argmax-style policies once the leading arm is well-separated).

create

open override fun create(random: Random): MultiArmedBandit<R>

Spawn a fresh bandit with the same configuration; state resets to the prior seed. The random source is replaced; pass the source you want the new bandit to use for exploration (which is independent of merging in another snapshot's state).

Useful when a worker accepts a stream of snapshots to apply sequentially: create(random).also { it.merge(snapshot) }.

evaluate

open override fun evaluate(armIndex: Int): Double

Score the arm at armIndex under the bandit's current state. The value's interpretation is policy-specific (UCB upper bound, Thompson draw, mean estimate, etc.); it is the quantity that the bandit's choose compares against the other arms' scores.
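As one example of such a score, the textbook UCB1 upper bound is mean + sqrt(2 ln t / n), where n is the number of times the arm was played and t is the global step. The sketch below is the standard formula only; ucb1Score is a hypothetical name, and the library's own UCB policy may use a different constant or variant.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// Textbook UCB1 score for one arm (assumption: not necessarily kumulant's exact rule).
// Unplayed arms score +infinity so that every arm is tried at least once.
fun ucb1Score(mean: Double, pulls: Long, t: Long): Double {
    if (pulls == 0L) return Double.POSITIVE_INFINITY
    return mean + sqrt(2.0 * ln(t.toDouble()) / pulls)
}
```

The exploration bonus shrinks as pulls grows, so a rarely played arm can outscore an arm with a higher mean estimate.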

merge

open override fun merge(other: List<R>)

Fold the state of another replica, passed as other, into this bandit. Most families merge exactly via the underlying stat's parallel-merge formula; SGD-based contextual bandits merge approximately. Each concrete bandit's KDoc documents its merge semantics.
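For intuition on what an exact parallel merge looks like, here is the standard pairwise formula for combining two (count, mean, sum of squared deviations) summaries. MomentsSketch, mergeMoments and momentsOf are hypothetical names; the actual result type R and the formula used by each kumulant stat may differ.

```kotlin
// Hypothetical per-arm summary: count n, mean, and sum of squared deviations m2.
data class MomentsSketch(val n: Double, val mean: Double, val m2: Double)

// Pairwise merge: the result equals the summary of the concatenated data.
fun mergeMoments(a: MomentsSketch, b: MomentsSketch): MomentsSketch {
    val n = a.n + b.n
    if (n == 0.0) return MomentsSketch(0.0, 0.0, 0.0)
    val delta = b.mean - a.mean
    val mean = a.mean + delta * b.n / n
    val m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return MomentsSketch(n, mean, m2)
}

// Reference summary computed directly from raw values, used to check the merge.
fun momentsOf(values: List<Double>): MomentsSketch {
    if (values.isEmpty()) return MomentsSketch(0.0, 0.0, 0.0)
    val n = values.size.toDouble()
    val mean = values.sum() / n
    val m2 = values.fold(0.0) { acc, v -> acc + (v - mean) * (v - mean) }
    return MomentsSketch(n, mean, m2)
}
```

Merging the summaries of [1, 2, 3] and [4, 5, 6, 7] gives the same count, mean and m2 as summarising all seven values at once, up to floating-point rounding.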

reset

open override fun reset()

Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.

snapshot

open override fun snapshot(): List<R>

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed coupled cells may drift by ULPs.

update

open override fun update(armIndex: Int, value: Double, weight: Double = 1.0)

Fold a single observed reward value into the arm at armIndex with the given weight. Weight is the same observation-weight that runs through the rest of the library; typically 1.0, occasionally importance-weighted for off-policy correction.

Index out of range throws; some bandits also bound-check the value (e.g. Bernoulli arms require value in {0.0, 1.0}).
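To illustrate how an observation weight enters an accumulator, the sketch below keeps a weighted running mean and computes an importance weight as the ratio of the target policy's probability to the logging policy's probability for the played arm. WeightedMeanSketch and importanceWeight are hypothetical names, and rejecting negative weights is an assumption of this sketch rather than documented library behaviour.

```kotlin
// Hypothetical weighted running mean; a weight of 1.0 reduces to the plain running mean.
class WeightedMeanSketch {
    var weightSum = 0.0
        private set
    var mean = 0.0
        private set

    fun update(value: Double, weight: Double = 1.0) {
        require(weight >= 0.0) { "weight must be non-negative" }
        if (weight == 0.0) return // a zero-weight observation carries no information
        weightSum += weight
        mean += (weight / weightSum) * (value - mean)
    }
}

// Importance weight for off-policy correction: target probability over logging probability.
fun importanceWeight(targetProb: Double, loggingProb: Double): Double {
    require(loggingProb > 0.0) { "logging probability must be positive" }
    return targetProb / loggingProb
}
```

An observation with weight 3.0 moves the mean exactly as three identical unit-weight observations would.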

updateAll

open fun updateAll(armIndices: IntArray, values: DoubleArray, weights: DoubleArray? = null)

Batched update: fold one observation per arm/value pair in a single call. Equivalent to looping update but skips per-call overhead and may take a per-bandit lock once.
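The looping equivalent can be written out as follows. updateAllSketch is a hypothetical free function that takes the single-observation update as a callback; treating null weights as 1.0 for every observation and checking that the arrays have equal sizes are assumptions of this sketch, chosen to match the default weight of update.

```kotlin
// Loop form of a batched update: one update call per (arm, value, weight) triple.
fun updateAllSketch(
    armIndices: IntArray,
    values: DoubleArray,
    weights: DoubleArray? = null,
    update: (armIndex: Int, value: Double, weight: Double) -> Unit,
) {
    require(armIndices.size == values.size) { "armIndices and values must have equal size" }
    require(weights == null || weights.size == values.size) { "weights must match values in size" }
    for (i in armIndices.indices) {
        // Assumption: missing weights default to 1.0, as in the single-observation update.
        update(armIndices[i], values[i], weights?.get(i) ?: 1.0)
    }
}
```

A real batched implementation can hoist any locking or validation out of the loop, which is where the saved per-call overhead comes from.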