kumulant

UnivariateBandit

Online optimizer over a fixed set of unindexed arms. Each round the caller:

  1. Calls choose to pick an arm.

  2. Plays it externally (whatever "playing an arm" means in the application).

  3. Observes a reward.

  4. Calls update with the arm index and the observed reward.
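The four steps above can be sketched as a simple loop. This is a minimal illustration, not the library's own code: `SketchBandit`, `EpsilonGreedySketch`, and `runRounds` are hypothetical stand-ins that mirror only the `choose` / `update` signatures documented on this page, and the epsilon-greedy policy is assumed purely to make the loop runnable.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in mirroring the documented signatures; not the library's type.
interface SketchBandit {
    val nbrArms: Int
    fun choose(): Int
    fun update(armIndex: Int, value: Double, weight: Double = 1.0)
}

// Toy epsilon-greedy policy (an assumption, chosen only for illustration).
class EpsilonGreedySketch(
    override val nbrArms: Int,
    private val random: Random,
    private val epsilon: Double = 0.1,
) : SketchBandit {
    private val weightSums = DoubleArray(nbrArms)
    private val rewardSums = DoubleArray(nbrArms)

    // Weighted mean reward observed so far for one arm (0.0 before any update).
    fun mean(arm: Int): Double =
        if (weightSums[arm] == 0.0) 0.0 else rewardSums[arm] / weightSums[arm]

    override fun choose(): Int {
        // Explore with probability epsilon, otherwise pick the best mean so far.
        if (random.nextDouble() < epsilon) return random.nextInt(nbrArms)
        return (0 until nbrArms).maxByOrNull { mean(it) } ?: 0
    }

    override fun update(armIndex: Int, value: Double, weight: Double) {
        require(armIndex in 0 until nbrArms) { "arm index out of range: $armIndex" }
        weightSums[armIndex] += weight
        rewardSums[armIndex] += weight * value
    }
}

fun runRounds(bandit: SketchBandit, rounds: Int, play: (Int) -> Double) {
    repeat(rounds) {
        val arm = bandit.choose()    // 1. pick an arm
        val reward = play(arm)       // 2-3. play it externally and observe a reward
        bandit.update(arm, reward)   // 4. report the arm index and the reward
    }
}

fun main() {
    val environment = Random(7)
    val successRates = doubleArrayOf(0.2, 0.5, 0.8) // simulated Bernoulli arms
    val bandit = EpsilonGreedySketch(nbrArms = 3, random = Random(42))
    runRounds(bandit, rounds = 2_000) { arm ->
        if (environment.nextDouble() < successRates[arm]) 1.0 else 0.0
    }
    for (arm in 0 until bandit.nbrArms) println("arm $arm mean=${bandit.mean(arm)}")
}
```

The `play` callback stands for whatever "playing an arm" means in the application; here it is a simulated coin flip per arm.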

The reward type is Double. Bernoulli rewards are encoded as 0.0 / 1.0, continuous rewards pass through as-is, and log-normal rewards may need to be pre-transformed via ln(value) before being passed in. Each per-arm accumulator interprets the value according to its configured arm type (com.eignex.kumulant.bandit.univariate.Arm).
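As a rough sketch of the encodings described above, the helpers below convert application outcomes into Double rewards. The function names `encodeBernoulli` and `encodeLogNormal` are hypothetical and not part of the library; rejecting non-positive log-normal values is an assumption made here because ln is undefined for them.

```kotlin
import kotlin.math.ln

// Hypothetical helper: a success/failure outcome becomes 1.0 / 0.0.
fun encodeBernoulli(success: Boolean): Double = if (success) 1.0 else 0.0

// Hypothetical helper: a strictly positive log-normal observation is passed as ln(value).
fun encodeLogNormal(value: Double): Double {
    require(value > 0.0) { "log-normal rewards must be strictly positive, got $value" }
    return ln(value)
}

fun main() {
    println(encodeBernoulli(true))  // prints 1.0
    println(encodeLogNormal(1.0))   // prints 0.0
}
```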

Implementations source all randomness from Bandit.random and never use Random.Default directly, so the caller controls the PRNG.


Properties

abstract val nbrArms: Int

Number of arms in the population. Fixed at construction; arm indices are [0, nbrArms).

abstract val random: Random

Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.
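The reproducibility claim rests on kotlin.random.Random being deterministic for a given seed. The sketch below shows that property in isolation; `drawArms` is a hypothetical helper standing in for a bandit's randomised decisions, not a library function.

```kotlin
import kotlin.random.Random

// Hypothetical helper: draws n arm indices uniformly from the supplied generator.
fun drawArms(random: Random, nbrArms: Int, n: Int): List<Int> =
    List(n) { random.nextInt(nbrArms) }

fun main() {
    val first = drawArms(Random(42), nbrArms = 5, n = 10)
    val second = drawArms(Random(42), nbrArms = 5, n = 10)
    println(first == second) // prints true: same seed, same sequence
}
```

Passing a freshly seeded Random to each run is what makes two runs comparable; sharing one instance across bandits would interleave their draws.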

Functions

abstract fun choose(): Int

Pick an arm to play next. Uses Bandit.random for any sampling. The returned index is in [0, nbrArms). Repeated calls without intervening updates may return different arms (for randomised selection) or the same arm (for argmax-style policies once the leading arm is well-separated).

abstract fun reset()

Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.

abstract fun update(armIndex: Int, value: Double, weight: Double = 1.0)

Fold a single observed reward value into the arm at armIndex with the given weight. Weight is the same observation-weight that runs through the rest of the library; typically 1.0, occasionally importance-weighted for off-policy correction.

open fun updateAll(armIndices: IntArray, values: DoubleArray, weights: DoubleArray? = null)

Batched update: fold one observation per arm/value pair in a single call. Equivalent to calling update in a loop, but skips per-call overhead and may acquire the per-bandit lock only once.

choose

abstract fun choose(): Int

Pick an arm to play next. Uses Bandit.random for any sampling. The returned index is in [0, nbrArms). Repeated calls without intervening updates may return different arms (for randomised selection) or the same arm (for argmax-style policies once the leading arm is well-separated).

updateAll

open fun updateAll(armIndices: IntArray, values: DoubleArray, weights: DoubleArray? = null)

Batched update: fold one observation per arm/value pair in a single call. Equivalent to calling update in a loop, but skips per-call overhead and may acquire the per-bandit lock only once.

Sizes must match: armIndices.size == values.size, and weights (if non-null) must also match. A null weights argument applies 1.0 to every observation.
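The loop that updateAll is described as equivalent to, together with the size rules, might look like the sketch below. `updateAllSketch` is a hypothetical free function and the `update` callback stands in for the bandit's own method; using `require` (which throws IllegalArgumentException) for the size checks is an assumption, since this page does not say which exception the library raises.

```kotlin
// Hypothetical sketch of the documented semantics; not the library's implementation.
fun updateAllSketch(
    armIndices: IntArray,
    values: DoubleArray,
    weights: DoubleArray? = null,
    update: (armIndex: Int, value: Double, weight: Double) -> Unit,
) {
    require(armIndices.size == values.size) {
        "armIndices.size (${armIndices.size}) must equal values.size (${values.size})"
    }
    require(weights == null || weights.size == values.size) {
        "weights must be null or have the same size as values"
    }
    for (i in armIndices.indices) {
        // A null weights array means every observation gets weight 1.0.
        update(armIndices[i], values[i], weights?.get(i) ?: 1.0)
    }
}

fun main() {
    val sums = DoubleArray(2)
    updateAllSketch(intArrayOf(0, 1, 1), doubleArrayOf(1.0, 0.0, 1.0)) { arm, value, weight ->
        sums[arm] += weight * value
    }
    println(sums.toList()) // prints [1.0, 1.0]
}
```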

update

abstract fun update(armIndex: Int, value: Double, weight: Double = 1.0)

Fold a single observed reward value into the arm at armIndex with the given weight. Weight is the same observation-weight that runs through the rest of the library; typically 1.0, occasionally importance-weighted for off-policy correction.

An out-of-range armIndex throws; some bandits also validate the value (e.g. Bernoulli arms require value in {0.0, 1.0}).
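This page does not say how an importance weight should be computed. One common choice for off-policy correction, assumed here for illustration, is inverse propensity weighting: the probability that the policy being evaluated would pick the arm, divided by the probability with which the logging policy actually picked it. `importanceWeight` is a hypothetical helper, not a library function.

```kotlin
// Hypothetical helper implementing inverse propensity weighting (an assumed technique).
fun importanceWeight(targetProbability: Double, loggingProbability: Double): Double {
    require(loggingProbability > 0.0) { "logging probability must be positive" }
    return targetProbability / loggingProbability
}

fun main() {
    // The logged arm was chosen with probability 0.25, but the target policy
    // would choose it with probability 0.5, so the observation counts double.
    println(importanceWeight(targetProbability = 0.5, loggingProbability = 0.25)) // prints 2.0
}
```

The resulting value would be passed as the weight argument of update; on-policy data keeps the default of 1.0.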