UnivariateBandit
Online optimizer over a fixed set of non-contextual arms, each identified only by its index in [0, nbrArms). Each round the caller:
1. Calls choose to pick an arm.
2. Plays it externally (whatever "playing an arm" means in the application).
3. Observes a reward.
4. Calls update with the arm index and the observed reward.
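The round structure above can be sketched as a plain loop. This is a minimal sketch under stated assumptions: SimpleBandit, UniformBandit and runRounds are hypothetical stand-ins that mirror the described contract, not the library's actual declarations, and the uniform-random policy exists only to drive the loop.

```kotlin
import kotlin.random.Random

// Hypothetical minimal stand-in for the UnivariateBandit contract; the real
// interface may declare more members and different signatures.
interface SimpleBandit {
    val nbrArms: Int
    val random: Random
    fun choose(): Int
    fun update(armIndex: Int, value: Double, weight: Double = 1.0)
}

// Hypothetical uniform-random policy that only records what it is fed.
class UniformBandit(override val nbrArms: Int, override val random: Random) : SimpleBandit {
    val pulls = IntArray(nbrArms)
    val totals = DoubleArray(nbrArms)

    override fun choose(): Int = random.nextInt(nbrArms)

    override fun update(armIndex: Int, value: Double, weight: Double) {
        pulls[armIndex]++
        totals[armIndex] += weight * value
    }
}

// Runs `rounds` iterations of choose -> play -> observe -> update.
// `play` is the application's external action and returns the observed reward.
fun runRounds(bandit: SimpleBandit, rounds: Int, play: (Int) -> Double) {
    repeat(rounds) {
        val arm = bandit.choose()   // step 1: pick an arm
        val reward = play(arm)      // steps 2-3: play it and observe the reward
        bandit.update(arm, reward)  // step 4: fold the reward back in (weight 1.0)
    }
}
```

The bandit never calls play itself; keeping the external action on the caller's side is what lets the same loop wrap any application.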
The reward type is Double. Bernoulli rewards are encoded as 0.0 / 1.0, continuous rewards are passed through as-is, and log-normal rewards may need to be transformed to ln(value) before being passed in. Each arm's accumulator interprets the value according to its configured arm type (com.eignex.kumulant.bandit.univariate.Arm).
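The three encodings can be written as small conversion helpers applied before calling update. These functions are hypothetical and not part of the library; whether a given arm type expects the ln transform is an assumption to check against its configuration.

```kotlin
import kotlin.math.ln

// Hypothetical helpers illustrating the reward encodings; not library API.

// Bernoulli outcome (e.g. converted / not converted) -> 0.0 or 1.0.
fun bernoulliReward(success: Boolean): Double = if (success) 1.0 else 0.0

// Continuous reward: passed through unchanged.
fun continuousReward(value: Double): Double = value

// Log-normal reward: ln of a log-normal variable is normally distributed,
// so the transformed value suits a normal-style accumulator.
fun logNormalReward(value: Double): Double {
    require(value > 0.0) { "ln is only defined for positive values, got $value" }
    return ln(value)
}
```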
Implementations draw all randomness from Bandit.random and never use Random.Default directly, so the caller controls the PRNG.
Properties
random
Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and for any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads that same instance through every randomised decision.
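Because every random draw goes through the injected instance, two bandits built with the same seed make identical choices. A minimal sketch, assuming a hypothetical EpsilonUniform policy and trace helper (neither is library API):

```kotlin
import kotlin.random.Random

// Hypothetical randomised policy: explores uniformly with probability epsilon,
// otherwise returns a fixed arm. Its only randomness is the injected Random.
class EpsilonUniform(private val nbrArms: Int, private val epsilon: Double, val random: Random) {
    private val exploitArm = 0

    fun choose(): Int =
        if (random.nextDouble() < epsilon) random.nextInt(nbrArms) else exploitArm
}

// Records the first n choices of a freshly seeded bandit.
fun trace(seed: Int, n: Int): List<Int> {
    val bandit = EpsilonUniform(nbrArms = 4, epsilon = 0.5, random = Random(seed))
    return List(n) { bandit.choose() }
}
```

Calling trace(7, 50) twice yields the same list. A policy that reached for Random.Default anywhere would break that guarantee.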
Functions
Clear all state back to the prior-seeded baseline. Equivalent to creating a fresh bandit with the same configuration via Snapshotable.create, but done in place: the arm count, policy, concurrency mode, and random instance are all kept.
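The in-place reset can be sketched as refilling per-arm statistics with their prior values while leaving configuration and the Random untouched. The class PriorSeededMeans, the method name reset, and the prior parameters (priorMean, priorWeight) are hypothetical, chosen only for illustration.

```kotlin
import kotlin.random.Random

// Hypothetical per-arm accumulator seeded with one prior pseudo-observation
// (priorMean carried with weight priorWeight). Not the library's API.
class PriorSeededMeans(
    val nbrArms: Int,
    val random: Random,
    private val priorMean: Double = 0.5,
    private val priorWeight: Double = 1.0,
) {
    val means = DoubleArray(nbrArms) { priorMean }
    val weights = DoubleArray(nbrArms) { priorWeight }

    // Incremental weighted mean: m += (w / W) * (x - m).
    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        weights[armIndex] += weight
        means[armIndex] += (weight / weights[armIndex]) * (value - means[armIndex])
    }

    // In place: statistics return to the prior; nbrArms, the prior parameters
    // and the Random instance are kept, and no new bandit object is created.
    fun reset() {
        means.fill(priorMean)
        weights.fill(priorWeight)
    }
}
```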
choose
Pick the arm to play next, using Bandit.random for any sampling. The returned index is in [0, nbrArms). Repeated calls without an intervening update may return different arms under randomised selection policies, or keep returning the same arm under argmax-style policies once the leading arm is well-separated.
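A minimal sketch of a choose that honours this contract, assuming an epsilon-greedy policy over running means. Epsilon-greedy is swapped in purely as an example; the class EpsilonGreedy and its fields are hypothetical, and the library's own policies may select differently.

```kotlin
import kotlin.random.Random

// Hypothetical epsilon-greedy selector; every draw comes from `random`.
class EpsilonGreedy(val nbrArms: Int, val epsilon: Double, val random: Random) {
    val means = DoubleArray(nbrArms)

    fun choose(): Int {
        // Explore: uniform over [0, nbrArms).
        if (random.nextDouble() < epsilon) return random.nextInt(nbrArms)
        // Exploit: argmax of the current means, ties broken at random.
        val best = means.maxOrNull() ?: error("bandit has no arms")
        val leaders = means.indices.filter { means[it] == best }
        return leaders[random.nextInt(leaders.size)]
    }
}
```

With epsilon = 0.0 and a unique leading mean, repeated calls return the same index; with tied means or epsilon > 0.0, they can differ between calls.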
updateAll
Batched update: folds one observation per arm/value pair in a single call. Equivalent to calling update in a loop, but avoids the per-call overhead and may acquire the bandit's lock only once for the whole batch.
Sizes must match: armIndices.size == values.size and, when weights is non-null, weights.size == values.size as well. Passing null for weights applies a weight of 1.0 to every observation.
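The batching behaviour can be sketched with a single ReentrantLock held around the whole loop. LockedMeans is a hypothetical class, and validating all inputs before mutating anything is a design choice of this sketch, not documented library behaviour.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Hypothetical bandit state: weighted running means guarded by one lock.
class LockedMeans(val nbrArms: Int) {
    private val lock = ReentrantLock()
    val weightSums = DoubleArray(nbrArms)
    val means = DoubleArray(nbrArms)

    // Caller must hold the lock. Incremental weighted mean: m += (w / W) * (x - m).
    private fun fold(armIndex: Int, value: Double, weight: Double) {
        weightSums[armIndex] += weight
        val total = weightSums[armIndex]
        if (total > 0.0) means[armIndex] += (weight / total) * (value - means[armIndex])
    }

    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        require(armIndex in 0 until nbrArms) { "armIndex $armIndex out of range" }
        lock.withLock { fold(armIndex, value, weight) }
    }

    fun updateAll(armIndices: IntArray, values: DoubleArray, weights: DoubleArray? = null) {
        // Check sizes and indices first so a bad batch applies nothing.
        require(armIndices.size == values.size) { "armIndices and values sizes differ" }
        require(weights == null || weights.size == values.size) { "weights size differs" }
        require(armIndices.all { it in 0 until nbrArms }) { "armIndex out of range" }
        lock.withLock { // one acquisition for the whole batch
            for (i in values.indices) fold(armIndices[i], values[i], weights?.get(i) ?: 1.0)
        }
    }
}
```

Feeding the same observations through updateAll or through repeated update calls leaves the same means; only the number of lock acquisitions differs.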
update
Folds a single observed reward value into the arm at armIndex with the given weight. The weight is the same observation weight used throughout the rest of the library: typically 1.0, occasionally an importance weight for off-policy correction.
An out-of-range armIndex throws; some bandits also bounds-check the value (e.g. Bernoulli arms require value in {0.0, 1.0}).
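A sketch of a weighted update for Bernoulli arms, assuming Beta pseudo-count state with a Beta(1, 1) prior. BernoulliArms is hypothetical, and the exception types (IndexOutOfBoundsException for the index, IllegalArgumentException for the value) are assumptions; the text above only says these cases throw.

```kotlin
// Hypothetical Bernoulli arm state as Beta pseudo-counts:
// alpha = 1 + weighted successes, beta = 1 + weighted failures.
class BernoulliArms(val nbrArms: Int) {
    val alpha = DoubleArray(nbrArms) { 1.0 }
    val beta = DoubleArray(nbrArms) { 1.0 }

    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        // Out-of-range index throws.
        if (armIndex !in 0 until nbrArms) {
            throw IndexOutOfBoundsException("armIndex $armIndex not in [0, $nbrArms)")
        }
        // Bernoulli arms accept only 0.0 or 1.0.
        require(value == 0.0 || value == 1.0) { "Bernoulli value must be 0.0 or 1.0, got $value" }
        alpha[armIndex] += weight * value
        beta[armIndex] += weight * (1.0 - value)
    }

    fun posteriorMean(armIndex: Int): Double =
        alpha[armIndex] / (alpha[armIndex] + beta[armIndex])
}
```

For off-policy correction, an observation logged with selection probability 0.25 would be folded in with weight 1 / 0.25 = 4.0, so it counts four times as much as an on-policy observation.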