TrackedUnivariateBandit

class TrackedUnivariateBandit<B : UnivariateBandit>(val inner: B, chooseTemplate: SeriesStat<out Result>? = null, updateArmRewardTemplate: PairedStat<out Result>? = null, nowNanos: () -> Long = { 0L }) : UnivariateBandit

Observability wrapper around any UnivariateBandit. A univariate bandit has no context vector, so two aggregate stat slots cover everything there is to observe:

chooseTemplate receives update(value = armIndex.toDouble(), weight = 1.0) on every choose call, so it tracks the arm-pick distribution over time.
updateArmRewardTemplate receives update(x = armIndex.toDouble(), y = reward, weight) on every update call, so it tracks the per-arm reward distribution.

Both templates are optional; passing null disables tracking on that side.
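
The forwarding described above can be sketched with stand-in types. Every name in this block (SimpleBandit, Series, Paired, Tracked, RoundRobin) is hypothetical and only mirrors the call shapes listed above; it is not the library's implementation, and nowNanos is left out because its use is not documented on this page.

```kotlin
// Hypothetical stand-ins; none of these types come from the library.
interface SimpleBandit {
    val nbrArms: Int
    fun choose(): Int
    fun update(armIndex: Int, value: Double, weight: Double = 1.0)
}

// Stand-in for a series stat: accumulates the weight seen per value.
class Series {
    val counts = mutableMapOf<Double, Double>()
    fun update(value: Double, weight: Double) {
        counts[value] = (counts[value] ?: 0.0) + weight
    }
}

// Stand-in for a paired stat: accumulates the weighted reward per arm.
class Paired {
    val rewardSums = mutableMapOf<Double, Double>()
    fun update(x: Double, y: Double, weight: Double) {
        rewardSums[x] = (rewardSums[x] ?: 0.0) + y * weight
    }
}

// Wrapper: delegates to the inner bandit, then feeds the optional stats.
class Tracked(
    val inner: SimpleBandit,
    private val chooseStat: Series? = null,
    private val rewardStat: Paired? = null,
) : SimpleBandit {
    override val nbrArms: Int get() = inner.nbrArms

    override fun choose(): Int {
        val arm = inner.choose()
        chooseStat?.update(arm.toDouble(), 1.0) // skipped when the slot is null
        return arm
    }

    override fun update(armIndex: Int, value: Double, weight: Double) {
        inner.update(armIndex, value, weight)
        rewardStat?.update(armIndex.toDouble(), value, weight)
    }
}

// Trivial deterministic bandit so the example has predictable picks.
class RoundRobin(override val nbrArms: Int) : SimpleBandit {
    private var next = 0
    override fun choose(): Int = next.also { next = (next + 1) % nbrArms }
    override fun update(armIndex: Int, value: Double, weight: Double) {}
}

fun main() {
    val picks = Series()
    val tracked = Tracked(RoundRobin(2), chooseStat = picks) // reward side disabled
    repeat(4) { tracked.update(tracked.choose(), 1.0) }
    println(picks.counts) // {0.0=2.0, 1.0=2.0}
}
```

Making both slots nullable keeps the wrapper cheap when only one side is of interest: a null slot costs a single null check per call.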

Constructors

TrackedUnivariateBandit

constructor(inner: B, chooseTemplate: SeriesStat<out Result>? = null, updateArmRewardTemplate: PairedStat<out Result>? = null, nowNanos: () -> Long = { 0L })

Properties

inner

val inner: B

The underlying bandit, exposed so callers can reach its PerArmBandit / Scorable functionality.

nbrArms

open override val nbrArms: Int

Number of arms in the population. Fixed at construction; valid arm indices lie in the half-open range [0, nbrArms).

random

open override val random: Random

Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.
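
The reproducibility claim can be illustrated with the standard kotlin.random.Random alone. UniformPicker below is a hypothetical stand-in for a bandit that draws every decision from one injected instance; it is not a library class.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in: picks a uniformly random arm from the injected Random.
class UniformPicker(val nbrArms: Int, val random: Random) {
    fun choose(): Int = random.nextInt(nbrArms)
}

fun main() {
    // Two pickers seeded identically produce the same sequence of arms.
    val a = UniformPicker(5, Random(42))
    val b = UniformPicker(5, Random(42))
    val picksA = List(10) { a.choose() }
    val picksB = List(10) { b.choose() }
    println(picksA == picksB) // true
}
```

Sharing one Random instance across components, rather than seeding each separately, is what makes a whole run replayable from a single seed.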

Functions

choose

open override fun choose(): Int

Pick an arm to play next. Uses Bandit.random for any sampling. The returned index is in [0, nbrArms). Repeated calls without intervening updates may return different arms (for randomised selection) or the same arm (for argmax-style policies once the leading arm is well-separated).

chooseResult

fun chooseResult(): Result?

Snapshot of the choose-side arm-pick distribution; null when chooseTemplate is unset.

reset

open override fun reset()

Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.
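
As a rough illustration of an in-place reset to a prior-seeded baseline, consider the sketch below. CountingBandit and its priorCount parameter are hypothetical and say nothing about how the library stores arm state.

```kotlin
// Hypothetical stand-in: per-arm counts that start from a prior value.
class CountingBandit(val nbrArms: Int, private val priorCount: Double = 1.0) {
    private val counts = DoubleArray(nbrArms) { priorCount }

    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        counts[armIndex] += value * weight
    }

    // Restores the prior-seeded baseline in place; nbrArms and priorCount are untouched.
    fun reset() {
        counts.fill(priorCount)
    }

    fun count(armIndex: Int): Double = counts[armIndex]
}

fun main() {
    val bandit = CountingBandit(nbrArms = 3)
    bandit.update(1, 1.0)
    println(bandit.count(1)) // 2.0
    bandit.reset()
    println(bandit.count(1)) // 1.0
}
```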

update

open override fun update(armIndex: Int, value: Double, weight: Double = 1.0)

Fold a single observed reward value into the arm at armIndex with the given weight. The weight is the same observation weight used throughout the rest of the library; it is typically 1.0, and occasionally an importance weight for off-policy correction.

An out-of-range armIndex throws; some bandits also bounds-check the value (e.g. Bernoulli arms require value in {0.0, 1.0}).
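
One way a weighted fold with an index check might look is sketched below. MeanArms is a hypothetical stand-in that keeps a weighted running mean per arm; the use of require (which throws IllegalArgumentException) and the positive-weight rule are assumptions of this sketch, since the page does not say which exception the library throws or which weights it accepts.

```kotlin
// Hypothetical stand-in, not the library's arm type: weighted running mean per arm.
class MeanArms(val nbrArms: Int) {
    private val weightSums = DoubleArray(nbrArms)
    private val means = DoubleArray(nbrArms)

    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        // Assumption: reject bad input with IllegalArgumentException.
        require(armIndex in 0 until nbrArms) { "armIndex $armIndex outside [0, $nbrArms)" }
        require(weight > 0.0) { "weight must be positive" }
        weightSums[armIndex] += weight
        // Incremental weighted mean: move toward value by weight / total weight.
        means[armIndex] += (value - means[armIndex]) * weight / weightSums[armIndex]
    }

    fun mean(armIndex: Int): Double = means[armIndex]
}

fun main() {
    val arms = MeanArms(2)
    arms.update(0, 1.0)               // weight defaults to 1.0
    arms.update(0, 0.0, weight = 3.0) // counts like three observations of 0.0
    println(arms.mean(0))             // 0.25
}
```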

updateArmRewardResult

fun updateArmRewardResult(): Result?

Snapshot of the arm-versus-reward paired stat; null when updateArmRewardTemplate is unset.

updateAll

open fun updateAll(armIndices: IntArray, values: DoubleArray, weights: DoubleArray? = null)

Batched update: fold one observation per arm/value pair in a single call. Equivalent to calling update in a loop, but it skips per-call overhead and may acquire a per-bandit lock only once.
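
The equivalence with a loop of update calls can be sketched as follows. SumBandit is a hypothetical stand-in; treating null weights as 1.0 per observation and rejecting mismatched array lengths are assumptions of this sketch, and synchronized is the JVM-only helper from the Kotlin standard library.

```kotlin
// Hypothetical stand-in illustrating the documented loop equivalence.
class SumBandit(val nbrArms: Int) {
    val sums = DoubleArray(nbrArms)
    private val lock = Any()

    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        synchronized(lock) { sums[armIndex] += value * weight }
    }

    fun updateAll(armIndices: IntArray, values: DoubleArray, weights: DoubleArray? = null) {
        require(armIndices.size == values.size) { "armIndices and values differ in length" }
        require(weights == null || weights.size == values.size) { "weights length mismatch" }
        synchronized(lock) { // one lock acquisition for the whole batch
            for (i in armIndices.indices) {
                // Assumption: a null weights array means weight 1.0 everywhere.
                sums[armIndices[i]] += values[i] * (weights?.get(i) ?: 1.0)
            }
        }
    }
}

fun main() {
    val indices = intArrayOf(0, 1, 0)
    val values = doubleArrayOf(1.0, 0.5, 2.0)
    val weights = doubleArrayOf(1.0, 2.0, 0.5)

    val looped = SumBandit(2)
    for (i in indices.indices) looped.update(indices[i], values[i], weights[i])

    val batched = SumBandit(2)
    batched.updateAll(indices, values, weights)

    println(looped.sums.contentEquals(batched.sums)) // true
}
```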