TrackedUnivariateBandit
Observability wrapper around any UnivariateBandit. A univariate bandit has no context vector, so two aggregate slots cover the surface:
chooseTemplate sees update(value = armIndex.toDouble(), weight = 1.0) at every choose. The arm-pick distribution over time.
updateArmRewardTemplate sees update(x = armIndex.toDouble(), y = reward, weight) at every update. Per-arm reward distribution.
Both templates are optional; null disables that side.
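The wrapping pattern described above can be sketched as follows. This is an illustration only, not the library's implementation: SketchBandit, ArmPickHistogram, ArmRewardMeans, TrackedSketch and UniformSketch are hypothetical stand-ins for UnivariateBandit, the two templates, and the wrapper itself. Only the update(...) call shapes are taken from the text above.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in for the library's UnivariateBandit; not the real interface.
interface SketchBandit {
    val nbrArms: Int
    fun choose(): Int
    fun update(armIndex: Int, value: Double, weight: Double = 1.0)
}

// Hypothetical choose-side slot: a weighted histogram over arm indices.
class ArmPickHistogram(nbrArms: Int) {
    val weights = DoubleArray(nbrArms)
    fun update(value: Double, weight: Double) { weights[value.toInt()] += weight }
}

// Hypothetical update-side slot: weighted mean reward per arm.
class ArmRewardMeans(nbrArms: Int) {
    private val sumW = DoubleArray(nbrArms)
    private val sumWY = DoubleArray(nbrArms)
    fun update(x: Double, y: Double, weight: Double) {
        val arm = x.toInt()
        sumW[arm] += weight
        sumWY[arm] += weight * y
    }
    // NaN until the arm has received at least one weighted observation.
    fun mean(arm: Int): Double = if (sumW[arm] == 0.0) Double.NaN else sumWY[arm] / sumW[arm]
}

// Wrapper: delegates to inner and feeds each slot only when it is non-null.
class TrackedSketch(
    val inner: SketchBandit,
    private val chooseStat: ArmPickHistogram? = null,
    private val rewardStat: ArmRewardMeans? = null,
) : SketchBandit {
    override val nbrArms: Int get() = inner.nbrArms

    override fun choose(): Int {
        val arm = inner.choose()
        chooseStat?.update(value = arm.toDouble(), weight = 1.0)
        return arm
    }

    override fun update(armIndex: Int, value: Double, weight: Double) {
        inner.update(armIndex, value, weight)
        rewardStat?.update(x = armIndex.toDouble(), y = value, weight = weight)
    }
}

// Uniform-random bandit used only to exercise the wrapper; it learns nothing.
class UniformSketch(override val nbrArms: Int, private val random: Random) : SketchBandit {
    override fun choose(): Int = random.nextInt(nbrArms)
    override fun update(armIndex: Int, value: Double, weight: Double) { /* no state */ }
}
```

Leaving either slot null in this sketch skips that side entirely, which mirrors the "null disables that side" behaviour.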
chooseResult
Snapshot of the choose-side arm-pick distribution; null when chooseTemplate is unset.
choose
Pick an arm to play next. Uses Bandit.random for any sampling. The returned index is in [0, nbrArms). Repeated calls without intervening updates may return different arms (for randomised selection) or the same arm (for argmax-style policies once the leading arm is well-separated).
inner
nbrArms
random
Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.
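To illustrate why a single injected Random makes exploration reproducible, here is a minimal sketch. EpsilonGreedySketch and pickSequence are hypothetical names, and epsilon-greedy is chosen purely for illustration; the only real API used is kotlin.random.Random, whose seeded instances produce the same sequence for the same seed.

```kotlin
import kotlin.random.Random

// Hypothetical epsilon-greedy chooser over fixed arm means.
// Every randomised decision draws from the one injected Random instance.
class EpsilonGreedySketch(
    private val means: DoubleArray,
    private val epsilon: Double,
    private val random: Random,
) {
    fun choose(): Int {
        // Explore with probability epsilon: pick a uniformly random arm.
        if (random.nextDouble() < epsilon) return random.nextInt(means.size)
        // Otherwise exploit: pick the arm with the largest mean.
        return means.indices.maxByOrNull { means[it] } ?: 0
    }
}

// Builds a fresh chooser from the seed and records n consecutive picks.
fun pickSequence(seed: Int, n: Int): List<Int> {
    val bandit = EpsilonGreedySketch(
        means = doubleArrayOf(0.2, 0.5, 0.4),
        epsilon = 0.3,
        random = Random(seed),
    )
    return List(n) { bandit.choose() }
}
```

Two calls to pickSequence with the same seed return identical pick sequences, because both the explore/exploit coin flip and the exploratory arm draw come from the same seeded generator.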
reset
Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.
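As a rough sketch of what an in-place reset to a prior-seeded baseline can look like, assume a hypothetical CountingSketch class that holds Beta-style pseudo-counts per arm. The class and its priorSuccess / priorFailure parameters are assumptions for illustration, not the library's API.

```kotlin
// Hypothetical per-arm pseudo-count state seeded from a prior; not the library's class.
class CountingSketch(
    val nbrArms: Int,
    private val priorSuccess: Double = 1.0,
    private val priorFailure: Double = 1.0,
) {
    private val success = DoubleArray(nbrArms) { priorSuccess }
    private val failure = DoubleArray(nbrArms) { priorFailure }

    // Fold one weighted reward in [0, 1] into the arm's pseudo-counts.
    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        success[armIndex] += weight * value
        failure[armIndex] += weight * (1.0 - value)
    }

    fun mean(armIndex: Int): Double =
        success[armIndex] / (success[armIndex] + failure[armIndex])

    // Restore the prior in place; the arm count and prior configuration are untouched.
    fun reset() {
        success.fill(priorSuccess)
        failure.fill(priorFailure)
    }
}
```

After reset, the sketch behaves like a freshly constructed instance with the same configuration, while callers keep their existing reference.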
updateArmRewardResult
Snapshot of the arm-versus-reward paired stat; null when updateArmRewardTemplate is unset.
update
Fold a single observed reward value into the arm at armIndex with the given weight. The weight is the same observation weight used throughout the rest of the library: typically 1.0, occasionally an importance weight for off-policy correction.
An out-of-range armIndex throws; some bandits also bounds-check the value (e.g. Bernoulli arms require value in {0.0, 1.0}).
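The checks and weighting described for update can be sketched as below. BernoulliArmsSketch and importanceWeight are hypothetical names. The use of require (which throws IllegalArgumentException) is an assumption, since the exception type is not stated here. The importance weight shown is the standard ratio of target-policy to logging-policy probability, which may differ from what the library expects.

```kotlin
// Hypothetical Bernoulli-arm accumulator with the checks described above.
class BernoulliArmsSketch(val nbrArms: Int) {
    private val sumW = DoubleArray(nbrArms)
    private val sumWY = DoubleArray(nbrArms)

    fun update(armIndex: Int, value: Double, weight: Double = 1.0) {
        // Assumption: range and value violations surface as IllegalArgumentException.
        require(armIndex in 0 until nbrArms) { "armIndex $armIndex out of range [0, $nbrArms)" }
        require(value == 0.0 || value == 1.0) { "Bernoulli reward must be 0.0 or 1.0, got $value" }
        sumW[armIndex] += weight
        sumWY[armIndex] += weight * value
    }

    // Weighted mean reward of the arm; NaN before any observation.
    fun mean(armIndex: Int): Double = sumWY[armIndex] / sumW[armIndex]
}

// Importance weight for off-policy correction: target probability over logging probability.
fun importanceWeight(targetProb: Double, loggingProb: Double): Double {
    require(loggingProb > 0.0) { "logging probability must be positive" }
    return targetProb / loggingProb
}
```

With weight = 1.0 every observation counts equally. An observation logged under a policy that picked the arm half as often as the target policy would carry weight 2.0.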