TrackedContextualBandit
Observability wrapper around any ContextualBandit. Every event flows into a small set of aggregate side stats, each modelling a different question about the bandit's behaviour. Arm-level bucketing is a separate stratify op; until that lands, encode the arm into the observation (via the joint template) and read the slope/contrast off the resulting stat.
Templates are independent and any subset may be null:
chooseTemplate sees update(x = context, y = armIndex.toDouble(), weight = 1.0) at every choose. Models the bandit's policy: the distribution of arm selections as a function of context.
updateJointTemplate sees update(x = [armIndex.toDouble()] ++ context, y = reward, weight) at every update. Joint reward model with the chosen arm prepended as an extra feature; the coefficient on the arm dimension is the arm-conditional effect. featureSize of this template must equal 1 + contextFeatureSize.
updateMarginalTemplate sees update(x = context, y = reward, weight) at every update. Marginal reward-given-context model, agnostic to arm. featureSize must equal contextFeatureSize.
updateArmRewardTemplate sees update(x = armIndex.toDouble(), y = reward, weight) at every update. Per-arm reward distribution expressed as a paired stat: covariance, correlation, or per-arm slope.
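As a rough illustration of the four observation shapes listed above, the sketch below builds each (x, y, weight) triple from a single event. The Observation and PairedObservation classes and the four helper functions are hypothetical stand-ins invented for this example; they are not part of the library, and the real templates receive these values through their own update methods.

```kotlin
// Hypothetical carrier for a vector-valued observation (not a library type).
class Observation(val x: DoubleArray, val y: Double, val weight: Double)

// Hypothetical carrier for a scalar-x paired observation (not a library type).
class PairedObservation(val x: Double, val y: Double, val weight: Double)

// chooseTemplate: context in, chosen arm out, always unit weight.
fun chooseObservation(context: DoubleArray, armIndex: Int): Observation =
    Observation(context, armIndex.toDouble(), 1.0)

// updateJointTemplate: the arm index is prepended, so x has 1 + contextFeatureSize entries.
fun jointObservation(context: DoubleArray, armIndex: Int, reward: Double, weight: Double): Observation =
    Observation(doubleArrayOf(armIndex.toDouble()) + context, reward, weight)

// updateMarginalTemplate: the arm is ignored, x is the bare context.
fun marginalObservation(context: DoubleArray, reward: Double, weight: Double): Observation =
    Observation(context, reward, weight)

// updateArmRewardTemplate: scalar arm index paired with the reward.
fun armRewardObservation(armIndex: Int, reward: Double, weight: Double): PairedObservation =
    PairedObservation(armIndex.toDouble(), reward, weight)
```

The joint helper makes the featureSize requirement concrete: a two-feature context yields a three-entry x whose first entry is the arm index.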
The wrapper itself only satisfies ContextualBandit. The underlying bandit is exposed as inner, typed B, so callers can reach extra interfaces (snapshot(), armResult, evaluate(i, x)) through tracked.inner.<method> without losing static type information.
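The typed inner pattern can be sketched with a generic wrapper. Everything below (Bandit, CountingBandit, Tracked, trackedDemo) is a simplified, hypothetical stand-in written for this example; the library's actual ContextualBandit interface and wrapper have more members and different signatures.

```kotlin
// Hypothetical minimal bandit interface (assumption, not the library's).
interface Bandit {
    fun choose(x: DoubleArray): Int
    fun update(armIndex: Int, x: DoubleArray, reward: Double, weight: Double)
}

// Hypothetical concrete bandit with an extra method that Bandit does not declare.
class CountingBandit(private val nbrArms: Int) : Bandit {
    private val pulls = IntArray(nbrArms)
    private var next = 0

    // Round-robin choice, just to have deterministic behaviour.
    override fun choose(x: DoubleArray): Int {
        val arm = next
        next = (next + 1) % nbrArms
        return arm
    }

    override fun update(armIndex: Int, x: DoubleArray, reward: Double, weight: Double) {
        pulls[armIndex]++
    }

    fun snapshot(): List<Int> = pulls.toList()
}

// The wrapper only implements Bandit, but keeps the static type of what it wraps.
class Tracked<B : Bandit>(val inner: B) : Bandit {
    var updatesSeen = 0
        private set

    override fun choose(x: DoubleArray): Int = inner.choose(x)

    override fun update(armIndex: Int, x: DoubleArray, reward: Double, weight: Double) {
        updatesSeen++
        inner.update(armIndex, x, reward, weight)
    }
}

fun trackedDemo(): List<Int> {
    val tracked = Tracked(CountingBandit(3))
    val x = doubleArrayOf(1.0)
    repeat(4) { tracked.update(tracked.choose(x), x, 1.0, 1.0) }
    // No cast needed: inner is statically a CountingBandit.
    return tracked.inner.snapshot()
}
```

Because B is a type parameter rather than the interface type, the call to snapshot() compiles even though the wrapper itself does not declare it.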
TrackedContextualBandit
chooseResult
Snapshot of the policy regressor; null when chooseTemplate is unset.
choose
Pick an arm to play next, given the per-round context x. The bandit combines the context with its per-arm model to score each arm under a configurable com.eignex.kumulant.stat.regression.RegressionPosterior (or analogue) and returns the argmax / sampled choice.
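To make the "score each arm, then take the argmax or a sampled choice" step concrete, here is a toy sketch. It assumes a hypothetical linear model per arm and adds Gaussian noise to imitate posterior sampling; the names chooseArm, gaussian, armWeights and noiseScale are invented for this example and do not reflect how RegressionPosterior is implemented.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.sqrt
import kotlin.random.Random

// Standard normal draw via Box-Muller; 1 - nextDouble() lies in (0, 1], so ln stays finite.
fun gaussian(random: Random): Double =
    sqrt(-2.0 * ln(1.0 - random.nextDouble())) * cos(2.0 * PI * random.nextDouble())

// armWeights[a] is a hypothetical linear model for arm a.
// noiseScale = 0.0 returns the plain argmax; a positive value perturbs each score before comparing.
fun chooseArm(
    armWeights: List<DoubleArray>,
    x: DoubleArray,
    noiseScale: Double,
    random: Random
): Int {
    require(armWeights.isNotEmpty()) { "need at least one arm" }
    var best = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for ((arm, w) in armWeights.withIndex()) {
        require(w.size == x.size) { "arm $arm expects ${w.size} features, got ${x.size}" }
        var mean = 0.0
        for (i in x.indices) mean += w[i] * x[i]
        val score = if (noiseScale == 0.0) mean else mean + noiseScale * gaussian(random)
        if (score > bestScore) {
            best = arm
            bestScore = score
        }
    }
    return best
}
```

With noiseScale set to 0.0 the function is deterministic and ignores the random instance; with a positive scale, lower-scoring arms are occasionally selected, which is the exploration behaviour a sampled policy relies on.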
contextFeatureSize
Context vector dimension validated against templates and incoming updates.
inner
nbrArms
random
Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.
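The reproducibility claim can be illustrated with a toy epsilon-greedy chooser that draws every random decision from the one kotlin.random.Random instance it was constructed with. EpsilonGreedy and exploreTrace are hypothetical names used only in this sketch; they are not library classes.

```kotlin
import kotlin.random.Random

// Hypothetical chooser: both the explore/exploit coin flip and the random arm
// come from the same Random instance passed in at construction.
class EpsilonGreedy(
    private val nbrArms: Int,
    private val epsilon: Double,
    private val random: Random
) {
    fun choose(greedyArm: Int): Int =
        if (random.nextDouble() < epsilon) random.nextInt(nbrArms) else greedyArm
}

// Runs a fixed number of rounds and records the arms chosen.
fun exploreTrace(seed: Int, rounds: Int): List<Int> {
    val bandit = EpsilonGreedy(nbrArms = 4, epsilon = 0.5, random = Random(seed))
    return List(rounds) { bandit.choose(greedyArm = 0) }
}
```

Two runs built from Random(seed) with the same seed produce the same sequence of choices, which is what makes an exploration trace replayable in tests.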
reset
Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.
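A minimal sketch of what "back to the prior-seeded baseline, in place" can mean, assuming a hypothetical single-arm running mean seeded with a prior pseudo-observation. PriorSeededMean is an invented class for illustration and says nothing about how the library stores its state.

```kotlin
// Hypothetical stat: a weighted mean that starts from priorMean carried with priorWeight.
class PriorSeededMean(private val priorMean: Double, private val priorWeight: Double) {
    init {
        require(priorWeight > 0.0) { "priorWeight must be positive" }
    }

    private var sum = priorMean * priorWeight
    private var weight = priorWeight

    val mean: Double
        get() = sum / weight

    fun update(reward: Double, w: Double) {
        sum += reward * w
        weight += w
    }

    // Same object and same configuration; only the accumulated evidence is dropped.
    fun reset() {
        sum = priorMean * priorWeight
        weight = priorWeight
    }
}
```

After reset the object reports the prior mean again, as a freshly constructed instance with the same configuration would, while existing references to it stay valid.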
updateArmRewardResult
Snapshot of the arm-versus-reward paired stat; null when updateArmRewardTemplate is unset.
updateJointResult
Snapshot of the joint reward regressor; null when updateJointTemplate is unset.
updateMarginalResult
Snapshot of the marginal reward regressor; null when updateMarginalTemplate is unset.
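Since each result accessor returns null when its template is unset, callers need a null-safe read. The sketch below shows that pattern with hypothetical stand-ins (WeightedMean, RewardTracker, describe); the library's real result types are richer than a single Double.

```kotlin
// Hypothetical template: a plain weighted mean of the rewards it sees.
class WeightedMean {
    private var sum = 0.0
    private var weight = 0.0

    fun update(y: Double, w: Double) {
        sum += y * w
        weight += w
    }

    // NaN before any update, otherwise the weighted mean.
    fun snapshot(): Double = if (weight == 0.0) Double.NaN else sum / weight
}

// Hypothetical tracker with one optional template.
class RewardTracker(private val marginalTemplate: WeightedMean?) {
    fun update(reward: Double, weight: Double) {
        marginalTemplate?.update(reward, weight)
    }

    // Null when the template was not configured.
    val marginalResult: Double?
        get() = marginalTemplate?.snapshot()
}

fun describe(tracker: RewardTracker): String =
    tracker.marginalResult?.let { "mean reward $it" } ?: "marginal tracking disabled"
```

A tracker built with a null template skips the update entirely and reports that tracking is disabled, while one built with a template reports the accumulated value.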