
TrackedContextualBandit

class TrackedContextualBandit<B : ContextualBandit>(val inner: B, val contextFeatureSize: Int, chooseTemplate: RegressionStat<out Result>? = null, updateJointTemplate: RegressionStat<out Result>? = null, updateMarginalTemplate: RegressionStat<out Result>? = null, updateArmRewardTemplate: PairedStat<out Result>? = null, nowNanos: () -> Long = { 0L }) : ContextualBandit

Observability wrapper around any ContextualBandit. Every choose and update call is forwarded to the inner bandit and also fed into a small set of optional aggregate side stats, each modelling a different question about the bandit's behaviour. Arm-level bucketing is planned as a separate stratify op; until that lands, encode the arm into the observation (via the joint template) and read the slope or contrast off the resulting stat.
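To illustrate reading a contrast off a paired stat, here is a minimal streaming sketch. StreamingSlopeSketch is a hypothetical class, not a library type. It keeps weighted running means and co-moments of (armIndex, reward) and reports the least-squares slope of reward on armIndex. With two arms encoded as 0.0 and 1.0, that slope equals the difference between the arms' weighted mean rewards.

```kotlin
// Hypothetical streaming paired stat: weighted least-squares slope of y on x.
// Uses the standard incremental (West-style) update of means and co-moments.
class StreamingSlopeSketch {
    private var totalWeight = 0.0
    private var meanX = 0.0
    private var meanY = 0.0
    private var sxx = 0.0 // weighted sum of squared x deviations
    private var sxy = 0.0 // weighted sum of x/y co-deviations

    fun update(x: Double, y: Double, weight: Double = 1.0) {
        require(weight >= 0.0) { "weight must be non-negative" }
        if (weight == 0.0) return
        totalWeight += weight
        val dx = x - meanX                      // deviation from the old x mean
        meanX += weight / totalWeight * dx
        meanY += weight / totalWeight * (y - meanY)
        sxx += weight * dx * (x - meanX)        // old deviation times new deviation
        sxy += weight * dx * (y - meanY)
    }

    // NaN until x has non-zero variance (e.g. only one arm observed so far).
    val slope: Double
        get() = if (sxx == 0.0) Double.NaN else sxy / sxx
}
```

The slope stays NaN until both arms have been observed, because the arm column has no variance yet. With more than two arms, a single slope summarises only a linear trend in armIndex.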

Templates are independent and any subset may be null:

chooseTemplate sees update(x = context, y = armIndex.toDouble(), weight = 1.0) at every choose, where armIndex is the arm returned. It models the bandit's policy: the distribution of arm selections as a function of context.
updateJointTemplate sees update(x = [armIndex.toDouble()] ++ context, y = reward, weight) at every update. Joint reward model with the chosen arm prepended as an extra feature; the coefficient on the arm dimension is the arm-conditional effect (for a linear regressor this is a single slope in armIndex, which reads directly as a contrast only when there are two arms). The featureSize of this template must equal 1 + contextFeatureSize.
updateMarginalTemplate sees update(x = context, y = reward, weight) at every update. Marginal reward-given-context model, agnostic to arm. featureSize must equal contextFeatureSize.
updateArmRewardTemplate sees update(x = armIndex.toDouble(), y = reward, weight) at every update. Per-arm reward distribution expressed as a paired stat, such as the covariance, correlation, or slope between armIndex and reward.
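The routing in the list above can be sketched as follows. SketchRegression, SketchPaired, the Recording* classes, and SideStatRouter are hypothetical stand-ins for RegressionStat, PairedStat, and the wrapper's internals (whose exact APIs are not shown on this page), and plain DoubleArray replaces VectorView. Validating chooseTemplate against contextFeatureSize is an assumption inferred from its x = context input.

```kotlin
// Hypothetical minimal stat interfaces; not the library's RegressionStat / PairedStat.
interface SketchRegression {
    val featureSize: Int
    fun update(x: DoubleArray, y: Double, weight: Double)
}

interface SketchPaired {
    fun update(x: Double, y: Double, weight: Double)
}

// Test doubles that simply record what they were fed.
class RecordingRegression(override val featureSize: Int) : SketchRegression {
    val rows = mutableListOf<Triple<List<Double>, Double, Double>>()
    override fun update(x: DoubleArray, y: Double, weight: Double) {
        require(x.size == featureSize) { "expected $featureSize features, got ${x.size}" }
        rows += Triple(x.toList(), y, weight)
    }
}

class RecordingPaired : SketchPaired {
    val rows = mutableListOf<Triple<Double, Double, Double>>()
    override fun update(x: Double, y: Double, weight: Double) {
        rows += Triple(x, y, weight)
    }
}

// Hypothetical router mirroring the four template roles; any subset may be null.
class SideStatRouter(
    val contextFeatureSize: Int,
    val choose: SketchRegression? = null,
    val joint: SketchRegression? = null,
    val marginal: SketchRegression? = null,
    val armReward: SketchPaired? = null,
) {
    init {
        choose?.let { require(it.featureSize == contextFeatureSize) { "choose: featureSize mismatch" } }
        joint?.let { require(it.featureSize == 1 + contextFeatureSize) { "joint: featureSize mismatch" } }
        marginal?.let { require(it.featureSize == contextFeatureSize) { "marginal: featureSize mismatch" } }
    }

    // Called after the inner bandit has picked armIndex for context x.
    fun onChoose(x: DoubleArray, armIndex: Int) {
        choose?.update(x, armIndex.toDouble(), 1.0)
    }

    // Called on every update with the observed reward and weight.
    fun onUpdate(armIndex: Int, x: DoubleArray, reward: Double, weight: Double) {
        val arm = armIndex.toDouble()
        joint?.update(doubleArrayOf(arm) + x, reward, weight) // [arm] ++ context
        marginal?.update(x, reward, weight)
        armReward?.update(arm, reward, weight)
    }
}
```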

The wrapper itself implements only ContextualBandit. The underlying bandit is exposed as inner, typed as B, so callers can reach its extra interfaces (snapshot(), armResult, evaluate(i, x)) through tracked.inner.<method> without losing static type information.
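The typed-inner design can be sketched with hypothetical stand-ins: SketchBandit, SketchScorable, GreedyLinear, and SketchTracked are illustrative names, not library types. Because the wrapper is generic in B, tracked.inner keeps its concrete type, so its extra methods remain callable without a cast.

```kotlin
// Hypothetical base interface (plays the role of ContextualBandit).
interface SketchBandit {
    fun choose(x: DoubleArray): Int
}

// Hypothetical extra capability (plays the role of ContextualScorable).
interface SketchScorable {
    fun evaluate(i: Int, x: DoubleArray): Double
}

// A toy bandit implementing both: greedy argmax over per-arm linear scores.
class GreedyLinear(private val weights: Array<DoubleArray>) : SketchBandit, SketchScorable {
    override fun evaluate(i: Int, x: DoubleArray): Double =
        weights[i].zip(x) { w, v -> w * v }.sum()

    override fun choose(x: DoubleArray): Int =
        weights.indices.maxByOrNull { evaluate(it, x) } ?: error("no arms")
}

// The wrapper only implements SketchBandit, but keeps inner typed as B.
class SketchTracked<B : SketchBandit>(val inner: B) : SketchBandit {
    var chooses = 0
        private set

    override fun choose(x: DoubleArray): Int {
        chooses++ // observability hook
        return inner.choose(x)
    }
}
```

Here SketchTracked(GreedyLinear(...)) infers B = GreedyLinear, so tracked.inner.evaluate(...) type-checks even though SketchTracked itself has no evaluate method.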

Constructors

TrackedContextualBandit

constructor(inner: B, contextFeatureSize: Int, chooseTemplate: RegressionStat<out Result>? = null, updateJointTemplate: RegressionStat<out Result>? = null, updateMarginalTemplate: RegressionStat<out Result>? = null, updateArmRewardTemplate: PairedStat<out Result>? = null, nowNanos: () -> Long = { 0L })

Properties

contextFeatureSize

val contextFeatureSize: Int

Context vector dimension validated against templates and incoming updates.

inner

val inner: B

Underlying bandit; exposed for PerArmBandit / ContextualScorable access.

nbrArms

open override val nbrArms: Int

Number of arms in the population. Fixed at construction; arm indices are [0, nbrArms).

random

open override val random: Random

Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.
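A minimal sketch of threading one Random through every randomised decision, using a hypothetical epsilon-greedy policy (EpsilonGreedySketch is illustrative, not one of the library's policies). Two instances built with the same Random(seed) make identical choices.

```kotlin
import kotlin.random.Random

// Hypothetical epsilon-greedy policy: every random draw comes from `random`.
class EpsilonGreedySketch(
    private val means: DoubleArray, // current per-arm reward estimates
    private val epsilon: Double,    // exploration probability
    val random: Random,             // single source of randomness
) {
    fun choose(): Int =
        if (random.nextDouble() < epsilon) {
            random.nextInt(means.size) // explore: uniform arm in [0, nbrArms)
        } else {
            means.indices.maxByOrNull { means[it] } ?: error("no arms") // exploit
        }
}
```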

Functions

choose

open override fun choose(x: VectorView): Int

Pick an arm to play next, given the per-round context x. The bandit combines the context with its per-arm model to score each arm under a configurable com.eignex.kumulant.stat.regression.RegressionPosterior (or analogue) and returns the argmax / sampled choice.
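The argmax / sampled distinction can be sketched with a hypothetical pluggable scorer. ArmModelSketch, ArmScorer, and the Gaussian perturbation are assumptions standing in for a RegressionPosterior: the greedy scorer uses the per-arm mean prediction, and the sampled scorer adds a Thompson-style draw from N(0, stdDev²) before taking the argmax.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.sqrt
import kotlin.random.Random

// Hypothetical per-arm model: linear mean weights plus a scalar uncertainty.
class ArmModelSketch(val weights: DoubleArray, val stdDev: Double)

fun dot(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "dimension mismatch" }
    return a.indices.sumOf { a[it] * b[it] }
}

// Standard normal draw via Box-Muller; 1 - nextDouble() lies in (0, 1], so ln is finite.
fun Random.nextGaussianSketch(): Double {
    val u1 = 1.0 - nextDouble()
    val u2 = nextDouble()
    return sqrt(-2.0 * ln(u1)) * cos(2.0 * PI * u2)
}

// The configurable part: how an arm is scored for a given context.
fun interface ArmScorer {
    fun score(arm: ArmModelSketch, x: DoubleArray): Double
}

val greedyScorer = ArmScorer { arm, x -> dot(arm.weights, x) }

fun sampledScorer(random: Random) = ArmScorer { arm, x ->
    dot(arm.weights, x) + arm.stdDev * random.nextGaussianSketch()
}

// Score every arm once and return the index of the highest score.
fun chooseArm(arms: List<ArmModelSketch>, x: DoubleArray, scorer: ArmScorer): Int =
    arms.indices.maxByOrNull { scorer.score(arms[it], x) } ?: error("no arms")
```

With stdDev = 0.0 the sampled scorer reduces to the greedy one; larger stdDev values make lower-mean arms more likely to win a round, which is the exploration mechanism in this sketch.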

chooseResult

fun chooseResult(): Result?

Snapshot of the policy regressor; null when chooseTemplate is unset.

reset

open override fun reset()

Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.

update

open override fun update(armIndex: Int, x: VectorView, reward: Double, weight: Double = 1.0)

Fold a single (x, reward) observation into the arm at armIndex. weight is the same observation weight used throughout the library: typically 1.0, occasionally an importance weight.
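How weight enters a fold can be sketched with a hypothetical per-arm weighted running mean (WeightedArmMeans is illustrative, not a library class). Under this sketch, an update with weight 2.0 has the same effect on the mean as two unit-weight updates of the same reward.

```kotlin
// Hypothetical per-arm weighted running mean, illustrating observation weights.
class WeightedArmMeans(nbrArms: Int) {
    private val weights = DoubleArray(nbrArms) // accumulated weight per arm
    private val means = DoubleArray(nbrArms)   // weighted mean reward per arm

    fun update(armIndex: Int, reward: Double, weight: Double = 1.0) {
        require(armIndex in means.indices) { "armIndex $armIndex outside [0, ${means.size})" }
        require(weight >= 0.0) { "weight must be non-negative" }
        if (weight == 0.0) return
        weights[armIndex] += weight
        // Move the mean toward the reward by the fraction weight / totalWeight.
        means[armIndex] += weight / weights[armIndex] * (reward - means[armIndex])
    }

    fun mean(armIndex: Int): Double = means[armIndex]
    fun totalWeight(armIndex: Int): Double = weights[armIndex]
}
```

For importance weighting, a common choice is weight = 1.0 / p, where p is the probability with which the logging policy chose the arm.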

updateArmRewardResult

fun updateArmRewardResult(): Result?

Snapshot of the arm-versus-reward paired stat; null when updateArmRewardTemplate is unset.

updateJointResult

fun updateJointResult(): Result?

Snapshot of the joint reward regressor; null when updateJointTemplate is unset.

updateMarginalResult

fun updateMarginalResult(): Result?

Snapshot of the marginal reward regressor; null when updateMarginalTemplate is unset.