ContextualBandit
Context-aware bandit: each round the caller observes a feature vector, uses it to choose an arm, plays the arm, observes a reward, and feeds the (context, reward) pair back to the bandit.
The standard contextual lifecycle:
1. Caller observes x: VectorView (e.g. a user feature vector).
2. Caller calls choose with x; the bandit picks an arm by combining the per-arm model with the context.
3. Caller plays the arm and observes a reward.
4. Caller calls update with the arm index, the same context x, and the reward. The bandit updates the per-arm model with the (x, reward) pair.
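The lifecycle above can be sketched as a loop. This is a minimal illustration, not the kumulant API: ToyContextualBandit, runLifecycle, and the two-segment environment are hypothetical, a plain DoubleArray stands in for VectorView, and the per-arm model is assumed to be a small linear model trained by gradient steps with epsilon-greedy exploration.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in, NOT the kumulant API: one linear model per arm,
// trained by stochastic gradient descent, with epsilon-greedy exploration.
class ToyContextualBandit(
    val nbrArms: Int,
    dim: Int,
    private val random: Random,
    private val epsilon: Double = 0.1,
    private val learningRate: Double = 0.05,
) {
    private val weights = Array(nbrArms) { DoubleArray(dim) }

    // Predicted reward of `arm` in context `x` (a dot product).
    fun predict(arm: Int, x: DoubleArray): Double {
        var sum = 0.0
        for (i in x.indices) sum += weights[arm][i] * x[i]
        return sum
    }

    // Step 2: explore with probability epsilon, otherwise take the argmax.
    fun choose(x: DoubleArray): Int {
        if (random.nextDouble() < epsilon) return random.nextInt(nbrArms)
        var best = 0
        for (arm in 1 until nbrArms) {
            if (predict(arm, x) > predict(best, x)) best = arm
        }
        return best
    }

    // Step 4: one gradient step on the squared error, for the played arm only.
    fun update(arm: Int, x: DoubleArray, reward: Double) {
        val error = reward - predict(arm, x)
        for (i in x.indices) weights[arm][i] += learningRate * error * x[i]
    }
}

fun runLifecycle(rounds: Int, seed: Int): ToyContextualBandit {
    val random = Random(seed)
    val bandit = ToyContextualBandit(nbrArms = 2, dim = 2, random = random)
    repeat(rounds) {
        // 1. Observe a context: a one-hot vector for one of two user segments.
        val segment = random.nextInt(2)
        val x = doubleArrayOf(
            if (segment == 0) 1.0 else 0.0,
            if (segment == 1) 1.0 else 0.0,
        )
        // 2. Ask the bandit for an arm.
        val arm = bandit.choose(x)
        // 3. Play it: in this toy environment arm k pays 1.0 only for segment k.
        val reward = if (arm == segment) 1.0 else 0.0
        // 4. Feed back the arm index, the same context, and the reward.
        bandit.update(arm, x, reward)
    }
    return bandit
}

fun main() {
    val bandit = runLifecycle(rounds = 2000, seed = 42)
    val segmentZero = doubleArrayOf(1.0, 0.0)
    println("arm 0 estimate: ${bandit.predict(0, segmentZero)}")
    println("arm 1 estimate: ${bandit.predict(1, segmentZero)}")
}
```

Passing the same x to both choose and update matters: the update credits the reward to the model that produced the decision, under the features that decision was based on.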
Concrete contextual bandits typically own one com.eignex.kumulant.core.RegressionStat per arm (com.eignex.kumulant.bandit.contextual.RegressionContextualBandit), one nearest-neighbour reservoir per arm (com.eignex.kumulant.bandit.contextual.KnnContextualBandit), or a mixture-of-experts weighting (com.eignex.kumulant.bandit.contextual.Exp4Bandit).
Implementations source all randomness from Bandit.random.
Inheritors
Properties
Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.
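A short sketch of why a single seeded Random gives reproducible exploration. UniformExplorer and trace are hypothetical names used only for illustration; the only real API assumed here is the Kotlin standard library's kotlin.random.Random(seed).

```kotlin
import kotlin.random.Random

// Hypothetical policy, not the kumulant API: every randomised decision is
// drawn from the single Random instance handed in at construction.
class UniformExplorer(private val nbrArms: Int, private val random: Random) {
    fun choose(): Int = random.nextInt(nbrArms)
}

// Record the arms chosen over `rounds` rounds for a given seed.
fun trace(seed: Int, rounds: Int): List<Int> {
    val explorer = UniformExplorer(nbrArms = 5, random = Random(seed))
    return List(rounds) { explorer.choose() }
}

fun main() {
    // Two runs built from the same seed produce the same sequence of arms.
    println(trace(seed = 7, rounds = 20) == trace(seed = 7, rounds = 20))
}
```

If any decision drew from a second, unseeded generator, the two traces could diverge, which is the reason for threading one instance through every randomised step.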
Functions
Pick an arm to play next, given the per-round context x. The bandit combines the context with its per-arm model to score each arm under a configurable com.eignex.kumulant.stat.regression.RegressionPosterior (or analogue) and returns the argmax / sampled choice.
Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.
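The in-place reset can be illustrated with a toy model. ToyMeanBandit is a hypothetical, non-contextual stand-in (a running mean per arm seeded with a prior value), not the library's implementation; it only shows that learned state returns to the prior baseline while the configuration and the Random instance stay the same objects.

```kotlin
import kotlin.random.Random

// Hypothetical per-arm running mean, not the kumulant API.
class ToyMeanBandit(
    private val nbrArms: Int,
    private val priorMean: Double,
    val random: Random,
) {
    private val counts = IntArray(nbrArms)
    private val means = DoubleArray(nbrArms) { priorMean }

    // Incremental mean update for the played arm.
    fun update(arm: Int, reward: Double) {
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    }

    fun mean(arm: Int): Double = means[arm]

    // Clear learned state back to the prior baseline, in place.
    // Arm count, prior, and the Random instance are left untouched.
    fun reset() {
        counts.fill(0)
        means.fill(priorMean)
    }
}

fun main() {
    val bandit = ToyMeanBandit(nbrArms = 2, priorMean = 0.5, random = Random(1))
    bandit.update(arm = 0, reward = 1.0)
    println("before reset: ${bandit.mean(0)}")
    bandit.reset()
    println("after reset: ${bandit.mean(0)}")
}
```

One consequence of keeping the same random instance, in this sketch at least, is that the generator is not rewound: draws after a reset continue the existing stream rather than replaying it from the seed.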