kumulant

ContextualBandit

Context-aware bandit: each round the caller observes a feature vector, uses it to choose an arm, plays the arm, observes a reward, and feeds the (context, reward) pair back to the bandit.

The standard contextual lifecycle:

  1. Caller observes x: VectorView (e.g. a user feature vector).

  2. Caller calls choose with x; the bandit picks an arm by combining the per-arm model with the context.

  3. Caller plays the arm and observes a reward.

  4. Caller calls update with the arm index, the same context x, and the reward. The bandit updates the per-arm model with the (x, reward) pair.
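The four steps above can be sketched as a loop. This is a minimal illustration, not the library's implementation: `Context` is a hypothetical stand-in for VectorView, and `ToyLinearBandit` is a hypothetical epsilon-greedy bandit that only mirrors the choose / update shape described here.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in for the library's VectorView.
typealias Context = DoubleArray

// Hypothetical toy bandit: one linear model per arm, epsilon-greedy choice,
// stochastic-gradient update. It only mirrors the choose/update signatures.
class ToyLinearBandit(
    val nbrArms: Int,
    private val dim: Int,
    val random: Random,
    private val epsilon: Double = 0.1,
    private val learningRate: Double = 0.05,
) {
    private val weights = Array(nbrArms) { DoubleArray(dim) }

    // Predicted reward of `arm` for context `x` (a dot product).
    private fun predict(arm: Int, x: Context): Double {
        var sum = 0.0
        for (i in 0 until dim) sum += weights[arm][i] * x[i]
        return sum
    }

    fun choose(x: Context): Int {
        // Explore uniformly with probability epsilon, otherwise exploit.
        if (random.nextDouble() < epsilon) return random.nextInt(nbrArms)
        var best = 0
        for (arm in 1 until nbrArms) {
            if (predict(arm, x) > predict(best, x)) best = arm
        }
        return best
    }

    fun update(armIndex: Int, x: Context, reward: Double, weight: Double = 1.0) {
        // One weighted gradient step on squared error for the played arm only.
        val error = reward - predict(armIndex, x)
        for (i in 0 until dim) {
            weights[armIndex][i] += learningRate * weight * error * x[i]
        }
    }
}

// Runs the observe / choose / play / update loop and returns the mean reward.
fun runLifecycle(rounds: Int, seed: Int): Double {
    val bandit = ToyLinearBandit(nbrArms = 2, dim = 2, random = Random(seed))
    val environment = Random(seed + 1)
    var total = 0.0
    repeat(rounds) {
        // 1. Observe a context (here a one-hot vector).
        val x = if (environment.nextBoolean()) doubleArrayOf(1.0, 0.0) else doubleArrayOf(0.0, 1.0)
        // 2. Ask the bandit for an arm.
        val arm = bandit.choose(x)
        // 3. Play the arm: in this toy environment arm i pays 1.0 when x[i] is 1.0.
        val reward = x[arm]
        // 4. Feed back the same context together with the reward.
        bandit.update(arm, x, reward)
        total += reward
    }
    return total / rounds
}
```

Note that step 4 passes the same x that was given to choose; pairing the reward with a different context would train the arm's model on the wrong input.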

Concrete contextual bandits typically own one com.eignex.kumulant.core.RegressionStat per arm (com.eignex.kumulant.bandit.contextual.RegressionContextualBandit), one nearest-neighbour reservoir per arm (com.eignex.kumulant.bandit.contextual.KnnContextualBandit), or a mixture-of-experts weighting (com.eignex.kumulant.bandit.contextual.Exp4Bandit).

Implementations source all randomness from Bandit.random.

Properties

abstract val nbrArms: Int

Number of arms in the population. Fixed at construction; arm indices are [0, nbrArms).

abstract val random: Random

Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.
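A short sketch of why a seeded instance gives reproducible exploration. It assumes the `Random` in the signature is `kotlin.random.Random`; `exploreSequence` is a hypothetical helper and not part of the library.

```kotlin
import kotlin.random.Random

// Hypothetical helper: draws `rounds` uniformly random arm indices from a
// single Random instance, the way a purely exploring policy might.
fun exploreSequence(random: Random, nbrArms: Int, rounds: Int): List<Int> =
    List(rounds) { random.nextInt(nbrArms) }

// Two generators built from the same seed yield the same sequence of arms.
fun sameSeedSameArms(seed: Int): Boolean =
    exploreSequence(Random(seed), nbrArms = 3, rounds = 10) ==
        exploreSequence(Random(seed), nbrArms = 3, rounds = 10)
```

The guarantee only holds if every randomised decision draws from that one instance; a policy that created its own unseeded generator internally would break replay.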

Functions

abstract fun choose(x: VectorView): Int

Pick an arm to play next, given the per-round context x. The bandit combines the context with its per-arm model to score each arm under a configurable com.eignex.kumulant.stat.regression.RegressionPosterior (or analogue), then returns either the argmax arm or a sampled one.
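The difference between an argmax choice and a sampled choice can be sketched as follows. This is an assumption-laden illustration in the style of Thompson sampling, not the library's RegressionPosterior: `ArmEstimate`, `argmaxArm`, `sampledArm`, and `standardGaussian` are hypothetical names, and each arm's score for the current context is assumed to be summarised by a mean and a standard deviation.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.sqrt
import kotlin.random.Random

// Hypothetical per-arm summary of the predicted reward for the current context.
data class ArmEstimate(val mean: Double, val stdDev: Double)

// Greedy choice: the arm with the highest predicted mean.
fun argmaxArm(estimates: List<ArmEstimate>): Int =
    estimates.indices.maxByOrNull { estimates[it].mean }
        ?: error("at least one arm is required")

// Standard normal draw via the Box-Muller transform.
fun standardGaussian(random: Random): Double {
    val u1 = 1.0 - random.nextDouble() // in (0, 1], so ln(u1) is finite
    val u2 = random.nextDouble()
    return sqrt(-2.0 * ln(u1)) * cos(2.0 * PI * u2)
}

// Sampled choice: draw one plausible score per arm, then take the best draw.
// Arms with larger uncertainty win more often than their mean alone suggests.
fun sampledArm(estimates: List<ArmEstimate>, random: Random): Int =
    estimates.indices.maxByOrNull {
        estimates[it].mean + estimates[it].stdDev * standardGaussian(random)
    } ?: error("at least one arm is required")
```

With every stdDev at zero the two functions agree, which makes the sampled variant a strict generalisation of the greedy one in this sketch.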

abstract fun reset()

Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.

abstract fun update(armIndex: Int, x: VectorView, reward: Double, weight: Double = 1.0)

Fold a single (x, reward) observation into the model for the arm at armIndex. weight is the observation weight used throughout the library: typically 1.0, occasionally an importance weight.
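A minimal sketch of what an observation weight can mean, under the common convention (assumed here, not confirmed by this page) that an observation with weight w counts like w unit-weight copies of itself. `WeightedMean` and `importanceWeight` are hypothetical names used only for illustration.

```kotlin
// Hypothetical running mean that honours per-observation weights.
class WeightedMean {
    var totalWeight = 0.0
        private set
    var mean = 0.0
        private set

    fun update(value: Double, weight: Double = 1.0) {
        if (weight <= 0.0) return // ignore observations that carry no weight
        totalWeight += weight
        // Incremental weighted-mean update.
        mean += (weight / totalWeight) * (value - mean)
    }
}

// Inverse-propensity weight: an arm that the logging policy played with
// probability `propensity` is up-weighted by 1 / propensity.
fun importanceWeight(propensity: Double): Double {
    require(propensity > 0.0 && propensity <= 1.0) { "propensity must be in (0, 1]" }
    return 1.0 / propensity
}
```

Under this convention, one update with weight 2.0 leaves the statistic in the same state as two identical updates with the default weight of 1.0.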
