kumulant

bandit.contextual

Contextual bandits: each round comes with a feature vector, and the reward depends on both the chosen arm and the context. Three families, covering linear / non-linear / non-parametric reward models plus the adversarial-expert case.

The three bandits

| Bandit | Reward model | Reach for it when |
|---|---|---|
| RegressionContextualBandit | One regression stat per arm; scoring rule is a com.eignex.kumulant.stat.regression.RegressionPosterior | The relationship between context and reward is structurally linear / tree-shaped; anything modellable by a com.eignex.kumulant.stat.regression.glm or com.eignex.kumulant.stat.regression.tree regressor. |
| KnnContextualBandit | Nearest-neighbours over a reservoir of (context, reward) pairs per arm | The reward surface is non-parametric and hard to model; kNN lets the data speak directly. Memory grows with the reservoir size; pick when the feature space is low-dimensional and observations are scarce. |
| Exp4Bandit | Adversarial expert mixture | The context is itself a set of expert recommendations (or a hand-crafted distribution over arms) and you want a regret bound that holds without distributional assumptions. |

RegressionContextualBandit

The flagship contextual bandit. Each arm owns a regressor (any com.eignex.kumulant.core.RegressionStat) and a com.eignex.kumulant.stat.regression.RegressionPosterior turns the regressor's snapshot into a per-arm score at the round's context.
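
To make the regressor / posterior split concrete, here is a minimal self-contained sketch. It is not kumulant's implementation: it assumes a per-coordinate (diagonal) Gaussian approximation with unit noise variance, and every name in it (DiagonalArm, DiagonalSnapshot, thompsonScore, gaussian) is hypothetical. DiagonalArm plays the regressor role, and thompsonScore plays the posterior role by turning a snapshot into a sampled score at a context.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.sqrt
import kotlin.random.Random

/** Hypothetical snapshot: per-coordinate posterior mean and variance of the weights. */
class DiagonalSnapshot(val mean: DoubleArray, val variance: DoubleArray)

/** Hypothetical per-arm regressor that ignores cross-coordinate correlations. */
class DiagonalArm(dim: Int, lambda: Double = 1.0) {
    private val precision = DoubleArray(dim) { lambda } // lambda + sum of x_i^2
    private val xr = DoubleArray(dim)                   // sum of x_i * reward

    fun update(x: DoubleArray, reward: Double) {
        for (i in x.indices) {
            precision[i] += x[i] * x[i]
            xr[i] += x[i] * reward
        }
    }

    /** Freezes the current estimate; unit noise variance is an assumption of this sketch. */
    fun snapshot() = DiagonalSnapshot(
        mean = DoubleArray(precision.size) { xr[it] / precision[it] },
        variance = DoubleArray(precision.size) { 1.0 / precision[it] },
    )
}

/** Standard normal draw via Box-Muller; 1 - nextDouble() keeps the log argument in (0, 1]. */
fun gaussian(random: Random): Double =
    sqrt(-2.0 * ln(1.0 - random.nextDouble())) * cos(2.0 * PI * random.nextDouble())

/** Posterior role: sample one weight vector from the snapshot and score the context with it. */
fun thompsonScore(s: DiagonalSnapshot, x: DoubleArray, random: Random): Double {
    var score = 0.0
    for (i in x.indices) {
        val w = s.mean[i] + sqrt(s.variance[i]) * gaussian(random)
        score += w * x[i]
    }
    return score
}
```

A bandit built this way would call thompsonScore once per arm at the round's context and play the arm with the highest sampled score.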

Common combinations:

  • Linear Thompson sampling: BayesianRegressionStat per arm + MultivariateGaussian posterior.

  • LinUCB: BayesianRegressionStat per arm + LinUcb posterior.

  • High-dimensional sparse: DiagonalRegressionStat per arm + FactorisedGaussian posterior; trades full covariance for per-coordinate uncertainty.

  • Non-linear: RandomForestRegressionStat per arm + ThompsonForestPosterior or UcbForestPosterior.

  • Cheap point estimates: StochasticRegressionStat per arm + PointPosterior (no exploration; pure greedy). Useful when paired with an explicit exploration policy like epsilon-greedy at a higher layer.
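
As an illustration of the LinUCB combination listed above, the following sketch scores an arm as the ridge-regression mean plus alpha times the predictive standard deviation. It is a textbook-style rendering under stated assumptions, not the library's BayesianRegressionStat or LinUcb code; LinUcbArm, chooseArm, lambda, and alpha are hypothetical names. The inverse of the regularised Gram matrix is maintained with a Sherman-Morrison rank-one update.

```kotlin
import kotlin.math.sqrt

/** Hypothetical per-arm ridge-regression state for a LinUCB-style score (not kumulant's API). */
class LinUcbArm(private val dim: Int, lambda: Double = 1.0) {
    // Inverse of A = lambda * I + sum(x x^T), maintained incrementally.
    private val aInv = Array(dim) { i -> DoubleArray(dim) { j -> if (i == j) 1.0 / lambda else 0.0 } }

    // b = sum(reward * x)
    private val b = DoubleArray(dim)

    private fun aInvTimes(v: DoubleArray): DoubleArray =
        DoubleArray(dim) { i -> (0 until dim).sumOf { j -> aInv[i][j] * v[j] } }

    /** Mean estimate theta^T x plus alpha * sqrt(x^T A^-1 x). */
    fun score(x: DoubleArray, alpha: Double): Double {
        val theta = aInvTimes(b)
        val aInvX = aInvTimes(x)
        val mean = (0 until dim).sumOf { theta[it] * x[it] }
        val variance = (0 until dim).sumOf { x[it] * aInvX[it] }
        return mean + alpha * sqrt(variance)
    }

    /** Sherman-Morrison update of A^-1 for A += x x^T, then accumulate b. */
    fun update(x: DoubleArray, reward: Double) {
        val aInvX = aInvTimes(x)
        val denom = 1.0 + (0 until dim).sumOf { x[it] * aInvX[it] }
        for (i in 0 until dim) {
            for (j in 0 until dim) {
                aInv[i][j] -= aInvX[i] * aInvX[j] / denom
            }
        }
        for (i in 0 until dim) b[i] += reward * x[i]
    }
}

/** Picks the arm with the highest upper-confidence score (first arm wins ties). */
fun chooseArm(arms: List<LinUcbArm>, x: DoubleArray, alpha: Double = 1.0): Int =
    arms.indices.maxByOrNull { arms[it].score(x, alpha) } ?: error("no arms")
```

Swapping the deterministic bonus for a random draw from the same mean and covariance would give the linear Thompson sampling variant instead.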

The bandit takes a template regressor at construction; each arm gets its own materialised copy via template.create(concurrency). An optional global regressor (globalTemplate on the class, globalRegression in the spec) pools across arms for shared structure and faster warm-up; see also HierarchicalBayesianRegression for the explicit cross-arm pooling story.

KnnContextualBandit

Per-arm reservoir of (context, reward) pairs. At choose time, the bandit walks each arm's reservoir, finds the k nearest neighbours to the round's context under the configured distance function (squaredL2 by default), and scores the arm by the mean of their rewards, plus an optional exploration bonus.
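
The choose-time scoring just described can be sketched in a few lines. This is an illustrative stand-in, not the library's code: KnnArm, Observation, and the local squaredL2 over DoubleArray are hypothetical, and the bonus form exploration / sqrt(n) is an assumption chosen only because it decays as the arm accumulates observations.

```kotlin
import kotlin.math.sqrt

/** Hypothetical observation record; kumulant's own storage layout may differ. */
class Observation(val context: DoubleArray, val reward: Double)

/** Local stand-in for a squared Euclidean distance. */
fun squaredL2(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "dimension mismatch" }
    var sum = 0.0
    for (i in a.indices) {
        val d = a[i] - b[i]
        sum += d * d
    }
    return sum
}

/** One arm: bounded first-in-first-out history scored by the mean of the k nearest rewards. */
class KnnArm(private val k: Int, private val maxHistory: Int) {
    private val history = ArrayDeque<Observation>()

    fun update(context: DoubleArray, reward: Double) {
        if (history.size == maxHistory) history.removeFirst() // evict the oldest entry
        history.addLast(Observation(context.copyOf(), reward))
    }

    fun score(context: DoubleArray, coldStartScore: Double, exploration: Double): Double {
        if (history.isEmpty()) return coldStartScore
        val nearest = history.sortedBy { squaredL2(it.context, context) }.take(k)
        val mean = nearest.map { it.reward }.average()
        // Assumed bonus shape: shrinks as the arm collects more observations.
        return mean + exploration / sqrt(history.size.toDouble())
    }
}
```

Sorting the whole history keeps the sketch short; the per-query cost is what makes small reservoirs and low-dimensional contexts the comfortable regime.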

Reach for it when:

  • The feature space is small (kNN cost scales with reservoir × features).

  • The reward surface is genuinely non-linear and hard to parameterise.

  • Cold-start observations are scarce: kNN starts producing reasonable scores from a handful of samples per arm.

KnnArmResult carries the reservoir as a serializable snapshot, so replicas can merge (reservoir union) and ship reservoirs across processes.

Exp4Bandit

EXP4: exponential weights with expert advice. The bandit never models the context itself; instead, each of K experts turns the round's context into a distribution over arms. EXP4 maintains weights over experts (not arms) and selects an arm by mixing the expert distributions according to those weights.

Exp4State captures the expert weights; the bandit is the adversarial counterpart of com.eignex.kumulant.bandit.univariate.Exp3Bandit for the contextual case.
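
A compact sketch of one EXP4 round, following the standard formulation: mix the expert advice by weight, blend in uniform exploration, sample an arm, then multiply each expert's weight by the exponential of its importance-weighted gain. Exp4Sketch and its method names are hypothetical and are not Exp4Bandit's actual API; rewards are assumed to lie in [0, 1].

```kotlin
import kotlin.math.exp
import kotlin.random.Random

/** Hypothetical EXP4 core; advice[i][a] is expert i's probability for arm a this round. */
class Exp4Sketch(
    private val nbrArms: Int,
    nbrExperts: Int,
    private val eta: Double,
    private val gamma: Double,
    private val random: Random = Random.Default,
) {
    val weights = DoubleArray(nbrExperts) { 1.0 }

    /** Weighted mixture of expert advice, blended with uniform exploration at rate gamma. */
    fun armProbabilities(advice: List<DoubleArray>): DoubleArray {
        val total = weights.sum()
        return DoubleArray(nbrArms) { a ->
            val mixed = advice.indices.sumOf { i -> weights[i] * advice[i][a] } / total
            (1.0 - gamma) * mixed + gamma / nbrArms
        }
    }

    /** Samples an arm from the mixed distribution by inverse-CDF walk. */
    fun choose(advice: List<DoubleArray>): Int {
        val p = armProbabilities(advice)
        var u = random.nextDouble()
        for (a in p.indices) {
            u -= p[a]
            if (u < 0.0) return a
        }
        return nbrArms - 1 // guard against floating-point round-off
    }

    /** Must be called with the same advice used for choose, before any other update. */
    fun update(advice: List<DoubleArray>, arm: Int, reward: Double) {
        val estimate = reward / armProbabilities(advice)[arm] // importance-weighted reward
        for (i in weights.indices) {
            weights[i] *= exp(eta * advice[i][arm] * estimate)
        }
    }
}
```

Over long runs the raw weights can grow very large, so a production version would typically renormalise them or work in log space.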

Wire portability

ContextualBanditSpec is the sealed root of wire-portable contextual configs:

  • RegressionContextualSpec

  • KnnContextualSpec

Both round-trip through skema-based JSON / CBOR and materialise via com.eignex.kumulant.bandit.materialize into the live bandit.

Interface hierarchy

See com.eignex.kumulant.bandit for the cross-cutting interface story (Bandit / ContextualBandit / Snapshotable / PerArmBandit / ContextualScorable).

Types

@Serializable
sealed interface ContextualBanditSpec

Wire-portable specification for a contextual bandit instance.

class Exp4Bandit(val nbrArms: Int, val experts: List<Exp4Expert>, val eta: Double = defaultEta(nbrArms, experts.size), val gamma: Double = (nbrArms * eta).coerceAtMost(1.0), val random: Random = Random.Default) : ContextualBandit, Snapshotable<Exp4State>

EXP4 (Auer, Cesa-Bianchi, Freund, Schapire 2002): adversarial contextual bandit over a fixed pool of experts. Each round, every expert returns a distribution over arms for the context; the bandit mixes those distributions weighted by per-expert exponential weights, blends in uniform exploration at rate gamma, samples an arm, and on reward r ∈ [0,1] folds the IPS-corrected gain back into the expert weights.

fun interface Exp4Expert

Maps a context vector to a probability distribution over arms. Implementations are stateless w.r.t. the bandit; they consult only the context and any internal state frozen at construction. The returned array must have length nbrArms and sum to 1.
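
The contract above (length nbrArms, entries summing to 1) is easy to check mechanically. The sketch below uses a hypothetical ArmAdvisor interface as a stand-in, since the actual method name on Exp4Expert is not shown here; isValidAdvice and thresholdAdvisor are likewise illustrative names. The non-negativity check is an added assumption that follows from the output being a probability distribution.

```kotlin
import kotlin.math.abs

/** Hypothetical stand-in for an expert: context in, distribution over arms out. */
fun interface ArmAdvisor {
    fun advise(context: DoubleArray): DoubleArray
}

/** Checks the stated requirements: length nbrArms, non-negative entries, sum of 1. */
fun isValidAdvice(advice: DoubleArray, nbrArms: Int, tolerance: Double = 1e-9): Boolean =
    advice.size == nbrArms &&
        advice.all { it >= 0.0 } &&
        abs(advice.sum() - 1.0) <= tolerance

/** Example expert: all mass on arm 0 when the first feature is positive, uniform otherwise. */
fun thresholdAdvisor(nbrArms: Int) = ArmAdvisor { context ->
    if (context[0] > 0.0) {
        DoubleArray(nbrArms) { if (it == 0) 1.0 else 0.0 }
    } else {
        DoubleArray(nbrArms) { 1.0 / nbrArms }
    }
}
```

Because the advisor closes over nothing but nbrArms, it stays stateless with respect to the bandit, matching the description above.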

@Serializable
@SerialName(value = "Exp4State")
data class Exp4State(val weights: DoubleArray) : Result

Snapshot of com.eignex.kumulant.bandit.contextual.Exp4Bandit's state: the per-expert exponential weights. The bandit's state is over experts (not arms), so it surfaces via Snapshotable rather than the com.eignex.kumulant.bandit.PerArmBandit per-arm convenience.

@Serializable
@SerialName(value = "KnnArmResult")
data class KnnArmResult(val contexts: List<DoubleArray>, val rewards: DoubleArray, val weights: DoubleArray, val totalWeight: Double) : Result

Per-arm snapshot for KnnContextualBandit: the retained history of (context, reward, weight) triples plus the cumulative observation weight.

class KnnContextualBandit(val nbrArms: Int, val k: Int = 5, val maxHistoryPerArm: Int = 1024, val coldStartScore: Double = 1.0, val exploration: Double = 1.0, val distance: (VectorView, VectorView) -> Double = ::squaredL2, val random: Random = Random.Default) : ContextualBandit, PerArmBandit<KnnArmResult>, ContextualScorable

Non-parametric contextual bandit: each arm keeps a bounded FIFO history of past (context, reward, weight) observations and is scored at choose time by the empirical mean reward over the k nearest historical contexts, plus an optional UCB-style bonus that decays with the arm's cumulative weight.

@Serializable
@SerialName(value = "KnnContextual")
data class KnnContextualSpec(val nbrArms: Int, val k: Int = 5, val maxHistoryPerArm: Int = 1024, val coldStartScore: Double = 1.0, val exploration: Double = 1.0, val distance: String = "squaredL2") : ContextualBanditSpec

Spec for KnnContextualBandit. distance is a named lookup against a small built-in registry; currently "squaredL2" is the only stock entry.

@Serializable
sealed interface LinearRegressionSpec

Wire-portable spec for the three LinearRegressionResult-typed regressors that RegressionContextualBandit composes with. RegressionTree-based regressors and other non-linear stats are not yet wire-portable; construct them programmatically.

class RegressionContextualBandit<R : Result>(val nbrArms: Int, template: RegressionStat<R>, val posterior: RegressionPosterior<R>, val exploration: Double = 1.0, globalTemplate: RegressionStat<R>? = null, val random: Random = Random.Default) : ContextualBandit, PerArmBandit<R>, ContextualScorable

Generic contextual bandit: each arm owns a RegressionStat cloned from template and is scored at choose time by the shared posterior under the round's context vector, argmaxed across arms. The same machinery covers every regressor in kumulant.

@Serializable
@SerialName(value = "RegressionContextual")
data class RegressionContextualSpec(val nbrArms: Int, val regression: LinearRegressionSpec, val posterior: LinearPosterior<*>, val exploration: Double = 1.0, val globalRegression: LinearRegressionSpec? = null) : ContextualBanditSpec

Spec for RegressionContextualBandit with a linear-posterior backbone. The regression variant picks one of the three LinearRegressionResult-typed regressors; the posterior selects the matching scoring rule.