
RegressionContextualBandit

class RegressionContextualBandit<R : Result>(val nbrArms: Int, template: RegressionStat<R>, val posterior: RegressionPosterior<R>, val exploration: Double = 1.0, globalTemplate: RegressionStat<R>? = null, val random: Random = Random.Default) : ContextualBandit, PerArmBandit<R>, ContextualScorable

Generic contextual bandit: each arm owns a RegressionStat cloned from template. At choose time the shared posterior scores every arm under the round's context vector, and the argmax across arms is returned. The same machinery covers every regressor in kumulant:

Linear Thompson Sampling: com.eignex.kumulant.stat.regression.glm.BayesianRegressionStat + com.eignex.kumulant.stat.regression.glm.MultivariateGaussian.
LinUCB: any linear regressor + com.eignex.kumulant.stat.regression.glm.LinUcb.
Greedy SGD: com.eignex.kumulant.stat.regression.glm.StochasticRegressionStat + com.eignex.kumulant.stat.regression.glm.PointPosterior with exploration = 0.0.
Decision-tree bandit: com.eignex.kumulant.stat.regression.tree.DecisionTreeRegressionStat + a com.eignex.kumulant.stat.regression.tree.TreePosterior.
Random-forest bandit: com.eignex.kumulant.stat.regression.tree.RandomForestRegressionStat + a com.eignex.kumulant.stat.regression.tree.ForestPosterior.

Per-arm regressors are constructed via template.create(null) so per-arm state is independent. exploration scales the posterior's exploration parameter; pass 0.0 for pure exploitation (point estimates only).
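The scoring loop described above can be sketched with toy stand-ins. Nothing here is the kumulant API: ToyArm, toyScore and toyChoose are hypothetical names, the "posterior" is a plain dot product plus exploration-scaled uniform noise, and the context is a DoubleArray rather than a VectorView. The sketch only illustrates independent per-arm state, a shared scorer, and an argmax across arms.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in for a per-arm regressor snapshot: just a weight vector.
class ToyArm(val weights: DoubleArray)

// Hypothetical stand-in for a shared posterior: mean prediction plus
// exploration-scaled uniform noise. exploration = 0.0 yields the point estimate.
fun toyScore(arm: ToyArm, x: DoubleArray, rng: Random, exploration: Double): Double {
    var mean = 0.0
    for (i in x.indices) mean += arm.weights[i] * x[i]
    return mean + exploration * (rng.nextDouble() - 0.5)
}

// Score every arm under the same context and return the index of the best one.
fun toyChoose(arms: List<ToyArm>, x: DoubleArray, rng: Random, exploration: Double): Int {
    var best = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for (k in arms.indices) {
        val s = toyScore(arms[k], x, rng, exploration)
        if (s > bestScore) {
            bestScore = s
            best = k
        }
    }
    return best
}

fun main() {
    val arms = listOf(ToyArm(doubleArrayOf(1.0, 0.0)), ToyArm(doubleArrayOf(0.0, 1.0)))
    val x = doubleArrayOf(0.2, 0.9)
    // Pure exploitation: arm 1 has the higher mean prediction under this context.
    println(toyChoose(arms, x, Random(42), exploration = 0.0))
}
```

With exploration = 0.0 the noise term vanishes, so the choice depends only on the point estimates, whatever the seed.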

Optional continuous pooling: when globalTemplate is non-null the bandit also maintains a global regressor that absorbs every (x, reward) regardless of arm. Per-arm regressors then fit residuals against the global's mean prediction, and arm scoring adds the global's mean back in. The global's mean is read via posterior.evaluate(globalSnapshot, x, rng, exploration = 0.0), i.e. the same posterior at zero exploration, so any regressor whose posterior implements exploration = 0 as mean prediction (every built-in one does) can be pooled. Caveats are the same as for the linear-only version: the global fit is biased toward the arms the policy plays, the joint fit is only approximate, and exploration variance is underestimated where the global itself is uncertain. For true hierarchical Bayes use com.eignex.kumulant.stat.regression.glm.BayesianRegressionStat.fitPopulationPrior.
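A minimal sketch of the residual idea, under loud simplifications: context-free running means stand in for the regressors, and the residual is computed against the global mean before the global absorbs the observation (an assumption; the source does not state the update order). RunningMean and PooledToyBandit are hypothetical names, not kumulant classes.

```kotlin
// Hypothetical stand-in for a regressor: a weighted running mean.
class RunningMean {
    var n = 0.0
        private set
    var mean = 0.0
        private set

    fun update(value: Double, weight: Double = 1.0) {
        n += weight
        mean += weight * (value - mean) / n
    }
}

// Hypothetical pooled bandit: one global model plus one residual model per arm.
class PooledToyBandit(nbrArms: Int) {
    val global = RunningMean()
    val arms = List(nbrArms) { RunningMean() }

    fun update(armIndex: Int, reward: Double) {
        // Assumption: the residual uses the global mean from before this observation.
        val residual = reward - global.mean
        arms[armIndex].update(residual)
        // The global absorbs every reward regardless of arm.
        global.update(reward)
    }

    // Arm scoring adds the global mean back to the per-arm delta.
    fun score(armIndex: Int): Double = global.mean + arms[armIndex].mean
}

fun main() {
    val bandit = PooledToyBandit(2)
    bandit.update(0, 1.0)
    bandit.update(1, 3.0)
    println("global=${bandit.global.mean} arm0=${bandit.score(0)} arm1=${bandit.score(1)}")
}
```

In this toy run arm 0's combined score ends up above its only observed reward, because its residual was fit against an earlier global mean. That is one concrete form of the approximate joint fit named in the caveats.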

Use cases: parametric and tree-based contextual bandits over scalar rewards; any RegressionStat + RegressionPosterior pairing that supports the policy you want (Thompson, LinUCB, greedy, tree, forest).

Arms: contextual with caller-defined feature dimension; nbrArms fixed at construction. Per-arm state is the cloned regressor; optional global regressor is a single additional cell.

Memory: O(nbrArms · regressor-state) plus optional O(regressor-state) for the global. The dominant per-arm term depends on the regressor; e.g. O(featureSize^2) for Bayesian/LinUCB Gram matrices, O(featureSize) for SGD, tree-size-dependent for trees and forests.

Choose: O(nbrArms · posterior-evaluate) plus one global evaluate when pooling is on. posterior-evaluate is regressor-dependent (e.g. O(featureSize^2) for Bayesian sampling, O(featureSize) for point predictions).

Update: O(regressor-update) on the played arm, plus one global update when pooling is on. Regressor-dependent: e.g. O(featureSize^2) for Sherman-Morrison Bayesian updates, O(featureSize) for SGD.
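To make the O(featureSize^2) figure concrete, here is a generic Sherman-Morrison rank-one update of an inverse Gram matrix. It is a textbook sketch, not the library's implementation: shermanMorrisonUpdate is a hypothetical helper, the matrix is a plain Array<DoubleArray>, and the inverse is assumed to be symmetric.

```kotlin
// In-place update of aInv = A^-1 after A <- A + x xᵀ, assuming aInv is symmetric.
// Uses A'^-1 = A^-1 - (A^-1 x)(A^-1 x)ᵀ / (1 + xᵀ A^-1 x); every step is O(d^2) or less.
fun shermanMorrisonUpdate(aInv: Array<DoubleArray>, x: DoubleArray) {
    val d = x.size

    // u = aInv * x
    val u = DoubleArray(d)
    for (i in 0 until d) {
        var s = 0.0
        for (j in 0 until d) s += aInv[i][j] * x[j]
        u[i] = s
    }

    // denom = 1 + xᵀ aInv x
    var denom = 1.0
    for (i in 0 until d) denom += x[i] * u[i]

    // Rank-one correction of the inverse.
    for (i in 0 until d) {
        for (j in 0 until d) {
            aInv[i][j] -= u[i] * u[j] / denom
        }
    }
}

fun main() {
    // Start from the identity and add the observation x = (1, 0).
    val aInv = arrayOf(doubleArrayOf(1.0, 0.0), doubleArrayOf(0.0, 1.0))
    shermanMorrisonUpdate(aInv, doubleArrayOf(1.0, 0.0))
    println(aInv.joinToString { it.joinToString() })
}
```

The update never forms or inverts the full Gram matrix, which is why the per-observation cost stays quadratic rather than cubic in the feature dimension.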

Randomness: every posterior evaluate (per arm during choose, plus the optional global at exploration = 0) receives the caller-supplied random; reproducible under a fixed seed if the posterior is.
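The reproducibility claim rests on kotlin.random.Random(seed) yielding a repeatable sequence. A toy sketch of that property follows; sampleChoices is a hypothetical helper that draws one noisy score per arm and takes the argmax, standing in for a randomised choose.

```kotlin
import kotlin.random.Random

// Toy stand-in for a randomised choose: one random score per arm, argmax wins.
fun sampleChoices(seed: Int, rounds: Int, nbrArms: Int): List<Int> {
    // Single source of randomness, threaded through every draw.
    val rng = Random(seed)
    return List(rounds) {
        var best = 0
        var bestScore = Double.NEGATIVE_INFINITY
        for (k in 0 until nbrArms) {
            val s = rng.nextDouble()
            if (s > bestScore) {
                bestScore = s
                best = k
            }
        }
        best
    }
}

fun main() {
    val a = sampleChoices(seed = 7, rounds = 10, nbrArms = 3)
    val b = sampleChoices(seed = 7, rounds = 10, nbrArms = 3)
    // Same seed and same draw order give the same sequence of choices.
    println(a == b)
}
```

Reproducibility also depends on the draw order staying fixed, so racing updates or a posterior with its own hidden randomness would break it.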

Concurrency: each per-arm RegressionStat carries its own concurrency setting, and the optional global is a single shared RegressionStat governed by its own setting in the same way. Cross-arm snapshot consistency during choose is best-effort under racing updates.

Constructors

RegressionContextualBandit

constructor(nbrArms: Int, template: RegressionStat<R>, posterior: RegressionPosterior<R>, exploration: Double = 1.0, globalTemplate: RegressionStat<R>? = null, random: Random = Random.Default)

Properties

exploration

val exploration: Double

Per-evaluate exploration scale forwarded to the posterior; 0.0 collapses to the point estimate.

nbrArms

open override val nbrArms: Int

Number of arms in the population. Fixed at construction; arm indices are [0, nbrArms).

posterior

val posterior: RegressionPosterior<R>

Stateless arm scorer applied to each per-arm snapshot at choose time.

random

open override val random: Random

Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.

Functions

armResult

open override fun armResult(armIndex: Int): R

Per-arm snapshot at armIndex. Default implementation reads from the full snapshot; implementations may override to avoid building the entire list when only one arm is needed.

armStat

fun armStat(armIndex: Int): RegressionStat<R>

Live per-arm regressor. When pooling is on this fits residuals against the global mean, so its predictions are deltas, not full predictions; use evaluate for the combined score and globalSnapshot for the global's state.

choose

open override fun choose(x: VectorView): Int

Pick an arm to play next, given the per-round context x. The bandit combines the context with its per-arm model to score each arm under a configurable com.eignex.kumulant.stat.regression.RegressionPosterior (or analogue) and returns the argmax / sampled choice.

create

open override fun create(random: Random): RegressionContextualBandit<R>

Spawn a fresh bandit with the same configuration; state resets to the prior seed. The random source is replaced; pass the source you want the new bandit to use for exploration (which is independent of merging in another snapshot's state).

Useful when a worker accepts a stream of snapshots to apply sequentially: create(random).also { it.merge(snapshot) }.
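The create-then-merge pattern can be sketched on a toy class. ToyBandit and ToySnapshot are hypothetical and only mirror the create/merge/snapshot shape; the additive merge of per-arm sums is an assumption made for illustration, not a statement about how kumulant merges regressor state.

```kotlin
import kotlin.random.Random

// Hypothetical snapshot type: per-arm weight totals and weighted reward sums.
data class ToySnapshot(val counts: List<Double>, val sums: List<Double>)

// Hypothetical toy bandit, not the kumulant class.
class ToyBandit(val nbrArms: Int, val random: Random) {
    private val counts = DoubleArray(nbrArms)
    private val sums = DoubleArray(nbrArms)

    fun update(armIndex: Int, reward: Double, weight: Double = 1.0) {
        counts[armIndex] += weight
        sums[armIndex] += weight * reward
    }

    // Non-mutating read of the current state.
    fun snapshot(): ToySnapshot = ToySnapshot(counts.toList(), sums.toList())

    // Assumption: merging adds the snapshot's statistics to the local state.
    fun merge(snapshot: ToySnapshot) {
        for (k in 0 until nbrArms) {
            counts[k] += snapshot.counts[k]
            sums[k] += snapshot.sums[k]
        }
    }

    // Fresh state, same configuration, caller-chosen random source.
    fun create(random: Random): ToyBandit = ToyBandit(nbrArms, random)
}

fun main() {
    val source = ToyBandit(nbrArms = 2, random = Random(1))
    source.update(0, 1.0)
    source.update(1, 0.5, weight = 2.0)

    val snapshot = source.snapshot()
    // The worker starts empty, gets its own random source, then absorbs the snapshot.
    val worker = source.create(Random(2)).also { it.merge(snapshot) }
    println(worker.snapshot() == snapshot)
}
```

Because the worker starts from a fresh state, merging one snapshot reproduces that snapshot's state while the exploration randomness stays independent of the source bandit.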

snapshot

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed coupled cells may drift by ULPs.

update

open override fun update(armIndex: Int, x: VectorView, reward: Double, weight: Double = 1.0)

Fold a single (x, reward) observation into the arm at armIndex. The weight is the same observation-weight running through the library; typically 1.0, occasionally importance-weighted.
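What an observation weight means can be shown on a weighted running mean, where a weight of 2.0 has the same effect as folding the observation in twice. WeightedMean is a hypothetical stand-in; real regressors apply the weight in their own update rule, and the equivalence below is only claimed for this toy.

```kotlin
import kotlin.math.abs

// Hypothetical stand-in for a regressor: a weighted running mean.
class WeightedMean {
    var totalWeight = 0.0
        private set
    var mean = 0.0
        private set

    fun update(value: Double, weight: Double = 1.0) {
        // Ignore non-positive weights so the division below is always defined.
        if (weight <= 0.0) return
        totalWeight += weight
        mean += weight * (value - mean) / totalWeight
    }
}

fun main() {
    // One observation folded in with weight 2.0 ...
    val weighted = WeightedMean()
    weighted.update(1.0)
    weighted.update(4.0, weight = 2.0)

    // ... versus the same observation folded in twice at unit weight.
    val repeated = WeightedMean()
    repeated.update(1.0)
    repeated.update(4.0)
    repeated.update(4.0)

    println(abs(weighted.mean - repeated.mean) < 1e-12)
}
```

For importance weighting, a common choice is the inverse of the probability with which the logging policy played the arm; that is a general off-policy convention rather than something this page prescribes.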
