RegressionContextualBandit
Generic contextual bandit: each arm owns a RegressionStat cloned from template. At choose time every arm is scored by the shared posterior under the round's context vector, and the argmax across arms is returned. The same machinery covers every regressor in kumulant:
Linear Thompson Sampling: com.eignex.kumulant.stat.regression.glm.BayesianRegressionStat
LinUcb: any linear regressor + com.eignex.kumulant.stat.regression.glm.LinUcb.
Greedy SGD: com.eignex.kumulant.stat.regression.glm.StochasticRegressionStat + com.eignex.kumulant.stat.regression.glm.PointPosterior with exploration = 0.0.
Decision-tree bandit: com.eignex.kumulant.stat.regression.tree.DecisionTreeRegressionStat
Random-forest bandit: com.eignex.kumulant.stat.regression.tree.RandomForestRegressionStat
Per-arm regressors are constructed via template.create(null) so per-arm state is independent. exploration scales the posterior's exploration parameter; pass 0.0 for pure exploitation (point estimates only).
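The per-arm layout and argmax scoring described above can be sketched in a few lines. This is a minimal illustration, not the kumulant implementation: ArmModel, SgdLinearModel and SketchBandit are hypothetical names, the template is modelled as a plain factory lambda rather than template.create(null), and the exploration term is a crude uniform perturbation standing in for a real posterior.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in for a regressor; NOT the kumulant RegressionStat API.
interface ArmModel {
    fun update(x: DoubleArray, y: Double)
    fun predict(x: DoubleArray): Double
}

// Plain SGD linear regressor used as the per-arm model in this sketch.
class SgdLinearModel(featureSize: Int, private val learningRate: Double = 0.1) : ArmModel {
    private val weights = DoubleArray(featureSize)

    override fun predict(x: DoubleArray): Double {
        var sum = 0.0
        for (i in x.indices) sum += weights[i] * x[i]
        return sum
    }

    override fun update(x: DoubleArray, y: Double) {
        val error = y - predict(x)
        for (i in x.indices) weights[i] += learningRate * error * x[i]
    }
}

class SketchBandit(
    nbrArms: Int,
    template: () -> ArmModel,         // invoked once per arm, so per-arm state is independent
    private val exploration: Double,  // 0.0 means pure exploitation (point estimates only)
    private val random: Random,
) {
    private val arms: List<ArmModel> = List(nbrArms) { template() }

    // Score every arm under the round's context and return the argmax.
    fun choose(x: DoubleArray): Int {
        var best = 0
        var bestScore = Double.NEGATIVE_INFINITY
        for (arm in arms.indices) {
            // Assumed noise model: uniform perturbation scaled by exploration.
            val noise = exploration * (random.nextDouble() - 0.5)
            val score = arms[arm].predict(x) + noise
            if (score > bestScore) {
                bestScore = score
                best = arm
            }
        }
        return best
    }

    // Only the played arm's model sees the reward.
    fun update(arm: Int, x: DoubleArray, reward: Double) {
        arms[arm].update(x, reward)
    }
}
```

With exploration = 0.0 the sketch reduces to a greedy policy: after arm 1 has been rewarded on a context, choose on that context returns 1 while the untouched arm 0 keeps its zero prediction.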
Optional continuous pooling: when globalTemplate is non-null the bandit also maintains a global regressor that absorbs every (x, reward) regardless of arm. Per-arm regressors then fit residuals against the global's mean prediction, and arm scoring adds the global's mean back in. The global's mean is read via posterior.evaluate(globalSnapshot, x, rng, exploration = 0.0), i.e. the same posterior at zero exploration, so any regressor whose posterior implements exploration = 0 as mean-prediction (every built-in one does) can be pooled. Caveats are the same as for the linear-only version: the global is biased towards the arms the policy plays most, the joint fit is only approximate, and exploration variance is underestimated where the global itself is uncertain. For true hierarchical Bayes use com.eignex.kumulant.stat.regression.glm.BayesianRegressionStat.fitPopulationPrior.
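The residual-fitting idea can be illustrated with a deliberately tiny model. The sketch below is an assumption-laden toy, not kumulant code: RunningMean and PooledMeans are hypothetical names, the "regressors" are context-free running means to keep it short, and taking the residual against the global's prediction before the global sees the reward is an assumed ordering.

```kotlin
// Hypothetical context-free "regressor": an incrementally updated mean.
class RunningMean {
    var count = 0L
        private set
    var mean = 0.0
        private set

    fun update(y: Double) {
        count++
        mean += (y - mean) / count
    }
}

class PooledMeans(nbrArms: Int) {
    private val global = RunningMean()
    private val arms = List(nbrArms) { RunningMean() }

    fun update(arm: Int, reward: Double) {
        // Assumed ordering: residual against the global's prediction BEFORE this reward is absorbed.
        val residual = reward - global.mean
        global.update(reward)        // the global absorbs every reward regardless of arm
        arms[arm].update(residual)   // the arm only fits what the global did not explain
    }

    fun globalMean(): Double = global.mean

    // Per-arm prediction is a delta, not a full prediction.
    fun armDelta(arm: Int): Double = arms[arm].mean

    // Arm scoring adds the global's mean back in.
    fun score(arm: Int): Double = global.mean + arms[arm].mean
}
```

An arm that has never been played has a delta of 0.0, so its score falls back to the global mean. That is the benefit of pooling for cold arms, and it also shows why the fit is only approximate: the played arm's residuals are computed against a global that is still moving.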
Use cases: parametric and tree-based contextual bandits over scalar rewards; any RegressionStat + RegressionPosterior pairing that supports the policy you want (Thompson, LinUCB, greedy, tree, forest).
Arms: contextual with caller-defined feature dimension; nbrArms fixed at construction. Per-arm state is the cloned regressor; optional global regressor is a single additional cell.
Memory: O(nbrArms · regressor-state) plus optional O(regressor-state) for the global. The dominant per-arm term depends on the regressor; e.g. O(featureSize^2) for Bayesian/LinUCB Gram matrices, O(featureSize) for SGD, tree-size-dependent for trees and forests.
Choose: O(nbrArms · posterior-evaluate) plus one global evaluate when pooling is on. posterior-evaluate is regressor-dependent (e.g. O(featureSize^2) for Bayesian sampling, O(featureSize) for point predictions).
Update: O(regressor-update) on the played arm, plus one global update when pooling is on. Regressor-dependent: e.g. O(featureSize^2) for Sherman-Morrison Bayesian updates, O(featureSize) for SGD.
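For reference, the O(featureSize^2) cost quoted for Sherman-Morrison updates comes from maintaining the inverse Gram matrix with a rank-one correction instead of re-inverting it. The function below is a generic sketch of that identity, (A + x xᵀ)⁻¹ = A⁻¹ − (A⁻¹ x)(A⁻¹ x)ᵀ / (1 + xᵀ A⁻¹ x) for symmetric A; shermanMorrisonUpdate is a hypothetical helper, not a kumulant function.

```kotlin
// In-place rank-one update: aInv becomes (A + x xᵀ)⁻¹, assuming aInv is symmetric.
fun shermanMorrisonUpdate(aInv: Array<DoubleArray>, x: DoubleArray) {
    val d = x.size

    // u = A⁻¹ x, costs O(d²).
    val u = DoubleArray(d)
    for (i in 0 until d) {
        var sum = 0.0
        for (j in 0 until d) sum += aInv[i][j] * x[j]
        u[i] = sum
    }

    // denom = 1 + xᵀ A⁻¹ x, costs O(d).
    var denom = 1.0
    for (i in 0 until d) denom += x[i] * u[i]

    // A⁻¹ -= u uᵀ / denom, costs O(d²); symmetry of aInv gives xᵀ A⁻¹ = uᵀ.
    for (i in 0 until d) {
        for (j in 0 until d) {
            aInv[i][j] -= u[i] * u[j] / denom
        }
    }
}
```

Starting from the 2x2 identity and applying x = (1, 0) yields diag(0.5, 1), which is the inverse of diag(2, 1) as expected.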
Randomness: every posterior evaluate (per arm during choose, plus the optional global at exploration = 0) receives the caller-supplied random; reproducible under a fixed seed if the posterior is.
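The reproducibility claim can be checked with a small stand-alone experiment. The sketch assumes a hypothetical sampled scorer (sampledArgmax) and a hypothetical driver (decisionTrace) and uses only kotlin.random.Random from the standard library; it does not call kumulant.

```kotlin
import kotlin.random.Random

// Hypothetical sampled scorer: perturbs each arm's point estimate with noise drawn from `random`.
fun sampledArgmax(means: DoubleArray, exploration: Double, random: Random): Int {
    var best = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for (arm in means.indices) {
        val score = means[arm] + exploration * (random.nextDouble() - 0.5)
        if (score > bestScore) {
            bestScore = score
            best = arm
        }
    }
    return best
}

// Runs a fixed number of rounds, threading ONE Random instance through every decision.
fun decisionTrace(seed: Int, rounds: Int): List<Int> {
    val random = Random(seed)
    val means = doubleArrayOf(0.1, 0.2, 0.15)
    return List(rounds) { sampledArgmax(means, exploration = 1.0, random = random) }
}
```

Two traces built from the same seed are identical element by element, because every randomised decision draws from the same deterministic stream in the same order. Any extra draw from that instance outside the bandit would shift the stream and change later decisions.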
Concurrency: each per-arm RegressionStat carries its own concurrency mode, and the optional global is a single shared RegressionStat that likewise carries its own. Cross-arm snapshot consistency during choose is best-effort under racing updates.
RegressionContextualBandit
armResult
armStat
Live per-arm regressor. When pooling is on this fits residuals against the global mean, so its predictions are deltas, not full predictions; use evaluate for the combined score and globalSnapshot for the global's state.
choose
Pick an arm to play next, given the per-round context x. The bandit combines the context with its per-arm model to score each arm under a configurable com.eignex.kumulant.stat.regression.RegressionPosterior (or analogue) and returns the argmax / sampled choice.
create
Spawn a fresh bandit with the same configuration; state resets to the prior seed. The random source is replaced; pass the source you want the new bandit to use for exploration (which is independent of merging in another snapshot's state).
Useful when a worker accepts a stream of snapshots to apply sequentially: create(random).also { it.merge(snapshot) }.
evaluate
exploration
Per-evaluate exploration scale forwarded to the posterior; 0.0 collapses to the point estimate.
globalSnapshot
Current global pooling snapshot, or null if pooling is disabled.
globalStat
Live global pooling regressor, or null if pooling is disabled.
mergeGlobal
Merge another bandit replica's global snapshot. No-op when pooling is disabled.
merge
nbrArms
posterior
Stateless arm scorer applied to each per-arm snapshot at choose time.
random
Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.
reset
Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.
snapshot
Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed coupled cells may drift by ULPs.