RegressionContextualBandit

class RegressionContextualBandit<R : Result>(val nbrArms: Int, template: RegressionStat<R>, val posterior: RegressionPosterior<R>, val exploration: Double = 1.0, globalTemplate: RegressionStat<R>? = null, val random: Random = Random.Default) : ContextualBandit, PerArmBandit<R>, ContextualScorable

Generic contextual bandit: each arm owns a RegressionStat cloned from template. At choose time the shared posterior scores every arm under the round's context vector, and the bandit returns the argmax across arms. The same machinery covers every regressor in kumulant.

Per-arm regressors are constructed via template.create(null) so per-arm state is independent. exploration scales the posterior's exploration parameter; pass 0.0 for pure exploitation (point estimates only).
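The score-then-argmax step can be illustrated with a minimal, library-independent sketch. ArmModel, score and chooseArm are hypothetical stand-ins, not kumulant types, and the Gaussian noise term is only an assumed example of how an exploration scale might enter a score; the point is that exploration = 0.0 leaves the plain point estimate.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.sqrt
import kotlin.random.Random

// Hypothetical stand-in for a per-arm regressor snapshot: a weight vector only.
class ArmModel(val weights: DoubleArray)

// Hypothetical scorer: linear point prediction plus exploration-scaled Gaussian noise.
// With exploration = 0.0 the noise term is skipped and only the mean remains.
fun score(arm: ArmModel, x: DoubleArray, rng: Random, exploration: Double): Double {
    var mean = 0.0
    for (i in x.indices) mean += arm.weights[i] * x[i]
    if (exploration == 0.0) return mean
    // Box-Muller transform for one standard normal draw.
    val u1 = 1.0 - rng.nextDouble() // in (0, 1], keeps ln finite
    val u2 = rng.nextDouble()
    val z = sqrt(-2.0 * ln(u1)) * cos(2.0 * PI * u2)
    return mean + exploration * z
}

// Score every arm under the same context and return the index of the best one.
fun chooseArm(arms: List<ArmModel>, x: DoubleArray, rng: Random, exploration: Double): Int {
    var best = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for (i in arms.indices) {
        val s = score(arms[i], x, rng, exploration)
        if (s > bestScore) {
            bestScore = s
            best = i
        }
    }
    return best
}

fun main() {
    val arms = listOf(
        ArmModel(doubleArrayOf(1.0, 0.0)),
        ArmModel(doubleArrayOf(0.0, 1.0)),
    )
    val x = doubleArrayOf(0.2, 0.9)
    // Pure exploitation: arm 0 scores 0.2, arm 1 scores 0.9.
    println(chooseArm(arms, x, Random(7), exploration = 0.0)) // 1
}
```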

Optional continuous pooling: when globalTemplate is non-null the bandit also maintains a global regressor that absorbs every (x, reward) regardless of arm. Per-arm regressors then fit residuals against the global's mean prediction, and arm scoring adds the global's mean back in. The global's mean is read via posterior.evaluate(globalSnapshot, x, rng, exploration = 0.0), i.e. the same posterior at zero exploration, so any regressor whose posterior implements exploration = 0 as mean-prediction (every built-in one does) can be pooled. Caveats are the same as the linear-only version: policy-weighted global bias, approximate joint fit, and exploration variance underestimated where the global itself is uncertain. For true hierarchical Bayes use com.eignex.kumulant.stat.regression.glm.BayesianRegressionStat.fitPopulationPrior.
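The residual scheme can be sketched without the library as follows. SgdLinear and PooledArms are hypothetical toy classes, and the order of the two updates inside update (arm residual first, then global) is an assumption made for the sketch, not something this page specifies.

```kotlin
// Hypothetical minimal linear regressor trained by SGD on squared error.
class SgdLinear(dim: Int, private val lr: Double = 0.1) {
    val w = DoubleArray(dim)

    fun predict(x: DoubleArray): Double {
        var s = 0.0
        for (i in x.indices) s += w[i] * x[i]
        return s
    }

    fun update(x: DoubleArray, target: Double) {
        val err = target - predict(x)
        for (i in x.indices) w[i] += lr * err * x[i]
    }
}

// Toy pooled bandit state: one global model plus one residual model per arm.
class PooledArms(nbrArms: Int, dim: Int) {
    val global = SgdLinear(dim)
    val arms = List(nbrArms) { SgdLinear(dim) }

    fun update(arm: Int, x: DoubleArray, reward: Double) {
        // The played arm fits the residual against the global's mean prediction...
        val residual = reward - global.predict(x)
        arms[arm].update(x, residual)
        // ...and the global absorbs every observation regardless of arm.
        global.update(x, reward)
    }

    // Zero-exploration arm score: global mean plus the arm's delta.
    fun mean(arm: Int, x: DoubleArray): Double = global.predict(x) + arms[arm].predict(x)
}

fun main() {
    val pooled = PooledArms(nbrArms = 2, dim = 1)
    val x = doubleArrayOf(1.0)
    // Arm 0 always pays 1.0, arm 1 always pays 0.0, played alternately.
    repeat(200) { t ->
        if (t % 2 == 0) pooled.update(0, x, 1.0) else pooled.update(1, x, 0.0)
    }
    println(pooled.mean(0, x) > pooled.mean(1, x)) // true
}
```

Reading arms[i].predict(x) alone returns only the delta, which is why the combined score adds the global mean back in.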

Use cases: parametric and tree-based contextual bandits over scalar rewards; any RegressionStat + RegressionPosterior pairing that supports the policy you want (Thompson, LinUCB, greedy, tree, forest).

Arms: contextual with caller-defined feature dimension; nbrArms fixed at construction. Per-arm state is the cloned regressor; optional global regressor is a single additional cell.

Memory: O(nbrArms · regressor-state) plus optional O(regressor-state) for the global. The dominant per-arm term depends on the regressor: e.g. O(featureSize^2) for Bayesian/LinUCB Gram matrices, O(featureSize) for SGD, tree-size-dependent for trees and forests.

Choose: O(nbrArms · posterior-evaluate) plus one global evaluate when pooling is on. posterior-evaluate is regressor-dependent (e.g. O(featureSize^2) for Bayesian sampling, O(featureSize) for point predictions).

Update: O(regressor-update) on the played arm, plus one global update when pooling is on. Regressor-dependent: e.g. O(featureSize^2) for Sherman-Morrison Bayesian updates, O(featureSize) for SGD.
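The O(featureSize^2) figure for Sherman-Morrison updates follows from the identity (A + x xᵀ)⁻¹ = A⁻¹ − (A⁻¹ x xᵀ A⁻¹) / (1 + xᵀ A⁻¹ x), which avoids re-inverting the matrix. A minimal sketch of that rank-one update, assuming a symmetric positive definite A and a hypothetical helper name (this is the textbook formula, not the library's code):

```kotlin
// Rank-one update of an inverse: given aInv = A^-1, overwrite it with (A + x x^T)^-1.
// Assumes A is symmetric positive definite, so the denominator is at least 1.
fun shermanMorrisonUpdate(aInv: Array<DoubleArray>, x: DoubleArray) {
    val n = x.size
    // u = A^-1 x, costs O(n^2).
    val u = DoubleArray(n)
    for (i in 0 until n) {
        var s = 0.0
        for (j in 0 until n) s += aInv[i][j] * x[j]
        u[i] = s
    }
    // denom = 1 + x^T A^-1 x, costs O(n).
    var denom = 1.0
    for (i in 0 until n) denom += x[i] * u[i]
    // A^-1 -= u u^T / denom, costs O(n^2); relies on symmetry (x^T A^-1 = u^T).
    for (i in 0 until n) {
        for (j in 0 until n) aInv[i][j] -= u[i] * u[j] / denom
    }
}

fun main() {
    // Start from A = I, so A^-1 = I.
    val aInv = arrayOf(doubleArrayOf(1.0, 0.0), doubleArrayOf(0.0, 1.0))
    // Adding x x^T with x = (1, 0) makes A = diag(2, 1).
    shermanMorrisonUpdate(aInv, doubleArrayOf(1.0, 0.0))
    println(aInv.contentDeepToString()) // [[0.5, 0.0], [0.0, 1.0]]
}
```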

Randomness: every posterior evaluate (per arm during choose, plus the optional global at exploration = 0) receives the caller-supplied random; reproducible under a fixed seed if the posterior is.
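Reproducibility under a fixed seed rests on kotlin.random.Random(seed) yielding the same sequence for the same seed within one Kotlin runtime version. A small standalone check of that property (draws is a hypothetical helper, unrelated to the bandit API):

```kotlin
import kotlin.random.Random

// Draw n doubles from a generator seeded with the given value.
fun draws(seed: Int, n: Int): List<Double> {
    val rng = Random(seed)
    return List(n) { rng.nextDouble() }
}

fun main() {
    // Same seed, same sequence: decisions driven by these draws repeat exactly.
    println(draws(42, 3) == draws(42, 3)) // true
}
```

The guarantee only carries through to the bandit if the posterior itself consumes the supplied random deterministically, as the note above states.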

Concurrency: the bandit inherits its concurrency behaviour from its regressors. Each per-arm RegressionStat carries its own concurrency mode, as does the single shared global RegressionStat when pooling is on. Cross-arm snapshot consistency during choose is best-effort under racing updates.

RegressionContextualBandit

constructor(nbrArms: Int, template: RegressionStat<R>, posterior: RegressionPosterior<R>, exploration: Double = 1.0, globalTemplate: RegressionStat<R>? = null, random: Random = Random.Default)

armResult

open override fun armResult(armIndex: Int): R

Per-arm snapshot at armIndex. Default implementation reads from the full snapshot; implementations may override to avoid building the entire list when only one arm is needed.

armStat

fun armStat(armIndex: Int): RegressionStat<R>

Live per-arm regressor. When pooling is on this fits residuals against the global mean, so its predictions are deltas, not full predictions; use evaluate for the combined score and globalSnapshot for the global's state.

choose

open override fun choose(x: VectorView): Int

Pick an arm to play next, given the per-round context x. The bandit combines the context with its per-arm model to score each arm under a configurable com.eignex.kumulant.stat.regression.RegressionPosterior (or analogue) and returns the argmax / sampled choice.

create

open override fun create(random: Random): RegressionContextualBandit<R>

Spawn a fresh bandit with the same configuration; state resets to the prior seed. The random source is replaced; pass the source you want the new bandit to use for exploration (which is independent of merging in another snapshot's state).

Useful when a worker accepts a stream of snapshots to apply sequentially: create(random).also { it.merge(snapshot) }.
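The create-then-merge pattern can be shown end to end with a toy class. ToyBandit is hypothetical and keeps only per-arm reward sums; it mirrors the shape of create, merge and snapshot but is not the library's implementation.

```kotlin
import kotlin.random.Random

// Hypothetical toy bandit: per-arm reward sums, just enough state to show the pattern.
class ToyBandit(val nbrArms: Int, val random: Random) {
    private val sums = DoubleArray(nbrArms)

    fun update(arm: Int, reward: Double) {
        sums[arm] += reward
    }

    fun snapshot(): List<Double> = sums.toList()

    fun merge(other: List<Double>) {
        for (i in sums.indices) sums[i] += other[i]
    }

    // Same configuration, empty state, caller-chosen random source.
    fun create(random: Random): ToyBandit = ToyBandit(nbrArms, random)
}

fun main() {
    val trainer = ToyBandit(2, Random(1))
    trainer.update(0, 1.0)
    trainer.update(1, 0.5)
    val snapshot = trainer.snapshot()

    // Worker side: fresh bandit with its own random source, then load the snapshot.
    val worker = trainer.create(Random(2)).also { it.merge(snapshot) }
    println(worker.snapshot()) // [1.0, 0.5]
}
```

Because the fresh bandit starts from empty state in this toy, merging a snapshot into it reproduces that snapshot while leaving the worker's random source independent of the trainer's.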

evaluate

open override fun evaluate(armIndex: Int, x: VectorView): Double

Score the arm at armIndex under the current state and context x.

exploration

val exploration: Double

Per-evaluate exploration scale forwarded to the posterior; 0.0 collapses to the point estimate.

globalSnapshot

Current global pooling snapshot, or null if pooling is disabled.

globalStat

Live global pooling regressor, or null if pooling is disabled.

mergeGlobal

fun mergeGlobal(other: R)

Merge another bandit replica's global snapshot. No-op when pooling is disabled.

merge

open override fun merge(other: List<R>)

Fold another replica's state (other) into this bandit. Most families merge exactly via the underlying stat's parallel-merge formula; SGD-based contextual bandits merge approximately. Each concrete bandit's KDoc documents its merge semantics.
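One way to see why some merges are exact: when a regressor's state is a sum of sufficient statistics, adding two replicas' sums gives the same state as a single pass over all the data. The sketch below uses a hypothetical Suff class for 1-D least squares through the origin; it illustrates the idea only and is not the parallel-merge formula of any particular kumulant stat.

```kotlin
// Hypothetical sufficient statistics for 1-D least squares through the origin:
// sxx = sum of x*x, sxy = sum of x*y. The fitted slope is sxy / sxx.
class Suff(var sxx: Double = 0.0, var sxy: Double = 0.0) {
    fun update(x: Double, y: Double) {
        sxx += x * x
        sxy += x * y
    }

    // Exact merge: sums from two replicas simply add.
    fun merge(other: Suff) {
        sxx += other.sxx
        sxy += other.sxy
    }

    fun slope(): Double = sxy / sxx
}

fun main() {
    // Two replicas each see part of the stream y = 2x.
    val a = Suff().apply { update(1.0, 2.0); update(2.0, 4.0) }
    val b = Suff().apply { update(3.0, 6.0) }
    a.merge(b)

    // A single regressor sees the whole stream.
    val full = Suff().apply { update(1.0, 2.0); update(2.0, 4.0); update(3.0, 6.0) }

    println(a.slope() == full.slope()) // true
    println(a.slope())                 // 2.0
}
```

An SGD regressor has no such additive state, since its weights depend on the order of updates, which is consistent with its merge being approximate.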

nbrArms

open override val nbrArms: Int

Number of arms in the population. Fixed at construction; arm indices are [0, nbrArms).

posterior

val posterior: RegressionPosterior<R>

Stateless arm scorer applied to each per-arm snapshot at choose time.

random

open override val random: Random

Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.

reset

open override fun reset()

Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.

snapshot

open override fun snapshot(): List<R>

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.

update

open override fun update(armIndex: Int, x: VectorView, reward: Double, weight: Double = 1.0)

Fold a single (x, reward) observation into the arm at armIndex. The weight is the same observation-weight running through the library; typically 1.0, occasionally importance-weighted.