kumulant

bandit

Multi-armed and contextual bandits built on the same Stat / Result foundation as the rest of the library.

Why bandits live here

A multi-armed bandit is the simplest reinforcement-learning shape: on each round you pick one of K actions ("arms"), receive a reward, and the only feedback you get is the reward for the arm you actually played. You never learn what would have happened for the arms you did not pick. The bandit's job is to balance exploitation (play the arm that looks best) against exploration (play a different arm to learn whether it might be better), so cumulative reward converges toward what an oracle who knew the best arm in advance would have collected.
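The explore/exploit trade-off above can be made concrete with the classic epsilon-greedy rule. The sketch below is a hypothetical, self-contained illustration, not kumulant's API (the class name EpsilonGreedySketch and its members are assumptions). Each arm keeps only a count and a running mean. With probability epsilon it explores a uniformly random arm; otherwise it exploits the arm with the best mean so far.

```kotlin
import kotlin.random.Random

// Hypothetical sketch, not kumulant's API: epsilon-greedy over nbrArms arms.
// Per-arm state is just a count and a running mean, so memory stays bounded.
class EpsilonGreedySketch(val nbrArms: Int, val epsilon: Double, val random: Random) {
    private val counts = LongArray(nbrArms)
    private val means = DoubleArray(nbrArms)

    fun choose(): Int =
        if (random.nextDouble() < epsilon) {
            random.nextInt(nbrArms) // explore: uniformly random arm
        } else {
            means.indices.maxByOrNull { means[it] }!! // exploit: best mean so far (ties -> lowest index)
        }

    fun update(arm: Int, reward: Double) {
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm] // incremental mean: m += (x - m) / n
    }

    fun mean(arm: Int): Double = means[arm]
}
```

Only the played arm's state changes in update, which mirrors the partial feedback described above: nothing is learned about the arms that were not picked.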

In a contextual bandit each round also comes with a feature vector, and the reward depends on both the chosen arm and the context. Linear, nearest-neighbour, and tree-based contextual bandits all fit that shape and live in the contextual family.
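To illustrate the contextual shape, here is a hypothetical sketch of a linear contextual bandit in the spirit of LinUCB, simplified to a diagonal approximation where each feature is treated independently. It is not the library's RegressionContextualBandit, and the class and parameter names are assumptions. Each arm fits its own ridge regression of reward on the context, and choose(x) plays the arm with the highest predicted reward plus an optimism bonus scaled by alpha.

```kotlin
import kotlin.math.sqrt

// Hypothetical sketch: per-arm ridge regression with a diagonal approximation
// (no cross-feature terms), scored with an optimistic UCB-style bonus.
class DiagonalLinUcbSketch(val nbrArms: Int, val dim: Int, val alpha: Double, lambda: Double = 1.0) {
    // a[arm][j] = lambda + sum of w * x_j^2 ; b[arm][j] = sum of w * x_j * reward
    private val a = Array(nbrArms) { DoubleArray(dim) { lambda } }
    private val b = Array(nbrArms) { DoubleArray(dim) }

    // Predicted reward for this arm under context x, plus an uncertainty bonus.
    fun evaluate(arm: Int, x: DoubleArray): Double {
        var mean = 0.0
        var variance = 0.0
        for (j in 0 until dim) {
            val theta = b[arm][j] / a[arm][j] // per-feature ridge coefficient
            mean += theta * x[j]
            variance += x[j] * x[j] / a[arm][j]
        }
        return mean + alpha * sqrt(variance)
    }

    fun choose(x: DoubleArray): Int = (0 until nbrArms).maxByOrNull { evaluate(it, x) }!!

    // Only the played arm's model sees the (context, reward) pair.
    fun update(arm: Int, x: DoubleArray, reward: Double, weight: Double = 1.0) {
        for (j in 0 until dim) {
            a[arm][j] += weight * x[j] * x[j]
            b[arm][j] += weight * x[j] * reward
        }
    }
}
```

With alpha = 0 the rule is purely greedy on the regression estimates; a positive alpha favours arms whose models have seen little data in the direction of the current context.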

Typical use cases:

  • A/B-test-style optimisation under a budget (picking which creative, variant, or arm to show next).

  • Online recommendation (which item to show this user given their feature vector).

  • Operator selection inside meta-heuristics (which neighbourhood move to try inside an LNS solver).

  • Any sequential decision problem where the data collection itself should adapt as evidence accumulates rather than running a fixed-size experiment.

Bandits fit kumulant naturally because they are themselves streaming problems. Observations arrive one at a time, the per-arm state has to stay bounded no matter how long the run goes, and the same evidence that drives choose is the evidence the rest of the library is already tracking. A bandit arm is a kumulant accumulator viewed through a scoring rule, so the same Welford means, exponential-weight cells, and regression posteriors that power streaming summaries also power Thompson sampling, UCB, EXP3, and LinUCB. Two replicas of a bandit can train in parallel and stitch their snapshots back together with merge, the same way two parallel mean estimators do.
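Merging two replicas works because per-arm accumulators like Welford's combine exactly. The following is a hypothetical sketch, not kumulant's Result type (the name MeanVar is an assumption). It implements Welford's update plus the standard pairwise combination of two partial aggregates (Chan et al.), so two replicas trained on disjoint streams produce the same mean and variance as one accumulator fed both streams.

```kotlin
// Hypothetical sketch (not kumulant's Result type): a Welford mean/variance
// accumulator that can be merged with a sibling trained on disjoint data.
data class MeanVar(val n: Long = 0, val mean: Double = 0.0, val m2: Double = 0.0) {
    // Welford's single-observation update.
    fun add(x: Double): MeanVar {
        val n1 = n + 1
        val delta = x - mean
        val mean1 = mean + delta / n1
        return MeanVar(n1, mean1, m2 + delta * (x - mean1))
    }

    // Pairwise combination of two partial aggregates (Chan et al.).
    fun merge(other: MeanVar): MeanVar {
        if (other.n == 0L) return this
        if (n == 0L) return other
        val total = n + other.n
        val delta = other.mean - mean
        val mean1 = mean + delta * other.n / total
        val m21 = m2 + other.m2 + delta * delta * n * other.n / total
        return MeanVar(total, mean1, m21)
    }

    // Sample variance; 0 until there are at least two observations.
    val variance: Double get() = if (n > 1) m2 / (n - 1) else 0.0
}
```

A bandit whose arms are accumulators of this kind can merge replica snapshots arm by arm, which is the property the paragraph above relies on.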

Interface hierarchy

The action surface and the state surface are orthogonal, so each bandit family implements exactly the pieces that fit.

  • Bandit: the common root, carrying nbrArms, random, and reset.

  • UnivariateBandit: choose() and update(arm, value, weight) for indexless arms.

  • ContextualBandit: choose(x) and update(arm, x, reward, weight) for per-round context vectors.

  • Snapshotable: snapshot, merge, create(random). State shape is whatever the bandit family needs.

  • PerArmBandit: convenience for the common case where state is one Result per arm; extends Snapshotable over a list of results and adds a per-arm armResult accessor.

  • Scorable: opt-in interface exposing evaluate(armIndex) when selection is an argmax over independent per-arm scores. ContextualScorable is the contextual analogue.

Joint-sampling bandits (Boltzmann, top-two Thompson) do not expose Scorable because no per-arm score is meaningful in isolation. Exp3 / Exp4 do not fit PerArmBandit because their state is not per-arm. Each concrete bandit's KDoc states which interfaces it implements and why.
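The following is a simplified, hypothetical rendering of how the action surface and Scorable could compose. Member names follow the list above, but the parameter and return types (for example, DoubleArray standing in for the context vector) are assumptions, and the real declarations carry more members. Snapshotable and PerArmBandit are omitted because their exact shapes are not given here. A toy greedy bandit shows selection implemented as an argmax over evaluate.

```kotlin
import kotlin.random.Random

// Hypothetical, simplified shapes of the interfaces listed above.
// Member names follow the list; parameter and return types are assumptions.
interface Bandit {
    val nbrArms: Int
    val random: Random
    fun reset()
}

interface UnivariateBandit : Bandit {
    fun choose(): Int
    fun update(arm: Int, value: Double, weight: Double = 1.0)
}

interface ContextualBandit : Bandit {
    fun choose(x: DoubleArray): Int
    fun update(arm: Int, x: DoubleArray, reward: Double, weight: Double = 1.0)
}

interface Scorable {
    fun evaluate(armIndex: Int): Double
}

// Toy greedy bandit: selection is an argmax over independent per-arm scores,
// so exposing Scorable is meaningful.
class GreedyMeans(override val nbrArms: Int, override val random: Random = Random.Default) :
    UnivariateBandit, Scorable {
    private val sums = DoubleArray(nbrArms)
    private val weights = DoubleArray(nbrArms)

    override fun evaluate(armIndex: Int): Double =
        if (weights[armIndex] == 0.0) Double.POSITIVE_INFINITY // untried arms first
        else sums[armIndex] / weights[armIndex]

    override fun choose(): Int = (0 until nbrArms).maxByOrNull { evaluate(it) }!!

    override fun update(arm: Int, value: Double, weight: Double) {
        sums[arm] += weight * value
        weights[arm] += weight
    }

    override fun reset() {
        sums.fill(0.0)
        weights.fill(0.0)
    }
}
```

A joint-sampling rule such as Boltzmann would implement UnivariateBandit but not Scorable, since its choice depends on all arms at once.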

Subpackages

  • bandit.univariate: Indexless arms: epsilon-greedy / decreasing, UCB1 / KL-UCB / MOSS, Thompson, Boltzmann, EXP3, multi-armed shells, roulette-wheel selection.

  • bandit.contextual: Per-arm regression bandits over the com.eignex.kumulant.stat.regression family: linear (Bayesian, diagonal, stochastic), kNN, tree- and forest-based.

  • bandit.policy: Pluggable scoring policies (Greedy, EpsilonGreedy, UCB1, ThompsonSampling, KLUcb, MOSS, etc.) shared by univariate bandits.
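One way to picture "pluggable scoring policies" is as functions from an arm's running statistics to a score, with a shared shell that plays the argmax. This is a hypothetical sketch; ArmStats, ScoringPolicy, greedy, ucb1, and selectArm are illustrative names, not the bandit.policy API. The UCB1 index used is the standard mean + sqrt(2 ln t / n), with unplayed arms scoring +infinity.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt
import kotlin.random.Random

// Hypothetical sketch: a policy maps an arm's statistics to a score,
// and one shared selector plays the highest-scoring arm.
data class ArmStats(val count: Long, val mean: Double)

fun interface ScoringPolicy {
    fun score(arm: ArmStats, totalRounds: Long, random: Random): Double
}

// Greedy: score is the empirical mean.
val greedy = ScoringPolicy { arm, _, _ -> arm.mean }

// UCB1 index: mean + sqrt(2 ln t / n); unplayed arms are tried first.
val ucb1 = ScoringPolicy { arm, t, _ ->
    if (arm.count == 0L) Double.POSITIVE_INFINITY
    else arm.mean + sqrt(2.0 * ln(t.toDouble()) / arm.count)
}

// The random source is passed through so stochastic policies (e.g. Thompson) fit the same shape.
fun selectArm(arms: List<ArmStats>, policy: ScoringPolicy, random: Random): Int {
    val t = arms.sumOf { it.count }
    return arms.indices.maxByOrNull { policy.score(arms[it], t, random) }!!
}
```

Swapping the policy changes the exploration behaviour without touching how per-arm statistics are accumulated.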

Wire portability

A bandit's BanditSpec round-trips through the same skema-based mechanism as com.eignex.kumulant.schema.StatSpec: declare a spec, encode to JSON / CBOR, ship, decode, materialise. The same data classes parameterise both Bandit construction in code and Bandit construction from the wire.
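The spec pattern can be sketched independently of the wire format. In this hypothetical example, a trivial "key=value" codec stands in for the skema-based JSON/CBOR encoding, and EpsilonGreedySpec, EpsilonGreedyBandit, encode, and decodeEpsilonGreedySpec are illustrative names rather than kumulant's. The point is that the same data class drives construction in code and construction from a decoded payload.

```kotlin
import kotlin.random.Random

// Hypothetical sketch of the spec -> wire -> spec -> live object pattern.
// A trivial "key=value" codec stands in for the real JSON/CBOR encoding.
data class EpsilonGreedySpec(val nbrArms: Int, val epsilon: Double)

class EpsilonGreedyBandit(val spec: EpsilonGreedySpec, val random: Random)

fun EpsilonGreedySpec.encode(): String = "nbrArms=$nbrArms;epsilon=$epsilon"

fun decodeEpsilonGreedySpec(wire: String): EpsilonGreedySpec {
    val fields = wire.split(";").associate {
        val (key, value) = it.split("=", limit = 2)
        key to value
    }
    return EpsilonGreedySpec(fields.getValue("nbrArms").toInt(), fields.getValue("epsilon").toDouble())
}

// The same spec type materialises a live bandit whether it was built in code or decoded.
fun EpsilonGreedySpec.materialize(random: Random = Random.Default) = EpsilonGreedyBandit(this, random)
```

Because the spec is a plain data class, equality after a round trip is a cheap check that nothing was lost in transit.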

Types

interface Bandit

Root of every bandit kumulant ships. Carries the bare minimum every flavour needs: arm count, a randomness source, and a way to wipe state back to its prior-seeded baseline.

interface ContextualBandit

Context-aware bandit: each round the caller observes a feature vector, uses it to choose an arm, plays the arm, observes a reward, and feeds the (context, reward) pair back to the bandit.

interface ContextualScorable

Contextual analog of Scorable: per-arm score under the current state and a supplied context vector. Implemented by com.eignex.kumulant.bandit.contextual.RegressionContextualBandit and com.eignex.kumulant.bandit.contextual.KnnContextualBandit; both have an argmax-shaped selection rule that decomposes into per-arm scores.

interface PerArmBandit

Convenience for the dominant case where bandit state is one Result per arm. Adds per-arm access on top of Snapshotable; useful for inspection, debugging, and policies that want to peek at a single arm's posterior without materialising the whole list.

interface Scorable

Opt-in per-arm scoring for inspection, debugging, and custom selectors. Bandits whose UnivariateBandit.choose is an argmax over independent per-arm scores expose it, for example UCB1, Thompson, and epsilon-greedy.

interface Snapshotable<S>

State surface for any bandit whose state can be checkpointed, replicated, and merged with a sibling's. It is orthogonal to the action surface, and each bandit family has its own natural S.

class TrackedContextualBandit<B : ContextualBandit>(val inner: B, val contextFeatureSize: Int, chooseTemplate: RegressionStat<out Result>? = null, updateJointTemplate: RegressionStat<out Result>? = null, updateMarginalTemplate: RegressionStat<out Result>? = null, updateArmRewardTemplate: PairedStat<out Result>? = null, nowNanos: () -> Long = { 0L }) : ContextualBandit

Observability wrapper around any ContextualBandit. Every event flows into a small set of aggregate side stats, each modelling a different question about the bandit's behaviour. Arm-level bucketing is a separate stratify op; until that lands, encode the arm into the observation (via the joint template) and read the slope/contrast off the resulting stat.

class TrackedUnivariateBandit<B : UnivariateBandit>(val inner: B, chooseTemplate: SeriesStat<out Result>? = null, updateArmRewardTemplate: PairedStat<out Result>? = null, nowNanos: () -> Long = { 0L }) : UnivariateBandit

Observability wrapper around any UnivariateBandit. Univariate bandits have no context vector, so two aggregate slots (chooseTemplate and updateArmRewardTemplate) cover the surface.
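The wrapper idea maps naturally onto Kotlin interface delegation. This hypothetical sketch uses plain counters in place of the chooseTemplate / updateArmRewardTemplate side stats, and defines a minimal stand-in UnivariateBandit whose shape is assumed. The decorator forwards every call to inner and records what it sees without changing the bandit's behaviour.

```kotlin
// Minimal stand-in for the action surface (assumed shape, not kumulant's declaration).
interface UnivariateBandit {
    val nbrArms: Int
    fun choose(): Int
    fun update(arm: Int, value: Double, weight: Double = 1.0)
}

// Hypothetical observability decorator: delegates everything to `inner`
// and records side statistics on the way through.
class Tracked<B : UnivariateBandit>(val inner: B) : UnivariateBandit by inner {
    val chooseCounts = LongArray(inner.nbrArms)
    var rewardWeight = 0.0
        private set
    var rewardSum = 0.0
        private set

    override fun choose(): Int = inner.choose().also { chooseCounts[it]++ }

    override fun update(arm: Int, value: Double, weight: Double) {
        inner.update(arm, value, weight)
        rewardWeight += weight
        rewardSum += weight * value
    }

    val meanReward: Double get() = if (rewardWeight == 0.0) 0.0 else rewardSum / rewardWeight
}

// Toy inner bandit for demonstration: cycles through the arms in order.
class RoundRobin(override val nbrArms: Int) : UnivariateBandit {
    private var next = 0
    var updates = 0
        private set

    override fun choose(): Int = next.also { next = (next + 1) % nbrArms }
    override fun update(arm: Int, value: Double, weight: Double) {
        updates++
    }
}
```

Keeping the inner bandit as a typed property (B) lets callers still reach family-specific members of the wrapped bandit.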

interface UnivariateBandit

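Online optimizer over a fixed set of unindexed arms. Each round the caller chooses an arm with choose(), plays it, observes a reward, and feeds the reward back with update(arm, value, weight).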

Properties

val knnDistanceRegistry: Map<String, (VectorView, VectorView) -> Double>

Built-in distance functions referenced by KnnContextualSpec.distance. Extend by passing a custom map when constructing the bandit programmatically.
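A name-keyed registry lets a serialisable spec refer to a distance function by string. This hypothetical sketch uses DoubleArray in place of VectorView, and builtinDistances / resolveDistance are illustrative names. It shows lookup with a clear failure on unknown names, plus extension by passing a larger map.

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

// Stand-in for VectorView (assumption): a plain DoubleArray.
typealias Distance = (DoubleArray, DoubleArray) -> Double

// Hypothetical built-in registry keyed by the name a spec would carry.
val builtinDistances: Map<String, Distance> = mapOf(
    "euclidean" to { a: DoubleArray, b: DoubleArray -> sqrt(a.indices.sumOf { (a[it] - b[it]) * (a[it] - b[it]) }) },
    "manhattan" to { a: DoubleArray, b: DoubleArray -> a.indices.sumOf { abs(a[it] - b[it]) } }
)

// Resolve a spec's distance name, failing loudly on unknown names.
fun resolveDistance(name: String, registry: Map<String, Distance> = builtinDistances): Distance =
    registry[name] ?: throw IllegalArgumentException("unknown distance '$name'; known: ${registry.keys}")
```

A custom metric is added by building a new map, e.g. builtinDistances + ("chebyshev" to chebyshev), and passing it as the registry; the built-in map itself stays untouched.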

Functions


Build a live BanditPolicy from its spec.

fun BoltzmannSpec.materialize(random: Random = Random.Default): BoltzmannBandit

Build a live BoltzmannBandit from its spec.

fun Exp3Spec.materialize(random: Random = Random.Default): Exp3Bandit

Build a live Exp3Bandit from its spec, resolving null eta / gamma to defaults.
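For reference, here is a hypothetical EXP3 sketch showing how nullable eta and gamma can be resolved to defaults. The class name and both defaults are assumptions for illustration, not kumulant's choices. Arms are drawn from a mixture of exponential weights and uniform exploration, and the played arm's weight is updated with an importance-weighted reward estimate. The default eta = gamma / K reproduces the classic EXP3 update.

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Hypothetical EXP3 sketch. Null eta / gamma fall back to illustrative defaults
// (assumptions, not kumulant's): gamma = 0.1, eta = gamma / nbrArms.
class Exp3Sketch(val nbrArms: Int, eta: Double? = null, gamma: Double? = null, val random: Random = Random.Default) {
    val gamma: Double = gamma ?: 0.1
    val eta: Double = eta ?: this.gamma / nbrArms
    private val logWeights = DoubleArray(nbrArms) // log-space for numerical stability

    // p_i = (1 - gamma) * w_i / sum(w) + gamma / K
    fun probabilities(): DoubleArray {
        val max = logWeights.maxOrNull()!!
        val w = DoubleArray(nbrArms) { exp(logWeights[it] - max) }
        val total = w.sum()
        return DoubleArray(nbrArms) { (1 - gamma) * w[it] / total + gamma / nbrArms }
    }

    // Sample an arm from the mixture distribution.
    fun choose(): Int {
        val p = probabilities()
        var u = random.nextDouble()
        for (i in 0 until nbrArms) {
            u -= p[i]
            if (u < 0) return i
        }
        return nbrArms - 1 // guard against floating-point round-off
    }

    // Reward assumed in [0, 1]; dividing by p gives an unbiased estimate.
    fun update(arm: Int, reward: Double) {
        val p = probabilities()[arm]
        logWeights[arm] += eta * reward / p
    }
}
```

The state here is a single weight vector coupled through the normalisation, which is why EXP3 does not decompose into independent per-arm results.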

fun MultiArmedSpec<*>.materialize(random: Random = Random.Default): MultiArmedBandit<Result>

Build a live UnivariateBandit from its spec.

Build a live RouletteWheelBandit from its spec.

fun <R : Result> TopTwoThompsonSpec<R>.materialize(random: Random = Random.Default): TopTwoThompsonBandit<R>

Build a live TopTwoThompsonBandit from its spec.

fun UnivariateBanditSpec.materialize(random: Random = Random.Default): Bandit

Dispatch any UnivariateBanditSpec to its concrete bandit.

fun ContextualBanditSpec.materialize(random: Random = Random.Default, concurrency: Concurrency = Concurrency.None): Bandit

Dispatch any ContextualBanditSpec to its concrete bandit.
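Dispatching "any spec to its concrete bandit" fits a sealed hierarchy with an exhaustive when. In this hypothetical sketch, UnivariateSpec, the two spec data classes, the bandit classes, and the Exp3 defaults are illustrative assumptions. Each variant has its own materialize, and one dispatcher covers the family; adding a spec subtype without a branch becomes a compile error.

```kotlin
import kotlin.random.Random

// Hypothetical sealed spec family; names and fields are illustrative.
sealed interface UnivariateSpec
data class EpsilonGreedySpec(val nbrArms: Int, val epsilon: Double) : UnivariateSpec
data class Exp3Spec(val nbrArms: Int, val eta: Double? = null, val gamma: Double? = null) : UnivariateSpec

interface Bandit { val nbrArms: Int }
class EpsilonGreedyBandit(override val nbrArms: Int, val epsilon: Double, val random: Random) : Bandit
class Exp3Bandit(override val nbrArms: Int, val eta: Double, val gamma: Double, val random: Random) : Bandit

// Per-variant builders.
fun EpsilonGreedySpec.materialize(random: Random = Random.Default) = EpsilonGreedyBandit(nbrArms, epsilon, random)

fun Exp3Spec.materialize(random: Random = Random.Default): Exp3Bandit {
    val g = gamma ?: 0.1 // illustrative default (assumption)
    return Exp3Bandit(nbrArms, eta ?: g / nbrArms, g, random)
}

// Exhaustive dispatch: the smart-cast picks the more specific materialize.
fun UnivariateSpec.materialize(random: Random = Random.Default): Bandit = when (val spec = this) {
    is EpsilonGreedySpec -> spec.materialize(random)
    is Exp3Spec -> spec.materialize(random)
}
```

Callers holding a decoded spec of unknown variant only need the family-level dispatcher.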

fun KnnContextualSpec.materialize(random: Random = Random.Default, distanceRegistry: Map<String, (VectorView, VectorView) -> Double> = knnDistanceRegistry): KnnContextualBandit

Build a live KnnContextualBandit from its spec, resolving the distance function via distanceRegistry (defaults to knnDistanceRegistry).

fun RegressionContextualSpec.materialize(random: Random = Random.Default, concurrency: Concurrency = Concurrency.None): RegressionContextualBandit<out LinearRegressionResult>

Build a live RegressionContextualBandit from its spec.