kumulant

Exp4Bandit

class Exp4Bandit(val nbrArms: Int, val experts: List<Exp4Expert>, val eta: Double = defaultEta(nbrArms, experts.size), val gamma: Double = (nbrArms * eta).coerceAtMost(1.0), val random: Random = Random.Default) : ContextualBandit, Snapshotable<Exp4State>

EXP4 (Auer, Cesa-Bianchi, Freund, Schapire 2002); adversarial contextual bandit over a fixed pool of experts. Each round, every expert returns a distribution over arms for the context; the bandit mixes those distributions weighted by per-expert exponential weights, blends with uniform exploration gamma, samples an arm, and on reward r ∈ [0,1] folds the IPS-corrected gain back into the expert weights.

The regret bound is O(sqrt(T · K · ln N)) over T rounds under the default eta and gamma, which are derived from nbrArms (K) and experts.size (N); a larger expert pool therefore costs only a logarithmic factor in regret. Rewards passed to update must lie in [0, 1] for the regret theory to apply; rewards outside that range are accepted but may destabilise the weight updates.
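As a rough illustration of how such defaults could be computed, the sketch below evaluates the formulas quoted on this page: eta = sqrt(ln(N) / (T * K)) and gamma = K * eta capped at 1.0. The names etaForHorizon and gammaFor and the explicit horizon parameter are assumptions made for the example; the library's own defaultEta(nbrArms, experts.size) takes no horizon argument and may compute its value differently.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// Hypothetical helper (not the library's defaultEta): learning rate for a known horizon T.
fun etaForHorizon(nbrArms: Int, nbrExperts: Int, horizon: Int): Double {
    require(nbrArms > 0 && nbrExperts > 0 && horizon > 0) { "all sizes must be positive" }
    // eta = sqrt(ln(N) / (T * K)); note that a single expert (N = 1) gives eta = 0.
    return sqrt(ln(nbrExperts.toDouble()) / (horizon.toDouble() * nbrArms))
}

// Hypothetical helper: exploration mix K * eta, capped at 1.0 as in the constructor default.
fun gammaFor(nbrArms: Int, eta: Double): Double = (nbrArms * eta).coerceAtMost(1.0)
```

For example, etaForHorizon(4, 10, 10_000) is roughly 0.0076, and gammaFor(4, thatEta) is roughly 0.03, so only a small share of the probability mass goes to uniform exploration.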

State is per-expert (not per-arm) so it surfaces via Snapshotable<Exp4State> rather than the com.eignex.kumulant.bandit.PerArmBandit convenience used by sibling contextual bandits.

Use cases: non-stationary or adversarial contextual problems where a small set of policies (linear scorers, rule-based heuristics, pretrained models) can advise arm distributions; meta-learning over a finite pool of experts.

Arms: contextual with caller-defined feature dimension (every expert's advise returns length nbrArms); nbrArms and experts.size fixed at construction.
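One way a score-based policy (such as a linear scorer) could produce the length-nbrArms distribution an expert is expected to return is a softmax over per-arm scores. This is only a sketch: softmaxAdvice is a hypothetical helper, and the actual Exp4Expert interface and its advise signature are not shown on this page.

```kotlin
import kotlin.math.exp

// Hypothetical helper: turns one score per arm into a probability distribution over arms.
fun softmaxAdvice(scores: DoubleArray): DoubleArray {
    val maxScore = scores.maxOrNull() ?: error("need at least one arm")
    // Subtract the maximum before exponentiating to avoid overflow on large scores.
    val exps = DoubleArray(scores.size) { exp(scores[it] - maxScore) }
    val total = exps.sum()
    return DoubleArray(scores.size) { exps[it] / total }
}
```

The result has one entry per arm, every entry is positive, and the entries sum to 1 up to rounding.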

Memory: O(experts.size + experts.size · nbrArms); one weight per expert plus a cached last-advice matrix and play distribution.

Choose: O(experts.size · (advise + nbrArms)); query every expert and mix their distributions.
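The mixing step described above can be sketched in plain Kotlin as follows. It follows the textbook EXP4 form, p[a] = (1 - gamma) * sum_i (w_i / W) * advice[i][a] + gamma / K; mixAdvice is a hypothetical stand-alone function, not the library's internal code, and the advice matrix is assumed to be already collected from the experts.

```kotlin
// Hypothetical helper: mixes expert advice by weight, then blends in uniform exploration.
fun mixAdvice(weights: DoubleArray, advice: Array<DoubleArray>, gamma: Double): DoubleArray {
    require(weights.isNotEmpty() && weights.size == advice.size) { "one advice row per expert" }
    val nbrArms = advice[0].size
    val totalWeight = weights.sum()
    val play = DoubleArray(nbrArms)
    for (i in weights.indices) {
        val share = weights[i] / totalWeight // normalised weight of expert i
        for (a in 0 until nbrArms) play[a] += share * advice[i][a]
    }
    // Every arm keeps at least gamma / K probability, which bounds the IPS correction.
    for (a in 0 until nbrArms) play[a] = (1.0 - gamma) * play[a] + gamma / nbrArms
    return play
}
```

The two nested loops are where the O(experts.size · nbrArms) part of the cost comes from; the per-expert advise cost is outside this sketch.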

Update: O(experts.size · (advise + nbrArms)); re-evaluates experts at x so the played arm's IPS gain is correct, then multiplicative update across all expert weights.
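A minimal sketch of the IPS-corrected multiplicative update, in its textbook form: the observed reward is divided by the probability with which the arm was played, each expert is credited in proportion to the mass it put on that arm, and its weight is multiplied by exp(eta * gain). The function ipsUpdate is hypothetical; the library may differ in detail, for instance in how it applies the weight parameter of update or whether it stores log-weights.

```kotlin
import kotlin.math.exp

// Hypothetical helper: updates expert weights in place after observing `reward` on `armIndex`.
fun ipsUpdate(
    weights: DoubleArray,
    advice: Array<DoubleArray>,
    play: DoubleArray,
    armIndex: Int,
    reward: Double,
    eta: Double,
) {
    // Inverse-propensity estimate of the played arm's reward; unplayed arms get estimate 0.
    val ipsReward = reward / play[armIndex]
    for (i in weights.indices) {
        val gain = advice[i][armIndex] * ipsReward // expert i's estimated gain this round
        weights[i] *= exp(eta * gain)
    }
    // Renormalise so the weights keep summing to 1 and do not overflow over many rounds.
    val total = weights.sum()
    for (i in weights.indices) weights[i] /= total
}
```

An expert that put no mass on the played arm has gain 0 and only loses weight through the renormalisation.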

Randomness: every choose consumes one random.nextDouble(); reproducible under a fixed seed when expert advise is deterministic.
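Drawing an arm with a single random.nextDouble() is typically done by inverse-CDF sampling over the play distribution. The sketch below shows that technique with kotlin.random.Random; sampleArm is a hypothetical helper and the library's actual sampling code is not shown on this page.

```kotlin
import kotlin.random.Random

// Hypothetical helper: samples an index from `play` using exactly one uniform draw.
fun sampleArm(play: DoubleArray, random: Random): Int {
    val u = random.nextDouble() // uniform in [0, 1)
    var cumulative = 0.0
    for (a in play.indices) {
        cumulative += play[a]
        if (u < cumulative) return a
    }
    // Guard against rounding when the probabilities sum to slightly less than 1.
    return play.lastIndex
}
```

Two generators built from the same seed, such as Random(42), yield the same sequence of arms for the same sequence of distributions, which is the reproducibility property described above.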

Concurrency: not thread-safe; expert weights, the cached advice matrix, and the cached play distribution are mutated without synchronisation. Serialise choose and update externally for multi-thread use.
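One possible way to serialise access externally on the JVM is to funnel every call through a single lock. The wrapper below is a generic sketch; Serialised and locked are hypothetical names, and nothing like it is claimed to ship with the library.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Hypothetical wrapper: all access to the wrapped bandit goes through one lock.
class Serialised<B : Any>(private val bandit: B) {
    private val lock = ReentrantLock()

    // Runs `action` on the bandit while holding the lock and returns its result.
    fun <R> locked(action: (B) -> R): R = lock.withLock { action(bandit) }
}
```

With a bandit wrapped as shared = Serialised(bandit), a caller would write shared.locked { it.choose(x) } and later shared.locked { it.update(arm, x, reward) }, so choose and update never run concurrently on the same instance.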

Constructors

constructor(nbrArms: Int, experts: List<Exp4Expert>, eta: Double = defaultEta(nbrArms, experts.size), gamma: Double = (nbrArms * eta).coerceAtMost(1.0), random: Random = Random.Default)

Types

object Companion

Default-tuning helpers.

Properties

val eta: Double

Learning rate on per-expert gain updates. Defaults to sqrt(ln(N) / (T * K)) with T = horizon; pass a static value if the horizon is unknown.

val experts: List<Exp4Expert>

Fixed pool of experts; non-empty.

val gamma: Double

Exploration mix: probability mass distributed uniformly across arms before blending in the expert mixture. Defaults to K * eta, capped at 1.0.

open override val nbrArms: Int

Number of arms; every expert's distribution must have this length.

open override val random: Random

Single source of randomness for the round's arm draw.

Functions

open override fun choose(x: VectorView): Int

Build the round's play distribution and sample an arm.

open override fun create(random: Random): Exp4Bandit

Spawn a fresh bandit with the same experts and tunables; weights reset to uniform.

expertWeights

Current per-expert weights, normalised to sum to 1.

open override fun merge(other: Exp4State)

Fold another replica's state (other) into this bandit. Most families merge exactly via the underlying stat's parallel-merge formula; SGD-based contextual bandits merge approximately. Each concrete bandit's KDoc documents its merge semantics.

playDistribution

Mean of expert distributions at x weighted by current weights, blended with uniform exploration via gamma.

open override fun reset()

Reset all expert weights to uniform.

open override fun snapshot(): Exp4State

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. The same snapshot consistency rules apply as for com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.

open override fun update(armIndex: Int, x: VectorView, reward: Double, weight: Double = 1.0)

Fold an (arm, context, reward) observation back into the expert weights.

Exp4Bandit

constructor(nbrArms: Int, experts: List<Exp4Expert>, eta: Double = defaultEta(nbrArms, experts.size), gamma: Double = (nbrArms * eta).coerceAtMost(1.0), random: Random = Random.Default)

choose

open override fun choose(x: VectorView): Int

Build the round's play distribution and sample an arm.

create

open override fun create(random: Random): Exp4Bandit

Spawn a fresh bandit with the same experts and tunables; weights reset to uniform.

eta

Learning rate on per-expert gain updates. Defaults to sqrt(ln(N) / (T * K)) with T = horizon; pass a static value if the horizon is unknown.

expertWeights

Current per-expert weights, normalised to sum to 1.

experts

Fixed pool of experts; non-empty.

gamma

Exploration mix: probability mass distributed uniformly across arms before blending in the expert mixture. Defaults to K * eta, capped at 1.0.

merge

open override fun merge(other: Exp4State)

Fold another replica's state (other) into this bandit. Most families merge exactly via the underlying stat's parallel-merge formula; SGD-based contextual bandits merge approximately. Each concrete bandit's KDoc documents its merge semantics.

nbrArms

open override val nbrArms: Int

Number of arms; every expert's distribution must have this length.

playDistribution

Mean of expert distributions at x weighted by current weights, blended with uniform exploration via gamma.

random

open override val random: Random

Single source of randomness for the round's arm draw.

reset

open override fun reset()

Reset all expert weights to uniform.

snapshot

open override fun snapshot(): Exp4State

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. The same snapshot consistency rules apply as for com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.

update

open override fun update(armIndex: Int, x: VectorView, reward: Double, weight: Double = 1.0)

Fold an (arm, context, reward) observation back into the expert weights.