Exp4Bandit

class Exp4Bandit(val nbrArms: Int, val experts: List<Exp4Expert>, val eta: Double = defaultEta(nbrArms, experts.size), val gamma: Double = (nbrArms * eta).coerceAtMost(1.0), val random: Random = Random.Default) : ContextualBandit, Snapshotable<Exp4State>

EXP4 (Auer, Cesa-Bianchi, Freund, Schapire 2002): an adversarial contextual bandit over a fixed pool of experts. Each round, every expert returns a distribution over arms for the context; the bandit mixes those distributions using per-expert exponential weights, blends the mixture with uniform exploration at rate gamma, samples an arm, and, on reward r ∈ [0,1], folds the inverse-propensity-scored (IPS) gain back into the expert weights.
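The mixing step described above can be sketched as follows. This is a minimal textbook-style illustration, not the library's source: the function name mixAdvice and the array-based representation of weights and advice are assumptions made for the example.

```kotlin
// Textbook EXP4 play distribution; an illustrative sketch, not the library's code.
// weights[i]   = current exponential weight of expert i
// advice[i][a] = probability expert i assigns to arm a for the current context
// mixAdvice is a hypothetical name introduced for this example.
fun mixAdvice(
    weights: DoubleArray,
    advice: List<DoubleArray>,
    gamma: Double,
): DoubleArray {
    require(weights.size == advice.size) { "one weight per expert" }
    val nbrArms = advice.first().size
    val totalWeight = weights.sum()
    return DoubleArray(nbrArms) { arm ->
        // Weight-normalised mixture of the experts' advice for this arm.
        var mixed = 0.0
        for (i in weights.indices) {
            mixed += weights[i] / totalWeight * advice[i][arm]
        }
        // Blend with uniform exploration so every arm keeps at least gamma / nbrArms mass.
        (1.0 - gamma) * mixed + gamma / nbrArms
    }
}
```

Because each expert's advice sums to 1, the result is itself a probability distribution over arms, and the gamma / nbrArms floor keeps the IPS correction bounded.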

Regret is O(sqrt(T · K · ln N)) over T rounds under the default eta/gamma picks derived from nbrArms (K) and experts.size (N), so a wider expert pool (larger N) raises the bound only logarithmically, and the default learning rate is tuned to K and N accordingly. Rewards passed to update must lie in [0, 1] for the regret theory to apply; out-of-range rewards are accepted but may destabilise the weight updates.

State is per-expert (not per-arm) so it surfaces via Snapshotable<Exp4State> rather than the com.eignex.kumulant.bandit.PerArmBandit convenience used by sibling contextual bandits.

Use cases: non-stationary or adversarial contextual problems where a small set of policies (linear scorers, rule-based heuristics, pretrained models) can advise arm distributions; meta-learning over a finite pool of experts.

Arms: contextual with caller-defined feature dimension (every expert's advise returns length nbrArms); nbrArms and experts.size fixed at construction.

Memory: O(experts.size + experts.size · nbrArms); one weight per expert plus a cached last-advice matrix and play distribution.

Choose: O(experts.size · (advise + nbrArms)); query every expert and mix their distributions.

Update: O(experts.size · (advise + nbrArms)); re-evaluates experts at x so the played arm's IPS gain is correct, then multiplicative update across all expert weights.

Randomness: every choose consumes one random.nextDouble(); reproducible under a fixed seed when expert advise is deterministic.

Concurrency: not thread-safe; expert weights, the cached advice matrix, and the cached play distribution are mutated without synchronisation. Serialise choose and update externally for multi-thread use.
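One way to serialise calls externally is a thin locking wrapper. The sketch below assumes a JVM target and uses a hypothetical SimpleBandit interface in place of the library's ContextualBandit, whose full signature is not reproduced on this page; LockedBandit is likewise an illustrative name.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Hypothetical minimal bandit shape for illustration; not the library's interface.
interface SimpleBandit<C> {
    fun choose(x: C): Int
    fun update(armIndex: Int, x: C, reward: Double)
}

// Routes every call through one lock so that choose and update never interleave.
class LockedBandit<C>(private val delegate: SimpleBandit<C>) : SimpleBandit<C> {
    private val lock = ReentrantLock()

    override fun choose(x: C): Int = lock.withLock { delegate.choose(x) }

    override fun update(armIndex: Int, x: C, reward: Double) {
        lock.withLock { delegate.update(armIndex, x, reward) }
    }
}
```

The wrapper only guarantees that individual calls do not overlap. It does not pair a given choose with its matching update, so callers still need to pass the correct context and arm index back on update.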

Constructors

Exp4Bandit

constructor(nbrArms: Int, experts: List<Exp4Expert>, eta: Double = defaultEta(nbrArms, experts.size), gamma: Double = (nbrArms * eta).coerceAtMost(1.0), random: Random = Random.Default)

Types

Companion

object Companion

Default-tuning helpers.

Properties

eta

val eta: Double

Learning rate on per-expert gain updates. Defaults to sqrt(ln(N) / (T * K)) with T = horizon; pass a static value if the horizon is unknown.

experts

val experts: List<Exp4Expert>

Fixed pool of experts; non-empty.

gamma

val gamma: Double

Exploration mix: probability mass distributed uniformly across arms before blending in the expert mixture. Defaults to K * eta, capped at 1.0.
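For a known horizon, the tuning formulas quoted for eta and gamma can be computed directly. The helpers below are hypothetical and only restate those formulas; the library's own defaultEta takes nbrArms and experts.size and may be implemented differently.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// Hypothetical helper: eta = sqrt(ln(N) / (T * K)) for horizon T, K arms, N experts.
// Not the library's defaultEta.
fun etaForHorizon(horizon: Int, nbrArms: Int, nbrExperts: Int): Double =
    sqrt(ln(nbrExperts.toDouble()) / (horizon.toDouble() * nbrArms))

// Hypothetical helper mirroring the constructor default for gamma: K * eta, capped at 1.0.
fun gammaFor(nbrArms: Int, eta: Double): Double =
    (nbrArms * eta).coerceAtMost(1.0)
```

The cap matters for short horizons or many arms, where K * eta would otherwise exceed 1 and stop being a valid mixing probability.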

nbrArms

open override val nbrArms: Int

Number of arms; every expert's distribution must have this length.

random

open override val random: Random

Single source of randomness for the round's arm draw.
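A single-draw arm sample can be implemented by inverting the cumulative distribution. The sketch below is an assumption about how such a draw might look, not the library's code; sampleArm is a hypothetical name.

```kotlin
import kotlin.random.Random

// Inverse-CDF draw over a play distribution; consumes exactly one nextDouble() per call.
// sampleArm is a hypothetical helper introduced for this example.
fun sampleArm(play: DoubleArray, random: Random): Int {
    val u = random.nextDouble() // uniform in [0, 1)
    var cumulative = 0.0
    for (arm in play.indices) {
        cumulative += play[arm]
        if (u < cumulative) return arm
    }
    // Guard against rounding leaving the cumulative sum slightly below 1.
    return play.lastIndex
}
```

Passing two generators built with the same seed, for example Random(42), yields the same sequence of arms for the same sequence of distributions, which is what makes seeded runs reproducible when expert advice is deterministic.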

Functions

snapshot

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.

update

open override fun update(armIndex: Int, x: VectorView, reward: Double, weight: Double = 1.0)

Fold a (context, reward) observation back into the expert weights.
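The fold-back step can be sketched with the textbook EXP4 rule: estimate the played arm's reward by inverse propensity, credit each expert in proportion to the probability it gave that arm, and scale its weight exponentially. This is an illustration under those assumptions, not the library's implementation; updateWeights is a hypothetical name, and the weight parameter of update is not modelled here.

```kotlin
import kotlin.math.exp

// IPS-corrected multiplicative update for one round; textbook EXP4 sketch.
// updateWeights is a hypothetical helper, not part of the library.
fun updateWeights(
    weights: DoubleArray,        // per-expert weights, mutated in place
    advice: List<DoubleArray>,   // advice[i][a] = expert i's probability for arm a
    play: DoubleArray,           // distribution the arm was sampled from
    armIndex: Int,               // arm that was actually played
    reward: Double,              // expected to lie in [0, 1]
    eta: Double,                 // learning rate
) {
    // Importance-weighted reward: unbiased for the played arm, zero for all others.
    val estimated = reward / play[armIndex]
    for (i in weights.indices) {
        // Expert i's gain is the estimate scaled by how strongly it backed the played arm.
        val gain = advice[i][armIndex] * estimated
        weights[i] *= exp(eta * gain)
    }
}
```

Experts that put no mass on the played arm keep their weight unchanged for the round, while those that backed a rewarded arm grow.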
