com.eignex.kumulant/bandit/univariate/Exp3Bandit

Exp3Bandit

class Exp3Bandit(val nbrArms: Int, val eta: Double = defaultEta(nbrArms), val gamma: Double = (nbrArms * eta).coerceAtMost(1.0), val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<Exp3ArmResult>

EXP3 (Auer, Cesa-Bianchi, Freund, Schapire 2002): an adversarial multi-armed bandit over a fixed pool of nbrArms arms. Each round: compute the play distribution p[a] = (1 - gamma) · w[a]/Σw + gamma/K, where K = nbrArms; sample a ~ p; then, on reward r ∈ [0,1], update w[a] *= exp(eta · r / p[a]), where r / p[a] is the importance-sampling-corrected gain.
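The round described above can be sketched in plain Kotlin. This is a minimal illustration of the two formulas, not the library's implementation: the function names exp3PlayDistribution and exp3Update are hypothetical, and the values chosen for eta and gamma are arbitrary.

```kotlin
import kotlin.math.exp

// Illustrative only: p[a] = (1 - gamma) * w[a] / sum(w) + gamma / K.
fun exp3PlayDistribution(w: DoubleArray, gamma: Double): DoubleArray {
    val total = w.sum()
    val k = w.size
    return DoubleArray(k) { a -> (1 - gamma) * w[a] / total + gamma / k }
}

// Illustrative only: w[arm] *= exp(eta * r / p[arm]).
fun exp3Update(w: DoubleArray, arm: Int, reward: Double, eta: Double, gamma: Double) {
    val p = exp3PlayDistribution(w, gamma)
    // Dividing by p[arm] compensates for how rarely the arm is played.
    w[arm] *= exp(eta * reward / p[arm])
}

fun main() {
    val w = doubleArrayOf(1.0, 1.0, 1.0, 1.0)
    val eta = 0.1
    val gamma = 0.4
    // Uniform weights give every arm probability of about 0.25.
    println(exp3PlayDistribution(w, gamma).toList())
    exp3Update(w, arm = 2, reward = 1.0, eta = eta, gamma = gamma)
    // Arm 2 now has the largest probability (about 0.30).
    println(exp3PlayDistribution(w, gamma).toList())
}
```

Because only the played arm's reward is observed, the division by p[arm] is what keeps the gain estimate unbiased: an arm played with probability p contributes r / p with probability p, which averages to r.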

The regret bound is O(sqrt(T · K · ln K)) over T rounds under the default tunings. This class is the univariate sibling of com.eignex.kumulant.bandit.contextual.Exp4Bandit: the same machinery without the expert layer. It is a standalone class (not a BanditPolicy under MultiArmedBandit) because its sampling distribution is computed jointly across arms rather than from an independent per-arm score followed by argmax.

Rewards passed to update must lie in [0, 1] for the regret theory to apply; outside-bound rewards are accepted but may destabilise the exponential weight update.
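When raw rewards have known bounds, one way to satisfy the [0, 1] requirement is to rescale and clamp before calling update. The helper below is a hypothetical sketch and not part of the library; the names normaliseReward, lo and hi are assumptions made for illustration.

```kotlin
// Hypothetical helper: map a raw reward with known bounds [lo, hi] into [0, 1].
fun normaliseReward(raw: Double, lo: Double, hi: Double): Double {
    require(hi > lo) { "hi must be greater than lo" }
    // Linear rescale, then clamp anything that falls outside the stated bounds.
    return ((raw - lo) / (hi - lo)).coerceIn(0.0, 1.0)
}

fun main() {
    println(normaliseReward(7.5, lo = 0.0, hi = 10.0))  // 0.75
    println(normaliseReward(12.0, lo = 0.0, hi = 10.0)) // clamped to 1.0
}
```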

Use cases: non-stationary or adversarial scalar-reward problems where the per-arm reward distribution may shift over time; settings where a regret bound is wanted without distributional assumptions.

Arms: indexless, nbrArms fixed at construction; per-arm state is one exponential weight.

Memory: O(nbrArms); one weight per arm plus a cached play distribution.

Choose: O(nbrArms); build the play distribution, inverse-CDF sample.

Update: O(nbrArms); rebuilds the play distribution to read p[arm], then multiplicative weight update on the played arm.

Randomness: every choose consumes one random.nextDouble(); reproducible under a fixed seed.

Concurrency: not thread-safe; weights and the cached play distribution are mutated without synchronisation. Serialise choose and update externally for multi-thread use.
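External serialisation can be as simple as one lock held around every call. The sketch below is JVM-only and uses a hypothetical ChooseUpdate interface as a stand-in for the two calls that need guarding; it omits the weight parameter of update for brevity, and LockedBandit is not a class provided by the library.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.thread
import kotlin.concurrent.withLock

// Hypothetical stand-in for the calls that must not run concurrently.
interface ChooseUpdate {
    fun choose(): Int
    fun update(armIndex: Int, value: Double)
}

// Wrapper that takes the same lock around every call to the delegate.
class LockedBandit(private val delegate: ChooseUpdate) : ChooseUpdate {
    private val lock = ReentrantLock()
    override fun choose(): Int = lock.withLock { delegate.choose() }
    override fun update(armIndex: Int, value: Double) =
        lock.withLock { delegate.update(armIndex, value) }
}

// Drives a deliberately non-thread-safe delegate from four threads.
fun runLockedDemo(): Int {
    var updates = 0
    val unsafe = object : ChooseUpdate {
        override fun choose(): Int = 0
        override fun update(armIndex: Int, value: Double) { updates++ }
    }
    val bandit = LockedBandit(unsafe)
    val threads = List(4) {
        thread { repeat(1_000) { bandit.update(bandit.choose(), 1.0) } }
    }
    threads.forEach { it.join() }
    return updates
}

fun main() {
    println(runLockedDemo()) // 4000: no increments are lost under the lock
}
```

Note that locking each call separately does not make a choose-then-update pair atomic; if a caller needs that, the lock has to span both calls.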

Constructors

Exp3Bandit

constructor(nbrArms: Int, eta: Double = defaultEta(nbrArms), gamma: Double = (nbrArms * eta).coerceAtMost(1.0), random: Random = Random.Default)

Types

Companion

object Companion

EXP3 tuning defaults from Auer et al.

Properties

eta

val eta: Double

Learning rate on per-arm gain updates.

gamma

val gamma: Double

Exploration mix: probability mass distributed uniformly across arms.
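The default in the constructor signature is gamma = (nbrArms * eta).coerceAtMost(1.0). The sketch below only mirrors that expression to show how the cap behaves; defaultGamma is a hypothetical name, and the eta values are arbitrary rather than what defaultEta returns.

```kotlin
// Mirrors the default expression from the constructor signature.
fun defaultGamma(nbrArms: Int, eta: Double): Double = (nbrArms * eta).coerceAtMost(1.0)

fun main() {
    println(defaultGamma(nbrArms = 4, eta = 0.125)) // 0.5
    println(defaultGamma(nbrArms = 10, eta = 0.25)) // 2.5 is capped to 1.0
}
```

At gamma = 1.0 the weighted term of the play distribution vanishes, so every arm is played with probability 1/nbrArms regardless of its weight.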

nbrArms

open override val nbrArms: Int

Number of arms.

random

open override val random: Random

Single source of randomness.
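Reproducibility rests on the standard-library guarantee that two kotlin.random.Random instances built from the same seed produce the same sequence within one Kotlin runtime version. A minimal check of that property, independent of the bandit itself (the helper name draws is hypothetical):

```kotlin
import kotlin.random.Random

// Collect the first n uniform draws from a generator seeded with `seed`.
fun draws(seed: Int, n: Int): List<Double> {
    val random = Random(seed)
    return List(n) { random.nextDouble() }
}

fun main() {
    // Same seed, same sequence.
    println(draws(42, 5) == draws(42, 5)) // true
}
```

Passing a seeded generator as the random constructor argument, instead of the default Random.Default, is what makes a run repeatable.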

Functions

armWeights

fun armWeights(): DoubleArray

Current per-arm weights, normalised to sum to 1.

choose

open override fun choose(): Int

Build the round's play distribution and sample an arm.
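Inverse-CDF sampling from a discrete distribution needs a single uniform draw: walk the cumulative sum until it exceeds the draw. The sketch below illustrates the technique under that assumption; sampleIndex is a hypothetical helper, not the library's internal code.

```kotlin
import kotlin.random.Random

// Hypothetical helper: inverse-CDF sampling from a discrete distribution p.
fun sampleIndex(p: DoubleArray, random: Random): Int {
    val u = random.nextDouble() // one uniform draw in [0, 1)
    var cumulative = 0.0
    for (a in p.indices) {
        cumulative += p[a]
        if (u < cumulative) return a
    }
    // Guard against rounding leaving the cumulative sum slightly below 1.
    return p.lastIndex
}

fun main() {
    val p = doubleArrayOf(0.1, 0.6, 0.3)
    val random = Random(7)
    val counts = IntArray(p.size)
    repeat(10_000) { counts[sampleIndex(p, random)]++ }
    // Counts should be roughly proportional to p.
    println(counts.toList())
}
```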

create

open override fun create(random: Random): Exp3Bandit

Spawn a fresh bandit with the same tunables; weights reset.

merge

open override fun merge(other: List<Exp3ArmResult>)

Fold another replica's state (other) into this bandit. Most families merge exactly via the underlying stat's parallel-merge formula; SGD-based contextual bandits merge approximately. Each concrete bandit's KDoc documents its merge semantics.

playDistribution

fun playDistribution(): DoubleArray

Current play distribution: the weight-normalised softmax blended with the uniform distribution at mixing rate gamma.

reset

open override fun reset()

Reset all weights to uniform.

snapshot

open override fun snapshot(): List<Exp3ArmResult>

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.

update

open override fun update(armIndex: Int, value: Double, weight: Double = 1.0)

Fold an (arm, reward) observation into the played arm's weight.

armResult

open fun armResult(armIndex: Int): Exp3ArmResult

Per-arm snapshot at armIndex. Default implementation reads from the full snapshot; implementations may override to avoid building the entire list when only one arm is needed.

updateAll

open fun updateAll(armIndices: IntArray, values: DoubleArray, weights: DoubleArray? = null)

Batched update: fold one observation per arm/value pair in a single call. Equivalent to looping update but skips per-call overhead and may take a per-bandit lock once.
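The loop that a batched call is described as equivalent to can be sketched as follows. This is a hypothetical illustration rather than the library's implementation: updateAllByLooping is an invented name, the update callback stands in for the bandit's own update, and treating a null weights array as weight 1.0 for every observation is an assumption based on the default weight of update.

```kotlin
// Hypothetical sketch: apply one update per (arm, value) pair, in order.
fun updateAllByLooping(
    armIndices: IntArray,
    values: DoubleArray,
    weights: DoubleArray? = null,
    update: (armIndex: Int, value: Double, weight: Double) -> Unit,
) {
    require(armIndices.size == values.size) { "one value per arm index" }
    require(weights == null || weights.size == values.size) { "one weight per value" }
    for (i in armIndices.indices) {
        // Assumption: a missing weights array means weight 1.0 throughout.
        update(armIndices[i], values[i], weights?.get(i) ?: 1.0)
    }
}

fun main() {
    updateAllByLooping(intArrayOf(0, 2), doubleArrayOf(1.0, 0.5)) { arm, value, weight ->
        println("update(arm=$arm, value=$value, weight=$weight)")
    }
}
```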