UCB1

class UCB1(val alpha: Double = 1.0, priorAlpha: Double = 1.0, priorBeta: Double = 1.0) : BanditPolicy<BernoulliSumResult>

Classical UCB1 (Auer, Cesa-Bianchi, Fischer 2002). The score is mean + alpha * sqrt(2 * ln(totalSamples) / armSamples): an exploitation term (the running mean) plus a confidence bound that shrinks as the arm accumulates pulls. Unexplored arms score +infinity, so each is tried at least once.
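As a rough illustration of the formula above, the score can be sketched as a standalone function. The name ucb1Score and its parameters are hypothetical and not part of the library; the clamp on totalSamples is an assumption of this sketch, added only to keep the logarithm non-negative.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// Hypothetical standalone helper; not the library's implementation.
fun ucb1Score(
    mean: Double,
    armSamples: Double,
    totalSamples: Double,
    alpha: Double = 1.0,
): Double {
    // An arm with no pulls scores +infinity, so it is chosen before any explored arm.
    if (armSamples <= 0.0) return Double.POSITIVE_INFINITY
    // Assumption of this sketch: clamp the total to 1 so ln(...) is never negative.
    val bonus = sqrt(2.0 * ln(maxOf(totalSamples, 1.0)) / armSamples)
    // Exploitation term (mean) plus the alpha-scaled confidence bound.
    return mean + alpha * bonus
}
```

With the same mean and total, an arm pulled 10 times scores higher than one pulled 50 times, because its confidence bound is wider. Setting alpha to 0.0 removes the bound entirely and leaves the plain running mean.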

Designed for Bernoulli rewards, but works for any reward bounded in [0, 1]. The exploration constant alpha scales the confidence width: the theoretical value is 1.0, lower values reduce exploration, and higher values increase it.

Pair this with a BernoulliArm beta prior; priorAlpha and priorBeta seed the snapshot so the first few pulls aren't dominated by noise from small integer counts.
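To see why prior pseudo-counts help early on, here is a minimal sketch. It assumes the prior is folded in the usual Beta-Bernoulli way, with priorAlpha added to the success count and priorBeta to the failure count; the page does not spell out how BernoulliArm does this, and smoothedMean is a hypothetical helper.

```kotlin
// Hypothetical helper; assumes Beta-style pseudo-counts, which may differ
// from how BernoulliArm actually seeds its snapshot.
fun smoothedMean(
    successes: Double,
    failures: Double,
    priorAlpha: Double = 1.0,
    priorBeta: Double = 1.0,
): Double =
    // Pseudo-counts enter both numerator and denominator, pulling early
    // estimates toward priorAlpha / (priorAlpha + priorBeta).
    (successes + priorAlpha) / (successes + failures + priorAlpha + priorBeta)
```

Under these assumptions, a single success gives an estimate of 2/3 instead of 1.0, and a single failure gives 1/3 instead of 0.0. The sketch requires priorAlpha + priorBeta > 0; otherwise an arm with no data divides zero by zero.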

Constructors

UCB1

constructor(alpha: Double = 1.0, priorAlpha: Double = 1.0, priorBeta: Double = 1.0)

Properties

alpha

val alpha: Double

Exploration scale on the confidence-bound term. Theoretical default is 1.0.

arm

open override val arm: BernoulliArm

Per-arm cumulator spec; determines the prior pseudo-counts, value encoding, and result shape that evaluate consumes.

Functions

addArm

open override fun addArm(snapshot: BernoulliSumResult)

Hook called when a new arm joins the population. Lets stateful policies fold the new arm's snapshot into their global counters (UCB's total-samples, UCB1Normal's arm count). Default no-op.

evaluate

open override fun evaluate(snapshot: BernoulliSumResult, step: Long, rng: Random): Double

Score an arm given its current snapshot. Higher scores are preferred by the bandit. step is the global update count (for time-dependent exploration schedules); rng is the bandit's shared com.eignex.kumulant.bandit.Bandit.random (consumed by sampling policies).
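Since higher scores are preferred, arm selection amounts to an argmax over per-arm scores. The following is a minimal sketch of that idea; selectArm is a hypothetical function, and the library's Bandit class may select arms and break ties differently.

```kotlin
// Hypothetical selection loop; not the library's Bandit implementation.
// Returns the index of the highest-scoring snapshot; ties go to the lowest index.
fun <S> selectArm(snapshots: List<S>, score: (S) -> Double): Int {
    require(snapshots.isNotEmpty()) { "at least one arm is required" }
    var best = 0
    var bestScore = score(snapshots[0])
    for (i in 1 until snapshots.size) {
        val s = score(snapshots[i])
        // Strict comparison keeps the earlier arm when scores are equal.
        if (s > bestScore) {
            best = i
            bestScore = s
        }
    }
    return best
}
```

Because an unexplored arm scores +infinity, a loop like this picks it ahead of every arm with a finite score.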

removeArm

open override fun removeArm(snapshot: BernoulliSumResult)

Hook called when an arm leaves the population. Inverse of addArm; lets stateful policies remove the departing arm's contribution from their global counters. Default no-op.
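The add and remove hooks can be pictured as keeping a global sample total in sync with the arm population. The class below is a hypothetical stand-in, not the library's code; it takes plain sample counts rather than BernoulliSumResult snapshots, whose fields are not shown on this page.

```kotlin
// Hypothetical stand-in for a stateful policy's global counter.
class TotalSamplesCounter {
    var totalSamples: Double = 0.0
        private set

    // Mirrors addArm: fold a joining arm's existing sample count into the total.
    fun onArmAdded(armSamples: Double) {
        totalSamples += armSamples
    }

    // Mirrors removeArm: subtract the departing arm's contribution.
    fun onArmRemoved(armSamples: Double) {
        totalSamples -= armSamples
    }

    // Mirrors update: each weighted observation grows the total.
    fun onUpdate(weight: Double = 1.0) {
        totalSamples += weight
    }
}
```

Keeping the total incrementally avoids re-summing every arm's samples each time a score is computed.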

update

open override fun update(stat: SeriesStat<BernoulliSumResult>, value: Double, weight: Double = 1.0)

Fold an observed reward value (with optional weight) into the per-arm stat. Default applies arm.encode first; policies with global counters (UCB families) override to update their counter alongside the stat update.

createArm

open fun createArm(): SeriesStat<BernoulliSumResult>

Allocate a fresh per-arm accumulator from the arm spec. Default delegates to arm.createStat(); override only if the policy needs a non-standard variant.