kumulant

ThompsonSampling

class ThompsonSampling<R : Result>(val arm: Arm<R>, val posterior: Posterior<R>) : BanditPolicy<R>

Thompson sampling: scores each arm with a draw from its conjugate posterior, given the arm's snapshot. The bandit then picks the arm with the highest sample. There is no explicit exploration knob; exploration falls out of the posterior variance, which shrinks as data accumulates.

Pair an Arm with a Posterior of the same result type R.

Stateless across arms; addArm / removeArm are no-ops because no global counter is involved.
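
The sampling step described above can be sketched in isolation. This is a minimal illustration, not kumulant's implementation: `GaussianSnapshot`, `nextGaussian`, `sampleScore`, and `selectArm` are hypothetical names, and the posterior assumes unit-variance Gaussian rewards with a N(0, 1) prior on each arm's mean.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.sqrt
import kotlin.random.Random

// Hypothetical per-arm snapshot: observation count and running mean of rewards.
data class GaussianSnapshot(val count: Long, val mean: Double)

// Standard normal draw via Box-Muller; 1.0 - nextDouble() keeps the log argument in (0, 1].
fun nextGaussian(rng: Random): Double {
    val u1 = 1.0 - rng.nextDouble()
    val u2 = rng.nextDouble()
    return sqrt(-2.0 * ln(u1)) * cos(2.0 * PI * u2)
}

// Draw from the posterior over the arm's mean reward. Assumes unit-variance
// Gaussian rewards and a N(0, 1) prior, which acts like one pseudo-observation at 0.
fun sampleScore(snapshot: GaussianSnapshot, rng: Random): Double {
    val precision = snapshot.count + 1.0
    val posteriorMean = snapshot.count * snapshot.mean / precision
    val posteriorStd = sqrt(1.0 / precision)
    return posteriorMean + posteriorStd * nextGaussian(rng)
}

// Pick the index of the arm whose posterior draw is highest.
fun selectArm(snapshots: List<GaussianSnapshot>, rng: Random): Int {
    require(snapshots.isNotEmpty()) { "need at least one arm" }
    var best = 0
    var bestScore = Double.NEGATIVE_INFINITY
    for ((i, s) in snapshots.withIndex()) {
        val score = sampleScore(s, rng)
        if (score > bestScore) {
            best = i
            bestScore = score
        }
    }
    return best
}
```

An arm with few observations has a wide posterior, so its draws occasionally exceed those of a better-known arm and it gets tried. As its count grows the posterior standard deviation shrinks and the draws settle near the observed mean.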

Constructors

constructor(arm: Arm<R>, posterior: Posterior<R>)

Properties

open override val arm: Arm<R>

Per-arm cumulator spec; determines the prior pseudo-counts, value encoding, and result shape that evaluate consumes.

val posterior: Posterior<R>

Stateless sampler used to draw a score from each arm's snapshot.

Functions

open fun addArm(snapshot: R)

Hook called when a new arm joins the population. Lets stateful policies fold the new arm's snapshot into their global counters (UCB's total-samples, UCB1Normal's arm count). Default no-op.

open fun createArm(): SeriesStat<R>

Allocate a fresh per-arm accumulator from the arm spec. Default delegates to arm.createStat(); override only if the policy needs a non-standard variant.

open override fun evaluate(snapshot: R, step: Long, rng: Random): Double

Score an arm given its current snapshot. Higher scores are preferred by the bandit. step is the global update count (for time-dependent exploration schedules); rng is the bandit's shared com.eignex.kumulant.bandit.Bandit.random (consumed by sampling policies).
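
To show how a scoring function with this parameter shape might use step instead of rng, here is a UCB1-style sketch. It is illustrative only and not taken from kumulant: `MeanSnapshot` and `ucb1Score` are hypothetical names, and the bonus term is the textbook UCB1 formula.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt
import kotlin.random.Random

// Hypothetical snapshot: number of pulls and running mean reward for one arm.
data class MeanSnapshot(val count: Long, val mean: Double)

// UCB1-style score: the exploration bonus grows with the global step and
// shrinks with the arm's own count. rng is accepted but unused, since the
// score is deterministic.
fun ucb1Score(snapshot: MeanSnapshot, step: Long, rng: Random): Double {
    // Untried arms score highest so each arm is pulled at least once.
    if (snapshot.count == 0L) return Double.POSITIVE_INFINITY
    // Clamp step to 1 so the logarithm is never negative.
    val logStep = ln(maxOf(step, 1L).toDouble())
    return snapshot.mean + sqrt(2.0 * logStep / snapshot.count)
}
```

A sampling policy such as Thompson sampling does the opposite: it draws from rng and has no need for step.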

open fun removeArm(snapshot: R)

Hook called when an arm leaves the population. Inverse of addArm; lets stateful policies remove the departing arm's contribution from their global counters. Default no-op.
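
The add/remove pairing can be pictured with a small counter-based sketch. `CountSnapshot` and `TotalSamplesTracker` are hypothetical names used only for illustration; they are not part of kumulant.

```kotlin
// Hypothetical snapshot exposing only the arm's sample count.
data class CountSnapshot(val count: Long)

// Hypothetical fragment of a stateful policy: tracks total samples across all live arms.
class TotalSamplesTracker {
    var totalSamples: Long = 0
        private set

    // A joining arm contributes its existing samples to the global total.
    fun addArm(snapshot: CountSnapshot) {
        totalSamples += snapshot.count
    }

    // A departing arm takes its contribution back out, undoing addArm.
    fun removeArm(snapshot: CountSnapshot) {
        totalSamples -= snapshot.count
    }
}
```

A policy that keeps no such counter, as ThompsonSampling is described above, has nothing to adjust in either hook.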

open fun update(stat: SeriesStat<R>, value: Double, weight: Double = 1.0)

Fold an observed reward value (with optional weight) into the per-arm stat. Default applies arm.encode first; policies with global counters (UCB families) override to update their counter alongside the stat update.
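
As a rough picture of encode-then-fold, the sketch below applies an encoding step and then updates a weighted running mean. All names are hypothetical: `WeightedMeanStat` stands in for a per-arm stat, and `encode` here simply clamps to [0, 1], which is an assumption and not kumulant's encoding.

```kotlin
// Hypothetical per-arm accumulator: a weighted running mean.
class WeightedMeanStat {
    var weightSum = 0.0
        private set
    var mean = 0.0
        private set

    // Incremental weighted-mean update; equivalent to sum(w * x) / sum(w).
    fun update(value: Double, weight: Double = 1.0) {
        require(weight > 0.0) { "weight must be positive" }
        weightSum += weight
        mean += (weight / weightSum) * (value - mean)
    }
}

// Hypothetical encoding step: clamp rewards into [0, 1] before accumulation.
fun encode(value: Double): Double = value.coerceIn(0.0, 1.0)

// Encode first, then fold the encoded value into the stat.
fun updateEncoded(stat: WeightedMeanStat, value: Double, weight: Double = 1.0) {
    stat.update(encode(value), weight)
}
```

A weight of 3.0 has the same effect on the mean as observing the same value three times with the default weight.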
