bandit.univariate

Context-free multi-armed bandits. Each round, the bandit picks one of K arms via choose(), the caller observes a reward, and update(arm, value, weight) folds it back into that arm's accumulator. There is no per-round feature vector; for that, see com.eignex.kumulant.bandit.contextual.

Bandit shells

MultiArmedBandit is the workhorse. It carries a BanditPolicy and a list of Arms; choose delegates to the policy's scoring rule. Most named bandits in the literature (UCB1, Thompson, epsilon-greedy, etc.) are just a policy swap on this shell.

Bandit	Selection rule
MultiArmedBandit	Argmax over per-arm policy scores, or whatever else the policy implements (joint sampling, etc.).
BoltzmannBandit	Softmax over per-arm means with a cooling temperature schedule.
Exp3Bandit	Adversarial bandit with exponential-weights updates and a regret bound under non-stationary reward distributions.
RouletteWheelBandit	Operator-selection roulette where arm probability is proportional to score. Used in meta-heuristics where each "arm" is a neighbourhood move and the score is the recent improvement rate.
TopTwoThompsonBandit	Top-two Thompson sampling: play the posterior-sample argmax with probability `beta`, otherwise resample until a different arm wins and play that challenger. Identifies the best arm faster than vanilla Thompson when many arms are competitive.

Policies

BanditPolicy is the scoring strategy plugged into MultiArmedBandit. The library ships the following:

Policy	Family
Greedy	Pure exploitation; argmax of point estimates. The baseline.
EpsilonGreedy, EpsilonDecreasing	Mix of exploitation and uniform-random exploration; epsilon either fixed or annealed.
UniformSelection	Pure exploration, used as a baseline.
UCB1, UCB1Normal, UCB1Tuned	Upper-confidence-bound family; variants differ in confidence-interval shape.
KlUcb	KL-UCB; tighter bound than UCB1 for Bernoulli arms.
Moss	MOSS bound; minimax-optimal regret in the stochastic setting.
UcbV	UCB-V; UCB with variance-aware confidence width.
ThompsonSampling	Posterior sampling: draw from each arm's posterior, play argmax.

Each policy reads its per-arm state through a Posterior adapter that projects the arm's com.eignex.kumulant.core.Result to whatever the scoring rule needs (a mean, a Beta posterior, a normal-gamma posterior, etc.). Most posteriors are stateless objects; GammaScalePosterior is the exception, a data class that carries the fixed shape of its Gamma likelihood.

Arms

Arm is the per-arm state contract. Each arm pairs a stat with the com.eignex.kumulant.core.Concurrency and reset story it needs:

Arm	Backing stat	Suits
BernoulliArm	com.eignex.kumulant.stat.summary.BernoulliSumStat + count	Binary rewards (click / no click, pass / fail).
MeanArm	com.eignex.kumulant.stat.summary.MeanStat	Continuous rewards where mean suffices.
NormalArm	com.eignex.kumulant.stat.summary.VarianceStat	Continuous rewards with normally-distributed noise; carries enough state for Thompson sampling with a normal-gamma posterior and for the policies that score a WeightedVarianceResult.
LogNormalArm	Welford over `log(value)`	Multiplicative rewards (revenue, latency).
MomentsArm	com.eignex.kumulant.stat.summary.MomentsStat	Higher-order shape matters (skewness / kurtosis aware scoring); also the snapshot type scored by UCB1Normal, UCB1Tuned, and UcbV.

CompositeArm (and CompositeSubArm) model multi-component rewards like zero-inflated lognormal revenue, without writing a per-shape arm class. Routing and score combination travel as com.eignex.kumulant.schema.ScalarExpr expressions, so the whole composite round-trips on the wire alongside the rest of the bandit config.

Wire portability

UnivariateBanditSpec is the sealed root of wire-portable bandit configs:

MultiArmedSpec: arm count + policy spec.
RouletteWheelSpec: roulette-wheel variant.
BoltzmannSpec, Exp3Spec, TopTwoThompsonSpec: the other family-specific specs, co-located here.

Configurations and policies round-trip through skema-based JSON / CBOR just like the com.eignex.kumulant.schema.StatSpec family. The materializer in com.eignex.kumulant.bandit takes a spec and a Random and returns the live bandit; pass the same seed across replicas for reproducible exploration.

Interface hierarchy

See com.eignex.kumulant.bandit for the action/state interface split: which bandits expose com.eignex.kumulant.bandit.Scorable, which implement com.eignex.kumulant.bandit.PerArmBandit, and where joint-sampling bandits diverge from the per-arm-score path.

Types

Arm

@Serializable

sealed interface Arm<R : Result>

Recipe for one bandit arm's accumulator side: how to build a freshly-seeded SeriesStat for that arm, and how to encode a raw observation before folding it into the stat. Posteriors and BanditPolicys pair with arm specs of the same R.

BanditPolicy

interface BanditPolicy<R : Result>

Scoring strategy for a com.eignex.kumulant.bandit.univariate.MultiArmedBandit. Decides which arm to play given snapshots of each arm's sufficient statistic R. The bandit calls evaluate for every arm and picks the argmax; the policy is the entire exploration/exploitation knob.

BanditPolicySpec

@Serializable

sealed interface BanditPolicySpec<R : Result>

Wire-portable specification for a BanditPolicy.

BernoulliArm

@Serializable

@SerialName(value = "BernoulliArm")

data class BernoulliArm(val priorAlpha: Double = 1.0, val priorBeta: Double = 1.0) : Arm<BernoulliSumResult>

Bernoulli arm. The reward is binary {0, 1} and the unknown is the success probability p. A Beta(priorAlpha, priorBeta) prior is conjugate to the Bernoulli likelihood; the posterior is Beta(priorAlpha + successes, priorBeta + failures).
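
The conjugate update described above can be sketched in a few lines. This is a standalone illustration, not the library's code: BetaParams and betaPosterior are hypothetical names, and the counts are passed in directly rather than read from a BernoulliSumResult.

```kotlin
// Hypothetical helper types, not part of the library.
data class BetaParams(val alpha: Double, val beta: Double) {
    // Posterior mean of the success probability p.
    val mean: Double get() = alpha / (alpha + beta)
}

// Beta(priorAlpha, priorBeta) prior plus observed counts gives
// Beta(priorAlpha + successes, priorBeta + failures).
fun betaPosterior(
    priorAlpha: Double = 1.0,
    priorBeta: Double = 1.0,
    successes: Int,
    failures: Int,
): BetaParams = BetaParams(priorAlpha + successes, priorBeta + failures)
```

With the default uniform prior, three successes and one failure give Beta(4, 2), whose mean is 2/3.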

BetaPosterior

@Serializable

@SerialName(value = "BetaPosterior")

data object BetaPosterior : Posterior<BernoulliSumResult>

Beta posterior over a Bernoulli rate. The Beta parameters are successes and trials - successes; both must be positive, i.e. the snapshot must be prior-seeded.

BoltzmannBandit

class BoltzmannBandit(val nbrArms: Int, priorMean: Double = 0.0, priorWeight: Double = 0.02, val initialTau: Double = 1.0, val minTau: Double = 0.001, val decay: Double = 1.0, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<WeightedMeanResult>

Boltzmann exploration (a.k.a. softmax bandit): samples arm a with probability proportional to exp(mean[a] / tau(t)), where tau(t) is the temperature at round t and per-arm means are tracked by independent com.eignex.kumulant.stat.summary.MeanStat cells.
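
As a rough illustration of the sampling rule above, the sketch below computes the softmax probabilities for a given temperature and draws an arm from them. The function names are hypothetical, and the temperature schedule tau(t) is deliberately left out because its exact form is not spelled out here; the caller supplies the current tau.

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Softmax over arm means at temperature tau: p[a] is proportional to exp(mean[a] / tau).
// Subtracting the largest mean first avoids overflow and leaves the probabilities unchanged.
fun boltzmannProbabilities(means: DoubleArray, tau: Double): DoubleArray {
    require(means.isNotEmpty()) { "need at least one arm" }
    require(tau > 0.0) { "temperature must be positive" }
    var maxMean = means[0]
    for (m in means) if (m > maxMean) maxMean = m
    val weights = DoubleArray(means.size) { exp((means[it] - maxMean) / tau) }
    val total = weights.sum()
    return DoubleArray(means.size) { weights[it] / total }
}

// Draws an arm index from a discrete distribution by walking its cumulative sum.
fun sampleArm(probabilities: DoubleArray, random: Random): Int {
    var u = random.nextDouble()
    for (i in probabilities.indices) {
        u -= probabilities[i]
        if (u < 0.0) return i
    }
    return probabilities.lastIndex // rounding guard: u can stay >= 0 by a tiny margin
}
```

A high tau flattens the distribution toward uniform exploration; a tau near zero concentrates almost all mass on the arm with the highest mean.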

BoltzmannSpec

@Serializable

@SerialName(value = "Boltzmann")

data class BoltzmannSpec(val nbrArms: Int, val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val initialTau: Double = 1.0, val minTau: Double = 0.001, val decay: Double = 1.0) : UnivariateBanditSpec

Spec for BoltzmannBandit.

CompositeArm

@Serializable

@SerialName(value = "CompositeArm")

data class CompositeArm(val subArms: List<CompositeSubArm>) : Arm<ResultList<Result>>

Composite Arm built from N independent sub-arms. Each observation is routed through every sub-arm's CompositeSubArm.valueExpr / CompositeSubArm.weightExpr / CompositeSubArm.filter AST before being fed to the corresponding per-sub-arm accumulator. The composite result is a ResultList of the sub-snapshots; pair with CompositePosterior to combine sub-arm draws into a single score.

CompositePosterior

@Serializable

@SerialName(value = "CompositePosterior")

data class CompositePosterior(val subPosteriors: List<Posterior<*>>, val combine: ScalarExpr) : Posterior<ResultList<Result>>

Composite Posterior over the sub-snapshots produced by a CompositeArm. Each sub-posterior draws independently; the resulting samples are packed as V(0)..V(N-1) and reduced to a single score by the combine AST.

CompositeSubArm

@Serializable

@SerialName(value = "CompositeSubArm")

data class CompositeSubArm(val arm: Arm<*>, val valueExpr: ScalarExpr = X, val weightExpr: ScalarExpr = Const(1.0), val filter: BoolExpr? = null)

One leg of a CompositeArm: which arm receives observations, with optional AST-driven transformation of value, weight, and a filter predicate.

EpsilonDecreasing

class EpsilonDecreasing(val epsilon: Double = 2.0, val decay: Double = 0.5, priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02) : BanditPolicy<WeightedVarianceResult>

Annealed epsilon-greedy. Effective exploration probability decreases as sample count accumulates: eps(t) = min(1, epsilon / totalSamples^decay).
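
The schedule above is small enough to write out directly. A minimal sketch, assuming totalSamples is the total observation weight across all arms; annealedEpsilon is a hypothetical name, and the zero-sample branch is an assumption rather than documented behaviour.

```kotlin
import kotlin.math.min
import kotlin.math.pow

// eps(t) = min(1, epsilon / totalSamples^decay)
fun annealedEpsilon(totalSamples: Double, epsilon: Double = 2.0, decay: Double = 0.5): Double {
    // Assumption: with nothing observed yet, always explore.
    if (totalSamples <= 0.0) return 1.0
    return min(1.0, epsilon / totalSamples.pow(decay))
}
```

With the defaults (epsilon = 2.0, decay = 0.5) the policy explores on every round until 4 samples have accumulated, then the exploration probability falls off as 2 / sqrt(totalSamples): 0.2 at 100 samples, 0.1 at 400.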

EpsilonDecreasingSpec

@Serializable

@SerialName(value = "EpsilonDecreasing")

data class EpsilonDecreasingSpec(val epsilon: Double = 2.0, val decay: Double = 0.5, val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 0.02) : BanditPolicySpec<WeightedVarianceResult>

Spec for EpsilonDecreasing.

EpsilonGreedy

class EpsilonGreedy(val epsilon: Double = 0.1, priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02) : BanditPolicy<WeightedVarianceResult>

Epsilon-greedy: with probability epsilon pick a uniformly random arm (explore), otherwise pick the arm with the highest mean (exploit). The simplest exploration scheme that actually works; no math machinery, tune one knob.
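
The selection rule can be sketched as a single function. This is a direct, standalone rendering of the explore/exploit coin flip; epsilonGreedyChoose is a hypothetical name, and the library's policy presumably expresses the same behaviour through per-arm scores rather than choosing an index itself.

```kotlin
import kotlin.random.Random

// With probability epsilon pick a uniformly random arm, otherwise the arm with the highest mean.
fun epsilonGreedyChoose(means: DoubleArray, epsilon: Double, random: Random): Int {
    require(means.isNotEmpty()) { "need at least one arm" }
    if (random.nextDouble() < epsilon) return random.nextInt(means.size) // explore
    var best = 0
    for (i in means.indices) if (means[i] > means[best]) best = i       // exploit
    return best
}
```

Setting epsilon to 0.0 reduces this to Greedy, and setting it to 1.0 reduces it to UniformSelection.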

EpsilonGreedySpec

@Serializable

@SerialName(value = "EpsilonGreedy")

data class EpsilonGreedySpec(val epsilon: Double = 0.1, val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 0.02) : BanditPolicySpec<WeightedVarianceResult>

Spec for EpsilonGreedy.

Exp3ArmResult

@Serializable

@SerialName(value = "Exp3ArmResult")

data class Exp3ArmResult(val weight: Double) : Result

Per-arm snapshot for Exp3Bandit: the exponential-weight cell for one arm.

Exp3Bandit

class Exp3Bandit(val nbrArms: Int, val eta: Double = defaultEta(nbrArms), val gamma: Double = (nbrArms * eta).coerceAtMost(1.0), val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<Exp3ArmResult>

EXP3 (Auer, Cesa-Bianchi, Freund, Schapire 2002); adversarial multi-armed bandit over a fixed pool of nbrArms. Each round: compute play distribution p[a] = (1 - gamma) · w[a]/Σw + gamma/K, sample a ~ p, then on reward r ∈ [0,1] update w[a] *= exp(eta · r / p[a]) using the importance-sampling-corrected gain.
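
The two formulas above translate almost line for line into code. The sketch below is standalone and uses hypothetical function names; it keeps raw weights in a DoubleArray and does not address the overflow that very long runs can cause in a naive implementation.

```kotlin
import kotlin.math.exp

// Play distribution: p[a] = (1 - gamma) * w[a] / sum(w) + gamma / K.
fun exp3Distribution(weights: DoubleArray, gamma: Double): DoubleArray {
    val total = weights.sum()
    val k = weights.size
    return DoubleArray(k) { (1.0 - gamma) * weights[it] / total + gamma / k }
}

// Update for the arm that was played: w[a] *= exp(eta * r / p[a]).
// Dividing by p[a] is the importance-sampling correction: rarely played arms
// get a proportionally larger boost when they do pay off.
fun exp3Update(weights: DoubleArray, arm: Int, reward: Double, eta: Double, gamma: Double) {
    require(reward in 0.0..1.0) { "EXP3 expects rewards in [0, 1]" }
    val p = exp3Distribution(weights, gamma)[arm]
    weights[arm] *= exp(eta * reward / p)
}
```

The gamma / K term keeps every arm's play probability bounded away from zero, which in turn bounds the size of the importance-weighted gain.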

Exp3Spec

@Serializable

@SerialName(value = "Exp3")

data class Exp3Spec(val nbrArms: Int, val eta: Double? = null, val gamma: Double? = null) : UnivariateBanditSpec

Spec for Exp3Bandit. Pass null for eta / gamma to use the algorithm's defaults.

ExponentialGammaPosterior

@Serializable

@SerialName(value = "ExponentialGammaPosterior")

data object ExponentialGammaPosterior : Posterior<WeightedMeanResult>

Gamma posterior over an Exponential rate.

GammaScalePosterior

@Serializable

@SerialName(value = "GammaScalePosterior")

data class GammaScalePosterior(val fixedShape: Double) : Posterior<WeightedMeanResult>

Gamma posterior over the scale of a Gamma likelihood with fixed shape. The shape is supplied as the fixedShape constructor parameter rather than inferred from data, which is why this is a data class and not a data object.

GeometricBetaPosterior

@Serializable

@SerialName(value = "GeometricBetaPosterior")

data object GeometricBetaPosterior : Posterior<WeightedMeanResult>

Beta posterior over a Geometric success probability.

Greedy

class Greedy(priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02) : BanditPolicy<WeightedVarianceResult>

Pure-exploitation policy: always picks the arm with the highest posterior mean. No exploration at all; converges fastest to the apparent best arm but can lock into a suboptimal arm forever if early rewards mislead it.

GreedySpec

@Serializable

@SerialName(value = "Greedy")

data class GreedySpec(val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 0.02) : BanditPolicySpec<WeightedVarianceResult>

Spec for Greedy.

KlUcb

class KlUcb(val c: Double = 0.0, val tolerance: Double = 1.0E-6, priorAlpha: Double = 1.0, priorBeta: Double = 1.0) : BanditPolicy<BernoulliSumResult>

KL-UCB (Garivier & Cappé 2011). UCB variant for Bernoulli arms with a KL-divergence confidence bound instead of the Hoeffding bound UCB1 uses. Score is the largest q in [mean, 1] such that n * KL(mean, q) <= ln(t) + c * ln(ln(t)); computed by binary search with tolerance precision.
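
To make the binary search concrete, here is a minimal standalone sketch. The names are hypothetical, n is the arm's sample count and t the total count, and the handling of unplayed arms and of t <= 1 (where ln(ln(t)) is undefined) is an assumption, not something stated above.

```kotlin
import kotlin.math.ln

// Bernoulli KL divergence KL(p, q); inputs are clamped away from 0 and 1 to keep the logs finite.
fun bernoulliKl(p: Double, q: Double): Double {
    val eps = 1e-15
    val pc = p.coerceIn(eps, 1.0 - eps)
    val qc = q.coerceIn(eps, 1.0 - eps)
    return pc * ln(pc / qc) + (1.0 - pc) * ln((1.0 - pc) / (1.0 - qc))
}

// Largest q in [mean, 1] with n * KL(mean, q) <= ln(t) + c * ln(ln(t)).
fun klUcbScore(mean: Double, n: Double, t: Double, c: Double = 0.0, tolerance: Double = 1e-6): Double {
    if (n <= 0.0) return 1.0 // assumption: an unplayed arm gets the largest possible score
    val logT = ln(t.coerceAtLeast(1.0))
    // Assumption: skip the ln(ln(t)) term when it is undefined or switched off.
    val budget = logT + if (c != 0.0 && logT > 0.0) c * ln(logT) else 0.0
    var lo = mean.coerceIn(0.0, 1.0)
    var hi = 1.0
    // KL(mean, q) grows with q on [mean, 1], so bisection finds the boundary.
    while (hi - lo > tolerance) {
        val mid = (lo + hi) / 2.0
        if (n * bernoulliKl(mean, mid) <= budget) lo = mid else hi = mid
    }
    return lo
}
```

For mean = 0.5, n = 10, t = 100 this returns roughly 0.888; raising n to 1000 pulls the score down close to the mean.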

KlUcbSpec

@Serializable

@SerialName(value = "KlUcb")

data class KlUcbSpec(val c: Double = 0.0, val tolerance: Double = 1.0E-6, val priorAlpha: Double = 1.0, val priorBeta: Double = 1.0) : BanditPolicySpec<BernoulliSumResult>

Spec for KlUcb.

LogNormalArm

@Serializable

@SerialName(value = "LogNormalArm")

data class LogNormalArm(val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 2.0) : Arm<WeightedVarianceResult>

Like NormalArm but folds ln(value) into the stat via encode. The right pick when rewards are multiplicative rather than additive: revenue per session, latency in milliseconds, anything where the noise scales with the magnitude.
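
For intuition, the log-then-accumulate step can be sketched with a plain Welford accumulator. LogWelford is a hypothetical class, not the library's stat: it is unweighted and carries no prior, whereas the arm above takes priorMean, priorWeight, and priorSquaredDeviations.

```kotlin
import kotlin.math.ln

// Hypothetical accumulator: unweighted Welford over ln(value).
class LogWelford {
    var count: Long = 0L
        private set
    var mean: Double = 0.0
        private set
    var sumSquaredDeviations: Double = 0.0
        private set

    fun update(value: Double) {
        require(value > 0.0) { "log-normal rewards must be strictly positive" }
        val x = ln(value)                       // encode: move to the log scale
        count += 1
        val delta = x - mean
        mean += delta / count
        sumSquaredDeviations += delta * (x - mean) // uses the already-updated mean
    }

    // Sample variance of the log-rewards; 0.0 until two observations have arrived.
    val variance: Double
        get() = if (count > 1) sumSquaredDeviations / (count - 1) else 0.0
}
```

Note that mean and variance here describe ln(reward), so any score derived from them has to be mapped back with exp if it is to be compared on the original scale.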

LogNormalGammaPosterior

@Serializable

@SerialName(value = "LogNormalGammaPosterior")

data object LogNormalGammaPosterior : Posterior<WeightedVarianceResult>

Log-Normal-Gamma posterior: same draw as NormalGammaPosterior but exp-transformed back to the real scale. Intended for arms whose stat already accumulates log-rewards (see LogNormalArm's encode).

MeanArm

@Serializable

@SerialName(value = "MeanArm")

data class MeanArm(val priorMean: Double = 1.0, val priorWeight: Double = 0.01) : Arm<WeightedMeanResult>

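Single-moment arm; tracks the running mean but not variance. The right pick when the likelihood's sufficient statistic is one running sum (or equivalently a running mean × count), as with the factories that return ThompsonSampling<WeightedMeanResult>: PoissonTS, ExponentialTS, GeometricTS, and GammaScaleTS.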

MomentsArm

@Serializable

@SerialName(value = "MomentsArm")

data class MomentsArm(val priorMean: Double = 0.0, val priorWeight: Double = 0.02) : Arm<MomentsResult>

Moments-tracking arm. Backs MomentsStat under the hood, which means the snapshot exposes the raw second moment m2 (in addition to mean and variance); required by the variance-aware UCB policies that need mean-of-squares directly (UCB1Normal, UCB1Tuned, and UcbV all score a MomentsResult).

Moss

class Moss(val nbrArms: Int, priorMean: Double = 0.0, priorWeight: Double = 0.02) : BanditPolicy<WeightedMeanResult>

MOSS; Minimax Optimal Strategy in the Stochastic case (Audibert & Bubeck 2009). UCB variant where the confidence bound shrinks faster than UCB1 once an arm has accumulated more than t / K samples. Score is mean + sqrt(max(0, ln(t / (K * n))) / n).
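
The score formula above, written out as a standalone sketch. mossScore is a hypothetical name; n is the arm's sample count, t the total count, and the +infinity for unplayed arms is an assumption borrowed from the UCB1 convention.

```kotlin
import kotlin.math.ln
import kotlin.math.max
import kotlin.math.sqrt

// score = mean + sqrt(max(0, ln(t / (K * n))) / n)
fun mossScore(mean: Double, n: Double, t: Double, nbrArms: Int): Double {
    if (n <= 0.0) return Double.POSITIVE_INFINITY // assumption: force a first pull
    return mean + sqrt(max(0.0, ln(t / (nbrArms * n))) / n)
}
```

Once an arm holds at least t / K samples the logarithm is non-positive, the max clamps it to zero, and the score collapses to the plain mean. With K = 2 and t = 100, an arm with 10 samples still carries a bonus of about 0.40, while one with 50 or more carries none.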

MossSpec

@Serializable

@SerialName(value = "Moss")

data class MossSpec(val nbrArms: Int, val priorMean: Double = 0.0, val priorWeight: Double = 0.02) : BanditPolicySpec<WeightedMeanResult>

Spec for Moss.

MultiArmedBandit

class MultiArmedBandit<R : Result>(val nbrArms: Int, val policy: BanditPolicy<R>, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<R>, Scorable

Univariate bandit with a fixed number of independent arms, each backed by a kumulant SeriesStat; on every choose the bandit asks the policy to score a fresh snapshot per arm and picks the argmax.

MultiArmedSpec

@Serializable

@SerialName(value = "MultiArmed")

data class MultiArmedSpec<R : Result>(val nbrArms: Int, val policy: BanditPolicySpec<R>) : UnivariateBanditSpec

Spec for MultiArmedBandit.

NormalArm

@Serializable

@SerialName(value = "NormalArm")

data class NormalArm(val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 0.02) : Arm<WeightedVarianceResult>

Gaussian arm with a Normal-Gamma prior (unknown mean and variance). Tracks both the running mean and the sum of squared deviations, which gives NormalGammaPosterior enough to draw (mean, variance) jointly and gives the variance-aware policies (Greedy, EpsilonGreedy) a reasonable variance estimate.

NormalGammaPosterior

@Serializable

@SerialName(value = "NormalGammaPosterior")

data object NormalGammaPosterior : Posterior<WeightedVarianceResult>

Normal-Gamma posterior over a normal mean/variance. Draws (variance, mean) jointly: variance ~ Inverse-Gamma(n/2, n*s^2/2), then mean | variance ~ Normal(snapshot.mean, variance/n).

PoissonGammaPosterior

@Serializable

@SerialName(value = "PoissonGammaPosterior")

data object PoissonGammaPosterior : Posterior<WeightedMeanResult>

Gamma posterior over a Poisson rate: Gamma(sum, totalWeights).

Posterior

@Serializable

sealed interface Posterior<R : Result>

Stateless conjugate posterior over a univariate likelihood, parameterised by the sufficient-statistic snapshot R. A Posterior is a pure (snapshot, rng) -> sample function: no priors, no per-arm state, no update path. Arm lifecycle, value encoding, and prior seeding all live in Arm.

RouletteWheelArmResult

@Serializable

@SerialName(value = "RouletteWheelArmResult")

data class RouletteWheelArmResult(val weight: Double, val accumulatedScore: Double, val callCount: Int) : Result

Per-arm state snapshot for RouletteWheelBandit. Exposes the current weight plus the running segment counters callers may want to inspect for debugging.

RouletteWheelBandit

class RouletteWheelBandit(val nbrArms: Int, val reactionFactor: Double = 0.1, val segmentLength: Int = 10, val initialWeight: Double = 1.0, val minWeight: Double = 0.01, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<RouletteWheelArmResult>, Scorable

Adaptive operator-selection bandit in the Ropke-Pisinger 2006 ALNS scheme: each arm carries a weight, choose samples proportional to weights (roulette wheel), and weights re-balance in batches.
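
A rough sketch of the two halves of this scheme follows. rouletteSelect is ordinary weight-proportional sampling. rebalance uses the segment update commonly given for ALNS, w <- (1 - r) * w + r * score / calls; that formula is an assumption here, since the exact re-balancing rule is not spelled out above. Both function names are hypothetical.

```kotlin
import kotlin.random.Random

// Roulette wheel: arm i is chosen with probability weights[i] / sum(weights).
fun rouletteSelect(weights: DoubleArray, random: Random): Int {
    val total = weights.sum()
    require(total > 0.0) { "at least one weight must be positive" }
    var u = random.nextDouble() * total
    for (i in weights.indices) {
        u -= weights[i]
        if (u < 0.0) return i
    }
    return weights.lastIndex // rounding guard
}

// Assumed end-of-segment update: blend the old weight with the arm's average score
// over the segment, then clamp so no arm's weight drops below minWeight.
fun rebalance(
    weight: Double,
    accumulatedScore: Double,
    callCount: Int,
    reactionFactor: Double = 0.1,
    minWeight: Double = 0.01,
): Double {
    if (callCount == 0) return weight // arm was not used this segment; leave it alone
    val updated = (1.0 - reactionFactor) * weight + reactionFactor * accumulatedScore / callCount
    return updated.coerceAtLeast(minWeight)
}
```

Under this assumed rule, a small reactionFactor makes weights drift slowly, and the minWeight floor keeps a poorly performing operator from being starved permanently.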

RouletteWheelSpec

@Serializable

@SerialName(value = "RouletteWheel")

data class RouletteWheelSpec(val nbrArms: Int, val reactionFactor: Double = 0.1, val segmentLength: Int = 10, val initialWeight: Double = 1.0, val minWeight: Double = 0.01) : UnivariateBanditSpec

Spec for RouletteWheelBandit.

ThompsonSampling

class ThompsonSampling<R : Result>(val arm: Arm<R>, val posterior: Posterior<R>) : BanditPolicy<R>

Thompson sampling: score each arm by a draw from its conjugate posterior given the snapshot. The bandit then picks the arm with the highest sample. There is no explicit exploration knob; exploration falls out of the posterior variance, which shrinks as data accumulates.

ThompsonSamplingSpec

@Serializable

@SerialName(value = "ThompsonSampling")

data class ThompsonSamplingSpec<R : Result>(val arm: Arm<R>, val posterior: Posterior<R>) : BanditPolicySpec<R>

Spec for ThompsonSampling.

TopTwoThompsonBandit

class TopTwoThompsonBandit<R : Result>(val nbrArms: Int, val policy: ThompsonSampling<R>, val beta: Double = 0.5, val maxResamples: Int = 32, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<R>

Top-Two Thompson Sampling (Russo 2020); pure-exploration variant of Thompson sampling for best-arm identification: sample every arm's posterior, take the argmax arm1, play it with probability beta, or else resample until the argmax differs from arm1 and play that runner-up.
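
The leader/challenger logic can be sketched independently of any particular posterior. In the sketch below, drawSamples is a hypothetical stand-in that returns one fresh posterior draw per arm; the fallback to the leader when maxResamples is exhausted is an assumption, as the behaviour in that case is not stated above.

```kotlin
import kotlin.random.Random

// Index of the largest element (first one wins on ties).
fun argmaxIndex(xs: DoubleArray): Int {
    var best = 0
    for (i in xs.indices) if (xs[i] > xs[best]) best = i
    return best
}

fun topTwoChoose(
    drawSamples: () -> DoubleArray, // hypothetical: one posterior sample per arm
    beta: Double = 0.5,
    maxResamples: Int = 32,
    random: Random = Random.Default,
): Int {
    val leader = argmaxIndex(drawSamples())
    if (random.nextDouble() < beta) return leader // play the leader with probability beta
    repeat(maxResamples) {
        val challenger = argmaxIndex(drawSamples())
        if (challenger != leader) return challenger // first different winner is the runner-up
    }
    return leader // assumption: give up and play the leader once the resample budget is spent
}
```

The resample cap matters late in a run: once one arm dominates the posterior, a different argmax becomes rare and an uncapped loop could spin for a long time.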

TopTwoThompsonSpec

@Serializable

@SerialName(value = "TopTwoThompson")

data class TopTwoThompsonSpec<R : Result>(val nbrArms: Int, val policy: ThompsonSamplingSpec<R>, val beta: Double = 0.5, val maxResamples: Int = 32) : UnivariateBanditSpec

Spec for TopTwoThompsonBandit.

UCB1

class UCB1(val alpha: Double = 1.0, priorAlpha: Double = 1.0, priorBeta: Double = 1.0) : BanditPolicy<BernoulliSumResult>

Classical UCB1 (Auer, Cesa-Bianchi, Fischer 2002). Score is mean + alpha * sqrt(2 * ln(totalSamples) / armSamples); exploitation (running mean) plus a confidence bound that shrinks as the arm accumulates pulls. Unexplored arms get +infinity so they're tried at least once.
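
Written as code, the score is a one-liner plus the unexplored-arm guard. A standalone sketch with a hypothetical function name; clamping totalSamples to at least 1 before taking the logarithm is an assumption made to keep the square root defined.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// score = mean + alpha * sqrt(2 * ln(totalSamples) / armSamples)
fun ucb1Score(mean: Double, armSamples: Double, totalSamples: Double, alpha: Double = 1.0): Double {
    if (armSamples <= 0.0) return Double.POSITIVE_INFINITY // unexplored arms are tried first
    val logT = ln(totalSamples.coerceAtLeast(1.0))         // assumption: avoid a negative log
    return mean + alpha * sqrt(2.0 * logT / armSamples)
}
```

The bonus grows slowly with the total count and shrinks with the arm's own count, so a neglected arm's score keeps creeping up until it gets pulled again.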

UCB1Normal

class UCB1Normal(val alpha: Double = 1.0, priorMean: Double = 0.0, priorWeight: Double = 0.02) : BanditPolicy<MomentsResult>

UCB1-Normal (Auer et al. 2002). Variance-aware UCB for Gaussian rewards; uses the sample variance derived from the MomentsResult snapshot to scale the confidence bound. Reach for it when rewards are roughly Gaussian and unbounded, so that UCB1's [0, 1] assumption doesn't hold.

Ucb1NormalSpec

@Serializable

@SerialName(value = "UCB1Normal")

data class Ucb1NormalSpec(val alpha: Double = 1.0, val priorMean: Double = 0.0, val priorWeight: Double = 0.02) : BanditPolicySpec<MomentsResult>

Spec for UCB1Normal.

Ucb1Spec

@Serializable

@SerialName(value = "UCB1")

data class Ucb1Spec(val alpha: Double = 1.0, val priorAlpha: Double = 1.0, val priorBeta: Double = 1.0) : BanditPolicySpec<BernoulliSumResult>

Spec for UCB1.

UCB1Tuned

class UCB1Tuned(val alpha: Double = 1.0, priorMean: Double = 0.0, priorWeight: Double = 0.02) : BanditPolicy<MomentsResult>

UCB1-Tuned (Auer et al. 2002). Same shape as UCB1 but the confidence bound multiplier uses an upper bound on the variance: min(0.25, v) where v is the sample variance plus a small padding term. Tighter bound than plain UCB1 when the empirical variance is well below 0.25; degrades gracefully to UCB1 when the variance is uninformative.
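
A minimal sketch of the tuned index, following the form given in the Auer et al. paper: the padding term is taken to be sqrt(2 * ln(t) / n), and the bonus is sqrt(ln(t) / n * min(0.25, v)). Whether the library uses exactly this padding, and where its alpha multiplier enters, is not stated above, so alpha is left out and the function name is hypothetical.

```kotlin
import kotlin.math.ln
import kotlin.math.min
import kotlin.math.sqrt

fun ucb1TunedScore(mean: Double, variance: Double, n: Double, t: Double): Double {
    if (n <= 0.0) return Double.POSITIVE_INFINITY // assumption: force a first pull
    val logT = ln(t.coerceAtLeast(1.0))
    // Variance upper bound: sample variance plus the padding term from the paper.
    val v = variance + sqrt(2.0 * logT / n)
    // 0.25 is the largest variance any [0, 1]-bounded reward can have.
    return mean + sqrt(logT / n * min(0.25, v))
}
```

When the padded variance exceeds 0.25 the cap takes over and the bonus no longer depends on the arm's observed variance at all.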

Ucb1TunedSpec

@Serializable

@SerialName(value = "UCB1Tuned")

data class Ucb1TunedSpec(val alpha: Double = 1.0, val priorMean: Double = 0.0, val priorWeight: Double = 0.02) : BanditPolicySpec<MomentsResult>

Spec for UCB1Tuned.

UcbV

class UcbV(val zeta: Double = 1.2, val c: Double = 1.0, priorMean: Double = 0.0, priorWeight: Double = 0.02) : BanditPolicy<MomentsResult>

UCB-V; variance-aware UCB with finite-sample honesty (Audibert, Munos, Szepesvári 2009). Score is mean + sqrt(2 * V * zeta * ln(t) / n) + 3 * c * zeta * ln(t) / n, where V is the running variance from the MomentsResult snapshot.
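
The score above, written out as a standalone sketch. ucbVScore is a hypothetical name; n is the arm's sample count, t the total count, and the unplayed-arm and t < 1 guards are assumptions.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// score = mean + sqrt(2 * V * zeta * ln(t) / n) + 3 * c * zeta * ln(t) / n
fun ucbVScore(
    mean: Double,
    variance: Double,
    n: Double,
    t: Double,
    zeta: Double = 1.2,
    c: Double = 1.0,
): Double {
    if (n <= 0.0) return Double.POSITIVE_INFINITY // assumption: force a first pull
    val logT = ln(t.coerceAtLeast(1.0))           // assumption: avoid a negative log
    val varianceTerm = sqrt(2.0 * variance * zeta * logT / n)
    val correctionTerm = 3.0 * c * zeta * logT / n
    return mean + varianceTerm + correctionTerm
}
```

The second term decays as 1/n rather than 1/sqrt(n); it keeps the bound from collapsing when an arm's first few observations happen to show zero variance.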

UcbVSpec

@Serializable

@SerialName(value = "UcbV")

data class UcbVSpec(val zeta: Double = 1.2, val c: Double = 1.0, val priorMean: Double = 0.0, val priorWeight: Double = 0.02) : BanditPolicySpec<MomentsResult>

Spec for UcbV.

UniformSelection

class UniformSelection(priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02) : BanditPolicy<WeightedVarianceResult>

Pure-exploration policy: every evaluate returns a fresh uniform draw, so the bandit picks arms uniformly at random regardless of observations. No exploitation at all; the opposite extreme of Greedy.

UniformSelectionSpec

@Serializable

@SerialName(value = "UniformSelection")

data class UniformSelectionSpec(val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 0.02) : BanditPolicySpec<WeightedVarianceResult>

Spec for UniformSelection.

UnivariateBanditSpec

@Serializable

sealed interface UnivariateBanditSpec

Wire-portable specification for a univariate bandit instance.

Functions

BetaBernoulliTS

fun BetaBernoulliTS(priorAlpha: Double = 1.0, priorBeta: Double = 1.0): ThompsonSampling<BernoulliSumResult>

Thompson sampling over a Beta(priorAlpha, priorBeta) prior on a Bernoulli reward.

ExponentialTS

fun ExponentialTS(priorMean: Double = 1.0, priorWeight: Double = 0.01): ThompsonSampling<WeightedMeanResult>

Thompson sampling over an exponential reward with a Gamma prior on the rate.

GammaScaleTS

fun GammaScaleTS(fixedShape: Double, priorMean: Double = 1.0, priorWeight: Double = 0.1): ThompsonSampling<WeightedMeanResult>

Thompson sampling over a Gamma reward with known shape and Gamma prior on the scale.

GeometricTS

fun GeometricTS(priorMean: Double = 2.0, priorWeight: Double = 1.0): ThompsonSampling<WeightedMeanResult>

Thompson sampling over a geometric reward with a Beta prior on the success probability.

LogNormalTS

fun LogNormalTS(priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 2.0): ThompsonSampling<WeightedVarianceResult>

Thompson sampling over a log-normal reward via Normal-Gamma on log(value).

NormalTS

fun NormalTS(priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02): ThompsonSampling<WeightedVarianceResult>

Thompson sampling over a Normal-Gamma prior; unknown mean and variance.

PoissonTS

fun PoissonTS(priorMean: Double = 1.0, priorWeight: Double = 0.01): ThompsonSampling<WeightedMeanResult>

Thompson sampling over a Poisson reward with a Gamma prior on the rate.

warmStart

fun BernoulliArm.Companion.warmStart(global: BernoulliSumResult, shrinkage: Double = 1.0): BernoulliArm

Warm-started BernoulliArm from a global Bernoulli snapshot.

fun LogNormalArm.Companion.warmStart(global: WeightedVarianceResult, shrinkage: Double = 1.0): LogNormalArm

Warm-started LogNormalArm from a global weighted-variance snapshot on the log scale (caller is responsible for ensuring the snapshot is over ln(reward)).

fun MeanArm.Companion.warmStart(global: WeightedMeanResult, shrinkage: Double = 1.0): MeanArm

Warm-started MeanArm from a global weighted-mean snapshot.

fun MomentsArm.Companion.warmStart(global: MomentsResult, shrinkage: Double = 1.0): MomentsArm

Warm-started MomentsArm from a global moments snapshot.

fun NormalArm.Companion.warmStart(global: WeightedVarianceResult, shrinkage: Double = 1.0): NormalArm

Warm-started NormalArm from a global weighted-variance snapshot. The arm's prior variance is preserved from the global; only the prior weight is shrunk.