kumulant

bandit.univariate

Indexless multi-armed bandits. Each round, the bandit picks one of K arms via choose(), the caller observes a reward, and update(arm, value, weight) folds it back into that arm's accumulator. No per-round feature vector; for that, see com.eignex.kumulant.bandit.contextual.

Bandit shells

MultiArmedBandit is the workhorse. It carries a BanditPolicy and a list of Arms; choose delegates to the policy's scoring rule. Most named bandits in the literature (UCB1, Thompson, epsilon-greedy, etc.) are just a policy swap on this shell.

| Bandit | Selection rule |
| --- | --- |
| MultiArmedBandit | Argmax over per-arm policy scores, or whatever else the policy implements (joint sampling, etc.). |
| BoltzmannBandit | Softmax over per-arm means with a cooling temperature schedule. |
| Exp3Bandit | Adversarial bandit with exponential-weights updates and a regret bound under non-stationary reward distributions. |
| RouletteWheelBandit | Operator-selection roulette where arm probability is proportional to score. Used in meta-heuristics where each "arm" is a neighbourhood move and the score is the recent improvement rate. |
| TopTwoThompsonBandit | Top-two Thompson sampling: draw two samples, play the second-best with probability 1 - beta. Identifies the best arm faster than vanilla Thompson when many arms are competitive. |

Policies

BanditPolicy is the scoring strategy plugged into MultiArmedBandit. The library ships every commonly-cited one:

| Policy | Family |
| --- | --- |
| Greedy | Pure exploitation; argmax of point estimates. The baseline. |
| EpsilonGreedy, EpsilonDecreasing | Mix of exploitation and uniform-random exploration; epsilon either fixed or annealed. |
| UniformSelection | Pure exploration, used as a baseline. |
| UCB1, UCB1Normal, UCB1Tuned | Upper-confidence-bound family; variants differ in confidence-interval shape. |
| KlUcb | KL-UCB; tighter bound than UCB1 for Bernoulli arms. |
| Moss | MOSS bound; near-optimal regret across stationary settings. |
| UcbV | UCB-V; UCB with variance-aware confidence width. |
| ThompsonSampling | Posterior sampling: draw from each arm's posterior, play argmax. |

Each policy reads its per-arm state through a Posterior adapter that projects the arm's com.eignex.kumulant.core.Result to whatever the scoring rule needs (a mean, a Beta posterior, a normal-gamma posterior, etc.). The GammaScalePosterior is the canonical example used by BoltzmannBandit for variance-scaled softmax temperatures.

Arms

Arm is the per-arm state contract. Each arm pairs a stat with the com.eignex.kumulant.core.Concurrency and reset story it needs:

| Arm | Backing stat | Suits |
| --- | --- | --- |
| BernoulliArm | com.eignex.kumulant.stat.summary.BernoulliSumStat + count | Binary rewards (click / no click, pass / fail). |
| MeanArm | com.eignex.kumulant.stat.summary.MeanStat | Continuous rewards where mean suffices. |
| NormalArm | com.eignex.kumulant.stat.summary.VarianceStat | Continuous rewards with normally-distributed noise; carries enough state for UCB-V and Thompson with normal-gamma. |
| LogNormalArm | Welford over log(value) | Multiplicative rewards (revenue, latency). |
| MomentsArm | com.eignex.kumulant.stat.summary.MomentsStat | Higher-order shape matters (skewness / kurtosis aware scoring). |

CompositeArm (and CompositeSubArm) model multi-component rewards like zero-inflated lognormal revenue, without writing a per-shape arm class. Routing and score combination travel as com.eignex.kumulant.schema.ScalarExpr expressions, so the whole composite round-trips on the wire alongside the rest of the bandit config.

Wire portability

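UnivariateBanditSpec is the sealed root of wire-portable bandit configs. Its implementations listed on this page are BoltzmannSpec, Exp3Spec, MultiArmedSpec, RouletteWheelSpec, and TopTwoThompsonSpec.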

Configurations and policies round-trip through skema-based JSON / CBOR just like the com.eignex.kumulant.schema.StatSpec family. The materializer in com.eignex.kumulant.bandit takes a spec and a Random and returns the live bandit; pass the same seed across replicas for reproducible exploration.

Interface hierarchy

See com.eignex.kumulant.bandit for the action/state interface split: which bandits expose com.eignex.kumulant.bandit.Scorable, which implement com.eignex.kumulant.bandit.PerArmBandit, and where joint-sampling bandits diverge from the per-arm-score path.

Types

@Serializable
sealed interface Arm<R : Result>

Recipe for one bandit arm's accumulator side: how to build a freshly-seeded SeriesStat for that arm, and how to encode a raw observation before folding it into the stat. Posterior and BanditPolicy instances pair with arm specs of the same R.

interface BanditPolicy<R : Result>

Scoring strategy for a com.eignex.kumulant.bandit.univariate.MultiArmedBandit. Decides which arm to play given snapshots of each arm's sufficient statistic R. The bandit calls evaluate for every arm and picks the argmax; the policy is the entire exploration/exploitation knob.

@Serializable
sealed interface BanditPolicySpec<R : Result>

Wire-portable specification for a BanditPolicy.

@Serializable
@SerialName(value = "BernoulliArm")
data class BernoulliArm(val priorAlpha: Double = 1.0, val priorBeta: Double = 1.0) : Arm<BernoulliSumResult>

Bernoulli arm. The reward is binary {0, 1} and the unknown is the success probability p. A Beta(priorAlpha, priorBeta) prior is conjugate to the Bernoulli likelihood; the posterior is Beta(priorAlpha + successes, priorBeta + failures).
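
The conjugate update described above is plain addition on the Beta parameters. A minimal sketch, independent of the library; BetaParams and betaPosterior are hypothetical names used only for illustration:

```kotlin
// Hypothetical helper types, not part of kumulant.
data class BetaParams(val alpha: Double, val beta: Double) {
    // Posterior mean of the success probability p.
    val mean: Double get() = alpha / (alpha + beta)
}

// Beta(priorAlpha, priorBeta) prior plus observed counts gives
// Beta(priorAlpha + successes, priorBeta + failures).
fun betaPosterior(
    priorAlpha: Double,
    priorBeta: Double,
    successes: Int,
    failures: Int,
): BetaParams = BetaParams(priorAlpha + successes, priorBeta + failures)
```

With the default Beta(1, 1) prior, 7 successes and 3 failures give Beta(8, 4), whose mean is 2/3.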

@Serializable
@SerialName(value = "BetaPosterior")
data object BetaPosterior : Posterior<BernoulliSumResult>

Beta posterior over a Bernoulli rate. successes and trials - successes are the Beta parameters; both must be positive (i.e. the snapshot must be prior-seeded).

class BoltzmannBandit(val nbrArms: Int, priorMean: Double = 0.0, priorWeight: Double = 0.02, val initialTau: Double = 1.0, val minTau: Double = 0.001, val decay: Double = 1.0, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<WeightedMeanResult>

Boltzmann exploration (a.k.a. softmax bandit): samples arm a with probability proportional to exp(mean[a] / tau(t)), where tau(t) is the temperature at round t and per-arm means are tracked by independent com.eignex.kumulant.stat.summary.MeanStat cells.
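
The sampling rule above can be sketched as a standalone softmax for a given temperature. This is an illustration only: boltzmannProbabilities is a hypothetical helper, and the temperature schedule itself (how initialTau, minTau, and decay combine) is not reproduced here because this page does not spell it out.

```kotlin
import kotlin.math.exp

// Hypothetical helper: p[a] proportional to exp(means[a] / tau).
// Subtracting the largest mean first keeps exp() from overflowing
// and does not change the normalised probabilities.
fun boltzmannProbabilities(means: DoubleArray, tau: Double): DoubleArray {
    require(means.isNotEmpty()) { "need at least one arm" }
    require(tau > 0.0) { "temperature must be positive" }
    val maxMean = means.maxOf { it }
    val weights = DoubleArray(means.size) { exp((means[it] - maxMean) / tau) }
    val total = weights.sum()
    return DoubleArray(means.size) { weights[it] / total }
}
```

A large tau flattens the distribution towards uniform; a small tau concentrates it on the arm with the highest mean.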

@Serializable
@SerialName(value = "Boltzmann")
data class BoltzmannSpec(val nbrArms: Int, val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val initialTau: Double = 1.0, val minTau: Double = 0.001, val decay: Double = 1.0) : UnivariateBanditSpec

Spec for BoltzmannBandit.

@Serializable
@SerialName(value = "CompositeArm")
data class CompositeArm(val subArms: List<CompositeSubArm>) : Arm<ResultList<Result>>

Composite Arm built from N independent sub-arms. Each observation is routed through every sub-arm's CompositeSubArm.valueExpr / CompositeSubArm.weightExpr / CompositeSubArm.filter AST before being fed to the corresponding per-sub-arm accumulator. The composite result is a ResultList of the sub-snapshots; pair with CompositePosterior to combine sub-arm draws into a single score.

@Serializable
@SerialName(value = "CompositePosterior")
data class CompositePosterior(val subPosteriors: List<Posterior<*>>, val combine: ScalarExpr) : Posterior<ResultList<Result>>

Composite Posterior over the sub-snapshots produced by a CompositeArm. Each sub-posterior draws independently; the resulting samples are packed as V(0)..V(N-1) and reduced to a single score by the combine AST.

@Serializable
@SerialName(value = "CompositeSubArm")
data class CompositeSubArm(val arm: Arm<*>, val valueExpr: ScalarExpr = X, val weightExpr: ScalarExpr = Const(1.0), val filter: BoolExpr? = null)

One leg of a CompositeArm: which arm receives observations, with optional AST-driven transformation of value, weight, and a filter predicate.

class EpsilonDecreasing(val epsilon: Double = 2.0, val decay: Double = 0.5, priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02) : BanditPolicy<WeightedVarianceResult>

Annealed epsilon-greedy. Effective exploration probability decreases as sample count accumulates: eps(t) = min(1, epsilon / totalSamples^decay).
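
The schedule above is easy to tabulate by hand. A minimal sketch of the formula; decayedEpsilon is a hypothetical helper, and returning 1.0 before any samples exist is an assumption rather than documented library behaviour:

```kotlin
import kotlin.math.min
import kotlin.math.pow

// Hypothetical helper: eps(t) = min(1, epsilon / totalSamples^decay).
fun decayedEpsilon(epsilon: Double, decay: Double, totalSamples: Double): Double {
    // Assumption: with no data yet, always explore.
    if (totalSamples <= 0.0) return 1.0
    return min(1.0, epsilon / totalSamples.pow(decay))
}
```

With the defaults epsilon = 2.0 and decay = 0.5, the exploration probability stays at 1 until 4 samples, then falls to 0.2 at 100 samples and 0.1 at 400.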

@Serializable
@SerialName(value = "EpsilonDecreasing")
data class EpsilonDecreasingSpec(val epsilon: Double = 2.0, val decay: Double = 0.5, val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 0.02) : BanditPolicySpec<WeightedVarianceResult>

class EpsilonGreedy(val epsilon: Double = 0.1, priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02) : BanditPolicy<WeightedVarianceResult>

Epsilon-greedy: with probability epsilon pick a uniformly random arm (explore), otherwise pick the arm with the highest mean (exploit). The simplest exploration scheme that actually works; no math machinery, tune one knob.
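
The explore/exploit split above fits in a few lines. This is a standalone sketch over an array of per-arm means, not the library's evaluate path; epsilonGreedyChoose is a hypothetical name:

```kotlin
import kotlin.random.Random

// Hypothetical helper: explore with probability epsilon, otherwise exploit.
fun epsilonGreedyChoose(means: DoubleArray, epsilon: Double, random: Random): Int {
    require(means.isNotEmpty()) { "need at least one arm" }
    // Explore: uniformly random arm.
    if (random.nextDouble() < epsilon) return random.nextInt(means.size)
    // Exploit: arm with the highest running mean (first one wins ties).
    return means.indices.maxByOrNull { means[it] } ?: error("unreachable: means is non-empty")
}
```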

@Serializable
@SerialName(value = "EpsilonGreedy")
data class EpsilonGreedySpec(val epsilon: Double = 0.1, val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 0.02) : BanditPolicySpec<WeightedVarianceResult>

Spec for EpsilonGreedy.

@Serializable
@SerialName(value = "Exp3ArmResult")
data class Exp3ArmResult(val weight: Double) : Result

Per-arm snapshot for Exp3Bandit: the exponential-weight cell for one arm.

class Exp3Bandit(val nbrArms: Int, val eta: Double = defaultEta(nbrArms), val gamma: Double = (nbrArms * eta).coerceAtMost(1.0), val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<Exp3ArmResult>

EXP3 (Auer, Cesa-Bianchi, Freund, Schapire 2002): an adversarial multi-armed bandit over a fixed pool of nbrArms arms. Each round: compute the play distribution p[a] = (1 - gamma) · w[a]/Σw + gamma/K, sample a ~ p, then on reward r ∈ [0,1] update w[a] *= exp(eta · r / p[a]), where r / p[a] is the importance-sampling-corrected gain.
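
One round of the update described above, written as two standalone functions. This is a sketch of the formulas only; exp3Probabilities and exp3Update are hypothetical names, and the sampling step a ~ p is left to the caller:

```kotlin
import kotlin.math.exp

// Play distribution: p[a] = (1 - gamma) * w[a] / sum(w) + gamma / K.
fun exp3Probabilities(weights: DoubleArray, gamma: Double): DoubleArray {
    require(weights.isNotEmpty()) { "need at least one arm" }
    val total = weights.sum()
    val k = weights.size
    return DoubleArray(k) { (1.0 - gamma) * weights[it] / total + gamma / k }
}

// Weight update for the played arm only: w[arm] *= exp(eta * r / p[arm]).
// Dividing by p[arm] is the importance-sampling correction, so rarely
// played arms get a proportionally larger boost when they do pay off.
fun exp3Update(
    weights: DoubleArray,
    probabilities: DoubleArray,
    arm: Int,
    reward: Double,
    eta: Double,
) {
    require(reward in 0.0..1.0) { "EXP3 expects rewards in [0, 1]" }
    weights[arm] *= exp(eta * reward / probabilities[arm])
}
```

A long-running implementation would also rescale the weights periodically to avoid overflow; that detail is omitted here.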

@Serializable
@SerialName(value = "Exp3")
data class Exp3Spec(val nbrArms: Int, val eta: Double? = null, val gamma: Double? = null) : UnivariateBanditSpec

Spec for Exp3Bandit. Pass null for eta / gamma to use the algorithm's defaults.

@Serializable
@SerialName(value = "ExponentialGammaPosterior")
data object ExponentialGammaPosterior : Posterior<WeightedMeanResult>

Gamma posterior over an Exponential rate.

@Serializable
@SerialName(value = "GammaScalePosterior")
data class GammaScalePosterior(val fixedShape: Double) : Posterior<WeightedMeanResult>

Gamma posterior over the scale of a Gamma likelihood with fixed shape; the shape is a parameter of the posterior rather than something inferred from data. That parameter is why this is a data class rather than an object.

@Serializable
@SerialName(value = "GeometricBetaPosterior")
data object GeometricBetaPosterior : Posterior<WeightedMeanResult>

Beta posterior over a Geometric success probability.

class Greedy(priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02) : BanditPolicy<WeightedVarianceResult>

Pure-exploitation policy: always picks the arm with the highest posterior mean. No exploration at all; converges fastest to the apparent best arm but can lock into a suboptimal arm forever if early rewards mislead it.

@Serializable
@SerialName(value = "Greedy")
data class GreedySpec(val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 0.02) : BanditPolicySpec<WeightedVarianceResult>

Spec for Greedy.

class KlUcb(val c: Double = 0.0, val tolerance: Double = 1.0E-6, priorAlpha: Double = 1.0, priorBeta: Double = 1.0) : BanditPolicy<BernoulliSumResult>

KL-UCB (Garivier & Cappé 2011). UCB variant for Bernoulli arms with a KL-divergence confidence bound instead of the Hoeffding bound UCB1 uses. Score is the largest q in [mean, 1] such that n * KL(mean, q) <= ln(t) + c * ln(ln(t)); computed by binary search with tolerance precision.
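
The binary search described above can be sketched without the library. bernoulliKl and klUcbScore are hypothetical names; clamping probabilities away from 0 and 1 and skipping the ln(ln(t)) term when it is undefined are assumptions made to keep the sketch numerically safe:

```kotlin
import kotlin.math.ln

// KL divergence between Bernoulli(p) and Bernoulli(q), clamped to avoid ln(0).
fun bernoulliKl(p: Double, q: Double): Double {
    val eps = 1e-12
    val pc = p.coerceIn(eps, 1.0 - eps)
    val qc = q.coerceIn(eps, 1.0 - eps)
    return pc * ln(pc / qc) + (1.0 - pc) * ln((1.0 - pc) / (1.0 - qc))
}

// Largest q in [mean, 1] with n * KL(mean, q) <= ln(t) + c * ln(ln(t)).
fun klUcbScore(
    mean: Double,
    n: Double,
    t: Double,
    c: Double = 0.0,
    tolerance: Double = 1e-6,
): Double {
    require(n > 0.0 && t >= 1.0) { "need at least one sample" }
    val logT = ln(t)
    // Assumption: drop the ln(ln(t)) term when c is zero or ln(t) is not positive.
    val extra = if (c != 0.0 && logT > 0.0) c * ln(logT) else 0.0
    val budget = (logT + extra).coerceAtLeast(0.0)
    var lo = mean.coerceIn(0.0, 1.0)
    var hi = 1.0
    // KL(mean, q) grows with q on [mean, 1], so bisection finds the boundary.
    while (hi - lo > tolerance) {
        val mid = (lo + hi) / 2.0
        if (n * bernoulliKl(mean, mid) <= budget) lo = mid else hi = mid
    }
    return lo
}
```

For mean = 0.5 and t = 100, the score is roughly 0.89 after 10 pulls and drops to roughly 0.65 after 100 pulls, showing the bound tightening as the arm accumulates data.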

@Serializable
@SerialName(value = "KlUcb")
data class KlUcbSpec(val c: Double = 0.0, val tolerance: Double = 1.0E-6, val priorAlpha: Double = 1.0, val priorBeta: Double = 1.0) : BanditPolicySpec<BernoulliSumResult>

Spec for KlUcb.

@Serializable
@SerialName(value = "LogNormalArm")
data class LogNormalArm(val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 2.0) : Arm<WeightedVarianceResult>

Like NormalArm but folds ln(value) into the stat via encode. The right pick when rewards are multiplicative rather than additive; revenue per session, latency in milliseconds, anything where the noise scales with the magnitude.

@Serializable
@SerialName(value = "LogNormalGammaPosterior")
data object LogNormalGammaPosterior : Posterior<WeightedVarianceResult>

Log-Normal-Gamma posterior: same draw as NormalGammaPosterior but exp-transformed back to the real scale. Intended for arms whose stat already accumulates log-rewards (see LogNormalArm's encode).

@Serializable
@SerialName(value = "MeanArm")
data class MeanArm(val priorMean: Double = 1.0, val priorWeight: Double = 0.01) : Arm<WeightedMeanResult>

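Single-moment arm; tracks the running mean but not variance. The right pick when the likelihood's sufficient statistic is one running sum (or equivalently a running mean × count), as with the Poisson, exponential, geometric, and fixed-shape Gamma likelihoods served by PoissonGammaPosterior, ExponentialGammaPosterior, GeometricBetaPosterior, and GammaScalePosterior.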

@Serializable
@SerialName(value = "MomentsArm")
data class MomentsArm(val priorMean: Double = 0.0, val priorWeight: Double = 0.02) : Arm<MomentsResult>

Moments-tracking arm. Backs MomentsStat under the hood, which means the snapshot exposes the raw second moment m2 (in addition to mean and variance); required by the variance-aware UCB policies that need mean-of-squares directly, namely the ones typed on MomentsResult on this page: UCB1Normal, UCB1Tuned, and UcbV.

class Moss(val nbrArms: Int, priorMean: Double = 0.0, priorWeight: Double = 0.02) : BanditPolicy<WeightedMeanResult>

MOSS: Minimax Optimal Strategy in the Stochastic case (Audibert & Bubeck 2009). UCB variant whose confidence bonus drops to zero once an arm has accumulated at least t / K samples, because the logarithm is clamped at zero. Score is mean + sqrt(max(0, ln(t / (K * n))) / n).
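
The score formula above, transcribed as a standalone function. mossScore is a hypothetical name, and returning +infinity for an unplayed arm is an assumption borrowed from the usual UCB convention:

```kotlin
import kotlin.math.ln
import kotlin.math.max
import kotlin.math.sqrt

// score = mean + sqrt(max(0, ln(t / (K * n))) / n)
fun mossScore(mean: Double, n: Double, t: Double, k: Int): Double {
    // Assumption: an arm with no samples is tried first.
    if (n <= 0.0) return Double.POSITIVE_INFINITY
    return mean + sqrt(max(0.0, ln(t / (k * n))) / n)
}
```

With K = 2 and t = 100, an arm with 10 pulls still gets a bonus of about 0.40, while an arm with 50 or more pulls is scored on its mean alone.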

@Serializable
@SerialName(value = "Moss")
data class MossSpec(val nbrArms: Int, val priorMean: Double = 0.0, val priorWeight: Double = 0.02) : BanditPolicySpec<WeightedMeanResult>

Spec for Moss.

class MultiArmedBandit<R : Result>(val nbrArms: Int, val policy: BanditPolicy<R>, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<R> , Scorable

Univariate bandit with a fixed number of independent arms, each backed by a kumulant SeriesStat; on every choose the bandit asks the policy to score a fresh snapshot per arm and picks the argmax.

@Serializable
@SerialName(value = "MultiArmed")
data class MultiArmedSpec<R : Result>(val nbrArms: Int, val policy: BanditPolicySpec<R>) : UnivariateBanditSpec

@Serializable
@SerialName(value = "NormalArm")
data class NormalArm(val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 0.02) : Arm<WeightedVarianceResult>

Gaussian arm with a Normal-Gamma prior (unknown mean and variance). Tracks both the running mean and the sum of squared deviations, which gives NormalGammaPosterior enough to draw (mean, variance) jointly and gives the policies that consume a WeightedVarianceResult snapshot (Greedy, EpsilonGreedy) a reasonable variance estimate.

@Serializable
@SerialName(value = "NormalGammaPosterior")
data object NormalGammaPosterior : Posterior<WeightedVarianceResult>

Normal-Gamma posterior over a normal mean/variance. Draws (variance, mean) jointly: variance ~ Inverse-Gamma(n/2, n*s^2/2), then mean | variance ~ Normal(snapshot.mean, variance/n).

@Serializable
@SerialName(value = "PoissonGammaPosterior")
data object PoissonGammaPosterior : Posterior<WeightedMeanResult>

Gamma posterior over a Poisson rate: Gamma(sum, totalWeights).

@Serializable
sealed interface Posterior<R : Result>

Stateless conjugate posterior over a univariate likelihood, parameterised by the sufficient-statistic snapshot R. A Posterior is a pure (snapshot, rng) -> sample function: no priors, no per-arm state, no update path. Arm lifecycle, value encoding, and prior seeding all live in Arm.

@Serializable
@SerialName(value = "RouletteWheelArmResult")
data class RouletteWheelArmResult(val weight: Double, val accumulatedScore: Double, val callCount: Int) : Result

Per-arm state snapshot for RouletteWheelBandit. Exposes the current weight plus the running segment counters callers may want to inspect for debugging.

class RouletteWheelBandit(val nbrArms: Int, val reactionFactor: Double = 0.1, val segmentLength: Int = 10, val initialWeight: Double = 1.0, val minWeight: Double = 0.01, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<RouletteWheelArmResult> , Scorable

Adaptive operator-selection bandit in the Ropke-Pisinger 2006 ALNS scheme: each arm carries a weight, choose samples proportional to weights (roulette wheel), and weights re-balance in batches.
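
A standalone sketch of the two halves described above: roulette-wheel sampling and the end-of-segment re-balance. The blending rule is the textbook ALNS update, new = (1 - r) * old + r * score / calls; whether the library applies exactly this formula is an assumption, and both function names are hypothetical.

```kotlin
import kotlin.random.Random

// Sample an index with probability proportional to its weight.
fun rouletteChoose(weights: DoubleArray, random: Random): Int {
    require(weights.isNotEmpty()) { "need at least one arm" }
    var remaining = random.nextDouble() * weights.sum()
    for (i in weights.indices) {
        remaining -= weights[i]
        if (remaining < 0.0) return i
    }
    // Only reached through floating-point round-off.
    return weights.lastIndex
}

// End-of-segment update for one arm (assumed textbook ALNS form).
fun rebalancedWeight(
    weight: Double,
    accumulatedScore: Double,
    callCount: Int,
    reactionFactor: Double,
    minWeight: Double,
): Double {
    // Assumption: an arm that was never called in the segment keeps its weight.
    if (callCount == 0) return weight
    val blended = (1.0 - reactionFactor) * weight + reactionFactor * accumulatedScore / callCount
    // The floor keeps every operator selectable.
    return blended.coerceAtLeast(minWeight)
}
```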

@Serializable
@SerialName(value = "RouletteWheel")
data class RouletteWheelSpec(val nbrArms: Int, val reactionFactor: Double = 0.1, val segmentLength: Int = 10, val initialWeight: Double = 1.0, val minWeight: Double = 0.01) : UnivariateBanditSpec

class ThompsonSampling<R : Result>(val arm: Arm<R>, val posterior: Posterior<R>) : BanditPolicy<R>

Thompson sampling: score each arm by a draw from its conjugate posterior given the snapshot. The bandit then picks the arm with the highest sample. There is no explicit exploration knob; exploration falls out of the posterior variance, which shrinks as data accumulates.

@Serializable
@SerialName(value = "ThompsonSampling")
data class ThompsonSamplingSpec<R : Result>(val arm: Arm<R>, val posterior: Posterior<R>) : BanditPolicySpec<R>

class TopTwoThompsonBandit<R : Result>(val nbrArms: Int, val policy: ThompsonSampling<R>, val beta: Double = 0.5, val maxResamples: Int = 32, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<R>

Top-Two Thompson Sampling (Russo 2020): a pure-exploration variant of Thompson sampling for best-arm identification. Sample every arm's posterior and take the argmax arm1; play it with probability beta, otherwise resample until the argmax differs from arm1 and play that runner-up.
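
The selection rule above, sketched with the posterior draw abstracted behind a lambda. topTwoChoose and its sample parameter are hypothetical, and falling back to the leader when maxResamples is exhausted is an assumption about how the cap is handled:

```kotlin
import kotlin.random.Random

fun topTwoChoose(
    nbrArms: Int,
    beta: Double,
    maxResamples: Int,
    random: Random,
    sample: (arm: Int) -> Double, // one fresh posterior draw for the given arm
): Int {
    require(nbrArms > 0) { "need at least one arm" }
    // One full round of posterior draws, returning the arm with the largest draw.
    fun argmaxDraw(): Int =
        (0 until nbrArms).maxByOrNull { sample(it) } ?: error("unreachable: nbrArms > 0")

    val leader = argmaxDraw()
    // Play the leader with probability beta.
    if (random.nextDouble() < beta) return leader
    // Otherwise resample until a different arm wins, up to the cap.
    repeat(maxResamples) {
        val challenger = argmaxDraw()
        if (challenger != leader) return challenger
    }
    // Assumption: give up and play the leader once the cap is hit.
    return leader
}
```

The cap matters late in a run: once one arm's posterior dominates, a different argmax becomes rare and an uncapped loop could spin for a long time.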

@Serializable
@SerialName(value = "TopTwoThompson")
data class TopTwoThompsonSpec<R : Result>(val nbrArms: Int, val policy: ThompsonSamplingSpec<R>, val beta: Double = 0.5, val maxResamples: Int = 32) : UnivariateBanditSpec

class UCB1(val alpha: Double = 1.0, priorAlpha: Double = 1.0, priorBeta: Double = 1.0) : BanditPolicy<BernoulliSumResult>

Classical UCB1 (Auer, Cesa-Bianchi, Fischer 2002). Score is mean + alpha * sqrt(2 * ln(totalSamples) / armSamples); exploitation (running mean) plus a confidence bound that shrinks as the arm accumulates pulls. Unexplored arms get +infinity so they're tried at least once.
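
The score above as a standalone function, for checking values by hand. ucb1Score is a hypothetical name; clamping ln(totalSamples) at zero is an assumption that guards against fractional weights below 1:

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// score = mean + alpha * sqrt(2 * ln(totalSamples) / armSamples)
fun ucb1Score(
    mean: Double,
    armSamples: Double,
    totalSamples: Double,
    alpha: Double = 1.0,
): Double {
    // Unexplored arms get +infinity so they are tried at least once.
    if (armSamples <= 0.0) return Double.POSITIVE_INFINITY
    val logT = ln(totalSamples).coerceAtLeast(0.0)
    return mean + alpha * sqrt(2.0 * logT / armSamples)
}
```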

class UCB1Normal(val alpha: Double = 1.0, priorMean: Double = 0.0, priorWeight: Double = 0.02) : BanditPolicy<MomentsResult>

UCB1-Normal (Auer et al. 2002). Variance-aware UCB for Gaussian rewards; uses the sample variance derived from the MomentsResult snapshot to scale the confidence bound. Reach for it when rewards are roughly Gaussian and unbounded, so that UCB1's [0, 1] assumption doesn't hold.

@Serializable
@SerialName(value = "UCB1Normal")
data class Ucb1NormalSpec(val alpha: Double = 1.0, val priorMean: Double = 0.0, val priorWeight: Double = 0.02) : BanditPolicySpec<MomentsResult>

Spec for UCB1Normal.

@Serializable
@SerialName(value = "UCB1")
data class Ucb1Spec(val alpha: Double = 1.0, val priorAlpha: Double = 1.0, val priorBeta: Double = 1.0) : BanditPolicySpec<BernoulliSumResult>

Spec for UCB1.

class UCB1Tuned(val alpha: Double = 1.0, priorMean: Double = 0.0, priorWeight: Double = 0.02) : BanditPolicy<MomentsResult>

UCB1-Tuned (Auer et al. 2002). Same shape as UCB1 but the confidence bound multiplier uses an upper bound on the variance: min(0.25, v), where v is the sample variance plus a small padding term (sqrt(2 * ln(t) / n) in the Auer et al. formulation). Tighter bound than plain UCB1 when the empirical variance is well below 0.25; degrades gracefully to UCB1 when the variance is uninformative.

@Serializable
@SerialName(value = "UCB1Tuned")
data class Ucb1TunedSpec(val alpha: Double = 1.0, val priorMean: Double = 0.0, val priorWeight: Double = 0.02) : BanditPolicySpec<MomentsResult>

Spec for UCB1Tuned.

class UcbV(val zeta: Double = 1.2, val c: Double = 1.0, priorMean: Double = 0.0, priorWeight: Double = 0.02) : BanditPolicy<MomentsResult>

UCB-V: variance-aware UCB with an explicit finite-sample correction term (Audibert, Munos, Szepesvári 2009). Score is mean + sqrt(2 * V * zeta * ln(t) / n) + 3 * c * zeta * ln(t) / n, where V is the running variance from the MomentsResult snapshot.
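
The two bonus terms above are easier to compare once written out. A standalone sketch; ucbVScore is a hypothetical name, and the +infinity return for an unplayed arm and the clamp on ln(t) are assumptions:

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// score = mean + sqrt(2 * V * zeta * ln(t) / n) + 3 * c * zeta * ln(t) / n
fun ucbVScore(
    mean: Double,
    variance: Double,
    n: Double,
    t: Double,
    zeta: Double = 1.2,
    c: Double = 1.0,
): Double {
    // Assumption: an arm with no samples is tried first.
    if (n <= 0.0) return Double.POSITIVE_INFINITY
    val logT = ln(t).coerceAtLeast(0.0)
    val varianceTerm = sqrt(2.0 * variance * zeta * logT / n) // shrinks like 1/sqrt(n)
    val correctionTerm = 3.0 * c * zeta * logT / n            // shrinks like 1/n
    return mean + varianceTerm + correctionTerm
}
```

For a low-variance arm the variance term is small, so the bound is dominated by the correction term, which decays at the faster 1/n rate.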

@Serializable
@SerialName(value = "UcbV")
data class UcbVSpec(val zeta: Double = 1.2, val c: Double = 1.0, val priorMean: Double = 0.0, val priorWeight: Double = 0.02) : BanditPolicySpec<MomentsResult>

Spec for UcbV.

class UniformSelection(priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02) : BanditPolicy<WeightedVarianceResult>

Pure-exploration policy: every evaluate returns a fresh uniform draw, so the bandit picks arms uniformly at random regardless of observations. No exploitation at all; the opposite extreme of Greedy.

@Serializable
@SerialName(value = "UniformSelection")
data class UniformSelectionSpec(val priorMean: Double = 0.0, val priorWeight: Double = 0.02, val priorSquaredDeviations: Double = 0.02) : BanditPolicySpec<WeightedVarianceResult>

@Serializable
sealed interface UnivariateBanditSpec

Wire-portable specification for a univariate bandit instance.

Functions

fun BetaBernoulliTS(priorAlpha: Double = 1.0, priorBeta: Double = 1.0): ThompsonSampling<BernoulliSumResult>

Thompson sampling over a Beta(priorAlpha, priorBeta) prior on a Bernoulli reward.

fun ExponentialTS(priorMean: Double = 1.0, priorWeight: Double = 0.01): ThompsonSampling<WeightedMeanResult>

Thompson sampling over an exponential reward with a Gamma prior on the rate.

fun GammaScaleTS(fixedShape: Double, priorMean: Double = 1.0, priorWeight: Double = 0.1): ThompsonSampling<WeightedMeanResult>

Thompson sampling over a Gamma reward with known shape and Gamma prior on the scale.

fun GeometricTS(priorMean: Double = 2.0, priorWeight: Double = 1.0): ThompsonSampling<WeightedMeanResult>

Thompson sampling over a geometric reward with a Beta prior on the success probability.

fun LogNormalTS(priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 2.0): ThompsonSampling<WeightedVarianceResult>

Thompson sampling over a log-normal reward via Normal-Gamma on log(value).

fun NormalTS(priorMean: Double = 0.0, priorWeight: Double = 0.02, priorSquaredDeviations: Double = 0.02): ThompsonSampling<WeightedVarianceResult>

Thompson sampling over a Normal-Gamma prior; unknown mean and variance.

fun PoissonTS(priorMean: Double = 1.0, priorWeight: Double = 0.01): ThompsonSampling<WeightedMeanResult>

Thompson sampling over a Poisson reward with a Gamma prior on the rate.

Warm-started BernoulliArm from a global Bernoulli snapshot.

Warm-started LogNormalArm from a global weighted-variance snapshot on the log scale (caller is responsible for ensuring the snapshot is over ln(reward)).

Warm-started MeanArm from a global weighted-mean snapshot.

Warm-started MomentsArm from a global moments snapshot.

Warm-started NormalArm from a global weighted-variance snapshot. The arm's prior variance is preserved from the global; only the prior weight is shrunk.