BoltzmannBandit
Boltzmann exploration (a.k.a. softmax bandit): samples arm a with probability proportional to exp(mean[a] / tau(t)), where tau(t) is the temperature at round t and per-arm means are tracked by independent com.eignex.kumulant.stat.summary.MeanStat cells.
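Here is a minimal sketch of the sampling rule, based only on the formula above. The function name softmaxProbabilities is hypothetical and not part of the kumulant API. Subtracting the largest scaled mean before exponentiating is this sketch's own numerical-stability choice. It leaves the probabilities unchanged.

```kotlin
import kotlin.math.exp

// Softmax play distribution: p[a] = exp(mean[a] / tau) / sum_b exp(mean[b] / tau).
// Hypothetical helper for illustration; not the library's API.
fun softmaxProbabilities(means: DoubleArray, tau: Double): DoubleArray {
    require(means.isNotEmpty()) { "need at least one arm" }
    require(tau > 0.0) { "temperature must be positive" }
    val scaled = DoubleArray(means.size) { means[it] / tau }
    // Shifting by the largest scaled mean keeps exp() from overflowing at small tau;
    // the shift cancels out in the normalisation below.
    val maxScaled = scaled.maxOf { it }
    val weights = DoubleArray(scaled.size) { exp(scaled[it] - maxScaled) }
    val total = weights.sum()
    return DoubleArray(weights.size) { weights[it] / total }
}
```

Take means (0.2, 0.5, 0.4). A large tau gives roughly uniform probabilities, and a tiny tau puts almost all of the mass on arm 1.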
The default schedule cools as tau(t) = max(minTau, initialTau / t^decay). With decay = 1 this is Cesa-Bianchi & Fischer's classical recipe. Set decay = 0 for a constant schedule, i.e. fixed-temperature softmax. A high temperature flattens the distribution toward uniform exploration, and a low temperature sharpens it toward greedy exploitation.
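The default cooling schedule translates directly into code. This sketch makes two assumptions. First, rounds are counted from t = 1, since t = 0 would divide by zero whenever decay > 0. Second, temperatureAt is a hypothetical helper, not the library's temperature function.

```kotlin
import kotlin.math.max
import kotlin.math.pow

// tau(t) = max(minTau, initialTau / t^decay)
// decay = 1.0 gives 1/t cooling; decay = 0.0 keeps tau at max(minTau, initialTau).
// Hypothetical helper; assumes rounds start at t = 1.
fun temperatureAt(t: Long, initialTau: Double, decay: Double, minTau: Double): Double {
    require(t >= 1) { "rounds are counted from 1 in this sketch" }
    return max(minTau, initialTau / t.toDouble().pow(decay))
}
```

The minTau floor stops the temperature from reaching zero. Without it, the softmax would degenerate into a pure argmax.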
The play distribution is a softmax over all arms rather than an argmax over independent per-arm scores. For that reason this bandit does not expose com.eignex.kumulant.bandit.Scorable, although its per-arm (mean, totalWeights) state still fits PerArmBandit.
Use cases: stationary scalar-reward problems where smooth probabilistic exploration is preferable to UCB's deterministic confidence bounds, and any setting where a tunable cooling schedule is convenient.
Arms: indexless, nbrArms fixed at construction; each arm owns one com.eignex.kumulant.stat.summary.MeanStat.
Memory: O(nbrArms); one mean cell per arm plus a step counter.
Choose: O(nbrArms); softmax over per-arm means, inverse-CDF sample.
Update: O(1) on the targeted arm.
Randomness: every choose consumes one random.nextDouble() for the softmax draw; reproducible under a fixed seed.
Concurrency: each per-arm com.eignex.kumulant.core.SeriesStat cell handles its own concurrency. The step counter is not atomic. Concurrent choose calls race on it and may observe duplicate t values, so pin the bandit to a single thread when the cooling schedule must be exact.
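The cost and randomness notes above can be illustrated with a small toy class. ToyBoltzmann and its members are hypothetical and only approximate the described behaviour. Plain arrays stand in for the MeanStat cells. The step counter is a plain, non-atomic Long, so the sketch is for single-threaded use only. Each call to choose draws exactly one random.nextDouble() for the inverse-CDF step.

```kotlin
import kotlin.math.exp
import kotlin.math.max
import kotlin.math.pow
import kotlin.random.Random

// Hypothetical toy; NOT the kumulant BoltzmannBandit API. Single-threaded only.
class ToyBoltzmann(
    private val nbrArms: Int,
    private val initialTau: Double = 1.0,
    private val decay: Double = 1.0,
    private val minTau: Double = 0.01,
    private val random: Random = Random(42),
) {
    init {
        require(nbrArms > 0) { "need at least one arm" }
    }

    // One (mean, totalWeight) pair per arm plus a step counter: O(nbrArms) memory.
    private val means = DoubleArray(nbrArms)
    private val weights = DoubleArray(nbrArms)
    private var step = 0L // non-atomic, like the counter described above

    fun temperature(t: Long): Double = max(minTau, initialTau / t.toDouble().pow(decay))

    // O(nbrArms): softmax over the means, then one nextDouble() for an inverse-CDF draw.
    fun choose(): Int {
        step += 1
        val tau = temperature(step)
        val scaled = DoubleArray(nbrArms) { means[it] / tau }
        val maxScaled = scaled.maxOf { it }
        val w = DoubleArray(nbrArms) { exp(scaled[it] - maxScaled) }
        // Sample against the unnormalised weights: u is uniform on [0, sum(w)).
        val u = random.nextDouble() * w.sum()
        var cumulative = 0.0
        for (arm in 0 until nbrArms) {
            cumulative += w[arm]
            if (u < cumulative) return arm
        }
        return nbrArms - 1 // only reached through floating-point round-off
    }

    // O(1): incremental weighted-mean update on the targeted arm only.
    fun update(arm: Int, reward: Double, weight: Double = 1.0) {
        weights[arm] += weight
        means[arm] += (reward - means[arm]) * weight / weights[arm]
    }

    fun meanOf(arm: Int): Double = means[arm]
}
```

Sampling against the unnormalised weights saves a separate normalisation pass. The draw is still O(nbrArms) because the weights must be summed and scanned.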
Constructors
BoltzmannBandit
Functions
armResult, choose, create, merge, playDistribution, reset, snapshot, temperature
BoltzmannBandit
armResult
Per-arm snapshot at armIndex. Default implementation reads from the full snapshot; implementations may override to avoid building the entire list when only one arm is needed.
choose
create
Spawn a fresh bandit with the same configuration.
decay
initialTau
Initial temperature; the schedule cools from here.
merge
Fold the state of another replica (other) into this bandit. Most families merge exactly via the underlying stat's parallel-merge formula; SGD-based contextual bandits merge approximately. Each concrete bandit's KDoc documents its merge semantics.
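Here is a minimal sketch of an exact per-arm merge. It assumes each arm's state is the (mean, totalWeights) pair described for this bandit, and that means combine with the usual weighted parallel-merge formula. ArmState, mergeArm and mergeReplicas are hypothetical names. This page does not document how the real bandit combines its step counter, so the sketch leaves the counter out.

```kotlin
// Hypothetical per-arm state mirroring the (mean, totalWeights) pair.
data class ArmState(val mean: Double, val totalWeights: Double)

// Exact merge of two weighted means:
//   W = w1 + w2,  mean = mean1 + (mean2 - mean1) * w2 / W
fun mergeArm(a: ArmState, b: ArmState): ArmState {
    val total = a.totalWeights + b.totalWeights
    if (total == 0.0) return ArmState(0.0, 0.0) // neither replica has seen this arm
    val mean = a.mean + (b.mean - a.mean) * b.totalWeights / total
    return ArmState(mean, total)
}

// Merge two replicas arm by arm; both must have been built with the same nbrArms.
fun mergeReplicas(left: List<ArmState>, right: List<ArmState>): List<ArmState> {
    require(left.size == right.size) { "replicas must have the same number of arms" }
    return left.zip(right) { a, b -> mergeArm(a, b) }
}
```

The merge is order-independent up to floating-point rounding. Replicas can therefore be folded together in any order.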
minTau
nbrArms
playDistribution
Current play distribution: softmax(mean / tau(t)). Also advances the internal step.
random
reset
snapshot
Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.
temperature
Current temperature: max(minTau, initialTau / step^decay).