com.eignex.kumulant/bandit/univariate/BoltzmannBandit

BoltzmannBandit

class BoltzmannBandit(val nbrArms: Int, priorMean: Double = 0.0, priorWeight: Double = 0.02, val initialTau: Double = 1.0, val minTau: Double = 0.001, val decay: Double = 1.0, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<WeightedMeanResult>

Boltzmann exploration (a.k.a. softmax bandit): samples arm a with probability proportional to exp(mean[a] / tau(t)), where tau(t) is the temperature at round t and per-arm means are tracked by independent com.eignex.kumulant.stat.summary.MeanStat cells.
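The play distribution can be sketched as a plain function of the per-arm means and the current temperature. `softmaxProbabilities` is a hypothetical helper and not part of kumulant's API. It subtracts the largest scaled mean before exponentiating. That shift leaves the probabilities unchanged but keeps `exp` from overflowing when tau is small.

```kotlin
import kotlin.math.exp

/**
 * Hypothetical helper: softmax play distribution, p[a] proportional to exp(means[a] / tau).
 * The max scaled mean is subtracted first so the largest exponent is exp(0) = 1,
 * which avoids overflow at low temperature without changing the result.
 */
fun softmaxProbabilities(means: DoubleArray, tau: Double): DoubleArray {
    require(means.isNotEmpty()) { "need at least one arm" }
    require(tau > 0.0) { "temperature must be positive" }
    val maxScaled = means.maxOf { it / tau }
    val weights = DoubleArray(means.size) { exp(means[it] / tau - maxScaled) }
    val total = weights.sum()
    return DoubleArray(means.size) { weights[it] / total }
}
```

For example, means `[1.0, 0.0]` at tau = 1 give probabilities of roughly 0.731 and 0.269.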

The default schedule cools as tau(t) = max(minTau, initialTau / t^decay); with decay = 1 this is Cesa-Bianchi & Fischer's classical recipe. Pass decay = 0 for a fixed-temperature softmax. A high temperature flattens the distribution toward uniform exploration. A low temperature sharpens it toward greedy exploitation.
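The cooling schedule can be sketched as a pure function of the round counter. `temperatureAt` is a hypothetical name. The sketch also assumes t starts at 1, because the source does not say how a round 0 would be handled.

```kotlin
import kotlin.math.max
import kotlin.math.pow

/**
 * Hypothetical helper: tau(t) = max(minTau, initialTau / t^decay).
 * Assumes the round counter t starts at 1; decay = 0.0 yields a constant temperature.
 */
fun temperatureAt(
    t: Long,
    initialTau: Double = 1.0,
    minTau: Double = 0.001,
    decay: Double = 1.0,
): Double {
    require(t >= 1) { "round counter is assumed to start at 1" }
    return max(minTau, initialTau / t.toDouble().pow(decay))
}
```

With the defaults, `temperatureAt(10)` is 0.1 and `temperatureAt(5000)` is clamped to 0.001. Take two arms with means 1.0 and 0.0. At tau = 1, about 73% of the play probability goes to the better arm; at tau = 0.1, about 99.995% does. That is the flattening and sharpening described above.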

Because the play distribution is a softmax over all arms rather than an argmax over independent per-arm scores, this bandit does not expose com.eignex.kumulant.bandit.Scorable. Its per-arm (mean, totalWeights) state still fits PerArmBandit.

Use cases: stationary scalar-reward problems where smooth probabilistic exploration is preferable to UCB's deterministic confidence bounds; any setting where a tunable cooling schedule is convenient.

Arms: indexless, nbrArms fixed at construction; each arm owns one com.eignex.kumulant.stat.summary.MeanStat.

Memory: O(nbrArms); one mean cell per arm plus a step counter.

Choose: O(nbrArms); softmax over per-arm means, inverse-CDF sample.

Update: O(1) on the targeted arm.

Randomness: every choose consumes one random.nextDouble() for the softmax draw; reproducible under a fixed seed.

Concurrency: each per-arm com.eignex.kumulant.core.SeriesStat handles its own concurrency. The step counter is not atomic, so concurrent choose calls race on it and may observe duplicate t values. Confine the bandit to a single thread when the cooling schedule must be exact.
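The Choose and Randomness notes above (a softmax, then one inverse-CDF sample from a single `nextDouble()`) can be sketched as follows. `sampleSoftmaxArm` is a hypothetical helper, not the library's `choose`. It scales the uniform draw by the unnormalised total, which is equivalent to normalising the weights first.

```kotlin
import kotlin.math.exp
import kotlin.random.Random

/**
 * Hypothetical helper: draw one arm from the softmax over [means] at temperature [tau]
 * by inverse-CDF sampling. Consumes exactly one random.nextDouble(), so two
 * Random instances with the same seed yield the same arm sequence.
 */
fun sampleSoftmaxArm(means: DoubleArray, tau: Double, random: Random): Int {
    require(means.isNotEmpty() && tau > 0.0)
    val maxScaled = means.maxOf { it / tau }
    val weights = DoubleArray(means.size) { exp(means[it] / tau - maxScaled) }
    // Scale the uniform draw by the total instead of normalising every weight.
    val target = random.nextDouble() * weights.sum()
    var cumulative = 0.0
    for (arm in weights.indices) {
        cumulative += weights[arm]
        if (target < cumulative) return arm
    }
    // Round-off can leave target == total; fall back to the last arm with positive weight.
    return weights.indexOfLast { it > 0.0 }
}
```

Because only one uniform is consumed per call, replaying the same seed against the same sequence of means and temperatures reproduces the same decisions.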

Constructors

BoltzmannBandit

constructor(nbrArms: Int, priorMean: Double = 0.0, priorWeight: Double = 0.02, initialTau: Double = 1.0, minTau: Double = 0.001, decay: Double = 1.0, random: Random = Random.Default)

Properties

decay

val decay: Double

Cooling exponent in tau(t) = max(minTau, initialTau / t^decay); 0.0 gives a fixed temperature.

initialTau

val initialTau: Double

Initial temperature; the schedule cools from here.

minTau

val minTau: Double

Floor on the temperature so the softmax never collapses to a delta.
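The floor takes over once the decayed temperature falls below minTau. Solving the schedule tau(t) = max(minTau, initialTau / t^decay) for that round gives the following derivation, assuming decay > 0:

```latex
\frac{\text{initialTau}}{t^{\text{decay}}} \le \text{minTau}
\iff t \ge t^{*} = \left(\frac{\text{initialTau}}{\text{minTau}}\right)^{1/\text{decay}}

\text{defaults } (\text{initialTau}, \text{minTau}, \text{decay}) = (1.0,\ 0.001,\ 1.0):
\quad t^{*} = 1000
\qquad
\text{decay} = 0.5:\quad t^{*} = 1000^{2} = 10^{6}
```

From round t* onward the temperature stays pinned at minTau.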

nbrArms

open override val nbrArms: Int

Number of arms.

random

open override val random: Random

Single source of randomness.

Functions

snapshot

Materialise the current state as a serialisable snapshot. Reads are non-mutating, so call it as often as needed without affecting decisions. The same snapshot-consistency rules apply as for com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.

temperature

fun temperature(): Double

Current temperature: max(minTau, initialTau / step^decay), where step is the round counter t.

update

open override fun update(armIndex: Int, value: Double, weight: Double = 1.0)

Fold a weighted reward into the mean of arm armIndex.
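An O(1) weighted-mean update of the kind each per-arm cell performs can be sketched as below. `ArmMeanSketch` is hypothetical, not kumulant's MeanStat. The way priorMean and priorWeight enter is an assumption: here they are treated as one pseudo-observation folded in at construction, and the real cell may seed itself differently.

```kotlin
/**
 * Hypothetical per-arm weighted mean.
 * ASSUMPTION: priorMean/priorWeight act as a single pseudo-observation.
 */
class ArmMeanSketch(priorMean: Double = 0.0, priorWeight: Double = 0.02) {
    var mean: Double = priorMean
        private set
    var totalWeights: Double = priorWeight
        private set

    /** Incremental update: W += w; mean += (w / W) * (value - mean). */
    fun update(value: Double, weight: Double = 1.0) {
        require(weight >= 0.0) { "weight must be non-negative" }
        if (weight == 0.0) return
        totalWeights += weight
        mean += (weight / totalWeights) * (value - mean)
    }
}
```

Under this assumption, the default priorWeight of 0.02 is nearly negligible. With priorMean 0.0, one unit-weight reward of 1.0 moves the mean to 1 / 1.02, roughly 0.980.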

updateAll

open fun updateAll(armIndices: IntArray, values: DoubleArray, weights: DoubleArray? = null)

Batched update: folds one observation per (armIndices[i], values[i]) pair in a single call. It is equivalent to looping over update, but skips per-call overhead and may take a per-bandit lock once.
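The equivalence with a loop over update can be sketched as a generic helper. `batchedUpdate` is hypothetical. The sketch also assumes that a null `weights` array means unit weights, matching update's default of 1.0. It does not model the single lock acquisition mentioned above.

```kotlin
/**
 * Hypothetical sketch: batched update equivalent to calling
 * update(armIndices[i], values[i], weights[i]) for each i.
 * ASSUMPTION: weights == null means every observation has weight 1.0.
 */
fun batchedUpdate(
    armIndices: IntArray,
    values: DoubleArray,
    weights: DoubleArray?,
    update: (armIndex: Int, value: Double, weight: Double) -> Unit,
) {
    require(armIndices.size == values.size) { "armIndices and values must align" }
    require(weights == null || weights.size == values.size) { "weights must align" }
    for (i in armIndices.indices) {
        update(armIndices[i], values[i], weights?.get(i) ?: 1.0)
    }
}
```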
