BoltzmannBandit

class BoltzmannBandit(val nbrArms: Int, priorMean: Double = 0.0, priorWeight: Double = 0.02, val initialTau: Double = 1.0, val minTau: Double = 0.001, val decay: Double = 1.0, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<WeightedMeanResult>

Boltzmann exploration (a.k.a. softmax bandit): samples arm a with probability proportional to exp(mean[a] / tau(t)), where tau(t) is the temperature at round t and per-arm means are tracked by independent com.eignex.kumulant.stat.summary.MeanStat cells.

The default schedule cools as tau(t) = max(minTau, initialTau / t^decay); with decay = 1 this is Cesa-Bianchi & Fischer's classical recipe. Set decay = 0 for a fixed-temperature softmax. A high temperature flattens the distribution toward uniform exploration; a low temperature sharpens it toward greedy exploitation.
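The schedule above can be sketched as a standalone function. This is a minimal illustration, not the library's code: the name coolingTau is hypothetical, and counting rounds from t = 1 is an assumption (the page does not say how step 0 is handled).

```kotlin
import kotlin.math.max
import kotlin.math.pow

// Hypothetical helper, not part of kumulant: temperature at round t.
// Assumes rounds are counted from 1 so that t^decay is never zero.
fun coolingTau(
    t: Long,
    initialTau: Double = 1.0,
    minTau: Double = 0.001,
    decay: Double = 1.0,
): Double {
    require(t >= 1) { "rounds are counted from 1 in this sketch" }
    // Polynomial cooling, clamped from below by the temperature floor.
    return max(minTau, initialTau / t.toDouble().pow(decay))
}
```

With the defaults, the floor takes over once t exceeds 1000; with decay = 0.0 the function returns max(minTau, initialTau) for every round.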

The play distribution is a softmax over all arms, not an argmax over independent per-arm scores, so this bandit doesn't expose com.eignex.kumulant.bandit.Scorable. Its per-arm (mean, totalWeights) state still fits PerArmBandit.

Use cases: stationary scalar-reward problems where smooth probabilistic exploration is preferable to UCB's deterministic confidence bounds; any setting where a tunable cooling schedule is convenient.

Arms: indexless, nbrArms fixed at construction; each arm owns one com.eignex.kumulant.stat.summary.MeanStat.

Memory: O(nbrArms); one mean cell per arm plus a step counter.

Choose: O(nbrArms); softmax over per-arm means, inverse-CDF sample.
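A softmax followed by an inverse-CDF draw might look like the sketch below. The function names are hypothetical and the library's internals may differ; subtracting the largest mean before exponentiating is a standard stability trick assumed here, not something the page states.

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Illustrative only: softmax of means / tau, shifted by the max for numerical stability.
fun softmax(means: DoubleArray, tau: Double): DoubleArray {
    require(means.isNotEmpty()) { "need at least one arm" }
    require(tau > 0.0) { "temperature must be positive" }
    val maxMean = means.fold(Double.NEGATIVE_INFINITY) { a, b -> maxOf(a, b) }
    val weights = DoubleArray(means.size) { exp((means[it] - maxMean) / tau) }
    val total = weights.sum()
    return DoubleArray(means.size) { weights[it] / total }
}

// Inverse-CDF sampling: return the first index whose cumulative probability exceeds u.
fun sampleIndex(probs: DoubleArray, u: Double): Int {
    var cumulative = 0.0
    for (i in probs.indices) {
        cumulative += probs[i]
        if (u < cumulative) return i
    }
    // Rounding can leave the cumulative sum slightly below 1; fall back to the last arm.
    return probs.lastIndex
}

// One draw consumes exactly one nextDouble(), matching the cost described above.
fun chooseArm(means: DoubleArray, tau: Double, random: Random): Int =
    sampleIndex(softmax(means, tau), random.nextDouble())
```

Both passes are linear in the number of arms, which is where the O(nbrArms) cost comes from.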

Update: O(1) on the targeted arm.

Randomness: every choose consumes one random.nextDouble() for the softmax draw; reproducible under a fixed seed.
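Reproducibility rests on kotlin.random.Random being deterministic for a given seed. The sketch below shows only that property of the standard library; draws is a hypothetical helper, and passing Random(seed) as the bandit's random argument is the usage the constructor signature suggests.

```kotlin
import kotlin.random.Random

// Hypothetical helper: the first n uniform draws from a generator seeded with `seed`.
fun draws(seed: Int, n: Int): List<Double> {
    val rng = Random(seed)
    return List(n) { rng.nextDouble() }
}
```

Two generators built from the same seed yield the same sequence, so two bandits fed identical rewards and seeded identically would be expected to make identical choices.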

Concurrency: each per-arm com.eignex.kumulant.core.SeriesStat carries its own concurrency. The step counter is non-atomic, so concurrent choose calls race on it and may yield duplicate t values. Pin the bandit to a single thread when the cooling schedule must be exact.

Constructors

constructor(nbrArms: Int, priorMean: Double = 0.0, priorWeight: Double = 0.02, initialTau: Double = 1.0, minTau: Double = 0.001, decay: Double = 1.0, random: Random = Random.Default)

Properties

val decay: Double

Cooling decay exponent: tau(t) = initialTau / t^decay. 0.0 is fixed-temperature.

val initialTau: Double

Initial temperature; the schedule cools from here.

val minTau: Double

Floor on the temperature so the softmax never collapses to a delta.

open override val nbrArms: Int

Number of arms.

open override val random: Random

Single source of randomness.

Functions

open override fun armResult(armIndex: Int): WeightedMeanResult

Per-arm snapshot at armIndex. Default implementation reads from the full snapshot; implementations may override to avoid building the entire list when only one arm is needed.

open override fun choose(): Int

Sample arm from the softmax of per-arm means at the current temperature.

open override fun create(random: Random): BoltzmannBandit

Spawn a fresh bandit with the same configuration.

open override fun merge(other: List<WeightedMeanResult>)

Fold another replica's state (other) into this bandit. Most families merge exactly via the underlying stat's parallel-merge formula; SGD-based contextual bandits merge approximately. Each concrete bandit's KDoc documents its merge semantics.
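For a weighted mean, a parallel merge of two (mean, totalWeights) pairs can be sketched as follows. ArmState and mergeArm are hypothetical stand-ins, not WeightedMeanResult or the library's merge; in particular, how kumulant accounts for the prior weight when merging is not stated here.

```kotlin
// Hypothetical stand-in for a per-arm (mean, totalWeights) snapshot.
data class ArmState(val mean: Double, val totalWeights: Double)

// Combine two weighted means as if all observations had been seen by one accumulator.
fun mergeArm(a: ArmState, b: ArmState): ArmState {
    val total = a.totalWeights + b.totalWeights
    if (total == 0.0) return a // nothing observed on either side
    // Shift a's mean toward b's mean by b's share of the combined weight.
    val mean = a.mean + (b.mean - a.mean) * (b.totalWeights / total)
    return ArmState(mean, total)
}
```

Merging a whole snapshot would then apply this pairwise, arm by arm.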

playDistribution

Current play distribution: softmax(mean / tau(t)). Also advances the internal step.

open override fun reset()

Reset arms to priors and step counter to zero.

open override fun snapshot(): List<WeightedMeanResult>

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed coupled cells may drift by ULPs.

temperature

Current temperature: max(minTau, initialTau / step^decay).

open override fun update(armIndex: Int, value: Double, weight: Double = 1.0)

Fold (arm, reward) into the per-arm mean.
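An incremental weighted-mean update of this kind can be sketched as below. RunningMean is a hypothetical class, not MeanStat, and treating the prior as a pseudo-observation with weight priorWeight is an assumption about how priorMean and priorWeight are used.

```kotlin
// Hypothetical accumulator: the prior is assumed to act as one pseudo-observation.
class RunningMean(priorMean: Double = 0.0, priorWeight: Double = 0.02) {
    var mean: Double = priorMean
        private set
    var totalWeights: Double = priorWeight
        private set

    // Fold one weighted observation into the mean in O(1).
    fun update(value: Double, weight: Double = 1.0) {
        totalWeights += weight
        if (totalWeights > 0.0) {
            // Move the mean toward the new value by that value's share of the total weight.
            mean += (value - mean) * (weight / totalWeights)
        }
    }
}
```

With a small prior weight such as the default 0.02, the first real reward dominates the estimate almost immediately.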

open fun updateAll(armIndices: IntArray, values: DoubleArray, weights: DoubleArray? = null)

Batched update: fold one observation per arm/value pair in a single call. Equivalent to looping update but skips per-call overhead and may take a per-bandit lock once.

BoltzmannBandit

constructor(nbrArms: Int, priorMean: Double = 0.0, priorWeight: Double = 0.02, initialTau: Double = 1.0, minTau: Double = 0.001, decay: Double = 1.0, random: Random = Random.Default)

armResult

open override fun armResult(armIndex: Int): WeightedMeanResult

Per-arm snapshot at armIndex. Default implementation reads from the full snapshot; implementations may override to avoid building the entire list when only one arm is needed.

choose

open override fun choose(): Int

Sample arm from the softmax of per-arm means at the current temperature.

create

open override fun create(random: Random): BoltzmannBandit

Spawn a fresh bandit with the same configuration.

decay

Cooling decay exponent: tau(t) = initialTau / t^decay. 0.0 is fixed-temperature.

initialTau

Initial temperature; the schedule cools from here.

merge

open override fun merge(other: List<WeightedMeanResult>)

Fold another replica's state (other) into this bandit. Most families merge exactly via the underlying stat's parallel-merge formula; SGD-based contextual bandits merge approximately. Each concrete bandit's KDoc documents its merge semantics.

minTau

Floor on the temperature so the softmax never collapses to a delta.

nbrArms

open override val nbrArms: Int

Number of arms.

playDistribution

Current play distribution: softmax(mean / tau(t)). Also advances the internal step.

random

open override val random: Random

Single source of randomness.

reset

open override fun reset()

Reset arms to priors and step counter to zero.

snapshot

open override fun snapshot(): List<WeightedMeanResult>

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed coupled cells may drift by ULPs.

temperature

Current temperature: max(minTau, initialTau / step^decay).

update

open override fun update(armIndex: Int, value: Double, weight: Double = 1.0)

Fold (arm, reward) into the per-arm mean.