kumulant

RouletteWheelBandit

class RouletteWheelBandit(val nbrArms: Int, val reactionFactor: Double = 0.1, val segmentLength: Int = 10, val initialWeight: Double = 1.0, val minWeight: Double = 0.01, val random: Random = Random.Default) : UnivariateBandit, PerArmBandit<RouletteWheelArmResult> , Scorable(source)

Adaptive operator-selection bandit in the Ropke-Pisinger 2006 ALNS scheme: each arm carries a weight, choose samples proportional to weights (roulette wheel), and weights re-balance in batches.

After every segmentLength calls, weights re-balance via w_i = w_i · (1 - reactionFactor) + reactionFactor · avgScore_i, where avgScore_i is the mean reward per call of arm i over the segment, floored at minWeight so no arm is permanently extinguished. Batched re-balance (rather than per-observation) is useful when rewards are noisy and continuous updates would thrash. Only meaningful for reward-maximisation: the weight increase is asymmetric and has no clean "minimise" dual; callers wanting to minimise should negate the reward before update.

Implemented as a direct UnivariateBandit rather than a BanditPolicy plugged into MultiArmedBandit because the re-balance is a global cross-arm operation (each arm's new weight depends on its segment-mean).

Use cases: operator selection inside meta-heuristics (ALNS, LNS), where reward signals are noisy and selection breadth across many candidate operators matters more than fine-grained per-step tracking.

Arms: indexless, nbrArms fixed at construction; per-arm state is (weight, segment score sum, segment call count).

Memory: O(nbrArms); three parallel arrays plus a segment counter.

Choose: O(nbrArms); sum the weights, inverse-CDF sample.

Update: O(1) amortised; O(nbrArms) on the segment boundary where the re-balance sweeps every arm.

Randomness: every choose consumes one random.nextDouble() (or one nextInt when all weights collapse to zero); reproducible under a fixed seed.

Concurrency: not thread-safe; weights, segment scores, and the segment counter are mutated without synchronisation. Serialise choose and update externally for multi-thread use.

Constructors

Link copied to clipboard
constructor(nbrArms: Int, reactionFactor: Double = 0.1, segmentLength: Int = 10, initialWeight: Double = 1.0, minWeight: Double = 0.01, random: Random = Random.Default)

Properties

Link copied to clipboard

Starting weight assigned to every arm.

Link copied to clipboard

Floor on the rebalanced weight; prevents arms from being permanently extinguished.

Link copied to clipboard
open override val nbrArms: Int

Number of arms in the population.

Link copied to clipboard
open override val random: Random

Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.

Link copied to clipboard

Blend factor for the Ropke-Pisinger weight update; 0 = no learning, 1 = pure segment-mean.

Link copied to clipboard

Number of update calls between successive weight rebalances.

Functions

Link copied to clipboard
open fun armResult(armIndex: Int): RouletteWheelArmResult

Per-arm snapshot at armIndex. Default implementation reads from the full snapshot; implementations may override to avoid building the entire list when only one arm is needed.

Link copied to clipboard
open override fun choose(): Int

Pick an arm to play next. Uses Bandit.random for any sampling. The returned index is in [0, nbrArms). Repeated calls without intervening updates may return different arms (for randomised selection) or the same arm (for argmax-style policies once the leading arm is well-separated).

Link copied to clipboard
open override fun create(random: Random): RouletteWheelBandit

Spawn a fresh bandit with the same configuration; state resets to the prior seed. The random source is replaced; pass the source you want the new bandit to use for exploration (which is independent of merging in another snapshot's state).

Link copied to clipboard
open override fun evaluate(armIndex: Int): Double

Returns the arm's current weight. Used by choose is computed inline; this is the Scorable view exposed for inspection / debugging.

Link copied to clipboard
open override fun merge(other: List<RouletteWheelArmResult>)

Heuristic merge: weights are arithmetically averaged across replicas; scores and call counts are summed (treating the other replica's segment as additional unobserved data). Roulette-wheel adaptive selection has no canonical merge semantics - the segment-based rebalance is inherently sequential - so use this for "roughly combine two parallel runs" rather than for principled aggregation.

Link copied to clipboard
open override fun reset()

Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.

Link copied to clipboard
open override fun snapshot(): List<RouletteWheelArmResult>

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read ; under com.eignex.kumulant.core.Concurrency.Relaxed coupled cells may drift by ULPs.

Link copied to clipboard
open override fun update(armIndex: Int, value: Double, weight: Double = 1.0)

Fold a single observed reward value into the arm at armIndex with the given weight. Weight is the same observation-weight that runs through the rest of the library; typically 1.0, occasionally importance-weighted for off-policy correction.

Link copied to clipboard
open fun updateAll(armIndices: IntArray, values: DoubleArray, weights: DoubleArray? = null)

Batched update: fold one observation per arm/value pair in a single call. Equivalent to looping update but skips per-call overhead and may take a per-bandit lock once.

RouletteWheelBandit

constructor(nbrArms: Int, reactionFactor: Double = 0.1, segmentLength: Int = 10, initialWeight: Double = 1.0, minWeight: Double = 0.01, random: Random = Random.Default)(source)

choose

open override fun choose(): Int(source)

Pick an arm to play next. Uses Bandit.random for any sampling. The returned index is in [0, nbrArms). Repeated calls without intervening updates may return different arms (for randomised selection) or the same arm (for argmax-style policies once the leading arm is well-separated).

create

open override fun create(random: Random): RouletteWheelBandit(source)

Spawn a fresh bandit with the same configuration; state resets to the prior seed. The random source is replaced; pass the source you want the new bandit to use for exploration (which is independent of merging in another snapshot's state).

Useful when a worker accepts a stream of snapshots to apply sequentially: create(random).also { it.merge(snapshot) }.

evaluate

open override fun evaluate(armIndex: Int): Double(source)

Returns the arm's current weight. Used by choose is computed inline; this is the Scorable view exposed for inspection / debugging.

initialWeight

Starting weight assigned to every arm.

merge

open override fun merge(other: List<RouletteWheelArmResult>)(source)

Heuristic merge: weights are arithmetically averaged across replicas; scores and call counts are summed (treating the other replica's segment as additional unobserved data). Roulette-wheel adaptive selection has no canonical merge semantics - the segment-based rebalance is inherently sequential - so use this for "roughly combine two parallel runs" rather than for principled aggregation.

minWeight

Floor on the rebalanced weight; prevents arms from being permanently extinguished.

nbrArms

open override val nbrArms: Int(source)

Number of arms in the population.

random

open override val random: Random(source)

Single source of randomness for UnivariateBandit.choose / ContextualBandit.choose and any policy-internal sampling. Callers pass a Random(seed) at construction for reproducible exploration; the bandit threads the same instance through every randomised decision.

reactionFactor

Blend factor for the Ropke-Pisinger weight update; 0 = no learning, 1 = pure segment-mean.

reset

open override fun reset()(source)

Clear all state back to the prior-seeded baseline. Equivalent to spawning a fresh bandit with the same configuration via Snapshotable.create, but in place; keeps the same arm count, policy, concurrency mode, and random instance.

segmentLength

Number of update calls between successive weight rebalances.

snapshot

Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read ; under com.eignex.kumulant.core.Concurrency.Relaxed coupled cells may drift by ULPs.

update

open override fun update(armIndex: Int, value: Double, weight: Double = 1.0)(source)

Fold a single observed reward value into the arm at armIndex with the given weight. Weight is the same observation-weight that runs through the rest of the library; typically 1.0, occasionally importance-weighted for off-policy correction.

Index out of range throws; some bandits also bound-check the value (e.g. Bernoulli arms require value in {0.0, 1.0}).