KnnContextualBandit
Non-parametric contextual bandit: each arm keeps a bounded FIFO history of past (context, reward, weight) observations and is scored at choose time by the empirical mean reward over the k nearest historical contexts, plus an optional UCB-style bonus that decays with the arm's cumulative weight.
Distance: defaults to squared L2 between dense vectors; supply distance for custom metrics (Mahalanobis, cosine, kernelised).
History cap: each arm's reservoir is bounded to maxHistoryPerArm; when full, new observations overwrite the oldest.
Cold start: arms with fewer than k observations score coldStartScore + ucbBonus, so they are explored before more populous arms.
Per-arm state is a history rather than a sufficient statistic, so the PerArmBandit snapshot is a KnnArmResult (the bounded reservoir itself) rather than a scalar summary.
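The scoring rule described above can be sketched as follows. This is a minimal illustration, not the library's implementation: Observation, squaredL2 and scoreArm are hypothetical names, the mean is assumed to be unweighted, and the guard that zeroes the bonus for an arm with no weight is an assumption.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// Hypothetical stand-in for one stored observation; not the library's type.
data class Observation(val context: DoubleArray, val reward: Double, val weight: Double = 1.0)

// Squared Euclidean distance, the default metric named in the description.
fun squaredL2(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "context dimensions differ" }
    var sum = 0.0
    for (i in a.indices) {
        val d = a[i] - b[i]
        sum += d * d
    }
    return sum
}

// Score one arm for a query context: mean reward of the k nearest stored
// contexts plus an optional UCB-style bonus. Assumes an unweighted mean.
fun scoreArm(
    history: List<Observation>,
    query: DoubleArray,
    k: Int,
    totalSteps: Long,
    armWeight: Double,
    exploration: Double = 1.0,
    coldStartScore: Double = 1.0,
    distance: (DoubleArray, DoubleArray) -> Double = ::squaredL2,
): Double {
    require(k > 0) { "k must be positive" }
    // Bonus form taken from the exploration property:
    // exploration * sqrt(ln(totalSteps) / armWeight).
    // Assumption: the bonus is zero when it would be undefined.
    val bonus =
        if (exploration == 0.0 || armWeight <= 0.0 || totalSteps < 1) 0.0
        else exploration * sqrt(ln(totalSteps.toDouble()) / armWeight)

    // Cold start: too few observations to form a k-neighbourhood.
    if (history.size < k) return coldStartScore + bonus

    // A full sort keeps the sketch short; a bounded top-k structure avoids it.
    val nearest = history.sortedBy { distance(it.context, query) }.take(k)
    return nearest.sumOf { it.reward } / k + bonus
}
```

A custom metric such as cosine or Mahalanobis distance would be passed through the distance parameter in the same way the default is.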
Use cases: contextual problems where reward is a smooth function of context but the functional form is unknown; small-to-medium feature dimensions where exact k-NN is affordable; settings where interpretable "similar past contexts" reasoning is valuable.
Arms: contextual with caller-defined feature dimension; nbrArms fixed at construction. Per-arm reservoir is bounded by maxHistoryPerArm.
Memory: O(nbrArms · maxHistoryPerArm · featureSize); bounded per-arm history of context copies plus parallel reward/weight arrays.
Choose: O(nbrArms · maxHistoryPerArm · (featureSize + k)); linear scan over each arm's history with a bounded top-k heap.
Update: O(featureSize); append context copy and roll the oldest entry off when capped.
Randomness: random is held for API uniformity but currently unused; choose is deterministic, breaking ties by lowest arm index.
Concurrency: not thread-safe; the per-arm mutable history lists, the total-weight array, and the step counter are mutated without synchronisation. Serialise choose and update externally for multi-threaded use.
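One way to serialise choose and update externally is to wrap the bandit behind a single lock. The sketch below is illustrative only: ContextualChooser is a hypothetical minimal interface, since the real choose and update signatures are not shown on this page, and SerialisedChooser is not part of the library.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Hypothetical minimal surface; the real signatures may differ.
interface ContextualChooser {
    fun choose(context: DoubleArray): Int
    fun update(armIndex: Int, context: DoubleArray, reward: Double)
}

// Routes every call through one lock so that choose and update never overlap.
class SerialisedChooser(private val delegate: ContextualChooser) : ContextualChooser {
    private val lock = ReentrantLock()

    override fun choose(context: DoubleArray): Int =
        lock.withLock { delegate.choose(context) }

    override fun update(armIndex: Int, context: DoubleArray, reward: Double) {
        lock.withLock { delegate.update(armIndex, context, reward) }
    }
}
```

A single lock is the simplest choice; because choose scans every arm's history, it holds the lock for the full scan, so heavily contended workloads may prefer one bandit replica per thread.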
Constructors
Properties
Optimistic value assigned to arms with no history yet; drives initial exploration.
Pairwise distance between context vectors; defaults to squared L2.
UCB-style exploration scale on sqrt(ln(totalSteps) / armWeight); 0.0 disables.
Maximum observations retained per arm; older entries roll off via FIFO.
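The FIFO roll-off behind maxHistoryPerArm can be illustrated with a small ring buffer. This is a sketch under assumptions: BoundedHistory and its members are hypothetical, and a ring buffer is used here for brevity, whereas the description above mentions mutable lists.

```kotlin
// Hypothetical bounded FIFO history for a single arm; illustrative names only.
class BoundedHistory(private val capacity: Int) {
    init {
        require(capacity > 0) { "capacity must be positive" }
    }

    // Context copies plus parallel reward and weight arrays.
    private val contexts = arrayOfNulls<DoubleArray>(capacity)
    private val rewards = DoubleArray(capacity)
    private val weights = DoubleArray(capacity)
    private var next = 0 // slot the next observation is written to

    var size = 0
        private set

    // O(featureSize): copies the context and overwrites the oldest slot when full.
    fun add(context: DoubleArray, reward: Double, weight: Double = 1.0) {
        contexts[next] = context.copyOf() // defensive copy of the caller's array
        rewards[next] = reward
        weights[next] = weight
        next = (next + 1) % capacity
        if (size < capacity) size++
    }

    // Retained rewards in insertion order, oldest first.
    fun rewardsOldestFirst(): List<Double> {
        val start = if (size < capacity) 0 else next
        return List(size) { i -> rewards[(start + i) % capacity] }
    }
}
```

With a capacity of 3, adding rewards 1.0, 2.0, 3.0 and 4.0 in that order leaves 2.0, 3.0 and 4.0 retained, because the fourth observation overwrites the first.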
Functions
Per-arm snapshot at armIndex. Default implementation reads from the full snapshot; implementations may override to avoid building the entire list when only one arm is needed.
Argmax over per-arm evaluate scores. Ties broken by lowest index.
Spawn a fresh bandit with the same configuration; history resets to empty.
Live arm history size for armIndex.
Fold the state of another replica (other) into this bandit. Most families merge exactly via the underlying stat's parallel-merge formula; SGD-based contextual bandits merge approximately. Each concrete bandit's KDoc documents its merge semantics.
Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.
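The tie-breaking rule stated for choose (argmax over per-arm scores, lowest index wins) can be sketched as below. argmaxLowestIndex is a hypothetical helper written for illustration, not a library function.

```kotlin
// Index of the largest score; on ties the lowest index wins.
fun argmaxLowestIndex(scores: DoubleArray): Int {
    require(scores.isNotEmpty()) { "at least one arm is required" }
    var best = 0
    for (i in 1 until scores.size) {
        // Strict comparison keeps the earliest maximum when scores are equal.
        if (scores[i] > scores[best]) best = i
    }
    return best
}
```

Because the comparison is strict and no random draw is involved, repeated calls with the same scores always return the same arm, which matches the deterministic behaviour described for choose.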
KnnContextualBandit
armWeight
choose
Argmax over per-arm evaluate scores. Ties broken by lowest index.
coldStartScore
Optimistic value assigned to arms with no history yet; drives initial exploration.
create
Spawn a fresh bandit with the same configuration; history resets to empty.
distance
Pairwise distance between context vectors; defaults to squared L2.
evaluate
exploration
UCB-style exploration scale on sqrt(ln(totalSteps) / armWeight); 0.0 disables.
historySize
Live arm history size for armIndex.
k
maxHistoryPerArm
Maximum observations retained per arm; older entries roll off via FIFO.
merge
Fold the state of another replica (other) into this bandit. Most families merge exactly via the underlying stat's parallel-merge formula; SGD-based contextual bandits merge approximately. Each concrete bandit's KDoc documents its merge semantics.
nbrArms
random
reset
snapshot
Materialise the current state as a serialisable snapshot. Reads are non-mutating; call as often as needed without affecting decisions. Same snapshot consistency rules as com.eignex.kumulant.core.Stat.read; under com.eignex.kumulant.core.Concurrency.Relaxed, coupled cells may drift by ULPs.