DecisionTreeRegressionStat

class DecisionTreeRegressionStat(val featureSize: Int, val splitCandidates: List<Split>, val config: RegressionTreeConfig = RegressionTreeConfig(), val concurrency: Concurrency = Concurrency.None, leafArmFactory: () -> SeriesStat<WeightedVarianceResult> = { VarianceStat(concurrency) }, randomSeed: Int = 0) : RegressionStat<TreeRegressionResult>

Online VFDT (Very Fast Decision Tree) regressor: a piecewise-constant predictor over the feature space that grows on the fly using the Hoeffding bound. Wraps a RegressionTree in the kumulant RegressionStat contract so it composes with everything that consumes regressors (the bandit family, schemas, op pipelines).
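As a rough illustration of how a Hoeffding-bound split test works in general, the sketch below computes the textbook bound and applies it to two candidate gains. It is not kumulant's implementation: hoeffdingBound, shouldSplit, and their parameters are hypothetical names, and the actual rule used by RegressionTree is not specified on this page.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// Textbook Hoeffding bound: with probability 1 - delta, the true mean of a
// variable whose values span `range` lies within epsilon of the mean of n samples.
// (Hypothetical helper for illustration; not part of kumulant.)
fun hoeffdingBound(range: Double, delta: Double, n: Long): Double {
    require(range > 0.0 && delta > 0.0 && delta < 1.0 && n > 0)
    return sqrt(range * range * ln(1.0 / delta) / (2.0 * n))
}

// Illustrative VFDT-style rule: split once the best candidate's gain beats the
// runner-up by more than epsilon, so the ranking is unlikely to flip with more data.
fun shouldSplit(bestGain: Double, secondGain: Double, range: Double, delta: Double, n: Long): Boolean =
    bestGain - secondGain > hoeffdingBound(range, delta, n)

fun main() {
    // Epsilon shrinks as 1 / sqrt(n): a leaf needs more evidence to separate close candidates.
    for (n in listOf(10L, 100L, 1_000L, 10_000L)) {
        println("n=$n epsilon=${hoeffdingBound(1.0, 1e-6, n)}")
    }
}
```

Because epsilon falls with the square root of the sample count, clearly separated candidates trigger a split early, while near-ties wait for more observations.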

Snapshots are immutable TreeRegressionResults carrying the frozen split structure and per-node weighted-variance aggregates; bandits pair this with a tree-aware com.eignex.kumulant.stat.regression.RegressionPosterior (e.g. MeanTreePosterior or ThompsonTreePosterior) to score arms at choose time.

Reward encoding lives at the call site; pre-transform y (e.g. ln(y)) before update. The internal leaf accumulator is fixed to VarianceStat's WeightedVarianceResult so the VarianceReduction split metric applies.
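The sketch below illustrates the two ideas in this paragraph under stated assumptions: a weighted-variance accumulator (here a hypothetical WeightedVariance class using the standard weighted Welford update, standing in for VarianceStat) and a variance-reduction score for one candidate split, fed with targets that the caller transforms with ln(y) first. The class, the varianceReduction function, and the normalisation by parent weight are illustrative choices, not kumulant's API.

```kotlin
import kotlin.math.ln

// Hypothetical stand-in for a weighted-variance accumulator (weighted Welford update).
class WeightedVariance {
    var weight = 0.0
        private set
    var mean = 0.0
        private set
    private var m2 = 0.0 // weighted sum of squared deviations from the running mean

    fun update(y: Double, w: Double = 1.0) {
        if (w <= 0.0) return // ignore non-positive weights
        weight += w
        val delta = y - mean
        mean += (w / weight) * delta
        m2 += w * delta * (y - mean)
    }

    val sse: Double get() = m2
    val variance: Double get() = if (weight > 0.0) m2 / weight else 0.0
}

// Variance reduction of a split: parent squared error minus the children's
// squared error, normalised by the parent weight (an illustrative convention).
fun varianceReduction(parent: WeightedVariance, left: WeightedVariance, right: WeightedVariance): Double =
    if (parent.weight > 0.0) (parent.sse - left.sse - right.sse) / parent.weight else 0.0

fun main() {
    val parent = WeightedVariance()
    val left = WeightedVariance()
    val right = WeightedVariance()
    // Skewed rewards: the caller applies ln(y) before handing the target over.
    val data = listOf(0.1 to 2.0, 0.2 to 3.0, 0.7 to 20.0, 0.9 to 30.0)
    for ((x, y) in data) {
        val target = ln(y)
        parent.update(target)
        if (x < 0.5) left.update(target) else right.update(target)
    }
    println("variance reduction for x < 0.5: ${varianceReduction(parent, left, right)}")
}
```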

Use cases: non-linear regression where the relationship between context and target is piecewise constant or step-like; bandit reward modelling, contextual stratification, anything where linear regression would miss the structure. Reach for RandomForestRegressionStat for ensembled diversity.

Memory: O(nodes · splitCandidates), made up of a VarianceStat per node plus per-audit-leaf candidate accumulators. Bounded by RegressionTreeConfig.maxNodes.

Update: O(depth) per observation, consisting of a tree walk to the destination leaf followed by an arm update at that leaf. Splits fire at most once every RegressionTreeConfig.splitPeriod observations per audit leaf.
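The walk-then-update pattern can be sketched with a toy tree. Node, Leaf, SplitNode, route, update, and predict below are hypothetical types and functions written for this example; they are not kumulant's RegressionTree, and split growth is omitted.

```kotlin
// Hypothetical node types for illustration only.
sealed class Node

class Leaf(var weight: Double = 0.0, var mean: Double = 0.0) : Node() {
    // Running weighted mean: the only state touched on the hot path.
    fun update(y: Double, w: Double) {
        if (w <= 0.0) return
        weight += w
        mean += (w / weight) * (y - mean)
    }
}

class SplitNode(val feature: Int, val threshold: Double, val left: Node, val right: Node) : Node()

// Walk from the root to the destination leaf: one comparison per level, so O(depth).
fun route(root: Node, x: DoubleArray): Leaf {
    var node: Node = root
    while (node is SplitNode) {
        node = if (x[node.feature] <= node.threshold) node.left else node.right
    }
    return node as Leaf
}

fun update(root: Node, x: DoubleArray, y: Double, weight: Double = 1.0) =
    route(root, x).update(y, weight)

// Piecewise-constant prediction: the mean of the leaf the point falls into.
fun predict(root: Node, x: DoubleArray): Double = route(root, x).mean

fun main() {
    val root = SplitNode(feature = 0, threshold = 0.5, left = Leaf(), right = Leaf())
    update(root, doubleArrayOf(0.2), 1.0)
    update(root, doubleArrayOf(0.3), 3.0)
    update(root, doubleArrayOf(0.8), 10.0)
    println(predict(root, doubleArrayOf(0.1)))
    println(predict(root, doubleArrayOf(0.9)))
}
```

Only the destination leaf is mutated per observation, which is also why updates that land in different leaves need not contend with each other.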

Concurrency: The hot update path touches exactly one accumulator: the arm of the leaf the observation routes to. Internal split nodes carry no live arm; subtree aggregates (rootSnapshot, TreeSplitResult.value) are derived by combining descendants at snapshot time. Each leaf arm is a VarianceStat honouring Concurrency, so multiple threads landing in different leaves never contend. Split conversion takes a per-tree lock that is acquired only at split decisions. Predictions (the load-bearing consumer for bandits) are race-free. The root-level aggregates TreeRegressionResult.totalWeights and rootMean are best-effort under concurrent growth and may drift by a few ULPs of the workload; single-threaded runs are exact. See RegressionTree for the full concurrency design.

Constructors

constructor(featureSize: Int, splitCandidates: List<Split>, config: RegressionTreeConfig = RegressionTreeConfig(), concurrency: Concurrency = Concurrency.None, leafArmFactory: () -> SeriesStat<WeightedVarianceResult> = { VarianceStat(concurrency) }, randomSeed: Int = 0)

Properties

open override val concurrency: Concurrency

The thread-safety contract this stat was constructed with. Each stat picks the cell-encoding and lock strategy that honours this contract for its mathematical structure.

val config: RegressionTreeConfig

Tunables shared with the underlying RegressionTree.

open override val featureSize: Int

Number of features expected in x on each update. Mismatched lengths throw.

val splitCandidates: List<Split>

Candidate splits considered at every audit leaf. Pass an empty list to disable growth; the regressor then degenerates to a single global accumulator.

Functions

open override fun create(concurrency: Concurrency? = null): DecisionTreeRegressionStat

Spawn a fresh accumulator with the same configuration. Optionally override the Concurrency, which is useful for materialising a wire spec at a different concurrency level than the source.

open override fun merge(values: TreeRegressionResult)

Fold another accumulator's snapshot into this one. The unit of merge is the immutable Result, not a live Stat, which is what lets the merge cross a process boundary. Many workers track slices of the same stream, call read periodically, and ship snapshots to a coordinator, which merges them in.

open override fun read(timestampNanos: Long = currentTimeNanos()): TreeRegressionResult

Materialise the current state as an immutable Result. Reads never mutate, so the caller can read as often as it likes without affecting the stream.

open override fun reset()

Reset the stat to its prior-seeded baseline. Equivalent to constructing a fresh stat with the same configuration, but in place: the stat keeps the same Concurrency and any per-stat tunables.

tree

Live underlying tree. Use for inspection / pretty-printing.

open fun update(x: VectorView, y: Double, weight: Double = 1.0)

Record an (x, y) observation with the given weight at the current time.

open fun update(x: DoubleArray, y: Double, weight: Double = 1.0)

Convenience overload that wraps x as a DenseVector.

open fun update(x: DoubleArray, y: Double, timestampNanos: Long, weight: Double = 1.0)

Timestamped convenience overload that wraps x as a DenseVector.

open override fun update(x: VectorView, y: Double, timestampNanos: Long, weight: Double = 1.0)

Record an (x, y) observation at timestampNanos with the given weight.

DecisionTreeRegressionStat

constructor(featureSize: Int, splitCandidates: List<Split>, config: RegressionTreeConfig = RegressionTreeConfig(), concurrency: Concurrency = Concurrency.None, leafArmFactory: () -> SeriesStat<WeightedVarianceResult> = { VarianceStat(concurrency) }, randomSeed: Int = 0)

concurrency

open override val concurrency: Concurrency

The thread-safety contract this stat was constructed with. Each stat picks the cell-encoding and lock strategy that honours this contract for its mathematical structure.

Picked at construction; immutable after.

config

Tunables shared with the underlying RegressionTree.

create

open override fun create(concurrency: Concurrency? = null): DecisionTreeRegressionStat

Spawn a fresh accumulator with the same configuration. Optionally override the Concurrency, which is useful for materialising a wire spec at a different concurrency level than the source.

The returned stat is independent: its state starts at the configured baseline, not at the source's current state. Each modality subtype narrows the return type so chaining doesn't lose the modality.

featureSize

open override val featureSize: Int

Number of features expected in x on each update. Mismatched lengths throw.

merge

open override fun merge(values: TreeRegressionResult)

Fold another accumulator's snapshot into this one. The unit of merge is the immutable Result, not a live Stat, which is what lets the merge cross a process boundary. Many workers track slices of the same stream, call read periodically, and ship snapshots to a coordinator, which merges them in.

Most stat families implement merge exactly (Chan-style parallel formulas for Welford, cell-wise additions for histograms, cell-wise max for HLL). SGD-based regressors merge only approximately, because they lack the second-moment information needed for a principled combine. Each stat's KDoc documents its merge semantics.
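To make the Chan-style combine concrete, the sketch below merges two immutable variance snapshots using the standard parallel formula. VarianceSnapshot, merge, and snapshotOf are hypothetical names for this example; the real WeightedVarianceResult may carry different fields, and how tree-shaped results are merged is not described here.

```kotlin
// Hypothetical immutable snapshot: total weight, mean, and sum of squared deviations.
data class VarianceSnapshot(val weight: Double, val mean: Double, val m2: Double)

// Chan-style parallel combine: exact up to floating-point rounding.
fun merge(a: VarianceSnapshot, b: VarianceSnapshot): VarianceSnapshot {
    if (a.weight == 0.0) return b
    if (b.weight == 0.0) return a
    val total = a.weight + b.weight
    val delta = b.mean - a.mean
    return VarianceSnapshot(
        weight = total,
        mean = a.mean + delta * b.weight / total,
        m2 = a.m2 + b.m2 + delta * delta * a.weight * b.weight / total
    )
}

// Batch reference used to check the merge against a single-pass computation.
fun snapshotOf(values: List<Double>): VarianceSnapshot {
    if (values.isEmpty()) return VarianceSnapshot(0.0, 0.0, 0.0)
    val n = values.size.toDouble()
    val mean = values.sum() / n
    val m2 = values.map { (it - mean) * (it - mean) }.sum()
    return VarianceSnapshot(n, mean, m2)
}

fun main() {
    // Two workers each summarise a slice of the stream; a coordinator merges the snapshots.
    val workerA = snapshotOf(listOf(1.0, 2.0))
    val workerB = snapshotOf(listOf(3.0, 4.0, 5.0))
    println(merge(workerA, workerB))
    println(snapshotOf(listOf(1.0, 2.0, 3.0, 4.0, 5.0)))
}
```

Because the combine needs only the three numbers in each snapshot, the inputs can come from different processes, which matches the worker and coordinator pattern described above.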

read

open override fun read(timestampNanos: Long = currentTimeNanos()): TreeRegressionResult

Materialise the current state as an immutable Result. Reads never mutate, so the caller can read as often as it likes without affecting the stream.

Snapshot consistency depends on the configured Concurrency. Under Concurrency.Strict / Concurrency.HighWrite a read locks against writers so coupled cells stay consistent. Under Concurrency.Relaxed the cells race and the snapshot may drift by ULPs of the workload under heavy contention; the drift is bounded and the read never throws.

timestampNanos is the read timestamp. Stats that don't care about time silently drop it; stats that do (rates, decay families, recency, windowed wrappers) use it as the ordering signal.

reset

open override fun reset()

Reset the stat to its prior-seeded baseline. Equivalent to constructing a fresh stat with the same configuration, but in place: the stat keeps the same Concurrency and any per-stat tunables.

splitCandidates

Candidate splits considered at every audit leaf. Pass an empty list to disable growth; the regressor then degenerates to a single global accumulator.

tree

Live underlying tree. Use for inspection / pretty-printing.

update

open override fun update(x: VectorView, y: Double, timestampNanos: Long, weight: Double = 1.0)

Record an (x, y) observation at timestampNanos with the given weight.