RandomForestRegressionStat
Online random-forest regressor: a population of RegressionTrees sharing one candidate-split pool. Diversity comes from two sources:
Oza & Russell online bagging: per-tree Poisson(1) reweighting at every update.
Per-leaf mtry: each tree's audit leaves consider a random subset of the candidate splits, drawn at leaf birth from the tree's own RNG.
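The two diversity mechanisms above can be illustrated with a minimal sketch. It is not this library's implementation: ToyTree, samplePoisson, baggedUpdate and drawLeafSubset are hypothetical names. The choices to multiply the caller's weight by the Poisson draw and to skip the update when the draw is zero are assumptions, as is the Poisson sampler (Knuth's multiplication method).

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Knuth's multiplication method for Poisson(lambda); adequate for small lambda such as 1.0.
fun samplePoisson(lambda: Double, rng: Random): Int {
    val limit = exp(-lambda)
    var k = 0
    var p = rng.nextDouble()
    while (p > limit) {
        k++
        p *= rng.nextDouble()
    }
    return k
}

// Hypothetical stand-in for one tree: it only accumulates the total weight it has seen.
class ToyTree(seed: Int) {
    val rng = Random(seed) // each tree owns its RNG
    var totalWeight = 0.0
    fun update(weight: Double) { totalWeight += weight }
}

// Online bagging (assumed form): every tree sees the observation with weight * Poisson(1).
fun baggedUpdate(trees: List<ToyTree>, weight: Double) {
    for (tree in trees) {
        val k = samplePoisson(1.0, tree.rng)
        if (k > 0) tree.update(weight * k) // k == 0 means this tree skips the observation
    }
}

// Per-leaf mtry (assumed form): choose mtry distinct candidate indices at leaf birth.
fun drawLeafSubset(poolSize: Int, mtry: Int, rng: Random): List<Int> =
    (0 until poolSize).shuffled(rng).take(mtry)
```

Because each tree draws from its own RNG, two trees fed the same stream end up with different effective weights and different candidate subsets, which is where the ensemble's spread comes from.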
Snapshot is a ForestRegressionResult carrying every per-tree TreeRegressionResult; tree-aware posteriors merge per-tree leaf aggregates at score time.
Use cases: non-linear contextual regression with built-in variance estimation across trees; the natural backbone for Thompson-sampling contextual bandits. Reach for DecisionTreeRegressionStat alone when a single tree's predictions suffice and ensembled diversity isn't needed.
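To make "variance estimation across trees" and the Thompson-sampling use case concrete, here is a small sketch that works on plain lists of per-tree predictions. It does not touch ForestRegressionResult, whose accessors are not shown here. EnsembleSummary, summarise, thompsonDraw and chooseArm are hypothetical helpers, and treating one randomly chosen tree as a posterior sample is one simple bootstrap-style approach, not necessarily what the library's tree-aware posteriors do.

```kotlin
import kotlin.random.Random

// Summary of the per-tree predictions for one context x.
data class EnsembleSummary(val mean: Double, val variance: Double)

// Mean and unbiased sample variance of the per-tree predictions.
fun summarise(perTree: List<Double>): EnsembleSummary {
    require(perTree.isNotEmpty()) { "need at least one tree prediction" }
    val mean = perTree.average()
    val variance =
        if (perTree.size < 2) 0.0
        else perTree.sumOf { (it - mean) * (it - mean) } / (perTree.size - 1)
    return EnsembleSummary(mean, variance)
}

// Bootstrap-style Thompson draw: treat one randomly chosen tree as a posterior sample.
fun thompsonDraw(perTree: List<Double>, rng: Random): Double =
    perTree[rng.nextInt(perTree.size)]

// Pick the arm with the highest sampled reward; perArm[a] holds arm a's per-tree predictions.
fun chooseArm(perArm: List<List<Double>>, rng: Random): Int =
    perArm.indices.maxByOrNull { thompsonDraw(perArm[it], rng) } ?: error("no arms")
```

Arms whose trees disagree get occasional optimistic draws and so keep being explored, while arms with tight agreement are chosen mostly on their mean.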
Memory: O(nbrTrees · single-tree memory); see DecisionTreeRegressionStat. Heavier than a single tree, but the trees are independent and therefore parallelisable.
Update: O(nbrTrees · depth) per observation; each tree's update is independent. Under bagging = true, each tree applies a fresh Poisson(1)-reweighted version of the update.
Concurrency: Inherits DecisionTreeRegressionStat's per-tree concurrency model. Trees are updated sequentially within a single update() call (no inner parallelism); concurrent callers each contend for each tree's split lock independently.
Constructors
Properties
The thread-safety contract this stat was constructed with. Each stat picks the cell-encoding and lock strategy that honours this contract for its mathematical structure.
RegressionTreeConfig with RegressionTreeConfig.mtry defaulted to ceil(sqrt(p)) when null.
Number of features expected in x on each update. Mismatched lengths throw.
Candidate split pool. Used by every tree; the per-leaf mtry filter draws from here.
Functions
Spawn a fresh accumulator with the same configuration. Optionally override the Concurrency; useful for materialising a wire spec at a different concurrency level than the source.
Fold another accumulator's snapshot into this one. The unit of merge is the immutable Result, not a live Stat, which is what lets the merge cross a process boundary. Many workers track slices of the same stream, call read periodically, and ship snapshots to a coordinator, which merges them in.
Materialise the current state as an immutable Result. Reads never mutate, so the caller can read as often as it likes without affecting the stream.
Reset the stat to its prior-seeded baseline. Equivalent to constructing a fresh stat with the same configuration, but in place; keeps the same Concurrency and any per-stat tunables.
Live underlying trees. Use for inspection.
Record an (x, y) observation with the given weight at the current time.
Convenience overload that wraps x as a DenseVector.
Timestamped convenience overload that wraps x as a DenseVector.
Record an (x, y) observation at timestampNanos with the given weight.
RandomForestRegressionStat
bagging
concurrency
The thread-safety contract this stat was constructed with. Each stat picks the cell-encoding and lock strategy that honours this contract for its mathematical structure:
Concurrency.None: single-threaded; no synchronisation. Cheapest path.
Concurrency.Relaxed: lock-free best-effort. Multi-cell stats (Welford-style MeanStat, VarianceStat, MomentsStat) may drift under contention but never throw.
Concurrency.Strict: serialised when needed for full correctness across coupled cells. Sketches always self-serialise; Welford stats lock per update.
Concurrency.HighWrite: optimised for many concurrent writers; JVM uses striped adders for naively additive stats.
Picked at construction; immutable after.
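The "striped adders" mentioned for Concurrency.HighWrite presumably refer to the JDK's java.util.concurrent.atomic adder classes. The sketch below shows that general technique using DoubleAdder for a naively additive quantity. It is independent of this library: stripedSum is a hypothetical helper and says nothing about how the stat wires its cells internally.

```kotlin
import java.util.concurrent.atomic.DoubleAdder
import kotlin.concurrent.thread

// Each of `threads` writers adds 1.0 `perThread` times into one striped adder.
fun stripedSum(threads: Int, perThread: Int): Double {
    val adder = DoubleAdder() // spreads contended writes across internal cells
    val workers = List(threads) {
        thread { repeat(perThread) { adder.add(1.0) } }
    }
    workers.forEach { it.join() } // wait for all writers before reading
    return adder.sum()            // folds the cells into one total
}
```

Writers rarely contend on the same cell, so the trade-off is a slightly more expensive read in exchange for cheap concurrent updates.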
config
RegressionTreeConfig with RegressionTreeConfig.mtry defaulted to ceil(sqrt(p)) when null.
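A worked instance of the default, as a sketch. The page does not spell out what p counts; the code below simply takes it as an integer argument, and resolveMtry is a hypothetical helper rather than a library function.

```kotlin
import kotlin.math.ceil
import kotlin.math.sqrt

// Effective mtry: the configured value if present, otherwise ceil(sqrt(p)).
fun resolveMtry(configured: Int?, p: Int): Int {
    require(p > 0) { "p must be positive" }
    return configured ?: ceil(sqrt(p.toDouble())).toInt()
}
```

For example, p = 10 gives sqrt(10) ≈ 3.16, which rounds up to 4; p = 16 gives exactly 4; an explicit setting such as 2 is returned unchanged.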
create
Spawn a fresh accumulator with the same configuration. Optionally override the Concurrency; useful for materialising a wire spec at a different concurrency level than the source.
The returned stat is independent: its state starts at the configured baseline, not at the source's current state. Each modality subtype narrows the return type so chaining doesn't lose the modality.
featureSize
Number of features expected in x on each update. Mismatched lengths throw.
merge
Fold another accumulator's snapshot into this one. The unit of merge is the immutable Result, not a live Stat, which is what lets the merge cross a process boundary. Many workers track slices of the same stream, call read periodically, and ship snapshots to a coordinator, which merges them in.
Most stat families implement merge exactly (Chan-style parallel formulas for Welford, cell-wise additions for histograms, cell-wise max for HLL). SGD-based regressors merge approximately, because they carry no second-moment information for a principled combine. Each stat's KDoc documents its merge semantics.
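As an illustration of what an exact merge looks like, here is the Chan-style parallel combine for Welford mean and variance summaries. Moments, momentsOf and mergeMoments are hypothetical names for this sketch and are not the library's Result types. The point is only that two immutable snapshots combine to the same answer a single accumulator would have reached.

```kotlin
// Immutable summary: count, mean, and M2 (sum of squared deviations from the mean).
data class Moments(val n: Long, val mean: Double, val m2: Double) {
    val variance: Double get() = if (n > 1) m2 / (n - 1) else 0.0
}

// Welford's single-pass update, used here to build per-worker snapshots.
fun momentsOf(xs: List<Double>): Moments {
    var n = 0L
    var mean = 0.0
    var m2 = 0.0
    for (x in xs) {
        n++
        val d = x - mean
        mean += d / n
        m2 += d * (x - mean)
    }
    return Moments(n, mean, m2)
}

// Chan et al. parallel combine of two snapshots.
fun mergeMoments(a: Moments, b: Moments): Moments {
    if (a.n == 0L) return b
    if (b.n == 0L) return a
    val n = a.n + b.n
    val delta = b.mean - a.mean
    val mean = a.mean + delta * b.n / n
    val m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)
}
```

Merging the snapshots of [1, 2, 3] and [4, 5] this way yields n = 5, mean = 3 and M2 = 10, the same as a single pass over all five values.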
nbrTrees
read
Materialise the current state as an immutable Result. Reads never mutate, so the caller can read as often as it likes without affecting the stream.
Snapshot consistency depends on the configured Concurrency. Under Concurrency.Strict / Concurrency.HighWrite a read locks against writers so coupled cells stay consistent. Under Concurrency.Relaxed the cells race and the snapshot may drift by a few ULPs under heavy contention; the drift is bounded and the read never throws.
timestampNanos is the read timestamp. Stats that don't care about time silently drop it; stats that do (rates, decay families, recency, windowed wrappers) use it as the ordering signal.
reset
Reset the stat to its prior-seeded baseline. Equivalent to constructing a fresh stat with the same configuration, but in place; keeps the same Concurrency and any per-stat tunables.
splitCandidates
Candidate split pool. Used by every tree; the per-leaf mtry filter draws from here.
trees
Live underlying trees. Use for inspection.
update
Record an (x, y) observation at timestampNanos with the given weight.