StochasticRegressionStat
Online generalised linear regression fitted by stochastic gradient descent on the negative log-likelihood of the canonical Link, plus an optional Penalty. The cheapest of the multivariate regressors: point estimates only, no posterior, fast updates.
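A minimal sketch of the idea, not the library's implementation. For a canonical link the gradient of the negative log-likelihood with respect to the linear predictor is simply (predicted mean - y), so one SGD step needs only that residual. The names SketchLink and SketchSgdRegressor, the constant learning rate, and the absence of a penalty are assumptions made for illustration.

```kotlin
import kotlin.math.exp

// Hypothetical stand-in for the library's Link: only the mean function (inverse link) is needed.
enum class SketchLink(val mean: (Double) -> Double) {
    Identity({ eta -> eta }),
    Logit({ eta -> 1.0 / (1.0 + exp(-eta)) }),
    Log({ eta -> exp(eta) })
}

// Hypothetical plain-SGD regressor with a constant learning rate and no penalty.
class SketchSgdRegressor(
    featureSize: Int,
    private val link: SketchLink,
    private val learningRate: Double
) {
    val weights = DoubleArray(featureSize)
    var bias = 0.0
        private set

    // Predicted mean: inverse link applied to the linear predictor.
    fun predict(x: DoubleArray): Double {
        var eta = bias
        for (i in x.indices) eta += weights[i] * x[i]
        return link.mean(eta)
    }

    // One weighted SGD step; the canonical-link gradient w.r.t. eta is (mean - y).
    fun update(x: DoubleArray, y: Double, weight: Double = 1.0) {
        require(x.size == weights.size) { "expected ${weights.size} features, got ${x.size}" }
        val scale = learningRate * weight * (predict(x) - y)
        for (i in x.indices) weights[i] -= scale * x[i]
        bias -= scale
    }
}
```

With SketchLink.Identity this reduces to ordinary least-squares SGD; swapping in SketchLink.Logit gives logistic regression with the same update code.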
The per-coordinate update rule is owned by optimizer (Sgd / com.eignex.kumulant.schema.Adagrad / com.eignex.kumulant.schema.Rmsprop / com.eignex.kumulant.schema.Adam). The bias has its own biasOptimizer schedule because the intercept usually wants a different cadence than the coefficients.
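The split between a per-coordinate rule for the weights and a separate rule for the bias can be sketched as follows. The SketchOptimizer interface, its delta method, and the applyStep helper are hypothetical names introduced here; the library's optimizer types may be shaped quite differently. Only plain SGD and Adagrad are shown.

```kotlin
import kotlin.math.sqrt

// Hypothetical per-coordinate rule: returns the delta to add to the parameter in `slot`.
interface SketchOptimizer {
    fun delta(slot: Int, gradient: Double): Double
}

// Plain SGD: a fixed step against the gradient.
class SketchSgd(private val learningRate: Double) : SketchOptimizer {
    override fun delta(slot: Int, gradient: Double) = -learningRate * gradient
}

// Adagrad: each slot's step shrinks with its own accumulated squared gradients.
class SketchAdagrad(
    slots: Int,
    private val learningRate: Double,
    private val epsilon: Double = 1e-8
) : SketchOptimizer {
    private val sumSquares = DoubleArray(slots) // the optimizer's auxiliary state
    override fun delta(slot: Int, gradient: Double): Double {
        sumSquares[slot] += gradient * gradient
        return -learningRate * gradient / (sqrt(sumSquares[slot]) + epsilon)
    }
}

// Weights and bias each consult their own rule; returns the updated bias.
fun applyStep(
    weights: DoubleArray,
    x: DoubleArray,
    residual: Double,
    optimizer: SketchOptimizer,
    biasOptimizer: SketchOptimizer,
    bias: Double
): Double {
    for (i in x.indices) weights[i] += optimizer.delta(i, residual * x[i])
    return bias + biasOptimizer.delta(0, residual)
}
```

Passing two different instances, for example Adagrad for the weights and a slow SGD for the bias, gives the intercept its own schedule.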
Penalty.L1 and Penalty.L2 require optimizer (and biasOptimizer) to be Sgd; the lazy-update tricks they rely on (Bottou-style multiplicative scaling for L2; cumulative truncated gradient for L1) are SGD-specific. With a non-Sgd optimizer the penalty must be Penalty.None; folding L1/L2 into Adam-class updates is left for a future refactor.
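To illustrate why the L2 trick is tied to plain SGD, here is a sketch of multiplicative scaling under squared loss. The weight vector is stored as scale * stored, so the shrink that L2 applies to every coordinate becomes a single multiplication. SketchLazyL2, the sparse index/value input, and the renormalisation threshold are assumptions for illustration, not the library's code.

```kotlin
// Hypothetical lazy-L2 SGD for squared loss on sparse inputs.
class SketchLazyL2(
    featureSize: Int,
    private val learningRate: Double,
    private val lambda: Double
) {
    private val stored = DoubleArray(featureSize) // v, where the true weight is scale * v
    private var scale = 1.0

    init {
        require(learningRate * lambda < 1.0) { "shrink factor must stay positive" }
    }

    fun weight(i: Int): Double = scale * stored[i]

    // One step on an observation given as parallel arrays of active indices and values.
    fun update(indices: IntArray, values: DoubleArray, y: Double) {
        var prediction = 0.0
        for (k in indices.indices) prediction += weight(indices[k]) * values[k]
        val residual = prediction - y

        // L2 shrink of every weight by (1 - lr * lambda), done in O(1).
        scale *= 1.0 - learningRate * lambda

        // Rare O(featureSize) renormalisation so the scale does not underflow.
        if (scale < 1e-9) {
            for (i in stored.indices) stored[i] *= scale
            scale = 1.0
        }

        // Gradient step touches only the active coordinates; dividing by scale
        // makes scale * stored[i] move by exactly -lr * residual * x_i.
        for (k in indices.indices) {
            stored[indices[k]] -= learningRate * residual * values[k] / scale
        }
    }
}
```

The shared scale works because the SGD step is linear in the weights. Adaptive rules keep per-coordinate state that a single global factor cannot absorb, which is consistent with the restriction stated above.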
Use cases: high-throughput online regression where point estimates suffice and the per-update cost must stay small. Reach for DiagonalRegressionStat when uncertainty estimates are needed, and for BayesianRegressionStat when the full posterior is needed.
Memory: O(featureSize) for the weight vector, the bias, and the optimizer's auxiliary state.
Update: O(nnz(x)) per observation under Penalty.None; the L1/L2 paths add lazy-update bookkeeping with the same asymptotic cost.
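One way the L1 path can stay at O(nnz(x)) is the cumulative-penalty scheme of Tsuruoka et al. (2009). Each weight remembers how much penalty it has actually absorbed and catches up on the outstanding amount only when its feature is next active. Whether the library uses exactly this variant is an assumption; SketchLazyL1 and its fields are hypothetical names, and squared loss is used for brevity.

```kotlin
import kotlin.math.max
import kotlin.math.min

// Hypothetical lazy-L1 SGD for squared loss on sparse inputs.
class SketchLazyL1(
    featureSize: Int,
    private val learningRate: Double,
    private val lambda: Double
) {
    val weights = DoubleArray(featureSize)
    private val applied = DoubleArray(featureSize) // signed penalty already applied per weight
    private var budget = 0.0                        // penalty every weight could have received

    // One step on an observation given as parallel arrays of active indices and values.
    fun update(indices: IntArray, values: DoubleArray, y: Double) {
        var prediction = 0.0
        for (k in indices.indices) prediction += weights[indices[k]] * values[k]
        val residual = prediction - y

        budget += learningRate * lambda

        for (k in indices.indices) {
            val i = indices[k]
            weights[i] -= learningRate * residual * values[k]
            val before = weights[i]
            // Apply the outstanding penalty, clipping at zero so the sign never flips.
            if (before > 0.0) {
                weights[i] = max(0.0, before - (budget + applied[i]))
            } else if (before < 0.0) {
                weights[i] = min(0.0, before + (budget - applied[i]))
            }
            applied[i] += weights[i] - before
        }
    }
}
```

Inactive coordinates are never touched, so the per-observation cost depends only on the number of non-zero features.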
Concurrency: Welford-coupled, per-slot atomic updates under Concurrency.Relaxed (HOGWILD-style asynchronous SGD); serialised under Concurrency.Strict and Concurrency.HighWrite.
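A rough JVM-only sketch of what per-slot atomic storage can look like: each weight lives in one slot of an AtomicLongArray as raw double bits and is updated with a compare-and-set loop. Individual slots never lose an update, but nothing keeps different slots mutually consistent, which is the HOGWILD trade-off. SketchRelaxedWeights is a hypothetical name and the library's actual cell encoding may differ.

```kotlin
import java.util.concurrent.atomic.AtomicLongArray

// Hypothetical lock-free weight vector: each double is stored as its raw bit pattern.
class SketchRelaxedWeights(size: Int) {
    private val bits = AtomicLongArray(size) // all-zero bits decode to 0.0

    fun get(i: Int): Double = Double.fromBits(bits.get(i))

    // Atomically add delta to slot i, retrying if another writer got there first.
    fun add(i: Int, delta: Double) {
        while (true) {
            val old = bits.get(i)
            val updated = (Double.fromBits(old) + delta).toRawBits()
            if (bits.compareAndSet(i, old, updated)) return
        }
    }
}
```

A writer would call add(i, -learningRate * gradient_i) for each active coordinate without taking any lock across coordinates.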
Constructors
StochasticRegressionStat
Properties
biasOptimizer
Update rule for the bias scalar. Defaults to optimizer.
concurrency
The thread-safety contract this stat was constructed with. Each stat picks the cell-encoding and lock strategy that honours this contract for its mathematical structure.
featureSize
Number of features expected in x on each update. Mismatched lengths throw.
link
Canonical GLM link function; Link.Identity gives ordinary least-squares SGD.
optimizer
Per-coordinate update rule for the weight vector.
totalWeights
Live view of the cumulative observation weight folded in.
Functions
create
Spawn a fresh accumulator with the same configuration. Optionally override the Concurrency; useful for materialising a wire spec at a different concurrency level than the source.
merge
Sample-weighted blend of weights and bias. SGD has no second-moment information, so this is an approximation; for principled merges use BayesianRegressionStat.
read
Materialise the current state as an immutable Result. Reads never mutate, so the caller can read as often as it likes without affecting the stream.
reset
Reset the stat to its prior-seeded baseline. Equivalent to constructing a fresh stat with the same configuration, but in place; keeps the same Concurrency and any per-stat tunables.
update
Record an (x, y) observation with the given weight at the current time.
Convenience overload that wraps x as a DenseVector.
Timestamped convenience overload that wraps x as a DenseVector.
Record an (x, y) observation at timestampNanos with the given weight.
StochasticRegressionStat
biasOptimizer
Update rule for the bias scalar. Defaults to optimizer.
bias
concurrency
The thread-safety contract this stat was constructed with. Each stat picks the cell-encoding and lock strategy that honours this contract for its mathematical structure:
Concurrency.None: single-threaded; no synchronisation. Cheapest path.
Concurrency.Relaxed: lock-free best-effort. Multi-cell stats (Welford-style MeanStat, VarianceStat, MomentsStat) may drift under contention but never throw.
Concurrency.Strict: serialised when needed for full correctness across coupled cells. Sketches always self-serialise; Welford stats lock per update.
Concurrency.HighWrite: optimised for many concurrent writers; JVM uses striped adders for naively additive stats.
Picked at construction; immutable after.
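As an illustration of the striped-adder idea on the JVM, a naively additive quantity such as a running total weight can be kept in java.util.concurrent.atomic.DoubleAdder, which spreads contended writes over several cells and sums them on read. SketchTotalWeight is a hypothetical wrapper, not a type from the library.

```kotlin
import java.util.concurrent.atomic.DoubleAdder

// Hypothetical write-optimised accumulator for an additive total.
class SketchTotalWeight {
    private val adder = DoubleAdder()

    // Cheap under contention: writers land on different internal cells.
    fun add(weight: Double) = adder.add(weight)

    // Sums the cells; not an atomic snapshot while writers are still active.
    fun read(): Double = adder.sum()
}
```

This only fits state that is a plain sum. Coupled state such as a weight vector and its optimizer history still needs one of the other strategies.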
create
Spawn a fresh accumulator with the same configuration. Optionally override the Concurrency; useful for materialising a wire spec at a different concurrency level than the source.
The returned stat is independent: its state starts at the configured baseline, not at the source's current state. Each modality subtype narrows the return type so chaining doesn't lose the modality.
featureSize
Number of features expected in x on each update. Mismatched lengths throw.
link
Canonical GLM link function; Link.Identity gives ordinary least-squares SGD.
merge
Sample-weighted blend of weights and bias. SGD has no second-moment information, so this is an approximation; for principled merges use BayesianRegressionStat.
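A sample-weighted blend can be sketched as a convex combination in which each side contributes in proportion to its cumulative observation weight. SketchState and blend are hypothetical names, and the handling of two empty inputs is an assumption.

```kotlin
// Hypothetical snapshot of a point-estimate regressor.
class SketchState(val weights: DoubleArray, val bias: Double, val totalWeights: Double)

// Blend two states, weighting each by its share of the combined observation weight.
fun blend(a: SketchState, b: SketchState): SketchState {
    require(a.weights.size == b.weights.size) { "feature sizes differ" }
    val total = a.totalWeights + b.totalWeights
    if (total == 0.0) return SketchState(a.weights.copyOf(), a.bias, 0.0) // nothing observed yet
    val shareA = a.totalWeights / total
    val shareB = b.totalWeights / total
    val merged = DoubleArray(a.weights.size) { i -> shareA * a.weights[i] + shareB * b.weights[i] }
    return SketchState(merged, shareA * a.bias + shareB * b.bias, total)
}
```

For example, blending weights [1, 3] seen with total weight 1 and weights [5, 7] seen with total weight 3 gives [4, 6]. The blend ignores how well each side has converged, which is why it is only an approximation.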
optimizer
Per-coordinate update rule for the weight vector.
penalty
read
Materialise the current state as an immutable Result. Reads never mutate, so the caller can read as often as it likes without affecting the stream.
Snapshot consistency depends on the configured Concurrency. Under Concurrency.Strict / Concurrency.HighWrite a read locks against writers so coupled cells stay consistent. Under Concurrency.Relaxed the cells race and the snapshot may drift by ULPs of the workload under heavy contention; the drift is bounded and the read never throws.
timestampNanos is the read timestamp. Stats that don't care about time silently drop it; stats that do (rates, decay families, recency, windowed wrappers) use it as the ordering signal.
reset
Reset the stat to its prior-seeded baseline. Equivalent to constructing a fresh stat with the same configuration, but in place; keeps the same Concurrency and any per-stat tunables.
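The relationship between create, read, and reset can be shown with a toy single-threaded accumulator. SketchStat and SketchResult are hypothetical, use an identity link with a fixed learning rate, and start from an all-zero baseline, which is an assumption about what the baseline looks like.

```kotlin
// Hypothetical immutable snapshot returned by read().
class SketchResult(val weights: List<Double>, val bias: Double, val totalWeights: Double)

// Hypothetical single-threaded stat with an all-zero baseline.
class SketchStat(val featureSize: Int, val learningRate: Double) {
    private val weights = DoubleArray(featureSize)
    private var bias = 0.0
    private var totalWeights = 0.0

    fun update(x: DoubleArray, y: Double, weight: Double = 1.0) {
        require(x.size == featureSize) { "expected $featureSize features, got ${x.size}" }
        var eta = bias
        for (i in x.indices) eta += weights[i] * x[i]
        val scale = learningRate * weight * (eta - y)
        for (i in x.indices) weights[i] -= scale * x[i]
        bias -= scale
        totalWeights += weight
    }

    // Copies the state, so later updates never change an earlier snapshot.
    fun read() = SketchResult(weights.toList(), bias, totalWeights)

    // Same configuration, baseline state, fully independent of this instance.
    fun create() = SketchStat(featureSize, learningRate)

    // Back to the baseline in place; configuration is untouched.
    fun reset() {
        weights.fill(0.0)
        bias = 0.0
        totalWeights = 0.0
    }
}
```

After a reset, read() returns the same values as read() on a stat freshly obtained from create().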
sse
step
totalWeights
Live view of the cumulative observation weight folded in.
update
Record an (x, y) observation at timestampNanos with the given weight.