com.eignex.kumulant/stat/regression/glm/StochasticRegressionStat

StochasticRegressionStat

class StochasticRegressionStat(val featureSize: Int, val optimizer: OptimizerSpec = Sgd(), val biasOptimizer: OptimizerSpec = optimizer, val penalty: Penalty = Penalty.None, val link: Link = Link.Identity, val concurrency: Concurrency = Concurrency.None) : RegressionStat<StochasticRegressionResult>

Online generalised linear regression, fitted by stochastic gradient descent on the negative log-likelihood of the canonical Link plus an optional Penalty. It is the cheapest of the multivariate regressors: point estimates only, no posterior, fast updates.
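As a rough illustration of the update described above, the following self-contained sketch performs least-squares SGD (the Link.Identity case) with a fixed learning rate. TinySgdRegressor and its members are hypothetical names invented for this example; they are not part of the library, and the real class delegates the step size to its OptimizerSpec.

```kotlin
// Illustrative sketch only: dense least-squares SGD with a fixed learning rate.
// TinySgdRegressor is a hypothetical name, not a library class.
class TinySgdRegressor(featureSize: Int, private val learningRate: Double) {
    val weights = DoubleArray(featureSize)
    var bias = 0.0
        private set

    // Linear predictor: bias + w . x
    fun predict(x: DoubleArray): Double {
        var eta = bias
        for (i in x.indices) eta += weights[i] * x[i]
        return eta
    }

    // One stochastic gradient step on the loss 0.5 * weight * (prediction - y)^2.
    fun update(x: DoubleArray, y: Double, weight: Double = 1.0) {
        require(x.size == weights.size) { "expected ${weights.size} features, got ${x.size}" }
        val residual = predict(x) - y
        val scaled = learningRate * weight * residual
        for (i in x.indices) weights[i] -= scaled * x[i]
        bias -= scaled
    }
}
```

Feeding this sketch noise-free observations of y = 2x + 1 drives weights[0] towards 2 and bias towards 1, which is the point-estimate behaviour the class description refers to.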

The per-coordinate update rule is owned by optimizer (Sgd / com.eignex.kumulant.schema.Adagrad / com.eignex.kumulant.schema.Rmsprop / com.eignex.kumulant.schema.Adam). The bias has its own biasOptimizer schedule because the intercept usually wants a different cadence than the coefficients.

Penalty.L1 and Penalty.L2 require optimizer (and biasOptimizer) to be Sgd; the lazy-update tricks they rely on (Bottou-style multiplicative scaling for L2; cumulative truncated gradient for L1) are SGD-specific. With a non-Sgd optimizer the penalty must be Penalty.None; folding L1/L2 into Adam-class updates is left for a future refactor.
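The multiplicative-scaling idea mentioned for L2 can be sketched as follows, assuming a squared-error loss and a fixed learning rate. The weight vector is stored as scale * v, so the L2 shrinkage touches one scalar instead of every coordinate. LazyL2Weights and denseL2Step are hypothetical names for this illustration, and the library's actual bookkeeping may differ.

```kotlin
// Illustrative sketch only: L2-regularised SGD with weights stored as scale * v,
// so the shrink factor (1 - lr * lambda) costs O(1) per update instead of O(size).
class LazyL2Weights(size: Int, private val learningRate: Double, private val lambda: Double) {
    private val v = DoubleArray(size)
    private var scale = 1.0

    init {
        require(learningRate * lambda >= 0.0 && learningRate * lambda < 1.0) {
            "learningRate * lambda must lie in [0, 1)"
        }
    }

    fun weight(i: Int): Double = scale * v[i]

    // Sparse dot product over the non-zero entries of x.
    fun dot(indices: IntArray, values: DoubleArray): Double {
        var s = 0.0
        for (k in indices.indices) s += v[indices[k]] * values[k]
        return scale * s
    }

    fun update(indices: IntArray, values: DoubleArray, y: Double) {
        val residual = dot(indices, values) - y
        // Shrink every weight at once by shrinking the shared scale.
        scale *= 1.0 - learningRate * lambda
        if (scale < 1e-9) {
            // Fold a tiny scale back into v to avoid underflow.
            for (i in v.indices) v[i] *= scale
            scale = 1.0
        }
        // Gradient step on the touched coordinates only, expressed in v-space.
        val step = learningRate * residual / scale
        for (k in indices.indices) v[indices[k]] -= step * values[k]
    }
}

// Dense reference step: w <- (1 - lr * lambda) * w - lr * residual * x.
fun denseL2Step(
    w: DoubleArray, indices: IntArray, values: DoubleArray,
    y: Double, learningRate: Double, lambda: Double
) {
    var pred = 0.0
    for (k in indices.indices) pred += w[indices[k]] * values[k]
    val residual = pred - y
    for (i in w.indices) w[i] *= 1.0 - learningRate * lambda
    for (k in indices.indices) w[indices[k]] -= learningRate * residual * values[k]
}
```

Both routines produce the same weights up to rounding; the lazy form only visits the non-zero coordinates of x, apart from the occasional renormalisation. The trick relies on the plain SGD step being linear in the weights, which is consistent with the restriction to Sgd stated above.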

Use cases: high-throughput online regression where point estimates suffice and the per-update cost must stay small. Reach for DiagonalRegressionStat when uncertainty estimates are needed, and for BayesianRegressionStat when the full posterior is needed.

Memory: O(featureSize); the weight vector, the bias, and the optimizer's auxiliary state.

Update: O(nnz(x)) per observation under Penalty.None; the L1/L2 paths add lazy-update bookkeeping with the same asymptotic cost.

Concurrency: Welford-coupled per-slot atomic under Concurrency.Relaxed (HOGWILD-style asynchronous SGD), serialised under Concurrency.Strict / Concurrency.HighWrite.
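One common way to get lock-free per-slot updates on the JVM is a compare-and-set loop over the bit pattern of each double. The sketch below shows that general technique; AtomicDoubleSlots is a hypothetical name, and whether the library encodes its cells this way is an assumption, not something this page states.

```kotlin
import java.util.concurrent.atomic.AtomicLongArray

// Illustrative sketch only: lock-free "add to slot i" for doubles stored as raw bits.
// Each slot is individually atomic, but nothing keeps different slots mutually
// consistent, which matches the best-effort flavour of HOGWILD-style updates.
class AtomicDoubleSlots(size: Int) {
    // AtomicLongArray starts at 0L, which is the bit pattern of 0.0.
    private val bits = AtomicLongArray(size)

    fun get(i: Int): Double = Double.fromBits(bits.get(i))

    fun add(i: Int, delta: Double) {
        while (true) {
            val old = bits.get(i)
            val updated = (Double.fromBits(old) + delta).toRawBits()
            // Retry if another writer changed the slot in the meantime.
            if (bits.compareAndSet(i, old, updated)) return
        }
    }
}
```

Concurrent writers never block each other here; a writer that loses the race simply recomputes its sum and tries again.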

Constructors

StochasticRegressionStat

constructor(featureSize: Int, optimizer: OptimizerSpec = Sgd(), biasOptimizer: OptimizerSpec = optimizer, penalty: Penalty = Penalty.None, link: Link = Link.Identity, concurrency: Concurrency = Concurrency.None)

Properties

bias

val bias: Double

Live view of the running intercept.

biasOptimizer

val biasOptimizer: OptimizerSpec

Update rule for the bias scalar. Defaults to optimizer.

concurrency

open override val concurrency: Concurrency

The thread-safety contract this stat was constructed with. Each stat picks the cell-encoding and lock strategy that honours this contract for its mathematical structure:

Concurrency.None: single-threaded; no synchronisation. Cheapest path.
Concurrency.Relaxed: lock-free best-effort. Multi-cell stats (Welford-style MeanStat, VarianceStat, MomentsStat) may drift under contention but never throw.
Concurrency.Strict: serialised when needed for full correctness across coupled cells. Sketches always self-serialise; Welford stats lock per update.
Concurrency.HighWrite: optimised for many concurrent writers; JVM uses striped adders for naively additive stats.

Picked at construction; immutable after.

featureSize

open override val featureSize: Int

Number of features expected in x on each update. Mismatched lengths throw.

link

val link: Link

Canonical GLM link function; Link.Identity gives ordinary least-squares SGD.
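For a canonical link, the gradient of the negative log-likelihood with respect to the linear predictor eta is the inverse link applied to eta, minus y, so every link shares the same update shape. A small sketch of that identity follows. SketchLink and etaGradient are hypothetical names; only Link.Identity is named on this page, so the LOGIT and LOG entries are assumptions included for illustration.

```kotlin
import kotlin.math.exp

// Illustrative sketch only: canonical links expressed through their inverse (mean) function.
// SketchLink is hypothetical; it does not mirror the library's Link type.
enum class SketchLink(val inverse: (Double) -> Double) {
    IDENTITY({ eta -> eta }),                   // Gaussian mean: least squares
    LOGIT({ eta -> 1.0 / (1.0 + exp(-eta)) }),  // Bernoulli mean: logistic regression
    LOG({ eta -> exp(eta) })                    // Poisson mean: count regression
}

// d(NLL)/d(eta) for a canonical link: predicted mean minus observed target.
fun etaGradient(link: SketchLink, eta: Double, y: Double): Double = link.inverse(eta) - y
```

Multiplying this scalar by x gives the weight gradient; with the identity link it reduces to the least-squares residual.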

optimizer

val optimizer: OptimizerSpec

Per-coordinate update rule for the weight vector.

penalty

val penalty: Penalty

Regularisation applied during the gradient step. Penalty.L1 and Penalty.L2 require both optimizer and biasOptimizer to be Sgd; any other optimizer must use Penalty.None.

sse

val sse: Double

Live view of the accumulated per-link loss.

step

val step: Long

Live view of the per-observation step counter.

totalWeights

val totalWeights: Double

Live view of the cumulative observation weight folded in.

Functions

create

open override fun create(concurrency: Concurrency? = null): StochasticRegressionStat

Spawn a fresh accumulator with the same configuration. Optionally override the Concurrency; useful for materialising a wire spec at a different concurrency level than the source.

The returned stat is independent: its state starts at the configured baseline, not at the source's current state. Each modality subtype narrows the return type so chaining doesn't lose the modality.

merge

open override fun merge(values: StochasticRegressionResult)

Sample-weighted blend of weights and bias. SGD has no second-moment information, so this is an approximation; for principled merges use BayesianRegressionStat.
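A sample-weighted blend of this kind can be sketched as a convex combination whose coefficients are each side's share of the total observation weight. SketchResult and blend are hypothetical names for this illustration; the fields of StochasticRegressionResult are not documented on this page, so the shape used here is an assumption.

```kotlin
// Illustrative sketch only: a hypothetical result holder, not StochasticRegressionResult.
class SketchResult(val weights: DoubleArray, val bias: Double, val totalWeights: Double)

// Blend two fitted models in proportion to the observation weight each has seen.
fun blend(a: SketchResult, b: SketchResult): SketchResult {
    require(a.weights.size == b.weights.size) { "feature sizes differ" }
    val total = a.totalWeights + b.totalWeights
    if (total == 0.0) return a // nothing observed on either side
    val shareA = a.totalWeights / total
    val shareB = b.totalWeights / total
    val merged = DoubleArray(a.weights.size) { i -> shareA * a.weights[i] + shareB * b.weights[i] }
    return SketchResult(merged, shareA * a.bias + shareB * b.bias, total)
}
```

Averaging parameters this way ignores how well determined each coefficient is on either side, which is why the result is only an approximation.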

read

open override fun read(timestampNanos: Long = currentTimeNanos()): StochasticRegressionResult

Materialise the current state as an immutable Result. Reads never mutate, so the caller can read as often as it likes without affecting the stream.

Snapshot consistency depends on the configured Concurrency. Under Concurrency.Strict / Concurrency.HighWrite a read locks against writers so coupled cells stay consistent. Under Concurrency.Relaxed the cells race, so under heavy contention the snapshot may drift on the order of ULPs; the drift is bounded and the read never throws.

timestampNanos is the read timestamp. Stats that don't care about time silently drop it; stats that do (rates, decay families, recency, windowed wrappers) use it as the ordering signal.

reset

open override fun reset()

Reset the stat to its prior-seeded baseline. Equivalent to constructing a fresh stat with the same configuration, but in place; keeps the same Concurrency and any per-stat tunables.

update

open fun update(x: VectorView, y: Double, weight: Double = 1.0)

Record an (x, y) observation with the given weight at the current time.

open fun update(x: DoubleArray, y: Double, weight: Double = 1.0)

Convenience overload that wraps x as a DenseVector.

open fun update(x: DoubleArray, y: Double, timestampNanos: Long, weight: Double = 1.0)

Timestamped convenience overload that wraps x as a DenseVector.

open override fun update(x: VectorView, y: Double, timestampNanos: Long, weight: Double = 1.0)

Record an (x, y) observation at timestampNanos with the given weight.
