BayesianRegressionStat

class BayesianRegressionStat(val featureSize: Int, val priorVariance: Double = 1.0, val link: Link = Link.Identity, val concurrency: Concurrency = Concurrency.None, priorMean: VectorView? = null, priorCovariance: MatrixView? = null) : RegressionStat<CovarianceRegressionResult>

Bayesian generalised linear regression with a Gaussian prior on the weights and a canonical Link for the response. Produces a full posterior covariance S = H^-1 alongside the point estimates. Suitable for Thompson-sampling-style bandits drawing a fresh weight vector from N(weights, exploration * S) per round.

The posterior is maintained incrementally with a rank-1 Sherman-Morrison-Woodbury update, using the per-observation precision w_c = weight * link.curvature(eta):

z       = sqrt(w_c) * S * x / sqrt(1 + w_c * xT * S * x)
S       = S - z * zT                     (rank-1 downdate)
w       = w + weight * S_new * x * (y - mu)

Under Link.Identity this is the strict closed-form conjugate Gaussian posterior (curvature = 1). Under Link.Logit / Link.Log it is the online Laplace approximation: each observation tightens the posterior by the local Hessian term curvature * x xT, which is exact only at the current linear predictor. The approximation tracks the true GLM posterior closely for well-identified problems.
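The three update lines above can be sketched on plain arrays. This is a minimal illustration, not the library's implementation: the class RankOnePosterior, its fields, and the mean / curvature lambdas are hypothetical stand-ins for the stat's internal state and its Link, and a zero prior mean with an isotropic prior is assumed.

```kotlin
import kotlin.math.sqrt

// Hypothetical stand-alone model using dense arrays instead of VectorView / MatrixView.
class RankOnePosterior(n: Int, priorVariance: Double = 1.0) {
    val w = DoubleArray(n) // posterior mean; prior mean assumed to be 0
    val s = Array(n) { i -> DoubleArray(n) { j -> if (i == j) priorVariance else 0.0 } }

    fun update(
        x: DoubleArray,
        y: Double,
        weight: Double = 1.0,
        mean: (Double) -> Double = { it },       // inverse link; identity by default
        curvature: (Double) -> Double = { 1.0 }, // local curvature; 1 for identity
    ) {
        val n = w.size
        val eta = (0 until n).sumOf { i -> w[i] * x[i] } // linear predictor
        val mu = mean(eta)
        val wc = weight * curvature(eta)                 // per-observation precision
        // z = sqrt(w_c) * S * x / sqrt(1 + w_c * xT * S * x)
        val sx = DoubleArray(n) { i -> (0 until n).sumOf { j -> s[i][j] * x[j] } }
        val denom = 1.0 + wc * (0 until n).sumOf { i -> x[i] * sx[i] }
        val z = DoubleArray(n) { i -> sqrt(wc) * sx[i] / sqrt(denom) }
        // S = S - z * zT (rank-1 downdate)
        for (i in 0 until n) for (j in 0 until n) s[i][j] -= z[i] * z[j]
        // w = w + weight * S_new * x * (y - mu)
        val r = weight * (y - mu)
        for (i in 0 until n) w[i] += r * (0 until n).sumOf { j -> s[i][j] * x[j] }
    }
}
```

With one feature, a unit prior, and a single observation x = 1, y = 2 under the identity link, the sketch gives S = 0.5 and w = 1, matching the conjugate Gaussian posterior. For a logistic response one would pass the sigmoid as mean and mu * (1 - mu) as curvature.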

Regularisation is the Gaussian prior, controlled by priorVariance / priorMean / priorCovariance; tighten the prior to shrink weights toward zero (or a target).

The Cholesky factor L of S is tracked in parallel via choleskyDowndateInPlace, so a draw w ~ N(weights, S) costs O(n^2): it is computed as weights + L * u for a standard-normal vector u, with no fresh Cholesky decomposition per sample. When the rank-1 downdate falls outside the positive-definite cone, the factor is rebuilt from a regularised covariance and the update is retried with a smaller step.
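The sampling step can be sketched as follows. The helper names cholesky and sample are hypothetical, and the factor is computed from scratch here purely for illustration, whereas the stat keeps it up to date incrementally. The caller supplies the standard-normal vector u so the sketch stays deterministic.

```kotlin
import kotlin.math.sqrt

// Lower-triangular Cholesky factor L with S = L * LT.
// Assumes S is symmetric positive definite; no regularisation is attempted.
fun cholesky(s: Array<DoubleArray>): Array<DoubleArray> {
    val n = s.size
    val l = Array(n) { DoubleArray(n) }
    for (i in 0 until n) for (j in 0..i) {
        var sum = s[i][j]
        for (k in 0 until j) sum -= l[i][k] * l[j][k]
        l[i][j] = if (i == j) sqrt(sum) else sum / l[j][j]
    }
    return l
}

// One draw from N(weights, exploration * S) given the factor L of S:
// weights + sqrt(exploration) * L * u, an O(n^2) triangular matrix-vector product.
fun sample(
    weights: DoubleArray,
    l: Array<DoubleArray>,
    u: DoubleArray, // independent standard-normal entries, supplied by the caller
    exploration: Double = 1.0,
): DoubleArray {
    val scale = sqrt(exploration)
    return DoubleArray(weights.size) { i ->
        weights[i] + scale * (0..i).sumOf { j -> l[i][j] * u[j] }
    }
}
```

On the JVM, u could be filled from java.util.Random.nextGaussian(); a Thompson-sampling arm would then score each candidate context against the sampled weight vector.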

Residual variance: sigma^2 = 1. Callers wanting heteroscedastic noise can re-scale y before update() or pass per-observation precision via weight.

Use cases: Thompson-sampling-style linear-bandit arms (draws from N(weights, exploration · S) per round), full-covariance online regression for low-to-mid dimensions, GLM fitting where the joint posterior is needed. Reach for DiagonalRegressionStat when only marginal posteriors are required and dimensions are high.

Memory: O(featureSize^2); weights, covariance, and Cholesky factor.

Update: O(featureSize^2) per observation; rank-1 SMW downdate plus Cholesky downdate. Rebuilding the Cholesky factor when a downdate fails is O(featureSize^3) but rare.

Concurrency: The update body is serialised by an internal lock under every Concurrency level other than Concurrency.None, where the lock is a no-op. Results are exact under every level, up to floating-point differences (ULPs) caused by reordered updates. Throughput is bound by lock contention; for higher write rates, shard across several stats and merge.

Constructors

BayesianRegressionStat

constructor(featureSize: Int, priorVariance: Double = 1.0, link: Link = Link.Identity, concurrency: Concurrency = Concurrency.None, priorMean: VectorView? = null, priorCovariance: MatrixView? = null)

Types

Companion

object Companion

Empirical-Bayes / hierarchical helpers that operate on populations of fitted snapshots.

Properties

concurrency

open override val concurrency: Concurrency

The thread-safety contract this stat was constructed with. Each stat picks the cell-encoding and lock strategy that honours this contract for its mathematical structure:

Concurrency.None: single-threaded; no synchronisation. Cheapest path.
Concurrency.Relaxed: lock-free best-effort. Multi-cell stats (Welford-style MeanStat, VarianceStat, MomentsStat) may drift under contention but never throw.
Concurrency.Strict: serialised when needed for full correctness across coupled cells. Sketches always self-serialise; Welford stats lock per update.
Concurrency.HighWrite: optimised for many concurrent writers; JVM uses striped adders for naively additive stats.

Picked at construction; immutable after.

featureSize

open override val featureSize: Int

Number of features expected in x on each update. Mismatched lengths throw.

link

val link: Link

Canonical GLM link function; Link.Identity is the strict closed-form Gaussian posterior.

priorVariance

val priorVariance: Double

Isotropic prior variance used when neither priorCovariance nor priorMean is supplied.

Functions

create

open override fun create(concurrency: Concurrency? = null): BayesianRegressionStat

Spawn a fresh accumulator with the same configuration. Optionally override the Concurrency; useful for materialising a wire spec at a different concurrency level than the source.

The returned stat is independent: its state starts at the configured baseline, not at the source's current state. Each modality subtype narrows the return type so chaining doesn't lose the modality.

merge

open override fun merge(values: CovarianceRegressionResult)

Combine two independent Gaussian posteriors over the same parameter by multiplying their densities and renormalising. Each posterior already includes one prior factor, so the combined precision subtracts one copy back out:

H_new  = H_self + H_other - H_prior
b_new  = H_self * mu_self + H_other * mu_other - H_prior * mu_prior
mu_new = H_new^-1 * b_new
S_new  = H_new^-1

For a non-zero prior mean the H_prior * mu_prior correction is subtracted from the information vector as well, otherwise the merged posterior would count the prior shift twice. When H_new drifts outside SPD the Cholesky helper's diagonal clamp catches it and returns a regularised result rather than NaNs. Bias is merged the same way, treating the intercept as a scalar Gaussian with zero prior mean.
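A scalar version of the merge formulas shows why one copy of the prior is subtracted. This is a sketch under stated assumptions: Gaussian1D and mergePosteriors are hypothetical names, and the real method operates on a full CovarianceRegressionResult rather than on scalars.

```kotlin
// Hypothetical scalar Gaussian described by mean and variance (precision H = 1 / variance).
data class Gaussian1D(val mean: Double, val variance: Double)

// Product of two posteriors that each already contain the same prior factor.
fun mergePosteriors(a: Gaussian1D, b: Gaussian1D, prior: Gaussian1D): Gaussian1D {
    val hA = 1.0 / a.variance
    val hB = 1.0 / b.variance
    val hPrior = 1.0 / prior.variance
    // H_new = H_self + H_other - H_prior
    val hNew = hA + hB - hPrior
    // b_new = H_self * mu_self + H_other * mu_other - H_prior * mu_prior
    val bNew = hA * a.mean + hB * b.mean - hPrior * prior.mean
    // mu_new = H_new^-1 * b_new, S_new = H_new^-1
    return Gaussian1D(mean = bNew / hNew, variance = 1.0 / hNew)
}
```

For example, with prior N(0, 1) and unit noise, a shard that saw y = 2 holds N(1, 0.5) and a shard that saw y = 4 holds N(2, 0.5). Merging gives precision 2 + 2 - 1 = 3 and mean 6 / 3 = 2, the same posterior a single stat would reach after seeing both observations.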

read

open override fun read(timestampNanos: Long = currentTimeNanos()): CovarianceRegressionResult

Materialise the current state as an immutable Result. Reads never mutate, so the caller can read as often as it likes without affecting the stream.

Snapshot consistency depends on the configured Concurrency. Under Concurrency.Strict / Concurrency.HighWrite a read locks against writers so coupled cells stay consistent. Under Concurrency.Relaxed the cells race and the snapshot may drift by ULPs of the workload under heavy contention; the drift is bounded and the read never throws.

timestampNanos is the read timestamp. Stats that don't care about time silently drop it; stats that do (rates, decay families, recency, windowed wrappers) use it as the ordering signal.

reset

open override fun reset()

Reset the stat to its prior-seeded baseline. Equivalent to constructing a fresh stat with the same configuration, but in place; keeps the same Concurrency and any per-stat tunables.

update

open fun update(x: VectorView, y: Double, weight: Double = 1.0)

Record an (x, y) observation with the given weight at the current time.

open fun update(x: DoubleArray, y: Double, weight: Double = 1.0)

Convenience overload that wraps x as a DenseVector.

open fun update(x: DoubleArray, y: Double, timestampNanos: Long, weight: Double = 1.0)

Timestamped convenience overload that wraps x as a DenseVector.

open override fun update(x: VectorView, y: Double, timestampNanos: Long, weight: Double = 1.0)

Record an (x, y) observation at timestampNanos with the given weight.
