kumulant

core

Foundation types every other package builds on. Everything else in the library imports from here: the modality interfaces, the result hierarchy, the cross-cutting result traits, and the concurrency contract.

The Stat contract

Stat is the root interface for all accumulators. A stat is built once, fed observations through update, snapshotted with read, and combined with peers via merge. The exact signature of update depends on the modality.
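The lifecycle above can be sketched with a toy weighted mean. This is a minimal illustration, assuming read takes no arguments and merge accepts a peer's snapshot; every `Toy*` name is a hypothetical stand-in, not one of the library's types.

```kotlin
// Hypothetical stand-ins for the library's Result / Stat / SeriesStat.
interface ToyResult

interface ToyStat<R : ToyResult> {
    fun read(): R          // immutable snapshot of the current state
    fun merge(other: R)    // fold a peer's snapshot into this stat
}

interface ToySeriesStat<R : ToyResult> : ToyStat<R> {
    fun update(value: Double, weight: Double = 1.0)
}

data class ToyMeanResult(val totalWeights: Double, val mean: Double) : ToyResult

class ToyMean : ToySeriesStat<ToyMeanResult> {
    private var totalWeights = 0.0
    private var mean = 0.0

    override fun update(value: Double, weight: Double) {
        if (weight == 0.0) return
        totalWeights += weight
        // Incremental weighted mean: move towards value by weight / totalWeights.
        mean += (value - mean) * (weight / totalWeights)
    }

    override fun read(): ToyMeanResult = ToyMeanResult(totalWeights, mean)

    // Merging a snapshot is the same as one update carrying the peer's total weight.
    override fun merge(other: ToyMeanResult) {
        update(other.mean, other.totalWeights)
    }
}
```

The snapshot is a plain data class, so the value returned by read can be passed straight to a peer's merge.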

The five modalities

| Interface | update | Typical input |
|---|---|---|
| SeriesStat | update(value: Double, weight: Double = 1.0) | One scalar per observation |
| DiscreteStat | update(value: Long, weight: Double = 1.0) | Opaque keys, integer counts |
| PairedStat | update(x: Double, y: Double, weight: Double = 1.0) | Scalar (x, y) pairs |
| VectorStat | update(vector: VectorView, weight: Double = 1.0) | Multi-channel observations |
| RegressionStat | update(x: VectorView, y: Double, weight: Double = 1.0) | Vector covariate, scalar response |

Every update overload has a sibling that takes an explicit timestampNanos; the no-timestamp form calls com.eignex.kumulant.stream.currentTimeNanos. Stats that ignore time silently drop the stamp. Stats that care about it (rates, windowed wrappers, decaying accumulators) treat it as the ordering signal; pass a monotonic stamp when replaying a log.
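A rough sketch of the two overloads, assuming the timestamp is passed as a Long right after the value (the real parameter order may differ). `System.nanoTime()` stands in for the library's currentTimeNanos, and all `Toy*` names are hypothetical.

```kotlin
// Hypothetical stand-in for the library clock.
fun toyCurrentTimeNanos(): Long = System.nanoTime()

interface ToyTimedSeries {
    fun update(value: Double, timestampNanos: Long, weight: Double = 1.0)

    // No-timestamp form: stamp with the clock, then forward.
    fun update(value: Double, weight: Double = 1.0) {
        update(value, toyCurrentTimeNanos(), weight)
    }
}

// Ignores time: the stamp is silently dropped.
class ToySum : ToyTimedSeries {
    var sum = 0.0
        private set

    override fun update(value: Double, timestampNanos: Long, weight: Double) {
        sum += value * weight
    }
}

// Cares about time: tracks the observed span to derive events per second.
class ToyRate : ToyTimedSeries {
    private var first = Long.MAX_VALUE
    private var last = Long.MIN_VALUE
    private var total = 0.0

    override fun update(value: Double, timestampNanos: Long, weight: Double) {
        first = minOf(first, timestampNanos)
        last = maxOf(last, timestampNanos)
        total += value * weight
    }

    fun perSecond(): Double =
        if (last <= first) 0.0 else total / ((last - first) / 1e9)
}

fun main() {
    // Replaying a log: pass the recorded, monotonic stamps instead of the wall clock.
    val rate = ToyRate()
    for (t in longArrayOf(0L, 1_000_000_000L, 2_000_000_000L)) rate.update(1.0, t)
    println(rate.perSecond()) // 1.5 (three events over a two-second span)
}
```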

VectorStat and RegressionStat both accept a VectorView (com.eignex.kumulant.math.VectorView) so sparse callers can feed sparse vectors without materialising them. Each also exposes a DoubleArray convenience overload that wraps the array in a DenseVector before forwarding.
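The forwarding pattern might look like the sketch below. The `ToyVectorView` interface and its `forEachNonZero` method are simplified assumptions made for illustration; the real VectorView surface is not reproduced here.

```kotlin
// Hypothetical, simplified shapes; the real VectorView lives in com.eignex.kumulant.math.
interface ToyVectorView {
    val size: Int

    // Visit only the stored entries, so sparse vectors skip their zeros.
    fun forEachNonZero(action: (index: Int, value: Double) -> Unit)
}

class ToyDenseVector(private val data: DoubleArray) : ToyVectorView {
    override val size: Int get() = data.size

    override fun forEachNonZero(action: (index: Int, value: Double) -> Unit) {
        for (i in data.indices) if (data[i] != 0.0) action(i, data[i])
    }
}

class ToySparseVector(
    override val size: Int,
    private val indices: IntArray,
    private val values: DoubleArray,
) : ToyVectorView {
    override fun forEachNonZero(action: (index: Int, value: Double) -> Unit) {
        for (k in indices.indices) action(indices[k], values[k])
    }
}

class ToyVectorSum(dimensions: Int) {
    val sums = DoubleArray(dimensions)

    fun update(vector: ToyVectorView, weight: Double = 1.0) {
        vector.forEachNonZero { i, v -> sums[i] += v * weight }
    }

    // Convenience overload: wrap the array in a dense view, then forward.
    fun update(values: DoubleArray, weight: Double = 1.0) {
        update(ToyDenseVector(values), weight)
    }
}
```

A sparse caller only pays for the entries it stores, while array-based callers keep a one-line call site.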

Snapshots

A Result is an immutable snapshot of a stat's state at a moment in time. Concrete results are @Serializable data classes so the same value that comes out of read goes into merge over the wire. The serializer round-trip is the merge boundary; workers can ship snapshots across a process boundary without sharing live stats.
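To illustrate the merge boundary without pulling in a serialization dependency, the sketch below hand-rolls a text encoding as a stand-in for the real serializer. `ToyMeanSnapshot`, `encode`, and `decode` are hypothetical, and the merge here lives on the snapshot purely to keep the example short.

```kotlin
// Hypothetical snapshot; the library's results are @Serializable data classes.
data class ToyMeanSnapshot(val totalWeights: Double, val mean: Double) {
    // Stand-in for a real serializer: a plain "totalWeights;mean" string.
    fun encode(): String = "$totalWeights;$mean"

    // Combine two snapshots into a new one; neither input is mutated.
    fun merge(other: ToyMeanSnapshot): ToyMeanSnapshot {
        val total = totalWeights + other.totalWeights
        if (total == 0.0) return this
        val combined = (mean * totalWeights + other.mean * other.totalWeights) / total
        return ToyMeanSnapshot(total, combined)
    }

    companion object {
        fun decode(wire: String): ToyMeanSnapshot {
            val (w, m) = wire.split(';').map { it.toDouble() }
            return ToyMeanSnapshot(w, m)
        }
    }
}

fun main() {
    // Each worker snapshots locally and ships only the encoded form.
    val wireA = ToyMeanSnapshot(totalWeights = 3.0, mean = 2.0).encode()
    val wireB = ToyMeanSnapshot(totalWeights = 2.0, mean = 4.5).encode()

    // The receiver never sees a live stat, only decoded snapshots.
    val combined = ToyMeanSnapshot.decode(wireA).merge(ToyMeanSnapshot.decode(wireB))
    println(combined) // ToyMeanSnapshot(totalWeights=5.0, mean=3.0)
}
```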

ResultList wraps an ordered list of results with per-entry names; it is the result type of fan-out wrappers (com.eignex.kumulant.schema.Vectorized, the various ListStats materializations) so consumers can look up per-entry snapshots by name or position.
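Lookup by name or position could be sketched as follows. The data class mirrors the documented constructor (minus the serialization annotations); the `byName` extension and `ToyMeanResult` are hypothetical, and the library may expose lookup differently.

```kotlin
// Local stand-in for the library's marker interface.
interface Result

// Mirrors the documented shape: parallel lists of names and results.
data class ResultList<R : Result>(val names: List<String>, val results: List<R>) : Result

// Hypothetical helper: resolve a name to its position, then index the results.
fun <R : Result> ResultList<R>.byName(name: String): R? {
    val i = names.indexOf(name)
    return if (i >= 0) results[i] else null
}

data class ToyMeanResult(val mean: Double) : Result

fun main() {
    val list = ResultList(
        names = listOf("cpu", "mem"),
        results = listOf(ToyMeanResult(0.4), ToyMeanResult(0.7)),
    )
    println(list.results[0])     // by position
    println(list.byName("mem"))  // by name
}
```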

IndexedResult wraps an inner result with the coordinate index currently being evaluated. Per-coordinate feedback wrappers in the schema layer pass this to the projection AST so it can branch on VIndex and still address primary-snapshot fields (Center, Scale, Low, High).
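A toy evaluator shows the idea. Only the `IndexedResult` shape comes from this page; the `ToyNode` hierarchy and `eval` function are hypothetical simplifications of the projection AST, reusing the node names mentioned above.

```kotlin
// Local stand-in for the library's marker interface.
interface Result

// Documented shape: an inner snapshot plus the coordinate being evaluated.
data class IndexedResult(val inner: Result, val index: Int) : Result

data class ToyCenterScale(val center: Double, val scale: Double) : Result

// Hypothetical projection nodes; the real AST lives in the schema layer.
sealed interface ToyNode
object VIndex : ToyNode
object Center : ToyNode
object Scale : ToyNode

fun eval(node: ToyNode, result: Result): Double {
    // Field nodes look through the wrapper; VIndex reads the wrapper itself.
    val primary = if (result is IndexedResult) result.inner else result
    return when (node) {
        VIndex -> (result as? IndexedResult)?.index?.toDouble() ?: Double.NaN
        Center -> (primary as ToyCenterScale).center
        Scale -> (primary as ToyCenterScale).scale
    }
}
```

Because the field nodes unwrap transparently, the same projection works whether or not the snapshot arrives wrapped.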

Cross-cutting result traits

Traits in StatTraits.kt surface on multiple stat families. A consumer written against a trait works for every concrete result that implements it; that is how one downstream pipeline handles both a univariate fit and a multivariate one, or both a MeanStat and a DecayingMeanStat.

| Trait | Exposes |
|---|---|
| HasRate | rate (events per second), per(duration) |
| HasSampleVariance | totalWeights, variance, stdDev, sampleVariance, sampleStdDev |
| HasShapeMoments | extends HasSampleVariance with m3, m4, skewness, kurtosis, and the size-adjusted unbiased variants |
| HasLinearModel | weights: VectorView, bias: Double, predict(VectorView) over a fitted hyperplane |
| HasSlope | scalar special case: slope, intercept, predict(Double); implements HasLinearModel |
| HasRegression | sse, ssr, mse, rmse, rSquared on top of HasSampleVariance |
| HasCenterScale | center: Double, scale: Double; consumed by standardize projections and the band wrapper |
| HasMinMax | min: Double, max: Double; consumed by min-max projections |
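Writing a consumer against a trait rather than a concrete result might look like this. The `HasMinMax` interface below is a local stand-in that copies the two documented properties; the result classes and `minMaxScale` are hypothetical.

```kotlin
// Local stand-in mirroring the documented trait surface.
interface HasMinMax {
    val min: Double
    val max: Double
}

// Two unrelated toy results that happen to share the trait.
data class ToyRangeResult(override val min: Double, override val max: Double) : HasMinMax

data class ToySummaryResult(
    val mean: Double,
    override val min: Double,
    override val max: Double,
) : HasMinMax

// Written once against the trait; works for every implementer.
fun minMaxScale(x: Double, r: HasMinMax): Double {
    val span = r.max - r.min
    // A degenerate range has no spread to scale by.
    return if (span == 0.0) 0.0 else (x - r.min) / span
}
```

The consumer never names a concrete result, so adding a new stat family that implements the trait requires no change downstream.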

Concurrency contract

Concurrency is the deployment knob, with four levels: None, Relaxed, Strict, and HighWrite. Each stat translates the chosen level into a cell encoding and lock strategy that honours it for the stat's mathematical structure. The enum's own KDoc covers the four modes in detail; the short version:

  • None: single-threaded, no synchronisation, default.

  • Relaxed: lock-free atomic cells; coupled-state stats may drift by ULPs under contention but never throw.

  • Strict: coarse lock around coupled state; exact arithmetic.

  • HighWrite: JVM-only striped adders for naively additive stats under heavy concurrent writes; falls back to Strict elsewhere.
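For intuition, here is how three of the levels could be realised for a plain running sum on the JVM. This is a sketch built on standard `java.util.concurrent` classes, not the library's cell encoding; the `ToyAdder` types are hypothetical. A single-cell sum does not exhibit the coupled-state drift mentioned for Relaxed.

```kotlin
import java.util.concurrent.atomic.AtomicLong
import java.util.concurrent.atomic.DoubleAdder
import kotlin.concurrent.thread

// Hypothetical adders illustrating three of the levels for a plain sum.
interface ToyAdder {
    fun add(x: Double)
    fun sum(): Double
}

// Relaxed: one lock-free cell updated with a compare-and-set loop.
class RelaxedAdder : ToyAdder {
    private val bits = AtomicLong(0.0.toRawBits())

    override fun add(x: Double) {
        while (true) {
            val old = bits.get()
            val next = (Double.fromBits(old) + x).toRawBits()
            if (bits.compareAndSet(old, next)) return
        }
    }

    override fun sum(): Double = Double.fromBits(bits.get())
}

// Strict: a coarse lock around the state.
class StrictAdder : ToyAdder {
    private val lock = Any()
    private var total = 0.0

    override fun add(x: Double) {
        synchronized(lock) { total += x }
    }

    override fun sum(): Double = synchronized(lock) { total }
}

// HighWrite: striped cells that are only combined on read (JVM only).
class HighWriteAdder : ToyAdder {
    private val adder = DoubleAdder()

    override fun add(x: Double) {
        adder.add(x)
    }

    override fun sum(): Double = adder.sum()
}

// Drive an adder from several threads and return the final sum.
fun hammer(adder: ToyAdder, threads: Int = 4, perThread: Int = 10_000): Double {
    val workers = List(threads) { thread { repeat(perThread) { adder.add(1.0) } } }
    workers.forEach { it.join() }
    return adder.sum()
}
```

The None level corresponds to a bare `var total` with no synchronisation, which is only safe when a single thread writes.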

To configure a coherent bag of stats with one contract, declare them inside a com.eignex.kumulant.schema.StatSchema with the desired concurrency; the schema propagates the choice to every registered stat at delegate registration.

Lifecycle

Samples

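// Accumulate three observations, then take an immutable snapshot.
val mean = MeanStat()
for (x in doubleArrayOf(1.0, 2.0, 3.0)) mean.update(x)
val snapshot = mean.read()
println(snapshot.mean) // 2.0

// A peer accumulates independently; merging its snapshot combines both streams.
val peer = MeanStat()
for (x in doubleArrayOf(4.0, 5.0)) peer.update(x)
mean.merge(peer.read())
println(mean.read().mean) // 3.0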

Types

Concurrency

User-facing concurrency contract for stats. Each stat translates the chosen level into a cell-encoding and lock strategy that honours it for that stat's mathematical structure.

interface DiscreteStat<R : Result> : Stat<R>

Accumulator over a stream of discrete Long values. The Long carries two interpretations across the family:

HasCenterScale

Result trait for accumulators that expose a center estimate and a scale estimate. Consumed by the band operator (which derives center ± k * scale) and by the Standardize AST node (which projects (x - center) / scale).
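The two formulas named above can be sketched directly. The interface copies the two documented properties; `ToyMoments`, `band`, and `standardize` are hypothetical names for illustration.

```kotlin
// Local stand-in mirroring the documented trait surface.
interface HasCenterScale {
    val center: Double
    val scale: Double
}

data class ToyMoments(override val center: Double, override val scale: Double) : HasCenterScale

// Band: center ± k * scale, returned as (low, high).
fun band(r: HasCenterScale, k: Double): Pair<Double, Double> =
    (r.center - k * r.scale) to (r.center + k * r.scale)

// Standardize: (x - center) / scale.
fun standardize(x: Double, r: HasCenterScale): Double = (x - r.center) / r.scale

fun main() {
    val r = ToyMoments(center = 10.0, scale = 2.0)
    println(band(r, 3.0))         // (4.0, 16.0)
    println(standardize(13.0, r)) // 1.5
}
```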

HasLinearModel

Result trait for accumulators that produce a fitted linear model y = bias + weights · x. Covers both the univariate special case (HasSlope; slope + intercept) and the multivariate case (com.eignex.kumulant.stat.regression.glm.LinearRegressionResult; weights vector + scalar bias) behind one surface.

interface HasMinMax : Result

Result trait for accumulators that expose observed minimum and maximum values. Consumed by the Low and High AST nodes for min-max scaling and by any downstream that needs the observed range.

interface HasRate : Result

Result trait for accumulators that produce a normalised throughput. Implemented by com.eignex.kumulant.stat.rate.RateResult (and friends), so downstream code written against HasRate works for any rate-shaped stat regardless of the underlying mechanism: uniform-over-window (com.eignex.kumulant.stat.rate.RateStat), counter-differentiated (com.eignex.kumulant.stat.rate.CounterRateStat), or exponentially-decayed (com.eignex.kumulant.stat.rate.DecayingRateStat).

HasRegression

Result trait for regression error metrics. Extends HasSampleVariance because R² is defined as 1 - sse/sst, and sst is the variance-family sum of squared deviations from the mean.
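A worked instance of R² = 1 - sse/sst on four points, as a sketch. `ToyFit` and `fitMetrics` are hypothetical, the observations are unweighted, and defining mse as sse divided by the total weight is an assumption about the normalisation.

```kotlin
import kotlin.math.sqrt

// Hypothetical result holding the two sums everything else is derived from.
data class ToyFit(val sse: Double, val sst: Double, val totalWeights: Double) {
    val mse: Double get() = sse / totalWeights   // assumed normalisation
    val rmse: Double get() = sqrt(mse)
    val rSquared: Double get() = 1.0 - sse / sst
}

fun fitMetrics(truth: DoubleArray, predicted: DoubleArray): ToyFit {
    val mean = truth.average()
    var sse = 0.0
    var sst = 0.0
    for (i in truth.indices) {
        val residual = truth[i] - predicted[i]
        val deviation = truth[i] - mean
        sse += residual * residual     // squared error against the model
        sst += deviation * deviation   // squared deviation from the mean
    }
    return ToyFit(sse, sst, truth.size.toDouble())
}

fun main() {
    val fit = fitMetrics(
        truth = doubleArrayOf(1.0, 2.0, 3.0, 4.0),
        predicted = doubleArrayOf(1.5, 1.5, 3.5, 3.5),
    )
    // sse = 4 * 0.25 = 1.0 and sst = 2.25 + 0.25 + 0.25 + 2.25 = 5.0
    println(fit.rSquared)
}
```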

HasSampleVariance

Result trait for accumulators that expose variance-family quantities. Derived properties variance / stdDev / sampleVariance / sampleStdDev all fall out of sst (sum of squared deviations) and totalWeights without storing redundant fields.
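One way the derived properties could fall out of the two stored fields is sketched below. The `totalWeights - 1` denominator for the sample variants is an assumption (it treats weights as frequency counts); the library may use a different correction. All names are hypothetical.

```kotlin
import kotlin.math.sqrt

// Hypothetical trait: two stored fields, four derived properties.
interface ToyHasSampleVariance {
    val totalWeights: Double
    val sst: Double   // sum of squared deviations from the mean

    val variance: Double get() = sst / totalWeights
    val stdDev: Double get() = sqrt(variance)

    // Assumed correction: frequency-style weights, denominator totalWeights - 1.
    val sampleVariance: Double get() = sst / (totalWeights - 1.0)
    val sampleStdDev: Double get() = sqrt(sampleVariance)
}

data class ToyVarianceResult(
    override val totalWeights: Double,
    override val sst: Double,
) : ToyHasSampleVariance

fun main() {
    // Observations 2, 4, 4, 4, 5, 5, 7, 9 have mean 5 and sst 32.
    val r = ToyVarianceResult(totalWeights = 8.0, sst = 32.0)
    println(r.variance) // 4.0
    println(r.stdDev)   // 2.0
}
```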

HasShapeMoments

Result trait for accumulators that expose third and fourth central moments plus skewness and kurtosis. Extends HasSampleVariance; every shape moment result is also a variance result.

HasSlope

Univariate special case of HasLinearModel: y = slope * x + intercept. The general weights vector and bias surface are derived from slope / intercept, so univariate regression results compose with any consumer written against HasLinearModel without storing redundant fields.
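The derivation of the general surface from slope and intercept can be sketched as follows. A `DoubleArray` stands in for VectorView to keep the example self-contained, and every `Toy*` name is hypothetical.

```kotlin
// Hypothetical stand-in for the general linear-model trait.
interface ToyLinearModel {
    val weights: DoubleArray   // the library uses a VectorView here
    val bias: Double

    // y = bias + weights · x
    fun predict(x: DoubleArray): Double {
        var y = bias
        for (i in weights.indices) y += weights[i] * x[i]
        return y
    }
}

// Univariate special case: y = slope * x + intercept.
interface ToySlope : ToyLinearModel {
    val slope: Double
    val intercept: Double

    // Derived, not stored: a one-element weights vector and the intercept as bias.
    override val weights: DoubleArray get() = doubleArrayOf(slope)
    override val bias: Double get() = intercept

    fun predict(x: Double): Double = slope * x + intercept
}

data class ToyLine(override val slope: Double, override val intercept: Double) : ToySlope

// A consumer that only knows about the general trait.
fun score(model: ToyLinearModel, x: DoubleArray): Double = model.predict(x)

fun main() {
    val line = ToyLine(slope = 2.0, intercept = 1.0)
    println(line.predict(3.0))                // 7.0
    println(score(line, doubleArrayOf(3.0)))  // 7.0
}
```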

data class IndexedResult(val inner: Result, val index: Int) : Result

Wraps an inner result with the coordinate index currently being evaluated. Element-wise feedback wrappers (vector / regression / paired) pass this to the projection AST so it can branch on VIndex and still address primary-snapshot fields (Center, Scale, Low, High) via the transparent unwrap performed by those AST nodes.

interface PairedStat<R : Result> : Stat<R>

Accumulator over paired (x, y) scalar observations. The shape covers scalar-on-scalar regression (UnivariateRegressionStat), weighted covariance and correlation (CovarianceStat), and every paired evaluation metric in com.eignex.kumulant.stat.score / com.eignex.kumulant.stat.calibration: (prediction, truth) pairs, (score, label) pairs, etc.

interface RegressionStat<R : Result> : Stat<R>

Accumulator over vector-covariate / scalar-response observations (x, y, weight), where x is a fixed-dimensional feature vector and y is the scalar target. The multivariate generalisation of PairedStat and the input shape for every linear / non-linear regressor.

interface Result

Marker for a snapshot returned by a Stat's read/merge pipeline.

@Serializable
@SerialName(value = "ResultList")
data class ResultList<R : Result>(val names: List<String>, val results: List<R>) : Result

Ordered list of results with per-entry names. Produced by ListStats and the vector expansion helpers.

interface SeriesStat<R : Result> : Stat<R>

Accumulator over a single scalar time series. The default modality; most descriptive statistics (MeanStat, VarianceStat, the quantile sketches, the rate family, the decay family) implement this shape.

interface Stat<R : Result>

The base interface for all statistical accumulators. Implementations accumulate a streaming view of some input, expose the current state as an immutable Result via read, and merge another snapshot in via merge.

interface VectorStat<R : Result> : Stat<R>

Accumulator over fixed-dimensional vector observations without a response axis. The natural fit for per-coordinate aggregations (VectorizedStat's fan-out of any series stat across dimensions channels) and for the multivariate anomaly detector (HalfSpaceTreesStat).