stat.anomaly

Online anomaly detectors. All three primitives expose a score(x) method on their result, so the same downstream pipeline can ask "how anomalous is this observation?" regardless of which detector produced the result.

Picking a detector

| Stat | Input shape | When to reach for it |
|---|---|---|
| GaussianScorerStat | Scalar | Streams that are roughly bell-shaped. The z-score … |
| QuantileFilterStat | Scalar | Non-Gaussian, possibly skewed or heavy-tailed streams. Threshold is the running q-quantile of the input; score(x) = 1.0 flags anything in the tail. Adapts as the distribution drifts. |
| HalfSpaceTreesStat | Vector | Multivariate signals where correlations between features carry the anomaly signal. Ensemble of pre-built random half-space trees; low score → anomaly. Cheap per update, parallel across trees. |

How they relate

GaussianScorerStat is the simplest possible parametric detector and the natural baseline. It assumes the stream is well-summarised by mean and variance; if it isn't, the score saturates uselessly. The single-line implementation wrapping com.eignex.kumulant.stat.summary.VarianceStat makes this clear.
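The mean-and-variance idea can be sketched in a few lines of plain Kotlin. This is a minimal illustration, not kumulant's implementation: the class name RunningGaussian, its update/score methods, and the choice of sample variance (dividing by n - 1) are all assumptions made for the example.

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

/** Hypothetical stand-in for a running mean/variance tracker; not kumulant's VarianceStat. */
class RunningGaussian {
    private var count = 0L
    private var mean = 0.0
    private var m2 = 0.0 // running sum of squared deviations from the mean

    /** Welford's single-pass update of mean and M2. */
    fun update(x: Double) {
        count++
        val delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    }

    /** Absolute z-score of x against the stats seen so far; 0.0 while the variance is undefined or zero. */
    fun score(x: Double): Double {
        if (count < 2) return 0.0
        val stdDev = sqrt(m2 / (count - 1)) // assumption: sample variance
        return if (stdDev == 0.0) 0.0 else abs(x - mean) / stdDev
    }
}
```

After feeding 1.0 through 5.0, the running mean is 3.0 and the sample variance is 2.5, so a value one standard deviation above the mean scores 1.0.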

QuantileFilterStat is the non-parametric upgrade: instead of assuming a distribution, it tracks the actual q-quantile via com.eignex.kumulant.stat.quantile.DDSketchStat. The threshold adapts with the stream, so concept drift in the body of the distribution shifts the anomaly bar automatically.
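To make the adaptive threshold concrete, here is a small sketch that swaps the DDSketch for an exact sorted buffer. It trades bounded memory for simplicity, so it only illustrates the thresholding logic; ExactQuantileFilter and its methods are hypothetical names, and the nearest-rank quantile rule is an assumption.

```kotlin
import kotlin.math.ceil

/** Hypothetical sketch: exact quantile over every value seen, standing in for a DDSketch. */
class ExactQuantileFilter(private val probability: Double = 0.99) {
    private val values = ArrayList<Double>()

    init {
        require(probability > 0.0 && probability < 1.0) { "probability must be in (0, 1)" }
    }

    fun update(x: Double) {
        // Keep the buffer sorted so the quantile is a plain index lookup.
        val idx = values.binarySearch(x)
        values.add(if (idx >= 0) idx else -idx - 1, x)
    }

    /** Nearest-rank quantile of everything seen so far; NaN before the first update. */
    fun threshold(): Double {
        if (values.isEmpty()) return Double.NaN
        val rank = ceil(probability * values.size).toInt().coerceIn(1, values.size)
        return values[rank - 1]
    }

    /** Binary score: 1.0 when x lies strictly above the running quantile, else 0.0. */
    fun score(x: Double): Double = if (x > threshold()) 1.0 else 0.0
}
```

Because the threshold is recomputed from the stream itself, feeding larger values over time raises the bar without any reconfiguration. A real sketch-based detector gets the same behaviour in bounded memory at the cost of a small relative error.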

HalfSpaceTreesStat is the multivariate generalisation. Each tree partitions the input space with random axis-aligned half-spaces; each leaf tracks mass over a reference window and over the latest window. An input whose leaf has tiny reference-window mass falls into a region the recent stream rarely visited, which marks it as an anomaly. The reference window rotates every windowSize observations so the detector tracks slow concept drift.
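The two-window mechanism can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's code: HalfSpaceTreesSketch and its members are hypothetical, thresholds are drawn uniformly from each feature's range, and the score is the plain sum of reference leaf masses (all leaves sit at the same depth here, so a depth factor would only rescale it).

```kotlin
import kotlin.random.Random

/** Hypothetical sketch of a half-space-trees ensemble; names and layout are illustrative. */
class HalfSpaceTreesSketch(
    ranges: List<Pair<Double, Double>>, // per-feature (low, high)
    private val numTrees: Int = 25,
    private val height: Int = 4,
    private val windowSize: Int = 250,
    seed: Int = 0
) {
    private val internal = (1 shl height) - 1 // internal nodes per tree
    private val leaves = 1 shl height         // leaves per tree
    private val feature = IntArray(numTrees * internal)
    private val threshold = DoubleArray(numTrees * internal)
    private var reference = DoubleArray(numTrees * leaves)
    private var latest = DoubleArray(numTrees * leaves)
    private var seen = 0

    init {
        // Trees are fixed at construction: each internal node gets a random feature and threshold.
        val rng = Random(seed)
        for (i in feature.indices) {
            val f = rng.nextInt(ranges.size)
            val (low, high) = ranges[f]
            feature[i] = f
            threshold[i] = low + rng.nextDouble() * (high - low)
        }
    }

    /** Walks tree t from the root and returns the flat index of the leaf that x lands in. */
    private fun leafIndex(t: Int, x: DoubleArray): Int {
        var node = 0 // heap layout: children of n are 2n + 1 and 2n + 2
        repeat(height) {
            val i = t * internal + node
            node = 2 * node + (if (x[feature[i]] < threshold[i]) 1 else 2)
        }
        return t * leaves + (node - internal)
    }

    fun update(x: DoubleArray) {
        for (t in 0 until numTrees) latest[leafIndex(t, x)] += 1.0
        seen++
        if (seen % windowSize == 0) {
            // Rotate: the finished window becomes the reference; start a fresh latest window.
            reference = latest
            latest = DoubleArray(numTrees * leaves)
        }
    }

    /** Sum of reference-window leaf masses across trees; low values suggest an anomaly. */
    fun score(x: DoubleArray): Double {
        var total = 0.0
        for (t in 0 until numTrees) total += reference[leafIndex(t, x)]
        return total
    }
}
```

Until the first rotation the reference profile is empty, so every query scores 0.0. After a full window, points in well-visited leaves accumulate mass from every tree, while points in rarely visited regions score low.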

Score semantics

The three detectors don't agree on which direction is "more anomalous":

  • GaussianScorerStat: higher score = more anomalous. Threshold against a fixed multiple of standard deviations.

  • QuantileFilterStat: binary 0/1 score, where 1.0 means "above the running quantile."

  • HalfSpaceTreesStat: lower score = more anomalous. The reported number is leaf mass times depth, which is large for inputs in dense regions of the reference window. Invert it if you want "higher = more anomalous" semantics in a downstream pipeline.

This asymmetry tracks the literature. It is documented here so callers don't accidentally treat the three scores as pointing in the same direction.
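One way to keep the directions straight downstream is to carry the direction alongside the raw score. The helper below is a hypothetical adapter, not part of the package; ScoreDirection, toAnomalyScore, and isAnomalous are names invented for this sketch.

```kotlin
/** Hypothetical tag describing which way a detector's raw score points. */
enum class ScoreDirection { HIGHER_IS_ANOMALOUS, LOWER_IS_ANOMALOUS }

/** Maps a raw score to "higher = more anomalous"; negation reverses the ranking. */
fun toAnomalyScore(raw: Double, direction: ScoreDirection): Double = when (direction) {
    ScoreDirection.HIGHER_IS_ANOMALOUS -> raw
    ScoreDirection.LOWER_IS_ANOMALOUS -> -raw
}

/** Applies a cutoff in the detector's own units, on the correct side for its direction. */
fun isAnomalous(raw: Double, direction: ScoreDirection, cutoff: Double): Boolean = when (direction) {
    ScoreDirection.HIGHER_IS_ANOMALOUS -> raw > cutoff
    ScoreDirection.LOWER_IS_ANOMALOUS -> raw < cutoff
}
```

With this in place, a z-score of 4.2 against a cutoff of 3.0 and a leaf-mass score of 10.0 against a cutoff of 50.0 both come out as anomalous, even though the raw numbers point in opposite directions.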

Merge

GaussianScorerStat inherits the Chan-style parallel merge from com.eignex.kumulant.stat.summary.VarianceStat, which is exact across parallel workers. QuantileFilterStat and HalfSpaceTreesStat results do not merge directly: the quantile-filter result carries only the scalar threshold (the bin layout would need to travel too), and a half-space-trees result merges only when the tree structures match (same randomSeed). For distributed anomaly detection, ship the underlying com.eignex.kumulant.stat.quantile.DDSketchStat / HalfSpaceTreesStat snapshots and merge those.
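For reference, the pairwise mean/variance combination attributed to Chan et al. looks roughly like this. The Moments type and the momentsOf/merge functions are hypothetical names for this sketch and do not reflect how VarianceStat stores its state.

```kotlin
/** Hypothetical summary: total weight, mean, and sum of squared deviations (M2). */
data class Moments(val n: Double, val mean: Double, val m2: Double) {
    val variance: Double get() = if (n > 1.0) m2 / (n - 1.0) else 0.0
}

/** Two-pass summary of a batch, used here only to build inputs for merge. */
fun momentsOf(xs: List<Double>): Moments {
    if (xs.isEmpty()) return Moments(0.0, 0.0, 0.0)
    val n = xs.size.toDouble()
    val mean = xs.sum() / n
    return Moments(n, mean, xs.sumOf { (it - mean) * (it - mean) })
}

/** Combines two partial summaries without revisiting the underlying observations. */
fun merge(a: Moments, b: Moments): Moments {
    val n = a.n + b.n
    if (n == 0.0) return a
    val delta = b.mean - a.mean
    val mean = a.mean + delta * b.n / n
    val m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Moments(n, mean, m2)
}
```

Merging the summaries of [1, 2, 3] and [4, 5, 6] yields the same mean (3.5) and M2 (17.5) as summarising all six values in one pass, which is what makes the merge exact across workers up to floating-point rounding.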

Concurrency

GaussianScorerStat inherits com.eignex.kumulant.stat.summary.VarianceStat's Welford-coupled cells: they are locked under com.eignex.kumulant.core.Concurrency.Strict / com.eignex.kumulant.core.Concurrency.HighWrite, and race with bounded drift under com.eignex.kumulant.core.Concurrency.Relaxed but never throw. QuantileFilterStat inherits com.eignex.kumulant.stat.quantile.DDSketchStat's striped histogram counters, which are lock-free and exact under every level. HalfSpaceTreesStat applies independent atomic mass updates per leaf and serialises only the periodic reference-window rotation under com.eignex.kumulant.core.Concurrency.Strict / com.eignex.kumulant.core.Concurrency.HighWrite.

Types

@Serializable
@SerialName(value = "FeatureRange")
data class FeatureRange(val low: Double, val high: Double)

Per-feature (low, high) range used to seed random thresholds at tree construction.

@Serializable
@SerialName(value = "GaussianScoreResult")
data class GaussianScoreResult(val mean: Double, val variance: Double, val totalWeights: Double) : Result

Snapshot from GaussianScorerStat: running mean and variance plus the total observation weight. The score(x) helper computes a z-score |x - mean| / stdDev on demand using the captured stats.

class GaussianScorerStat(val concurrency: Concurrency = Concurrency.None) : SeriesStat<GaussianScoreResult>

Streaming Gaussian anomaly scorer: tracks running mean and variance and returns the absolute z-score |x - mean| / stdDev of the most recent value. High scores flag observations that lie far from the running centre.

@Serializable
@SerialName(value = "HalfSpaceTreesResult")
data class HalfSpaceTreesResult(val featureSize: Int, val numTrees: Int, val height: Int, val totalWeights: Double, val featureIndices: IntArray, val thresholds: DoubleArray, val referenceMass: DoubleArray) : Result

Snapshot of HalfSpaceTreesStat: the immutable tree structure plus the reference-window per-leaf masses. Exposes score to evaluate a query vector against the trees' frozen distribution.

class HalfSpaceTreesStat(val featureSize: Int, val featureRanges: List<FeatureRange>, val numTrees: Int = 25, val height: Int = 8, val windowSize: Int = 250, val randomSeed: Int = 0, val concurrency: Concurrency = Concurrency.None) : VectorStat<HalfSpaceTreesResult>

Online Half-Space-Trees anomaly detector (Tan, Ting & Liu 2011). An ensemble of pre-built random half-space trees of fixed depth height; each internal node picks a random feature and a random threshold from featureRanges at construction. Trees do not grow. The algorithm tracks two mass profiles per leaf (a reference window and the latest window) and swaps them every windowSize observations. The anomaly score is computed from the reference profile.

@Serializable
@SerialName(value = "QuantileFilterResult")
data class QuantileFilterResult(val probability: Double, val threshold: Double, val totalWeights: Double) : Result

Snapshot from QuantileFilterStat: the running quantile of the input stream at level probability, plus the score helper that flags an observation as anomalous when it exceeds that quantile.

class QuantileFilterStat(val probability: Double = 0.99, val relativeError: Double = 0.01, val concurrency: Concurrency = Concurrency.None) : SeriesStat<QuantileFilterResult>

Streaming quantile-threshold anomaly detector. Tracks the input distribution via a DDSketchStat and exposes the q-quantile (q = probability) as a threshold; the result's score(x) helper flags x > threshold as a binary anomaly.