kumulant

stat.quantile

Bounded-memory quantile estimators and histograms. Each entry makes a different precision-versus-cost trade-off: relative-error guarantees, fixed precision over a known range, reservoir sampling that hands raw values back, or constant memory at the cost of accuracy.

Picking a quantile estimator

| Stat | Memory | Precision | Reach for it when |
| --- | --- | --- | --- |
| DDSketchStat | O(1 / relativeError) | Relative error guarantee | Latencies, payload sizes, any value spanning orders of magnitude. Merge across replicas is exact. The default percentile sketch. |
| TDigestStat | O(compression) | Tighter tail-quantile error than DDSketch at the same memory budget | You specifically care about the 99th / 99.9th percentile (tails) and not the body. |
| HdrHistogramStat | O(precision · log(range)) | Strictest precision in a bounded range | The value range is known up front (e.g. latencies between 1 µs and 1 hr) and you want guaranteed precision in that range. |
| LinearHistogramStat | O(binCount) | Equal-width bin precision | Meaningful breakpoints are known up front; you want bins that match them directly with no rebucketing on read. |
| ReservoirHistogramStat | O(capacity) | Raw values back | Downstream needs the actual observations (to feed another stat or compute quantities the sketches don't expose). |
| FrugalQuantileStat | O(1); two variables | Coarse, single-quantile | You can fit only a few bytes per stat and only care about one percentile. |
| ThresholdBucketStat | O(thresholds) | Caller-supplied edges | You know the meaningful value buckets ahead of time and want per-bucket counts, not a quantile estimate. |

Result shapes

| Result | Shape |
| --- | --- |
| SketchResult | DDSketch snapshot: log-spaced bin map + precomputed quantiles at the configured probabilities |
| QuantileResult | FrugalQuantileStat single-quantile scalar |
| TDigestResult | t-digest centroids + precomputed quantiles |
| SparseHistogramResult | Parallel [lowerBounds, upperBounds) arrays with weights; produced by HdrHistogramStat, LinearHistogramStat, and SketchResult.toSparseHistogram |
| ReservoirResult | Bounded reservoir sample of raw values + the sampling weight |
| ThresholdBucketResult | Per-bucket weighted counts over caller-supplied edges |

SketchResult and TDigestResult carry quantiles precomputed at the configured probabilities, while ReservoirResult carries the raw sample, so the result type a downstream consumer sees depends on which sketch was picked. For a uniform downstream interface, project to SparseHistogramResult (the shared histogram shape).

PIT-style equiprobable histogram

The pitHistogram(numBins) factory in com.eignex.kumulant.stat.score is built from this family. A stream of PIT values, which are uniform under correct distributional forecasts, is fed into an equiprobable LinearHistogramStat over [0, 1]. The resulting bin counts expose the deviation from uniformity that the corresponding PIT test consumes.
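
The idea can be illustrated without the library. The following is a minimal sketch, assuming equal-width bins over [0, 1] and a value of exactly 1.0 clamped into the last bin; pitCounts and maxDeviationFromUniform are hypothetical helper names, not part of kumulant.

```kotlin
import kotlin.math.abs

// Hypothetical helpers that mimic an equal-width histogram over [0, 1].
// They are not the library's pitHistogram factory.

/** Count PIT values into numBins equal-width bins; 1.0 is clamped into the last bin (an assumption). */
fun pitCounts(pits: DoubleArray, numBins: Int): IntArray {
    require(numBins > 0) { "numBins must be positive" }
    val counts = IntArray(numBins)
    for (p in pits) {
        require(p in 0.0..1.0) { "PIT values lie in [0, 1]" }
        val bin = minOf((p * numBins).toInt(), numBins - 1)
        counts[bin]++
    }
    return counts
}

/** Largest absolute gap between an observed bin share and the uniform share 1 / numBins. */
fun maxDeviationFromUniform(counts: IntArray): Double {
    val total = counts.sum()
    require(total > 0) { "need at least one observation" }
    val expected = 1.0 / counts.size
    return counts.maxOf { abs(it.toDouble() / total - expected) }
}
```

Under a well-calibrated forecast each bin share should sit near 1 / numBins, so a large deviation hints at miscalibration.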

Merge

  • DDSketch, HDR, t-digest merge exactly across replicas via cell-wise bin addition / centroid combination.

  • LinearHistogram, ThresholdBucket merge exactly via cell-wise bin addition (same bin layout required).

  • ReservoirHistogram merges sample-weighted via reservoir union: the result is statistically equivalent to one large reservoir.

  • FrugalQuantile does not have a clean merge: it averages the two point estimates. Use it for single-stream tracking, not distributed aggregation.
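
Cell-wise bin addition, as used by the histogram-shaped stats above, can be sketched as follows. This is an illustration only: mergeBins is a hypothetical helper operating on an index-to-weight map like the ones in SketchResult, and it assumes both inputs share the same bin layout (for DDSketch, the same gamma).

```kotlin
// Hypothetical helper; illustrates cell-wise bin addition, not the library's merge code.
// Assumes both maps were produced with an identical bin layout.
fun mergeBins(a: Map<Int, Double>, b: Map<Int, Double>): Map<Int, Double> {
    val out = a.toMutableMap()
    for ((index, weight) in b) {
        // Bins present in only one input are carried over unchanged.
        out[index] = (out[index] ?: 0.0) + weight
    }
    return out
}
```

Because addition of bin weights is commutative and associative, the merged map does not depend on the order in which replicas are combined (up to floating-point rounding).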

Concurrency

Histogram-shaped stats (DDSketchStat, HdrHistogramStat, LinearHistogramStat, ThresholdBucketStat) decompose each update into a single striped atomic increment on the destination bin, so they are exact under every com.eignex.kumulant.core.Concurrency level. ReservoirHistogramStat and FrugalQuantileStat keep coupled state and self-serialise under concurrent access. TDigestStat self-serialises through its own lock.
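
To see why a per-bin atomic increment stays exact under contention, here is a minimal sketch that uses the JDK's java.util.concurrent.atomic.DoubleAdder as the striped accumulator. ConcurrentBins is a hypothetical class, and the choice of DoubleAdder is an assumption; the library may use a different primitive.

```kotlin
import java.util.concurrent.atomic.DoubleAdder

// Hypothetical class; shows one striped accumulator per bin.
// Each update touches exactly one bin, so no cross-bin coordination is needed.
class ConcurrentBins(binCount: Int) {
    private val bins = Array(binCount) { DoubleAdder() }

    /** Add weight to a single bin; safe to call from many threads at once. */
    fun add(bin: Int, weight: Double = 1.0) {
        bins[bin].add(weight)
    }

    /** Read all bins; the values are exact once the writers have finished. */
    fun snapshot(): DoubleArray = DoubleArray(bins.size) { bins[it].sum() }
}
```

A reservoir or a frugal estimator cannot be split this way, since each update reads and writes state that depends on earlier updates.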

Types

class DDSketchStat(val relativeError: Double = 0.01, val probabilities: DoubleArray = doubleArrayOf( 0.5, 0.75, 0.9, 0.95, 0.99, 0.999, ), val concurrency: Concurrency = Concurrency.None) : SeriesStat<SketchResult>

DDSketchStat: relative-error quantile sketch with logarithmic bins.
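
The logarithmic binning can be sketched as below. It assumes the standard DDSketch mapping, gamma = (1 + relativeError) / (1 - relativeError) with bin i covering (gamma^(i-1), gamma^i]; the helper names are hypothetical and the library's internals may differ in detail.

```kotlin
import kotlin.math.ceil
import kotlin.math.ln
import kotlin.math.pow

// Hypothetical helpers illustrating the standard DDSketch mapping; not kumulant's internals.

/** Growth factor between consecutive bin edges for a target relative error. */
fun gammaFor(relativeError: Double): Double = (1.0 + relativeError) / (1.0 - relativeError)

/** Index of the log-spaced bin (gamma^(i-1), gamma^i] that holds a positive value. */
fun binIndex(value: Double, gamma: Double): Int {
    require(value > 0.0) { "only positive values map to a log bin" }
    return ceil(ln(value) / ln(gamma)).toInt()
}

/** Representative value of bin i; it lies within relativeError of every value in the bin. */
fun binEstimate(index: Int, gamma: Double): Double = 2.0 * gamma.pow(index) / (gamma + 1.0)
```

Since the bin edges grow geometrically, the error bound holds relative to the value itself, which is why this kind of sketch suits data spanning orders of magnitude.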

class FrugalQuantileStat(val q: Double, val stepSize: Double = 0.01, val initialEstimate: Double = 0.0, val concurrency: Concurrency = Concurrency.None) : SeriesStat<QuantileResult>

Frugal-streaming single-quantile estimator.
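
For intuition, here is a minimal sketch of a frugal update with a fixed step, in the style of the Frugal-1U algorithm. FrugalSketch is a hypothetical class; its parameters mirror q, stepSize and initialEstimate above, but the variant the library implements is an assumption here and may differ.

```kotlin
import kotlin.random.Random

// Hypothetical class illustrating a Frugal-1U style update; not the library's implementation.
class FrugalSketch(
    private val q: Double,
    private val stepSize: Double,
    initialEstimate: Double,
    seed: Int,
) {
    private val rng = Random(seed)

    var estimate: Double = initialEstimate
        private set

    fun update(value: Double) {
        val u = rng.nextDouble()
        if (value > estimate && u < q) {
            // Move up with probability q when the observation is above the estimate.
            estimate += stepSize
        } else if (value < estimate && u < 1.0 - q) {
            // Move down with probability 1 - q when it is below.
            estimate -= stepSize
        }
    }
}
```

The estimate settles where upward and downward moves balance, which is the point with a fraction q of the stream below it. The fixed step bounds both the convergence speed and the residual jitter.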

class HdrHistogramStat(val lowestDiscernibleValue: Double = 0.001, val initialHighestTrackableValue: Double = 100.0, val significantDigits: Int = 3, val concurrency: Concurrency = Concurrency.None) : SeriesStat<SparseHistogramResult>

Auto-resizing High Dynamic Range (HDR) Histogram with native Double support.

class LinearHistogramStat(val lowerBound: Double, val upperBound: Double, val binCount: Int, val concurrency: Concurrency = Concurrency.None) : SeriesStat<SparseHistogramResult>

Fixed-width binned histogram over [lowerBound, upperBound) split into binCount buckets.

@Serializable
@SerialName(value = "QuantileResult")
data class QuantileResult(val probability: Double, val quantile: Double) : Result

Single estimated quantile with the probability it targets.

class ReservoirHistogramStat(val capacity: Int = 1024, val seed: Long = Random.Default.nextLong(), val concurrency: Concurrency = Concurrency.None) : SeriesStat<ReservoirResult>

Weighted reservoir sample of size capacity via Algorithm A-Res (Efraimidis & Spirakis): each item gets a key u^(1/w) and the top-k keys are retained, giving an unbiased weight-proportional sample.
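
The A-Res rule described above can be sketched with a min-heap keyed on u^(1/w). WeightedReservoir is a hypothetical class written for illustration; it is not the library's implementation and omits snapshotting and merging.

```kotlin
import java.util.PriorityQueue
import kotlin.math.pow
import kotlin.random.Random

// Hypothetical class illustrating Algorithm A-Res; not the library's internals.
class WeightedReservoir(private val capacity: Int, seed: Long) {
    private val rng = Random(seed)

    // Min-heap on the key, so the smallest retained key is the first to be evicted.
    private val heap = PriorityQueue<Pair<Double, Double>>(compareBy { it.first })

    fun update(value: Double, weight: Double = 1.0) {
        require(weight > 0.0) { "weights must be positive" }
        // Heavier items get keys closer to 1 and are more likely to stay.
        val key = rng.nextDouble().pow(1.0 / weight)
        if (heap.size < capacity) {
            heap.add(key to value)
        } else if (key > heap.peek().first) {
            heap.poll()
            heap.add(key to value)
        }
    }

    /** Retained values, in no particular order. */
    fun values(): List<Double> = heap.map { it.second }
}
```

Keeping the keys alongside the values is what makes a later union of two reservoirs possible: the merged reservoir is simply the top-capacity keys of both.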

@Serializable
@SerialName(value = "ReservoirResult")
data class ReservoirResult(val values: DoubleArray, val keys: DoubleArray, val capacity: Int, val totalSeen: Long, val totalWeight: Double) : Result

Reservoir sampling snapshot.

@Serializable
@SerialName(value = "SketchResult")
data class SketchResult(val probabilities: DoubleArray, val quantiles: DoubleArray, val gamma: Double, val totalWeights: Double, val zeroCount: Double, val positiveBins: Map<Int, Double>, val negativeBins: Map<Int, Double>) : Result

DDSketch snapshot: logarithmic bins plus precomputed quantiles for probabilities.

@Serializable
@SerialName(value = "SparseHistogramResult")
data class SparseHistogramResult(val lowerBounds: DoubleArray, val upperBounds: DoubleArray, val weights: DoubleArray) : Result

Histogram as parallel [lowerBounds, upperBounds) bucket arrays with weights.

@Serializable
@SerialName(value = "TDigestResult")
data class TDigestResult(val probabilities: DoubleArray, val quantiles: DoubleArray, val means: DoubleArray, val weights: DoubleArray, val totalWeight: Double, val compression: Double) : Result

T-digest snapshot: means/weights are the centroid arrays sorted by mean, with quantiles precomputed for probabilities via CDF inversion.

class TDigestStat(val compression: Double = 100.0, val probabilities: DoubleArray = doubleArrayOf(0.5, 0.75, 0.9, 0.95, 0.99, 0.999), val concurrency: Concurrency = Concurrency.None) : SeriesStat<TDigestResult>

Buffered merging t-digest (Dunning) with the k1 scale function for high-fidelity extreme-quantile estimates and a bounded centroid count. compression (delta) caps the centroid count at roughly 6*delta.
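
The effect of the k1 scale function on tail resolution can be shown with a short sketch. It assumes the usual definition k1(q) = delta / (2π) · asin(2q − 1) and the rule that a centroid may span at most one unit of k; the function names are hypothetical.

```kotlin
import kotlin.math.PI
import kotlin.math.asin
import kotlin.math.sin

// Hypothetical helpers illustrating the k1 scale function; not the library's code.

/** Map a quantile q in [0, 1] to the k scale for a given compression delta. */
fun k1(q: Double, delta: Double): Double = delta / (2.0 * PI) * asin(2.0 * q - 1.0)

/** Inverse of k1, valid for k in [-delta / 4, delta / 4]. */
fun k1Inverse(k: Double, delta: Double): Double = (sin(2.0 * PI * k / delta) + 1.0) / 2.0

/** Quantile-space width of a centroid that starts at q and spans one unit of k. */
fun maxCentroidSpan(q: Double, delta: Double): Double =
    k1Inverse(k1(q, delta) + 1.0, delta) - q
```

With delta = 100, a centroid starting at the median may cover roughly 3% of the data, while one starting at q = 0.001 may cover only about 0.3%, which is where the tighter tail estimates come from.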

@Serializable
@SerialName(value = "ThresholdBucketResult")
data class ThresholdBucketResult(val thresholds: List<Double>, val counts: List<Double>) : Result

Per-bucket counts for a user-defined threshold list. For strictly increasing thresholds [t[0], t[1], ..., t[K-1]] the result holds K + 1 counts: bucket 0 holds value <= t[0], bucket i (0 < i < K) holds t[i-1] < value <= t[i], and bucket K holds value > t[K-1].
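
The bucket rule above can be written out as a small sketch. bucketIndex and bucketCounts are hypothetical helpers that use unit weights and a linear scan; they illustrate the boundary convention only.

```kotlin
// Hypothetical helpers illustrating the bucket boundaries; not the library's code.

/** Index of the first threshold that is >= value, or thresholds.size for the open upper bucket. */
fun bucketIndex(thresholds: DoubleArray, value: Double): Int {
    val i = thresholds.indexOfFirst { value <= it }
    return if (i >= 0) i else thresholds.size
}

/** Unit-weight counts per bucket; the result has thresholds.size + 1 entries. */
fun bucketCounts(thresholds: DoubleArray, values: DoubleArray): DoubleArray {
    val counts = DoubleArray(thresholds.size + 1)
    for (v in values) counts[bucketIndex(thresholds, v)] += 1.0
    return counts
}
```

A value equal to a threshold lands in the bucket that the threshold closes, matching the value <= t[i] upper bound.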

class ThresholdBucketStat(val thresholds: DoubleArray, val concurrency: Concurrency = Concurrency.None) : SeriesStat<ThresholdBucketResult>

Weighted counter over user-defined value buckets.

Functions

Linear-interpolated quantile at probability from a reservoir sample (treats sample as unweighted).
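
A linear-interpolated quantile over an unweighted sample might look like the sketch below. It assumes the common convention of interpolating at position probability · (n − 1) in the sorted sample; interpolatedQuantile is a hypothetical name and the library's exact convention is not confirmed here.

```kotlin
import kotlin.math.floor

// Hypothetical helper; one common interpolation convention, not necessarily the library's.
fun interpolatedQuantile(sample: DoubleArray, probability: Double): Double {
    require(sample.isNotEmpty()) { "sample must not be empty" }
    require(probability in 0.0..1.0) { "probability must lie in [0, 1]" }
    val sorted = sample.sortedArray()
    // Fractional rank in the sorted sample.
    val pos = probability * (sorted.size - 1)
    val lo = floor(pos).toInt()
    val hi = minOf(lo + 1, sorted.size - 1)
    val frac = pos - lo
    return sorted[lo] + frac * (sorted[hi] - sorted[lo])
}
```

For example, the median of the sample 4, 1, 3, 2 falls halfway between 2 and 3.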

Project a SketchResult into a SparseHistogramResult by expanding its bin indices to bucket boundaries.

Convert the centroids of a TDigestResult to a sparse histogram with bins centered on each centroid.

Bucket the retained sample of a ReservoirResult into binCount equal-width bins between its min and max.