stat.quantile
Bounded-memory quantile estimators and histograms. Each entry makes a different precision-versus-cost trade-off: relative-error guarantees, fixed precision over a known range, reservoir sampling when you need raw values back, or constant memory at the cost of accuracy.
Picking a quantile estimator
| Stat | Memory | Precision | Reach for it when |
|---|---|---|---|
| DDSketchStat | O(log(range) / relativeError) | Relative-error guarantee | Latencies, payload sizes, any value spanning orders of magnitude. Merge across replicas is exact. The default percentile sketch. |
| TDigestStat | O(compression) | Tighter tail-quantile error than DDSketch at the same memory budget | You specifically care about the 99th / 99.9th percentile (tails) and not the body. |
| HdrHistogramStat | O(precision · log(range)) | Strictest precision in a bounded range | The value range is known up front (e.g. latencies between 1 µs and 1 hr) and you want guaranteed precision in that range. |
| LinearHistogramStat | O(binCount) | Equal-width bin precision | Meaningful breakpoints are known up front; you want bins that match them directly with no rebucketing on read. |
| ReservoirHistogramStat | O(capacity) | Raw values back | Downstream needs the actual observations (to feed another stat or compute quantities the sketches don't expose). |
| FrugalQuantileStat | O(1); two variables | Coarse, single-quantile | You can fit only a few bytes per stat and only care about one percentile. |
| ThresholdBucketStat | O(thresholds) | Caller-supplied edges | You know the meaningful value buckets ahead of time and want per-bucket counts, not a quantile estimate. |
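The relative-error guarantee behind DDSketchStat comes from logarithmic bins. The sketch below shows the bin mapping described in the DDSketch paper, restricted to positive values; LogBinMapping is a hypothetical name, not kumulant's API. With gamma = (1 + alpha) / (1 - alpha), a value x lands in bin ceil(log_gamma(x)), and reporting 2 * gamma^i / (gamma + 1) for bin i keeps every estimate within alpha of the true value.

```kotlin
import kotlin.math.ceil
import kotlin.math.ln
import kotlin.math.pow

// Hypothetical helper (not kumulant's DDSketchStat API): the logarithmic bin
// mapping described in the DDSketch paper, restricted to positive values.
class LogBinMapping(val relativeAccuracy: Double) {
    init {
        require(relativeAccuracy > 0.0 && relativeAccuracy < 1.0) { "relativeAccuracy must be in (0, 1)" }
    }

    // Ratio between consecutive bin edges: gamma = (1 + alpha) / (1 - alpha).
    val gamma = (1 + relativeAccuracy) / (1 - relativeAccuracy)
    private val logGamma = ln(gamma)

    // Bin i covers (gamma^(i - 1), gamma^i].
    fun index(value: Double): Int {
        require(value > 0.0) { "this sketch only handles positive values" }
        return ceil(ln(value) / logGamma).toInt()
    }

    // 2 * gamma^i / (gamma + 1) is within relativeAccuracy of every value in bin i.
    fun representative(index: Int): Double = 2.0 * gamma.pow(index) / (gamma + 1)

    // Number of bins spanned by [min, max]; grows with log(max / min) / log(gamma).
    fun binCount(min: Double, max: Double): Int = index(max) - index(min) + 1
}
```

With alpha = 0.01, covering 1 µs to 1 hr (expressed in seconds) takes roughly 1,100 bins.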
Result shapes
| Result | Shape |
|---|---|
| SketchResult | DDSketch snapshot: log-spaced bin map + precomputed quantiles at the configured probabilities |
| QuantileResult | FrugalQuantileStat single-quantile scalar |
| TDigestResult | t-digest centroids + precomputed quantiles |
| SparseHistogramResult | Parallel lowerBounds / upperBounds / weights arrays, bucket k covering [lowerBounds[k], upperBounds[k]); produced by HdrHistogramStat, LinearHistogramStat, and SketchResult.toSparseHistogram |
| ReservoirResult | Bounded reservoir sample of raw values + the sampling weight |
| ThresholdBucketResult | Per-bucket weighted counts over caller-supplied edges |
SketchResult, TDigestResult and ReservoirResult all expose quantiles at the configured probabilities, but each is a different type, so the result type a downstream consumer sees depends on which sketch was picked. For a uniform downstream interface, project to SparseHistogramResult, the shared histogram shape.
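To illustrate what projecting log-spaced bins onto parallel bucket arrays involves, here is a minimal sketch assuming the DDSketch convention that bin i covers (gamma^(i-1), gamma^i]. SparseHistogram and projectLogBins are hypothetical stand-ins for SparseHistogramResult and SketchResult.toSparseHistogram, whose real signatures may differ.

```kotlin
import kotlin.math.pow

// Hypothetical stand-in for SparseHistogramResult: bucket k covers
// [lowerBounds[k], upperBounds[k]) and carries weights[k].
class SparseHistogram(
    val lowerBounds: DoubleArray,
    val upperBounds: DoubleArray,
    val weights: DoubleArray,
)

// Hypothetical projection of log-spaced bins (index -> weight) onto explicit bounds.
// Assumes DDSketch-style bins where bin i covers (gamma^(i - 1), gamma^i]; the
// closed edge sits on the opposite side from the [lower, upper) convention, but
// the numeric bounds are the same.
fun projectLogBins(bins: Map<Int, Double>, gamma: Double): SparseHistogram {
    val sorted = bins.entries.sortedBy { it.key }
    return SparseHistogram(
        lowerBounds = DoubleArray(sorted.size) { k -> gamma.pow(sorted[k].key - 1) },
        upperBounds = DoubleArray(sorted.size) { k -> gamma.pow(sorted[k].key) },
        weights = DoubleArray(sorted.size) { k -> sorted[k].value },
    )
}
```

Once every sketch is projected this way, a consumer only reads three arrays, whatever produced them.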
PIT-style equiprobable histogram
The pitHistogram(numBins) factory in com.eignex.kumulant.stat.score is built from this family. It feeds a stream of PIT (probability integral transform) values, which are uniform on [0, 1] when the distributional forecasts are correct, into an equiprobable LinearHistogramStat over [0, 1]: equal-width bins are equally likely under uniformity. The bin counts then expose the deviation from uniformity that the corresponding PIT test consumes.
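As a concrete illustration, the sketch below computes PIT values for an exponential forecast, bins them into numBins equal-width bins over [0, 1], and scores the deviation with Pearson's chi-square statistic. The chi-square score is an illustrative choice, since this page does not say which PIT test consumes the histogram; pitValues, pitCounts and chiSquare are hypothetical names, not part of com.eignex.kumulant.stat.score.

```kotlin
import kotlin.math.exp
import kotlin.math.ln
import kotlin.random.Random

// Hypothetical: observations drawn from Exp(trueRate) by inverse-transform sampling,
// each scored against an Exp(forecastRate) forecast.
fun pitValues(trueRate: Double, forecastRate: Double, n: Int, rnd: Random): List<Double> =
    List(n) {
        val y = -ln(1 - rnd.nextDouble()) / trueRate
        1 - exp(-forecastRate * y) // PIT = forecast CDF evaluated at the observation
    }

// Equal-width bins over [0, 1]; under a calibrated forecast every bin is equally likely.
fun pitCounts(pits: List<Double>, numBins: Int): IntArray {
    val counts = IntArray(numBins)
    for (p in pits) {
        // Clamp p == 1.0 into the last bin.
        val bin = (p * numBins).toInt().coerceIn(0, numBins - 1)
        counts[bin]++
    }
    return counts
}

// Pearson chi-square deviation from the uniform expectation n / numBins.
fun chiSquare(counts: IntArray): Double {
    val expected = counts.sum().toDouble() / counts.size
    return counts.sumOf { (it - expected) * (it - expected) / expected }
}
```

A calibrated forecast (forecastRate == trueRate) gives a small score; a forecast with the wrong rate piles PIT values into a few bins and the score grows sharply.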
Merge
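DDSketch and HDR merge exactly across replicas via cell-wise bin addition (same bin configuration required). T-digest merges by combining centroids and re-compressing; the result is approximate, not identical to a digest built from the combined stream.
LinearHistogram and ThresholdBucket merge exactly via cell-wise bin addition (same bin layout required).
ReservoirHistogram merges via a sample-weighted reservoir union: the result is statistically equivalent to a single reservoir of the same capacity that saw both streams.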
FrugalQuantile does not have a clean merge: it averages the two point estimates. Use it for single-stream tracking, not distributed aggregation.
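The reservoir union can be sketched under one assumption: each retained item keeps its A-Res key. The capacity largest keys of two reservoirs are then exactly the capacity largest keys of the combined stream. WeightedReservoir is a hypothetical class, not ReservoirHistogramStat, and kumulant's merge may differ in detail.

```kotlin
import java.util.PriorityQueue
import kotlin.math.pow
import kotlin.random.Random

// Hypothetical sketch of Efraimidis & Spirakis A-Res; not kumulant's ReservoirHistogramStat.
class WeightedReservoir(private val capacity: Int, private val rnd: Random) {
    init {
        require(capacity > 0) { "capacity must be positive" }
    }

    private class Entry(val key: Double, val value: Double)

    // Min-heap on key: the root is the retained item evicted first.
    private val heap = PriorityQueue<Entry>(compareBy<Entry> { it.key })

    fun add(value: Double, weight: Double) {
        require(weight > 0.0) { "weight must be positive" }
        // A-Res key: u^(1/w) with u uniform in [0, 1); heavier items tend to get larger keys.
        offer(Entry(rnd.nextDouble().pow(1.0 / weight), value))
    }

    // Keep the `capacity` largest keys seen so far.
    private fun offer(entry: Entry) {
        if (heap.size < capacity) {
            heap.add(entry)
        } else if (entry.key > heap.peek().key) {
            heap.poll()
            heap.add(entry)
        }
    }

    // Union merge: re-offering the other reservoir's keyed entries keeps the top
    // `capacity` keys of the combined stream, because a key outside either
    // reservoir's top `capacity` cannot be in the combined top `capacity`.
    fun merge(other: WeightedReservoir) {
        other.heap.forEach { offer(it) }
    }

    fun values(): List<Double> = heap.map { it.value }
}
```

A FrugalQuantile estimate carries no comparable per-item state, which is why its merge falls back to averaging.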
Concurrency
Histogram-shaped stats (DDSketchStat, HdrHistogramStat, LinearHistogramStat, ThresholdBucketStat) reduce each update to a single striped atomic increment on the destination bin, so they are exact under every com.eignex.kumulant.core.Concurrency level. ReservoirHistogramStat and FrugalQuantileStat keep coupled state and self-serialise under concurrent access; TDigestStat self-serialises through its own lock.
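The sketch below illustrates why single-bin updates stay exact, assuming java.util.concurrent.atomic.LongAdder as the striped counter. ConcurrentLinearHistogram is hypothetical and does not reproduce kumulant's implementation. An update touches exactly one counter, and LongAdder never loses increments, so the final counts are exact however threads interleave.

```kotlin
import java.util.concurrent.atomic.LongAdder

// Hypothetical illustration, not kumulant's internals: each bin is a LongAdder,
// a striped counter that spreads contended increments across internal cells
// and sums them on read.
class ConcurrentLinearHistogram(
    private val lowerBound: Double,
    private val upperBound: Double,
    binCount: Int,
) {
    init {
        require(binCount > 0 && upperBound > lowerBound) { "invalid bin layout" }
    }

    private val bins = Array(binCount) { LongAdder() }
    private val width = (upperBound - lowerBound) / binCount

    // An update is one increment on one bin; no other shared state is touched,
    // so concurrent updates cannot lose counts.
    fun add(value: Double) {
        if (value < lowerBound || value >= upperBound) return // this sketch drops out-of-range values
        val i = ((value - lowerBound) / width).toInt().coerceAtMost(bins.size - 1)
        bins[i].increment()
    }

    // Reads sum each bin; the totals are exact once writers have finished.
    fun counts(): LongArray = LongArray(bins.size) { bins[it].sum() }
}
```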
Types
DDSketchStat: relative-error quantile sketch with logarithmic bins.
FrugalQuantileStat: frugal-streaming single-quantile estimator.
HdrHistogramStat: auto-resizing High Dynamic Range (HDR) histogram with native Double support.
LinearHistogramStat: fixed-width binned histogram over [lowerBound, upperBound), split into binCount buckets.
QuantileResult: single estimated quantile together with the probability it targets.
ReservoirHistogramStat: weighted reservoir sample of size capacity via Algorithm A-Res (Efraimidis & Spirakis). Each item gets the key u^(1/w), with u uniform on (0, 1) and w the item's weight, and the capacity largest keys are retained. The result is a weighted sample without replacement, equivalent to drawing items one at a time with probability proportional to weight.
ReservoirResult: reservoir sampling snapshot.
SketchResult: DDSketch snapshot with logarithmic bins plus quantiles precomputed for probabilities.
SparseHistogramResult: histogram as parallel lowerBounds / upperBounds / weights arrays, each bucket covering the half-open interval [lower, upper).
TDigestResult: t-digest snapshot; means and weights are the centroid arrays sorted by mean, with quantiles precomputed for probabilities via CDF inversion.
TDigestStat: buffered merging t-digest (Dunning) with the k1 scale function, giving high-fidelity extreme-quantile estimates with a bounded centroid count. The compression parameter (delta) caps the centroid count at roughly 6 * delta.
ThresholdBucketResult: per-bucket counts for a user-defined threshold list. For strictly increasing thresholds [t[0], t[1], ..., t[K-1]] the result holds K + 1 counts: bucket 0 holds value <= t[0], bucket i (for 1 <= i <= K-1) holds t[i-1] < value <= t[i], and bucket K holds value > t[K-1].
ThresholdBucketStat: weighted counter over user-defined value buckets.
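The threshold bucket rule amounts to finding the first threshold that is >= the value. A minimal sketch using binary search; thresholdBucket and ThresholdCounter are hypothetical names, not the kumulant API.

```kotlin
// Hypothetical helper mirroring the bucket rule, not the kumulant API.
// For strictly increasing thresholds t[0..K-1] there are K + 1 buckets:
// bucket 0: value <= t[0]; bucket i: t[i-1] < value <= t[i]; bucket K: value > t[K-1].
fun thresholdBucket(thresholds: DoubleArray, value: Double): Int {
    // Binary search for the first threshold >= value; its index is the bucket.
    var lo = 0
    var hi = thresholds.size
    while (lo < hi) {
        val mid = (lo + hi) ushr 1
        if (thresholds[mid] < value) lo = mid + 1 else hi = mid
    }
    return lo
}

// Hypothetical weighted counter over those K + 1 buckets.
class ThresholdCounter(private val thresholds: DoubleArray) {
    init {
        for (i in 1 until thresholds.size) {
            require(thresholds[i - 1] < thresholds[i]) { "thresholds must be strictly increasing" }
        }
    }

    val weights = DoubleArray(thresholds.size + 1)

    fun add(value: Double, weight: Double = 1.0) {
        weights[thresholdBucket(thresholds, value)] += weight
    }
}
```

Binary search keeps each update at O(log K) comparisons, which matters only when the threshold list is long.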
Functions
Linearly interpolated quantile at probability from a reservoir sample (treats the sample as unweighted).
Project a SketchResult into a SparseHistogramResult by expanding its bin indices to bucket boundaries.
Convert centroids to a sparse histogram with bins centered on each centroid.
Bucket the retained sample into binCount equal-width bins between min and max.
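The descriptions above do not pin down the interpolation rule. As one plausible reading, the sketch below uses the common convention that places probability p at position p * (n - 1) of the sorted sample, and buckets a sample into equal-width bins between its min and max. interpolatedQuantile and equalWidthBins are hypothetical names, and kumulant may use a different interpolation convention.

```kotlin
import kotlin.math.floor

// Hypothetical helper: unweighted, linearly interpolated quantile using the
// position p * (n - 1) in the sorted sample.
fun interpolatedQuantile(sample: List<Double>, probability: Double): Double {
    require(sample.isNotEmpty()) { "empty sample" }
    require(probability in 0.0..1.0) { "probability must be in [0, 1]" }
    val sorted = sample.sorted()
    val pos = probability * (sorted.size - 1)
    val lo = floor(pos).toInt()
    val hi = minOf(lo + 1, sorted.size - 1)
    val frac = pos - lo
    return sorted[lo] + frac * (sorted[hi] - sorted[lo])
}

// Hypothetical helper: equal-width counts between the sample's min and max;
// the max lands in the last bin, and a constant sample lands entirely in bin 0.
fun equalWidthBins(sample: List<Double>, binCount: Int): IntArray {
    require(sample.isNotEmpty() && binCount > 0) { "need a sample and at least one bin" }
    val min = sample.minOf { it }
    val max = sample.maxOf { it }
    val width = (max - min) / binCount
    val counts = IntArray(binCount)
    for (x in sample) {
        val i = if (width == 0.0) 0 else ((x - min) / width).toInt().coerceAtMost(binCount - 1)
        counts[i]++
    }
    return counts
}
```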