stat.calibration
Probability calibration: diagnosing when a classifier's predicted probabilities are out of step with the empirical positive rate, and two complementary mappings that fix it.
Picking a calibration stat
| Stat | Role | Mapping |
|---|---|---|
| ReliabilityStat | Diagnostic. Bins (predicted, observed) pairs into per-bin reliability tiers; result exposes expected calibration error and the per-bin gap. | No mapping; read the result, plot, or pipe into an alarm. |
| PlattCalibratorStat | Parametric fix. Fits sigmoid(slope * x + intercept) over (rawScore, label). Two parameters. | Smooth global sigmoid. Right when miscalibration is roughly sigmoidal; the classic SVM / boosted-tree pattern. |
| IsotonicCalibratorStat | Non-parametric fix. Bins raw scores; runs Pool Adjacent Violators at read time to produce a monotone step function. | Arbitrary monotone shape with linear interpolation between bin midpoints. Right when miscalibration has a non-sigmoidal pattern (kinks, plateaus, asymmetric tails). |
The diagnostic → fix workflow
Standard pipeline:
1. Train a classifier (anything producing a probability or score in [0, 1]).
2. Feed (prediction, label) to a ReliabilityStat alongside training (or on held-out evaluation data) and read the per-bin gap. If expectedCalibrationError() is acceptable, stop.
3. Otherwise plumb the same (prediction, label) stream into a PlattCalibratorStat or IsotonicCalibratorStat. At inference, replace rawProb with calibrator.calibrate(rawProb).
4. Continue feeding ReliabilityStat from the calibrated outputs to verify the fix held.
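The diagnostic step of the pipeline above can be sketched without the library. This is a minimal standalone illustration, not the ReliabilityStat implementation: the class name ReliabilityBins, its fields, and the update signature are hypothetical. It assumes the per-bin gap is the mean predicted probability minus the observed positive rate, and that expected calibration error is the weight-averaged absolute gap.

```kotlin
import kotlin.math.abs

// Hypothetical standalone sketch; names and signatures are not the kumulant API.
class ReliabilityBins(private val numBins: Int) {
    private val sumPredicted = DoubleArray(numBins) // weighted sum of predicted probabilities
    private val sumPositives = DoubleArray(numBins) // weighted sum of labels (soft labels allowed)
    private val sumWeights = DoubleArray(numBins)   // total weight per bin

    fun update(predicted: Double, label: Double, weight: Double = 1.0) {
        // Equal-width bins over [0, 1]; a prediction of exactly 1.0 falls in the last bin.
        val bin = (predicted * numBins).toInt().coerceIn(0, numBins - 1)
        sumPredicted[bin] += weight * predicted
        sumPositives[bin] += weight * label
        sumWeights[bin] += weight
    }

    /** Assumed gap definition: mean prediction minus observed positive rate; 0 for an empty bin. */
    fun gap(bin: Int): Double =
        if (sumWeights[bin] == 0.0) 0.0
        else (sumPredicted[bin] - sumPositives[bin]) / sumWeights[bin]

    /** Weight-averaged absolute gap across all bins. */
    fun expectedCalibrationError(): Double {
        val total = sumWeights.sum()
        if (total == 0.0) return 0.0
        var ece = 0.0
        for (b in 0 until numBins) {
            ece += sumWeights[b] / total * abs(gap(b))
        }
        return ece
    }
}
```

With ten bins, four observations at prediction 0.9 of which two are positive put a gap of roughly 0.4 in the last bin, so the classifier is overconfident there and a calibrator is worth attaching.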
Reuse with ReliabilityStat
IsotonicCalibratorStat is built on top of ReliabilityStat; the binned (positives, total) book-keeping is exactly what ReliabilityResult already exposes. The PAV pass is purely a read post-process; updates remain O(1). If you already have a ReliabilityStat in the pipeline for diagnostics, attaching an IsotonicCalibratorStat costs only the second instance's per-bin cells.
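The read-time pass can be sketched as a plain function over the per-bin (positives, total) arrays. This is an illustrative Pool Adjacent Violators implementation under stated assumptions, not the library's code: the function name and signature are hypothetical, and pooling empty bins into a neighbouring block is one possible convention that the source does not specify.

```kotlin
// Hypothetical standalone sketch; not the kumulant API.
// Returns one non-decreasing calibrated probability per bin.
fun poolAdjacentViolators(positives: DoubleArray, totals: DoubleArray): DoubleArray {
    require(positives.size == totals.size) { "per-bin arrays must have equal length" }
    val n = positives.size
    // Stack of pooled blocks: summed positives, summed totals, number of bins covered.
    val pos = DoubleArray(n)
    val tot = DoubleArray(n)
    val len = IntArray(n)
    var blocks = 0
    for (i in 0 until n) {
        pos[blocks] = positives[i]
        tot[blocks] = totals[i]
        len[blocks] = 1
        blocks++
        // Merge backwards while the last two blocks violate monotonicity.
        while (blocks >= 2) {
            val a = blocks - 2
            val b = blocks - 1
            // Assumption: an empty block carries no information, so pool it with its neighbour.
            val eitherEmpty = tot[a] == 0.0 || tot[b] == 0.0
            // mean(a) > mean(b), compared by cross-multiplication to avoid division.
            val decreasing = pos[a] * tot[b] > pos[b] * tot[a]
            if (!eitherEmpty && !decreasing) break
            pos[a] += pos[b]
            tot[a] += tot[b]
            len[a] += len[b]
            blocks--
        }
    }
    // Expand each pooled block back to one value per original bin.
    val out = DoubleArray(n)
    var bin = 0
    for (k in 0 until blocks) {
        val mean = if (tot[k] == 0.0) 0.0 else pos[k] / tot[k]
        for (j in 0 until len[k]) {
            out[bin] = mean
            bin++
        }
    }
    return out
}
```

Each bin is pushed once and merged at most once, so the pass is linear in the number of bins and leaves the O(1) update path untouched.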
Merge
ReliabilityStat merges per-bin sums element-wise, safe across parallel workers. PlattCalibratorStat merges sample-weighted weight vectors via the inner com.eignex.kumulant.stat.regression.glm.StochasticRegressionStat: same approximation as any SGD merge. IsotonicCalibratorStat does not support merge directly: the result only carries the threshold step function, not the bin layout. To pool isotonic calibration across workers, merge the underlying ReliabilityStat and re-derive.
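The element-wise merge of per-bin sums is simple enough to show directly. The sketch below is a hypothetical standalone type (BinSums is not a kumulant class) that assumes both workers use the same number of bins; it illustrates why pooling the underlying sums is safe where pooling a derived step function is not.

```kotlin
// Hypothetical standalone sketch; not the kumulant API.
class BinSums(val positives: DoubleArray, val totals: DoubleArray) {
    init {
        require(positives.size == totals.size) { "per-bin arrays must have equal length" }
    }

    /** Element-wise sum; worker order does not matter, up to floating-point rounding. */
    fun merge(other: BinSums): BinSums {
        require(positives.size == other.positives.size) { "bin layouts must match" }
        return BinSums(
            DoubleArray(positives.size) { positives[it] + other.positives[it] },
            DoubleArray(totals.size) { totals[it] + other.totals[it] }
        )
    }
}
```

The merged sums describe the pooled stream exactly, so any read-time derivation (such as a Pool Adjacent Violators pass) can be rerun on them after the merge.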
Concurrency
ReliabilityStat applies independent striped atomic adds to its per-bin counters; lock-free and exact under every com.eignex.kumulant.core.Concurrency level. IsotonicCalibratorStat inherits that model, with the read-time Pool Adjacent Violators pass running single-threaded. PlattCalibratorStat inherits the com.eignex.kumulant.stat.regression.glm.StochasticRegressionStat model: the update body is locked under com.eignex.kumulant.core.Concurrency.Strict / com.eignex.kumulant.core.Concurrency.HighWrite and runs lock-free (Hogwild) under com.eignex.kumulant.core.Concurrency.Relaxed.
Types
Snapshot from IsotonicCalibratorStat: a non-decreasing step function from raw score in [0, 1] to calibrated probability, derived from a binned (positives, total) histogram via Pool Adjacent Violators.
Online isotonic calibration: bins raw scores in [0, 1] into numBins equal-width buckets, tracks per-bin (positives, total) weights, and at read time runs Pool Adjacent Violators to produce a non-decreasing calibrated probability per bin. Unlike PlattCalibratorStat this is non-parametric and can absorb arbitrary monotonic miscalibration patterns.
Snapshot from PlattCalibratorStat: the learned sigmoid parameters and a helper that maps a raw classifier score to a calibrated probability.
Online Platt scaling: fits a one-feature logistic regression sigmoid(slope * rawScore + intercept) over paired (rawScore, label) observations where label is in {0, 1}. Use to fix the calibration of a classifier whose probability output is poorly aligned with the empirical positive rate.
Per-bin reliability snapshot for a binary probabilistic classifier. Bins are equal-width over [0, 1] indexed by predicted probability. Underlying sums use weights so soft labels and importance-weighted streams compose correctly.
Reliability diagram primitive for binary probabilistic forecasts. Paired input is (predictedProbability, outcome); predictions are bucketed into numBins equal-width bins across [0, 1]. Outcomes are typically {0, 1} but soft labels and weighted updates work uniformly.
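The one-feature logistic fit described for PlattCalibratorStat can be illustrated with a plain stochastic gradient step on the log loss. This is a sketch under assumptions, not the library's StochasticRegressionStat: the class name OnlinePlatt, the fixed learning rate, and the initial parameters (slope 1, intercept 0) are all hypothetical choices made for the example.

```kotlin
import kotlin.math.exp

// Hypothetical standalone sketch; not the kumulant API.
class OnlinePlatt(private val learningRate: Double = 0.1) {
    // Assumed starting point: slope 1 and intercept 0.
    var slope = 1.0
        private set
    var intercept = 0.0
        private set

    private fun sigmoid(z: Double): Double = 1.0 / (1.0 + exp(-z))

    /** Maps a raw classifier score to sigmoid(slope * rawScore + intercept). */
    fun calibrate(rawScore: Double): Double = sigmoid(slope * rawScore + intercept)

    /** One SGD step on the log loss for a single (rawScore, label) pair, label in {0, 1}. */
    fun update(rawScore: Double, label: Double, weight: Double = 1.0) {
        // The derivative of the log loss with respect to the linear term is (prediction - label).
        val error = calibrate(rawScore) - label
        slope -= learningRate * weight * error * rawScore
        intercept -= learningRate * weight * error
    }
}
```

Fed a stream in which high scores are mostly positive and low scores mostly negative, the slope grows and the intercept shifts until the sigmoid tracks the empirical positive rate; only the two parameters are stored, which is what keeps the parametric fix cheap.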