kumulant

stat.calibration

Probability calibration: diagnosing when a classifier's predicted probabilities are out of step with the empirical positive rate, and two complementary mappings that fix it.

Picking a calibration stat

Stat | Role | Mapping
--- | --- | ---
ReliabilityStat | Diagnostic. Bins (predicted, observed) pairs into per-bin reliability tiers; result exposes expected calibration error and the per-bin gap. | No mapping; read the result, plot, or pipe into an alarm.
PlattCalibratorStat | Parametric fix. Fits sigmoid(slope * x + intercept) over (rawScore, label). Two parameters. | Smooth global sigmoid. Right when miscalibration is roughly sigmoidal; the classic SVM / boosted-tree pattern.
IsotonicCalibratorStat | Non-parametric fix. Bins raw scores; runs Pool Adjacent Violators at read time to produce a monotone step function. | Arbitrary monotone shape with linear interpolation between bin midpoints. Right when miscalibration has a non-sigmoidal pattern (kinks, plateaus, asymmetric tails).

The diagnostic → fix workflow

Standard pipeline:

  1. Train a classifier (anything producing a probability or score in [0, 1]).

  2. Feed (prediction, label) to a ReliabilityStat alongside training (or on held-out evaluation data) and read the per-bin gap. If expectedCalibrationError() is acceptable, stop.

  3. Otherwise plumb the same (prediction, label) stream into a PlattCalibratorStat or IsotonicCalibratorStat. At inference, replace rawProb with calibrator.calibrate(rawProb).

  4. Continue feeding ReliabilityStat from the calibrated outputs to verify the fix held.

Reuse with ReliabilityStat

IsotonicCalibratorStat is built on top of ReliabilityStat; the binned (positives, total) book-keeping is exactly what ReliabilityResult already exposes. The PAV pass is purely a read post-process; updates remain O(1). If you already have a ReliabilityStat in the pipeline for diagnostics, attaching an IsotonicCalibratorStat costs only the second instance's per-bin cells.

Merge

ReliabilityStat merges per-bin sums element-wise, which is safe across parallel workers. PlattCalibratorStat merges the workers' weight vectors, weighted by sample count, via the inner com.eignex.kumulant.stat.regression.glm.StochasticRegressionStat; this is the same approximation as any SGD merge. IsotonicCalibratorStat does not support merge directly: the result only carries the threshold step function, not the bin layout. To pool isotonic calibration across workers, merge the underlying ReliabilityStat and re-derive.
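
The element-wise merge of per-bin sums can be sketched in a few lines. This is a minimal standalone illustration, not the library's merge code: `BinSums` and `mergeBinSums` are hypothetical names, and only the three array fields mirror those listed on ReliabilityResult.

```kotlin
// Hypothetical holder for the per-bin sums of one worker (not a kumulant type).
class BinSums(
    val sumProbability: DoubleArray,
    val sumOutcome: DoubleArray,
    val totalWeights: DoubleArray,
)

// Element-wise addition: each bin of the merged result is the sum of the
// corresponding bins of the two inputs, so the merge is exact and order-independent
// up to floating-point rounding.
fun mergeBinSums(a: BinSums, b: BinSums): BinSums {
    require(a.totalWeights.size == b.totalWeights.size) { "bin counts must match" }
    fun add(x: DoubleArray, y: DoubleArray) = DoubleArray(x.size) { x[it] + y[it] }
    return BinSums(
        add(a.sumProbability, b.sumProbability),
        add(a.sumOutcome, b.sumOutcome),
        add(a.totalWeights, b.totalWeights),
    )
}
```

Because the merged sums are ordinary per-bin (positives, total) counts, an isotonic fit can be re-derived from them after pooling, which is the route the paragraph above recommends.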

Concurrency

ReliabilityStat applies independent striped atomic adds to its per-bin counters, which are lock-free and exact under every com.eignex.kumulant.core.Concurrency level. IsotonicCalibratorStat inherits that model, with the read-time Pool Adjacent Violators pass running single-threaded. PlattCalibratorStat inherits the com.eignex.kumulant.stat.regression.glm.StochasticRegressionStat model: the update body is locked under com.eignex.kumulant.core.Concurrency.Strict / com.eignex.kumulant.core.Concurrency.HighWrite and runs lock-free (Hogwild) under com.eignex.kumulant.core.Concurrency.Relaxed.

Types

@Serializable
@SerialName(value = "IsotonicCalibratorResult")
data class IsotonicCalibratorResult(val numBins: Int, val binMidpoints: DoubleArray, val probabilities: DoubleArray, val totalWeights: Double) : Result

Snapshot from IsotonicCalibratorStat: a non-decreasing step function from raw score in [0, 1] to calibrated probability, derived from a binned (positives, total) histogram via Pool Adjacent Violators.
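
The table near the top of this page mentions linear interpolation between bin midpoints, while this description speaks of a step function. As a hedged illustration of the interpolating variant only, the sketch below evaluates a calibrated probability from two arrays shaped like `binMidpoints` and `probabilities`. The function name `interpolateCalibrated` is hypothetical, and clamping to the end values outside the midpoint range is an assumption, not confirmed library behaviour.

```kotlin
// Hypothetical helper: piecewise-linear lookup between bin midpoints.
// Assumes binMidpoints is strictly increasing and both arrays have equal length.
fun interpolateCalibrated(
    binMidpoints: DoubleArray,
    probabilities: DoubleArray,
    rawScore: Double,
): Double {
    require(binMidpoints.isNotEmpty() && binMidpoints.size == probabilities.size)
    val last = binMidpoints.size - 1
    // Assumption: outside the midpoint range, hold the nearest end value.
    if (rawScore <= binMidpoints[0]) return probabilities[0]
    if (rawScore >= binMidpoints[last]) return probabilities[last]
    // Find the first midpoint at or above rawScore; it exists because of the checks above.
    var hi = 1
    while (binMidpoints[hi] < rawScore) hi++
    val lo = hi - 1
    val t = (rawScore - binMidpoints[lo]) / (binMidpoints[hi] - binMidpoints[lo])
    return probabilities[lo] + t * (probabilities[hi] - probabilities[lo])
}
```

If the per-bin probabilities are non-decreasing, the interpolated map is non-decreasing too, so the monotonicity guarantee of the isotonic fit carries over.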

class IsotonicCalibratorStat(val numBins: Int = 16, val concurrency: Concurrency = Concurrency.None) : PairedStat<IsotonicCalibratorResult>

Online isotonic calibration: bins raw scores in [0, 1] into numBins equal-width buckets, tracks per-bin (positives, total) weights, and at read time runs Pool Adjacent Violators to produce a non-decreasing calibrated probability per bin. Unlike PlattCalibratorStat this is non-parametric and can absorb arbitrary monotonic miscalibration patterns.
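
To make the read-time pass concrete, here is a minimal standalone sketch of weighted Pool Adjacent Violators over per-bin (positives, total) weights. It is not the library's implementation: the function name is hypothetical, and the treatment of empty bins (they are absorbed into a neighbouring block, and the output is NaN if every bin is empty) is an assumption made for this example.

```kotlin
// Hypothetical sketch of weighted PAV over binned counts.
// Returns one non-decreasing calibrated probability per bin.
fun poolAdjacentViolators(positives: DoubleArray, totals: DoubleArray): DoubleArray {
    require(positives.size == totals.size) { "arrays must have equal length" }
    val n = positives.size
    // Stack of pooled blocks: summed positives, summed totals, number of bins covered.
    val pos = DoubleArray(n)
    val tot = DoubleArray(n)
    val len = IntArray(n)
    var top = 0
    for (i in 0 until n) {
        pos[top] = positives[i]; tot[top] = totals[i]; len[top] = 1
        top++
        // Merge backwards while the last two blocks are out of order.
        while (top >= 2) {
            val a = top - 2
            val b = top - 1
            // Assumption: a block with no weight has no rate, so pool it with its neighbour.
            val emptyNeighbour = tot[a] <= 0.0 || tot[b] <= 0.0
            // Compare rates pos/tot by cross-multiplication to avoid dividing.
            val violates = pos[a] * tot[b] > pos[b] * tot[a]
            if (!emptyNeighbour && !violates) break
            pos[a] += pos[b]; tot[a] += tot[b]; len[a] += len[b]
            top--
        }
    }
    // Expand each block's pooled rate back onto the bins it covers.
    val out = DoubleArray(n)
    var idx = 0
    for (k in 0 until top) {
        val rate = if (tot[k] > 0.0) pos[k] / tot[k] else Double.NaN
        repeat(len[k]) { out[idx++] = rate }
    }
    return out
}
```

Each bin is pushed once and each merge removes one block, so the pass is linear in the number of bins, which is why it is cheap to run at read time.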

@Serializable
@SerialName(value = "PlattCalibratorResult")
data class PlattCalibratorResult(val slope: Double, val intercept: Double, val totalWeights: Double) : Result

Snapshot from PlattCalibratorStat: the learned sigmoid parameters and a helper that maps a raw classifier score to a calibrated probability.

class PlattCalibratorStat(val optimizer: OptimizerSpec = Sgd(ConstantRate(1e-2)), val concurrency: Concurrency = Concurrency.None) : PairedStat<PlattCalibratorResult>

Online Platt scaling: fits a one-feature logistic regression sigmoid(slope * rawScore + intercept) over paired (rawScore, label) observations where label is in {0, 1}. Use to fix the calibration of a classifier whose probability output is poorly aligned with the empirical positive rate.
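
The fit described above can be illustrated with a standalone sketch of one-feature logistic regression trained by plain SGD on the log loss. This is not the library's code path, which delegates to StochasticRegressionStat and an OptimizerSpec. The class `PlattSketch`, its zero initialisation, and its `update` signature are assumptions made for the example.

```kotlin
import kotlin.math.exp

fun sigmoid(z: Double): Double = 1.0 / (1.0 + exp(-z))

// Hypothetical online Platt scaler: two parameters, constant learning rate.
class PlattSketch(private val learningRate: Double = 1e-2) {
    // Assumption: both parameters start at zero, so the first prediction is 0.5.
    var slope = 0.0
        private set
    var intercept = 0.0
        private set

    // One SGD step on the log loss for a single (rawScore, label) pair, label in {0, 1}.
    fun update(rawScore: Double, label: Double) {
        val error = sigmoid(slope * rawScore + intercept) - label
        slope -= learningRate * error * rawScore
        intercept -= learningRate * error
    }

    fun calibrate(rawScore: Double): Double = sigmoid(slope * rawScore + intercept)
}
```

With only a slope and an intercept, the fitted map is always a smooth sigmoid, which is why this approach suits roughly sigmoidal miscalibration and cannot represent kinks or plateaus.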

@Serializable
@SerialName(value = "ReliabilityResult")
data class ReliabilityResult(val numBins: Int, val sumProbability: DoubleArray, val sumOutcome: DoubleArray, val totalWeights: DoubleArray) : Result

Per-bin reliability snapshot for a binary probabilistic classifier. Bins are equal-width over [0, 1] indexed by predicted probability. Underlying sums use weights so soft labels and importance-weighted streams compose correctly.
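
As a hedged illustration of how a calibration error can be read off these per-bin sums, the sketch below computes the standard weighted expected calibration error: the weight-averaged absolute gap between mean outcome and mean predicted probability per bin. The function name `eceFromBinSums` is hypothetical, and the library's own expectedCalibrationError() may differ in details such as the handling of empty bins.

```kotlin
import kotlin.math.abs

// Hypothetical helper: ECE = sum over bins of (w_b / W) * |meanOutcome_b - meanProbability_b|.
// Since both means divide by w_b, each term simplifies to |sumOutcome_b - sumProbability_b| / W.
fun eceFromBinSums(
    sumProbability: DoubleArray,
    sumOutcome: DoubleArray,
    totalWeights: DoubleArray,
): Double {
    require(sumProbability.size == sumOutcome.size && sumOutcome.size == totalWeights.size)
    val grandTotal = totalWeights.sum()
    // Assumption: with no observations at all, report zero error.
    if (grandTotal <= 0.0) return 0.0
    var gapMass = 0.0
    for (b in totalWeights.indices) {
        if (totalWeights[b] <= 0.0) continue // empty bins contribute nothing
        gapMass += abs(sumOutcome[b] - sumProbability[b])
    }
    return gapMass / grandTotal
}
```

For example, two bins with weights 2 and 2, probability sums 0.2 and 1.6, and outcome sums 0 and 2 have per-bin gaps of 0.1 and 0.2, giving an ECE of about 0.15.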

class ReliabilityStat(val numBins: Int, val concurrency: Concurrency = Concurrency.None) : PairedStat<ReliabilityResult>

Reliability diagram primitive for binary probabilistic forecasts. Paired input is (predictedProbability, outcome); predictions are bucketed into numBins equal-width bins across [0, 1]. Outcomes are typically {0, 1} but soft labels and weighted updates work uniformly.
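
The bucketing described above can be sketched as a small standalone accumulator. This is an illustration under stated assumptions, not the library's code: `ReliabilityBins` and its methods are hypothetical names, predictions outside [0, 1] are clamped, and a prediction of exactly 1.0 is placed in the last bin.

```kotlin
// Hypothetical single-threaded accumulator for equal-width reliability bins.
class ReliabilityBins(val numBins: Int) {
    val sumProbability = DoubleArray(numBins)
    val sumOutcome = DoubleArray(numBins)
    val totalWeights = DoubleArray(numBins)

    // Equal-width bins over [0, 1]; the top edge 1.0 falls into the last bin.
    fun binIndex(predicted: Double): Int {
        val clamped = predicted.coerceIn(0.0, 1.0)
        return minOf((clamped * numBins).toInt(), numBins - 1)
    }

    // O(1) update: three weighted adds into the bin selected by the prediction.
    // Soft labels work unchanged because outcome is just a Double.
    fun update(predicted: Double, outcome: Double, weight: Double = 1.0) {
        val clamped = predicted.coerceIn(0.0, 1.0)
        val b = binIndex(clamped)
        sumProbability[b] += weight * clamped
        sumOutcome[b] += weight * outcome
        totalWeights[b] += weight
    }
}
```

Keeping weighted sums rather than counts is what lets soft labels and importance weights share one code path: a bin's mean outcome is always sumOutcome divided by totalWeights.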