stat.score

Online evaluation metrics. Inputs are paired (prediction, truth) observations (or richer shapes for distributional metrics) and outputs are accuracy / discrimination / calibration / distributional summaries.

The calibration diagnostic (Reliability) and the calibration fixes live in com.eignex.kumulant.stat.calibration. This package is for the rest: regression errors, proper scoring rules, discrimination metrics, classification metrics, and distributional forecast diagnostics.

Regression errors

Stat	Result	Use
MseLossStat (in Loss.kt)	com.eignex.kumulant.stat.summary.WeightedMeanResult	Mean squared error; weights large errors quadratically.
MaeLossStat (in Loss.kt)	com.eignex.kumulant.stat.summary.WeightedMeanResult	Mean absolute error; weights errors linearly, so it is more robust to outliers than MSE and is minimised by the median rather than the mean.
PinballLossStat	com.eignex.kumulant.stat.summary.WeightedMeanResult	Quantile (pinball) loss at quantile `tau`. The right pick when the model emits a quantile rather than a mean.

Binary proper scoring rules

Stat	Result	Use
LogLossStat (in Loss.kt)	com.eignex.kumulant.stat.summary.WeightedMeanResult	Cross-entropy (negative log-likelihood). Use for likelihood-shaped objectives.
BrierScoreStat	com.eignex.kumulant.stat.summary.WeightedMeanResult	Bounded squared-error counterpart of log loss. Reliability-decomposable.

LogLoss penalises confident-wrong predictions much more harshly than Brier. Pick LogLoss when you want a likelihood-shaped objective; pick Brier when you want a bounded, calibration-decomposable error.
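To make the contrast concrete, the sketch below evaluates both per-row losses on a confident-wrong forecast. It is a standalone illustration of the two formulas, not the library's code: the function names logLossRow and brierRow are hypothetical, and no clamping of p away from 0 or 1 is applied.

```kotlin
import kotlin.math.ln

// Per-row binary log loss: -[y*ln(p) + (1-y)*ln(1-p)].
// Unbounded: grows without limit as a wrong forecast approaches 0 or 1.
fun logLossRow(p: Double, y: Double): Double =
    -(y * ln(p) + (1.0 - y) * ln(1.0 - p))

// Per-row Brier score: (p - y)^2, bounded in [0, 1] for p in [0, 1].
fun brierRow(p: Double, y: Double): Double = (p - y) * (p - y)

fun main() {
    // Confident and wrong: forecast 0.99 for an event that did not happen.
    println(logLossRow(0.99, 0.0)) // roughly 4.6
    println(brierRow(0.99, 0.0))   // roughly 0.98
}
```

Pushing the forecast to 0.999999 leaves the Brier row just under 1 while the log-loss row keeps growing, which is the behaviour the paragraph above describes.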

Discrimination

AucStat reports streaming ROC AUC over a fixed-resolution score histogram. AUC estimates the probability that a randomly drawn positive scores higher than a randomly drawn negative, so it depends only on the ranking of scores and is calibration-agnostic: a perfectly discriminative model can still be miscalibrated, and a perfectly calibrated model can have mediocre AUC.

Classification

Stat	Result	Use
AccuracyStat	com.eignex.kumulant.stat.summary.WeightedMeanResult	Weighted classification accuracy: fraction of `predicted == truth`.
ConfusionMatrixStat	ConfusionMatrixResult	K-by-K confusion matrix with per-class precision / recall / F1, macro / micro averages, multiclass MCC.

AccuracyStat is the O(1) shortcut when only the scalar accuracy matters. ConfusionMatrixStat is the full P/R/F1 surface with a per-class breakdown; reach for it when accuracy alone hides class-imbalance effects.

Distributional forecast diagnostics

The PIT (probability integral transform) family covers calibration of distributional forecasts:

pitHistogram(numBins) (factory in PitHistogram.kt): feeds PIT values into an equal-width LinearHistogramStat over [0, 1]. Under correct distributional forecasts the histogram should be approximately flat; deviations diagnose under- or over-coverage and tail mis-specifications.
The functions in PitTests.kt compute uniformity statistics on that histogram: a Pearson chi-squared statistic (pitChiSquared) and a Kolmogorov-Smirnov distance (pitKsDistance).

Use these when the model emits a CDF (not just a point estimate) and you want to check whether the predicted distribution matches the empirical one.

Compose patterns

MseLoss.windowed(window) for windowed regression error.
BrierScore.transform(...) after a Platt or Isotonic step to score calibrated probabilities.
Auc + Reliability in parallel: AUC tells you discrimination, reliability tells you calibration. A pipeline that monitors both catches different failure modes.

Merge

All paired-mean-shaped metrics (MseLoss, MaeLoss, LogLoss, BrierScore, PinballLoss, Accuracy) merge via the underlying MeanStat's Chan-style parallel formula; exact across replicas. AucStat and ConfusionMatrixStat merge via cell-wise bin / matrix addition.
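The two merge shapes can be sketched as follows. This is an illustration only: WeightedMean with fields mean and sumWeights is a hypothetical stand-in (the actual fields of WeightedMeanResult are not shown in this page), and mergeCells assumes both arrays have the same length.

```kotlin
// Hypothetical stand-in for a weighted-mean snapshot.
data class WeightedMean(val mean: Double, val sumWeights: Double)

// Chan-style pairwise merge of two weighted means:
// move a's mean toward b's mean by b's share of the total weight.
fun mergeMeans(a: WeightedMean, b: WeightedMean): WeightedMean {
    val total = a.sumWeights + b.sumWeights
    if (total == 0.0) return WeightedMean(0.0, 0.0)
    val mean = a.mean + (b.mean - a.mean) * (b.sumWeights / total)
    return WeightedMean(mean, total)
}

// Cell-wise addition, the merge shape for histogram bins or matrix cells.
fun mergeCells(a: DoubleArray, b: DoubleArray): DoubleArray {
    require(a.size == b.size) { "arrays must have the same length" }
    return DoubleArray(a.size) { a[it] + b[it] }
}

fun main() {
    // One replica saw weight 1 with mean loss 2.0, another weight 3 with mean 4.0.
    println(mergeMeans(WeightedMean(2.0, 1.0), WeightedMean(4.0, 3.0)))
}
```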

Concurrency

The mean-shaped metrics (AccuracyStat, BrierScoreStat, PinballLossStat, and the MseLoss / MaeLoss / LogLoss stats in Loss.kt) inherit MeanStat's Welford-coupled model: locked under com.eignex.kumulant.core.Concurrency.Strict / com.eignex.kumulant.core.Concurrency.HighWrite, drifting by ULPs under com.eignex.kumulant.core.Concurrency.Relaxed but never throwing. AucStat and ConfusionMatrixStat apply independent striped atomic increments to their histogram / matrix cells; lock-free and exact under every level, with the trapezoidal / precision-recall read running single-threaded.

Types

AccuracyStat

class AccuracyStat(val concurrency: Concurrency = Concurrency.None) : PairedStat<WeightedMeanResult>

Streaming classification accuracy: paired (predictedClass, trueClass) aggregated as the weighted mean of 1[predicted == truth]. Classes are compared on toLong() so floating-point class indices round-trip safely.
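A minimal standalone sketch of that aggregation, assuming a plain incremental weighted mean and no concurrency handling. AccuracySketch is a hypothetical name and does not reproduce the library's internals.

```kotlin
// Weighted running mean of the indicator 1[predicted == truth].
class AccuracySketch {
    var sumWeights = 0.0
        private set
    var mean = 0.0
        private set

    fun update(predicted: Double, truth: Double, weight: Double = 1.0) {
        // Compare on toLong(), so 2.0 and 2.9 both map to class 2.
        val hit = if (predicted.toLong() == truth.toLong()) 1.0 else 0.0
        sumWeights += weight
        // Incremental weighted-mean update; skipped while total weight is zero.
        if (sumWeights > 0.0) mean += (hit - mean) * weight / sumWeights
    }
}

fun main() {
    val acc = AccuracySketch()
    acc.update(1.0, 1.0)               // hit
    acc.update(0.0, 1.0)               // miss
    acc.update(2.0, 2.0, weight = 2.0) // hit, counted twice
    println(acc.mean)                  // 3 of 4 weight units correct
}
```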

AucResult

@Serializable

@SerialName(value = "AucResult")

data class AucResult(val auc: Double, val totalPositives: Double, val totalNegatives: Double, val positives: DoubleArray, val negatives: DoubleArray, val lowerBound: Double, val upperBound: Double) : Result

AUC snapshot with the per-bin counts needed for merge. auc is NaN until at least one positive and one negative have been observed; consult totalPositives / totalNegatives to detect that case.

AucStat

class AucStat(val numBins: Int = 256, val lowerBound: Double = 0.0, val upperBound: Double = 1.0, val concurrency: Concurrency = Concurrency.None) : PairedStat<AucResult>

Streaming binary ROC-AUC by score-binning. Each update is paired (score, label) with label in {0, 1} (soft labels work too via the convex split into pos/neg weights).
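One way to read the score-binning approach is sketched below. It is a standalone illustration under stated assumptions, not the library's implementation: BinnedAucSketch is a hypothetical name, out-of-range scores are clamped to the edge bins, and positives and negatives that land in the same bin are counted as half-concordant (the trapezoidal rule on the binned ROC curve).

```kotlin
class BinnedAucSketch(
    val numBins: Int = 256,
    val lowerBound: Double = 0.0,
    val upperBound: Double = 1.0,
) {
    val positives = DoubleArray(numBins)
    val negatives = DoubleArray(numBins)

    fun update(score: Double, label: Double, weight: Double = 1.0) {
        val frac = (score - lowerBound) / (upperBound - lowerBound)
        // Assumption of this sketch: clamp out-of-range scores to the edge bins.
        val bin = (frac * numBins).toInt().coerceIn(0, numBins - 1)
        // Convex split: a soft label in [0, 1] contributes to both histograms.
        positives[bin] += weight * label
        negatives[bin] += weight * (1.0 - label)
    }

    fun auc(): Double {
        val totalPos = positives.sum()
        val totalNeg = negatives.sum()
        if (totalPos == 0.0 || totalNeg == 0.0) return Double.NaN
        var negBelow = 0.0   // negative weight in strictly lower bins
        var concordant = 0.0 // weighted count of correctly ordered pairs
        for (i in 0 until numBins) {
            concordant += positives[i] * (negBelow + 0.5 * negatives[i])
            negBelow += negatives[i]
        }
        return concordant / (totalPos * totalNeg)
    }
}

fun main() {
    val stat = BinnedAucSketch(numBins = 4)
    stat.update(0.9, 1.0); stat.update(0.8, 1.0)
    stat.update(0.1, 0.0); stat.update(0.2, 0.0)
    println(stat.auc()) // positives sit in a strictly higher bin than negatives
}
```

Because the state is two arrays of bin weights, merging two replicas reduces to adding the arrays cell by cell. Resolution is limited by numBins: scores that share a bin cannot be ranked against each other.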

BrierScoreStat

class BrierScoreStat(val concurrency: Concurrency = Concurrency.None) : PairedStat<WeightedMeanResult>

Streaming Brier score for binary probabilistic forecasts. Paired input is (probability, outcome) where outcome in {0, 1}; aggregated as the mean of (probability - outcome)^2.

ConfusionMatrixResult

@Serializable

@SerialName(value = "ConfusionMatrixResult")

data class ConfusionMatrixResult(val numClasses: Int, val counts: DoubleArray) : Result

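Snapshot of a weighted K-by-K confusion matrix with K = numClasses. The matrix is logically indexed as counts[predicted][truth] and stored flat in a DoubleArray of numClasses * numClasses cells.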

ConfusionMatrixStat

class ConfusionMatrixStat(val numClasses: Int, val concurrency: Concurrency = Concurrency.None) : PairedStat<ConfusionMatrixResult>

Streaming K-by-K confusion matrix over paired (predictedClass, trueClass) observations. Inputs are class indices in [0, numClasses); the doubles are truncated to ints via toInt() and out-of-range pairs are ignored. Use for online classifier evaluation; pair with the metric getters on ConfusionMatrixResult for accuracy, per-class P/R/F1, macro F1, and MCC.
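The update rule and the per-class metrics can be sketched as below. This is a standalone illustration, not the library's code: ConfusionSketch and its method names are hypothetical, and the flat layout counts[predicted * numClasses + truth] is an assumption of the sketch rather than a documented storage order.

```kotlin
class ConfusionSketch(val numClasses: Int) {
    // Assumed layout: row = predicted class, column = true class.
    val counts = DoubleArray(numClasses * numClasses)

    fun update(predicted: Double, truth: Double, weight: Double = 1.0) {
        val p = predicted.toInt() // truncates toward zero
        val t = truth.toInt()
        // Out-of-range pairs are ignored.
        if (p !in 0 until numClasses || t !in 0 until numClasses) return
        counts[p * numClasses + t] += weight
    }

    // TP / (everything predicted as k). NaN when k was never predicted.
    fun precision(k: Int): Double {
        var predictedK = 0.0
        for (t in 0 until numClasses) predictedK += counts[k * numClasses + t]
        return counts[k * numClasses + k] / predictedK
    }

    // TP / (everything whose truth is k). NaN when k never occurred.
    fun recall(k: Int): Double {
        var actualK = 0.0
        for (p in 0 until numClasses) actualK += counts[p * numClasses + k]
        return counts[k * numClasses + k] / actualK
    }

    // Harmonic mean of precision and recall.
    fun f1(k: Int): Double {
        val p = precision(k)
        val r = recall(k)
        return 2.0 * p * r / (p + r)
    }
}

fun main() {
    val cm = ConfusionSketch(2)
    cm.update(1.0, 1.0); cm.update(1.0, 1.0); cm.update(1.0, 0.0)
    cm.update(0.0, 0.0); cm.update(0.0, 1.0)
    cm.update(5.0, 0.0) // class 5 is out of range and is dropped
    println("precision=${cm.precision(1)} recall=${cm.recall(1)} f1=${cm.f1(1)}")
}
```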

LogLossStat

class LogLossStat(val concurrency: Concurrency = Concurrency.None) : PairedStat<WeightedMeanResult>

Streaming binary log loss (cross-entropy): paired (probability, outcome) aggregated as the mean of -[y*ln(p) + (1-y)*ln(1-p)].

MaeLossStat

class MaeLossStat(val concurrency: Concurrency = Concurrency.None) : PairedStat<WeightedMeanResult>

Streaming mean absolute error: paired (prediction, truth) aggregated as the mean of |prediction - truth|.

MseLossStat

class MseLossStat(val concurrency: Concurrency = Concurrency.None) : PairedStat<WeightedMeanResult>

Streaming mean squared error: paired (prediction, truth) aggregated as the mean of (prediction - truth)^2.

PinballLossStat

class PinballLossStat(val tau: Double, val concurrency: Concurrency = Concurrency.None) : PairedStat<WeightedMeanResult>

Streaming pinball / quantile loss at level tau. Paired input is (prediction, truth); the per-row loss is max(tau*(y - yhat), (tau - 1)*(y - yhat)), which equals 0.5*|y - yhat| (half the absolute error) when tau = 0.5.
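The per-row formula can be written directly. The sketch below is standalone (pinballRow is a hypothetical name, not a library function) and shows the asymmetry at a high quantile level.

```kotlin
import kotlin.math.max

// Per-row pinball loss: max(tau*(y - yhat), (tau - 1)*(y - yhat)).
fun pinballRow(tau: Double, prediction: Double, truth: Double): Double {
    val d = truth - prediction
    return max(tau * d, (tau - 1.0) * d)
}

fun main() {
    // At tau = 0.9, under-predicting by 10 costs about nine times
    // as much as over-predicting by 10.
    println(pinballRow(0.9, prediction = 0.0, truth = 10.0))
    println(pinballRow(0.9, prediction = 10.0, truth = 0.0))
}
```

The asymmetry is what makes the minimiser the tau-quantile: at tau = 0.9 the loss only balances when roughly 90% of truths fall below the prediction.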

Functions

pitChiSquared

fun SparseHistogramResult.pitChiSquared(numBins: Int): Double

Pearson chi-squared statistic for uniformity on [0, 1]. Compares the empirical bin counts against the uniform expectation total / numBins and sums (observed - expected)^2 / expected over all numBins bins.
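As an illustration of the statistic, the sketch below computes it from a dense array of bin counts. It is a standalone example: chiSquaredUniform is a hypothetical name, and the dense DoubleArray input stands in for SparseHistogramResult, whose accessors are not shown in this page.

```kotlin
// Pearson chi-squared against the uniform expectation total / numBins.
// Empty bins still contribute (0 - expected)^2 / expected.
fun chiSquaredUniform(counts: DoubleArray): Double {
    val total = counts.sum()
    if (total == 0.0) return Double.NaN // no observations yet
    val expected = total / counts.size
    var stat = 0.0
    for (observed in counts) {
        val diff = observed - expected
        stat += diff * diff / expected
    }
    return stat
}

fun main() {
    println(chiSquaredUniform(doubleArrayOf(10.0, 10.0, 10.0, 10.0))) // flat histogram
    println(chiSquaredUniform(doubleArrayOf(20.0, 0.0, 0.0, 0.0)))    // all mass in one bin
}
```

Under the uniform null the statistic is approximately chi-squared distributed with numBins - 1 degrees of freedom, provided the expected count per bin is not too small.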

pitHistogram

fun pitHistogram(numBins: Int, concurrency: Concurrency = Concurrency.None): SeriesStat<SparseHistogramResult>

Probability Integral Transform histogram: bins F(y) (the forecast CDF evaluated at the observed truth) into numBins equal-width buckets across [0, 1]. A uniform empirical distribution indicates a well-calibrated forecaster; concentrated mass indicates miscalibration.
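The binning step can be sketched as follows. This is a standalone illustration with hypothetical names: exponentialCdf stands in for whatever forecast CDF the model emits, and folding u = 1.0 into the last bucket is an assumption of the sketch.

```kotlin
import kotlin.math.exp

// Hypothetical forecast: exponential distribution with the given rate.
fun exponentialCdf(y: Double, rate: Double): Double =
    if (y <= 0.0) 0.0 else 1.0 - exp(-rate * y)

// Map a PIT value u in [0, 1] to one of numBins equal-width buckets.
// u == 1.0 is folded into the last bucket (an assumption of this sketch).
fun pitBin(u: Double, numBins: Int): Int =
    (u * numBins).toInt().coerceIn(0, numBins - 1)

// Evaluate the forecast CDF at each observed truth and count per bucket.
fun pitCounts(truths: DoubleArray, rate: Double, numBins: Int): DoubleArray {
    val counts = DoubleArray(numBins)
    for (y in truths) counts[pitBin(exponentialCdf(y, rate), numBins)] += 1.0
    return counts
}

fun main() {
    val truths = doubleArrayOf(0.1, 0.5, 0.7, 1.2, 2.5)
    println(pitCounts(truths, rate = 1.0, numBins = 4).toList())
}
```

A U-shaped result suggests forecasts that are too narrow, a hump in the middle suggests forecasts that are too wide, and a slope suggests bias.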

pitKsDistance

fun SparseHistogramResult.pitKsDistance(numBins: Int): Double

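Kolmogorov-Smirnov statistic against the uniform distribution on [0, 1]. Walks every bin (including empty ones) and returns the maximum of |empCdf(x) - x| over the bin upper boundaries. Because the empirical CDF is only evaluated at those boundaries, this is a binned approximation that can understate the continuous KS statistic.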
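A standalone sketch of that walk over a dense array of bin counts. ksDistanceUniform is a hypothetical name, and the dense DoubleArray input stands in for SparseHistogramResult.

```kotlin
import kotlin.math.abs
import kotlin.math.max

// Largest gap between the empirical CDF and the uniform CDF,
// evaluated at the upper boundary of each equal-width bin on [0, 1].
fun ksDistanceUniform(counts: DoubleArray): Double {
    val total = counts.sum()
    if (total == 0.0) return Double.NaN // no observations yet
    var cumulative = 0.0
    var maxGap = 0.0
    for (i in counts.indices) {
        cumulative += counts[i] // empty bins add nothing but are still visited
        val empCdf = cumulative / total
        val upperEdge = (i + 1).toDouble() / counts.size // uniform CDF at the edge
        maxGap = max(maxGap, abs(empCdf - upperEdge))
    }
    return maxGap
}

fun main() {
    println(ksDistanceUniform(doubleArrayOf(10.0, 10.0, 10.0, 10.0))) // flat histogram
    println(ksDistanceUniform(doubleArrayOf(20.0, 0.0, 0.0, 0.0)))    // all mass in one bin
}
```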