kumulant

math

Vector / matrix primitives, distribution sampling, and stream-hash functions consumed by the rest of the library. The public surface splits into three groups; the rest of the package is internal SIMD, Cholesky, and BLAS-style helpers used by regression and Bayesian stats.

Vectors and matrices

| Type | Role |
| --- | --- |
| VectorView | Sealed read interface; size, operator get(i), forEachStored. Accepted by VectorStat / RegressionStat update, by every result's predict(x: VectorView) method, and by every spec that consumes a feature vector. |
| DenseVector | Backed by a flat DoubleArray. Constructed via DenseVector.of(doubleArrayOf(...)). The default carrier when the caller has a dense array on hand. |
| SparseVector | Backed by parallel index/value arrays. Constructed via SparseVector.of(indices, values, size). Forwarded by every regressor's sparse-aware update path; forEachStored walks only the nonzero entries. |
| MatrixView | Sealed read interface; rows, cols, operator get(i, j). Carried by covariance / Cholesky results. |
| DenseMatrix | Row-major flat DoubleArray backing. Carried by com.eignex.kumulant.stat.regression.glm.CovarianceRegressionResult for posterior covariance and Cholesky factors. |

Both vector types flow through the same API path; forEachStored { i, v -> ... } is the universal entry point that lets a consumer iterate only the populated entries, regardless of the backing. That is what gives com.eignex.kumulant.stat.regression.glm.StochasticRegressionStat its O(nnz(x)) update cost.
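A minimal sketch of why a single forEachStored path yields a cost proportional to the stored entries. VecSketch, DenseSketch, SparseSketch, and dot are hypothetical stand-ins written for this illustration; they are not kumulant's classes, and the library's actual signatures may differ.

```kotlin
// Hypothetical stand-in for a sealed dense/sparse read interface.
sealed interface VecSketch {
    val size: Int
    operator fun get(i: Int): Double
    fun forEachStored(action: (Int, Double) -> Unit)
}

// Dense backing: every slot counts as stored.
class DenseSketch(private val data: DoubleArray) : VecSketch {
    override val size: Int get() = data.size
    override fun get(i: Int): Double = data[i]
    override fun forEachStored(action: (Int, Double) -> Unit) {
        for (i in data.indices) action(i, data[i])
    }
}

// Sparse backing: parallel index/value arrays, only the stored pairs are visited.
class SparseSketch(
    private val indices: IntArray,
    private val values: DoubleArray,
    override val size: Int
) : VecSketch {
    init {
        require(indices.size == values.size) { "indices and values must have equal length" }
    }

    // Linear scan keeps the sketch short; missing entries read as 0.0.
    override fun get(i: Int): Double {
        val k = indices.indexOf(i)
        return if (k >= 0) values[k] else 0.0
    }

    override fun forEachStored(action: (Int, Double) -> Unit) {
        for (k in indices.indices) action(indices[k], values[k])
    }
}

// One code path for both backings; work is proportional to the number of stored entries.
fun dot(weights: DoubleArray, x: VecSketch): Double {
    var sum = 0.0
    x.forEachStored { i, v -> sum += weights[i] * v }
    return sum
}
```

With a sparse input that stores two entries, dot performs two multiply-adds no matter how large size is, which is the O(nnz(x)) behaviour described above.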

Distribution sampling

Convenience extensions on kotlin.random.Random:

  • nextNormal(mean, std): Gaussian. Used by Thompson-sampling posteriors throughout the bandit and regression layers.

  • nextLogNormal(mean, variance): log-normal. Used by composite arms modelling multiplicative reward.

  • nextGamma(alpha): gamma. Building block for Beta / Dirichlet.

  • nextBeta(alpha, beta): beta. Used by Beta-Bernoulli posteriors.

  • nextPoissonOne(): Poisson(1). Used by Oza & Russell online bagging in RandomForestRegressionStat.

These are mostly internal to the library but exposed in case downstream code wants the same well-tested implementations.

Hash functions

The streaming sketches and cardinality estimators (com.eignex.kumulant.stat.cardinality, com.eignex.kumulant.stat.sketch) need their input to carry uniform 64-bit entropy. The JVM's Object.hashCode() provides only 32 bits and tends to be biased for low-cardinality domains, so opaque keys should be hashed with one of the functions below before they are fed into those sketches.

| Function / type | Role |
| --- | --- |
| hash64 | Default 64-bit hash of a ByteArray (or String via UTF-8). Currently delegates to SplitMixChunkHasher. The default entry point; downstream code that doesn't care which algorithm is used should call this. |
| Hasher64 | Pluggable 64-bit byte-hash interface. Implementations must be deterministic and pure. Implement a custom one to pin a specific hash variant. |
| SplitMixChunkHasher | The current default implementation: SplitMix64 over 8-byte chunks. Pin to this directly when stability across library versions matters. |
| splitmix64 | Bit-mixing 64-bit integer transform. Used internally and exposed for completeness. |

These hashes are non-cryptographic: splitmix64 output passes BigCrush but is not collision-resistant. Use a cryptographic hash function for adversarial input.

Types

@Serializable(with = DenseMatrixSerializer::class)
@SerialName(value = "DenseMatrix")
class DenseMatrix : MatrixView

Dense row-major matrix backed by a single contiguous DoubleArray of length rows * cols. Element (i, j) lives at data[i * cols + j].
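The row-major layout can be sketched as follows. RowMajorSketch is a hypothetical class written for illustration only; it mirrors the indexing rule stated above and is not the library's DenseMatrix.

```kotlin
// Hypothetical stand-in for a row-major dense matrix.
class RowMajorSketch(val rows: Int, val cols: Int, private val data: DoubleArray) {
    init {
        require(data.size == rows * cols) { "expected ${rows * cols} entries, got ${data.size}" }
    }

    // Element (i, j) lives at flat offset i * cols + j.
    operator fun get(i: Int, j: Int): Double {
        require(i in 0 until rows && j in 0 until cols) { "index ($i, $j) out of bounds" }
        return data[i * cols + j]
    }

    // Materialise to nested arrays, one DoubleArray per row.
    fun toArrays(): Array<DoubleArray> =
        Array(rows) { i -> data.copyOfRange(i * cols, (i + 1) * cols) }
}
```

Because each row is contiguous, walking a row touches adjacent memory, while walking a column strides by cols.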

@Serializable
@SerialName(value = "DenseVector")
class DenseVector : VectorView

Dense double-precision vector backed by a flat DoubleArray. The default carrier when the caller already has a dense array or expects most entries to be populated.

fun interface Hasher64

Pluggable 64-bit byte hash. Implementations must be deterministic and pure.

@Serializable
value class HasherRef(val name: String)

Typed, serializable reference to a LongHasher by name. Carried by the discrete sketch specs and results in place of a bare string, and resolved to a live mixer via Hashers.resolve. Serializes transparently as its name, so the wire form stays a plain string and needs no custom serializer.

object Hashers

Registry resolving a LongHasher.name back to its live implementation. The sketch families serialize only the mixer's name; resolve reconstructs the function when a spec is materialized or a sketch result is queried. SplitMix64 is pre-registered.
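The serialize-by-name pattern can be sketched as a small registry. NamedMixer, MixerRegistry, and XorShiftMixer are hypothetical names invented for this example; the real Hashers API (including how implementations are registered) is not shown in this page and may differ.

```kotlin
// Hypothetical stand-in for a named Long -> Long mixer.
interface NamedMixer {
    val name: String
    fun mix(value: Long): Long
}

// Hypothetical registry: only the name is persisted, the function is looked up on demand.
object MixerRegistry {
    private val byName = mutableMapOf<String, NamedMixer>()

    fun register(mixer: NamedMixer) {
        byName[mixer.name] = mixer
    }

    // Turn a serialized name back into the live implementation.
    fun resolve(name: String): NamedMixer =
        byName[name] ?: throw IllegalArgumentException("No mixer registered as '$name'")
}

// A trivial mixer used only to exercise the registry.
object XorShiftMixer : NamedMixer {
    override val name: String = "XorShift"
    override fun mix(value: Long): Long = value xor (value ushr 33)
}
```

The sketch is not thread-safe; a registry shared across threads would need a concurrent map or registration confined to startup.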

interface LongHasher

Pluggable Long -> Long mixer used by the discrete sketch family (HyperLogLog, LinearCounting, MinHash, BloomFilter, CountMinSketch) to spread a key's bits across the full 64-bit range before bucketing. Distinct from Hasher64 (ByteArray -> Long): callers reduce a domain key to a Long first (e.g. via hash64), and the sketch then mixes that Long through here.

@Serializable
sealed interface MatrixView

Read-only N-by-M matrix. Sealed alongside VectorView so snapshots round-trip through kotlinx.serialization with their concrete storage preserved. The public surface is read-only: shape, entry access, and materialisation to Array<DoubleArray>. Mutation, factorisations, and arithmetic are internal to kumulant.

@Serializable
@SerialName(value = "SparseVector")
class SparseVector : VectorView

Compressed sparse vector: parallel indices/values arrays of equal length, where each index/value pair holds one nonzero entry. Immutable from the caller's perspective; to change the sparsity pattern, rebuild.

SplitMixChunkHasher

Hashes byte arrays by feeding 8-byte little-endian chunks through splitmix64 and folding tail bytes in last. The starting state is the input length, so different-length zero-prefixed inputs hash distinctly. Stable byte-for-byte across platforms; currently the default for hash64.
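A minimal sketch of length-seeded chunk hashing as described above. The function names are hypothetical, and the way state and chunk are combined (XOR, then mix, plus a final mix) is an assumption; the source does not state the library's exact combination. The mixer is passed in as a parameter so the sketch stays independent of the mixing function.

```kotlin
// Read `count` bytes (1..8) starting at `offset` as a little-endian word.
fun readLittleEndianSketch(bytes: ByteArray, offset: Int, count: Int): Long {
    var word = 0L
    for (k in 0 until count) {
        word = word or ((bytes[offset + k].toLong() and 0xFFL) shl (8 * k))
    }
    return word
}

// Hypothetical chunk hasher; `mix` is any 64-bit mixing function.
fun chunkHashSketch(bytes: ByteArray, mix: (Long) -> Long): Long {
    // Seed the state with the length so inputs of different lengths start apart.
    var state = bytes.size.toLong()
    var offset = 0

    // Full 8-byte chunks.
    while (offset + 8 <= bytes.size) {
        state = mix(state xor readLittleEndianSketch(bytes, offset, 8))
        offset += 8
    }

    // Fold the 1..7 leftover tail bytes in last.
    val tail = bytes.size - offset
    if (tail > 0) {
        state = mix(state xor readLittleEndianSketch(bytes, offset, tail))
    }
    return mix(state)
}
```

Reading chunks with explicit shifts rather than a platform buffer is what keeps the result independent of the host's byte order.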

@Serializable
sealed interface VectorView

Read-only N-vector with sealed dense / sparse backing. Callers see the same surface either way: query the size, read entries by index, materialise to a DoubleArray. The same VectorView is used everywhere a vector observation flows: as the input to com.eignex.kumulant.core.VectorStat.update and com.eignex.kumulant.core.RegressionStat.update, as the weights of every fitted com.eignex.kumulant.core.HasLinearModel result, and as the argument to every predict(VectorView) method.

Properties

expect val mathBackend: String

Short human-readable identifier for the math backend the current process resolved. Examples: "scalar" (any non-JVM target, or a JVM started without --add-modules=jdk.incubator.vector), "simd(4 lanes)" (JVM with AVX2), "simd(8 lanes)" (JVM with AVX-512). Print at startup to verify your runtime picked up the Vector API module.

actual val mathBackend: String

Identifies the runtime math backend powering the SIMD-like primitives.

SplitMix64

Default LongHasher: the library's splitmix64 mixer. Pre-registered with Hashers.

Functions

fun hash64(bytes: ByteArray): Long

Default 64-bit hash of bytes for cardinality / sketch families. Currently delegates to SplitMixChunkHasher; pin to that hasher directly if you need stable hash values across library versions.

fun hash64(value: String): Long

UTF-8 byte hash convenience over hash64.

fun Random.nextBeta(alpha: Double, beta: Double): Double

Draw from Beta(alpha, beta) via the two-gamma quotient X / (X + Y) where X ~ Gamma(alpha), Y ~ Gamma(beta). Fast paths handle the trivial special cases.

Random.nextGamma(alpha)

Draw from Gamma(alpha, 1) (unit rate). Marsaglia-Tsang (2000) for alpha >= 1 with Stuart's power-of-uniform boost for alpha < 1. Two fast paths cover common parameter values.
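A minimal sketch of the Marsaglia-Tsang method with the power-of-uniform boost, plus the two-gamma quotient used for Beta draws. The function names are hypothetical, the normal draw uses Box-Muller as a simple substitute for the library's Ziggurat sampler, and none of the library's fast paths are reproduced.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.pow
import kotlin.math.sqrt
import kotlin.random.Random

// Standard normal via Box-Muller (a stand-in, not the Ziggurat algorithm).
fun Random.gaussianSketch(): Double {
    val u1 = 1.0 - nextDouble() // in (0, 1], keeps ln(u1) finite
    val u2 = nextDouble()
    return sqrt(-2.0 * ln(u1)) * cos(2.0 * PI * u2)
}

// Gamma(alpha, 1): Marsaglia-Tsang for alpha >= 1, boost for alpha < 1.
fun Random.gammaSketch(alpha: Double): Double {
    require(alpha > 0.0) { "alpha must be positive" }
    if (alpha < 1.0) {
        // Gamma(alpha) is distributed as Gamma(alpha + 1) * U^(1 / alpha).
        return gammaSketch(alpha + 1.0) * (1.0 - nextDouble()).pow(1.0 / alpha)
    }
    val d = alpha - 1.0 / 3.0
    val c = 1.0 / sqrt(9.0 * d)
    while (true) {
        // Redraw until the cube base is positive.
        var x = gaussianSketch()
        var v = 1.0 + c * x
        while (v <= 0.0) {
            x = gaussianSketch()
            v = 1.0 + c * x
        }
        v = v * v * v
        val u = 1.0 - nextDouble()
        val x2 = x * x
        // Cheap squeeze test first, exact log test second.
        if (u < 1.0 - 0.0331 * x2 * x2) return d * v
        if (ln(u) < 0.5 * x2 + d * (1.0 - v + ln(v))) return d * v
    }
}

// Beta(alpha, beta) as the two-gamma quotient X / (X + Y).
fun Random.betaSketch(alpha: Double, beta: Double): Double {
    val x = gammaSketch(alpha)
    val y = gammaSketch(beta)
    return x / (x + y)
}
```

The squeeze test accepts most candidates without evaluating a logarithm, which is the main reason this method is fast for alpha >= 1.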

fun Random.nextLogNormal(mean: Double, variance: Double): Double

Draw from a log-normal distribution parameterised by real-scale mean and variance (not the underlying Normal's mu/sigma). Used by log-normal posteriors where the bandit observes positive-valued rewards under a multiplicative noise model.
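The real-scale parameterisation implies a conversion to the underlying Normal's parameters before sampling. A minimal sketch of the standard moment-matching formulas follows; logNormalParamsSketch is a hypothetical helper, and whether the library computes the conversion in exactly this form is an assumption.

```kotlin
import kotlin.math.ln
import kotlin.math.sqrt

// Find (mu, sigma) of the underlying Normal so that exp(N(mu, sigma^2))
// has the requested real-scale mean and variance.
fun logNormalParamsSketch(mean: Double, variance: Double): Pair<Double, Double> {
    require(mean > 0.0) { "mean must be positive" }
    require(variance >= 0.0) { "variance must be non-negative" }
    val sigma2 = ln(1.0 + variance / (mean * mean)) // sigma^2 = ln(1 + v / m^2)
    val mu = ln(mean) - 0.5 * sigma2                // mu = ln(m) - sigma^2 / 2
    return mu to sqrt(sigma2)
}
```

A draw is then exp(mu + sigma * z) for a standard normal z. Passing mu and sigma where the real-scale mean and variance are expected would produce a very different distribution, which is why the distinction matters.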

fun Random.nextNormal(mean: Double = 0.0, std: Double = 1.0): Double

Draw from N(mean, std^2) via Marsaglia & Tsang's Ziggurat algorithm. The fast path is one nextInt() + table lookup + comparison; ~97% of draws complete there. The slow path handles the tail beyond R = 3.4426 and the "wedge" regions outside each layer's inner rectangle.

fun Random.nextNormal(mean: Float, std: Float): Float

Float overload of nextNormal; widens to Double, samples, narrows back.

Random.nextPoissonOne()

Knuth's Poisson sampler at lambda=1; returns 0/1/2/... with mass e^{-1} / k!.
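Knuth's multiplication method at lambda = 1 can be sketched as follows. poissonOneSketch is a hypothetical name used to keep the example separate from the library's function; only the algorithm named above is assumed.

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Multiply uniforms until the running product drops to e^{-1} or below;
// the number of extra multiplications needed is the Poisson(1) draw.
fun Random.poissonOneSketch(): Int {
    val limit = exp(-1.0)
    var k = 0
    var product = nextDouble()
    while (product > limit) {
        k++
        product *= nextDouble()
    }
    return k
}
```

In online bagging, each base learner would draw k this way per observation and apply the update k times, so about 37% of observations (k = 0) are skipped by any given learner.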

fun splitmix64(value: Long): Long

SplitMix64 - a fast, high-quality 64-bit mixer suitable for spreading sequential or low-entropy keys into a uniform 64-bit hash before feeding them to cardinality sketches. Output passes BigCrush; not collision-resistant (use a cryptographic hash if adversarial input is a concern).
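For reference, a sketch of the standard SplitMix64 step using the published constants. splitMix64Sketch is a hypothetical name; whether the library's splitmix64 adds the golden-ratio increment before mixing is not stated in this page and is assumed here.

```kotlin
// Standard SplitMix64 step: add the increment, then two xorshift-multiply
// rounds and a final xorshift. Every stage is invertible, so the whole
// transform is a bijection on 64-bit values.
fun splitMix64Sketch(value: Long): Long {
    var z = value + 0x9E3779B97F4A7C15UL.toLong()
    z = (z xor (z ushr 30)) * 0xBF58476D1CE4E5B9UL.toLong()
    z = (z xor (z ushr 27)) * 0x94D049BB133111EBUL.toLong()
    return z xor (z ushr 31)
}
```

Because the transform is a bijection, distinct Long keys never collide at this stage; collisions only appear once a sketch truncates the result to a bucket index.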