math
Vector / matrix primitives, distribution sampling, and stream-hash functions consumed by the rest of the library. The public surface splits into three groups; the rest of the package is internal SIMD, Cholesky, and BLAS-style helpers used by regression and Bayesian stats.
Vectors and matrices
| Type | Role |
|---|---|
| VectorView | Sealed read interface; size, operator get(i), forEachStored. Accepted by VectorStat / RegressionStat update, by every result's predict(x: VectorView) method, and by every spec that consumes a feature vector. |
| DenseVector | Backed by a flat DoubleArray. Constructed via DenseVector.of(doubleArrayOf(...)). The default carrier when the caller has a dense array on hand. |
| SparseVector | Backed by parallel index/value arrays. Constructed via SparseVector.of(indices, values, size). Forwarded by every regressor's sparse-aware update path; forEachStored walks only the nonzero entries. |
| MatrixView | Sealed read interface; rows, cols, operator get(i, j). Carried by covariance / Cholesky results. |
| DenseMatrix | Row-major flat DoubleArray backing. Carried by com.eignex.kumulant.stat.regression.glm.CovarianceRegressionResult for posterior covariance and Cholesky factors. |
Every API that takes a VectorView accepts dense and sparse vectors on the same path; forEachStored { i, v -> ... } is the universal entry point that lets a consumer iterate only the populated entries, regardless of the backing. That is what gives com.eignex.kumulant.stat.regression.glm.StochasticRegressionStat its O(nnz(x)) update cost.
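To make the O(nnz(x)) claim concrete, here is a minimal self-contained sketch of the pattern. Vec, DenseVec, SparseVec, and dot are hypothetical stand-ins written for this example, not kumulant's actual classes; only the forEachStored idea is taken from the description above.

```kotlin
// Hypothetical stand-ins for illustration; not the library's VectorView types.
sealed interface Vec {
    val size: Int
    operator fun get(i: Int): Double
    fun forEachStored(action: (Int, Double) -> Unit)
}

class DenseVec(private val data: DoubleArray) : Vec {
    override val size: Int get() = data.size
    override fun get(i: Int): Double = data[i]

    // Dense backing: every slot counts as stored.
    override fun forEachStored(action: (Int, Double) -> Unit) {
        for (i in data.indices) action(i, data[i])
    }
}

class SparseVec(
    private val indices: IntArray,   // positions of the stored entries
    private val values: DoubleArray, // values[k] belongs at indices[k]
    override val size: Int,
) : Vec {
    init { require(indices.size == values.size) }

    // Linear scan keeps the sketch short; absent positions read as 0.0.
    override fun get(i: Int): Double {
        val k = indices.indexOf(i)
        return if (k >= 0) values[k] else 0.0
    }

    // Sparse backing: only the stored entries are visited.
    override fun forEachStored(action: (Int, Double) -> Unit) {
        for (k in indices.indices) action(indices[k], values[k])
    }
}

// Dot product against dense weights; work is proportional to the stored entries of x.
fun dot(weights: DoubleArray, x: Vec): Double {
    require(weights.size == x.size)
    var acc = 0.0
    x.forEachStored { i, v -> acc += weights[i] * v }
    return acc
}
```

The consumer (dot) never branches on the backing type, so a sparse input with two stored entries costs two multiply-adds regardless of its declared size.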
Distribution sampling
Convenience extensions on kotlin.random.Random:
- nextNormal(mean, std): Gaussian. Used by Thompson-sampling posteriors throughout the bandit and regression layers.
- nextLogNormal(mean, variance): log-normal. Used by composite arms modelling multiplicative reward.
- nextGamma(alpha): gamma. Building block for Beta / Dirichlet.
- nextBeta(alpha, beta): beta. Used by Beta-Bernoulli posteriors.
- nextPoissonOne(): Poisson(1). Used by Oza & Russell online bagging in RandomForestRegressionStat.
These are mostly internal to the library but exposed in case downstream code wants the same well-tested implementations.
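The remark that gamma is a building block for Beta can be illustrated with the standard identity: if X ~ Gamma(a) and Y ~ Gamma(b) are independent, then X / (X + Y) ~ Beta(a, b). The sketch below is self-contained and does not call the library; sketchNormal, sketchGamma, and sketchBeta are hypothetical names, the normal draw uses Box-Muller, and the gamma draw uses Marsaglia and Tsang's squeeze method. Whether kumulant uses these particular algorithms for gamma and beta is an assumption.

```kotlin
import kotlin.math.PI
import kotlin.math.cos
import kotlin.math.ln
import kotlin.math.pow
import kotlin.math.sqrt
import kotlin.random.Random

// Standard normal via Box-Muller; a simple stand-in for a faster sampler.
fun Random.sketchNormal(): Double {
    val u1 = 1.0 - nextDouble() // in (0, 1], keeps ln() finite
    val u2 = nextDouble()
    return sqrt(-2.0 * ln(u1)) * cos(2.0 * PI * u2)
}

// Gamma(alpha, scale = 1) via Marsaglia & Tsang's squeeze method.
fun Random.sketchGamma(alpha: Double): Double {
    require(alpha > 0.0)
    if (alpha < 1.0) {
        // Boost small shapes: Gamma(alpha) = Gamma(alpha + 1) * U^(1 / alpha).
        val u = 1.0 - nextDouble()
        return sketchGamma(alpha + 1.0) * u.pow(1.0 / alpha)
    }
    val d = alpha - 1.0 / 3.0
    val c = 1.0 / sqrt(9.0 * d)
    while (true) {
        val x = sketchNormal()
        val t = 1.0 + c * x
        if (t <= 0.0) continue
        val v = t * t * t
        val u = 1.0 - nextDouble()
        // Accept when the log-density test passes.
        if (ln(u) < 0.5 * x * x + d - d * v + d * ln(v)) return d * v
    }
}

// Beta(alpha, beta) as a ratio of two independent gamma draws.
fun Random.sketchBeta(alpha: Double, beta: Double): Double {
    val x = sketchGamma(alpha)
    val y = sketchGamma(beta)
    return x / (x + y)
}
```

With a seeded Random, the sample mean of sketchBeta(2.0, 6.0) should settle near the analytic mean 2 / (2 + 6) = 0.25, which is a quick sanity check for any Beta sampler.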
Hash functions
The streaming sketches and cardinality estimators (com.eignex.kumulant.stat.cardinality, com.eignex.kumulant.stat.sketch) need their input to carry uniform 64-bit entropy. The JVM's Object.hashCode() provides only 32 bits and tends to be biased for low-cardinality domains, so hashing opaque keys to 64 bits first is the right way to feed them into those sketches.
| Function / type | Role |
|---|---|
| hash64 | Default 64-bit hash of a ByteArray (or String via UTF-8). Currently delegates to SplitMixChunkHasher. This is the default entry point; downstream code that doesn't care which algorithm is used should call this. |
| Hasher64 | Pluggable 64-bit byte-hash interface. Implementations must be deterministic and pure. Implement a custom one to pin a specific hash variant. |
| SplitMixChunkHasher | The current default implementation: SplitMix64 over 8-byte chunks. Pin to this directly when stability across library versions matters. |
| splitmix64 | Bit-mixing 64-bit integer transform. Used internally and exposed for completeness. |
Note that these functions are non-cryptographic: the output passes BigCrush but is not collision-resistant. Use a cryptographic hash function for adversarial input.
Types
Dense row-major matrix backed by a single contiguous DoubleArray of length rows * cols. Element (i, j) lives at data[i * cols + j].
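A small sketch of the row-major layout described above. RowMajor is a hypothetical stand-in written for this example, not the library's DenseMatrix; only the data[i * cols + j] indexing rule comes from the description.

```kotlin
// Hypothetical stand-in for illustration; not the library's DenseMatrix.
class RowMajor(val rows: Int, val cols: Int, private val data: DoubleArray) {
    init { require(data.size == rows * cols) }

    // Element (i, j) lives at offset i * cols + j in the flat array.
    operator fun get(i: Int, j: Int): Double {
        require(i in 0 until rows && j in 0 until cols)
        return data[i * cols + j]
    }

    // Materialise to one DoubleArray per row by slicing contiguous ranges.
    fun toArrays(): Array<DoubleArray> =
        Array(rows) { i -> data.copyOfRange(i * cols, (i + 1) * cols) }
}
```

Because each row is contiguous, walking a row touches adjacent memory, while walking a column strides by cols elements.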
Dense double-precision vector backed by a flat DoubleArray. The default carrier when the caller already has a dense array or expects most entries to be populated.
Typed, serializable reference to a LongHasher by name. Carried by the discrete sketch specs and results in place of a bare string, and resolved to a live mixer via Hashers.resolve. Serializes transparently as its name, so the wire form stays a plain string and needs no custom serializer.
Registry resolving a LongHasher.name back to its live implementation. The sketch families serialize only the mixer's name; resolve reconstructs the function when a spec is materialized or a sketch result is queried. SplitMix64 is pre-registered.
Pluggable Long -> Long mixer used by the discrete sketch family (HyperLogLog, LinearCounting, MinHash, BloomFilter, CountMinSketch) to spread a key's bits across the full 64-bit range before bucketing. Distinct from Hasher64 (ByteArray -> Long): callers reduce a domain key to a Long first (e.g. via hash64), and the sketch then mixes that Long through here.
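The name-based resolution described in the entries above can be pictured with a small self-contained sketch. NamedMixer, MixerRegistry, fib64, and bucketOf are all hypothetical names invented here; the real LongHasher and Hashers signatures are not shown in this page, and the toy multiplicative mixer exists only so the example runs on its own.

```kotlin
// Hypothetical stand-ins for illustration; names and shapes are assumptions.
class NamedMixer(val name: String, private val fn: (Long) -> Long) {
    fun mix(x: Long): Long = fn(x)
}

object MixerRegistry {
    private val byName = mutableMapOf<String, NamedMixer>()

    fun register(mixer: NamedMixer) { byName[mixer.name] = mixer }

    // Only the name travels on the wire; the function is looked up on arrival.
    fun resolve(name: String): NamedMixer =
        byName[name] ?: error("no mixer registered under '$name'")
}

// Toy multiplicative mixer, used only so the sketch is self-contained.
val fib64 = NamedMixer("fib64") { it * 0x9E3779B97F4A7C15uL.toLong() }

// Bucket a Long key by the high bits of its mixed value.
fun bucketOf(key: Long, mixerName: String, buckets: Int): Int {
    val mixed = MixerRegistry.resolve(mixerName).mix(key)
    // ushr 33 leaves a non-negative value below 2^31, so % stays in range.
    return ((mixed ushr 33) % buckets).toInt()
}
```

Storing the name rather than the function is what keeps the serialized form a plain string: any process that has the same mixer registered can rebuild identical bucketing.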
Read-only N-by-M matrix. Sealed alongside VectorView so snapshots round-trip through kotlinx.serialization with their concrete storage preserved. The public surface is read-only: shape, entry access, and materialisation to Array<DoubleArray>. Mutation, factorisations, and arithmetic are internal to kumulant.
Compressed sparse vector: parallel indices/values arrays of equal length, each holding one nonzero entry. Immutable from the caller's perspective; to change the sparsity pattern, rebuild.
Hashes byte arrays by feeding 8-byte little-endian chunks through splitmix64 and folding tail bytes in last. The starting state is the input length, so different-length zero-prefixed inputs hash distinctly. Stable byte-for-byte across platforms; currently the default for hash64.
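The chunking scheme can be sketched as follows. This is an illustration under stated assumptions, not the library's code: mixChunk uses the widely published SplitMix64 constants, and the way each chunk is combined with the running state (xor, then mix, plus one final mix) is a guess made for this example. Only the length-seeded state, the 8-byte little-endian chunks, and the tail-last ordering come from the description above.

```kotlin
// Published SplitMix64 step; assumed to match the mixer the description refers to.
fun mixChunk(x: Long): Long {
    var z = x + 0x9E3779B97F4A7C15uL.toLong()
    z = (z xor (z ushr 30)) * 0xBF58476D1CE4E5B9uL.toLong()
    z = (z xor (z ushr 27)) * 0x94D049BB133111EBuL.toLong()
    return z xor (z ushr 31)
}

// Hypothetical chunk hasher: the combining step (xor then mix) is an assumption.
fun chunkHashSketch(bytes: ByteArray): Long {
    var state = bytes.size.toLong() // seed with the length
    var i = 0
    // Full 8-byte chunks, assembled little-endian.
    while (i + 8 <= bytes.size) {
        var chunk = 0L
        for (b in 0 until 8) {
            chunk = chunk or ((bytes[i + b].toLong() and 0xFF) shl (8 * b))
        }
        state = mixChunk(state xor chunk)
        i += 8
    }
    // Remaining 1..7 tail bytes are folded in last.
    if (i < bytes.size) {
        var tail = 0L
        var shift = 0
        while (i < bytes.size) {
            tail = tail or ((bytes[i].toLong() and 0xFF) shl shift)
            shift += 8
            i++
        }
        state = mixChunk(state xor tail)
    }
    return mixChunk(state)
}

// String convenience: hash the UTF-8 bytes.
fun chunkHashSketch(text: String): Long = chunkHashSketch(text.encodeToByteArray())
```

Seeding the state with the length is what separates all-zero inputs of different sizes: their chunks are identical, so only the starting state can tell them apart.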
Read-only N-vector with sealed dense / sparse backing. Callers see the same surface either way: query the size, read entries by index, or materialise to a DoubleArray. The same VectorView is used everywhere a vector observation flows: as the input to com.eignex.kumulant.core.VectorStat.update and com.eignex.kumulant.core.RegressionStat.update, as the weights of every fitted com.eignex.kumulant.core.HasLinearModel result, and as the argument to every predict(VectorView) method.
Properties
Short human-readable identifier for the math backend the current process resolved. Examples: "scalar" (any non-JVM target, or a JVM started without --add-modules=jdk.incubator.vector), "simd(4 lanes)" (JVM with AVX2), "simd(8 lanes)" (JVM with AVX-512). Print at startup to verify your runtime picked up the Vector API module.
Identifies the runtime math backend powering the SIMD-like primitives.
Default LongHasher: the library's splitmix64 mixer. Pre-registered with Hashers.
Functions
Draw from N(mean, std^2) via Marsaglia & Tsang's Ziggurat algorithm. The fast path is one nextInt() + table lookup + comparison; ~97% of draws complete there. The slow path handles the tail beyond R = 3.4426 and the "wedge" regions outside each layer's inner rectangle.
Float overload of nextNormal; widens to Double, samples, narrows back.
Knuth's Poisson sampler at lambda=1; returns 0/1/2/... with mass e^{-1} / k!.
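Knuth's method multiplies uniform draws until the running product falls to or below e^{-lambda}; the number of multiplications that stayed above the threshold is the sample. A minimal sketch at lambda = 1 follows; sketchPoissonOne is a hypothetical name, and the library's own implementation may differ in detail.

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Knuth's product-of-uniforms Poisson sampler, specialised to lambda = 1.
fun Random.sketchPoissonOne(): Int {
    val limit = exp(-1.0) // e^{-lambda} with lambda = 1
    var k = 0
    var p = nextDouble()
    // Keep multiplying uniforms while the product stays above the threshold.
    while (p > limit) {
        k++
        p *= nextDouble()
    }
    return k
}
```

In online bagging, each incoming observation is shown to each base learner k times with k drawn this way, which approximates bootstrap resampling on a stream.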
SplitMix64 - a fast, high-quality 64-bit mixer suitable for spreading sequential or low-entropy keys into a uniform 64-bit hash before feeding them to cardinality sketches. Output passes BigCrush; not collision-resistant (use a cryptographic hash if adversarial input is a concern).
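For reference, the widely published SplitMix64 step looks like the sketch below. The constants are the standard ones from the public-domain reference version; whether the library's splitmix64 adds the golden-ratio increment before mixing, as done here, is an assumption. splitmix64Sketch and distinctPrefixes are hypothetical names.

```kotlin
// Standard SplitMix64 step: add the golden-ratio increment, then two
// xorshift-multiply rounds and a final xorshift.
fun splitmix64Sketch(x: Long): Long {
    var z = x + 0x9E3779B97F4A7C15uL.toLong()
    z = (z xor (z ushr 30)) * 0xBF58476D1CE4E5B9uL.toLong()
    z = (z xor (z ushr 27)) * 0x94D049BB133111EBuL.toLong()
    return z xor (z ushr 31)
}

// Counts how many distinct top-`bits` prefixes the keys 0 until n land in
// after mixing. Raw sequential keys would all share the prefix 0.
fun distinctPrefixes(n: Int, bits: Int): Int =
    (0 until n)
        .map { splitmix64Sketch(it.toLong()) ushr (64 - bits) }
        .toSet()
        .size
```

Every step is invertible, so distinct inputs always give distinct outputs; the mixing only redistributes bits, which is why it helps sketches that bucket on the high bits of a key.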