kumulant

HyperLogLogStat

class HyperLogLogStat(val precision: Int = 14, val hasher: LongHasher = SplitMix64, val concurrency: Concurrency = Concurrency.None) : DiscreteStat<HyperLogLogResult>

HyperLogLog cardinality estimator with a small-range linear-counting fallback.

Allocates m = 2^precision byte-sized registers and uses the standard estimator $\alpha_m m^2 / \sum_j 2^{-M_j}$, where $M_j$ is the value of register j. It switches to linear counting on small inputs (rawE <= 2.5*m with at least one empty register) to eliminate the well-known HLL bias near zero. Inputs are run through the hasher (default SplitMix64) before bucketing, so callers can pass raw IDs without worrying about hash quality.
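
The estimator and its small-range fallback can be sketched as follows. This is a minimal illustration written for this page, not the library's code: the names alpha and estimate are hypothetical, the registers are assumed to be a plain ByteArray, and the alpha_m constants are the ones from the original HyperLogLog paper.

```kotlin
import kotlin.math.ln
import kotlin.math.pow

// Bias-correction constant alpha_m (values from the original HLL paper).
fun alpha(m: Int): Double = when (m) {
    16 -> 0.673
    32 -> 0.697
    64 -> 0.709
    else -> 0.7213 / (1.0 + 1.079 / m)
}

// Hypothetical helper: estimate cardinality from m registers,
// where each register holds the largest rho observed for its bucket.
fun estimate(registers: ByteArray): Double {
    val m = registers.size
    val md = m.toDouble()
    var harmonicSum = 0.0      // sum over j of 2^-M_j
    var emptyRegisters = 0     // registers still at zero
    for (r in registers) {
        harmonicSum += 2.0.pow(-r.toInt())
        if (r.toInt() == 0) emptyRegisters++
    }
    val rawE = alpha(m) * md * md / harmonicSum
    // Small-range fix: fall back to linear counting while empty registers remain.
    return if (rawE <= 2.5 * md && emptyRegisters > 0) {
        md * ln(md / emptyRegisters)
    } else {
        rawE
    }
}
```

With every register empty the fallback returns 0; once no register is empty, the raw estimate is used directly.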

Memory: m Longs (registers) plus a counter. Standard error is ~ $1.04/\sqrt{m}$ (~ 0.81% at the default precision = 14). 64-bit hashing makes the original HLL large-range correction unnecessary.
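
As a worked check of the quoted figure, the standard error at the default precision follows directly from the formula above:

```latex
\[
\sigma \approx \frac{1.04}{\sqrt{m}}, \qquad m = 2^{\text{precision}}
\]
\[
\text{precision} = 14:\quad m = 2^{14} = 16384,\quad \sqrt{m} = 128,\quad
\sigma \approx \frac{1.04}{128} \approx 0.0081 = 0.81\%
\]
```

Because the error scales with $1/\sqrt{m}$, raising precision by 2 quadruples m and halves the standard error (for example, about 1.6% at precision = 12 and about 0.41% at precision = 16).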

This is plain HLL with the standard small-range linear-counting fix, not HLL++. The Heule et al. (2013) empirical bias-correction tables are not implemented; empirically, with SplitMix64 prehashing, the medium-range bias stays inside $1.04/\sqrt{m}$ across m ... 5*m (see the accuracy test in HyperLogLogTest). The sparse representation is also omitted, since the linear-counting fallback already gives near-exact estimates at low cardinalities.

Use cases: distinct-value estimation under tight memory (count unique users, unique error fingerprints, unique IPs). Reach for LinearCountingStat instead when cardinality is bounded and known to stay below the bitset size.

Memory: O(m) = O(2^precision) bytes, plus a totalSeen counter.

Update: O(1) per observation; one hash + register CAS-max.
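
A single-threaded sketch of the update path (one hash, then a register max) might look like this. It rests on several assumptions: TinyHll and mix64 are hypothetical names, mix64 is the step function of the public-domain splitmix64 generator and is not confirmed to match the library's SplitMix64, and taking the bucket index from the top precision bits is one common convention rather than the documented one. Weights and timestamps are ignored.

```kotlin
// One step of the public-domain splitmix64 generator, used here as a 64-bit mixer.
fun mix64(x: Long): Long {
    var z = x + 0x9E3779B97F4A7C15uL.toLong()
    z = (z xor (z ushr 30)) * 0xBF58476D1CE4E5B9uL.toLong()
    z = (z xor (z ushr 27)) * 0x94D049BB133111EBuL.toLong()
    return z xor (z ushr 31)
}

// Hypothetical, non-concurrent stand-in for the stat's update path.
class TinyHll(val precision: Int = 14) {
    init { require(precision in 4..18) { "precision out of range" } }

    val registers = ByteArray(1 shl precision)
    var totalSeen = 0L

    fun update(value: Long) {
        val h = mix64(value)
        // Top `precision` bits select the register.
        val index = (h ushr (64 - precision)).toInt()
        // Remaining bits, left-aligned; rho = position of the first 1-bit.
        val rest = h shl precision
        val rho = minOf(rest.countLeadingZeroBits() + 1, 64 - precision + 1)
        // Keep the maximum rho seen for this register.
        if (rho > registers[index]) registers[index] = rho.toByte()
        totalSeen++
    }
}
```

Feeding the same value twice touches the same register both times, so duplicates raise totalSeen but never change the register state.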

Concurrency: Per-register single-cell CAS-max loop on a striped Long array; totalSeen is a separate atomic add. Lock-free and exact under every Concurrency level; racing writers on the same register preserve the max-over-incoming-rho invariant via CAS retry.
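
The CAS-max loop described above can be illustrated with the JVM's AtomicLongArray. This is a sketch under assumptions: casMax is a hypothetical helper, one register per Long cell is assumed, and the library's actual striping and cell layout are not shown in this page.

```kotlin
import java.util.concurrent.atomic.AtomicLongArray

// Raise cells[index] to at least `candidate`.
// If another writer changes the cell between the read and the CAS, retry.
fun casMax(cells: AtomicLongArray, index: Int, candidate: Long) {
    while (true) {
        val current = cells.get(index)
        if (candidate <= current) return   // cell already holds a value >= candidate
        if (cells.compareAndSet(index, current, candidate)) return
    }
}
```

Since the cell only ever moves upward and a failed CAS re-reads the cell before deciding again, the final value is the maximum of all candidates regardless of how the writers interleave.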

Constructors

constructor(precision: Int = 14, hasher: LongHasher = SplitMix64, concurrency: Concurrency = Concurrency.None)

Properties

open override val concurrency: Concurrency

The thread-safety contract this stat was constructed with. Each stat picks the cell-encoding and lock strategy that honours this contract for its mathematical structure.

val hasher: LongHasher

Mixer applied to each input before bucketing; defaults to SplitMix64.

val precision: Int

Number of register-index bits; memory is 2^precision bytes.

Functions

open override fun create(concurrency: Concurrency? = null): HyperLogLogStat

Spawn a fresh accumulator with the same configuration. Optionally override the Concurrency; useful for materialising a wire spec at a different concurrency level than the source.

open override fun merge(values: HyperLogLogResult)

Fold another accumulator's snapshot into this one. The unit of merge is the immutable Result, not a live Stat, which is what lets the merge cross a process boundary. Many workers track slices of the same stream, call read periodically, and ship snapshots to a coordinator, which merges them in.

open override fun read(timestampNanos: Long = currentTimeNanos()): HyperLogLogResult

Materialise the current state as an immutable Result. Reads never mutate, so the caller can read as often as it likes without affecting the stream.

open override fun reset()

Reset the stat to its prior-seeded baseline. Equivalent to constructing a fresh stat with the same configuration, but done in place: the stat keeps the same Concurrency and any per-stat tunables.

open fun update(value: Long, weight: Double = 1.0)

Record an observation with the given weight, stamped at the current time.

open override fun update(value: Long, timestampNanos: Long, weight: Double = 1.0)

Record an observation at timestampNanos with the given weight. Time matters for rate-shaped discrete stats; for cardinality / sketch stats the stamp is dropped.

HyperLogLogStat

constructor(precision: Int = 14, hasher: LongHasher = SplitMix64, concurrency: Concurrency = Concurrency.None)

concurrency

open override val concurrency: Concurrency

The thread-safety contract this stat was constructed with. Each stat picks the cell-encoding and lock strategy that honours this contract for its mathematical structure.

Picked at construction; immutable after.

create

open override fun create(concurrency: Concurrency? = null): HyperLogLogStat

Spawn a fresh accumulator with the same configuration. Optionally override the Concurrency; useful for materialising a wire spec at a different concurrency level than the source.

The returned stat is independent: its state starts at the configured baseline, not at the source's current state. Each modality subtype narrows the return type so chaining doesn't lose the modality.

hasher

Mixer applied to each input before bucketing; defaults to SplitMix64.

merge

open override fun merge(values: HyperLogLogResult)

Fold another accumulator's snapshot into this one. The unit of merge is the immutable Result, not a live Stat, which is what lets the merge cross a process boundary. Many workers track slices of the same stream, call read periodically, and ship snapshots to a coordinator, which merges them in.

Most stat families implement merge exactly (Chan-style parallel formulas for Welford, cell-wise additions for histograms, cell-wise max for HLL). SGD-based regressors merge approximately; they have no second-moment information for the principled combine. Each stat's KDoc documents its merge semantics.
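
For HLL, the cell-wise max merge can be sketched on raw register arrays. The helper name mergeRegisters is hypothetical, and the arrays stand in for whatever register state a HyperLogLogResult carries, which this page does not show.

```kotlin
// Fold `other` into `into` by taking the per-register maximum.
// Both sketches must use the same precision (same register count).
fun mergeRegisters(into: ByteArray, other: ByteArray) {
    require(into.size == other.size) { "precision mismatch" }
    for (i in into.indices) {
        if (other[i] > into[i]) into[i] = other[i]
    }
}
```

Taking the maximum is commutative, associative, and idempotent, so a coordinator can merge snapshots in any order, and merging the same snapshot twice leaves the registers unchanged.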

precision

Number of register-index bits; memory is 2^precision bytes.

read

open override fun read(timestampNanos: Long = currentTimeNanos()): HyperLogLogResult

Materialise the current state as an immutable Result. Reads never mutate, so the caller can read as often as it likes without affecting the stream.

Snapshot consistency depends on the configured Concurrency. Under Concurrency.Strict / Concurrency.HighWrite a read locks against writers so coupled cells stay consistent. Under Concurrency.Relaxed the cells race and the snapshot may drift by ULPs of the workload under heavy contention; the drift is bounded and the read never throws.

timestampNanos is the read timestamp. Stats that don't care about time silently drop it; stats that do (rates, decay families, recency, windowed wrappers) use it as the ordering signal.

reset

open override fun reset()

Reset the stat to its prior-seeded baseline. Equivalent to constructing a fresh stat with the same configuration, but done in place: the stat keeps the same Concurrency and any per-stat tunables.

update

open override fun update(value: Long, timestampNanos: Long, weight: Double = 1.0)

Record an observation at timestampNanos with the given weight. Time matters for rate-shaped discrete stats; for cardinality / sketch stats the stamp is dropped.