kumulant

MinHash

@Serializable
@SerialName(value = "MinHash")
data class MinHash(val numHashes: Int = 128, val seed: Long = -3724518991637283867L, val hasher: HasherRef = HasherRef.SplitMix64) : DiscreteStatSpec<MinHashResult>

Spec for MinHashStat: Jaccard-similarity signature over numHashes independent hash functions.
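
As a rough illustration of the technique, and not kumulant's actual MinHashStat code, the sketch below keeps one slot per hash function holding the smallest salted 64-bit hash seen so far. It then estimates Jaccard similarity as the fraction of slots on which two signatures agree. The Long item type, the salt scheme, and the names mix64, MinHashSketch and estimateJaccard are assumptions. Only the SplitMix64 finalizer constants are standard.

```kotlin
// Hypothetical sketch of MinHash; not kumulant's implementation.

/** SplitMix64 finalizer (the mixing step used by java.util.SplittableRandom). */
fun mix64(x: Long): Long {
    var z = x
    z = (z xor (z ushr 30)) * 0xBF58476D1CE4E5B9uL.toLong()
    z = (z xor (z ushr 27)) * 0x94D049BB133111EBuL.toLong()
    return z xor (z ushr 31)
}

class MinHashSketch(numHashes: Int, seed: Long) {
    init {
        require(numHashes > 0) { "numHashes must be positive" }
    }

    // One salt per slot, derived from the seed (assumed scheme: mix64(seed + slot index)).
    private val salts = LongArray(numHashes) { i -> mix64(seed + i) }

    // Slot i holds the smallest salted hash seen so far; MAX_VALUE means "no item yet".
    val signature = LongArray(numHashes) { Long.MAX_VALUE }

    fun update(item: Long) {
        for (i in salts.indices) {
            val h = mix64(item xor salts[i])
            if (h < signature[i]) signature[i] = h
        }
    }
}

/** Fraction of agreeing slots, an estimate of |A ∩ B| / |A ∪ B|. */
fun estimateJaccard(a: MinHashSketch, b: MinHashSketch): Double {
    require(a.signature.size == b.signature.size) { "signatures must have equal length" }
    val matches = a.signature.indices.count { a.signature[it] == b.signature[it] }
    return matches.toDouble() / a.signature.size
}
```

Because mix64 is a bijection on 64-bit values, two different items never produce the same slot hash in this sketch. As a result, identical sets score exactly 1.0 and disjoint sets exactly 0.0, while overlapping sets land near their true Jaccard similarity.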

Constructors

constructor(numHashes: Int = 128, seed: Long = -3724518991637283867L, hasher: HasherRef = HasherRef.SplitMix64)

Properties

val hasher: HasherRef

HasherRef for the mixer applied per signature slot; resolved via the Hashers registry.
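
A minimal sketch of what "resolved via a registry" can look like: a reference value is looked up in a map from references to mixer functions, and the resolved mixer is applied to each item combined with a per-slot salt. MixerRef, MixerRegistry and slotHash are hypothetical stand-ins for HasherRef and Hashers. The SplitMix64 finalizer is the standard one, but combining item and salt with xor is an assumption.

```kotlin
// Hypothetical stand-ins for HasherRef and the Hashers registry.
enum class MixerRef { SplitMix64 }

object MixerRegistry {
    private val mixers: Map<MixerRef, (Long) -> Long> = mapOf(
        MixerRef.SplitMix64 to { x: Long ->
            // Standard SplitMix64 finalizer: a bijection on 64-bit values.
            var z = x
            z = (z xor (z ushr 30)) * 0xBF58476D1CE4E5B9uL.toLong()
            z = (z xor (z ushr 27)) * 0x94D049BB133111EBuL.toLong()
            z xor (z ushr 31)
        }
    )

    /** Looks up the mixer registered for ref, failing loudly if none is registered. */
    fun resolve(ref: MixerRef): (Long) -> Long =
        mixers[ref] ?: error("no mixer registered for $ref")
}

/** Hash for one signature slot: the resolved mixer applied to item xor salt (assumed scheme). */
fun slotHash(ref: MixerRef, item: Long, salt: Long): Long =
    MixerRegistry.resolve(ref)(item xor salt)
```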

val numHashes: Int

Signature length; higher values give more accurate Jaccard estimates at the cost of more memory (the estimate's standard error shrinks roughly as 1/sqrt(numHashes)).
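
To make the accuracy/memory trade-off concrete, assume the usual idealization that the numHashes slots behave like independent hash functions. Each slot then agrees with probability J, the true Jaccard similarity, so the estimate's standard error is sqrt(J(1 - J) / numHashes). The helper names below are hypothetical.

```kotlin
import kotlin.math.ceil
import kotlin.math.sqrt

/** Standard error of the MinHash Jaccard estimate, assuming independent slots. */
fun minHashStdError(jaccard: Double, numHashes: Int): Double {
    require(jaccard in 0.0..1.0) { "jaccard must be in [0, 1]" }
    require(numHashes > 0) { "numHashes must be positive" }
    return sqrt(jaccard * (1 - jaccard) / numHashes)
}

/** Smallest numHashes whose worst-case error (at J = 0.5) is at most targetStdError. */
fun numHashesFor(targetStdError: Double): Int {
    require(targetStdError > 0.0) { "target must be positive" }
    return ceil(0.25 / (targetStdError * targetStdError)).toInt()
}
```

With the default numHashes = 128, the worst-case standard error under this model is 0.5 / sqrt(128), about 0.044. Halving it requires four times as many slots (512).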

val seed: Long

PRNG seed used to derive the per-hash salts.
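
One plausible way to derive per-hash salts from a single seed is to run a SplitMix64 generator from the seed and take one output per slot. This is shown only as an assumption, since the page does not say which PRNG kumulant uses. The same seed then always yields the same salts, which is what two signatures need in order to be comparable slot by slot. SplitMix64Gen and deriveSalts are hypothetical names.

```kotlin
/** Hypothetical SplitMix64 generator: advance the state by the golden gamma, then mix. */
class SplitMix64Gen(private var state: Long) {
    fun nextLong(): Long {
        state += GOLDEN_GAMMA
        var z = state
        z = (z xor (z ushr 30)) * 0xBF58476D1CE4E5B9uL.toLong()
        z = (z xor (z ushr 27)) * 0x94D049BB133111EBuL.toLong()
        return z xor (z ushr 31)
    }

    companion object {
        val GOLDEN_GAMMA: Long = 0x9E3779B97F4A7C15uL.toLong()
    }
}

/** One salt per signature slot, fully determined by seed. */
fun deriveSalts(seed: Long, numHashes: Int): LongArray {
    val rng = SplitMix64Gen(seed)
    return LongArray(numHashes) { rng.nextLong() }
}
```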

Functions

Adapt a discrete spec into a series spec: the series sees value.toDouble() per update.

Wrap this discrete spec so updates are forwarded only when pred evaluates true.
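
A hypothetical decorator showing the forwarding rule. DiscreteSink and Filtered are illustrative stand-ins, not kumulant types, and the sketch assumes pred is called with the update's value.

```kotlin
// Hypothetical minimal interface for a live discrete stat.
fun interface DiscreteSink {
    fun update(value: Long, weight: Double)
}

/** Forwards an update to inner only when pred(value) is true; other updates are dropped. */
class Filtered(
    private val inner: DiscreteSink,
    private val pred: (Long) -> Boolean,
) : DiscreteSink {
    override fun update(value: Long, weight: Double) {
        if (pred(value)) inner.update(value, weight)
    }
}
```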

fun <R : Result> DiscreteStatSpec<R>.materialize(concurrency: Concurrency = Concurrency.None): DiscreteStat<R>
fun StatSpec.materialize(concurrency: Concurrency = Concurrency.None): Stat<*>

Construct a live stat from any StatSpec, dispatching on its modality. Useful for code paths (like StatSchemaDef.materialize) that iterate over an erased Map<String, StatSpec> and don't statically know the modality.
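
The dispatch can be pictured as a when expression over a sealed hierarchy. The schema map only promises the common supertype, and the runtime type picks the concrete live stat. All names here (Spec, DiscreteSpec, SeriesSpec, LiveStat, materializeAny, materializeAll) are hypothetical, and the Concurrency parameter is left out.

```kotlin
// Hypothetical stand-ins for a spec hierarchy with two modalities.
sealed interface Spec
data class DiscreteSpec(val name: String) : Spec
data class SeriesSpec(val name: String) : Spec

sealed interface LiveStat
class DiscreteLive(val spec: DiscreteSpec) : LiveStat
class SeriesLive(val spec: SeriesSpec) : LiveStat

/** Builds a live stat without knowing the modality statically; when dispatches on the runtime type. */
fun Spec.materializeAny(): LiveStat = when (this) {
    is DiscreteSpec -> DiscreteLive(this)
    is SeriesSpec -> SeriesLive(this)
}

/** Materializes every entry of an erased schema map, keeping the keys. */
fun materializeAll(schema: Map<String, Spec>): Map<String, LiveStat> =
    schema.mapValues { (_, spec) -> spec.materializeAny() }
```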

Wrap this discrete spec so each update is kept with probability rate and dropped otherwise; seed initializes the PRNG that makes these choices.
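
A minimal sketch of seeded Bernoulli sampling. It assumes each update is kept independently and forwarded with its weight unchanged; whether kumulant rescales kept weights is not stated here. DiscreteSink and Sampled are hypothetical names, and kotlin.random.Random(seed) is the standard library's seeded generator.

```kotlin
import kotlin.random.Random

// Hypothetical minimal interface for a live discrete stat.
fun interface DiscreteSink {
    fun update(value: Long, weight: Double)
}

class Sampled(
    private val inner: DiscreteSink,
    private val rate: Double,
    seed: Long,
) : DiscreteSink {
    init {
        require(rate in 0.0..1.0) { "rate must be in [0, 1]" }
    }

    // Seeded generator: the same seed reproduces the same keep/drop sequence.
    private val rng = Random(seed)

    override fun update(value: Long, weight: Double) {
        // nextDouble() is uniform in [0, 1), so this branch runs with probability rate.
        if (rng.nextDouble() < rate) inner.update(value, weight)
    }
}
```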

Wrap this discrete spec so it sees only one update out of every N, where N is the value of every.
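
A hypothetical counter-based sketch. The description does not say which update in each group of every is forwarded. This version assumes it is the first one (updates 1, 1 + every, 1 + 2 * every, and so on). DiscreteSink and EveryNth are illustrative names.

```kotlin
// Hypothetical minimal interface for a live discrete stat.
fun interface DiscreteSink {
    fun update(value: Long, weight: Double)
}

/** Forwards the first update of each consecutive group of `every` updates. */
class EveryNth(private val inner: DiscreteSink, private val every: Int) : DiscreteSink {
    init {
        require(every >= 1) { "every must be at least 1" }
    }

    private var received = 0L // updates received so far

    override fun update(value: Long, weight: Double) {
        if (received % every == 0L) inner.update(value, weight)
        received++
    }
}
```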

Wrap this discrete spec to apply expr to every update before the inner stat sees it.
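
A hypothetical sketch of the transform wrapper. The type of expr is not shown on this page, so a plain (Long) -> Long function stands in for it. DiscreteSink and Mapped are illustrative names.

```kotlin
// Hypothetical minimal interface for a live discrete stat.
fun interface DiscreteSink {
    fun update(value: Long, weight: Double)
}

/** Applies transform to each value before the inner stat sees it; weights pass through unchanged. */
class Mapped(
    private val inner: DiscreteSink,
    private val transform: (Long) -> Long,
) : DiscreteSink {
    override fun update(value: Long, weight: Double) = inner.update(transform(value), weight)
}
```

For example, a transform of { it / 1000 } would let the inner stat see millisecond timestamps as whole seconds.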

Wrap this discrete spec so every update's weight is multiplied by expr.eval(value.toDouble()).
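
A sketch of the weighting rule as described: the new weight is weight * expr.eval(value.toDouble()). The Expr interface with a single eval(Double): Double method is an assumption modelled on that call. DiscreteSink and WeightedByExpr are hypothetical names.

```kotlin
// Hypothetical minimal interfaces for a live discrete stat and a numeric expression.
fun interface DiscreteSink {
    fun update(value: Long, weight: Double)
}

fun interface Expr {
    fun eval(x: Double): Double
}

/** Multiplies each update's weight by expr evaluated at the update's value. */
class WeightedByExpr(private val inner: DiscreteSink, private val expr: Expr) : DiscreteSink {
    override fun update(value: Long, weight: Double) =
        inner.update(value, weight * expr.eval(value.toDouble()))
}
```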

fun <R : Result> DiscreteStatSpec<R>.windowed(durationMillis: Long, slices: Int = 10): DiscreteStatSpec<R>

Wrap this discrete spec in a sliding time window of durationMillis milliseconds, split into slices buckets.
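
A hypothetical sketch of a sliced sliding window. The window is divided into slices buckets of durationMillis / slices milliseconds each, and buckets are recycled when time moves past them. A weighted count stands in for the inner stat, whereas a real wrapper would presumably keep one inner stat per bucket and merge them on read. WindowedCount and its methods are illustrative names.

```kotlin
/**
 * Hypothetical sliding-window count over durationMillis, split into `slices` buckets.
 * Assumes non-negative timestamps; expired buckets are reset lazily on reuse.
 */
class WindowedCount(durationMillis: Long, private val slices: Int = 10) {
    init {
        require(slices >= 1) { "slices must be at least 1" }
        require(durationMillis >= slices) { "durationMillis must be at least slices" }
    }

    private val sliceMillis = durationMillis / slices
    private val totals = DoubleArray(slices)

    // Absolute slice index currently held by each bucket; MIN_VALUE marks an unused bucket.
    private val sliceIds = LongArray(slices) { Long.MIN_VALUE }

    fun update(nowMillis: Long, weight: Double = 1.0) {
        val id = nowMillis / sliceMillis
        val i = id.mod(slices.toLong()).toInt()
        if (sliceIds[i] != id) { // bucket holds an expired slice: recycle it
            sliceIds[i] = id
            totals[i] = 0.0
        }
        totals[i] += weight
    }

    /** Sum over the most recent `slices` slices, ending with the one containing nowMillis. */
    fun total(nowMillis: Long): Double {
        val current = nowMillis / sliceMillis
        var sum = 0.0
        for (i in 0 until slices) {
            if (sliceIds[i] in (current - slices + 1)..current) sum += totals[i]
        }
        return sum
    }
}
```

In this sketch, expiry costs nothing until a bucket is reused. The trade-off is that the window boundary advances one slice at a time, so more slices give a smoother window at the cost of more buckets.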

Wrap this discrete spec so every update pushes the constant value regardless of input.

Wrap this discrete spec so every update applies the per-observation weight multiplier.
