From Stringly to Strongly Typed
Imagine a D&D character sheet. It has typed fields (Strength 1 to 18, Class is Fighter or Wizard or Rogue) and rules between them (Halflings can’t be Paladins, Hit Points depend on Class and Constitution). The blank sheet is the schema; a filled-in character is one instance of it.
If you only have one schema, you can just write a CharacterSheet data class with the right fields plus some validation, and call it a day. This post is about the harder version: writing the library behind the sheet, where every user brings their own. Pathfinder, 5e, Call of Cthulhu, all different fields, all different rules, all driven by your code. The type system has to help, even though you don’t know any of the user schemas in advance.
A few years ago I built combo (Constraint Oriented Multi-variate Bandit Optimization), an A/B-testing tool that picks variants subject to constraints between variables. I’ve been splitting the rewrite into two libraries: kumulant, a streaming aggregator with just the variables; and klause, an SMT solver with the variables and the rules between them. Both face the same design question: how does a user declare a typed schema, and how do call sites read variables back without dissolving into casts and string lookups?
I’ll start with kumulant since it’s the smaller half.
The reason to lean hard on the types: the more the compiler catches (a misnamed read, a wrong-typed access, an illegal combination), the less the user has to remember. The strongest form is Lean, where a successful compile is a proof; we’re nowhere near that, but every step toward the types is worth taking.
The alternative is to skip embedding and write the schema in its own language (MiniZinc, Protobuf, and friends) with a tool generating typed bindings. That works, at the cost of a toolchain and a wall between the schema and any host-language logic. I’d rather keep it embedded (the schema is just Kotlin), which is what this post is about.
1. Imperative registries
Every classical solver library starts the same way: a constructor for each kind of variable, references kept as locals, constraints as objects imposed onto the Problem. Choco in Java, Z3 via its Python and Java bindings, and combo’s first version all look like this.
val problem = Problem()
val budget = IntVar(problem, "budget", 1000, 4000)
val color = IntVar(problem, "color", 0, 2) // 0=RED, 1=GREEN, 2=BLUE
// "if color=RED, then budget ≤ 2000"
problem.impose(IfThen(XeqC(color, 0), XlteqC(budget, 2000)))
The natural upgrade in Kotlin is to hide the problem receiver inside a problem { ... } block and put each kind of variable in a sealed Variable<V, T> hierarchy so it knows its own type. This is the type-safe builder pattern that powers most Kotlin DSLs (e.g. kotlinx.html): lambdas with receivers, infix functions, and operator overloading, enough to make the body look like the original mathematical notation. That’s what combo does:
val p = problem {
val budget = int("budget", min = 1000, max = 4000)
val color = nominal("color", RED, GREEN, BLUE)
impose {
color[RED] implies budget.atMost(2000)
}
}
User-supplied relations reference typed value literals like color[RED] against variables in scope, not strings.
Underneath, though, it’s the same problem as the bare solver. To read a variable at a call site you either keep the typed reference around, threading it through every function that touches it, or fall back to a by-name lookup that returns a Variable<*, *> you have to cast. With nested scopes the references fan out faster than you can keep them tidy, and the common fallback is a Map<String, Var> keyed by name, with an unchecked cast at every read.
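The by-name fallback ends up looking something like this; a sketch only, not combo’s (or any solver’s) actual API:
// A name-keyed registry, filled while the problem is built…
val vars = mutableMapOf<String, Variable<*, *>>()
// …then every read is a string plus an unchecked cast:
val budget = vars["budget"] as IntVar // misspell the string or pick the wrong type and it fails at runtime, not compile time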
![Is this a pigeon meme: anime man in glasses points at a butterfly labeled problem["budget"] as IntVar, captioned 'Is this type safety?'](/images/knqyRlNpQQ-400.jpeg)
Building the Problem is also imperative, not declarative: the body runs in order, and there’s no static structure to inspect or serialize without first executing the lambda.
2. Arity-indexed products
Pivoting to kumulant here: the same schema problem shows up for streaming statistics, where each “variable” is an accumulator like Mean or Sum and call sites need typed reads of its snapshot.
Next attempt: give every variable a value that carries its type with it. A call site should be able to write snap.mean and get a typed value end-to-end, no cast.
Encode the schema as a product: a tuple where each position holds a variable, and the type carries the arity.
data class Stat2<A : Stat, B : Stat>(val first: A, val second: B) // accumulator product
data class Result2<A : Result, B : Result>(val first: A, val second: B) // snapshot from .read()
// Stat3 / Result3 / … same shape, one per arity
operator fun <A : Stat, B : Stat> A.plus(other: B): Stat2<A, B> = Stat2(this, other)
// Per-trait accessors: one extension per (position, trait) combo
val <B : HasMean> Result2<*, B>.mean get() = second.mean
val <A : HasMean> Result2<A, *>.mean get() = first.mean
The + is defined on Stat itself. A schema is built by adding stats together; the type carries the arity, and a StatGroup wraps the schema as the runtime accumulator:
val schema = Mean("avg_ms") + Sum("total_ms") // Stat2<Mean, Sum>
val group = StatGroup(schema)
group.update(105.0)
group.update(80.0)
val snap = group.read() // Result2<MeanResult, SumResult>
snap.first.mean // Double, typed
snap.second.sum // Double, but "second" is positional
snap.mean // works because only one position has HasMean
// Two stats sharing a trait kills the trait extension:
val decaySchema = DecayingSum(15.minutes) + DecayingSum(1.minutes)
val decayGroup = StatGroup(decaySchema)
decayGroup.read().sum // ambiguous; back to .first.sum / .second.sum
The types are fully preserved: add a stat and the type changes; combine two schemas and the types fuse via +. But at the call site you write snap.first.mean, and first is the problem. Position isn’t name. Reorder the stats and call sites change. And as soon as two stats share a trait (the DecayingSum + DecayingSum above), the trait extensions become ambiguous and you fall back to .first.sum / .second.sum anyway.

I built the N×M expansion with a KSP processor that generated a trait accessor for every position-trait combination (roughly the shape sketched below), and it compiled. But the abstraction leaked: every call site had to import the right extensions for the traits it read, parameterized instances (the two DecayingSums above) still had no clean read, and the whole thing felt like a hack. Languages with higher-kinded or dependent types make this natural (shapeless is the closest analogue on the JVM), but that’s not exactly mainstream territory. Without those features you’re encoding a record with positional bookkeeping. I cut it.
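For a sense of scale: a single trait at arity 3 already needs three generated accessors, one per position, and the pattern repeats for every trait at every arity. My reconstruction of the shape, not the processor’s literal output:
// One trait (HasMean) at one arity (Result3): one accessor per position.
val <A : HasMean> Result3<A, *, *>.mean get() = first.mean
val <B : HasMean> Result3<*, B, *>.mean get() = second.mean
val <C : HasMean> Result3<*, *, C>.mean get() = third.mean
// …repeated for every other trait (a HasSum, and so on) at every arity: the N×M expansion.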
3. Typed-key schemas
The fix is to bundle the name and the type into one value: a heterogeneous map keyed by a typed key. Kumulant does it in two layers: a typed key as the plumbing, and a class on top of it for declaring lots of them at once. The plumbing first, a StatKey<R> paired with a GroupResult:
interface Result
interface Stat<R : Result> {
fun update(value: Double)
fun read(): R
}
open class StatKey<R : Result>(val name: String, val stat: Stat<R>)
@Serializable
data class GroupResult(val results: Map<String, Result>) : Result {
@Suppress("UNCHECKED_CAST") // the one unchecked cast, hidden behind the typed key
operator fun <R : Result> get(key: StatKey<R>): R =
results[key.name] as R
}
class StatGroup(val keys: List<StatKey<*>>) : Stat<GroupResult> {
override fun update(value: Double) { keys.forEach { it.stat.update(value) } }
override fun read(): GroupResult =
GroupResult(keys.associate { it.name to it.stat.read() })
}
Now keys can be declared directly, and the typed get returns the right result type at the call site:
val mean = StatKey("mean", Mean())
val count = StatKey("count", Sum())
val group = StatGroup(listOf(mean, count))
group.update(105.0)
val snap = group.read()
snap[mean] // MeanResult, no cast at the call site
snap[count] // SumResult
Each StatKey<R> pairs a name with the type it indexes. The container is a Map<String, Result> underneath, but the typed get returns the declared type, so the call site never sees the cast. Compared to the imperative registry, the strings still exist, but they’re bound to the key value, not typed by the user at every read. The key is the variable’s identity.
Declaring keys by hand is clunky: you’d be tracking them yourself for the StatGroup, and writing each name twice (once on the property, once as a string). Kumulant uses property delegates on a singleton object instead. Same pattern as JetBrains’ Exposed, minus the duplicate name.
abstract class StatSchema {
private val _keys = mutableListOf<StatKey<*>>()
val keys: List<StatKey<*>> get() = _keys
fun <R : Result> stat(s: Stat<R>): PropertyDelegateProvider<StatSchema, StatKey<R>>
// returns a delegate that registers _keys += StatKey(propertyName, s) and yields the key
// group(schema): same idea, registers a nested StatGroup as one key
}
fun StatGroup(schema: StatSchema): StatGroup = StatGroup(schema.keys)
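Filled in, the schema base class looks roughly like this. A sketch of the mechanism, not necessarily kumulant’s exact code; a bare StatKey is not a delegate by itself, so this version wraps it in a ReadOnlyProperty instead of returning the key directly:
import kotlin.properties.PropertyDelegateProvider
import kotlin.properties.ReadOnlyProperty

abstract class StatSchema {
    private val _keys = mutableListOf<StatKey<*>>()
    val keys: List<StatKey<*>> get() = _keys

    // Runs once per `by stat(...)` property: the property's name becomes the key's name,
    // the key is registered on the schema, and the property just reads that key back.
    fun <R : Result> stat(s: Stat<R>): PropertyDelegateProvider<StatSchema, ReadOnlyProperty<StatSchema, StatKey<R>>> =
        PropertyDelegateProvider { _, property ->
            val key = StatKey(property.name, s)
            _keys += key
            ReadOnlyProperty { _, _ -> key }
        }

    // group(schema) follows the same pattern, registering a nested StatGroup as one key.
}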
User schemas are objects, and every property is a by-delegate:
object HttpMetrics : StatSchema() {
val requests by stat(Sum().withValue(1.0))
// tracks p50, p99, and p999 latency quantiles
val latencyMs by stat(DDSketch(probabilities = doubleArrayOf(0.5, 0.99, 0.999)))
}
object ServiceMetrics : StatSchema() {
val requests by stat(Sum().withValue(1.0))
val billableMsTotal by stat(Sum())
val http by group(HttpMetrics)
val db by group(DbMetrics)
}
val service = StatGroup(ServiceMetrics)
service.update(120.0); service.update(80.0)
val snap = service.read()
snap[ServiceMetrics.requests].sum // Double, typed
snap[ServiceMetrics.billableMsTotal].sum // Double, typed
snap[ServiceMetrics.http, { requests }].sum // dotted lookup into a nested group
Now the schema is a class. Each property is a typed StatKey<R> whose result type matches the stat that constructed it. No magic strings to sync, no references to thread between definition and use, no imperative builder to run before the schema exists; the schema declaration is the structure.
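The dotted lookup into the nested group can stay typed because the key that group(...) registers can remember which schema produced it. One possible shape; this is my guess at the mechanism, with GroupKey as a hypothetical name, not kumulant’s actual signatures:
// A key registered by group(schema) carries the schema it came from, so a selector
// lambda on that schema can pick out a typed inner key.
class GroupKey<S : StatSchema>(name: String, val schema: S, group: StatGroup) :
    StatKey<GroupResult>(name, group)

operator fun <S : StatSchema, R : Result> GroupResult.get(
    key: GroupKey<S>,
    select: S.() -> StatKey<R>,
): R = this[key][key.schema.select()] // outer lookup, then a typed lookup inside the nested result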
For streaming statistics, this is the design I’m currently happy with. The remaining tradeoffs sit inside individual variables (e.g. derived variables with non-invertible projections need the programmer to handle merge correctness), not in the schema design.
Other languages bake schema-as-types in more directly (Swift KeyPaths, Rust derive macros, TypeScript mapped types plus zod). Kotlin doesn’t have a dedicated mechanism, but the schema-object trick is established enough through Exposed.

This design doesn’t address constraints between variables. Aggregation is fine; “variable A must always be less than variable B” or “variable C can only be set when variable D fires” has nowhere to live. Fine for kumulant, but klause’s case still needs them.
Where this goes next
Klause adds constraints, which is its own design problem (DSL, AST, wire format) and not one I’d want to cram into this post.
I’ve pulled the pattern out as Eignex/skema, now at 0.1.0 (Swedish skema means template). It’s a Kotlin Multiplatform library where one definition does double duty: typed compile-time access on the producer side, and a kotlinx-serializable wire format so a consumer that doesn’t share your Kotlin code can still decode the schema and walk it by name. kumulant, klause, and combo will all settle onto it eventually; they just haven’t gotten there yet.
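As a taste of the consumer side, using this post’s GroupResult as a stand-in for the wire format: a consumer that never sees the producer’s schema objects can still walk the decoded tree by name. A sketch; skema’s actual decode API may differ:
// Walk a decoded result tree generically, with dotted names for nested groups.
fun walk(result: GroupResult, prefix: String = "") {
    for ((name, value) in result.results) {
        when (value) {
            is GroupResult -> walk(value, "$prefix$name.") // recurse into nested groups
            else -> println("$prefix$name = $value")       // leaf results print as-is
        }
    }
}
// walk(snap) prints requests, billableMsTotal, http.requests, http.latencyMs, …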
Anyway, that’s where the design sits today. If you’ve worked something like this out, especially in a language without dependent types, I’d love to hear about it. Combo, kumulant, and klause are at github.com/Eignex/combo, github.com/Eignex/kumulant, and github.com/Eignex/klause if you want to poke at them.
