
Emergent discovery

The synthesis pipeline maintains and grows the corpus from what is already written. Discovery goes one step further: it surfaces facts nobody wrote down, by inference — both forward (new consequences) and backward (unstated keystones).

The most load-bearing facts are the least likely to be written down — precisely because they are so foundational that everyone assumes them. Retrieval finds what is similar to your query, but a keystone underwrites facts that are dissimilar to each other, so similarity can never surface it. That is not a tuning problem; it is structural. knomit reads the shape of the graph instead: a tag shared across two otherwise-unrelated clusters is the seam where an unwritten premise hides.

[Figure: two similarity clusters, A and B. Fact A (in cluster A) and fact B (in cluster B) share a token (a domain or entity) across clusters. E is the keystone: unstated, origin: discovered.]
Embedding similarity only ever draws the dense within-cluster edges; it is structurally blind to the cross-cluster token. The bridge is that missed link, and the keystone is the load-bearing fact it implies: the one nobody wrote down, precisely because it underwrites things that look unrelated.
forward · knomit_review: {A, B, C} → E. A consequence that follows from the bridged facts but that none of them states alone → new synthesis.
backward · knomit_hypothesize: E → {A, B, C}. An unstated premise that, if false, breaks them → new hypothesis, ranked by blast radius.

A bridge is two facts that share a domain or entity yet live in different similarity clusters (distinct Louvain communities over the embedding graph). That cross-cluster shared token is the signal similarity missed. Discovery seeds from bridges and runs them in two directions.
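The bridge definition above can be sketched in a few lines. This is a minimal illustration, not knomit's implementation: the `Fact` record, the `cluster` label (assumed to be precomputed, for example by a Louvain pass over the embedding graph), and the `find_bridges` helper are all hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Fact:
    # Hypothetical record shape; knomit's real fact schema may differ.
    id: str
    domain: str
    entities: frozenset
    cluster: int  # assumed precomputed similarity-community label


def shared_tokens(a: Fact, b: Fact, mode: str = "both") -> set:
    """Return the domain/entity tokens two facts have in common."""
    tokens = set()
    if mode in ("domain", "both") and a.domain == b.domain:
        tokens.add(f"domain:{a.domain}")
    if mode in ("entity", "both"):
        tokens |= {f"entity:{e}" for e in a.entities & b.entities}
    return tokens


def find_bridges(facts, mode: str = "both"):
    """Pairs that share a token yet sit in different similarity clusters."""
    bridges = []
    for a, b in combinations(facts, 2):
        if a.cluster == b.cluster:
            continue  # within-cluster pairs are what similarity already finds
        tokens = shared_tokens(a, b, mode)
        if tokens:
            bridges.append((a.id, b.id, tokens))
    return bridges


facts = [
    Fact("A", "billing", frozenset({"invoice"}), cluster=0),
    Fact("B", "auth", frozenset({"invoice", "session"}), cluster=1),
    Fact("C", "billing", frozenset({"refund"}), cluster=0),
]
bridges = find_bridges(facts)
```

Here only the pair (A, B) qualifies: the two facts share the entity `invoice` but carry different cluster labels. The pairwise loop is quadratic and is used only for clarity; an index from token to facts would be the natural choice on a large corpus.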

| Direction | Shape | Operation | Produces |
| --- | --- | --- | --- |
| Forward | consequence: E follows from {A, B, …} but no single fact states it | knomit_review | synthesis fact |
| Backward | keystone: unstated premise E that, if false, invalidates {A, B, …} | knomit_hypothesize | hypothesis fact, ranked by blast radius |

Both write origin: discovered. The boundary is deliberate: synthesis emits synthesis facts, hypothesize emits hypothesis facts — discovery only adds a direction to each, never a new fact type.

effort (normal · medium · high) is a dial on the existing review / hypothesize operations rather than a separate tool — and it doubles as a budget. normal is the default and reproduces pre-discovery behaviour byte-for-byte (a hard invariant); medium / high engage the structural-bridge engine (single-hop bridges at medium, multi-hop at high). On an unfiltered run, effort also bounds how many bridge candidates are considered, so a high run never attempts the whole corpus.
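One way to picture effort as both a mode switch and a budget is a small lookup table. The sketch below is illustrative only: the `EffortPlan` type, the hop count used for high, and both budget numbers are assumptions made for the example, since the source does not state the actual values. It models the unfiltered case only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffortPlan:
    use_bridges: bool      # False = pre-discovery behaviour, no bridge engine
    max_hops: int          # 1 = single-hop bridges, >1 = multi-hop
    candidate_budget: int  # upper bound on bridge candidates considered


# Hypothetical numbers: only the normal/medium/high ordering comes from the text.
PLANS = {
    "normal": EffortPlan(use_bridges=False, max_hops=0, candidate_budget=0),
    "medium": EffortPlan(use_bridges=True, max_hops=1, candidate_budget=50),
    "high": EffortPlan(use_bridges=True, max_hops=3, candidate_budget=200),
}


def select_candidates(candidates, effort: str = "normal"):
    """Cap the bridge candidates for an unfiltered run at the effort's budget."""
    try:
        plan = PLANS[effort]
    except KeyError:
        raise ValueError(f"unknown effort: {effort!r}") from None
    if not plan.use_bridges:
        return []  # normal never engages the bridge engine
    return list(candidates)[: plan.candidate_budget]
```

Under these assumed numbers, even a high run over a very large candidate list stops at its budget rather than attempting the whole corpus.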

An optional scope filter (domain / entities args) bounds the seed pool; empty = whole corpus. A scoped run is exempt from the synthesis watermark, so you can re-target discovery at one area without disturbing unscoped runs. Discovery never feeds on its own output — origin: discovered facts are excluded as bridge seeds.
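The seed-pool rules above can be sketched as a single filter. This is an illustration under assumptions: facts are modelled as plain dicts, the `seed_pool` name and the non-discovered origin value `"written"` are hypothetical, and combining domain and entities with AND is a guess, since the source does not say how the two args interact.

```python
def seed_pool(facts, domain=None, entities=()):
    """Select bridge seeds: apply the optional scope, never reuse discovered facts."""
    wanted = set(entities)
    pool = []
    for fact in facts:
        if fact["origin"] == "discovered":
            continue  # discovery never feeds on its own output
        if domain is not None and fact["domain"] != domain:
            continue
        if wanted and not (wanted & set(fact["entities"])):
            continue
        pool.append(fact)
    return pool


facts = [
    {"id": "A", "domain": "billing", "entities": ["invoice"], "origin": "written"},
    {"id": "B", "domain": "auth", "entities": ["session"], "origin": "written"},
    {"id": "E", "domain": "billing", "entities": ["invoice"], "origin": "discovered"},
]

whole_corpus = [f["id"] for f in seed_pool(facts)]               # empty scope
billing_only = [f["id"] for f in seed_pool(facts, domain="billing")]
```

With an empty scope the pool is the whole corpus minus fact E, which is excluded because its origin is discovered.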

There is no second adversarial model — the connected MCP agent is the sole reasoner. Quality is enforced by a strict default-skip prompt plus an ingest gate chain:

  • KNOMIT_DISCOVERY_CONFIDENCE_THRESHOLD (default 0.5) — minimum confidence to write a proposal.
  • KNOMIT_DISCOVERY_BLAST_RADIUS_THRESHOLD (default 1, 0 disables) — a backward keystone’s anchor must transitively reach at least this many live dependents (transitive reverse-DERIVED_FROM count, live at HEAD).
  • Embedding dedup against the corpus rejects a proposal already stated elsewhere.
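The gate chain above can be sketched as three checks run in order. This is a minimal sketch, not knomit's code: the function names, the `derived_from` mapping (fact id to the ids it is derived from), the `live` set standing in for "live at HEAD", and the 0.9 dedup similarity cutoff are all assumptions. The two threshold defaults mirror the documented defaults of 0.5 and 1.

```python
import math
from collections import deque


def blast_radius(anchor, derived_from, live):
    """Count live facts that transitively depend on `anchor` (reverse DERIVED_FROM)."""
    dependents = {}
    for child, parents in derived_from.items():
        for parent in parents:
            dependents.setdefault(parent, set()).add(child)
    seen, queue = set(), deque([anchor])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(anchor)  # guard against cycles leading back to the anchor
    return sum(1 for dep in seen if dep in live)


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def passes_gates(confidence, embedding, corpus_embeddings, *, direction,
                 anchor=None, derived_from=None, live=frozenset(),
                 confidence_threshold=0.5, blast_radius_threshold=1,
                 dedup_similarity=0.9):
    """Return True only if a proposal clears every gate."""
    if confidence < confidence_threshold:
        return False
    # The blast-radius gate applies to backward keystones; 0 disables it.
    if direction == "backward" and blast_radius_threshold > 0:
        if blast_radius(anchor, derived_from or {}, live) < blast_radius_threshold:
            return False
    # Assumed dedup rule: reject if any corpus embedding is near-identical.
    if any(cosine(embedding, e) >= dedup_similarity for e in corpus_embeddings):
        return False
    return True


derived_from = {"B": ["A"], "C": ["B"], "D": ["A"]}
live = {"B", "C"}
radius_of_a = blast_radius("A", derived_from, live)
```

In this example A reaches B, C, and D through reverse DERIVED_FROM edges, but only B and C are live, so its blast radius is 2. The sketch traverses through non-live facts and counts only live ones; whether knomit does the same is not stated in the source.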

Bridge behaviour is per-repo configurable via KNOMIT_DISCOVERY_BRIDGE (domain · entity · both, default both).

Discovery is one value of a fact’s origin — the record of how it came to exist, orthogonal to type and kind: