
Observability

knomit ships its own operability tooling, built entirely on the Go standard library, with no sidecars and no agents. It has two layers:

  • Always on, zero-config — structured logging and native crash bundles. They cost nothing to leave running and need no port.
  • Opt-in, off by default — a runtime diagnostics port that exposes pprof, Prometheus metrics, expvar, and live process controls. Enable it only when you need to look inside a running process.

Logging is config-driven ([log] in knomit.toml, or the KNOMIT_LOG_* environment variables; see Configuration). It has two output formats:

| Format | Destination | Use |
|---|---|---|
| console (default) | stderr, human-readable | local / desktop |
| json | stdout, structured | containers & log collectors |
  • Level: KNOMIT_LOG_LEVEL, anywhere from trace to panic (default info). It can also be changed live, without a restart, through the diagnostics port (below).
  • Rotating file sink: set log.file (or --log-file) to add a rotating JSON file alongside the console/stdout sink. Rotation is bounded by max_size_mb (default 10), max_backups (default 3), and max_age_days (default 7). Leave it off in containers, where the log driver owns rotation.
  • Slow-request log: any HTTP or MCP request slower than slow_request_ms (default 1000) is logged at WARN. Set it to 0 to disable.

These are always on and write under KNOMIT_HOME — no port required.

  • Crash bundles: a recovered HTTP/task panic, or a fatal panic on the serve path, writes a JSON bundle to KNOMIT_HOME/crashes/. Each bundle carries the timestamp, component, panic cause, the faulting stack, a full all-goroutine dump, runtime.MemStats, build info (Go version + VCS settings), and the tail of the log ring.
  • Crash-loop marker: KNOMIT_HOME/running.marker is written at startup and removed on clean shutdown. If it is still present at the next boot, the prior run exited uncleanly (possibly a crash), and knomit logs a WARN pointing at crashes/. A panic unwind deliberately leaves the marker in place so a crash loop stays detectable.
  • KNOMIT_CRASH_LOG: redirects fd 2 (stderr) to an append-only file so that Go runtime fatal tracebacks and CGO crashes (ONNX Runtime), which bypass the logger and write straight to fd 2, are persisted. Daemon-only; leave it unset in containers, where the log driver already captures fd 2.
  • Live goroutine dump (unix): kill -USR1 <pid> dumps every goroutine to KNOMIT_HOME/dumps/ without exiting, so a stuck-but-alive server can be inspected in place.

Set KNOMIT_RUNTIME_ADDR (or [runtime] addr) to a local address — e.g. localhost:6060 — to start a second, separate HTTP listener for diagnostics. It is off unless configured, mounted on its own port (never on the public API), and carries zero steady-state cost when disabled.

| Path | Method | Purpose |
|---|---|---|
| /runtime/status | GET | Uptime, goroutines, heap/sys memory, GC count, GOMAXPROCS, CPUs, plus repos, read_only, and the agent branch |
| /runtime/loglevel | GET · POST | Read the global log level, or set it live: POST ?level=debug |
| /runtime/gc | POST | Force a garbage collection |
| /runtime/heapdump | POST | Write a heap profile to KNOMIT_HOME/dumps/heap-<ts>.pprof |
| /runtime/profile/mutex | POST | Set the mutex-profile fraction: ?rate=N (0 disables) |
| /runtime/profile/block | POST | Set the block-profile rate: ?rate=N (0 disables) |
| /debug/pprof/ | GET | Standard net/http/pprof index (+ /cmdline, /profile, /symbol, /trace) |
| /debug/vars | GET | expvar JSON, including the knomit metrics snapshot |
| /metrics | GET | Prometheus text exposition (v0.0.4) |

pprof and expvar are mounted explicitly on this mux, not via the usual http.DefaultServeMux side-effect import — so they exist only here, never on the public API port.

Point the Go toolchain straight at the port:

```shell
# 30-second CPU profile (quoted so shells like zsh don't glob the "?")
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
# live heap
go tool pprof http://localhost:6060/debug/pprof/heap
```

Mutex and block profiles are collected only after you turn them on (they are off by default for zero cost):

```shell
curl -sX POST 'localhost:6060/runtime/profile/mutex?rate=5'
go tool pprof http://localhost:6060/debug/pprof/mutex
```
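The two rate endpoints presumably wrap the runtime's own switches, runtime.SetMutexProfileFraction and runtime.SetBlockProfileRate. This hypothetical sketch (parseRate and profileRateHandler are invented names, and the JSON reply is an assumption) shows the shape: parse ?rate=N, reject anything that is not a non-negative integer, and pass it through, where 0 switches sampling off.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"runtime"
	"strconv"
)

// parseRate (hypothetical) validates the ?rate=N query value.
func parseRate(raw string) (int, error) {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		return 0, errors.New("rate must be a non-negative integer")
	}
	return n, nil
}

// profileRateHandler (hypothetical) applies a POSTed rate through set.
func profileRateHandler(set func(int)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		n, err := parseRate(r.URL.Query().Get("rate"))
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		set(n)
		fmt.Fprintf(w, "{\"rate\":%d}\n", n)
	}
}

func main() {
	mux := http.NewServeMux()
	// Mutex: on average 1 in n contention events is sampled; 0 disables.
	mux.Handle("/runtime/profile/mutex", profileRateHandler(func(n int) {
		runtime.SetMutexProfileFraction(n) // returns the previous fraction; ignored here
	}))
	// Block: aim for one sample per n nanoseconds spent blocked; 0 disables.
	mux.Handle("/runtime/profile/block", profileRateHandler(runtime.SetBlockProfileRate))
	log.Fatal(http.ListenAndServe("localhost:6060", mux))
}
```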
Read the current log level, or change it live without a restart:

```shell
curl -s localhost:6060/runtime/loglevel   # {"level":"info"}
curl -sX POST 'localhost:6060/runtime/loglevel?level=debug'
```
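Live level changes need a level value that every logger consults on each call. In this hypothetical sketch that value is a log/slog LevelVar (slog only knows debug, info, warn, and error, so trace and panic from knomit's range are not modeled); setLevel, currentLevel, and logLevelHandler are invented names.

```go
package main

import (
	"encoding/json"
	"log"
	"log/slog"
	"net/http"
	"os"
	"strings"
)

// level is consulted by the handler on every log call, so Set takes effect
// immediately without a restart. Its zero value is INFO.
var level = new(slog.LevelVar)

// setLevel (hypothetical) parses a case-insensitive name such as "debug".
// On error the current level is left unchanged.
func setLevel(name string) error {
	var l slog.Level
	if err := l.UnmarshalText([]byte(name)); err != nil {
		return err
	}
	level.Set(l)
	return nil
}

// currentLevel reports the level in lowercase, e.g. "info".
func currentLevel() string {
	return strings.ToLower(level.Level().String())
}

// logLevelHandler serves GET (read) and POST ?level=... (set).
func logLevelHandler(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet:
	case http.MethodPost:
		if err := setLevel(r.URL.Query().Get("level")); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]string{"level": currentLevel()})
}

func main() {
	// Every record passes through a handler bound to the shared LevelVar.
	slog.SetDefault(slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{Level: level})))

	mux := http.NewServeMux()
	mux.HandleFunc("/runtime/loglevel", logLevelHandler)
	log.Fatal(http.ListenAndServe("localhost:6060", mux))
}
```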

/metrics renders a process-global registry in Prometheus text format. Metrics are recorded into the registry unconditionally: the numbers accumulate whether or not the port is ever enabled, and the port only exposes them. The same registry is also published as the knomit expvar variable, so /debug/vars carries a JSON view of the identical data.

Always-present runtime gauges:

| Metric | Meaning |
|---|---|
| knomit_goroutines | Live goroutines |
| knomit_mem_alloc_bytes | Heap bytes in use |
| knomit_mem_sys_bytes | Bytes obtained from the OS |
| knomit_gc_total | Completed GC cycles |

Application metrics:

| Metric | Type | Meaning |
|---|---|---|
| knomit_embed_inference_seconds | histogram | ONNX embedding inference latency per batch |
| knomit_cypher_retry_total | counter | GraphQLite cypher() transient-collision retries (read contention) |