Architecture (light)

This page is intentionally scoped: enough detail for a technical reader to evaluate whether Kairos is real engineering rather than a marketing surface, without becoming a recipe for re-implementation. The specific tuning numbers (decay rates, scoring weights, pressure-trigger signals, dream-stage internals) are deliberately omitted: they are tuned continuously and are not part of the public contract.

How it works at a glance

Kairos is a wrapper around your Claude Code installation. It keeps Claude Code working against your repository: reading recent activity, deciding what to pick up next, shipping the change, sleeping briefly, then doing it again. The wrapper is what lets this loop run without human attention: it manages state between turns, paces the request budget, and supplies persistent memory that Claude Code can read and write through MCP. When --pace is set, sleep length adapts to your Anthropic rate-limit headroom.

The memory system

Memory has two tiers with asymmetric decay:

STM (short-term) — fast decay. Captures in-session observations, retries, anomalies. Entries that aren’t reinforced collapse below the archive threshold quickly.
LTM (long-term) — slow decay. Captures consolidated, durable facts: architecture decisions, conventions, distilled failure signatures. Survives long disuse.

The two-tier asymmetric-decay design follows the complementary learning systems idea from cognitive science: a fast hippocampal-like store for current state, a slow neocortical-like store for durable knowledge. Catastrophic interference makes a single-rate store impractical at this scale.
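As a rough illustration of asymmetric decay, the sketch below applies the same exponential decay rule to both tiers with very different half-lives. The half-lives, the archive threshold, and the names Entry, current_weight, and is_archivable are assumptions made for this example; the real rates are not published, and the real decay curve may not be a plain exponential.

```python
from dataclasses import dataclass

# Hypothetical numbers for illustration only; the real rates are not published.
HALF_LIFE_HOURS = {"stm": 6.0, "ltm": 24.0 * 90}
ARCHIVE_THRESHOLD = 0.1


@dataclass
class Entry:
    tier: str                      # "stm" or "ltm"
    weight: float                  # weight at the last reinforcement
    hours_since_reinforced: float


def current_weight(entry: Entry) -> float:
    """Exponential decay with a half-life that depends on the tier."""
    half_life = HALF_LIFE_HOURS[entry.tier]
    return entry.weight * 0.5 ** (entry.hours_since_reinforced / half_life)


def is_archivable(entry: Entry) -> bool:
    """True once an unreinforced entry has fallen below the archive threshold."""
    return current_weight(entry) < ARCHIVE_THRESHOLD
```

With these placeholder values, an STM entry left alone for two days falls well below the threshold, while an LTM entry of the same age has barely moved.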

Hybrid recall

Recall combines four signals via late fusion:

Channel	What it measures
Lexical (BM25-style)	Surface-form overlap between the cue and the entry’s content
Dense (cosine over local embeddings)	Semantic similarity
Entry weight	The entry’s current lifecycle weight (decays over time, rises on citation)
Recency	Exponential decay since the entry was last reinforced

The exact channel weights are tuned continuously. There is no learned reranker; the fusion is late, linear, and intentionally simple.
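Late linear fusion over the four channels can be sketched as a weighted sum of per-channel scores. The equal weights, the recency half-life, the assumption that every channel is already normalized to [0, 1], and the names Candidate, fused_score, and rank are all placeholders for this example, not the production values.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights and half-life; the real values are not published.
CHANNEL_WEIGHTS = {"lexical": 0.25, "dense": 0.25, "weight": 0.25, "recency": 0.25}
RECENCY_HALF_LIFE_HOURS = 24.0


@dataclass
class Candidate:
    entry_id: str
    lexical: float                 # BM25-style score, assumed normalized to [0, 1]
    dense: float                   # cosine similarity, assumed normalized to [0, 1]
    weight: float                  # current lifecycle weight, assumed in [0, 1]
    hours_since_reinforced: float


def recency(hours: float) -> float:
    """Exponential decay since the entry was last reinforced."""
    return 0.5 ** (hours / RECENCY_HALF_LIFE_HOURS)


def fused_score(c: Candidate) -> float:
    """Late linear fusion: each channel is scored independently, then summed."""
    return (
        CHANNEL_WEIGHTS["lexical"] * c.lexical
        + CHANNEL_WEIGHTS["dense"] * c.dense
        + CHANNEL_WEIGHTS["weight"] * c.weight
        + CHANNEL_WEIGHTS["recency"] * recency(c.hours_since_reinforced)
    )


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Best match first."""
    return sorted(candidates, key=fused_score, reverse=True)
```

Because the fusion is a plain weighted sum, each channel's contribution to a ranking can be read off directly, which is much harder with a learned reranker.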

Local embeddings

Embeddings are produced by Qwen3-Embedding served from a local Ollama container. The model tier (0.6B / 4B / 8B) is selected from detected hardware. Vector storage and the lexical index share a single SQLite file via sqlite-vec and FTS5. No memory operation requires network access outside the host.

The dream pass

The dream pass is a periodic offline consolidation. It is triggered by a pressure score that combines several operational signals (work elapsed since last dream, unresolved retries, observation novelty, log volume, plus others). When pressure crosses a threshold, the next turn is a dream. A safety floor guarantees a dream every N turns; an anti-thrash floor prevents back-to-back dreams.
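The trigger logic described above (a pressure threshold bounded by two floors) might look roughly like the following. The signal weights, the threshold, the value of N, and the names Signals, pressure, and should_dream are hypothetical; only a subset of the real signals is modeled.

```python
from dataclasses import dataclass

# Hypothetical tuning values; the real signals and weights are not published.
PRESSURE_THRESHOLD = 1.0
SAFETY_FLOOR_TURNS = 20            # "N": force a dream at least this often
MIN_TURNS_BETWEEN_DREAMS = 1       # anti-thrash: never dream back to back


@dataclass
class Signals:
    turns_since_dream: int         # non-dream turns completed since the last dream
    unresolved_retries: int
    novelty: float                 # observation novelty, assumed in [0, 1]
    log_volume: float              # assumed normalized to [0, 1]


def pressure(s: Signals) -> float:
    """Combine operational signals into one score (illustrative linear mix)."""
    return (
        0.05 * s.turns_since_dream
        + 0.2 * s.unresolved_retries
        + 0.5 * s.novelty
        + 0.3 * s.log_volume
    )


def should_dream(s: Signals) -> bool:
    if s.turns_since_dream < MIN_TURNS_BETWEEN_DREAMS:
        return False               # anti-thrash floor wins over everything else
    if s.turns_since_dream >= SAFETY_FLOOR_TURNS:
        return True                # safety floor guarantees a periodic dream
    return pressure(s) >= PRESSURE_THRESHOLD
```

Checking the anti-thrash floor first means that even extreme pressure cannot produce two consecutive dream turns.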

Pipeline shape

Slow-wave-style curation        — distill failure signatures from recent logs
Candidate sampling              — pick pairs of entries to consider together
Co-activation (LLM judgment)    — does this pair belong together?
Reflection (LLM rewrite)        — what should the merged / linked form look like?
Bucket-write + edge-propose     — stage candidates for the agent's review at wake

Crucially, the dream pass never directly mutates long-term memory. Its output is a set of staged candidates and proposed edges, which the agent triages on the next wake. Consolidations only become durable with an explicit verdict. The two-stage analogy is to sleep-dependent consolidation: a slow-wave-style replay phase that finds the regularities, then a REM-style reflective phase that recombines them. Mileage on the analogy varies; treat it as inspiration rather than a load-bearing claim.
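The stage-then-verdict flow can be sketched as two separate write paths: the dream writes only to a staging bucket, and long-term memory changes only when a verdict is applied at wake. The class and method names here (StagedCandidate, MemoryStore, stage, apply_verdict) and the "accept" / "reject" verdict values are assumptions for illustration, not the real schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class StagedCandidate:
    candidate_id: str
    proposed_text: str                  # output of the reflection step
    source_entry_ids: Tuple[str, ...]   # entries the candidate was derived from


@dataclass
class MemoryStore:
    ltm: Dict[str, str] = field(default_factory=dict)
    bucket: Dict[str, StagedCandidate] = field(default_factory=dict)

    def stage(self, candidate: StagedCandidate) -> None:
        """Dream-side write: touches the bucket only, never long-term memory."""
        self.bucket[candidate.candidate_id] = candidate

    def apply_verdict(self, candidate_id: str, verdict: str) -> None:
        """Wake-side triage: only an explicit accept makes a candidate durable."""
        if verdict not in ("accept", "reject"):
            raise ValueError(f"unknown verdict: {verdict!r}")
        candidate = self.bucket.pop(candidate_id)
        if verdict == "accept":
            self.ltm[candidate_id] = candidate.proposed_text
```

A candidate that never receives a verdict simply stays in the bucket, so a bad dream cannot silently rewrite durable knowledge.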

The agent surface

The reasoning agent does not have privileged direct database access. Every memory operation is exposed as a tool over the Model Context Protocol (MCP):

memory_recall, memory_recall_by_id, memory_recall_with_edges
memory_write_stm, memory_write_ltm, memory_reinforce
memory_promote, memory_demote, memory_decay_sweep
memory_edge_propose, memory_edge_approve, memory_edge_reject
wake_edge_triage, wake_ground_epic_topic
dream_bucket_pending, dream_bucket_verdict, dream_bucket_write
Plus a handful of audit / introspection tools

MCP-as-only-interface makes the memory system observable (every operation is a tool call), sandboxable (the policy layer can filter), and replaceable (a different memory backend could expose the same tool surface).
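The three properties can be illustrated with a small in-process stand-in for the tool surface. This is not the MCP SDK and not the real server: ToolSurface, build_surface, the dict-backed store, and the argument names entry_id and content are hypothetical. Only the tool names are taken from the list above.

```python
from typing import Any, Callable, Dict, List, Set, Tuple


class ToolSurface:
    """Toy dispatcher: every memory operation goes through call()."""

    def __init__(self, allowed: Set[str]) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        self._allowed = set(allowed)
        self.audit_log: List[Tuple[str, Dict[str, Any]]] = []

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        self._tools[name] = handler

    def call(self, name: str, **args: Any) -> Any:
        self.audit_log.append((name, args))      # observable: every call is logged
        if name not in self._allowed:            # sandboxable: policy filters by name
            raise PermissionError(f"tool {name!r} is blocked by policy")
        return self._tools[name](**args)         # replaceable: handler hides the backend


def build_surface(store: Dict[str, str], allowed: Set[str]) -> ToolSurface:
    """Wire two of the listed tools to a plain dict standing in for the backend."""
    surface = ToolSurface(allowed)
    surface.register("memory_recall_by_id", lambda entry_id: store.get(entry_id))
    surface.register(
        "memory_write_ltm",
        lambda entry_id, content: store.__setitem__(entry_id, content),
    )
    return surface
```

Swapping the dict for a different backend would change only the registered handlers; callers still see the same tool names.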

Eval

backant eval executes a fixed simulated-scenario replay against the current memory state alongside the production metrics. The replay is intentionally adversarial (small, fixed, deterministic) and designed to surface regressions in the memory layer that live operation wouldn’t notice. The replay corpus is curated and frozen per release. New scenarios are added when a real production issue would have been caught by one.
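A deterministic replay of this kind can be pictured as a frozen list of cues with expected recall results, checked against whatever recall function is currently in place. The Scenario shape and the replay function are hypothetical; the real corpus format and the output of backant eval are not described here.

```python
from typing import Callable, List, NamedTuple, Sequence


class Scenario(NamedTuple):
    name: str
    cue: str               # query replayed against memory
    expected_id: str       # entry that must appear in the recall result


def replay(
    scenarios: Sequence[Scenario],
    recall: Callable[[str], List[str]],
) -> List[str]:
    """Return the names of failed scenarios, in corpus order.

    No randomness and no clock: the same corpus and the same memory state
    always produce the same result, so any change signals a regression.
    """
    return [s.name for s in scenarios if s.expected_id not in recall(s.cue)]
```

An empty list means every frozen scenario still recalls what it is expected to.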

Context hygiene

Three complementary mechanisms keep the in-process state of the daemon fresh on long-lived deployments:

--fresh flag: manual escape hatch. Hard-resets .session/ and .state/ so the next turn re-reads memory from disk.
Freshness manager: a small meta-agent that periodically inspects recent signals (repeated failures, lesson churn, decay patterns) and decides whether the next turn should start fresh.
Reactive overflow detector: catches the Claude context-window overflow signature in the stream and writes the fresh-flag automatically.

The freshness layer is itself configurable in .backant.toml. Long-lived daemons accumulate stale judgment in their in-process context, and the cheapest correction is “start the next turn as if you’d just booted”.
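The reactive path (detect overflow, write a flag, reset on the next turn) can be sketched with a flag file. The marker string, the flag location, and the function names saw_overflow, request_fresh, and begin_turn are placeholders: the real overflow signature and flag path are not documented here. Only the .session/ and .state/ directory names come from the description above.

```python
import shutil
from pathlib import Path
from typing import Iterable

# Placeholder text; the real overflow signature is not documented here.
OVERFLOW_MARKER = "context window exceeded"


def saw_overflow(stream_lines: Iterable[str], marker: str = OVERFLOW_MARKER) -> bool:
    """Scan streamed output for the overflow signature."""
    return any(marker in line.lower() for line in stream_lines)


def request_fresh(flag: Path) -> None:
    """Write the fresh-flag so the next turn starts clean."""
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()


def begin_turn(root: Path, flag: Path) -> bool:
    """If the flag is set, hard-reset .session/ and .state/ and clear the flag."""
    if not flag.exists():
        return False
    for name in (".session", ".state"):
        shutil.rmtree(root / name, ignore_errors=True)
        (root / name).mkdir(parents=True)
    flag.unlink(missing_ok=True)
    return True
```

In this sketch the manual flag and the freshness manager would call request_fresh as well, so all three mechanisms converge on the same reset path in begin_turn.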
