This page is intentionally scoped: it gives a technical reader enough detail to judge whether Kairos is real engineering rather than a marketing surface, without becoming a recipe for re-implementation. The specific tuning numbers (decay rates, scoring weights, pressure-trigger signals, dream-stage internals) are deliberately omitted. They are tuned continuously and are not part of the public contract.

How it works at a glance

Kairos is a wrapper around your Claude Code installation. It keeps Claude Code working against your repository: reading recent activity, deciding what to pick up next, shipping the change, sleeping briefly, then doing it again. The wrapper is what makes unattended operation possible: it manages state between turns, paces the request budget, and supplies persistent memory that Claude Code can read and write through MCP. When --pace is set, sleep length adapts to your Anthropic rate-limit headroom.
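The loop described above can be sketched roughly as follows. This is a minimal illustration, not the Kairos implementation: `run_turn`, `headroom`, the sleep bounds, and the linear mapping from headroom to sleep length are all assumptions made for the example.

```python
import time
from typing import Callable


def sleep_seconds(headroom: float, min_sleep: float = 5.0, max_sleep: float = 300.0) -> float:
    """Map rate-limit headroom in [0, 1] to a sleep length in seconds.

    Hypothetical rule: more headroom means a shorter sleep, interpolated linearly.
    """
    headroom = max(0.0, min(1.0, headroom))
    return max_sleep - headroom * (max_sleep - min_sleep)


def run_loop(
    run_turn: Callable[[], None],
    headroom: Callable[[], float],
    turns: int,
    sleep: Callable[[float], None] = time.sleep,
) -> list[float]:
    """Run `turns` turns, sleeping between them; returns the sleeps taken."""
    sleeps = []
    for _ in range(turns):
        run_turn()  # stands in for: read activity, pick work, ship the change
        pause = sleep_seconds(headroom())
        sleep(pause)
        sleeps.append(pause)
    return sleeps
```

Passing `sleep` as a parameter keeps the loop testable without real delays.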

The memory system

Memory has two tiers with asymmetric decay:
  • STM (short-term) — fast decay. Captures in-session observations, retries, anomalies. Entries that aren’t reinforced collapse below the archive threshold quickly.
  • LTM (long-term) — slow decay. Captures consolidated, durable facts: architecture decisions, conventions, distilled failure signatures. Survives long disuse.
The two-tier asymmetric-decay design follows the complementary learning systems idea from cognitive science: a fast hippocampal-like store for current state, a slow neocortical-like store for durable knowledge. Catastrophic interference makes a single-rate store impractical at this scale.
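To make the asymmetry concrete, here is a minimal sketch of two tiers decaying at different rates. The exponential form, the half-lives, and the archive threshold are illustrative assumptions; the real decay rates are not published.

```python
# Hypothetical half-lives in hours; chosen only to show the asymmetry.
HALF_LIFE_HOURS = {"stm": 6.0, "ltm": 24.0 * 90}
ARCHIVE_THRESHOLD = 0.1  # hypothetical


def decayed_weight(weight: float, tier: str, hours_since_reinforced: float) -> float:
    """Exponential decay: the weight halves once per half-life of the tier."""
    half_life = HALF_LIFE_HOURS[tier]
    return weight * 0.5 ** (hours_since_reinforced / half_life)


def is_archivable(weight: float, tier: str, hours_since_reinforced: float) -> bool:
    """True once the decayed weight has dropped below the archive threshold."""
    return decayed_weight(weight, tier, hours_since_reinforced) < ARCHIVE_THRESHOLD
```

With these made-up numbers, an unreinforced STM entry falls below the threshold within two days, while an LTM entry of the same starting weight is almost unchanged over the same period.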

Hybrid recall

Recall combines four signals via late fusion:
| Channel | What it measures |
| --- | --- |
| Lexical (BM25-style) | Surface-form overlap between the cue and the entry’s content |
| Dense (cosine over local embeddings) | Semantic similarity |
| Entry weight | The entry’s current lifecycle weight (decays over time, rises on citation) |
| Recency | Exponential decay since the entry was last reinforced |
The exact channel weights are tuned continuously. There is no learned reranker: fusion is late and linear, and intentionally simple.
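Late linear fusion of the four channels can be sketched as a weighted sum. The weights below are placeholders, and the sketch assumes each channel has already been normalized to [0, 1]; neither detail comes from the Kairos source.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    lexical: float  # BM25-style score, assumed normalized to [0, 1]
    dense: float    # cosine similarity, assumed normalized to [0, 1]
    weight: float   # current lifecycle weight of the entry
    recency: float  # decay since last reinforcement


# Hypothetical channel weights; the real ones are not published.
WEIGHTS = Signals(lexical=0.3, dense=0.4, weight=0.2, recency=0.1)


def fused_score(s: Signals, w: Signals = WEIGHTS) -> float:
    """Late linear fusion: one weighted sum over per-channel scores."""
    return (
        w.lexical * s.lexical
        + w.dense * s.dense
        + w.weight * s.weight
        + w.recency * s.recency
    )


def rank(candidates: dict[str, Signals]) -> list[str]:
    """Return entry ids ordered from best to worst fused score."""
    return sorted(candidates, key=lambda k: fused_score(candidates[k]), reverse=True)
```

Because each channel is scored independently and combined only at the end, a channel can be reweighted or dropped without touching the others.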

Local embeddings

Embeddings are produced by Qwen3-Embedding served from a local Ollama container. The model tier (0.6B / 4B / 8B) is selected from detected hardware. Vector storage and the lexical index share a single SQLite file via sqlite-vec and FTS5. No memory operation requires network access beyond the host.
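The single-file layout can be illustrated with Python's standard `sqlite3` module. This sketch uses a real FTS5 table for the lexical side, but substitutes a plain BLOB column and brute-force cosine in Python for sqlite-vec, so that it runs without loading an extension. Table names, column names, and helper functions are hypothetical, and it assumes an SQLite build with FTS5 enabled.

```python
import math
import sqlite3
import struct


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Vector side: BLOB column standing in for a sqlite-vec table.
    conn.execute("CREATE TABLE entries_vec (id INTEGER PRIMARY KEY, embedding BLOB)")
    # Lexical side: FTS5 virtual table in the same database file.
    conn.execute("CREATE VIRTUAL TABLE entries_fts USING fts5(content)")
    return conn


def add_entry(conn: sqlite3.Connection, content: str, embedding: list[float]) -> int:
    blob = struct.pack(f"{len(embedding)}f", *embedding)
    rowid = conn.execute(
        "INSERT INTO entries_vec(embedding) VALUES (?)", (blob,)
    ).lastrowid
    # Reuse the same rowid so both indexes refer to one entry.
    conn.execute(
        "INSERT INTO entries_fts(rowid, content) VALUES (?, ?)", (rowid, content)
    )
    return rowid


def lexical_search(conn: sqlite3.Connection, query: str, k: int = 5) -> list[int]:
    # FTS5's bm25() returns lower values for better matches.
    rows = conn.execute(
        "SELECT rowid FROM entries_fts WHERE entries_fts MATCH ? "
        "ORDER BY bm25(entries_fts) LIMIT ?",
        (query, k),
    )
    return [r[0] for r in rows]


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_search(conn: sqlite3.Connection, query: list[float], k: int = 5) -> list[int]:
    scored = []
    for rowid, blob in conn.execute("SELECT id, embedding FROM entries_vec"):
        vec = struct.unpack(f"{len(blob) // 4}f", blob)
        scored.append((cosine(query, vec), rowid))
    scored.sort(reverse=True)
    return [rowid for _, rowid in scored[:k]]
```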

The dream pass

The dream pass is a periodic offline consolidation. It is triggered by a pressure score that combines several operational signals (work elapsed since the last dream, unresolved retries, observation novelty, log volume, and others). When pressure crosses a threshold, the next turn is a dream. A safety floor guarantees a dream every N turns; an anti-thrash floor prevents back-to-back dreams.
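The interaction between the threshold and the two floors can be sketched as below. The weighted-sum form of the pressure score, the signal names, and every numeric default are assumptions for illustration; the actual signals and thresholds are not published.

```python
from dataclasses import dataclass


def pressure(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Hypothetical pressure score: a weighted sum of operational signals."""
    return sum(weights.get(name, 0.0) * value for name, value in signals.items())


@dataclass
class DreamScheduler:
    threshold: float = 1.0  # hypothetical pressure threshold
    max_gap: int = 50       # safety floor: dream at least every max_gap turns
    min_gap: int = 3        # anti-thrash floor: no dream sooner than min_gap turns
    turns_since_dream: int = 0

    def next_turn_is_dream(self, current_pressure: float) -> bool:
        self.turns_since_dream += 1
        if self.turns_since_dream < self.min_gap:
            dream = False  # too soon after the last dream
        elif self.turns_since_dream >= self.max_gap:
            dream = True   # overdue, regardless of pressure
        else:
            dream = current_pressure >= self.threshold
        if dream:
            self.turns_since_dream = 0
        return dream
```

The anti-thrash check runs first, so sustained high pressure cannot produce consecutive dreams; the safety floor then guarantees a dream even when pressure stays low.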

Pipeline shape

1. Slow-wave-style curation        — distill failure signatures from recent logs
2. Candidate sampling              — pick pairs of entries to consider together
3. Co-activation (LLM judgment)    — does this pair belong together?
4. Reflection (LLM rewrite)        — what should the merged / linked form look like?
5. Bucket-write + edge-propose     — stage candidates for the agent's review at wake
Crucially, the dream pass never directly mutates long-term memory. Its output is a set of staged candidates and proposed edges, which the agent triages on the next wake. Consolidations only become durable with an explicit verdict. The two-stage analogy is to sleep-dependent consolidation: a slow-wave-style replay phase that finds the regularities, then a REM-style reflective phase that recombines them. Mileage on the analogy varies; treat it as inspiration rather than a load-bearing claim.
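The stage-then-verdict contract can be sketched as a small data structure. Class and method names here are hypothetical and do not mirror the Kairos internals; the point is only that staging never touches long-term memory and that approval is the sole path into it.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    id: str
    content: str


@dataclass
class Memory:
    ltm: dict[str, str] = field(default_factory=dict)
    bucket: dict[str, Candidate] = field(default_factory=dict)

    def stage(self, cand: Candidate) -> None:
        """Dream output lands here; long-term memory is not modified."""
        self.bucket[cand.id] = cand

    def pending(self) -> list[Candidate]:
        """Candidates awaiting triage at the next wake."""
        return list(self.bucket.values())

    def verdict(self, cand_id: str, approve: bool) -> None:
        """Only an explicit approval makes a candidate durable."""
        cand = self.bucket.pop(cand_id)
        if approve:
            self.ltm[cand.id] = cand.content
```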

The agent surface

The reasoning agent does not have privileged direct database access. Every memory operation is exposed as a tool over the Model Context Protocol (MCP):
  • memory_recall, memory_recall_by_id, memory_recall_with_edges
  • memory_write_stm, memory_write_ltm, memory_reinforce
  • memory_promote, memory_demote, memory_decay_sweep
  • memory_edge_propose, memory_edge_approve, memory_edge_reject
  • wake_edge_triage, wake_ground_epic_topic
  • dream_bucket_pending, dream_bucket_verdict, dream_bucket_write
  • Plus a handful of audit / introspection tools
MCP-as-only-interface makes the memory system observable (every operation is a tool call), sandboxable (the policy layer can filter), and replaceable (a different memory backend could expose the same tool surface).
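Since MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`, a recall operation might look like the request built below. The tool name `memory_recall` comes from the list above; the argument names `cue` and `limit` are guesses for illustration, since the tool schemas are not documented on this page.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Argument names below are hypothetical.
example = tool_call("memory_recall", {"cue": "flaky integration test", "limit": 5})
```

Because every memory operation has this shape, a policy layer can log or filter operations by inspecting `params.name` alone.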

Eval

backant eval executes a fixed simulated-scenario replay against the current memory state, alongside the production metrics. The replay is intentionally adversarial (small, fixed, deterministic) and designed to surface regressions in the memory layer that live operation wouldn’t notice. The replay corpus is curated and frozen per release. New scenarios are added when a real production issue would have been caught by one.
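One way to picture a deterministic replay is as a frozen list of cue and expected-entry pairs checked against a recall function. The scenario format and the pass criterion (expected entry ranked first) are assumptions for this sketch, not the format `backant eval` actually uses.

```python
from typing import Callable

# Hypothetical scenario shape: (cue, id of the entry expected at rank 1).
Scenario = tuple[str, str]


def replay(scenarios: list[Scenario], recall: Callable[[str], list[str]]) -> list[str]:
    """Return the cues whose expected entry is not ranked first."""
    failures = []
    for cue, expected in scenarios:
        hits = recall(cue)
        if not hits or hits[0] != expected:
            failures.append(cue)
    return failures
```

An empty result means no regression on the frozen corpus; any returned cue points at a scenario that needs investigation.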

Context hygiene

Three complementary mechanisms keep the in-process state of the daemon fresh on long-lived deployments:
  • --fresh flag: manual escape hatch. Hard-resets .session/ and .state/ so the next turn re-reads memory from disk.
  • Freshness manager: a small meta-agent that periodically inspects recent signals (repeated failures, lesson churn, decay patterns) and decides whether the next turn should start fresh.
  • Reactive overflow detector: catches the Claude context-window overflow signature in the stream and writes the fresh-flag automatically.
The freshness layer is itself configurable in .backant.toml. Long-lived daemons accumulate stale judgment in their in-process context, and the cheapest correction is “start the next turn as if you’d just booted”.
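The reactive path can be sketched as a stream watcher that drops a flag file, which the next turn consumes. The marker string and the flag-file mechanism are placeholders: the real overflow signature and the on-disk form of the fresh-flag are not described on this page.

```python
from pathlib import Path
from typing import Iterable

# Hypothetical marker; stands in for the real overflow signature.
OVERFLOW_MARKER = "context window exceeded"


def watch_stream(lines: Iterable[str], flag_path: Path) -> bool:
    """Write the fresh-flag as soon as the overflow marker appears in the stream."""
    for line in lines:
        if OVERFLOW_MARKER in line.lower():
            flag_path.touch()
            return True
    return False


def consume_fresh_flag(flag_path: Path) -> bool:
    """At turn start: if the flag exists, clear it and report a fresh start."""
    if flag_path.exists():
        flag_path.unlink()
        return True
    return False
```

Consuming the flag removes it, so one overflow triggers exactly one fresh start.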