Architecture Overview
Umi’s backend is a dependency-layered crate graph designed so the four structural shifts in production local-AI (§1) land by swapping a leaf crate, never refactoring the tree.
The full working design lives in the backend scratchpad. This page records the confirmed, stable decisions.
The crate graph
Bottom-up; an arrow means “depends on”. No cycles. Each tier compiles on its own.

                        umi-engine        ← composition root (step 9, not yet re-derived)
                       /     |     \
           umi-transport  umi-agent ─────────────┐
                 |         /  |  \               │
                 |  umi-pipeline  umi-ingest     │
                 |         \  |  /               │
                 +────── umi-db ──────+     umi-runtime ── umi-hardware
                        /   |   \              /  |  \
               umi-crypto umi-store umi-security
                        \   |   /       /
                      umi-error    umi-primitives

| Crate | Status | Responsibility |
|---|---|---|
| umi-primitives | ✅ done | Shared vocabulary: Provenance, Capability, Purpose, ModelChoice, ModelSuite, Backend, ProcessEvent. Pure serde. |
| umi-error | ✅ done | Unified Error + Result. |
| umi-crypto | ✅ done | Passphrase-gated at-rest vault: Argon2id + XChaCha20-Poly1305 + zeroize. |
| umi-store | ✅ done | Content-addressed filesystem blob storage. |
| umi-security | ✅ done | Human-in-the-loop consent gate: decide() capability policy + async ConsentGate. The MCP security boundary. |
| umi-silo | ✅ done | silo.toml manifest contract + validation. |
| umi-db | ✅ done | Embedded SurrealDB: per-silo namespace isolation, migrations, durable job/step spine. |
| umi-runtime | ✅ done | InferenceEngine seam: OpenAI-compatible client, ModelRouter, ManagedSidecar (llama.cpp). |
| umi-pipeline | ✅ done | Durable Worker executor: claims and runs jobs from umi-db, survives restart. |
| umi-agent | ✅ done | V3 MCP seam: ToolProvider / ToolRegistry / McpClient + AgentRunner loop. |
| umi-hardware | ⬜ step 4 | Device topology + PlacementPlanner (V2 seam). |
| umi-ingest | ⬜ step 6 | Ingestor, IMAP source, classify. |
| umi-transport | ⬜ step 8 | LAN HTTP/WS server: the frontend API surface. |
| umi-engine | ⬜ step 9 | Composition root: re-exports + wires apps/umi-os/src-tauri. |

Silo backends (standalone, no umi-* deps):
| Crate | Status | Responsibility |
|---|---|---|
| umi-silo-email (packages/silos/email/backend) | ✅ done | Email domain model, MIME parsing, threading, triage heuristics. |
The four future-proofing seams
Each seam is a research-grounded decision that keeps a shifting frontier swappable without refactoring the callers.
V1 — Swappable inference engine (umi-runtime::InferenceEngine). WebGPU, CUDA, Metal, CPU — all are InferenceEngine impls. Today’s managed llama.cpp sidecar and external OpenAI-compatible servers are both impls. See Inference.
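To make the V1 seam concrete, here is a minimal sketch of what a swappable engine trait could look like. Only the names InferenceEngine and ManagedSidecar come from this page; the method signatures, the ExternalServer type, and the summarize helper are illustrative assumptions, and the real trait in umi-runtime is probably async and richer than this.

```rust
// Hypothetical shape of the V1 seam. Method names and error types are
// assumptions for illustration, not the actual umi-runtime API.
pub trait InferenceEngine {
    /// Short engine name, e.g. for logging.
    fn name(&self) -> &str;
    /// Run one completion and return the generated text.
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in for the managed llama.cpp sidecar.
pub struct ManagedSidecar;

/// Stand-in (hypothetical type) for an external OpenAI-compatible server.
pub struct ExternalServer {
    pub base_url: String,
}

impl InferenceEngine for ManagedSidecar {
    fn name(&self) -> &str {
        "managed-sidecar"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        // A real impl would forward the prompt to the spawned process.
        Ok(format!("[sidecar] {}", prompt))
    }
}

impl InferenceEngine for ExternalServer {
    fn name(&self) -> &str {
        "external-server"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        if self.base_url.is_empty() {
            return Err("base_url is empty".to_string());
        }
        // A real impl would issue an HTTP request to base_url.
        Ok(format!("[{}] {}", self.base_url, prompt))
    }
}

/// Callers depend only on the trait, so swapping engines never touches them.
pub fn summarize(engine: &dyn InferenceEngine, text: &str) -> Result<String, String> {
    engine.complete(&format!("Summarize: {}", text))
}
```

Calling summarize(&ManagedSidecar, "hi") and summarize(&ExternalServer { .. }, "hi") exercises the same caller code against two different engines, which is the property the seam is meant to protect.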
V2 — Hardware topology + placement (umi-hardware::{DeviceTopology, PlacementPlanner}). Hardware is a topology of accelerators, not a single VRAM scalar. Placement (which device runs which layer) is a pluggable planner. SingleDevicePlanner ships first; a MaxFlowPlanner drops in later.
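A rough sketch of the V2 seam follows, assuming a planner maps each model layer to a device index. DeviceTopology, PlacementPlanner, and SingleDevicePlanner are named on this page; the Device fields, the Placement alias, the plan signature, and the "most free memory" rule are assumptions made for the example, since umi-hardware is not yet built.

```rust
/// One accelerator (or the CPU) in the machine. Fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Device {
    pub name: String,
    pub free_bytes: u64,
}

/// Hardware as a set of devices rather than a single VRAM number.
#[derive(Debug, Clone, PartialEq)]
pub struct DeviceTopology {
    pub devices: Vec<Device>,
}

/// placement[i] is the index of the device that runs layer i.
pub type Placement = Vec<usize>;

/// The pluggable seam: any strategy that turns a topology into a placement.
pub trait PlacementPlanner {
    fn plan(&self, topo: &DeviceTopology, layer_bytes: &[u64]) -> Option<Placement>;
}

/// Simplest strategy: put every layer on one device that fits the whole model.
pub struct SingleDevicePlanner;

impl PlacementPlanner for SingleDevicePlanner {
    fn plan(&self, topo: &DeviceTopology, layer_bytes: &[u64]) -> Option<Placement> {
        let total: u64 = layer_bytes.iter().sum();
        // Among devices that can hold the full model, pick the roomiest one.
        let (idx, _) = topo
            .devices
            .iter()
            .enumerate()
            .filter(|(_, d)| d.free_bytes >= total)
            .max_by_key(|(_, d)| d.free_bytes)?;
        Some(vec![idx; layer_bytes.len()])
    }
}
```

A later planner that splits layers across devices would implement the same trait and return a placement with mixed indices, leaving callers unchanged.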
V3 — MCP tool registry (umi-agent::{ToolProvider, ToolRegistry}). Tools are a registry of providers — BuiltinTools and McpClient both implement ToolProvider. Every invocation passes through umi-security’s capability + consent gate. No tool bypasses it. See Silos.
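The "no tool bypasses the gate" rule can be sketched as a registry whose only invocation path runs the capability check first. ToolProvider, ToolRegistry, BuiltinTools, Capability, and decide() are names from this page; their signatures, the Capability variants, the Decision enum, and the tool names below are assumptions for illustration, and the interactive consent step (ConsentGate) is left out to keep the sketch synchronous.

```rust
/// Hypothetical capability variants; the real enum lives in umi-primitives.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    ReadNotes,
    WriteNotes,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Deny,
}

/// Stand-in for the policy check: deny anything not explicitly granted.
pub fn decide(granted: &[Capability], needed: Capability) -> Decision {
    if granted.contains(&needed) {
        Decision::Allow
    } else {
        Decision::Deny
    }
}

pub trait ToolProvider {
    /// Capability `tool` requires, or None if this provider does not offer it.
    fn required(&self, tool: &str) -> Option<Capability>;
    fn call(&self, tool: &str, input: &str) -> String;
}

/// Built-in provider with two made-up tools.
pub struct BuiltinTools;

impl ToolProvider for BuiltinTools {
    fn required(&self, tool: &str) -> Option<Capability> {
        match tool {
            "echo" => Some(Capability::ReadNotes),
            "delete_note" => Some(Capability::WriteNotes),
            _ => None,
        }
    }
    fn call(&self, tool: &str, input: &str) -> String {
        format!("{}({})", tool, input)
    }
}

#[derive(Default)]
pub struct ToolRegistry {
    providers: Vec<Box<dyn ToolProvider>>,
}

impl ToolRegistry {
    pub fn register(&mut self, provider: Box<dyn ToolProvider>) {
        self.providers.push(provider);
    }

    /// The single entry point: the policy check always runs before the call.
    pub fn invoke(&self, granted: &[Capability], tool: &str, input: &str) -> Result<String, String> {
        for p in &self.providers {
            if let Some(needed) = p.required(tool) {
                return match decide(granted, needed) {
                    Decision::Allow => Ok(p.call(tool, input)),
                    Decision::Deny => Err(format!("denied: {}", tool)),
                };
            }
        }
        Err(format!("unknown tool: {}", tool))
    }
}
```

Because providers are private to the registry and invoke is the only way to reach them, an MCP-backed provider registered the same way inherits the gate without extra code.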
V4 — Model suites (umi-primitives::ModelSuite). A model selection is a suite (target + optional draft) to enable speculative decoding. ModelSuite is in primitives today; the draft path activates when umi-runtime gains a second engine slot.
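As a sketch of the V4 idea, a suite can be modelled as a required target plus an optional draft. ModelSuite and ModelChoice are names from this page; the field names, constructors, and the can_speculate helper are assumptions, and the serde derives mentioned in the crate table are omitted so the example has no external dependencies.

```rust
/// Hypothetical minimal model reference.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelChoice {
    pub model_id: String,
}

/// A selection is a suite: the target model plus an optional draft model.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelSuite {
    pub target: ModelChoice,
    pub draft: Option<ModelChoice>,
}

impl ModelSuite {
    /// Suite with a target only (today's behaviour).
    pub fn single(target: &str) -> Self {
        ModelSuite {
            target: ModelChoice { model_id: target.to_string() },
            draft: None,
        }
    }

    /// Suite with a draft model for speculative decoding.
    pub fn with_draft(target: &str, draft: &str) -> Self {
        ModelSuite {
            target: ModelChoice { model_id: target.to_string() },
            draft: Some(ModelChoice { model_id: draft.to_string() }),
        }
    }

    /// The draft path needs both a draft model and a second engine slot.
    pub fn can_speculate(&self, engine_slots: usize) -> bool {
        self.draft.is_some() && engine_slots >= 2
    }
}
```

Carrying the optional draft in the type from the start means callers already pass a suite, so enabling the draft path later changes the runtime, not the call sites.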
Foundational decisions
Inference: llama.cpp default, managed sidecar, no Provider enum. The 2026 local-AI stack converged on one OpenAI-compatible wire (Ollama / LM Studio / llama.cpp). The axis that code branches on at runtime is WireApi { OpenAi, Ollama } + base_url, not a vendor brand, so the Provider enum is dropped. Full rationale: Inference.
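A small sketch of branching on the wire protocol rather than on a vendor: WireApi and base_url come from this page, while the Endpoint struct and chat_url method are hypothetical. The two paths used are the standard OpenAI-compatible chat route and Ollama's native chat route.

```rust
/// The wire protocol spoken by the server, independent of who ships it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WireApi {
    OpenAi,
    Ollama,
}

/// Hypothetical pairing of protocol and address.
#[derive(Debug, Clone, PartialEq)]
pub struct Endpoint {
    pub wire: WireApi,
    pub base_url: String,
}

impl Endpoint {
    /// Build the chat URL; the only branch is on the wire protocol.
    pub fn chat_url(&self) -> String {
        // Tolerate a trailing slash in the configured base URL.
        let base = self.base_url.trim_end_matches('/');
        match self.wire {
            WireApi::OpenAi => format!("{}/v1/chat/completions", base),
            WireApi::Ollama => format!("{}/api/chat", base),
        }
    }
}
```

Under this shape, llama.cpp's server, LM Studio, and Ollama's compatibility layer are all the OpenAi variant with different base URLs, which is why a vendor enum adds nothing.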
One durable-execution spine, no graph-flow. The hard part of production agents is durable state management. umi-pipeline’s SurrealDB-backed job/step model is the single spine for both the pipeline and the agent loop. graph-flow is dropped.
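The job/step idea can be illustrated with an in-memory stand-in, assuming each step's completion is recorded individually so a restarted worker resumes where the last one stopped. Everything here (JobStore, the field names, the claim and run signatures) is hypothetical; the real spine is SurrealDB-backed and lives in umi-db and umi-pipeline.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StepState {
    Pending,
    Done,
}

#[derive(Debug, Clone)]
pub struct Job {
    pub id: u64,
    pub claimed_by: Option<String>,
    pub steps: Vec<StepState>,
}

/// In-memory stand-in for the durable job table.
#[derive(Default)]
pub struct JobStore {
    jobs: Vec<Job>,
}

impl JobStore {
    pub fn enqueue(&mut self, id: u64, step_count: usize) {
        self.jobs.push(Job {
            id,
            claimed_by: None,
            steps: vec![StepState::Pending; step_count],
        });
    }

    /// Claim the first unfinished job that no worker currently holds.
    pub fn claim(&mut self, worker: &str) -> Option<u64> {
        let job = self
            .jobs
            .iter_mut()
            .find(|j| j.claimed_by.is_none() && j.steps.contains(&StepState::Pending))?;
        job.claimed_by = Some(worker.to_string());
        Some(job.id)
    }

    /// Run up to `budget` pending steps of a job; returns how many ran.
    pub fn run(&mut self, id: u64, budget: usize) -> usize {
        let job = match self.jobs.iter_mut().find(|j| j.id == id) {
            Some(job) => job,
            None => return 0,
        };
        let mut ran = 0;
        for step in job
            .steps
            .iter_mut()
            .filter(|s| **s == StepState::Pending)
            .take(budget)
        {
            // In a durable store this write would be persisted per step.
            *step = StepState::Done;
            ran += 1;
        }
        ran
    }

    /// Simulate a restart: claims are lost, recorded step state survives.
    pub fn release_all(&mut self) {
        for job in &mut self.jobs {
            job.claimed_by = None;
        }
    }
}
```

For example, a three-step job that runs two steps before release_all is called can be claimed by a second worker, which then runs only the one remaining step. Both a pipeline stage and an agent-loop turn can be modelled as such steps, which is the argument for a single spine.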
umi-core deleted. The v1 monolith was removed at 0ec50d4. The rebuild re-derives each crate from requirements + research — no code is salvaged from v1.