Architecture Overview

Umi’s backend is a dependency-layered crate graph designed so the four structural shifts in production local-AI (§1) land by swapping a leaf crate, never refactoring the tree.

The full working design lives in the backend scratchpad. This page records the confirmed, stable decisions.


Bottom-up; an arrow means “depends on”. No cycles. Each tier compiles on its own.

umi-engine ← composition root (step 9 — not yet re-derived)
/ | \
umi-transport umi-agent ─────────────┐
| / | \ │
| umi-pipeline umi-ingest │
| \ | / │
+────── umi-db ──────+ umi-runtime ── umi-hardware
/ | \ / | \
umi-crypto umi-store umi-security
\ | / /
umi-error umi-primitives
| Crate | Status | Responsibility |
| --- | --- | --- |
| umi-primitives | ✅ done | Shared vocabulary: Provenance, Capability, Purpose, ModelChoice, ModelSuite, Backend, ProcessEvent. Pure serde. |
| umi-error | ✅ done | Unified Error + Result. |
| umi-crypto | ✅ done | Passphrase-gated at-rest vault: Argon2id + XChaCha20-Poly1305 + zeroize. |
| umi-store | ✅ done | Content-addressed filesystem blob storage. |
| umi-security | ✅ done | Human-in-the-loop consent gate: decide() capability policy + async ConsentGate. The MCP security boundary. |
| umi-silo | ✅ done | silo.toml manifest contract + validation. |
| umi-db | ✅ done | Embedded SurrealDB: per-silo namespace isolation, migrations, durable job/step spine. |
| umi-runtime | ✅ done | InferenceEngine seam: OpenAI-compatible client, ModelRouter, ManagedSidecar (llama.cpp). |
| umi-pipeline | ✅ done | Durable Worker executor: claims and runs jobs from umi-db, survives restart. |
| umi-agent | ✅ done | V3 MCP seam: ToolProvider / ToolRegistry / McpClient + AgentRunner loop. |
| umi-hardware | ⬜ step 4 | Device topology + PlacementPlanner (V2 seam). |
| umi-ingest | ⬜ step 6 | Ingestor, IMAP source, classify. |
| umi-transport | ⬜ step 8 | LAN HTTP/WS server: the frontend API surface. |
| umi-engine | ⬜ step 9 | Composition root: re-exports + wires apps/umi-os/src-tauri. |

Silo backends (standalone, no umi-* deps):

| Crate | Status | Responsibility |
| --- | --- | --- |
| umi-silo-email (packages/silos/email/backend) | ✅ done | Email domain model, MIME parsing, threading, triage heuristics. |

Each seam is a research-grounded decision that keeps a shifting frontier swappable without refactoring the callers.

V1 — Swappable inference engine (umi-runtime::InferenceEngine). WebGPU, CUDA, Metal, CPU — all are InferenceEngine impls. Today’s managed llama.cpp sidecar and external OpenAI-compatible servers are both impls. See Inference.
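As an illustration of the V1 seam, the sketch below shows two engines behind one trait and a caller that only sees the trait. The method names (`name`, `complete`), the synchronous signature, the `String` error type, the `OpenAiCompatible` struct, and the `summarize` helper are all assumptions made for this example; the real `umi-runtime::InferenceEngine` is probably async and may look quite different.

```rust
/// Hypothetical shape of the seam; not the real umi-runtime trait.
pub trait InferenceEngine {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in for the managed llama.cpp sidecar (no process is spawned here).
pub struct ManagedSidecar;

impl InferenceEngine for ManagedSidecar {
    fn name(&self) -> &str {
        "llama.cpp-sidecar"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        // A real impl would talk to the sidecar; this one just tags the prompt.
        Ok(format!("[sidecar] {}", prompt))
    }
}

/// Stand-in for an external OpenAI-compatible server (no HTTP is performed).
pub struct OpenAiCompatible {
    pub base_url: String,
}

impl InferenceEngine for OpenAiCompatible {
    fn name(&self) -> &str {
        "openai-compatible"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[{}] {}", self.base_url, prompt))
    }
}

/// Callers depend on the trait object only, so swapping the engine
/// (sidecar, remote server, a future WebGPU backend) does not touch them.
pub fn summarize(engine: &dyn InferenceEngine, text: &str) -> Result<String, String> {
    engine.complete(&format!("Summarize: {}", text))
}
```

Under these assumptions, adding a new backend means writing one more `impl InferenceEngine` in a leaf crate; `summarize` and everything above it stay unchanged.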

V2 — Hardware topology + placement (umi-hardware::{DeviceTopology, PlacementPlanner}). Hardware is a topology of accelerators, not a single VRAM scalar. Placement (which device runs which layer) is a pluggable planner. SingleDevicePlanner ships first; a MaxFlowPlanner drops in later.
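A minimal sketch of what "topology plus pluggable planner" could look like. Only `DeviceTopology`, `PlacementPlanner`, and `SingleDevicePlanner` are names from this page; the `Device` fields, the `Placement` alias, the `plan` signature, and the "pick the largest device that fits the whole model" rule are assumptions for illustration, since umi-hardware is not yet built.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceKind {
    Cpu,
    Gpu,
}

/// One accelerator in the topology (hypothetical fields).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Device {
    pub id: usize,
    pub kind: DeviceKind,
    pub memory_bytes: u64,
}

/// Hardware is a set of devices, not a single VRAM number.
pub struct DeviceTopology {
    pub devices: Vec<Device>,
}

/// Index = layer, value = id of the device that runs it.
pub type Placement = Vec<usize>;

pub trait PlacementPlanner {
    fn plan(&self, topo: &DeviceTopology, layers: usize, bytes_per_layer: u64)
        -> Option<Placement>;
}

/// Puts every layer on the single largest device that can hold the whole model.
pub struct SingleDevicePlanner;

impl PlacementPlanner for SingleDevicePlanner {
    fn plan(&self, topo: &DeviceTopology, layers: usize, bytes_per_layer: u64)
        -> Option<Placement> {
        // checked_mul guards against overflow on absurd inputs.
        let needed = bytes_per_layer.checked_mul(layers as u64)?;
        topo.devices
            .iter()
            .filter(|d| d.memory_bytes >= needed)
            .max_by_key(|d| d.memory_bytes)
            .map(|d| vec![d.id; layers])
    }
}
```

A later planner that splits layers across devices would implement the same trait and return a `Placement` with mixed device ids; callers that consume the `Placement` would not need to change.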

V3 — MCP tool registry (umi-agent::{ToolProvider, ToolRegistry}). Tools are a registry of providers — BuiltinTools and McpClient both implement ToolProvider. Every invocation passes through umi-security’s capability + consent gate. No tool bypasses it. See Silos.
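The "no tool bypasses the gate" rule can be sketched by making the registry's `invoke` the only public path to a provider. `ToolProvider`, `ToolRegistry`, and `BuiltinTools` are names from this page; the `Gate` trait, `Decision` enum, `AllowList` policy, the `echo` tool, and all method signatures are hypothetical stand-ins. The real consent gate in umi-security is async and presumably richer than an allow-list.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

/// Stand-in for the capability + consent check in umi-security.
pub trait Gate {
    fn decide(&self, tool: &str) -> Decision;
}

/// Hypothetical policy: allow only the listed tool names.
pub struct AllowList(pub Vec<&'static str>);

impl Gate for AllowList {
    fn decide(&self, tool: &str) -> Decision {
        if self.0.iter().any(|t| *t == tool) {
            Decision::Allow
        } else {
            Decision::Deny
        }
    }
}

pub trait ToolProvider {
    fn tools(&self) -> Vec<String>;
    fn call(&self, tool: &str, input: &str) -> Result<String, String>;
}

/// Built-in provider with a single illustrative tool.
pub struct BuiltinTools;

impl ToolProvider for BuiltinTools {
    fn tools(&self) -> Vec<String> {
        vec!["echo".to_string()]
    }
    fn call(&self, tool: &str, input: &str) -> Result<String, String> {
        match tool {
            "echo" => Ok(input.to_string()),
            other => Err(format!("unknown tool: {}", other)),
        }
    }
}

pub struct ToolRegistry<G: Gate> {
    gate: G,
    providers: Vec<Box<dyn ToolProvider>>, // private: no direct access to providers
}

impl<G: Gate> ToolRegistry<G> {
    pub fn new(gate: G) -> Self {
        Self { gate, providers: Vec::new() }
    }

    pub fn register(&mut self, provider: Box<dyn ToolProvider>) {
        self.providers.push(provider);
    }

    /// Gate first, then dispatch to the first provider that offers the tool.
    pub fn invoke(&self, tool: &str, input: &str) -> Result<String, String> {
        if self.gate.decide(tool) == Decision::Deny {
            return Err(format!("denied: {}", tool));
        }
        for p in &self.providers {
            if p.tools().iter().any(|t| t == tool) {
                return p.call(tool, input);
            }
        }
        Err(format!("unknown tool: {}", tool))
    }
}
```

In this sketch an MCP-backed provider would be just another `Box<dyn ToolProvider>` passed to `register`, and it would inherit the same gate check without any extra wiring.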

V4 — Model suites (umi-primitives::ModelSuite). A model selection is a suite (target + optional draft) to enable speculative decoding. ModelSuite is in primitives today; the draft path activates when umi-runtime gains a second engine slot.
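A rough sketch of the suite shape described above. The type names `ModelSuite` and `ModelChoice` come from this page, but the fields, the constructor helpers, and the model identifiers in the usage note are assumptions. The real types derive serde traits, which are left out here so the example has no external dependencies.

```rust
/// Hypothetical contents; the real ModelChoice likely carries more than an id.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelChoice {
    pub model_id: String,
}

/// A selection is a target model plus an optional draft model.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelSuite {
    pub target: ModelChoice,
    pub draft: Option<ModelChoice>,
}

impl ModelSuite {
    /// Suite with a target only: plain decoding.
    pub fn single(target: &str) -> Self {
        Self {
            target: ModelChoice { model_id: target.to_string() },
            draft: None,
        }
    }

    /// Attach a draft model, which is what makes speculative decoding possible.
    pub fn with_draft(mut self, draft: &str) -> Self {
        self.draft = Some(ModelChoice { model_id: draft.to_string() });
        self
    }

    /// True when a runtime with a second engine slot could use the draft path.
    pub fn is_speculative(&self) -> bool {
        self.draft.is_some()
    }
}
```

For example, `ModelSuite::single("big-model").with_draft("small-model")` would describe a speculative pair, while a runtime with only one engine slot can ignore `draft` and still serve the `target`.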


Inference: llama.cpp default, managed sidecar, no Provider enum. The 2026 local-AI stack converged on one OpenAI-compatible wire format (Ollama / LM Studio / llama.cpp). The only axis the code branches on at runtime is the wire protocol and endpoint, WireApi { OpenAi, Ollama } + base_url, not a vendor brand, so the Provider enum is dropped. Full rationale: Inference.
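To make the "branch on wire, not vendor" point concrete, here is a small sketch. `WireApi` and `base_url` are from this page; the `Endpoint` struct and `chat_url` helper are hypothetical. The paths used are the OpenAI-style `/v1/chat/completions` route and Ollama's native `/api/chat` route; whether umi-runtime stores `base_url` with or without the `/v1` prefix is an assumption of this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WireApi {
    OpenAi,
    Ollama,
}

/// Hypothetical pairing of wire protocol and server address.
pub struct Endpoint {
    pub wire: WireApi,
    pub base_url: String,
}

impl Endpoint {
    /// The only place that branches: on the wire protocol, never on a vendor name.
    pub fn chat_url(&self) -> String {
        // Tolerate a trailing slash in the configured base URL.
        let base = self.base_url.trim_end_matches('/');
        match self.wire {
            WireApi::OpenAi => format!("{}/v1/chat/completions", base),
            WireApi::Ollama => format!("{}/api/chat", base),
        }
    }
}
```

With this shape, a llama.cpp server, LM Studio, or any other OpenAI-compatible server differs only in `base_url`, which is why a per-vendor enum adds nothing.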

One durable-execution spine, no graph-flow. The hard part of production agents is durable state management. umi-pipeline’s SurrealDB-backed job/step model is the single spine for both the pipeline and the agent loop. graph-flow is dropped.
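The idea of a job/step spine that survives restart can be sketched with an in-memory stand-in for the database. Everything here (`JobStore`, `StepState`, the `budget` parameter used to simulate a crash, the step names) is hypothetical; the real spine lives in SurrealDB via umi-db and its schema is not shown on this page.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StepState {
    Pending,
    Done(String), // persisted step output
}

#[derive(Debug, Clone)]
pub struct Job {
    pub steps: Vec<(String, StepState)>,
}

/// In-memory stand-in for the durable job/step tables.
#[derive(Default)]
pub struct JobStore {
    jobs: BTreeMap<u64, Job>,
}

impl JobStore {
    pub fn enqueue(&mut self, id: u64, step_names: &[&str]) {
        let steps = step_names
            .iter()
            .map(|n| (n.to_string(), StepState::Pending))
            .collect();
        self.jobs.insert(id, Job { steps });
    }

    /// Runs at most `budget` pending steps, recording each result before moving on.
    /// Steps already marked Done are skipped, so a re-run resumes instead of repeating.
    /// Returns how many steps were executed in this call.
    pub fn run(&mut self, id: u64, budget: usize, exec: &mut dyn FnMut(&str) -> String) -> usize {
        let job = match self.jobs.get_mut(&id) {
            Some(j) => j,
            None => return 0,
        };
        let mut ran = 0;
        for (name, state) in job.steps.iter_mut() {
            if ran == budget {
                break; // simulates the worker stopping mid-job
            }
            if *state == StepState::Pending {
                *state = StepState::Done(exec(name.as_str()));
                ran += 1;
            }
        }
        ran
    }

    pub fn is_complete(&self, id: u64) -> bool {
        self.jobs
            .get(&id)
            .map_or(false, |j| j.steps.iter().all(|(_, s)| *s != StepState::Pending))
    }
}
```

Because progress is recorded per step, the same loop could in principle drive both a pipeline job and an agent run: each tool call or model call would be one step, and a restart would pick up at the first pending one.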

umi-core deleted. The v1 monolith was removed at 0ec50d4. The rebuild re-derives each crate from requirements + research — no code is salvaged from v1.