Architecture Overview

Umi’s backend is a dependency-layered crate graph, designed so that each of the four structural shifts in production local-AI (§1) can be absorbed by swapping a leaf crate rather than refactoring the tree.

The full working design lives in the backend scratchpad. This page records the confirmed, stable decisions.

The crate graph

Bottom-up; an arrow means “depends on”. No cycles. Each tier compiles on its own.

                         umi-engine  ← composition root (step 9 — not yet re-derived)
                        /     |     \
              umi-transport  umi-agent ─────────────┐
                    |       /   |   \               │
                    |  umi-pipeline umi-ingest      │
                    |       \   |   /               │
                    +────── umi-db ──────+    umi-runtime ── umi-hardware
                            /  |  \            /   |   \
                  umi-crypto umi-store umi-security
                            \  |  /  /
                          umi-error  umi-primitives

Crate	Status	Responsibility
`umi-primitives`	✅ done	Shared vocabulary: `Provenance`, `Capability`, `Purpose`, `ModelChoice`, `ModelSuite`, `Backend`, `ProcessEvent`. Pure serde.
`umi-error`	✅ done	Unified `Error` + `Result`.
`umi-crypto`	✅ done	Passphrase-gated at-rest vault — Argon2id + XChaCha20-Poly1305 + zeroize.
`umi-store`	✅ done	Content-addressed filesystem blob storage.
`umi-security`	✅ done	Human-in-the-loop consent gate — `decide()` capability policy + async `ConsentGate`. The MCP security boundary.
`umi-silo`	✅ done	`silo.toml` manifest contract + validation.
`umi-db`	✅ done	Embedded SurrealDB — per-silo namespace isolation, migrations, durable job/step spine.
`umi-runtime`	✅ done	`InferenceEngine` seam — OpenAI-compatible client, `ModelRouter`, `ManagedSidecar` (llama.cpp).
`umi-pipeline`	✅ done	Durable `Worker` executor — claims and runs jobs from `umi-db`, survives restart.
`umi-agent`	✅ done	V3 MCP seam — `ToolProvider` / `ToolRegistry` / `McpClient` + `AgentRunner` loop.
`umi-hardware`	⬜ step 4	Device topology + `PlacementPlanner` (V2 seam).
`umi-ingest`	⬜ step 6	`Ingestor`, IMAP source, classify.
`umi-transport`	⬜ step 8	LAN HTTP/WS server — the frontend API surface.
`umi-engine`	⬜ step 9	Composition root — re-exports + wires `apps/umi-os/src-tauri`.

Silo backends (standalone, no umi-* deps):

Crate	Status	Responsibility
`umi-silo-email` (`packages/silos/email/backend`)	✅ done	Email domain model, MIME parsing, threading, triage heuristics.

The four future-proofing seams

Each seam is a research-grounded decision that keeps a shifting frontier swappable without refactoring the callers.

V1 — Swappable inference engine (umi-runtime::InferenceEngine). WebGPU, CUDA, Metal, CPU — all are InferenceEngine impls. Today’s managed llama.cpp sidecar and external OpenAI-compatible servers are both impls. See Inference.
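As a rough illustration of the V1 seam, the sketch below shows callers written once against a trait while two backends plug in behind it. Only the names `InferenceEngine` and `ManagedSidecar` come from this page; `RemoteServer`, `EngineError`, `summarize`, and every method signature are assumptions made for the example, and the bodies only echo the prompt instead of running a model.

```rust
// Hypothetical sketch: signatures are illustrative, not the actual umi-runtime API.

#[derive(Debug, PartialEq)]
pub struct EngineError(pub String);

/// The seam: callers depend on this trait, never on a concrete backend.
pub trait InferenceEngine {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, EngineError>;
}

/// Stand-in for the managed llama.cpp sidecar.
pub struct ManagedSidecar;

/// Stand-in for an external OpenAI-compatible server (hypothetical type name).
pub struct RemoteServer {
    pub base_url: String,
}

impl InferenceEngine for ManagedSidecar {
    fn name(&self) -> &str {
        "managed-sidecar"
    }
    fn complete(&self, prompt: &str) -> Result<String, EngineError> {
        // A real impl would talk to the sidecar process; this one echoes.
        Ok(format!("[{}] {}", self.name(), prompt))
    }
}

impl InferenceEngine for RemoteServer {
    fn name(&self) -> &str {
        "remote-server"
    }
    fn complete(&self, prompt: &str) -> Result<String, EngineError> {
        if self.base_url.is_empty() {
            return Err(EngineError("no base_url configured".to_string()));
        }
        // A real impl would issue an HTTP request; this one echoes.
        Ok(format!("[{} @ {}] {}", self.name(), self.base_url, prompt))
    }
}

/// Caller code sees only the trait object, so swapping engines needs no changes here.
pub fn summarize(engine: &dyn InferenceEngine, text: &str) -> Result<String, EngineError> {
    engine.complete(&format!("Summarize: {text}"))
}
```

Under these assumptions, adding a WebGPU or Metal backend would mean one more `impl InferenceEngine` block, with `summarize` and its callers left untouched.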

V2 — Hardware topology + placement (umi-hardware::{DeviceTopology, PlacementPlanner}). Hardware is a topology of accelerators, not a single VRAM scalar. Placement (which device runs which layer) is a pluggable planner. SingleDevicePlanner ships first; a MaxFlowPlanner drops in later.
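A minimal sketch of what a pluggable planner could look like, since `umi-hardware` is not yet built. The type names `DeviceTopology`, `PlacementPlanner`, and `SingleDevicePlanner` come from this page; the fields, the `plan` signature, the per-layer memory estimate, and the "first device that fits" rule are all assumptions chosen to keep the example small.

```rust
// Hypothetical sketch: fields and signatures are illustrative only.

#[derive(Debug, Clone, PartialEq)]
pub struct Device {
    pub id: usize,
    pub name: String,
    pub free_mem_mib: u64,
}

/// A topology is a list of accelerators, assumed here to be in preference order.
pub struct DeviceTopology {
    pub devices: Vec<Device>,
}

/// `layer_to_device[i]` is the id of the device that runs layer `i`.
#[derive(Debug, PartialEq)]
pub struct Placement {
    pub layer_to_device: Vec<usize>,
}

pub trait PlacementPlanner {
    fn plan(&self, topo: &DeviceTopology, layers: usize, mib_per_layer: u64) -> Option<Placement>;
}

/// Puts every layer on the first device that can hold the whole model.
pub struct SingleDevicePlanner;

impl PlacementPlanner for SingleDevicePlanner {
    fn plan(&self, topo: &DeviceTopology, layers: usize, mib_per_layer: u64) -> Option<Placement> {
        // checked_mul guards against overflow on absurd inputs.
        let needed = mib_per_layer.checked_mul(layers as u64)?;
        let device = topo.devices.iter().find(|d| d.free_mem_mib >= needed)?;
        Some(Placement {
            layer_to_device: vec![device.id; layers],
        })
    }
}
```

A later planner that splits layers across devices would implement the same trait and return a `Placement` with mixed device ids, so callers that consume `Placement` would not need to change.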

V3 — MCP tool registry (umi-agent::{ToolProvider, ToolRegistry}). Tools are a registry of providers — BuiltinTools and McpClient both implement ToolProvider. Every invocation passes through umi-security’s capability + consent gate. No tool bypasses it. See Silos.
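The "no tool bypasses the gate" rule can be sketched by making the registry the only place a tool is called from. The names `ToolProvider`, `ToolRegistry`, `BuiltinTools`, and `Capability` appear on this page; the `Gate` trait, `ReadOnlyGate`, the capability variants, the tool names, and all signatures are assumptions. The sketch is synchronous for brevity, whereas the page describes an async `ConsentGate`.

```rust
// Hypothetical sketch: a synchronous stand-in for the capability + consent check.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Capability {
    ReadMail, // illustrative variants, not the real umi-primitives set
    SendMail,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Deny,
}

pub trait Gate {
    fn decide(&self, cap: Capability) -> Decision;
}

pub trait ToolProvider {
    /// Each tool advertises the capability it requires.
    fn tools(&self) -> Vec<(&'static str, Capability)>;
    fn call(&self, tool: &str, input: &str) -> Result<String, String>;
}

pub struct ToolRegistry<G: Gate> {
    gate: G,
    providers: Vec<Box<dyn ToolProvider>>,
}

impl<G: Gate> ToolRegistry<G> {
    pub fn new(gate: G) -> Self {
        Self { gate, providers: Vec::new() }
    }

    pub fn register(&mut self, provider: Box<dyn ToolProvider>) {
        self.providers.push(provider);
    }

    /// The single invocation path: the gate is consulted before any provider runs.
    pub fn invoke(&self, tool: &str, input: &str) -> Result<String, String> {
        for p in &self.providers {
            if let Some((_, cap)) = p.tools().into_iter().find(|(name, _)| *name == tool) {
                if self.gate.decide(cap) != Decision::Allow {
                    return Err(format!("denied: {tool}"));
                }
                return p.call(tool, input);
            }
        }
        Err(format!("unknown tool: {tool}"))
    }
}

pub struct BuiltinTools;

impl ToolProvider for BuiltinTools {
    fn tools(&self) -> Vec<(&'static str, Capability)> {
        vec![("mail.read", Capability::ReadMail), ("mail.send", Capability::SendMail)]
    }
    fn call(&self, tool: &str, input: &str) -> Result<String, String> {
        Ok(format!("{tool}({input})"))
    }
}

/// Example policy: reads are allowed, sends are denied.
pub struct ReadOnlyGate;

impl Gate for ReadOnlyGate {
    fn decide(&self, cap: Capability) -> Decision {
        match cap {
            Capability::ReadMail => Decision::Allow,
            Capability::SendMail => Decision::Deny,
        }
    }
}
```

Because the provider list is private and `invoke` is the only method that reaches `call`, an MCP-backed provider registered the same way would inherit the gate without any extra wiring.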

V4 — Model suites (umi-primitives::ModelSuite). A model selection is a suite (target + optional draft) to enable speculative decoding. ModelSuite is in primitives today; the draft path activates when umi-runtime gains a second engine slot.
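To make the "target + optional draft" shape concrete, here is a small sketch. `ModelSuite` and `ModelChoice` are names from this page; their fields, the constructors, `models_to_load`, and the model names used in testing are assumptions. The real types are described as pure serde, which is omitted here so the example needs no external crates.

```rust
// Hypothetical sketch: field names and helpers are illustrative only.

#[derive(Debug, Clone, PartialEq)]
pub struct ModelChoice {
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ModelSuite {
    pub target: ModelChoice,
    /// Present only when a small draft model is paired for speculative decoding.
    pub draft: Option<ModelChoice>,
}

impl ModelSuite {
    pub fn single(target: &str) -> Self {
        Self {
            target: ModelChoice { name: target.to_string() },
            draft: None,
        }
    }

    pub fn with_draft(target: &str, draft: &str) -> Self {
        Self {
            target: ModelChoice { name: target.to_string() },
            draft: Some(ModelChoice { name: draft.to_string() }),
        }
    }

    /// The target is always loaded; the draft is loaded only if a second slot exists.
    pub fn models_to_load(&self, engine_slots: usize) -> Vec<&ModelChoice> {
        let mut out = vec![&self.target];
        if engine_slots >= 2 {
            if let Some(draft) = &self.draft {
                out.push(draft);
            }
        }
        out
    }
}
```

Carrying the draft as an `Option` from the start means existing single-model configurations stay valid once the second engine slot arrives.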

Foundational decisions

Inference: llama.cpp default, managed sidecar, no Provider enum. The 2026 local-AI stack converged on one OpenAI-compatible wire (Ollama / LM Studio / llama.cpp). The axis the code branches on at runtime is therefore the wire protocol plus the endpoint, WireApi { OpenAi, Ollama } + base_url, not a vendor brand, so the Provider enum is dropped. Full rationale: Inference.
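A sketch of branching on the wire rather than the vendor. `WireApi { OpenAi, Ollama }` and `base_url` come from this page; the `Endpoint` struct and `chat_url` helper are assumptions. The two paths are the chat routes these wire formats commonly expose (`/v1/chat/completions` for OpenAI-compatible servers, `/api/chat` for Ollama's native API).

```rust
// Hypothetical sketch: `Endpoint` and `chat_url` are illustrative names.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WireApi {
    OpenAi,
    Ollama,
}

pub struct Endpoint {
    pub wire: WireApi,
    pub base_url: String,
}

impl Endpoint {
    /// Builds the chat URL from the wire protocol alone; no vendor name is consulted.
    pub fn chat_url(&self) -> String {
        // Tolerate a trailing slash in the configured base URL.
        let base = self.base_url.trim_end_matches('/');
        match self.wire {
            WireApi::OpenAi => format!("{base}/v1/chat/completions"),
            WireApi::Ollama => format!("{base}/api/chat"),
        }
    }
}
```

Under this shape, LM Studio, llama.cpp, or any other OpenAI-compatible server is just a different `base_url` with `WireApi::OpenAi`, which is why a per-vendor enum adds nothing.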

One durable-execution spine, no graph-flow. The hard part of production agents is durable state management. umi-pipeline’s SurrealDB-backed job/step model is the single spine for both the pipeline and the agent loop. graph-flow is dropped.
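The core idea of a durable job/step spine can be sketched as a cursor that is advanced after each step, so a restarted worker resumes where the last one stopped. Everything below is an assumption for illustration: the `Job` fields, the `run` function, and the step names. The real spine persists to SurrealDB through `umi-db`, whereas this sketch keeps the cursor in memory and simulates a crash with a step budget.

```rust
// Hypothetical sketch: an in-memory stand-in for the persisted job/step record.

pub struct Job {
    pub steps: Vec<String>,
    /// Number of completed steps; in a real spine this is written durably after each step.
    pub done: usize,
}

/// Runs at most `max_steps` pending steps in order, appending each step name to `log`.
/// Stopping early models a crash; calling `run` again models a restart.
pub fn run(job: &mut Job, max_steps: usize, log: &mut Vec<String>) {
    let mut ran = 0;
    while ran < max_steps && job.done < job.steps.len() {
        let step = job.steps[job.done].clone();
        log.push(step); // the step's side effect
        job.done += 1; // the checkpoint that makes the step non-repeating
        ran += 1;
    }
}
```

Because both the pipeline and the agent loop can be expressed as ordered steps over such a record, a single executor can serve both. The sketch does not address what happens when a crash lands between a step's side effect and its checkpoint.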

umi-core deleted. The v1 monolith was removed at 0ec50d4. The rebuild re-derives each crate from requirements + research — no code is salvaged from v1.