# AI Second Brain — System Architecture

> Version: 1.0.0
> Date: 2026-03-05
> Status: Design Document

---

## Table of Contents

1. [Overview](#overview)
2. [Core Components](#core-components)
3. [Data Flow](#data-flow)
4. [Database Schema](#database-schema)
5. [API Design](#api-design)
6. [Agent Architecture](#agent-architecture)
7. [Ingestion Pipeline](#ingestion-pipeline)
8. [Infrastructure](#infrastructure)
9. [Design Principles](#design-principles)

---

## Overview

The AI Second Brain is a fully self-hosted, offline-capable knowledge management system that treats a Markdown vault (Obsidian/Logseq compatible) as the single source of truth. All AI capabilities—embeddings, retrieval, generation, and autonomous agents—run locally.

```
┌─────────────────────────────────────────────────────────────────────┐
│                           AI SECOND BRAIN                           │
│                                                                     │
│  ┌──────────┐    ┌────────────┐    ┌──────────┐    ┌────────────┐   │
│  │  EDITOR  │───▶│ INGESTION  │───▶│ STORAGE  │───▶│    API     │   │
│  │  LAYER   │    │  PIPELINE  │    │  LAYER   │    │   LAYER    │   │
│  └──────────┘    └────────────┘    └──────────┘    └────────────┘   │
│       │                                                 │           │
│  Markdown Vault                                    ┌────▼───────┐   │
│  (Obsidian/Logseq)                                 │  AI LAYER  │   │
│                                                    │  (Ollama)  │   │
│                                                    └────────────┘   │
│                                                         │           │
│                                                    ┌────▼───────┐   │
│                                                    │ INTERFACE  │   │
│                                                    │   LAYER    │   │
│                                                    └────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
```

---

## Core Components

### 1. Editor Layer

- **Vault directory**: `./vault/` — plain Markdown files, fully compatible with Obsidian and Logseq
- **Format**: CommonMark + YAML frontmatter + `[[WikiLinks]]`
- **Source of truth**: All knowledge lives here; the database is a derived index
- **Sync**: File-system watching via `watchdog` triggers the ingestion pipeline

### 2. Storage Layer

- **PostgreSQL 16** with **pgvector** extension
- Stores: document metadata, text chunks, embeddings (768-dim or 384-dim), extracted entities, wikilink relations
- Vector index: IVFFlat or HNSW for ANN search

### 3.
Processing Layer (Ingestion Pipeline) - File watcher monitors `./vault/**/*.md` - Parser: frontmatter extraction (YAML), Markdown-to-text, WikiLink graph extraction - Chunker: 500–800 token sliding window with 10% overlap - Embeddings: Ollama (`nomic-embed-text`) or `sentence-transformers` (offline fallback) - Idempotent: SHA-256 content hashing prevents redundant re-indexing ### 4. API Layer - **FastAPI** service exposing REST endpoints - Retrieval: hybrid search (vector similarity + full-text BM25-style) - Reranking: optional cross-encoder via `sentence-transformers` - Async throughout; connection pooling with `asyncpg` ### 5. AI Layer - **Ollama** sidecar providing local LLM inference (Mistral, Llama 3, Phi-3, etc.) - Embedding model: `nomic-embed-text` (768-dim) - Chat/generation model: configurable (default: `mistral`) - Agents use LangChain/LlamaIndex or direct Ollama API calls ### 6. Agent Layer - Long-running Python workers - Agents: Ingestion, Knowledge Linking, Tagging, Summarization, Maintenance - Message queue: Redis-backed job queue (ARQ) or simple PostgreSQL-backed queue - Scheduled via cron-style configuration ### 7. 
Interface Layer - **Next.js** (React) web application - Pages: Search, Chat, Document Viewer, Graph View (knowledge graph), Tag Browser - API client calls the FastAPI backend - Served as a Docker container (Node.js) --- ## Data Flow ### Ingestion Flow ``` Markdown File (vault/) │ ▼ File Watcher (watchdog) │ ▼ Parse & Validate ├── Extract YAML frontmatter (title, tags, date, aliases) ├── Extract WikiLinks [[target]] └── Convert Markdown → plain text │ ▼ Content Hash (SHA-256) └── Skip if unchanged (idempotent) │ ▼ Chunker (500-800 tokens, 10% overlap) │ ▼ Embedding Generation (Ollama nomic-embed-text) │ ▼ Store in PostgreSQL ├── documents table (metadata + full text) ├── chunks table (chunk text + embedding vector) ├── entities table (extracted NER if enabled) └── relations table (WikiLink graph edges) ``` ### Retrieval (RAG) Flow ``` User Query │ ▼ Query Embedding (Ollama) │ ▼ Hybrid Search ├── Vector similarity (pgvector cosine distance) └── Full-text search (PostgreSQL tsvector) │ ▼ Reranker (optional cross-encoder) │ ▼ Context Assembly (top-k chunks + metadata) │ ▼ LLM Generation (Ollama) │ ▼ Response + Citations ``` --- ## Database Schema ### Tables #### `documents` ```sql CREATE TABLE documents ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), path TEXT NOT NULL UNIQUE, -- relative path in vault title TEXT, content TEXT NOT NULL, -- full markdown source content_hash TEXT NOT NULL, -- SHA-256 for change detection frontmatter JSONB DEFAULT '{}', -- parsed YAML frontmatter tags TEXT[] DEFAULT '{}', aliases TEXT[] DEFAULT '{}', word_count INTEGER, created_at TIMESTAMPTZ DEFAULT now(), updated_at TIMESTAMPTZ DEFAULT now(), indexed_at TIMESTAMPTZ, fts_vector TSVECTOR -- full-text search index ); CREATE INDEX idx_documents_path ON documents(path); CREATE INDEX idx_documents_tags ON documents USING GIN(tags); CREATE INDEX idx_documents_fts ON documents USING GIN(fts_vector); ``` #### `chunks` ```sql CREATE TABLE chunks ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), 
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE, chunk_index INTEGER NOT NULL, content TEXT NOT NULL, token_count INTEGER, embedding VECTOR(768), -- nomic-embed-text dimension metadata JSONB DEFAULT '{}', created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX idx_chunks_document_id ON chunks(document_id); CREATE INDEX idx_chunks_embedding ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100); ``` #### `entities` ```sql CREATE TABLE entities ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE, name TEXT NOT NULL, entity_type TEXT NOT NULL, -- PERSON, ORG, CONCEPT, etc. context TEXT, created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX idx_entities_document_id ON entities(document_id); CREATE INDEX idx_entities_name ON entities(name); CREATE INDEX idx_entities_type ON entities(entity_type); ``` #### `relations` ```sql CREATE TABLE relations ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), source_doc_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE, target_path TEXT NOT NULL, -- may not exist yet (forward links) target_doc_id UUID REFERENCES documents(id) ON DELETE SET NULL, relation_type TEXT DEFAULT 'wikilink', -- wikilink, tag, explicit context TEXT, -- surrounding text created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX idx_relations_source ON relations(source_doc_id); CREATE INDEX idx_relations_target ON relations(target_doc_id); CREATE INDEX idx_relations_target_path ON relations(target_path); ``` #### `agent_jobs` ```sql CREATE TABLE agent_jobs ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), agent_type TEXT NOT NULL, -- ingestion, linking, tagging, etc. 
status TEXT DEFAULT 'pending', -- pending, running, done, failed payload JSONB DEFAULT '{}', result JSONB, error TEXT, created_at TIMESTAMPTZ DEFAULT now(), started_at TIMESTAMPTZ, completed_at TIMESTAMPTZ, retry_count INTEGER DEFAULT 0 ); CREATE INDEX idx_agent_jobs_status ON agent_jobs(status); CREATE INDEX idx_agent_jobs_type ON agent_jobs(agent_type); ``` #### `agent_logs` ```sql CREATE TABLE agent_logs ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), job_id UUID REFERENCES agent_jobs(id) ON DELETE SET NULL, agent_type TEXT NOT NULL, level TEXT DEFAULT 'info', message TEXT NOT NULL, metadata JSONB DEFAULT '{}', created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX idx_agent_logs_job_id ON agent_logs(job_id); CREATE INDEX idx_agent_logs_created ON agent_logs(created_at DESC); ``` --- ## API Design ### Base URL: `http://localhost:8000/api/v1` | Method | Endpoint | Description | |--------|-----------------------|------------------------------------------| | POST | `/search` | Hybrid vector + full-text search | | POST | `/chat` | RAG chat with streaming response | | GET | `/document/{id}` | Get document by ID | | GET | `/document/path` | Get document by vault path | | POST | `/index` | Manually trigger index of a file | | POST | `/reindex` | Full vault reindex | | GET | `/related/{id}` | Get related documents by embedding sim | | GET | `/tags` | List all tags with counts | | GET | `/graph` | WikiLink graph (nodes + edges) | | GET | `/health` | Health check | | GET | `/stats` | System statistics | ### Request/Response Shapes #### POST `/search` ```json // Request { "query": "machine learning concepts", "limit": 10, "threshold": 0.7, "tags": ["ml", "ai"], "hybrid": true } // Response { "results": [ { "document_id": "uuid", "chunk_id": "uuid", "title": "Introduction to ML", "path": "notes/ml-intro.md", "content": "chunk text...", "score": 0.92, "tags": ["ml", "ai"], "highlight": "...matched text..." 
} ], "total": 42, "query_time_ms": 23 } ``` #### POST `/chat` ```json // Request (SSE stream) { "message": "What do I know about transformers?", "conversation_id": "optional-uuid", "context_limit": 5 } // Response (Server-Sent Events) data: {"token": "Transformers", "type": "token"} data: {"token": " are", "type": "token"} data: {"sources": [...], "type": "sources"} data: {"type": "done"} ``` --- ## Agent Architecture All agents inherit from a common `BaseAgent` class: ``` BaseAgent ├── IngestionAgent — watches vault, triggers indexing ├── LinkingAgent — discovers and creates knowledge links ├── TaggingAgent — auto-tags documents using LLM ├── SummarizationAgent — generates/updates document summaries └── MaintenanceAgent — detects orphans, broken links, stale content ``` ### Agent Lifecycle 1. Agent starts, reads config from environment 2. Polls `agent_jobs` table (or subscribes to PostgreSQL NOTIFY) 3. Claims job atomically (`UPDATE ... WHERE status='pending' RETURNING *`) 4. Executes job with retry logic (exponential backoff, max 3 retries) 5. Writes result / error back to `agent_jobs` 6. 
Logs to `agent_logs` ### Scheduling - **IngestionAgent**: event-driven (file watcher) + fallback poll every 30s - **LinkingAgent**: runs after every ingestion batch - **TaggingAgent**: runs on new/modified documents without tags - **SummarizationAgent**: runs on documents >1000 words without summary - **MaintenanceAgent**: scheduled daily at midnight --- ## Ingestion Pipeline ``` services/ingestion-worker/ ├── watcher.py — watchdog file system monitor ├── parser.py — frontmatter + markdown + wikilink parser ├── chunker.py — token-aware sliding window chunker ├── embedder.py — Ollama / sentence-transformers embeddings ├── indexer.py — PostgreSQL upsert logic └── pipeline.py — orchestrates the full ingestion flow ``` ### Chunking Strategy - **Method**: Sliding window, 500–800 tokens, 10% overlap - **Splitter**: Prefer semantic boundaries (paragraphs, headings) over hard token cuts - **Metadata preserved**: document_id, chunk_index, source heading path ### Embedding Strategy - **Primary**: Ollama `nomic-embed-text` (768-dim, fully offline) - **Fallback**: `sentence-transformers/all-MiniLM-L6-v2` (384-dim, local model) - **Batching**: 32 chunks per embedding request for efficiency --- ## Infrastructure ### Docker Services | Service | Image | Port | Description | |--------------------|------------------------------|-------|----------------------------------| | `postgres` | pgvector/pgvector:pg16 | 5432 | PostgreSQL + pgvector | | `ollama` | ollama/ollama:latest | 11434 | Local LLM inference | | `rag-api` | local/rag-api | 8000 | FastAPI retrieval service | | `ingestion-worker` | local/ingestion-worker | — | Vault watcher + indexer | | `agents` | local/agents | — | Background AI agents | | `web-ui` | local/web-ui | 3000 | Next.js frontend | | `redis` | redis:7-alpine | 6379 | Job queue + caching | ### Volume Mounts - `./vault:/vault:rw` — shared across all services needing vault access - `postgres_data:/var/lib/postgresql/data` — persistent database - 
`ollama_data:/root/.ollama` — pulled LLM models ### Network - Internal Docker network `second-brain-net` - External ports: `3000` (UI), `8000` (API), `11434` (Ollama) --- ## Design Principles 1. **Vault is source of truth** — database is always a derived index, fully rebuildable 2. **Offline-first** — zero external API calls required; all AI runs locally via Ollama 3. **Idempotent ingestion** — SHA-256 hashing ensures files are not re-indexed unless changed 4. **No vendor lock-in** — all components are open source and self-hosted 5. **Modular** — each service can be replaced independently (swap Ollama for another runtime) 6. **Graceful degradation** — system works without agents running; agents enhance, not gate 7. **Markdown compatibility** — vault works as a standalone Obsidian/Logseq vault at all times