# AI Second Brain — System Architecture
> Version: 1.0.0
> Date: 2026-03-05
> Status: Design Document
---
## Table of Contents
1. [Overview](#overview)
2. [Core Components](#core-components)
3. [Data Flow](#data-flow)
4. [Database Schema](#database-schema)
5. [API Design](#api-design)
6. [Agent Architecture](#agent-architecture)
7. [Ingestion Pipeline](#ingestion-pipeline)
8. [Infrastructure](#infrastructure)
9. [Design Principles](#design-principles)
---
## Overview
The AI Second Brain is a fully self-hosted, offline-capable knowledge management system that treats a Markdown vault (Obsidian/Logseq compatible) as the single source of truth. All AI capabilities—embeddings, retrieval, generation, and autonomous agents—run locally.
```
┌─────────────────────────────────────────────────────────────────────┐
│ AI SECOND BRAIN │
│ │
│ ┌──────────┐ ┌────────────┐ ┌──────────┐ ┌────────────┐ │
│ │ EDITOR │───▶│ INGESTION │───▶│ STORAGE │───▶│ API │ │
│ │ LAYER │ │ PIPELINE │ │ LAYER │ │ LAYER │ │
│ └──────────┘ └────────────┘ └──────────┘ └────────────┘ │
│ │ │ │
│ Markdown Vault ┌────▼───────┐ │
│ (Obsidian/Logseq) │ AI LAYER │ │
│ │ (Ollama) │ │
│ └────────────┘ │
│ │ │
│ ┌────▼───────┐ │
│ │ INTERFACE │ │
│ │ LAYER │ │
│ └────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## Core Components
### 1. Editor Layer
- **Vault directory**: `./vault/` — plain Markdown files, fully compatible with Obsidian and Logseq
- **Format**: CommonMark + YAML frontmatter + `[[WikiLinks]]`
- **Source of truth**: All knowledge lives here; the database is a derived index
- **Sync**: File-system watching via `watchdog` triggers the ingestion pipeline
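
The frontmatter and `[[WikiLinks]]` conventions above can be sketched with a minimal stdlib parser (illustrative only; a real implementation would use a YAML library and handle more edge cases):

```python
import re

# YAML frontmatter is delimited by --- lines at the top of the file.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
# Capture the link target, ignoring optional #heading anchors and |aliases.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def split_frontmatter(source: str) -> tuple[str, str]:
    """Return (raw_frontmatter, body); frontmatter is left as raw YAML text."""
    m = FRONTMATTER_RE.match(source)
    if not m:
        return "", source
    return m.group(1), source[m.end():]

def extract_wikilinks(body: str) -> list[str]:
    """Collect [[target]] link targets from the Markdown body."""
    return [m.group(1).strip() for m in WIKILINK_RE.finditer(body)]

note = "---\ntitle: ML Intro\ntags: [ml]\n---\nSee [[Transformers]] and [[Attention|the attention note]]."
frontmatter, body = split_frontmatter(note)
links = extract_wikilinks(body)  # ["Transformers", "Attention"]
```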
### 2. Storage Layer
- **PostgreSQL 16** with **pgvector** extension
- Stores: document metadata, text chunks, embeddings (768-dim with the default `nomic-embed-text`; the vector column width must match the configured model), extracted entities, wikilink relations
- Vector index: IVFFlat or HNSW for ANN search
### 3. Processing Layer (Ingestion Pipeline)
- File watcher monitors `./vault/**/*.md`
- Parser: frontmatter extraction (YAML), Markdown-to-text, WikiLink graph extraction
- Chunker: 500-800 token sliding window with 10% overlap
- Embeddings: Ollama (`nomic-embed-text`) or `sentence-transformers` (offline fallback)
- Idempotent: SHA-256 content hashing prevents redundant re-indexing
### 4. API Layer
- **FastAPI** service exposing REST endpoints
- Retrieval: hybrid search (vector similarity + full-text BM25-style)
- Reranking: optional cross-encoder via `sentence-transformers`
- Async throughout; connection pooling with `asyncpg`
### 5. AI Layer
- **Ollama** sidecar providing local LLM inference (Mistral, Llama 3, Phi-3, etc.)
- Embedding model: `nomic-embed-text` (768-dim)
- Chat/generation model: configurable (default: `mistral`)
- Agents use LangChain/LlamaIndex or direct Ollama API calls
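
For illustration, the embedding call can be made with the standard library alone. The request body (`model` plus `prompt`) follows Ollama's `/api/embeddings` HTTP API; `OLLAMA_URL` assumes the default port from the infrastructure section:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama sidecar address

def embed_payload(text: str, model: str = "nomic-embed-text") -> dict:
    """Request body for Ollama's /api/embeddings endpoint."""
    return {"model": model, "prompt": text}

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """POST to the local Ollama sidecar and return the embedding vector."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps(embed_payload(text, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```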
### 6. Agent Layer
- Long-running Python workers
- Agents: Ingestion, Knowledge Linking, Tagging, Summarization, Maintenance
- Message queue: Redis-backed job queue (ARQ) or simple PostgreSQL-backed queue
- Scheduled via cron-style configuration
### 7. Interface Layer
- **Next.js** (React) web application
- Pages: Search, Chat, Document Viewer, Graph View (knowledge graph), Tag Browser
- API client calls the FastAPI backend
- Served as a Docker container (Node.js)
---
## Data Flow
### Ingestion Flow
```
Markdown File (vault/)
        ↓
File Watcher (watchdog)
        ↓
Parse & Validate
  ├── Extract YAML frontmatter (title, tags, date, aliases)
  ├── Extract WikiLinks [[target]]
  └── Convert Markdown → plain text
        ↓
Content Hash (SHA-256)
  └── Skip if unchanged (idempotent)
        ↓
Chunker (500-800 tokens, 10% overlap)
        ↓
Embedding Generation (Ollama nomic-embed-text)
        ↓
Store in PostgreSQL
  ├── documents table (metadata + full text)
  ├── chunks table (chunk text + embedding vector)
  ├── entities table (extracted NER if enabled)
  └── relations table (WikiLink graph edges)
```
### Retrieval (RAG) Flow
```
User Query
     ↓
Query Embedding (Ollama)
     ↓
Hybrid Search
  ├── Vector similarity (pgvector cosine distance)
  └── Full-text search (PostgreSQL tsvector)
     ↓
Reranker (optional cross-encoder)
     ↓
Context Assembly (top-k chunks + metadata)
     ↓
LLM Generation (Ollama)
     ↓
Response + Citations
```
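
The flow above does not pin down how the two ranked lists from the hybrid step are merged; one common choice (an assumption here, not a commitment of this design) is reciprocal rank fusion:

```python
def rrf_merge(vector_hits: list[str], text_hits: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk IDs by reciprocal rank fusion.

    Each hit contributes 1 / (k + rank); IDs present in both lists
    accumulate both scores, so agreement between retrievers is rewarded.
    """
    scores: dict[str, float] = {}
    for hits in (vector_hits, text_hits):
        for rank, chunk_id in enumerate(hits, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A chunk ranked in both lists (like `"b"` in `rrf_merge(["a", "b"], ["b", "c"])`) outranks chunks found by only one retriever.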
---
## Database Schema
### Tables
#### `documents`
```sql
CREATE TABLE documents (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    path          TEXT NOT NULL UNIQUE,       -- relative path in vault
    title         TEXT,
    content       TEXT NOT NULL,              -- full markdown source
    content_hash  TEXT NOT NULL,              -- SHA-256 for change detection
    frontmatter   JSONB DEFAULT '{}',         -- parsed YAML frontmatter
    tags          TEXT[] DEFAULT '{}',
    aliases       TEXT[] DEFAULT '{}',
    word_count    INTEGER,
    created_at    TIMESTAMPTZ DEFAULT now(),
    updated_at    TIMESTAMPTZ DEFAULT now(),
    indexed_at    TIMESTAMPTZ,
    -- full-text search index, kept in sync automatically
    fts_vector    TSVECTOR GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(title, '') || ' ' || content)
    ) STORED
);
CREATE INDEX idx_documents_path ON documents(path);
CREATE INDEX idx_documents_tags ON documents USING GIN(tags);
CREATE INDEX idx_documents_fts ON documents USING GIN(fts_vector);
```
#### `chunks`
```sql
CREATE TABLE chunks (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id  UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    chunk_index  INTEGER NOT NULL,
    content      TEXT NOT NULL,
    token_count  INTEGER,
    embedding    VECTOR(768),        -- nomic-embed-text dimension
    metadata     JSONB DEFAULT '{}',
    created_at   TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_chunks_document_id ON chunks(document_id);
CREATE INDEX idx_chunks_embedding ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
#### `entities`
```sql
CREATE TABLE entities (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id  UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    name         TEXT NOT NULL,
    entity_type  TEXT NOT NULL,      -- PERSON, ORG, CONCEPT, etc.
    context      TEXT,
    created_at   TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_entities_document_id ON entities(document_id);
CREATE INDEX idx_entities_name ON entities(name);
CREATE INDEX idx_entities_type ON entities(entity_type);
```
#### `relations`
```sql
CREATE TABLE relations (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_doc_id  UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    target_path    TEXT NOT NULL,              -- may not exist yet (forward links)
    target_doc_id  UUID REFERENCES documents(id) ON DELETE SET NULL,
    relation_type  TEXT DEFAULT 'wikilink',    -- wikilink, tag, explicit
    context        TEXT,                       -- surrounding text
    created_at     TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_relations_source ON relations(source_doc_id);
CREATE INDEX idx_relations_target ON relations(target_doc_id);
CREATE INDEX idx_relations_target_path ON relations(target_path);
```
#### `agent_jobs`
```sql
CREATE TABLE agent_jobs (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_type    TEXT NOT NULL,          -- ingestion, linking, tagging, etc.
    status        TEXT DEFAULT 'pending', -- pending, running, done, failed
    payload       JSONB DEFAULT '{}',
    result        JSONB,
    error         TEXT,
    created_at    TIMESTAMPTZ DEFAULT now(),
    started_at    TIMESTAMPTZ,
    completed_at  TIMESTAMPTZ,
    retry_count   INTEGER DEFAULT 0
);
CREATE INDEX idx_agent_jobs_status ON agent_jobs(status);
CREATE INDEX idx_agent_jobs_type ON agent_jobs(agent_type);
```
#### `agent_logs`
```sql
CREATE TABLE agent_logs (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    job_id      UUID REFERENCES agent_jobs(id) ON DELETE SET NULL,
    agent_type  TEXT NOT NULL,
    level       TEXT DEFAULT 'info',
    message     TEXT NOT NULL,
    metadata    JSONB DEFAULT '{}',
    created_at  TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_agent_logs_job_id ON agent_logs(job_id);
CREATE INDEX idx_agent_logs_created ON agent_logs(created_at DESC);
```
---
## API Design
### Base URL: `http://localhost:8000/api/v1`
| Method | Endpoint | Description |
|--------|-----------------------|------------------------------------------|
| POST | `/search` | Hybrid vector + full-text search |
| POST | `/chat` | RAG chat with streaming response |
| GET | `/document/{id}` | Get document by ID |
| GET | `/document/path` | Get document by vault path |
| POST | `/index` | Manually trigger index of a file |
| POST | `/reindex` | Full vault reindex |
| GET | `/related/{id}` | Get related documents by embedding sim |
| GET | `/tags` | List all tags with counts |
| GET | `/graph` | WikiLink graph (nodes + edges) |
| GET | `/health` | Health check |
| GET | `/stats` | System statistics |
### Request/Response Shapes
#### POST `/search`
```json
// Request
{
  "query": "machine learning concepts",
  "limit": 10,
  "threshold": 0.7,
  "tags": ["ml", "ai"],
  "hybrid": true
}

// Response
{
  "results": [
    {
      "document_id": "uuid",
      "chunk_id": "uuid",
      "title": "Introduction to ML",
      "path": "notes/ml-intro.md",
      "content": "chunk text...",
      "score": 0.92,
      "tags": ["ml", "ai"],
      "highlight": "...matched text..."
    }
  ],
  "total": 42,
  "query_time_ms": 23
}
```
#### POST `/chat`
```json
// Request
{
  "message": "What do I know about transformers?",
  "conversation_id": "optional-uuid",
  "context_limit": 5
}

// Response (Server-Sent Events stream)
data: {"token": "Transformers", "type": "token"}
data: {"token": " are", "type": "token"}
data: {"sources": [...], "type": "sources"}
data: {"type": "done"}
```
---
## Agent Architecture
All agents inherit from a common `BaseAgent` class:
```
BaseAgent
├── IngestionAgent — watches vault, triggers indexing
├── LinkingAgent — discovers and creates knowledge links
├── TaggingAgent — auto-tags documents using LLM
├── SummarizationAgent — generates/updates document summaries
└── MaintenanceAgent — detects orphans, broken links, stale content
```
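
A hypothetical skeleton of this hierarchy (only `BaseAgent` and the agent names come from the design above; method names and the stubbed `TaggingAgent` body are illustrative):

```python
from abc import ABC, abstractmethod

class BaseAgent(ABC):
    """Common shell: run agent-specific work and record the outcome."""
    agent_type: str = "base"

    @abstractmethod
    def execute(self, payload: dict) -> dict:
        """Do the agent-specific work; return a JSON-serializable result."""

    def run_job(self, payload: dict) -> dict:
        try:
            result = self.execute(payload)
            return {"status": "done", "result": result}
        except Exception as exc:  # real agents would retry with backoff
            return {"status": "failed", "error": str(exc)}

class TaggingAgent(BaseAgent):
    agent_type = "tagging"

    def execute(self, payload: dict) -> dict:
        # Stub: a real implementation would ask the local LLM for tags.
        return {"tags": ["untagged"], "document_id": payload.get("document_id")}
```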
### Agent Lifecycle
1. Agent starts, reads config from environment
2. Polls `agent_jobs` table (or subscribes to PostgreSQL NOTIFY)
3. Claims job atomically (`UPDATE ... WHERE status='pending' RETURNING *`)
4. Executes job with retry logic (exponential backoff, max 3 retries)
5. Writes result / error back to `agent_jobs`
6. Logs to `agent_logs`
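
Step 3's atomic claim can be demonstrated with SQLite standing in for PostgreSQL (the real worker would use `asyncpg` with `UPDATE ... WHERE status='pending' RETURNING *`; checking the affected-row count gives the same either-or guarantee):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_jobs (id TEXT PRIMARY KEY, status TEXT DEFAULT 'pending')")
conn.execute("INSERT INTO agent_jobs (id) VALUES ('job-1')")

def claim_job(conn: sqlite3.Connection, job_id: str) -> bool:
    """Atomically move a job from pending to running; False if already taken."""
    cur = conn.execute(
        "UPDATE agent_jobs SET status = 'running' "
        "WHERE id = ? AND status = 'pending'",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1

first = claim_job(conn, "job-1")   # this worker wins the job
second = claim_job(conn, "job-1")  # a second claim finds nothing pending
```

Because the status check and the update happen in one statement, two workers can never both claim the same job.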
### Scheduling
- **IngestionAgent**: event-driven (file watcher) + fallback poll every 30s
- **LinkingAgent**: runs after every ingestion batch
- **TaggingAgent**: runs on new/modified documents without tags
- **SummarizationAgent**: runs on documents >1000 words without summary
- **MaintenanceAgent**: scheduled daily at midnight
---
## Ingestion Pipeline
```
services/ingestion-worker/
├── watcher.py — watchdog file system monitor
├── parser.py — frontmatter + markdown + wikilink parser
├── chunker.py — token-aware sliding window chunker
├── embedder.py — Ollama / sentence-transformers embeddings
├── indexer.py — PostgreSQL upsert logic
└── pipeline.py — orchestrates the full ingestion flow
```
### Chunking Strategy
- **Method**: Sliding window, 500-800 tokens, 10% overlap
- **Splitter**: Prefer semantic boundaries (paragraphs, headings) over hard token cuts
- **Metadata preserved**: document_id, chunk_index, source heading path
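
The window arithmetic can be sketched as follows (whitespace words stand in for real tokens; the actual chunker would count model tokens and prefer paragraph/heading boundaries):

```python
def chunk_words(words: list[str], size: int = 600, overlap_pct: float = 0.10) -> list[list[str]]:
    """Sliding window over a token list with fractional overlap.

    The window advances by size * (1 - overlap_pct) positions, so
    consecutive chunks share overlap_pct of their tokens.
    """
    step = max(1, int(size * (1 - overlap_pct)))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break
    return chunks

chunks = chunk_words([f"w{i}" for i in range(1500)], size=600)  # 3 overlapping chunks
```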
### Embedding Strategy
- **Primary**: Ollama `nomic-embed-text` (768-dim, fully offline)
- **Fallback**: `sentence-transformers/all-MiniLM-L6-v2` (384-dim, local model)
- **Batching**: 32 chunks per embedding request for efficiency
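
The 32-chunk batching reduces to slicing the chunk list before handing each slice to the embedding backend (backend call omitted):

```python
from typing import Iterator

def batched(items: list[str], size: int = 32) -> Iterator[list[str]]:
    """Yield successive fixed-size batches; the last batch may be smaller."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

batches = list(batched([f"chunk-{i}" for i in range(70)]))  # 32 + 32 + 6
```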
---
## Infrastructure
### Docker Services
| Service | Image | Port | Description |
|--------------------|------------------------------|-------|----------------------------------|
| `postgres` | pgvector/pgvector:pg16 | 5432 | PostgreSQL + pgvector |
| `ollama` | ollama/ollama:latest | 11434 | Local LLM inference |
| `rag-api` | local/rag-api | 8000 | FastAPI retrieval service |
| `ingestion-worker` | local/ingestion-worker | — | Vault watcher + indexer |
| `agents` | local/agents | — | Background AI agents |
| `web-ui` | local/web-ui | 3000 | Next.js frontend |
| `redis` | redis:7-alpine | 6379 | Job queue + caching |
### Volume Mounts
- `./vault:/vault:rw` — shared across all services needing vault access
- `postgres_data:/var/lib/postgresql/data` — persistent database
- `ollama_data:/root/.ollama` — pulled LLM models
### Network
- Internal Docker network `second-brain-net`
- External ports: `3000` (UI), `8000` (API), `11434` (Ollama)
---
## Design Principles
1. **Vault is source of truth** — database is always a derived index, fully rebuildable
2. **Offline-first** — zero external API calls required; all AI runs locally via Ollama
3. **Idempotent ingestion** — SHA-256 hashing ensures files are not re-indexed unless changed
4. **No vendor lock-in** — all components are open source and self-hosted
5. **Modular** — each service can be replaced independently (swap Ollama for another runtime)
6. **Graceful degradation** — system works without agents running; agents enhance, not gate
7. **Markdown compatibility** — vault works as a standalone Obsidian/Logseq vault at all times