AI Second Brain — System Architecture
Version: 1.0.0
Date: 2026-03-05
Status: Design Document
Table of Contents
- Overview
- Core Components
- Data Flow
- Database Schema
- API Design
- Agent Architecture
- Ingestion Pipeline
- Infrastructure
- Design Principles
Overview
The AI Second Brain is a fully self-hosted, offline-capable knowledge management system that treats a Markdown vault (Obsidian/Logseq compatible) as the single source of truth. All AI capabilities—embeddings, retrieval, generation, and autonomous agents—run locally.
┌─────────────────────────────────────────────────────────────────────┐
│ AI SECOND BRAIN │
│ │
│ ┌──────────┐ ┌────────────┐ ┌──────────┐ ┌────────────┐ │
│ │ EDITOR │───▶│ INGESTION │───▶│ STORAGE │───▶│ API │ │
│ │ LAYER │ │ PIPELINE │ │ LAYER │ │ LAYER │ │
│ └──────────┘ └────────────┘ └──────────┘ └────────────┘ │
│ │ │ │
│ Markdown Vault ┌────▼───────┐ │
│ (Obsidian/Logseq) │ AI LAYER │ │
│ │ (Ollama) │ │
│ └────────────┘ │
│ │ │
│ ┌────▼───────┐ │
│ │ INTERFACE │ │
│ │ LAYER │ │
│ └────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Core Components
1. Editor Layer
- Vault directory: ./vault/ — plain Markdown files, fully compatible with Obsidian and Logseq
- Format: CommonMark + YAML frontmatter + [[WikiLinks]]
- Source of truth: All knowledge lives here; the database is a derived index
- Sync: File-system watching via watchdog triggers the ingestion pipeline
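The parsing this layer relies on can be sketched with the standard library alone; `parse_note` and `WIKILINK_RE` below are illustrative names, not identifiers from the codebase:

```python
import re

# Matches [[Target]], [[Target|alias]], and [[Target#heading]], capturing only the target.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")

def parse_note(source: str) -> dict:
    """Split a vault note into YAML frontmatter, body text, and WikiLink targets."""
    frontmatter, body = "", source
    if source.startswith("---\n"):
        # Frontmatter sits between the first pair of '---' delimiter lines.
        end = source.find("\n---", 4)
        if end != -1:
            frontmatter = source[4:end]
            body = source[end + 4:].lstrip("\n")
    return {"frontmatter": frontmatter, "body": body, "links": WIKILINK_RE.findall(body)}
```

A production parser would hand the frontmatter string to a YAML library and resolve link targets against vault paths; this sketch only shows the shape of the step.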
2. Storage Layer
- PostgreSQL 16 with pgvector extension
- Stores: document metadata, text chunks, embeddings (1536-dim or 768-dim), extracted entities, wikilink relations
- Vector index: IVFFlat or HNSW for ANN search
3. Processing Layer (Ingestion Pipeline)
- File watcher monitors ./vault/**/*.md
- Parser: frontmatter extraction (YAML), Markdown-to-text, WikiLink graph extraction
- Chunker: 500–800 token sliding window with 10% overlap
- Embeddings: Ollama (nomic-embed-text) or sentence-transformers (offline fallback)
- Idempotent: SHA-256 content hashing prevents redundant re-indexing
4. API Layer
- FastAPI service exposing REST endpoints
- Retrieval: hybrid search (vector similarity + full-text BM25-style)
- Reranking: optional cross-encoder via sentence-transformers
- Async throughout; connection pooling with asyncpg
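The document leaves the fusion of the vector and full-text result lists unspecified; reciprocal rank fusion (RRF) is one common choice, sketched here under that assumption:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; items ranked well in several lists float to the top."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# vector_hits: chunk IDs ordered by pgvector cosine similarity
# text_hits:   chunk IDs ordered by full-text (tsvector) rank
vector_hits = ["c3", "c1", "c7"]
text_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([vector_hits, text_hits])  # "c1" ranks first
```

A weighted score sum would work too; RRF just avoids having to normalize cosine distances against text-search ranks.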
5. AI Layer
- Ollama sidecar providing local LLM inference (Mistral, Llama 3, Phi-3, etc.)
- Embedding model: nomic-embed-text (768-dim)
- Chat/generation model: configurable (default: mistral)
- Agents use LangChain/LlamaIndex or direct Ollama API calls
6. Agent Layer
- Long-running Python workers
- Agents: Ingestion, Knowledge Linking, Tagging, Summarization, Maintenance
- Message queue: Redis-backed job queue (ARQ) or simple PostgreSQL-backed queue
- Scheduled via cron-style configuration
7. Interface Layer
- Next.js (React) web application
- Pages: Search, Chat, Document Viewer, Graph View (knowledge graph), Tag Browser
- API client calls the FastAPI backend
- Served as a Docker container (Node.js)
Data Flow
Ingestion Flow
Markdown File (vault/)
│
▼
File Watcher (watchdog)
│
▼
Parse & Validate
├── Extract YAML frontmatter (title, tags, date, aliases)
├── Extract WikiLinks [[target]]
└── Convert Markdown → plain text
│
▼
Content Hash (SHA-256)
└── Skip if unchanged (idempotent)
│
▼
Chunker (500-800 tokens, 10% overlap)
│
▼
Embedding Generation (Ollama nomic-embed-text)
│
▼
Store in PostgreSQL
├── documents table (metadata + full text)
├── chunks table (chunk text + embedding vector)
├── entities table (extracted NER if enabled)
└── relations table (WikiLink graph edges)
Retrieval (RAG) Flow
User Query
│
▼
Query Embedding (Ollama)
│
▼
Hybrid Search
├── Vector similarity (pgvector cosine distance)
└── Full-text search (PostgreSQL tsvector)
│
▼
Reranker (optional cross-encoder)
│
▼
Context Assembly (top-k chunks + metadata)
│
▼
LLM Generation (Ollama)
│
▼
Response + Citations
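The Context Assembly step can be a pure function; the `[n]` citation tags and the `assemble_context` name are assumptions for illustration:

```python
def assemble_context(chunks: list[dict], max_chars: int = 4000) -> str:
    """Join top-k chunks into a citation-tagged context block, respecting a size budget."""
    parts, used = [], 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] ({chunk['path']})\n{chunk['content']}"
        if used + len(block) > max_chars:
            break  # budget exhausted; drop remaining (lower-ranked) chunks
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)
```

The `[n]` tags give the LLM stable handles to cite, which the API can map back to document paths when emitting the final response with citations.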
Database Schema
Tables
documents
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
path TEXT NOT NULL UNIQUE, -- relative path in vault
title TEXT,
content TEXT NOT NULL, -- full markdown source
content_hash TEXT NOT NULL, -- SHA-256 for change detection
frontmatter JSONB DEFAULT '{}', -- parsed YAML frontmatter
tags TEXT[] DEFAULT '{}',
aliases TEXT[] DEFAULT '{}',
word_count INTEGER,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now(),
indexed_at TIMESTAMPTZ,
fts_vector TSVECTOR -- full-text search index
);
CREATE INDEX idx_documents_path ON documents(path);
CREATE INDEX idx_documents_tags ON documents USING GIN(tags);
CREATE INDEX idx_documents_fts ON documents USING GIN(fts_vector);
chunks
CREATE TABLE chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
token_count INTEGER,
embedding VECTOR(768), -- nomic-embed-text dimension
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_chunks_document_id ON chunks(document_id);
CREATE INDEX idx_chunks_embedding ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
entities
CREATE TABLE entities (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
name TEXT NOT NULL,
entity_type TEXT NOT NULL, -- PERSON, ORG, CONCEPT, etc.
context TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_entities_document_id ON entities(document_id);
CREATE INDEX idx_entities_name ON entities(name);
CREATE INDEX idx_entities_type ON entities(entity_type);
relations
CREATE TABLE relations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_doc_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
target_path TEXT NOT NULL, -- may not exist yet (forward links)
target_doc_id UUID REFERENCES documents(id) ON DELETE SET NULL,
relation_type TEXT DEFAULT 'wikilink', -- wikilink, tag, explicit
context TEXT, -- surrounding text
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_relations_source ON relations(source_doc_id);
CREATE INDEX idx_relations_target ON relations(target_doc_id);
CREATE INDEX idx_relations_target_path ON relations(target_path);
agent_jobs
CREATE TABLE agent_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
agent_type TEXT NOT NULL, -- ingestion, linking, tagging, etc.
status TEXT DEFAULT 'pending', -- pending, running, done, failed
payload JSONB DEFAULT '{}',
result JSONB,
error TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
retry_count INTEGER DEFAULT 0
);
CREATE INDEX idx_agent_jobs_status ON agent_jobs(status);
CREATE INDEX idx_agent_jobs_type ON agent_jobs(agent_type);
agent_logs
CREATE TABLE agent_logs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
job_id UUID REFERENCES agent_jobs(id) ON DELETE SET NULL,
agent_type TEXT NOT NULL,
level TEXT DEFAULT 'info',
message TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_agent_logs_job_id ON agent_logs(job_id);
CREATE INDEX idx_agent_logs_created ON agent_logs(created_at DESC);
API Design
Base URL: http://localhost:8000/api/v1
| Method | Endpoint | Description |
|---|---|---|
| POST | /search | Hybrid vector + full-text search |
| POST | /chat | RAG chat with streaming response |
| GET | /document/{id} | Get document by ID |
| GET | /document/path | Get document by vault path |
| POST | /index | Manually trigger indexing of a file |
| POST | /reindex | Full vault reindex |
| GET | /related/{id} | Get related documents by embedding similarity |
| GET | /tags | List all tags with counts |
| GET | /graph | WikiLink graph (nodes + edges) |
| GET | /health | Health check |
| GET | /stats | System statistics |
Request/Response Shapes
POST /search
// Request
{
"query": "machine learning concepts",
"limit": 10,
"threshold": 0.7,
"tags": ["ml", "ai"],
"hybrid": true
}
// Response
{
"results": [
{
"document_id": "uuid",
"chunk_id": "uuid",
"title": "Introduction to ML",
"path": "notes/ml-intro.md",
"content": "chunk text...",
"score": 0.92,
"tags": ["ml", "ai"],
"highlight": "...matched text..."
}
],
"total": 42,
"query_time_ms": 23
}
POST /chat
// Request (SSE stream)
{
"message": "What do I know about transformers?",
"conversation_id": "optional-uuid",
"context_limit": 5
}
// Response (Server-Sent Events)
data: {"token": "Transformers", "type": "token"}
data: {"token": " are", "type": "token"}
data: {"sources": [...], "type": "sources"}
data: {"type": "done"}
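A client consuming this stream only has to split on `data:` lines; a minimal sketch, assuming every event payload is a single-line JSON object:

```python
import json

def parse_sse_events(stream_text: str) -> list[dict]:
    """Decode 'data: {...}' lines from an SSE response body into event dicts."""
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events
```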
Agent Architecture
All agents inherit from a common BaseAgent class:
BaseAgent
├── IngestionAgent — watches vault, triggers indexing
├── LinkingAgent — discovers and creates knowledge links
├── TaggingAgent — auto-tags documents using LLM
├── SummarizationAgent — generates/updates document summaries
└── MaintenanceAgent — detects orphans, broken links, stale content
Agent Lifecycle
- Agent starts, reads config from environment
- Polls the agent_jobs table (or subscribes to PostgreSQL NOTIFY)
- Claims a job atomically (UPDATE ... WHERE status='pending' RETURNING *)
- Executes the job with retry logic (exponential backoff, max 3 retries)
- Writes the result or error back to agent_jobs
- Logs to agent_logs
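The atomic claim can be a single UPDATE; the `FOR UPDATE SKIP LOCKED` subquery below goes beyond the document's `UPDATE ... WHERE status='pending' RETURNING *` and is an assumption, added so two concurrent workers can never claim the same row:

```python
# Hypothetical claim query for the PostgreSQL-backed queue variant ($1 = agent_type).
CLAIM_JOB_SQL = """
UPDATE agent_jobs
SET status = 'running', started_at = now()
WHERE id = (
    SELECT id FROM agent_jobs
    WHERE status = 'pending' AND agent_type = $1
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING *;
"""
```

`SKIP LOCKED` makes competing workers pass over rows another transaction already holds, so polling loops never block on each other.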
Scheduling
- IngestionAgent: event-driven (file watcher) + fallback poll every 30s
- LinkingAgent: runs after every ingestion batch
- TaggingAgent: runs on new/modified documents without tags
- SummarizationAgent: runs on documents >1000 words without summary
- MaintenanceAgent: scheduled daily at midnight
Ingestion Pipeline
services/ingestion-worker/
├── watcher.py — watchdog file system monitor
├── parser.py — frontmatter + markdown + wikilink parser
├── chunker.py — token-aware sliding window chunker
├── embedder.py — Ollama / sentence-transformers embeddings
├── indexer.py — PostgreSQL upsert logic
└── pipeline.py — orchestrates the full ingestion flow
Chunking Strategy
- Method: Sliding window, 500–800 tokens, 10% overlap
- Splitter: Prefer semantic boundaries (paragraphs, headings) over hard token cuts
- Metadata preserved: document_id, chunk_index, source heading path
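The window itself is simple once text is tokenized; this sketch takes a pre-split token list and omits the semantic-boundary preference a real splitter would add:

```python
def chunk_tokens(tokens: list[str], size: int = 600, overlap_frac: float = 0.1) -> list[list[str]]:
    """Slide a fixed-size window over a token list with fractional overlap."""
    step = max(1, int(size * (1 - overlap_frac)))  # e.g. 600 tokens, 10% overlap -> step 540
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```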
Embedding Strategy
- Primary: Ollama nomic-embed-text (768-dim, fully offline)
- Fallback: sentence-transformers/all-MiniLM-L6-v2 (384-dim, local model)
- Batching: 32 chunks per embedding request for efficiency
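The batching is a one-liner; `batched` is a hypothetical helper name:

```python
def batched(items: list, batch_size: int = 32) -> list[list]:
    """Split a chunk list into fixed-size batches for the embedding backend."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

With 70 chunks this yields batches of 32, 32, and 6; one request per batch amortizes per-call overhead against the embedding service.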
Infrastructure
Docker Services
| Service | Image | Port | Description |
|---|---|---|---|
| postgres | pgvector/pgvector:pg16 | 5432 | PostgreSQL + pgvector |
| ollama | ollama/ollama:latest | 11434 | Local LLM inference |
| rag-api | local/rag-api | 8000 | FastAPI retrieval service |
| ingestion-worker | local/ingestion-worker | — | Vault watcher + indexer |
| agents | local/agents | — | Background AI agents |
| web-ui | local/web-ui | 3000 | Next.js frontend |
| redis | redis:7-alpine | 6379 | Job queue + caching |
Volume Mounts
- ./vault:/vault:rw — shared across all services needing vault access
- postgres_data:/var/lib/postgresql/data — persistent database
- ollama_data:/root/.ollama — pulled LLM models
Network
- Internal Docker network: second-brain-net
- External ports: 3000 (UI), 8000 (API), 11434 (Ollama)
Design Principles
- Vault is source of truth — database is always a derived index, fully rebuildable
- Offline-first — zero external API calls required; all AI runs locally via Ollama
- Idempotent ingestion — SHA-256 hashing ensures files are not re-indexed unless changed
- No vendor lock-in — all components are open source and self-hosted
- Modular — each service can be replaced independently (swap Ollama for another runtime)
- Graceful degradation — system works without agents running; agents enhance, not gate
- Markdown compatibility — vault works as a standalone Obsidian/Logseq vault at all times