AI Second Brain — System Architecture
Version: 1.0.0
Date: 2026-03-05
Status: Design Document
Table of Contents
- Overview
- Core Components
- Data Flow
- Database Schema
- API Design
- Agent Architecture
- Ingestion Pipeline
- Infrastructure
- Design Principles
Overview
The AI Second Brain is a fully self-hosted, offline-capable knowledge management system that treats a Markdown vault (Obsidian/Logseq compatible) as the single source of truth. All AI capabilities—embeddings, retrieval, generation, and autonomous agents—run locally.
┌─────────────────────────────────────────────────────────────────────┐
│ AI SECOND BRAIN │
│ │
│ ┌──────────┐ ┌────────────┐ ┌──────────┐ ┌────────────┐ │
│ │ EDITOR │───▶│ INGESTION │───▶│ STORAGE │───▶│ API │ │
│ │ LAYER │ │ PIPELINE │ │ LAYER │ │ LAYER │ │
│ └──────────┘ └────────────┘ └──────────┘ └────────────┘ │
│ │ │ │
│ Markdown Vault ┌────▼───────┐ │
│ (Obsidian/Logseq) │ AI LAYER │ │
│ │ (Ollama) │ │
│ └────────────┘ │
│ │ │
│ ┌────▼───────┐ │
│ │ INTERFACE │ │
│ │ LAYER │ │
│ └────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Core Components
1. Editor Layer
- Vault directory: ./vault/ — plain Markdown files, fully compatible with Obsidian and Logseq
- Format: CommonMark + YAML frontmatter + [[WikiLinks]]
- Source of truth: All knowledge lives here; the database is a derived index
- Sync: File-system watching via watchdog triggers the ingestion pipeline
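The parsing this layer relies on can be sketched with the standard library alone; `parse_note` and `WIKILINK_RE` below are illustrative names, not identifiers from the codebase:

```python
import re

# Matches [[Target]], [[Target|alias]], and [[Target#heading]], capturing only the target.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")

def parse_note(source: str) -> dict:
    """Split a vault note into YAML frontmatter, body text, and WikiLink targets."""
    frontmatter, body = "", source
    if source.startswith("---\n"):
        # Frontmatter sits between the first pair of '---' delimiter lines.
        end = source.find("\n---", 4)
        if end != -1:
            frontmatter = source[4:end]
            body = source[end + 4:].lstrip("\n")
    return {"frontmatter": frontmatter, "body": body, "links": WIKILINK_RE.findall(body)}
```

A production parser would hand the frontmatter string to a YAML library and resolve link targets against vault paths; this sketch only shows the shape of the step.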
2. Storage Layer
- PostgreSQL 16 with pgvector extension
- Stores: document metadata, text chunks, embeddings (1536-dim or 768-dim), extracted entities, wikilink relations
- Vector index: IVFFlat or HNSW for ANN search
3. Processing Layer (Ingestion Pipeline)
- File watcher monitors ./vault/**/*.md
- Parser: frontmatter extraction (YAML), Markdown-to-text, WikiLink graph extraction
- Chunker: 500–800 token sliding window with 10% overlap
- Embeddings: Ollama (nomic-embed-text) or sentence-transformers (offline fallback)
- Idempotent: SHA-256 content hashing prevents redundant re-indexing
4. API Layer
- FastAPI service exposing REST endpoints
- Retrieval: hybrid search (vector similarity + full-text BM25-style)
- Reranking: optional cross-encoder via sentence-transformers
- Async throughout; connection pooling with asyncpg
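The document leaves the fusion of the vector and full-text result lists unspecified; reciprocal rank fusion (RRF) is one common choice, sketched here under that assumption:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; items ranked well in several lists float to the top."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# vector_hits: chunk IDs ordered by pgvector cosine similarity
# text_hits:   chunk IDs ordered by full-text (tsvector) rank
vector_hits = ["c3", "c1", "c7"]
text_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([vector_hits, text_hits])  # "c1" ranks first
```

A weighted score sum would work too; RRF just avoids having to normalize cosine distances against text-search ranks.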
5. AI Layer
- Ollama sidecar providing local LLM inference (Mistral, Llama 3, Phi-3, etc.)
- Embedding model: nomic-embed-text (768-dim)
- Chat/generation model: configurable (default: mistral)
- Agents use LangChain/LlamaIndex or direct Ollama API calls
6. Agent Layer
- Long-running Python workers
- Agents: Ingestion, Knowledge Linking, Tagging, Summarization, Maintenance
- Message queue: Redis-backed job queue (ARQ) or simple PostgreSQL-backed queue
- Scheduled via cron-style configuration
7. Interface Layer
- Next.js (React) web application
- Pages: Search, Chat, Document Viewer, Graph View (knowledge graph), Tag Browser
- API client calls the FastAPI backend
- Served as a Docker container (Node.js)
Data Flow
Ingestion Flow
Markdown File (vault/)
│
▼
File Watcher (watchdog)
│
▼
Parse & Validate
├── Extract YAML frontmatter (title, tags, date, aliases)
├── Extract WikiLinks [[target]]
└── Convert Markdown → plain text
│
▼
Content Hash (SHA-256)
└── Skip if unchanged (idempotent)
│
▼
Chunker (500-800 tokens, 10% overlap)
│
▼
Embedding Generation (Ollama nomic-embed-text)
│
▼
Store in PostgreSQL
├── documents table (metadata + full text)
├── chunks table (chunk text + embedding vector)
├── entities table (extracted NER if enabled)
└── relations table (WikiLink graph edges)
Retrieval (RAG) Flow
User Query
│
▼
Query Embedding (Ollama)
│
▼
Hybrid Search
├── Vector similarity (pgvector cosine distance)
└── Full-text search (PostgreSQL tsvector)
│
▼
Reranker (optional cross-encoder)
│
▼
Context Assembly (top-k chunks + metadata)
│
▼
LLM Generation (Ollama)
│
▼
Response + Citations
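The Context Assembly step can be a pure function; the `[n]` citation tags and the `assemble_context` name are assumptions for illustration:

```python
def assemble_context(chunks: list[dict], max_chars: int = 4000) -> str:
    """Join top-k chunks into a citation-tagged context block, respecting a size budget."""
    parts, used = [], 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] ({chunk['path']})\n{chunk['content']}"
        if used + len(block) > max_chars:
            break  # budget exhausted; drop remaining (lower-ranked) chunks
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)
```

The `[n]` tags give the LLM stable handles to cite, which the API can map back to document paths when emitting the final response with citations.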
Database Schema
Tables
documents
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
path TEXT NOT NULL UNIQUE, -- relative path in vault
title TEXT,
content TEXT NOT NULL, -- full markdown source
content_hash TEXT NOT NULL, -- SHA-256 for change detection
frontmatter JSONB DEFAULT '{}', -- parsed YAML frontmatter
tags TEXT[] DEFAULT '{}',
aliases TEXT[] DEFAULT '{}',
word_count INTEGER,
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now(),
indexed_at TIMESTAMPTZ,
fts_vector TSVECTOR -- full-text search index
);
CREATE INDEX idx_documents_path ON documents(path);
CREATE INDEX idx_documents_tags ON documents USING GIN(tags);
CREATE INDEX idx_documents_fts ON documents USING GIN(fts_vector);
chunks
CREATE TABLE chunks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
chunk_index INTEGER NOT NULL,
content TEXT NOT NULL,
token_count INTEGER,
embedding VECTOR(768), -- nomic-embed-text dimension
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_chunks_document_id ON chunks(document_id);
CREATE INDEX idx_chunks_embedding ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
entities
CREATE TABLE entities (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
document_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
name TEXT NOT NULL,
entity_type TEXT NOT NULL, -- PERSON, ORG, CONCEPT, etc.
context TEXT,
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_entities_document_id ON entities(document_id);
CREATE INDEX idx_entities_name ON entities(name);
CREATE INDEX idx_entities_type ON entities(entity_type);
relations
CREATE TABLE relations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_doc_id UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
target_path TEXT NOT NULL, -- may not exist yet (forward links)
target_doc_id UUID REFERENCES documents(id) ON DELETE SET NULL,
relation_type TEXT DEFAULT 'wikilink', -- wikilink, tag, explicit
context TEXT, -- surrounding text
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_relations_source ON relations(source_doc_id);
CREATE INDEX idx_relations_target ON relations(target_doc_id);
CREATE INDEX idx_relations_target_path ON relations(target_path);
agent_jobs
CREATE TABLE agent_jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
agent_type TEXT NOT NULL, -- ingestion, linking, tagging, etc.
status TEXT DEFAULT 'pending', -- pending, running, done, failed
payload JSONB DEFAULT '{}',
result JSONB,
error TEXT,
created_at TIMESTAMPTZ DEFAULT now(),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
retry_count INTEGER DEFAULT 0
);
CREATE INDEX idx_agent_jobs_status ON agent_jobs(status);
CREATE INDEX idx_agent_jobs_type ON agent_jobs(agent_type);
agent_logs
CREATE TABLE agent_logs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
job_id UUID REFERENCES agent_jobs(id) ON DELETE SET NULL,
agent_type TEXT NOT NULL,
level TEXT DEFAULT 'info',
message TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_agent_logs_job_id ON agent_logs(job_id);
CREATE INDEX idx_agent_logs_created ON agent_logs(created_at DESC);
API Design
Base URL: http://localhost:8000/api/v1
| Method | Endpoint | Description |
|---|---|---|
| POST | /search | Hybrid vector + full-text search |
| POST | /chat | RAG chat with streaming response |
| GET | /document/{id} | Get document by ID |
| GET | /document/path | Get document by vault path |
| POST | /index | Manually trigger indexing of a file |
| POST | /reindex | Full vault reindex |
| GET | /related/{id} | Get related documents by embedding similarity |
| GET | /tags | List all tags with counts |
| GET | /graph | WikiLink graph (nodes + edges) |
| GET | /health | Health check |
| GET | /stats | System statistics |
Request/Response Shapes
POST /search
// Request
{
"query": "machine learning concepts",
"limit": 10,
"threshold": 0.7,
"tags": ["ml", "ai"],
"hybrid": true
}
// Response
{
"results": [
{
"document_id": "uuid",
"chunk_id": "uuid",
"title": "Introduction to ML",
"path": "notes/ml-intro.md",
"content": "chunk text...",
"score": 0.92,
"tags": ["ml", "ai"],
"highlight": "...matched text..."
}
],
"total": 42,
"query_time_ms": 23
}
POST /chat
// Request (SSE stream)
{
"message": "What do I know about transformers?",
"conversation_id": "optional-uuid",
"context_limit": 5
}
// Response (Server-Sent Events)
data: {"token": "Transformers", "type": "token"}
data: {"token": " are", "type": "token"}
data: {"sources": [...], "type": "sources"}
data: {"type": "done"}
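A client consuming this stream only has to split on `data:` lines; a minimal sketch, assuming every event payload is a single-line JSON object:

```python
import json

def parse_sse_events(stream_text: str) -> list[dict]:
    """Decode 'data: {...}' lines from an SSE response body into event dicts."""
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events
```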
Agent Architecture
All agents inherit from a common BaseAgent class:
BaseAgent
├── IngestionAgent — watches vault, triggers indexing
├── LinkingAgent — discovers and creates knowledge links
├── TaggingAgent — auto-tags documents using LLM
├── SummarizationAgent — generates/updates document summaries
└── MaintenanceAgent — detects orphans, broken links, stale content
Agent Lifecycle
- Agent starts, reads config from environment
- Polls the agent_jobs table (or subscribes to PostgreSQL NOTIFY)
- Claims a job atomically (UPDATE ... WHERE status='pending' RETURNING *)
- Executes the job with retry logic (exponential backoff, max 3 retries)
- Writes the result or error back to agent_jobs
- Logs to agent_logs
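The atomic claim can be a single UPDATE; the `FOR UPDATE SKIP LOCKED` subquery below goes beyond the document's `UPDATE ... WHERE status='pending' RETURNING *` and is an assumption, added so two concurrent workers can never claim the same row:

```python
# Hypothetical claim query for the PostgreSQL-backed queue variant ($1 = agent_type).
CLAIM_JOB_SQL = """
UPDATE agent_jobs
SET status = 'running', started_at = now()
WHERE id = (
    SELECT id FROM agent_jobs
    WHERE status = 'pending' AND agent_type = $1
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING *;
"""
```

`SKIP LOCKED` makes competing workers pass over rows another transaction already holds, so polling loops never block on each other.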
Scheduling
- IngestionAgent: event-driven (file watcher) + fallback poll every 30s
- LinkingAgent: runs after every ingestion batch
- TaggingAgent: runs on new/modified documents without tags
- SummarizationAgent: runs on documents >1000 words without summary
- MaintenanceAgent: scheduled daily at midnight
Ingestion Pipeline
services/ingestion-worker/
├── watcher.py — watchdog file system monitor
├── parser.py — frontmatter + markdown + wikilink parser
├── chunker.py — token-aware sliding window chunker
├── embedder.py — Ollama / sentence-transformers embeddings
├── indexer.py — PostgreSQL upsert logic
└── pipeline.py — orchestrates the full ingestion flow
Chunking Strategy
- Method: Sliding window, 500–800 tokens, 10% overlap
- Splitter: Prefer semantic boundaries (paragraphs, headings) over hard token cuts
- Metadata preserved: document_id, chunk_index, source heading path
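The window itself is simple once text is tokenized; this sketch takes a pre-split token list and omits the semantic-boundary preference a real splitter would add:

```python
def chunk_tokens(tokens: list[str], size: int = 600, overlap_frac: float = 0.1) -> list[list[str]]:
    """Slide a fixed-size window over a token list with fractional overlap."""
    step = max(1, int(size * (1 - overlap_frac)))  # e.g. 600 tokens, 10% overlap -> step 540
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already covers the tail
    return chunks
```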
Embedding Strategy
- Primary: Ollama nomic-embed-text (768-dim, fully offline)
- Fallback: sentence-transformers/all-MiniLM-L6-v2 (384-dim, local model)
- Batching: 32 chunks per embedding request for efficiency
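The batching is a one-liner; `batched` is a hypothetical helper name:

```python
def batched(items: list, batch_size: int = 32) -> list[list]:
    """Split a chunk list into fixed-size batches for the embedding backend."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

With 70 chunks this yields batches of 32, 32, and 6; one request per batch amortizes per-call overhead against the embedding service.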
Infrastructure
Docker Services
| Service | Image | Port | Description |
|---|---|---|---|
| postgres | pgvector/pgvector:pg16 | 5432 | PostgreSQL + pgvector |
| ollama | ollama/ollama:latest | 11434 | Local LLM inference |
| rag-api | local/rag-api | 8000 | FastAPI retrieval service |
| ingestion-worker | local/ingestion-worker | — | Vault watcher + indexer |
| agents | local/agents | — | Background AI agents |
| web-ui | local/web-ui | 3000 | Next.js frontend |
| redis | redis:7-alpine | 6379 | Job queue + caching |
Volume Mounts
- ./vault:/vault:rw — shared across all services needing vault access
- postgres_data:/var/lib/postgresql/data — persistent database
- ollama_data:/root/.ollama — pulled LLM models
Network
- Internal Docker network: second-brain-net
- External ports: 3000 (UI), 8000 (API), 11434 (Ollama)
Design Principles
- Vault is source of truth — database is always a derived index, fully rebuildable
- Offline-first — zero external API calls required; all AI runs locally via Ollama
- Idempotent ingestion — SHA-256 hashing ensures files are not re-indexed unless changed
- No vendor lock-in — all components are open source and self-hosted
- Modular — each service can be replaced independently (swap Ollama for another runtime)
- Graceful degradation — system works without agents running; agents enhance, not gate
- Markdown compatibility — vault works as a standalone Obsidian/Logseq vault at all times