# AI Second Brain — System Architecture
> Version: 1.0.0
> Date: 2026-03-05
> Status: Design Document
---
## Table of Contents
1. [Overview](#overview)
2. [Core Components](#core-components)
3. [Data Flow](#data-flow)
4. [Database Schema](#database-schema)
5. [API Design](#api-design)
6. [Agent Architecture](#agent-architecture)
7. [Ingestion Pipeline](#ingestion-pipeline)
8. [Infrastructure](#infrastructure)
9. [Design Principles](#design-principles)
---
## Overview
The AI Second Brain is a fully self-hosted, offline-capable knowledge management system that treats a Markdown vault (Obsidian/Logseq compatible) as the single source of truth. All AI capabilities—embeddings, retrieval, generation, and autonomous agents—run locally.
```
┌─────────────────────────────────────────────────────────────────────┐
│ AI SECOND BRAIN │
│ │
│ ┌──────────┐ ┌────────────┐ ┌──────────┐ ┌────────────┐ │
│ │ EDITOR │───▶│ INGESTION │───▶│ STORAGE │───▶│ API │ │
│ │ LAYER │ │ PIPELINE │ │ LAYER │ │ LAYER │ │
│ └──────────┘ └────────────┘ └──────────┘ └────────────┘ │
│ │ │ │
│ Markdown Vault ┌────▼───────┐ │
│ (Obsidian/Logseq) │ AI LAYER │ │
│ │ (Ollama) │ │
│ └────────────┘ │
│ │ │
│ ┌────▼───────┐ │
│ │ INTERFACE │ │
│ │ LAYER │ │
│ └────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```
---
## Core Components
### 1. Editor Layer
- **Vault directory**: `./vault/` — plain Markdown files, fully compatible with Obsidian and Logseq
- **Format**: CommonMark + YAML frontmatter + `[[WikiLinks]]`
- **Source of truth**: All knowledge lives here; the database is a derived index
- **Sync**: File-system watching via `watchdog` triggers the ingestion pipeline
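
The frontmatter and `[[WikiLinks]]` conventions above can be sketched with a minimal stdlib parser (illustrative only; a real implementation would use a YAML library and handle more edge cases):

```python
import re

# YAML frontmatter is delimited by --- lines at the top of the file.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
# Capture the link target, ignoring optional #heading anchors and |aliases.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def split_frontmatter(source: str) -> tuple[str, str]:
    """Return (raw_frontmatter, body); frontmatter is left as raw YAML text."""
    m = FRONTMATTER_RE.match(source)
    if not m:
        return "", source
    return m.group(1), source[m.end():]

def extract_wikilinks(body: str) -> list[str]:
    """Collect [[target]] link targets from the Markdown body."""
    return [m.group(1).strip() for m in WIKILINK_RE.finditer(body)]

note = "---\ntitle: ML Intro\ntags: [ml]\n---\nSee [[Transformers]] and [[Attention|the attention note]]."
frontmatter, body = split_frontmatter(note)
links = extract_wikilinks(body)  # ["Transformers", "Attention"]
```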
### 2. Storage Layer
- **PostgreSQL 16** with **pgvector** extension
- Stores: document metadata, text chunks, embeddings (768-dim with the default `nomic-embed-text`; the vector column width must match the configured model), extracted entities, wikilink relations
- Vector index: IVFFlat or HNSW for ANN search
### 3. Processing Layer (Ingestion Pipeline)
- File watcher monitors `./vault/**/*.md`
- Parser: frontmatter extraction (YAML), Markdown-to-text, WikiLink graph extraction
- Chunker: 500-800 token sliding window with 10% overlap
- Embeddings: Ollama (`nomic-embed-text`) or `sentence-transformers` (offline fallback)
- Idempotent: SHA-256 content hashing prevents redundant re-indexing
### 4. API Layer
- **FastAPI** service exposing REST endpoints
- Retrieval: hybrid search (vector similarity + full-text BM25-style)
- Reranking: optional cross-encoder via `sentence-transformers`
- Async throughout; connection pooling with `asyncpg`
### 5. AI Layer
- **Ollama** sidecar providing local LLM inference (Mistral, Llama 3, Phi-3, etc.)
- Embedding model: `nomic-embed-text` (768-dim)
- Chat/generation model: configurable (default: `mistral`)
- Agents use LangChain/LlamaIndex or direct Ollama API calls
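
For illustration, the embedding call can be made with the standard library alone. The request body (`model` plus `prompt`) follows Ollama's `/api/embeddings` HTTP API; `OLLAMA_URL` assumes the default port from the infrastructure section:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama sidecar address

def embed_payload(text: str, model: str = "nomic-embed-text") -> dict:
    """Request body for Ollama's /api/embeddings endpoint."""
    return {"model": model, "prompt": text}

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """POST to the local Ollama sidecar and return the embedding vector."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps(embed_payload(text, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```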
### 6. Agent Layer
- Long-running Python workers
- Agents: Ingestion, Knowledge Linking, Tagging, Summarization, Maintenance
- Message queue: Redis-backed job queue (ARQ) or simple PostgreSQL-backed queue
- Scheduled via cron-style configuration
### 7. Interface Layer
- **Next.js** (React) web application
- Pages: Search, Chat, Document Viewer, Graph View (knowledge graph), Tag Browser
- API client calls the FastAPI backend
- Served as a Docker container (Node.js)
---
## Data Flow
### Ingestion Flow
```
Markdown File (vault/)
        ↓
File Watcher (watchdog)
        ↓
Parse & Validate
  ├── Extract YAML frontmatter (title, tags, date, aliases)
  ├── Extract WikiLinks [[target]]
  └── Convert Markdown → plain text
        ↓
Content Hash (SHA-256)
  └── Skip if unchanged (idempotent)
        ↓
Chunker (500-800 tokens, 10% overlap)
        ↓
Embedding Generation (Ollama nomic-embed-text)
        ↓
Store in PostgreSQL
  ├── documents table (metadata + full text)
  ├── chunks table (chunk text + embedding vector)
  ├── entities table (extracted NER if enabled)
  └── relations table (WikiLink graph edges)
```
### Retrieval (RAG) Flow
```
User Query
     ↓
Query Embedding (Ollama)
     ↓
Hybrid Search
  ├── Vector similarity (pgvector cosine distance)
  └── Full-text search (PostgreSQL tsvector)
     ↓
Reranker (optional cross-encoder)
     ↓
Context Assembly (top-k chunks + metadata)
     ↓
LLM Generation (Ollama)
     ↓
Response + Citations
```
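
The flow above does not pin down how the two ranked lists from the hybrid step are merged; one common choice (an assumption here, not a commitment of this design) is reciprocal rank fusion:

```python
def rrf_merge(vector_hits: list[str], text_hits: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk IDs by reciprocal rank fusion.

    Each hit contributes 1 / (k + rank); IDs present in both lists
    accumulate both scores, so agreement between retrievers is rewarded.
    """
    scores: dict[str, float] = {}
    for hits in (vector_hits, text_hits):
        for rank, chunk_id in enumerate(hits, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A chunk ranked in both lists (like `"b"` in `rrf_merge(["a", "b"], ["b", "c"])`) outranks chunks found by only one retriever.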
---
## Database Schema
### Tables
#### `documents`
```sql
CREATE TABLE documents (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    path          TEXT NOT NULL UNIQUE,       -- relative path in vault
    title         TEXT,
    content       TEXT NOT NULL,              -- full markdown source
    content_hash  TEXT NOT NULL,              -- SHA-256 for change detection
    frontmatter   JSONB DEFAULT '{}',         -- parsed YAML frontmatter
    tags          TEXT[] DEFAULT '{}',
    aliases       TEXT[] DEFAULT '{}',
    word_count    INTEGER,
    created_at    TIMESTAMPTZ DEFAULT now(),
    updated_at    TIMESTAMPTZ DEFAULT now(),
    indexed_at    TIMESTAMPTZ,
    -- full-text search index, kept in sync automatically
    fts_vector    TSVECTOR GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(title, '') || ' ' || content)
    ) STORED
);
CREATE INDEX idx_documents_path ON documents(path);
CREATE INDEX idx_documents_tags ON documents USING GIN(tags);
CREATE INDEX idx_documents_fts ON documents USING GIN(fts_vector);
```
#### `chunks`
```sql
CREATE TABLE chunks (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id  UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    chunk_index  INTEGER NOT NULL,
    content      TEXT NOT NULL,
    token_count  INTEGER,
    embedding    VECTOR(768),        -- nomic-embed-text dimension
    metadata     JSONB DEFAULT '{}',
    created_at   TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_chunks_document_id ON chunks(document_id);
CREATE INDEX idx_chunks_embedding ON chunks USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
```
#### `entities`
```sql
CREATE TABLE entities (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    document_id  UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    name         TEXT NOT NULL,
    entity_type  TEXT NOT NULL,      -- PERSON, ORG, CONCEPT, etc.
    context      TEXT,
    created_at   TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_entities_document_id ON entities(document_id);
CREATE INDEX idx_entities_name ON entities(name);
CREATE INDEX idx_entities_type ON entities(entity_type);
```
#### `relations`
```sql
CREATE TABLE relations (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_doc_id  UUID NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    target_path    TEXT NOT NULL,              -- may not exist yet (forward links)
    target_doc_id  UUID REFERENCES documents(id) ON DELETE SET NULL,
    relation_type  TEXT DEFAULT 'wikilink',    -- wikilink, tag, explicit
    context        TEXT,                       -- surrounding text
    created_at     TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_relations_source ON relations(source_doc_id);
CREATE INDEX idx_relations_target ON relations(target_doc_id);
CREATE INDEX idx_relations_target_path ON relations(target_path);
```
#### `agent_jobs`
```sql
CREATE TABLE agent_jobs (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    agent_type    TEXT NOT NULL,          -- ingestion, linking, tagging, etc.
    status        TEXT DEFAULT 'pending', -- pending, running, done, failed
    payload       JSONB DEFAULT '{}',
    result        JSONB,
    error         TEXT,
    created_at    TIMESTAMPTZ DEFAULT now(),
    started_at    TIMESTAMPTZ,
    completed_at  TIMESTAMPTZ,
    retry_count   INTEGER DEFAULT 0
);
CREATE INDEX idx_agent_jobs_status ON agent_jobs(status);
CREATE INDEX idx_agent_jobs_type ON agent_jobs(agent_type);
```
#### `agent_logs`
```sql
CREATE TABLE agent_logs (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    job_id      UUID REFERENCES agent_jobs(id) ON DELETE SET NULL,
    agent_type  TEXT NOT NULL,
    level       TEXT DEFAULT 'info',
    message     TEXT NOT NULL,
    metadata    JSONB DEFAULT '{}',
    created_at  TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_agent_logs_job_id ON agent_logs(job_id);
CREATE INDEX idx_agent_logs_created ON agent_logs(created_at DESC);
```
---
## API Design
### Base URL: `http://localhost:8000/api/v1`
| Method | Endpoint | Description |
|--------|-----------------------|------------------------------------------|
| POST | `/search` | Hybrid vector + full-text search |
| POST | `/chat` | RAG chat with streaming response |
| GET | `/document/{id}` | Get document by ID |
| GET | `/document/path` | Get document by vault path |
| POST | `/index` | Manually trigger index of a file |
| POST | `/reindex` | Full vault reindex |
| GET | `/related/{id}` | Get related documents by embedding sim |
| GET | `/tags` | List all tags with counts |
| GET | `/graph` | WikiLink graph (nodes + edges) |
| GET | `/health` | Health check |
| GET | `/stats` | System statistics |
### Request/Response Shapes
#### POST `/search`
```json
// Request
{
  "query": "machine learning concepts",
  "limit": 10,
  "threshold": 0.7,
  "tags": ["ml", "ai"],
  "hybrid": true
}

// Response
{
  "results": [
    {
      "document_id": "uuid",
      "chunk_id": "uuid",
      "title": "Introduction to ML",
      "path": "notes/ml-intro.md",
      "content": "chunk text...",
      "score": 0.92,
      "tags": ["ml", "ai"],
      "highlight": "...matched text..."
    }
  ],
  "total": 42,
  "query_time_ms": 23
}
```
#### POST `/chat`
```json
// Request
{
  "message": "What do I know about transformers?",
  "conversation_id": "optional-uuid",
  "context_limit": 5
}

// Response (Server-Sent Events stream)
data: {"token": "Transformers", "type": "token"}
data: {"token": " are", "type": "token"}
data: {"sources": [...], "type": "sources"}
data: {"type": "done"}
```
---
## Agent Architecture
All agents inherit from a common `BaseAgent` class:
```
BaseAgent
├── IngestionAgent — watches vault, triggers indexing
├── LinkingAgent — discovers and creates knowledge links
├── TaggingAgent — auto-tags documents using LLM
├── SummarizationAgent — generates/updates document summaries
└── MaintenanceAgent — detects orphans, broken links, stale content
```
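
A hypothetical skeleton of this hierarchy (only `BaseAgent` and the agent names come from the design above; method names and the stubbed `TaggingAgent` body are illustrative):

```python
from abc import ABC, abstractmethod

class BaseAgent(ABC):
    """Common shell: run agent-specific work and record the outcome."""
    agent_type: str = "base"

    @abstractmethod
    def execute(self, payload: dict) -> dict:
        """Do the agent-specific work; return a JSON-serializable result."""

    def run_job(self, payload: dict) -> dict:
        try:
            result = self.execute(payload)
            return {"status": "done", "result": result}
        except Exception as exc:  # real agents would retry with backoff
            return {"status": "failed", "error": str(exc)}

class TaggingAgent(BaseAgent):
    agent_type = "tagging"

    def execute(self, payload: dict) -> dict:
        # Stub: a real implementation would ask the local LLM for tags.
        return {"tags": ["untagged"], "document_id": payload.get("document_id")}
```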
### Agent Lifecycle
1. Agent starts, reads config from environment
2. Polls `agent_jobs` table (or subscribes to PostgreSQL NOTIFY)
3. Claims job atomically (`UPDATE ... WHERE status='pending' RETURNING *`)
4. Executes job with retry logic (exponential backoff, max 3 retries)
5. Writes result / error back to `agent_jobs`
6. Logs to `agent_logs`
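
Step 3's atomic claim can be demonstrated with SQLite standing in for PostgreSQL (the real worker would use `asyncpg` with `UPDATE ... WHERE status='pending' RETURNING *`; checking the affected-row count gives the same either-or guarantee):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent_jobs (id TEXT PRIMARY KEY, status TEXT DEFAULT 'pending')")
conn.execute("INSERT INTO agent_jobs (id) VALUES ('job-1')")

def claim_job(conn: sqlite3.Connection, job_id: str) -> bool:
    """Atomically move a job from pending to running; False if already taken."""
    cur = conn.execute(
        "UPDATE agent_jobs SET status = 'running' "
        "WHERE id = ? AND status = 'pending'",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1

first = claim_job(conn, "job-1")   # this worker wins the job
second = claim_job(conn, "job-1")  # a second claim finds nothing pending
```

Because the status check and the update happen in one statement, two workers can never both claim the same job.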
### Scheduling
- **IngestionAgent**: event-driven (file watcher) + fallback poll every 30s
- **LinkingAgent**: runs after every ingestion batch
- **TaggingAgent**: runs on new/modified documents without tags
- **SummarizationAgent**: runs on documents >1000 words without summary
- **MaintenanceAgent**: scheduled daily at midnight
---
## Ingestion Pipeline
```
services/ingestion-worker/
├── watcher.py — watchdog file system monitor
├── parser.py — frontmatter + markdown + wikilink parser
├── chunker.py — token-aware sliding window chunker
├── embedder.py — Ollama / sentence-transformers embeddings
├── indexer.py — PostgreSQL upsert logic
└── pipeline.py — orchestrates the full ingestion flow
```
### Chunking Strategy
- **Method**: Sliding window, 500-800 tokens, 10% overlap
- **Splitter**: Prefer semantic boundaries (paragraphs, headings) over hard token cuts
- **Metadata preserved**: document_id, chunk_index, source heading path
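
The window arithmetic can be sketched as follows (whitespace words stand in for real tokens; the actual chunker would count model tokens and prefer paragraph/heading boundaries):

```python
def chunk_words(words: list[str], size: int = 600, overlap_pct: float = 0.10) -> list[list[str]]:
    """Sliding window over a token list with fractional overlap.

    The window advances by size * (1 - overlap_pct) positions, so
    consecutive chunks share overlap_pct of their tokens.
    """
    step = max(1, int(size * (1 - overlap_pct)))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break
    return chunks

chunks = chunk_words([f"w{i}" for i in range(1500)], size=600)  # 3 overlapping chunks
```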
### Embedding Strategy
- **Primary**: Ollama `nomic-embed-text` (768-dim, fully offline)
- **Fallback**: `sentence-transformers/all-MiniLM-L6-v2` (384-dim, local model)
- **Batching**: 32 chunks per embedding request for efficiency
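
The 32-chunk batching reduces to slicing the chunk list before handing each slice to the embedding backend (backend call omitted):

```python
from typing import Iterator

def batched(items: list[str], size: int = 32) -> Iterator[list[str]]:
    """Yield successive fixed-size batches; the last batch may be smaller."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

batches = list(batched([f"chunk-{i}" for i in range(70)]))  # 32 + 32 + 6
```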
---
## Infrastructure
### Docker Services
| Service | Image | Port | Description |
|--------------------|------------------------------|-------|----------------------------------|
| `postgres` | pgvector/pgvector:pg16 | 5432 | PostgreSQL + pgvector |
| `ollama` | ollama/ollama:latest | 11434 | Local LLM inference |
| `rag-api` | local/rag-api | 8000 | FastAPI retrieval service |
| `ingestion-worker` | local/ingestion-worker | — | Vault watcher + indexer |
| `agents` | local/agents | — | Background AI agents |
| `web-ui` | local/web-ui | 3000 | Next.js frontend |
| `redis` | redis:7-alpine | 6379 | Job queue + caching |
### Volume Mounts
- `./vault:/vault:rw` — shared across all services needing vault access
- `postgres_data:/var/lib/postgresql/data` — persistent database
- `ollama_data:/root/.ollama` — pulled LLM models
### Network
- Internal Docker network `second-brain-net`
- External ports: `3000` (UI), `8000` (API), `11434` (Ollama)
---
## Design Principles
1. **Vault is source of truth** — database is always a derived index, fully rebuildable
2. **Offline-first** — zero external API calls required; all AI runs locally via Ollama
3. **Idempotent ingestion** — SHA-256 hashing ensures files are not re-indexed unless changed
4. **No vendor lock-in** — all components are open source and self-hosted
5. **Modular** — each service can be replaced independently (swap Ollama for another runtime)
6. **Graceful degradation** — system works without agents running; agents enhance, not gate
7. **Markdown compatibility** — vault works as a standalone Obsidian/Logseq vault at all times