# AI Agents Guide

The Second Brain system includes five autonomous AI agents that run as background workers, continuously improving the knowledge base.
---

## Architecture

All agents inherit from `BaseAgent` and share:

- **Atomic job claiming** from the `agent_jobs` table (no double-processing)
- **Exponential backoff retry** (max 3 retries, 2/4/8s delays)
- **Structured logging** to the `agent_logs` table
- **Configurable poll intervals** via environment variables

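As a rough sketch of two of these shared behaviors (names are illustrative; the real `BaseAgent` internals may differ), atomic claiming is typically built on PostgreSQL's `FOR UPDATE SKIP LOCKED`, and the retry schedule is a simple geometric series:

```python
# Illustrative sketch only; the actual BaseAgent implementation may differ.

# SKIP LOCKED ensures two workers never claim the same pending job
# (column names assumed from the monitoring queries in this guide).
CLAIM_SQL = """
UPDATE agent_jobs SET status = 'running'
WHERE id = (
    SELECT id FROM agent_jobs
    WHERE agent_type = $1 AND status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
"""

def backoff_delays(max_retries: int = 3, base: float = 2.0) -> list[float]:
    """Exponential backoff: 2s, 4s, 8s for the default three retries."""
    return [base ** attempt for attempt in range(1, max_retries + 1)]
```

The `SKIP LOCKED` clause is what makes claiming safe under concurrency: a second worker simply skips rows another transaction has already locked instead of blocking or double-processing.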
---

## Agents

### 1. Ingestion Agent (`ingestion`)

**Purpose:** Indexes new and modified Markdown files from the vault.

**Triggers:**

- Queued job via the API (`POST /api/v1/index`)
- Full vault reindex job (`POST /api/v1/index/reindex`)
- File watcher events (from the ingestion-worker)

**What it does:**

1. Reads the target file(s) from the vault
2. Parses frontmatter and extracts WikiLinks and tags
3. Chunks content into 500–800 token segments
4. Generates embeddings via Ollama
5. Upserts the document, chunks, and relations in PostgreSQL

**Idempotency:** SHA-256 content hashing ensures unchanged files are skipped.

---

### 2. Knowledge Linking Agent (`linking`)

**Purpose:** Discovers semantic connections between documents and creates `ai-inferred` relation edges.

**Triggers:** Runs periodically (default: every 60s).

**What it does:**

1. Finds documents without AI-inferred links
2. Computes the average chunk embedding for each
3. Finds the top 5 semantically similar documents (cosine similarity > 0.75)
4. Inserts `ai-inferred` relations

**Use case:** Surfaces non-obvious connections; for example, a note about "attention mechanisms" can be linked to a note about "reading strategies" if their embeddings are similar.

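The similarity step can be sketched in plain Python (the production agent presumably pushes this into the database via a vector extension; these helper names are hypothetical):

```python
import math

def mean_embedding(chunks: list[list[float]]) -> list[float]:
    """Average the chunk embeddings into one document-level vector."""
    n = len(chunks)
    return [sum(vals) / n for vals in zip(*chunks)]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_similar(doc_vec, candidates, k=5, threshold=0.75):
    """Top-k candidate documents above the similarity threshold."""
    scored = [(doc_id, cosine_similarity(doc_vec, vec))
              for doc_id, vec in candidates.items()]
    scored = [(d, s) for d, s in scored if s > threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

The 0.75 threshold filters weak matches before the top-5 cut, so a document with few genuine neighbors gets few links rather than five noisy ones.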
---

### 3. Tagging Agent (`tagging`)

**Purpose:** Automatically suggests and applies tags to untagged documents using the LLM.

**Triggers:** Runs periodically (default: every 120s).

**What it does:**

1. Finds documents with no tags
2. Sends the title and a content excerpt to Ollama with a tagging prompt
3. Parses the LLM's JSON response (3–7 suggested tags)
4. Writes the tags back to the `documents` table

**Prompt template:** Instructs the LLM to produce lowercase, hyphen-separated tags.

**Note:** Tags are written to the database only; to persist them back to the Markdown file, run the optional vault sync script.

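Parsing and normalizing the LLM response could look like this (a sketch; the real prompt and parsing code may differ):

```python
import json
import re

def parse_tags(llm_response: str, min_tags: int = 3, max_tags: int = 7) -> list[str]:
    """Parse a JSON array of tags and normalize to lowercase-hyphenated slugs."""
    try:
        raw = json.loads(llm_response)
    except json.JSONDecodeError:
        return []
    if not isinstance(raw, list):
        return []
    tags: list[str] = []
    for tag in raw:
        slug = re.sub(r"[^a-z0-9]+", "-", str(tag).lower()).strip("-")
        if slug and slug not in tags:
            tags.append(slug)
    # Enforce the 3-7 range: too few suggestions means the response is discarded.
    return tags[:max_tags] if len(tags) >= min_tags else []
```

Normalizing defensively matters here because LLM output is not guaranteed to follow the prompt's formatting rules.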
---

### 4. Summarization Agent (`summarization`)

**Purpose:** Generates concise summaries for long documents that lack one.

**Triggers:** Runs periodically (default: every 300s).

**Criteria:**

- Document word count > 500
- `frontmatter.summary` is missing or empty

**What it does:**

1. Sends title + content (up to 4000 chars) to Ollama
2. Receives a 2–4 sentence summary
3. Stores the summary in `documents.frontmatter.summary`

The summary becomes available via the API and is displayed in the document viewer.

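The eligibility check and prompt assembly can be sketched like this (helper names and prompt wording are hypothetical):

```python
def needs_summary(content: str, frontmatter: dict, min_words: int = 500) -> bool:
    """A document qualifies when it is long enough and has no summary yet."""
    long_enough = len(content.split()) > min_words
    missing = not (frontmatter.get("summary") or "").strip()
    return long_enough and missing

def build_prompt(title: str, content: str, max_chars: int = 4000) -> str:
    """Truncate the body to keep the request within the model's context budget."""
    return (
        f"Summarize the note '{title}' in 2-4 sentences:\n\n"
        f"{content[:max_chars]}"
    )
```

Truncating to a fixed character budget is a blunt but predictable way to keep long notes within the model's context window.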
---

### 5. Maintenance Agent (`maintenance`)

**Purpose:** Health checks and housekeeping for the knowledge graph.

**Triggers:** Runs hourly (default: every 3600s).

**What it does:**

1. Counts broken WikiLinks (links with no matching document)
2. Finds orphaned documents (no incoming or outgoing links)
3. Counts stale documents (not re-indexed in 7+ days)
4. Counts chunks with missing embeddings
5. Resolves previously broken WikiLinks that now have matching documents

**Output:** A structured report written to `agent_jobs.result` and logged to `agent_logs`.

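Step 2's orphan detection amounts to a set difference over the relation edges. An in-memory sketch (the agent itself presumably does this in SQL):

```python
def find_orphans(doc_ids: set[str], links: list[tuple[str, str]]) -> set[str]:
    """Documents that appear in no relation edge, incoming or outgoing."""
    linked = {src for src, _ in links} | {dst for _, dst in links}
    return doc_ids - linked
```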
---

## Monitoring Agents

### Check agent job queue

```sql
SELECT agent_type, status, COUNT(*)
FROM agent_jobs
GROUP BY agent_type, status
ORDER BY agent_type;
```

### View recent agent logs

```sql
SELECT agent_type, level, message, created_at
FROM agent_logs
ORDER BY created_at DESC
LIMIT 50;
```

### View last maintenance report

```sql
SELECT result
FROM agent_jobs
WHERE agent_type = 'maintenance' AND status = 'done'
ORDER BY completed_at DESC
LIMIT 1;
```

---

## Disabling Agents

Set poll intervals to very large values in `.env` to effectively disable specific agents:

```env
LINKING_POLL=999999
TAGGING_POLL=999999
```

---

## Adding a Custom Agent

1. Create `services/agents/my-agent/agent.py`:

```python
from base_agent import BaseAgent


class MyAgent(BaseAgent):
    agent_type = 'my-agent'

    async def process(self, job_id: str, payload: dict) -> dict:
        # Your logic here
        return {'done': True}
```

2. Register in `services/agents/main.py`:

```python
from my_agent.agent import MyAgent

asyncio.create_task(MyAgent(pool, settings).run_forever(60))
```

3. Enqueue jobs via the `agent_jobs` table or via the base class `enqueue()` method.
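For the manual route, a minimal sketch of building a job row by hand, assuming the column names shown in the monitoring queries above (`agent_type`, `status`) plus a JSON `payload` column:

```python
import json
import uuid

def build_job(agent_type: str, payload: dict) -> dict:
    """Row for a manual INSERT into agent_jobs (column names assumed)."""
    return {
        "id": str(uuid.uuid4()),
        "agent_type": agent_type,
        "status": "pending",
        "payload": json.dumps(payload),
    }

# With an asyncpg pool, something along these lines:
# await pool.execute(
#     "INSERT INTO agent_jobs (id, agent_type, status, payload) "
#     "VALUES ($1, $2, $3, $4)",
#     *build_job("my-agent", {"path": "notes/example.md"}).values(),
# )
```

Prefer the base class `enqueue()` where available, since it keeps the row shape in one place.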