# AI Agents Guide

The Second Brain system includes five autonomous AI agents that run as background workers, continuously improving the knowledge base.

---

## Architecture

All agents inherit from `BaseAgent` and share:

- **Atomic job claiming** from the `agent_jobs` table (no double-processing)
- **Exponential backoff retry** (max 3 retries, 2/4/8s delays)
- **Structured logging** to the `agent_logs` table
- **Configurable poll intervals** via environment variables
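
Here is a minimal sketch of the retry behavior. It assumes the delay before retry *n* is a 2-second base multiplied by `2 ** (n - 1)`, which reproduces the 2/4/8s schedule. `run_with_retry` and its parameters are hypothetical names, not the actual `BaseAgent` API. The injectable `sleep` exists only so the schedule can be checked without waiting.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_retry(
    task: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Run `task`, retrying failures after 2s, 4s, then 8s (hypothetical helper)."""
    attempt = 0
    while True:
        try:
            return await task()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: let the caller mark the job as failed
            attempt += 1
            # Exponential backoff: base_delay * 2**(attempt - 1) -> 2s, 4s, 8s
            await sleep(base_delay * 2 ** (attempt - 1))
```
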
---
## Agents

### 1. Ingestion Agent (`ingestion`)

**Purpose:** Indexes new and modified Markdown files from the vault.

**Triggers:**

- Queued job via the API (`POST /api/v1/index`)
- Full vault reindex job (`POST /api/v1/index/reindex`)
- File watcher events (from ingestion-worker)

**What it does:**

1. Reads the target file(s) from the vault
2. Parses frontmatter, extracts WikiLinks and tags
3. Chunks content into segments of 500 to 800 tokens
4. Generates embeddings via Ollama
5. Upserts document, chunks, and relations in PostgreSQL

**Idempotency:** SHA-256 content hashing ensures unchanged files are skipped.

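
Steps 2 and 3, plus the idempotency check, can be sketched with the standard library. This is an illustration under stated assumptions, not the agent's code:

- WikiLinks are assumed to look like `[[Target]]` or `[[Target|alias]]`, and tags like an inline `#tag`.
- Frontmatter parsing is omitted.
- Whitespace-separated words stand in for model tokens; a real implementation would more likely count tokens with the embedding model's tokenizer.

All function names are hypothetical.

```python
import hashlib
import re
from typing import List, Optional, Tuple

# Assumed WikiLink syntax: [[Target]] or [[Target|alias]]; capture the target only
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")
# Assumed tag syntax: "#tag" at the start of the text or after whitespace
TAG_RE = re.compile(r"(?:^|\s)#([A-Za-z0-9][\w/-]*)")


def extract_links_and_tags(body: str) -> Tuple[List[str], List[str]]:
    """Return unique WikiLink targets and tags, in order of first appearance."""
    links = [target.strip() for target in WIKILINK_RE.findall(body)]
    tags = TAG_RE.findall(body)
    # dict.fromkeys drops duplicates while keeping the original order
    return list(dict.fromkeys(links)), list(dict.fromkeys(tags))


def chunk_words(text: str, max_tokens: int = 800) -> List[str]:
    """Greedily split text into chunks of at most max_tokens words.

    Only the final chunk can end up shorter than the 500-token lower bound.
    """
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]


def content_hash(raw: str) -> str:
    """SHA-256 hex digest of the file contents."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def needs_reindex(raw: str, stored_hash: Optional[str]) -> bool:
    """Return True unless the hash matches the one stored at the last indexing run."""
    return content_hash(raw) != stored_hash
```
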
---
### 2. Knowledge Linking Agent (`linking`)

**Purpose:** Discovers semantic connections between documents and creates `ai-inferred` relation edges.

**Triggers:** Runs periodically (default: every 60s).

**What it does:**

1. Finds documents without AI-inferred links
2. For each one, computes the average of its chunk embeddings
3. Finds the top 5 most semantically similar documents (cosine similarity > 0.75)
4. Inserts `ai-inferred` relations

**Use case:** Surfaces non-obvious connections. For example, a note about "attention mechanisms" can be linked to a note about "reading strategies" if their embeddings are similar.

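
Steps 2 and 3 can be illustrated with plain Python lists. This sketch makes two assumptions: embeddings are equal-length float vectors, and candidates are compared in memory. The actual agent may run the similarity search inside PostgreSQL instead, which this guide does not specify. `mean_embedding` and `top_similar` are hypothetical names.

Averaging gives one vector per document, so each comparison is a single cosine computation. The trade-off is that distinct topics within a long document get blended together.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_embedding(chunk_embeddings: List[Vector]) -> List[float]:
    """Component-wise average of a document's chunk embeddings (needs >= 1 chunk)."""
    n = len(chunk_embeddings)
    return [sum(column) / n for column in zip(*chunk_embeddings)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_similar(
    doc_id: str,
    doc_vectors: Dict[str, Vector],
    k: int = 5,
    threshold: float = 0.75,
) -> List[Tuple[str, float]]:
    """Return up to k (other_id, score) pairs with similarity > threshold, best first."""
    target = doc_vectors[doc_id]
    scored = [
        (other_id, cosine(target, vec))
        for other_id, vec in doc_vectors.items()
        if other_id != doc_id
    ]
    hits = [pair for pair in scored if pair[1] > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:k]
```
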
---
### 3. Tagging Agent (`tagging`)

**Purpose:** Automatically suggests and applies tags to untagged documents using the LLM.

**Triggers:** Runs periodically (default: every 120s).

**What it does:**

1. Finds documents with no tags
2. Sends the title and a content excerpt to Ollama with a tagging prompt
3. Parses the LLM's JSON response (3 to 7 suggested tags)
4. Writes the tags back to the `documents` table

**Prompt template:** Instructs the LLM to produce lowercase, hyphen-separated tags.

**Note:** Tags are written to the database only. To persist them back to the Markdown files, run the optional vault sync script.

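
Step 3 has to cope with imperfect model output. The sketch below assumes the model replies with a JSON array of strings, possibly wrapped in prose or a code fence; the exact response format is not documented here. The parser does four things:

- Normalizes each tag to the lowercase, hyphen-separated form the prompt asks for.
- Restricts tags to ASCII letters, digits, and hyphens.
- Removes duplicates.
- Keeps at most seven tags.

`parse_tag_response` and `normalize_tag` are hypothetical names.

```python
import json
import re
from typing import List

MAX_TAGS = 7  # upper end of the 3 to 7 range requested from the model


def normalize_tag(raw: str) -> str:
    """Lowercase and hyphen-separate a tag, e.g. 'Machine Learning' -> 'machine-learning'."""
    tag = raw.strip().lstrip("#").lower()
    tag = re.sub(r"[\s_]+", "-", tag)      # spaces/underscores -> hyphens
    tag = re.sub(r"[^a-z0-9-]", "", tag)   # drop any other characters
    return re.sub(r"-{2,}", "-", tag).strip("-")


def parse_tag_response(text: str) -> List[str]:
    """Extract tags from an LLM reply; return [] if no JSON array can be parsed."""
    # Replies may wrap the JSON in prose or fences, so parse the outermost [...] span
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return []
    tags: List[str] = []
    for item in data:
        if isinstance(item, str):
            tag = normalize_tag(item)
            if tag and tag not in tags:
                tags.append(tag)
    return tags[:MAX_TAGS]
```
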
---
### 4. Summarization Agent (`summarization`)

**Purpose:** Generates concise summaries for long documents that lack one.

**Triggers:** Runs periodically (default: every 300s).

**Criteria:**

- Document word count > 500
- `frontmatter.summary` is missing or empty

**What it does:**

1. Sends the title and content (up to 4000 characters) to Ollama
2. Receives a summary of 2 to 4 sentences
3. Stores the summary in `documents.frontmatter.summary`

The summary then becomes available via the API and is displayed in the document viewer.

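
The two criteria and the 4000-character input cap reduce to small helpers. This sketch makes two assumptions: each document is a dict with `title`, `content`, and a `frontmatter` dict (or `None`), and words are counted by splitting on whitespace. The real agent may apply the same test in SQL. `needs_summary`, `summary_input`, and the title/content layout of the model input are hypothetical.

```python
from typing import Any, Dict

MIN_WORDS = 500          # documents must have more than this many words
MAX_PROMPT_CHARS = 4000  # content sent to the model is capped at this many characters


def needs_summary(doc: Dict[str, Any]) -> bool:
    """True if the document has > 500 words and no non-empty frontmatter.summary."""
    word_count = len(doc.get("content", "").split())
    summary = (doc.get("frontmatter") or {}).get("summary")
    has_summary = isinstance(summary, str) and summary.strip() != ""
    return word_count > MIN_WORDS and not has_summary


def summary_input(doc: Dict[str, Any]) -> str:
    """Title plus the first 4000 characters of content (layout is hypothetical)."""
    return f"{doc.get('title', '')}\n\n{doc.get('content', '')[:MAX_PROMPT_CHARS]}"
```
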
---
### 5. Maintenance Agent (`maintenance`)

**Purpose:** Health checks and housekeeping for the knowledge graph.

**Triggers:** Runs periodically (default: every 3600s, i.e. hourly).

**What it does:**

1. Counts broken WikiLinks (links with no matching document)
2. Finds orphaned documents (no incoming or outgoing links)
3. Counts stale documents (not re-indexed in 7+ days)
4. Counts chunks with missing embeddings
5. Resolves previously broken WikiLinks that now have matching documents

**Output:** A structured report, written to `agent_jobs.result` and logged to `agent_logs`.

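
Checks 1 to 4 can be expressed as set operations over a snapshot of the graph. The following are assumptions for illustration only:

- The input shapes: a map from title to last-indexed time, `(source, target)` link pairs, and one embedding flag per chunk.
- The report keys.

The real agent presumably runs equivalent SQL queries, and its report format is not documented here. This sketch treats a document whose only links are broken outgoing ones as linked, which is one possible reading of "orphaned".

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def maintenance_report(
    indexed_at: Dict[str, datetime],      # document title -> last indexed time
    links: Iterable[Tuple[str, str]],     # (source title, target title) WikiLinks
    chunk_has_embedding: Iterable[bool],  # one flag per chunk
    now: datetime,
    stale_after: timedelta = timedelta(days=7),
) -> Dict[str, object]:
    """Compute checks 1 to 4 from an in-memory snapshot (hypothetical data shapes)."""
    link_list = list(links)
    titles = set(indexed_at)
    # 1. A link is broken when its target has no matching document
    broken = [(src, dst) for src, dst in link_list if dst not in titles]
    # 2. Linked = source of any link, or target of a link that resolves
    linked = {src for src, _ in link_list} | {dst for _, dst in link_list if dst in titles}
    return {
        "broken_links": len(broken),
        "orphans": sorted(titles - linked),
        # 3. Stale = not re-indexed within the last 7 days
        "stale_documents": sum(1 for t in indexed_at.values() if now - t >= stale_after),
        # 4. Chunks whose embedding is missing
        "chunks_missing_embeddings": sum(1 for ok in chunk_has_embedding if not ok),
    }
```
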
---
## Monitoring Agents
### Check agent job queue
```sql
SELECT agent_type, status, COUNT(*)
FROM agent_jobs
GROUP BY agent_type, status
ORDER BY agent_type;
```
### View recent agent logs
```sql
SELECT agent_type, level, message, created_at
FROM agent_logs
ORDER BY created_at DESC
LIMIT 50;
```
### View last maintenance report
```sql
SELECT result
FROM agent_jobs
WHERE agent_type = 'maintenance' AND status = 'done'
ORDER BY completed_at DESC
LIMIT 1;
```
---
## Disabling Agents
Set poll intervals to very large values in `.env` to effectively disable specific agents:
```env
LINKING_POLL=999999
TAGGING_POLL=999999
```
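
A worker might read these settings as sketched below. Only `LINKING_POLL` and `TAGGING_POLL` appear in this guide. `SUMMARIZATION_POLL`, `MAINTENANCE_POLL`, and `load_poll_intervals` are hypothetical names that follow the same pattern. The fallback values are the per-agent defaults listed earlier, in seconds.

Assuming the variables are also in seconds, `999999` puts roughly 11.6 days between polls, which is why it effectively disables an agent. If the loop polls once before its first sleep, the agent would still run once at startup.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults in seconds, taken from the per-agent descriptions in this guide.
DEFAULT_POLL_SECONDS: Dict[str, int] = {
    "LINKING_POLL": 60,
    "TAGGING_POLL": 120,
    "SUMMARIZATION_POLL": 300,   # hypothetical variable name
    "MAINTENANCE_POLL": 3600,    # hypothetical variable name
}


def load_poll_intervals(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Read poll intervals from the environment, falling back to the defaults."""
    source = os.environ if env is None else env
    return {name: int(source.get(name, default)) for name, default in DEFAULT_POLL_SECONDS.items()}
```
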
---
## Adding a Custom Agent
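1. Create `services/agents/my_agent/agent.py`. Use an underscore rather than a hyphen so the directory can be imported as a Python package:
```python
from base_agent import BaseAgent


class MyAgent(BaseAgent):
    # Matches agent_jobs.agent_type for the jobs this agent processes
    agent_type = 'my-agent'

    async def process(self, job_id: str, payload: dict) -> dict:
        # Your logic here
        return {'done': True}
```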
2. Register it in `services/agents/main.py`:
```python
import asyncio

from my_agent.agent import MyAgent

# Inside the async entry point, where `pool` and `settings` are already defined
asyncio.create_task(MyAgent(pool, settings).run_forever(60))
```
3. Enqueue jobs via the `agent_jobs` table or via the base class `enqueue()` method.
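
Because `process()` is an ordinary coroutine, a subclass's logic can be tried out without PostgreSQL. The sketch below makes two substitutions:

- A hypothetical `StubBaseAgent` stands in for the real `BaseAgent`. It takes the same constructor arguments as in `main.py`, but has no job claiming, retries, or logging.
- A made-up `text` payload field serves as input.

It therefore shows only how `process()` is invoked, not how the base class works.

```python
import asyncio
from typing import Any, Dict


class StubBaseAgent:
    """Hypothetical stand-in for BaseAgent: no database, retries, or logging."""

    agent_type = "base"

    def __init__(self, pool: Any = None, settings: Any = None) -> None:
        self.pool = pool
        self.settings = settings

    async def process(self, job_id: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class MyAgent(StubBaseAgent):
    agent_type = "my-agent"

    async def process(self, job_id: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Example logic (hypothetical): count the words in a payload field
        text = payload.get("text", "")
        return {"done": True, "job_id": job_id, "word_count": len(text.split())}


result = asyncio.run(MyAgent().process("job-1", {"text": "hello second brain"}))
print(result)
```

In the service itself, `MyAgent` would subclass the real `BaseAgent`. Its `run_forever()` loop presumably calls `process()` for each claimed job.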
