# Architecture Overview
OWL is a local AI daemon with persistent state, built on a client-server architecture: a long-running daemon (`owld`) owns all state, and a thin CLI client (`owl`) connects to it over a Unix socket.
## High-Level Architecture

```
┌───────────────────────────────────────────────────────────┐
│                     OWL Daemon (owld)                     │
│                                                           │
│  ┌────────────┐   ┌────────────┐   ┌──────────────────┐   │
│  │    LLM     │   │   Memory   │   │    Knowledge     │   │
│  │  (Ollama)  │   │  (SQLite)  │   │    (ChromaDB)    │   │
│  └─────┬──────┘   └─────┬──────┘   └────────┬─────────┘   │
│        │                │                   │             │
│        └────────────────┼───────────────────┘             │
│                         │                                 │
│                  ┌──────┴──────┐                          │
│                  │   Server    │                          │
│                  │  (Handler)  │                          │
│                  └──────┬──────┘                          │
│                         │                                 │
│  ┌────────────┐   ┌─────┴──────┐   ┌──────────────────┐   │
│  │    Soul    │   │   Tools    │   │     Projects     │   │
│  │   (YAML)   │   │ (Registry) │   │    (Detector)    │   │
│  └────────────┘   └────────────┘   └──────────────────┘   │
│                                                           │
└────────────────────────────┬──────────────────────────────┘
                             │
                        Unix Socket
                     (~/.owl/owl.sock)
                             │
┌────────────────────────────┴──────────────────────────────┐
│                       OWL CLI (owl)                       │
│                                                           │
│  ┌────────────┐   ┌────────────┐   ┌──────────────────┐   │
│  │    REPL    │   │  Display   │   │      Client      │   │
│  │  (Input)   │   │   (Rich)   │   │     (Socket)     │   │
│  └────────────┘   └────────────┘   └──────────────────┘   │
│                                                           │
└───────────────────────────────────────────────────────────┘
```
## Core Principles

### 1. Local-First
Everything runs on your machine:
- Ollama for LLM inference
- SQLite for persistent storage
- ChromaDB for vector search
- No external API calls
### 2. Persistent State
State survives restarts:
- Conversations stored in SQLite
- Memory file is human-readable
- Soul evolves over time
- Knowledge base persists
### 3. Project Awareness
OWL understands project context:
- Auto-detects project type
- Scopes history by project
- Prevents context contamination
### 4. Streaming Communication
Real-time responses:
- Token-by-token streaming
- Live tool execution feedback
- Interruptible operations
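A streaming protocol of this shape could be framed as newline-delimited JSON chunks tagged with a type. This is an illustrative sketch only, not OWL's actual wire format; all chunk names are hypothetical:

```python
import json

# Hypothetical streaming chunk format: newline-delimited JSON objects
# with a "type" field ("token", "tool_call", "done" are illustrative
# names, not OWL's real protocol).
def encode_chunk(chunk_type: str, **payload) -> bytes:
    """Serialize one streaming chunk as a JSON line."""
    return (json.dumps({"type": chunk_type, **payload}) + "\n").encode()

def decode_chunks(data: bytes):
    """Yield parsed chunks from a buffer of JSON lines."""
    for line in data.splitlines():
        if line.strip():
            yield json.loads(line)

stream = (
    encode_chunk("token", text="Hel")
    + encode_chunk("token", text="lo")
    + encode_chunk("tool_call", name="read_file")
    + encode_chunk("done")
)
chunks = list(decode_chunks(stream))
```

Because each chunk is self-delimiting, the CLI can render tokens the moment they arrive and stop reading mid-stream to interrupt an operation.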
## Component Responsibilities
| Component | Responsibility |
|---|---|
| Server | Handle requests, orchestrate components |
| LLM Client | Communicate with Ollama |
| Memory Store | Persist conversations, learnings |
| Knowledge Store | RAG vector database |
| Soul | Personality, evolution |
| Tool Registry | Manage tools, profiles |
| Project Detector | Identify project types |
| CLI Client | User interface |
## Request Flow

### Chat Request

1. User types message in CLI
2. CLI sends STREAM request via socket
3. Server receives request
4. Server builds context:
   - Load soul (personality)
   - Get conversation history
   - Search knowledge base
   - Detect project context
5. Server calls LLM with tools
6. If LLM returns tool calls:
   - Execute each tool
   - Send tool results back to LLM
   - Repeat until no more tools
7. Stream response tokens to CLI
8. CLI displays live output
9. Server stores conversation
10. Server triggers async reflection
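Condensed, the flow above looks roughly like this. It is a runnable sketch with stubbed components; none of these names are OWL's real interfaces, and the real server streams tokens asynchronously rather than returning a string:

```python
# Stand-in for the SQLite conversation store (step 9 persists here).
history = []

def fake_llm(messages):
    """Step 5 stub; the real daemon streams tokens from Ollama."""
    return "ack: " + messages[-1]["content"]

def handle_chat(message, system_prompt):
    # Step 4: assemble context from persisted state plus the new message.
    messages = [{"role": "system", "content": system_prompt},
                *history,
                {"role": "user", "content": message}]
    reply = fake_llm(messages)                                 # step 5
    history.append({"role": "user", "content": message})       # step 9
    history.append({"role": "assistant", "content": reply})
    return reply                              # step 7: streamed in reality

first = handle_chat("hello", "You are OWL.")
second = handle_chat("again", "You are OWL.")
```

Because `history` lives in the daemon, the second call sees the first exchange in its context even if the CLI restarted in between.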
### Tool Execution Flow

1. LLM requests tool call
2. Server checks tool availability (profile)
3. If available:
   - Send `tool_call` chunk to CLI
   - Execute tool
   - Send `tool_result` chunk to CLI
   - Add result to LLM context
4. If not available:
   - Return error to LLM
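The loop can be sketched with a stubbed LLM that requests one tool and then answers once it sees the result (all names are hypothetical, and the CLI-notification chunks are omitted):

```python
# Stand-in for the tool registry; the profile check in step 2 reduces
# to membership in this dict.
AVAILABLE_TOOLS = {"read_file": lambda path: f"<contents of {path}>"}

def fake_llm(messages):
    """Request a tool on the first turn, answer after seeing a result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [("read_file", {"path": "README.md"})]}
    return {"content": "done", "tool_calls": []}

def run_with_tools(messages):
    while True:
        reply = fake_llm(messages)
        if not reply.get("tool_calls"):
            return reply["content"]           # no more tools: final answer
        for name, args in reply["tool_calls"]:
            if name in AVAILABLE_TOOLS:       # step 2: availability check
                result = AVAILABLE_TOOLS[name](**args)
            else:                             # step 4: report error to LLM
                result = f"error: tool {name!r} not available"
            messages.append({"role": "tool", "content": result})

messages = [{"role": "user", "content": "show the readme"}]
answer = run_with_tools(messages)
```

Note that an unavailable tool is reported back to the LLM as a result rather than aborting the loop, so the model can recover or rephrase.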
## Data Flow

### Context Assembly

```
System Prompt = [
    Soul (character, values)
  + Project Context (type, frameworks)
  + Memory File (preferences, notes)
  + Knowledge Chunks (RAG results)
  + Tool Definitions (available tools)
  + Session Summary (compressed history)
]

Messages = [
    System Prompt
  + Recent Conversation History
  + User Message
]
```
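As a sketch, the assembly is plain concatenation in the order shown above (the helper name and section contents are illustrative, not OWL's real code):

```python
# Hypothetical sketch of system-prompt assembly: join the non-empty
# sections in the documented order.
def build_system_prompt(soul, project, memory_file, knowledge, tools, summary):
    sections = [soul, project, memory_file, knowledge, tools, summary]
    return "\n\n".join(s for s in sections if s)  # skip empty sections

prompt = build_system_prompt(
    soul="You are OWL.",
    project="Project: Python (pytest, ruff)",
    memory_file="User prefers concise answers.",
    knowledge="",                      # no RAG hits: section is dropped
    tools="Tools: read_file, write_file",
    summary="Earlier: discussed test setup.",
)
```

Dropping empty sections keeps the prompt compact when, say, the knowledge search returns nothing.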
### Memory Flow

```
User Message
     ↓
Conversation stored in SQLite
     ↓
Every 3 exchanges → Reflection
     ↓
Learnings extracted
     ↓
Every 10 learnings → Evolution eligible
     ↓
Every 20 messages → Summarization
```
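The cadence above reduces to simple counter checks. The thresholds come from the flow; the function name and return shape are illustrative:

```python
# Thresholds from the memory flow above.
REFLECT_EVERY = 3       # exchanges between reflections
EVOLVE_EVERY = 10       # learnings before evolution is eligible
SUMMARIZE_EVERY = 20    # messages between summarizations

def due_tasks(exchanges: int, learnings: int, messages: int) -> list[str]:
    """Return which background tasks are due at the current counts."""
    tasks = []
    if exchanges and exchanges % REFLECT_EVERY == 0:
        tasks.append("reflect")
    if learnings and learnings % EVOLVE_EVERY == 0:
        tasks.append("evolve")
    if messages and messages % SUMMARIZE_EVERY == 0:
        tasks.append("summarize")
    return tasks
```

These checks run asynchronously after a response is stored, so they never block the chat stream.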
## File Structure

```
~/.owl/
├── config.yaml        # Configuration
├── soul.yaml          # Character/personality
├── memory.md          # Human-readable memory
├── owl.sock           # Unix socket (runtime)
├── memory/
│   └── owl.db         # SQLite database
└── knowledge/
    └── chroma/        # Vector database
```
## Module Dependencies

```
owl/
├── config/            # Configuration loading
│   └── loader.py      # (no dependencies)
│
├── daemon/            # Server components
│   ├── server.py      # → all modules
│   ├── protocol.py    # (no dependencies)
│   └── project.py     # → config
│
├── cli/               # Client components
│   ├── main.py        # → client, config
│   └── client.py      # → protocol
│
├── llm/               # LLM integration
│   ├── client.py      # → config
│   └── context.py     # → soul, memory, knowledge
│
├── memory/            # Storage
│   ├── store.py       # → config
│   ├── memory_file.py # → config
│   └── summarizer.py  # → llm
│
├── knowledge/         # RAG
│   └── store.py       # → config, llm
│
├── soul/              # Personality
│   ├── loader.py      # → config
│   ├── evolver.py     # → llm, memory
│   └── reflector.py   # → llm, memory
│
└── tools/             # Tool system
    ├── registry.py    # (no dependencies)
    └── file_ops.py    # (no dependencies)
```
## Design Decisions

### Why Unix Sockets?
- Fast local IPC
- No network overhead
- File-based permissions
- Works offline
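A self-contained sketch of the same IPC pattern, using a throwaway echo server in place of the daemon (the real socket lives at `~/.owl/owl.sock`, where ordinary file permissions control who may connect):

```python
import os
import socket
import tempfile
import threading

# Throwaway socket path; the daemon uses ~/.owl/owl.sock.
sock_path = os.path.join(tempfile.mkdtemp(), "owl.sock")

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)   # creates the socket file; file modes gate access
server.listen(1)

def serve_once():
    """Accept one connection and echo its message back."""
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024))
    server.close()

threading.Thread(target=serve_once, daemon=True).start()

# Client side: connect by filesystem path, no host/port or network stack.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
    client.connect(sock_path)
    client.sendall(b"ping")
    reply = client.recv(1024)
```

Connecting by path is what lets multiple `owl` CLI instances find the same daemon without any service discovery.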
### Why SQLite?
- No separate database process
- File-based (portable)
- ACID transactions
- Good enough for single-user
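A minimal sketch of what a conversation store on SQLite looks like. The schema is illustrative, not OWL's actual one, and the demo uses an in-memory database where the daemon persists to `~/.owl/memory/owl.db`:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # daemon: ~/.owl/memory/owl.db
db.execute("""CREATE TABLE messages (
    id         INTEGER PRIMARY KEY,
    project    TEXT NOT NULL,
    role       TEXT NOT NULL,
    content    TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def store(project, role, content):
    with db:  # ACID: each insert commits atomically
        db.execute(
            "INSERT INTO messages (project, role, content) VALUES (?, ?, ?)",
            (project, role, content))

def history(project, limit=20):
    """Most recent messages for one project, oldest first."""
    rows = db.execute(
        "SELECT role, content FROM messages WHERE project = ? "
        "ORDER BY id DESC LIMIT ?", (project, limit)).fetchall()
    return list(reversed(rows))

store("owl", "user", "hello")
store("owl", "assistant", "hi there")
store("other", "user", "unrelated")
rows = history("owl")
```

Scoping queries by `project` is also how history stays isolated per project, preventing context contamination.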
### Why ChromaDB?
- Local vector database
- Persistence support
- Ollama embedding integration
- Simple API
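Conceptually, the knowledge store maps text chunks to embedding vectors and returns the nearest chunks at query time. A toy in-memory version of that idea (this is not ChromaDB's API; in OWL, ChromaDB persists the vectors and Ollama produces the embeddings):

```python
import math

# Toy stand-in: (text, embedding) pairs ranked by cosine similarity.
store: list[tuple[str, list[float]]] = []

def cosine(a, b):
    """Cosine similarity of two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def add(text, embedding):
    store.append((text, embedding))

def query(embedding, n_results=2):
    """Return the n_results stored texts closest to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(item[1], embedding),
                    reverse=True)
    return [text for text, _ in ranked[:n_results]]

# 2-D toy embeddings; real embedding vectors have hundreds of dimensions.
add("sockets doc", [1.0, 0.0])
add("sqlite doc", [0.0, 1.0])
add("chroma doc", [0.7, 0.7])
top = query([1.0, 0.1], n_results=2)
```

ChromaDB does this at scale with persistence and approximate-nearest-neighbour indexing, which is why it appears as the Knowledge component rather than a hand-rolled loop like this.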
### Why Separate Daemon/CLI?
- Daemon maintains state
- CLI can restart without losing context
- Multiple CLI instances possible
- Background processing (reflection, evolution)