Architecture Overview

OWL is designed as a local AI daemon with persistent state, using a client-server architecture.

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        OWL Daemon (owld)                        │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐ │
│  │     LLM      │   │    Memory    │   │      Knowledge       │ │
│  │   (Ollama)   │   │   (SQLite)   │   │      (ChromaDB)      │ │
│  └──────┬───────┘   └──────┬───────┘   └──────────┬───────────┘ │
│         │                  │                      │             │
│         └──────────────────┼──────────────────────┘             │
│                            │                                    │
│                     ┌──────┴──────┐                             │
│                     │   Server    │                             │
│                     │  (Handler)  │                             │
│                     └──────┬──────┘                             │
│                            │                                    │
│  ┌──────────────┐   ┌──────┴──────┐   ┌──────────────────────┐  │
│  │     Soul     │   │    Tools    │   │       Projects       │  │
│  │    (YAML)    │   │  (Registry) │   │      (Detector)      │  │
│  └──────────────┘   └─────────────┘   └──────────────────────┘  │
│                                                                 │
└────────────────────────────┬────────────────────────────────────┘
                             │
                        Unix Socket
                     (~/.owl/owl.sock)
                             │
┌────────────────────────────┴────────────────────────────────────┐
│                         OWL CLI (owl)                           │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐ │
│  │     REPL     │   │   Display    │   │       Client         │ │
│  │   (Input)    │   │    (Rich)    │   │      (Socket)        │ │
│  └──────────────┘   └──────────────┘   └──────────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Core Principles

1. Local-First

Everything runs on your machine:

  • Ollama for LLM inference
  • SQLite for persistent storage
  • ChromaDB for vector search
  • No external API calls

2. Persistent State

State survives restarts:

  • Conversations stored in SQLite
  • Memory file is human-readable
  • Soul evolves over time
  • Knowledge base persists

3. Project Awareness

OWL understands project context:

  • Auto-detects project type
  • Scopes history by project
  • Prevents context contamination
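
Project detection like the above can be sketched as a lookup of well-known marker files in the working directory. The marker table and function name here are illustrative assumptions, not OWL's actual detector:

```python
import tempfile
from pathlib import Path

# Hypothetical marker-file table; OWL's real detector may check more signals.
MARKERS = {
    "pyproject.toml": "python",
    "package.json": "node",
    "Cargo.toml": "rust",
    "go.mod": "go",
}

def detect_project(root) -> str:
    """Return the project type for `root`, or 'generic' if no marker matches."""
    for marker, kind in MARKERS.items():
        if (Path(root) / marker).exists():
            return kind
    return "generic"

# Demo: a directory containing only package.json is detected as a node project.
demo = Path(tempfile.mkdtemp())
(demo / "package.json").touch()
print(detect_project(demo))  # → node
```

The detected type can then key project-scoped history, which is what prevents context from one project leaking into another.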

4. Streaming Communication

Real-time responses:

  • Token-by-token streaming
  • Live tool execution feedback
  • Interruptible operations

Component Responsibilities

Component          Responsibility
─────────────────  ────────────────────────────────────────
Server             Handle requests, orchestrate components
LLM Client         Communicate with Ollama
Memory Store       Persist conversations, learnings
Knowledge Store    RAG vector database
Soul               Personality, evolution
Tool Registry      Manage tools, profiles
Project Detector   Identify project types
CLI Client         User interface

Request Flow

Chat Request

1. User types message in CLI
2. CLI sends STREAM request via socket
3. Server receives request
4. Server builds context:
   - Load soul (personality)
   - Get conversation history
   - Search knowledge base
   - Detect project context
5. Server calls LLM with tools
6. If LLM returns tool calls:
   - Execute each tool
   - Send tool results back to LLM
   - Repeat until no more tool calls
7. Stream response tokens to CLI
8. CLI displays live output
9. Server stores conversation
10. Server triggers async reflection
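
Steps 5 and 6 form a loop: call the LLM, run any requested tools, and repeat until the model produces a plain answer. A minimal sketch of that loop, with a fake LLM standing in for the Ollama client (the message shapes and names here are assumptions, not OWL's protocol):

```python
def run_chat(llm, tools, messages):
    """Call the LLM, execute any requested tools, repeat until a final answer."""
    while True:
        reply = llm.chat(messages)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]            # no more tool calls: final answer
        for call in calls:                     # execute each tool ...
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool",   # ... and feed results back
                             "name": call["name"], "content": str(result)})

class FakeLLM:
    """Asks for one tool call, then answers using the tool result."""
    def chat(self, messages):
        if messages and messages[-1]["role"] == "tool":
            return {"content": f"The file has {messages[-1]['content']} lines."}
        return {"tool_calls": [{"name": "count_lines", "args": {"text": "a\nb"}}]}

tools = {"count_lines": lambda text: len(text.splitlines())}
answer = run_chat(FakeLLM(), tools, [{"role": "user", "content": "how many lines?"}])
print(answer)  # → The file has 2 lines.
```

In the real daemon, step 7 streams tokens instead of returning a finished string, but the control flow is the same.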

Tool Execution Flow

1. LLM requests tool call
2. Server checks tool availability (profile)
3. If available:
   a. Send tool_call chunk to CLI
   b. Execute tool
   c. Send tool_result chunk to CLI
   d. Add result to LLM context
4. If not available:
   a. Return error to LLM
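
The profile check in step 2 can be sketched as a registry that records which profiles each tool belongs to. Class and method names here are illustrative, not OWL's real registry API:

```python
class ToolRegistry:
    def __init__(self, profile="default"):
        self.profile = profile
        self._tools = {}          # name -> (function, allowed profiles)

    def register(self, name, fn, profiles=("default",)):
        self._tools[name] = (fn, set(profiles))

    def run(self, name, args):
        entry = self._tools.get(name)
        if entry is None or self.profile not in entry[1]:
            # step 4: an unavailable tool returns an error to the LLM
            return {"error": f"tool '{name}' not available in profile '{self.profile}'"}
        fn, _ = entry
        return {"result": fn(**args)}          # step 3b: execute the tool

registry = ToolRegistry(profile="safe")
registry.register("read_file", lambda path: f"<{path}>", profiles=("default", "safe"))
registry.register("delete_file", lambda path: None, profiles=("default",))
ok = registry.run("read_file", {"path": "notes.md"})
blocked = registry.run("delete_file", {"path": "notes.md"})
```

Restrictive profiles can then expose a safe subset of tools without unregistering anything.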

Data Flow

Context Assembly

System Prompt = [
    Soul (character, values)
  + Project Context (type, frameworks)
  + Memory File (preferences, notes)
  + Knowledge Chunks (RAG results)
  + Tool Definitions (available tools)
  + Session Summary (compressed history)
]

Messages = [
    System Prompt
  + Recent Conversation History
  + User Message
]
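
The assembly above can be sketched as a function that concatenates the non-empty pieces into one system prompt and prepends it to the message list. The join format and function name are assumptions about how OWL combines the pieces:

```python
def build_messages(soul, project_ctx, memory, chunks, tools_desc, summary,
                   history, user_msg):
    """Assemble the system prompt and full message list for one LLM call."""
    system = "\n\n".join(part for part in [
        soul, project_ctx, memory, "\n".join(chunks), tools_desc, summary,
    ] if part)                                  # skip empty sections
    return ([{"role": "system", "content": system}]
            + history
            + [{"role": "user", "content": user_msg}])

messages = build_messages(
    soul="You are OWL.", project_ctx="Project: python", memory="Prefers tabs.",
    chunks=["chunk A"], tools_desc="Tools: read_file", summary="",
    history=[{"role": "user", "content": "hi"},
             {"role": "assistant", "content": "hello"}],
    user_msg="What next?",
)
```

Skipping empty sections keeps the prompt compact when, say, no knowledge chunks match or no summary exists yet.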

Memory Flow

User Message
     ↓
Conversation stored in SQLite
     ↓
Every 3 exchanges → Reflection
     ↓
Learnings extracted
     ↓
Every 10 learnings → Evolution eligible
     ↓
Every 20 messages → Summarization
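
The intervals above amount to threshold checks on running counters. A hypothetical trigger check (function name and counter arguments are illustrative; the 3/10/20 intervals come from the flow above):

```python
def due_maintenance(exchanges, learnings, messages):
    """Return which background tasks are due at the current counter values."""
    tasks = []
    if exchanges and exchanges % 3 == 0:       # every 3 exchanges
        tasks.append("reflect")
    if learnings and learnings % 10 == 0:      # every 10 learnings
        tasks.append("evolution_eligible")
    if messages and messages % 20 == 0:        # every 20 messages
        tasks.append("summarize")
    return tasks

print(due_maintenance(3, 0, 0))   # → ['reflect']
```

Because these run as async background work after a response is sent, a slow reflection never blocks the chat loop.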

File Structure

~/.owl/
├── config.yaml       # Configuration
├── soul.yaml         # Character/personality
├── memory.md         # Human-readable memory
├── owl.sock          # Unix socket (runtime)
├── memory/
│   └── owl.db        # SQLite database
└── knowledge/
    └── chroma/       # Vector database

Module Dependencies

owl/
├── config/            # Configuration loading
│   └── loader.py      # (no dependencies)
│
├── daemon/            # Server components
│   ├── server.py      # → all modules
│   ├── protocol.py    # (no dependencies)
│   └── project.py     # → config
│
├── cli/               # Client components
│   ├── main.py        # → client, config
│   └── client.py      # → protocol
│
├── llm/               # LLM integration
│   ├── client.py      # → config
│   └── context.py     # → soul, memory, knowledge
│
├── memory/            # Storage
│   ├── store.py       # → config
│   ├── memory_file.py # → config
│   └── summarizer.py  # → llm
│
├── knowledge/         # RAG
│   └── store.py       # → config, llm
│
├── soul/              # Personality
│   ├── loader.py      # → config
│   ├── evolver.py     # → llm, memory
│   └── reflector.py   # → llm, memory
│
└── tools/             # Tool system
    ├── registry.py    # (no dependencies)
    └── file_ops.py    # (no dependencies)

Design Decisions

Why Unix Sockets?

  • Fast local IPC
  • No network overhead
  • File-based permissions
  • Works offline
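
A client round-trip over the socket can be sketched in a few lines. The newline-delimited JSON framing here is an assumption, not OWL's documented wire protocol; the demo uses a throwaway echo server standing in for owld:

```python
import json
import os
import socket
import tempfile
import threading

def send_request(path, request):
    """Send one JSON request over a Unix socket and read one JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall((json.dumps(request) + "\n").encode())
        data = b""
        while not data.endswith(b"\n"):        # read until the reply's newline
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
    return json.loads(data)

# Throwaway server on a temp path (the real daemon listens on ~/.owl/owl.sock).
sock_path = os.path.join(tempfile.mkdtemp(), "owl.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
server.listen(1)

def serve_once():
    conn, _ = server.accept()
    with conn:
        req = json.loads(conn.makefile().readline())
        conn.sendall((json.dumps({"ok": True, "echo": req["type"]}) + "\n").encode())

t = threading.Thread(target=serve_once)
t.start()
reply = send_request(sock_path, {"type": "STREAM", "message": "hello"})
t.join()
server.close()
print(reply)  # → {'ok': True, 'echo': 'STREAM'}
```

Because the socket is just a file under ~/.owl/, ordinary filesystem permissions control who can talk to the daemon.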

Why SQLite?

  • No separate database process
  • File-based (portable)
  • ACID transactions
  • Good enough for single-user
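
Single-user conversation storage needs nothing more than the standard-library `sqlite3` module. A sketch with a hypothetical schema (OWL's actual owl.db layout may differ):

```python
import sqlite3

# The real daemon opens ~/.owl/memory/owl.db; in-memory keeps the demo self-contained.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE messages (
    id      INTEGER PRIMARY KEY,
    project TEXT NOT NULL,
    role    TEXT NOT NULL,
    content TEXT NOT NULL)""")

with db:  # ACID: the whole insert batch commits or rolls back atomically
    db.executemany(
        "INSERT INTO messages (project, role, content) VALUES (?, ?, ?)",
        [("owl", "user", "hi"), ("owl", "assistant", "hello")])

# Project-scoped history retrieval, matching the project-awareness principle.
history = db.execute(
    "SELECT role, content FROM messages WHERE project = ? ORDER BY id",
    ("owl",)).fetchall()
```

The `with db:` block is what provides the transactional guarantee; no server process or connection pool is involved.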

Why ChromaDB?

  • Local vector database
  • Persistence support
  • Ollama embedding integration
  • Simple API

Why Separate Daemon/CLI?

  • Daemon maintains state
  • CLI can restart without losing context
  • Multiple CLI instances possible
  • Background processing (reflection, evolution)