Architecture Overview

OWL is designed as a local AI daemon with persistent state, using a client-server architecture.

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        OWL Daemon (owld)                        │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐ │
│  │     LLM      │   │    Memory    │   │      Knowledge       │ │
│  │   (Ollama)   │   │   (SQLite)   │   │      (ChromaDB)      │ │
│  └──────┬───────┘   └──────┬───────┘   └──────────┬───────────┘ │
│         │                  │                      │             │
│         └──────────────────┼──────────────────────┘             │
│                            │                                    │
│                     ┌──────┴──────┐                             │
│                     │   Server    │                             │
│                     │  (Handler)  │                             │
│                     └──────┬──────┘                             │
│                            │                                    │
│  ┌──────────────┐   ┌──────┴──────┐   ┌──────────────────────┐  │
│  │     Soul     │   │    Tools    │   │       Projects       │  │
│  │    (YAML)    │   │  (Registry) │   │      (Detector)      │  │
│  └──────────────┘   └─────────────┘   └──────────────────────┘  │
│                                                                 │
└────────────────────────────┬────────────────────────────────────┘
                             │
                        Unix Socket
                     (~/.owl/owl.sock)
                             │
┌────────────────────────────┴────────────────────────────────────┐
│                         OWL CLI (owl)                           │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────────────┐ │
│  │     REPL     │   │   Display    │   │       Client         │ │
│  │   (Input)    │   │    (Rich)    │   │      (Socket)        │ │
│  └──────────────┘   └──────────────┘   └──────────────────────┘ │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Core Principles

1. Local-First

Everything runs on your machine:

  • Ollama for LLM inference
  • SQLite for persistent storage
  • ChromaDB for vector search
  • No external API calls

2. Persistent State

State survives restarts:

  • Conversations stored in SQLite
  • Memory file is human-readable
  • Soul evolves over time
  • Knowledge base persists

3. Project Awareness

OWL understands project context:

  • Auto-detects project type
  • Scopes history by project
  • Prevents context contamination
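
Project detection like the above can be sketched as a lookup of well-known marker files in the working directory. The marker table and function name here are illustrative assumptions, not OWL's actual detector:

```python
import tempfile
from pathlib import Path

# Hypothetical marker-file table; OWL's real detector may check more signals.
MARKERS = {
    "pyproject.toml": "python",
    "package.json": "node",
    "Cargo.toml": "rust",
    "go.mod": "go",
}

def detect_project(root) -> str:
    """Return the project type for `root`, or 'generic' if no marker matches."""
    for marker, kind in MARKERS.items():
        if (Path(root) / marker).exists():
            return kind
    return "generic"

# Demo: a directory containing only package.json is detected as a node project.
demo = Path(tempfile.mkdtemp())
(demo / "package.json").touch()
print(detect_project(demo))  # → node
```

The detected type can then key project-scoped history, which is what prevents context from one project leaking into another.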

4. Streaming Communication

Real-time responses:

  • Token-by-token streaming
  • Live tool execution feedback
  • Interruptible operations

Component Responsibilities

Component          Responsibility
─────────────────  ────────────────────────────────────────
Server             Handle requests, orchestrate components
LLM Client         Communicate with Ollama
Memory Store       Persist conversations, learnings
Knowledge Store    RAG vector database
Soul               Personality, evolution
Tool Registry      Manage tools, profiles
Project Detector   Identify project types
CLI Client         User interface

Request Flow

Chat Request

1. User types message in CLI
2. CLI sends STREAM request via socket
3. Server receives request
4. Server builds context:
   - Load soul (personality)
   - Get conversation history
   - Search knowledge base
   - Detect project context
5. Server calls LLM with tools
6. If LLM returns tool calls:
   - Execute each tool
   - Send tool results back to LLM
   - Repeat until no more tool calls
7. Stream response tokens to CLI
8. CLI displays live output
9. Server stores conversation
10. Server triggers async reflection
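
Steps 5 and 6 form a loop: call the LLM, run any requested tools, and repeat until the model produces a plain answer. A minimal sketch of that loop, with a fake LLM standing in for the Ollama client (the message shapes and names here are assumptions, not OWL's protocol):

```python
def run_chat(llm, tools, messages):
    """Call the LLM, execute any requested tools, repeat until a final answer."""
    while True:
        reply = llm.chat(messages)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]            # no more tool calls: final answer
        for call in calls:                     # execute each tool ...
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool",   # ... and feed results back
                             "name": call["name"], "content": str(result)})

class FakeLLM:
    """Asks for one tool call, then answers using the tool result."""
    def chat(self, messages):
        if messages and messages[-1]["role"] == "tool":
            return {"content": f"The file has {messages[-1]['content']} lines."}
        return {"tool_calls": [{"name": "count_lines", "args": {"text": "a\nb"}}]}

tools = {"count_lines": lambda text: len(text.splitlines())}
answer = run_chat(FakeLLM(), tools, [{"role": "user", "content": "how many lines?"}])
print(answer)  # → The file has 2 lines.
```

In the real daemon, step 7 streams tokens instead of returning a finished string, but the control flow is the same.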

Tool Execution Flow

1. LLM requests tool call
2. Server checks tool availability (profile)
3. If available:
   a. Send tool_call chunk to CLI
   b. Execute tool
   c. Send tool_result chunk to CLI
   d. Add result to LLM context
4. If not available:
   a. Return error to LLM
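
The profile check in step 2 can be sketched as a registry that records which profiles each tool belongs to. Class and method names here are illustrative, not OWL's real registry API:

```python
class ToolRegistry:
    def __init__(self, profile="default"):
        self.profile = profile
        self._tools = {}          # name -> (function, allowed profiles)

    def register(self, name, fn, profiles=("default",)):
        self._tools[name] = (fn, set(profiles))

    def run(self, name, args):
        entry = self._tools.get(name)
        if entry is None or self.profile not in entry[1]:
            # step 4: an unavailable tool returns an error to the LLM
            return {"error": f"tool '{name}' not available in profile '{self.profile}'"}
        fn, _ = entry
        return {"result": fn(**args)}          # step 3b: execute the tool

registry = ToolRegistry(profile="safe")
registry.register("read_file", lambda path: f"<{path}>", profiles=("default", "safe"))
registry.register("delete_file", lambda path: None, profiles=("default",))
ok = registry.run("read_file", {"path": "notes.md"})
blocked = registry.run("delete_file", {"path": "notes.md"})
```

Restrictive profiles can then expose a safe subset of tools without unregistering anything.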

Data Flow

Context Assembly

System Prompt = [
    Soul (character, values)
  + Project Context (type, frameworks)
  + Memory File (preferences, notes)
  + Knowledge Chunks (RAG results)
  + Tool Definitions (available tools)
  + Session Summary (compressed history)
]

Messages = [
    System Prompt
  + Recent Conversation History
  + User Message
]
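
The assembly above can be sketched as a function that concatenates the non-empty pieces into one system prompt and prepends it to the message list. The join format and function name are assumptions about how OWL combines the pieces:

```python
def build_messages(soul, project_ctx, memory, chunks, tools_desc, summary,
                   history, user_msg):
    """Assemble the system prompt and full message list for one LLM call."""
    system = "\n\n".join(part for part in [
        soul, project_ctx, memory, "\n".join(chunks), tools_desc, summary,
    ] if part)                                  # skip empty sections
    return ([{"role": "system", "content": system}]
            + history
            + [{"role": "user", "content": user_msg}])

messages = build_messages(
    soul="You are OWL.", project_ctx="Project: python", memory="Prefers tabs.",
    chunks=["chunk A"], tools_desc="Tools: read_file", summary="",
    history=[{"role": "user", "content": "hi"},
             {"role": "assistant", "content": "hello"}],
    user_msg="What next?",
)
```

Skipping empty sections keeps the prompt compact when, say, no knowledge chunks match or no summary exists yet.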

Memory Flow

User Message
     ↓
Conversation stored in SQLite
     ↓
Every 3 exchanges → Reflection
     ↓
Learnings extracted
     ↓
Every 10 learnings → Evolution eligible
     ↓
Every 20 messages → Summarization
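
The intervals above amount to threshold checks on running counters. A hypothetical trigger check (function name and counter arguments are illustrative; the 3/10/20 intervals come from the flow above):

```python
def due_maintenance(exchanges, learnings, messages):
    """Return which background tasks are due at the current counter values."""
    tasks = []
    if exchanges and exchanges % 3 == 0:       # every 3 exchanges
        tasks.append("reflect")
    if learnings and learnings % 10 == 0:      # every 10 learnings
        tasks.append("evolution_eligible")
    if messages and messages % 20 == 0:        # every 20 messages
        tasks.append("summarize")
    return tasks

print(due_maintenance(3, 0, 0))   # → ['reflect']
```

Because these run as async background work after a response is sent, a slow reflection never blocks the chat loop.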

File Structure

~/.owl/
├── config.yaml       # Configuration
├── soul.yaml         # Character/personality
├── memory.md         # Human-readable memory
├── owl.sock          # Unix socket (runtime)
├── memory/
│   └── owl.db        # SQLite database
└── knowledge/
    └── chroma/       # Vector database

Module Dependencies

owl/
├── config/            # Configuration loading
│   └── loader.py      # (no dependencies)
│
├── daemon/            # Server components
│   ├── server.py      # → all modules
│   ├── protocol.py    # (no dependencies)
│   └── project.py     # → config
│
├── cli/               # Client components
│   ├── main.py        # → client, config
│   └── client.py      # → protocol
│
├── llm/               # LLM integration
│   ├── client.py      # → config
│   └── context.py     # → soul, memory, knowledge
│
├── memory/            # Storage
│   ├── store.py       # → config
│   ├── memory_file.py # → config
│   └── summarizer.py  # → llm
│
├── knowledge/         # RAG
│   └── store.py       # → config, llm
│
├── soul/              # Personality
│   ├── loader.py      # → config
│   ├── evolver.py     # → llm, memory
│   └── reflector.py   # → llm, memory
│
└── tools/             # Tool system
    ├── registry.py    # (no dependencies)
    └── file_ops.py    # (no dependencies)

Design Decisions

Why Unix Sockets?

  • Fast local IPC
  • No network overhead
  • File-based permissions
  • Works offline
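
A client round-trip over the socket can be sketched in a few lines. The newline-delimited JSON framing here is an assumption, not OWL's documented wire protocol; the demo uses a throwaway echo server standing in for owld:

```python
import json
import os
import socket
import tempfile
import threading

def send_request(path, request):
    """Send one JSON request over a Unix socket and read one JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        s.sendall((json.dumps(request) + "\n").encode())
        data = b""
        while not data.endswith(b"\n"):        # read until the reply's newline
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
    return json.loads(data)

# Throwaway server on a temp path (the real daemon listens on ~/.owl/owl.sock).
sock_path = os.path.join(tempfile.mkdtemp(), "owl.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
server.listen(1)

def serve_once():
    conn, _ = server.accept()
    with conn:
        req = json.loads(conn.makefile().readline())
        conn.sendall((json.dumps({"ok": True, "echo": req["type"]}) + "\n").encode())

t = threading.Thread(target=serve_once)
t.start()
reply = send_request(sock_path, {"type": "STREAM", "message": "hello"})
t.join()
server.close()
print(reply)  # → {'ok': True, 'echo': 'STREAM'}
```

Because the socket is just a file under ~/.owl/, ordinary filesystem permissions control who can talk to the daemon.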

Why SQLite?

  • No separate database process
  • File-based (portable)
  • ACID transactions
  • Good enough for single-user
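
Single-user conversation storage needs nothing more than the standard-library `sqlite3` module. A sketch with a hypothetical schema (OWL's actual owl.db layout may differ):

```python
import sqlite3

# The real daemon opens ~/.owl/memory/owl.db; in-memory keeps the demo self-contained.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE messages (
    id      INTEGER PRIMARY KEY,
    project TEXT NOT NULL,
    role    TEXT NOT NULL,
    content TEXT NOT NULL)""")

with db:  # ACID: the whole insert batch commits or rolls back atomically
    db.executemany(
        "INSERT INTO messages (project, role, content) VALUES (?, ?, ?)",
        [("owl", "user", "hi"), ("owl", "assistant", "hello")])

# Project-scoped history retrieval, matching the project-awareness principle.
history = db.execute(
    "SELECT role, content FROM messages WHERE project = ? ORDER BY id",
    ("owl",)).fetchall()
```

The `with db:` block is what provides the transactional guarantee; no server process or connection pool is involved.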

Why ChromaDB?

  • Local vector database
  • Persistence support
  • Ollama embedding integration
  • Simple API

Why Separate Daemon/CLI?

  • Daemon maintains state
  • CLI can restart without losing context
  • Multiple CLI instances possible
  • Background processing (reflection, evolution)