Knowledge Base

OWL includes a RAG (Retrieval-Augmented Generation) knowledge base powered by ChromaDB for semantic search over your documents.

Overview

The knowledge base allows you to:

  • Add text documents (markdown, code, plain text)
  • Search semantically (not just keywords)
  • Get relevant context in conversations

Everything runs locally using Ollama embeddings.

Adding Documents

/learn command

/learn README.md
/learn docs/architecture.md
/learn ~/notes/project-spec.txt

Output:

Learning from /home/user/project/README.md...
Learned from README.md (12 chunks)

What Happens

  1. Document is read and parsed
  2. Content is split into ~500-token chunks
  3. Each chunk is embedded using nomic-embed-text
  4. Embeddings are stored in ChromaDB

Searching

/knowledge search authentication flow

Output:

Search Results (3)

README.md (0.85)
Authentication is handled by the auth middleware...

architecture.md (0.72)
The auth flow starts when a user submits credentials...

spec.txt (0.68)
Users must authenticate before accessing protected...

During conversations, OWL automatically searches the knowledge base when relevant:

you: How does authentication work in this project?

[OWL searches knowledge base]
[Finds relevant chunks]
[Includes them in context]

owl: Based on the project documentation, authentication works as follows...

Managing Knowledge

View Stats

/knowledge

Output:

Knowledge Base
Total chunks: 42
Sources: 3

Sources:
- README.md (12 chunks)
- architecture.md (20 chunks)
- spec.txt (10 chunks)

Remove Documents

/unlearn README.md

Output:

Removed: README.md

Supported Formats

Format         Extension                 Notes
Markdown       .md                       Full support
Plain text     .txt                      Full support
Code files     .py, .js, .ts, etc.       Treated as text
Config files   .yaml, .json, .toml      Treated as text

Note: Binary formats like PDF are not currently supported. Convert to text first.

How It Works

Chunking

Documents are split into chunks by paragraph:

  • Chunk size: ~500 tokens
  • Splits on blank lines (paragraph boundaries)
  • Keeps related content together
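
The strategy above can be sketched in plain Python. This is a hypothetical illustration, not OWL's actual code: paragraphs are packed greedily into chunks, and the token count is approximated by whitespace-separated words (real tokenizers count differently).

```python
def chunk_paragraphs(text, max_tokens=500):
    """Split text on blank lines, then pack consecutive paragraphs
    into chunks of roughly max_tokens (approximated as word count)."""
    chunks, current, size = [], [], 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        n = len(para.split())
        # Start a new chunk when the next paragraph would overflow the budget.
        if current and size + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# 5 paragraphs of ~200 words each pack into 3 chunks under a 500-word budget.
doc = "\n\n".join(f"Paragraph {i} " + "word " * 199 for i in range(5))
print(len(chunk_paragraphs(doc)))  # 3
```

Packing whole paragraphs, rather than cutting at a fixed character offset, is what keeps related content together in one chunk.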

Embedding

Chunks are embedded using Ollama's nomic-embed-text model:

  • 768-dimensional vectors
  • Semantic meaning preserved
  • Similar content = similar vectors
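
"Similar vectors" is usually measured with cosine similarity, which is also the distance function OWL's ChromaDB collection uses. A toy illustration with 3-dimensional stand-ins for the real 768-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hypothetical vectors: the first two point in similar directions
# ("similar content"), the third points elsewhere.
auth_doc   = [0.9, 0.1, 0.0]
auth_query = [0.8, 0.2, 0.1]
recipe_doc = [0.0, 0.1, 0.9]

print(round(cosine_similarity(auth_doc, auth_query), 2))  # high, ~0.98
print(round(cosine_similarity(auth_doc, recipe_doc), 2))  # low, ~0.01
```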

Storage

ChromaDB stores:

  • Chunk text
  • Embedding vector
  • Metadata (source file, project, timestamp)

Location: ~/.owl/knowledge/chroma/

Retrieval

When searching:

  1. Query is embedded
  2. ChromaDB finds nearest neighbors
  3. Top 3 chunks returned
  4. Included in LLM context
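
The steps above amount to a nearest-neighbor search. A hypothetical sketch, with a brute-force scan over an in-memory list standing in for ChromaDB's index, and hand-written vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(store, query_vec, top_k=3):
    """Rank every stored chunk by cosine similarity to the query
    and return the top_k matches as (source, score, text)."""
    scored = [(cosine(query_vec, c["embedding"]), c) for c in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(c["source"], round(s, 2), c["text"]) for s, c in scored[:top_k]]

store = [
    {"source": "README.md",       "embedding": [0.9, 0.1, 0.0], "text": "Auth is handled by middleware..."},
    {"source": "architecture.md", "embedding": [0.7, 0.3, 0.1], "text": "The auth flow starts when..."},
    {"source": "recipes.txt",     "embedding": [0.0, 0.1, 0.9], "text": "Preheat the oven..."},
    {"source": "spec.txt",        "embedding": [0.6, 0.3, 0.0], "text": "Users must authenticate..."},
]
query = [0.8, 0.2, 0.1]  # stand-in for the embedded query "authentication flow"
for source, score, text in search(store, query):
    print(source, score, text)
```

The three auth-related chunks rank highest and the unrelated one is dropped, mirroring the scored results shown in the Searching section.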

Project Scoping

Knowledge searches are scoped by project:

# In project A
/learn docs/api.md # Added to project A

# Switch to project B
/project ~/project-b
/knowledge search api # Won't find project A docs
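
Scoping like this is typically implemented as a metadata filter applied at query time. A hypothetical sketch, with a plain list filter standing in for ChromaDB metadata filtering and a trivial keyword check standing in for embedding search:

```python
def search_scoped(store, project, query_terms):
    """Only chunks whose metadata matches the active project are searched."""
    return [c["text"] for c in store
            if c["project"] == project
            and any(t in c["text"].lower() for t in query_terms)]

store = [
    {"project": "project-a", "text": "The API is documented in docs/api.md."},
    {"project": "project-b", "text": "Deployment notes for project B."},
]

# From project B, project A's API docs are invisible even though they match.
print(search_scoped(store, "project-b", ["api"]))  # []
print(search_scoped(store, "project-a", ["api"]))
```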

Best Practices

What to Add

Good candidates:

  • Project documentation
  • Architecture decisions
  • API specifications
  • Team guidelines
  • Complex code explanations

What NOT to Add

Avoid:

  • Frequently changing files
  • Generated documentation
  • Entire codebases (use tools instead)
  • Sensitive information

Keeping Knowledge Fresh

When documents change:

# Re-learn to update
/learn docs/api.md # Replaces old version

Organization

Keep related documents together:

/learn docs/architecture.md
/learn docs/api.md
/learn docs/deployment.md

Troubleshooting

"No embedding model"

Ensure you have the embedding model:

ollama pull nomic-embed-text

Slow Learning

Large documents take time to embed. For very large files:

  • Split into smaller documents
  • Learn incrementally

No Results

If searches return nothing:

  • Check document was learned: /knowledge
  • Try different query terms
  • Ensure you're in the right project

Technical Details

ChromaDB Collection

Knowledge lives in a single ChromaDB collection, with per-project scoping applied as a metadata filter:

  • Name: owl_knowledge
  • Distance: cosine similarity
  • Persistence: ~/.owl/knowledge/chroma/

Embedding Model

Default: nomic-embed-text

  • 768 dimensions
  • Good semantic understanding
  • Runs locally via Ollama

Query Parameters

  • Top K: 3 chunks returned
  • Similarity threshold: None (returns top K regardless)
  • Project filter: Applied when project is set