
Overview

The RAG (Retrieval-Augmented Generation) knowledge store enables OrcBot to maintain a persistent, searchable knowledge base. Documents are chunked, embedded using vector models, and stored for semantic retrieval during agent reasoning.
RAG is ideal for large documents, reference materials, or datasets that exceed the LLM’s context window. The agent can semantically search the knowledge store and retrieve relevant chunks on-demand.

How It Works

Architecture

  • KnowledgeStore (src/memory/KnowledgeStore.ts) - Main interface
  • VectorMemory (src/memory/VectorMemory.ts) - Vector storage backend
  • Embeddings - OpenAI text-embedding-3-small or Google embeddings
  • Storage - File-based JSON with optional SQLite backend
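The relationship between these components can be pictured with a minimal TypeScript sketch. The method names and shapes below are illustrative assumptions, not the actual `KnowledgeStore.ts` API:

```typescript
// Hypothetical shapes for the RAG components; the real
// src/memory/KnowledgeStore.ts may differ.
interface Chunk {
  chunk: string;                                  // text of the chunk
  source: string;                                 // originating document
  score?: number;                                 // similarity score on search results
  metadata: { collection: string; tags: string[] };
}

interface KnowledgeStore {
  // Chunk, embed, and persist a document.
  ingest(content: string, source: string, collection?: string): Promise<void>;
  // Semantic search over stored chunks.
  search(query: string, limit?: number, threshold?: number): Promise<Chunk[]>;
  // Remove a document's chunks from the store.
  delete(source: string, collection?: string): Promise<void>;
}
```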

Ingesting Documents

From Text

Parameters:
  • content (string, required) - Text content to ingest
  • source (string, required) - Document identifier (e.g., filename, URL)
  • collection (string, default: "default") - Logical grouping for documents
  • tags (string[]) - Tags for filtering (e.g., ["technical", "2024"])
  • metadata (object) - Additional metadata (author, date, etc.)
Example:
{
  "skillName": "rag_ingest",
  "content": "OrcBot is an autonomous AI agent...",
  "source": "orcbot-overview.md",
  "collection": "documentation",
  "tags": ["agent", "autonomous"],
  "metadata": {
    "author": "OrcBot Team",
    "date": "2024-03-01"
  }
}

From File

{
  "skillName": "rag_ingest_file",
  "filePath": "/path/to/document.pdf",
  "collection": "research-papers",
  "tags": ["quantum", "2024"]
}
Supported Formats:
  • Plain text (.txt, .md)
  • PDF (.pdf) - Extracted via pdf-parse
  • HTML - Cleaned via Readability

From URL

{
  "skillName": "rag_ingest_url",
  "url": "https://example.com/article",
  "collection": "web-research",
  "tags": ["AI", "2024"]
}
URLs are fetched and cleaned with Readability before ingestion. JavaScript-rendered content requires browser_navigate first.

Searching the Knowledge Store

Parameters:
  • query (string, required) - Natural language search query
  • limit (number, default: 5) - Number of results to return (1-20)
  • collection (string) - Limit search to a specific collection
  • tags (string[]) - Filter by tags
  • threshold (number, default: 0.7) - Minimum similarity score (0-1)
Example:
{
  "skillName": "rag_search",
  "query": "How does the agent decision pipeline work?",
  "limit": 3,
  "collection": "documentation",
  "threshold": 0.75
}
Response:
{
  "results": [
    {
      "chunk": "The decision pipeline has 6 layers of guardrails...",
      "source": "decision-pipeline.md",
      "score": 0.89,
      "metadata": {
        "collection": "documentation",
        "tags": ["pipeline", "guardrails"]
      }
    },
    {
      "chunk": "DecisionPipeline.ts implements safety checks...",
      "source": "architecture.md",
      "score": 0.82,
      "metadata": {
        "collection": "documentation",
        "tags": ["architecture"]
      }
    }
  ],
  "count": 2
}
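Under the hood, semantic search reduces to comparing the query embedding against each stored chunk embedding. A minimal sketch of that scoring step, assuming cosine similarity as the metric (the actual `VectorMemory.ts` implementation may differ):

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk, drop those below the threshold,
// and return the top `limit` results, mirroring rag_search.
function rankChunks(
  queryEmbedding: number[],
  chunks: { chunk: string; embedding: number[] }[],
  limit = 5,
  threshold = 0.7,
): { chunk: string; score: number }[] {
  return chunks
    .map(c => ({ chunk: c.chunk, score: cosineSimilarity(queryEmbedding, c.embedding) }))
    .filter(r => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

This is why lowering `threshold` (see Troubleshooting below) surfaces more, but less relevant, chunks: the cutoff is applied before the top-`limit` truncation.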

Managing Documents

List Collections

{
  "skillName": "rag_list",
  "type": "collections"
}
Response:
{
  "collections": [
    {"name": "documentation", "documents": 45},
    {"name": "research-papers", "documents": 12},
    {"name": "web-research", "documents": 28}
  ]
}

List Documents

{
  "skillName": "rag_list",
  "type": "documents",
  "collection": "documentation"
}

Delete Document

Deletion is permanent. Backup your knowledge store before bulk deletions.
{
  "skillName": "rag_delete",
  "source": "old-document.md",
  "collection": "documentation"
}

Delete Collection

{
  "skillName": "rag_delete_collection",
  "collection": "deprecated-docs"
}

Chunking Strategy

Default Settings

# orcbot.config.yaml
ragChunkSize: 1000          # Characters per chunk
ragChunkOverlap: 200        # Overlap between chunks
ragMaxChunksPerDoc: 100     # Limit per document

How Chunking Works

1. Split by Paragraphs - Documents are split at paragraph boundaries (double newlines).
2. Respect Chunk Size - Each chunk is approximately ragChunkSize characters. Chunks never exceed 1500 characters to stay within embedding limits.
3. Add Overlap - The last ragChunkOverlap characters of each chunk are prepended to the next chunk. This preserves context across chunk boundaries.
4. Generate Embeddings - Each chunk is embedded using text-embedding-3-small (1536 dimensions). Cost: ~$0.00002 per 1,000 tokens.
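The steps above can be sketched as follows. This is an illustrative reimplementation, not the code OrcBot actually runs, and it does not split a single paragraph that already exceeds the size budget:

```typescript
// Paragraph-aware chunking with overlap, following the four steps above.
function chunkDocument(text: string, chunkSize = 1000, overlap = 200): string[] {
  const paragraphs = text.split(/\n\n+/);        // 1. split at paragraph boundaries
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    if (current && current.length + p.length + 2 > chunkSize) {
      chunks.push(current);                      // 2. flush when the size budget is hit
      current = current.slice(-overlap);         // 3. carry the tail into the next chunk
    }
    current = current ? current + "\n\n" + p : p;
  }
  if (current) chunks.push(current);
  return chunks;                                 // 4. each chunk is then embedded
}
```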

Custom Chunking

{
  "skillName": "rag_ingest",
  "content": "Large document...",
  "source": "technical-spec.md",
  "chunkSize": 1500,
  "chunkOverlap": 300
}

Use Cases

Documentation Search

Ingest your entire documentation and let the agent search it on-demand.
orcbot push "Ingest all markdown files in ./docs into RAG"

Research Assistant

Store research papers and query them during tasks.
orcbot push "Search RAG for quantum computing papers from 2024"

Customer Support Knowledge Base

Ingest support articles, FAQs, and product manuals. The agent retrieves relevant answers during user interactions.

Code Reference

Store code examples, API references, or architecture docs.
orcbot push "Ingest OpenAPI spec into RAG for skill routing"

Configuration

Embeddings Provider

# orcbot.config.yaml
ragEmbeddingProvider: openai  # or 'google'
ragEmbeddingModel: text-embedding-3-small
OpenAI Models:
  • text-embedding-3-small (1536 dim, $0.02/1M tokens)
  • text-embedding-3-large (3072 dim, $0.13/1M tokens)
Google Models:
  • text-embedding-004 (768 dim, free with Gemini API)

Storage Backend

ragStorageBackend: json       # or 'sqlite'
ragStoragePath: ~/.orcbot/rag
JSON Storage:
  • Simple file-based storage
  • Fast for small to medium datasets (under 10,000 chunks)
  • No external dependencies
SQLite Storage:
  • Better performance for large datasets
  • Full-text search support
  • Requires better-sqlite3 package

Search Settings

ragDefaultLimit: 5            # Results per search
ragSimilarityThreshold: 0.7   # Minimum score (0-1)
ragRerankEnabled: false       # Use LLM to rerank results

Best Practices

Collection strategy: Use collections to logically group related documents (e.g., “v2.0-docs”, “v2.1-docs”). This makes it easier to search specific knowledge domains and delete outdated content.

Do’s

  • Use descriptive source names (“installation-guide.md” vs “doc1.md”)
  • Tag documents with relevant keywords for filtering
  • Set appropriate chunk sizes (500-1500 characters)
  • Use semantic search, not keyword matching
  • Regularly clean up outdated documents

Don’ts

  • Don’t ingest personal or sensitive information
  • Don’t use RAG for tiny snippets (under 200 chars) - use short memory
  • Don’t store secrets or API keys in RAG
  • Don’t set chunk size too small (under 300) or too large (over 2000)
  • Don’t forget to set threshold - low-quality results waste tokens

Performance

Embedding Speed

  • OpenAI: ~500 chunks/minute (rate limited)
  • Google: ~1000 chunks/minute
Optimization: Batch-embed multiple chunks per API call.
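Batching can be sketched like this. The OpenAI embeddings endpoint accepts an array of inputs per request, so grouping chunks cuts the number of round trips; the batch size of 100 here is an illustrative choice, not an OrcBot default:

```typescript
// Group chunks into batches so each embeddings API call carries many inputs.
function batchChunks<T>(chunks: T[], batchSize = 100): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    batches.push(chunks.slice(i, i + batchSize));
  }
  return batches;
}

// Each batch becomes one request body for POST https://api.openai.com/v1/embeddings.
function buildEmbeddingRequest(batch: string[], model = "text-embedding-3-small") {
  return { model, input: batch };
}
```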

Search Speed

  • JSON storage: ~10ms for 1,000 chunks, ~100ms for 10,000 chunks
  • SQLite storage: ~5ms for 10,000 chunks, ~20ms for 100,000 chunks
Optimization: Use SQLite for datasets over 5,000 chunks.

Storage Size

  • Per chunk: ~2KB (text + embedding + metadata)
  • 10,000 chunks: ~20MB
  • 100,000 chunks: ~200MB

Troubleshooting

No Results Found

Symptoms: rag_search returns empty results even though documents exist.
Causes:
  • Query too specific or uses different terminology
  • Similarity threshold too high
  • Documents not chunked properly
Solution:
{
  "skillName": "rag_search",
  "query": "Your query here",
  "threshold": 0.5,  // Lower threshold
  "limit": 10        // More results
}

Slow Ingestion

Symptoms: rag_ingest takes a long time for large documents.
Causes:
  • Large chunk size
  • Many chunks per document
  • API rate limits
Solution:
# Reduce chunk count
ragChunkSize: 1500
ragMaxChunksPerDoc: 50

High Costs

Symptoms: Embedding API bills are unexpectedly high.
Causes:
  • Re-ingesting the same documents multiple times
  • Chunk size too small (more API calls)
Solution:
  • Use rag_list to check what’s already ingested
  • Increase chunk size to 1000-1500 characters
  • Use Google embeddings (free with Gemini API)