Overview
The RAG (Retrieval-Augmented Generation) knowledge store enables OrcBot to maintain a persistent, searchable knowledge base. Documents are chunked, embedded using vector models, and stored for semantic retrieval during agent reasoning.

RAG is ideal for large documents, reference materials, or datasets that exceed the LLM’s context window. The agent can semantically search the knowledge store and retrieve relevant chunks on demand.
How It Works
Architecture
- KnowledgeStore (`src/memory/KnowledgeStore.ts`) - Main interface
- VectorMemory (`src/memory/VectorMemory.ts`) - Vector storage backend
- Embeddings - OpenAI `text-embedding-3-small` or Google embeddings
- Storage - File-based JSON with optional SQLite backend
Ingesting Documents
From Text
- Text content to ingest
- Document identifier (e.g., filename, URL)
- Logical grouping for documents
- Tags for filtering (e.g., ["technical", "2024"])
- Additional metadata (author, date, etc.)
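Assuming the tool takes JSON arguments, an ingest call might look like the following sketch. The field names (`content`, `source`, `collection`, `tags`, `metadata`) are illustrative, not confirmed OrcBot parameter names:

```typescript
// Hypothetical rag_ingest arguments; all field names are illustrative.
const ingestArgs = {
  content: "OrcBot stores documents as embedded chunks...", // text to ingest
  source: "rag-overview.md",         // document identifier
  collection: "docs",                // logical grouping
  tags: ["technical", "2024"],       // tags for filtering
  metadata: { author: "docs-team" }, // additional metadata
};
```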
From File
- Plain text (`.txt`, `.md`)
- PDF (`.pdf`) - extracted via `pdf-parse`
- HTML - cleaned via Readability
From URL
URLs are fetched and cleaned with Readability before ingestion. JavaScript-rendered content requires `browser_navigate` first.
Searching the Knowledge Store
Semantic Search
- Natural language search query
- Number of results to return (1-20)
- Limit search to specific collection
- Filter by tags
- Minimum similarity score (0-1)
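A search call might look like this sketch; as with ingestion, the field names are illustrative assumptions rather than documented parameters:

```typescript
// Hypothetical rag_search arguments; all field names are illustrative.
const searchArgs = {
  query: "How do I switch to the SQLite backend?", // natural language query
  limit: 5,            // number of results to return (1-20)
  collection: "docs",  // limit search to a specific collection
  tags: ["technical"], // filter by tags
  threshold: 0.7,      // minimum similarity score (0-1)
};
```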
Managing Documents
List Collections
List Documents
Delete Document
Delete Collection
Chunking Strategy
Default Settings
How Chunking Works
Respect Chunk Size
Each chunk is approximately `ragChunkSize` characters. Chunks never exceed 1500 characters to stay within embedding limits.
Add Overlap
The last `ragChunkOverlap` characters of each chunk are prepended to the next chunk. This preserves context across chunk boundaries.
Custom Chunking
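The size-and-overlap steps above can be sketched as a simple character-based chunker; this is a simplified illustration, not the actual KnowledgeStore implementation:

```typescript
// Simplified sketch of fixed-size chunking with overlap: each chunk is at
// most `chunkSize` characters, and the last `overlap` characters of a chunk
// are shared with (prepended to) the next chunk.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // final chunk emitted
    start = end - overlap;          // next chunk re-includes the overlap
  }
  return chunks;
}
```

With the defaults shown, a 2,500-character document yields three chunks, each starting with the last 200 characters of the previous one.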
Use Cases
Documentation Search
Ingest your entire documentation and let the agent search it on-demand.
Research Assistant
Store research papers and query them during tasks.
Customer Support Knowledge Base
Ingest support articles, FAQs, and product manuals. The agent retrieves relevant answers during user interactions.
Code Reference
Store code examples, API references, or architecture docs.
Configuration
Embeddings Provider
- OpenAI `text-embedding-3-small` (1536 dimensions, $0.02/1M tokens)
- OpenAI `text-embedding-3-large` (3072 dimensions, $0.13/1M tokens)
- Google `text-embedding-004` (768 dimensions, free with Gemini API)
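A configuration fragment might look like the sketch below. `ragChunkSize` and `ragChunkOverlap` are the settings referenced in the chunking section; `embeddingProvider` and `embeddingModel` are assumed key names, not confirmed OrcBot settings:

```typescript
// Hypothetical RAG configuration; embeddingProvider/embeddingModel key
// names are assumptions, not confirmed OrcBot settings.
const ragConfig = {
  embeddingProvider: "openai",              // or "google"
  embeddingModel: "text-embedding-3-small", // 1536-dim, $0.02/1M tokens
  ragChunkSize: 1000,                       // target chunk size (characters)
  ragChunkOverlap: 200,                     // carried into the next chunk
};
```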
Storage Backend
JSON (default):
- Simple file-based storage
- Fast for small to medium datasets (under 10,000 chunks)
- No external dependencies

SQLite:
- Better performance for large datasets
- Full-text search support
- Requires the `better-sqlite3` package
Search Settings
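How the similarity threshold and result limit interact can be illustrated with a cosine-similarity sketch; this is assumed ranking logic for illustration, and the actual VectorMemory implementation may differ:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every chunk against the query embedding, drop results below the
// threshold, and return the top `limit` matches by similarity.
function search(
  query: number[],
  chunks: { text: string; embedding: number[] }[],
  limit = 5,
  threshold = 0.7,
): { text: string; score: number }[] {
  return chunks
    .map((c) => ({ text: c.text, score: cosine(query, c.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Raising the threshold trades recall for precision: fewer, higher-quality chunks reach the agent's context.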
Best Practices
Do’s
- Use descriptive source names (“installation-guide.md” vs “doc1.md”)
- Tag documents with relevant keywords for filtering
- Set appropriate chunk sizes (500-1500 characters)
- Use semantic search, not keyword matching
- Regularly clean up outdated documents
Don’ts
- Don’t ingest personal or sensitive information
- Don’t use RAG for tiny snippets (under 200 chars) - use short memory
- Don’t store secrets or API keys in RAG
- Don’t set chunk size too small (under 300) or too large (over 2000)
- Don’t forget to set `threshold` - low-quality results waste tokens
Performance
Embedding Speed
- OpenAI: ~500 chunks/minute (rate limited)
- Google: ~1000 chunks/minute
Search Speed
- JSON storage: ~10ms for 1,000 chunks, ~100ms for 10,000 chunks
- SQLite storage: ~5ms for 10,000 chunks, ~20ms for 100,000 chunks
Storage Size
- Per chunk: ~2KB (text + embedding + metadata)
- 10,000 chunks: ~20MB
- 100,000 chunks: ~200MB
Troubleshooting
No Results Found
Symptoms: `rag_search` returns empty results even though documents exist.
Causes:
- Query too specific or uses different terminology
- Similarity threshold too high
- Documents not chunked properly
Slow Ingestion
Symptoms: `rag_ingest` takes a long time for large documents.
Causes:
- Large chunk size
- Many chunks per document
- API rate limits
High Costs
Symptoms: Embedding API bills are unexpectedly high.
Causes:
- Re-ingesting the same documents multiple times
- Chunk size too small (more API calls)
Solutions:
- Use `rag_list` to check what’s already ingested
- Increase chunk size to 1000-1500 characters
- Use Google embeddings (free with Gemini API)
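To guard against the first cause, one option is to key documents by a content hash and skip the embedding call when the hash is already known. `shouldIngest` below is a hypothetical helper, not part of OrcBot:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: skip embedding calls for content already ingested,
// keyed by a SHA-256 hash of the document text.
const ingestedHashes = new Set<string>();

function shouldIngest(content: string): boolean {
  const hash = createHash("sha256").update(content).digest("hex");
  if (ingestedHashes.has(hash)) return false; // duplicate: skip the API call
  ingestedHashes.add(hash);
  return true;
}
```

Persisting the hash set alongside the knowledge store would make the check survive restarts.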