Overview
The RAG (Retrieval-Augmented Generation) knowledge store enables OrcBot to maintain a persistent, searchable knowledge base. Documents are chunked, embedded using vector models, and stored for semantic retrieval during agent reasoning.

RAG is ideal for large documents, reference materials, or datasets that exceed the LLM’s context window. The agent can semantically search the knowledge store and retrieve relevant chunks on demand.
How It Works
Architecture
- KnowledgeStore (`src/memory/KnowledgeStore.ts`) - Main interface
- VectorMemory (`src/memory/VectorMemory.ts`) - Vector storage backend
- Embeddings - OpenAI `text-embedding-3-small` or Google embeddings
- Storage - File-based JSON with optional SQLite backend
Ingesting Documents
From Text
- Text content to ingest
- Document identifier (e.g., filename, URL)
- Logical grouping for documents
- Tags for filtering (e.g., ["technical", "2024"])
- Additional metadata (author, date, etc.)
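Assuming the tool takes JSON arguments, an ingest call might look like the following sketch. The field names (`content`, `source`, `collection`, `tags`, `metadata`) are illustrative, not confirmed OrcBot parameter names:

```typescript
// Hypothetical rag_ingest arguments; all field names are illustrative.
const ingestArgs = {
  content: "OrcBot stores documents as embedded chunks...", // text to ingest
  source: "rag-overview.md",         // document identifier
  collection: "docs",                // logical grouping
  tags: ["technical", "2024"],       // tags for filtering
  metadata: { author: "docs-team" }, // additional metadata
};
```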
From File
- Plain text (`.txt`, `.md`)
- PDF (`.pdf`) - extracted via `pdf-parse`
- HTML - cleaned via Readability
From URL
URLs are fetched and cleaned with Readability before ingestion. JavaScript-rendered content requires `browser_navigate` first.
Searching the Knowledge Store
Semantic Search
- Natural language search query
- Number of results to return (1-20)
- Limit search to specific collection
- Filter by tags
- Minimum similarity score (0-1)
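A search call might look like this sketch; as with ingestion, the field names are illustrative assumptions rather than documented parameters:

```typescript
// Hypothetical rag_search arguments; all field names are illustrative.
const searchArgs = {
  query: "How do I switch to the SQLite backend?", // natural language query
  limit: 5,            // number of results to return (1-20)
  collection: "docs",  // limit search to a specific collection
  tags: ["technical"], // filter by tags
  threshold: 0.7,      // minimum similarity score (0-1)
};
```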
Managing Documents
List Collections
List Documents
Delete Document
Delete Collection
Chunking Strategy
Default Settings
How Chunking Works
Respect Chunk Size
Each chunk is approximately `ragChunkSize` characters. Chunks never exceed 1500 characters to stay within embedding limits.
Add Overlap
The last `ragChunkOverlap` characters of each chunk are prepended to the next chunk. This preserves context across chunk boundaries.
Custom Chunking
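The size-and-overlap steps above can be sketched as a simple character-based chunker; this is a simplified illustration, not the actual KnowledgeStore implementation:

```typescript
// Simplified sketch of fixed-size chunking with overlap: each chunk is at
// most `chunkSize` characters, and the last `overlap` characters of a chunk
// are shared with (prepended to) the next chunk.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // final chunk emitted
    start = end - overlap;          // next chunk re-includes the overlap
  }
  return chunks;
}
```

With the defaults shown, a 2,500-character document yields three chunks, each starting with the last 200 characters of the previous one.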
Use Cases
Documentation Search
Ingest your entire documentation and let the agent search it on-demand.
Research Assistant
Store research papers and query them during tasks.
Customer Support Knowledge Base
Ingest support articles, FAQs, and product manuals. The agent retrieves relevant answers during user interactions.
Code Reference
Store code examples, API references, or architecture docs.
Configuration
Embeddings Provider
- OpenAI `text-embedding-3-small` (1536 dimensions, $0.02/1M tokens)
- OpenAI `text-embedding-3-large` (3072 dimensions, $0.13/1M tokens)
- Google `text-embedding-004` (768 dimensions, free with Gemini API)
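A configuration fragment might look like the sketch below. `ragChunkSize` and `ragChunkOverlap` are the settings referenced in the chunking section; `embeddingProvider` and `embeddingModel` are assumed key names, not confirmed OrcBot settings:

```typescript
// Hypothetical RAG configuration; embeddingProvider/embeddingModel key
// names are assumptions, not confirmed OrcBot settings.
const ragConfig = {
  embeddingProvider: "openai",              // or "google"
  embeddingModel: "text-embedding-3-small", // 1536-dim, $0.02/1M tokens
  ragChunkSize: 1000,                       // target chunk size (characters)
  ragChunkOverlap: 200,                     // carried into the next chunk
};
```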
Storage Backend
JSON (default):
- Simple file-based storage
- Fast for small to medium datasets (under 10,000 chunks)
- No external dependencies

SQLite:
- Better performance for large datasets
- Full-text search support
- Requires the `better-sqlite3` package
Search Settings
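How the similarity threshold and result limit interact can be illustrated with a cosine-similarity sketch; this is assumed ranking logic for illustration, and the actual VectorMemory implementation may differ:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every chunk against the query embedding, drop results below the
// threshold, and return the top `limit` matches by similarity.
function search(
  query: number[],
  chunks: { text: string; embedding: number[] }[],
  limit = 5,
  threshold = 0.7,
): { text: string; score: number }[] {
  return chunks
    .map((c) => ({ text: c.text, score: cosine(query, c.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Raising the threshold trades recall for precision: fewer, higher-quality chunks reach the agent's context.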
Best Practices
Do’s
- Use descriptive source names (“installation-guide.md” vs “doc1.md”)
- Tag documents with relevant keywords for filtering
- Set appropriate chunk sizes (500-1500 characters)
- Use semantic search, not keyword matching
- Regularly clean up outdated documents
Don’ts
- Don’t ingest personal or sensitive information
- Don’t use RAG for tiny snippets (under 200 chars) - use short memory
- Don’t store secrets or API keys in RAG
- Don’t set chunk size too small (under 300) or too large (over 2000)
- Don’t forget to set `threshold` - low-quality results waste tokens
Performance
Embedding Speed
- OpenAI: ~500 chunks/minute (rate limited)
- Google: ~1000 chunks/minute
Search Speed
- JSON storage: ~10ms for 1,000 chunks, ~100ms for 10,000 chunks
- SQLite storage: ~5ms for 10,000 chunks, ~20ms for 100,000 chunks
Storage Size
- Per chunk: ~2KB (text + embedding + metadata)
- 10,000 chunks: ~20MB
- 100,000 chunks: ~200MB
Troubleshooting
No Results Found
Symptoms: `rag_search` returns empty results even though documents exist.
Causes:
- Query too specific or uses different terminology
- Similarity threshold too high
- Documents not chunked properly
Slow Ingestion
Symptoms: `rag_ingest` takes a long time for large documents.
Causes:
- Large chunk size
- Many chunks per document
- API rate limits
High Costs
Symptoms: Embedding API bills are unexpectedly high.
Causes:
- Re-ingesting the same documents multiple times
- Chunk size too small (more API calls)
Solutions:
- Use `rag_list` to check what’s already ingested
- Increase chunk size to 1000-1500 characters
- Use Google embeddings (free with Gemini API)
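To guard against the first cause, one option is to key documents by a content hash and skip the embedding call when the hash is already known. `shouldIngest` below is a hypothetical helper, not part of OrcBot:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: skip embedding calls for content already ingested,
// keyed by a SHA-256 hash of the document text.
const ingestedHashes = new Set<string>();

function shouldIngest(content: string): boolean {
  const hash = createHash("sha256").update(content).digest("hex");
  if (ingestedHashes.has(hash)) return false; // duplicate: skip the API call
  ingestedHashes.add(hash);
  return true;
}
```

Persisting the hash set alongside the knowledge store would make the check survive restarts.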