Testing - OrcBot

Overview

OrcBot uses Vitest for unit and integration testing. This guide covers testing strategies, running tests, and writing new tests for skills and core components.

Quick Start

Running Tests

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run a specific test file
npx vitest run tests/memory.test.ts

# Run tests matching a pattern
npx vitest run -t "MemoryManager"

Test Output

✓ tests/memory.test.ts (5)
  ✓ MemoryManager (5)
    ✓ should save and retrieve short memory
    ✓ should consolidate episodic memory
    ✓ should perform semantic search
    ✓ should respect memory limits
    ✓ should flush stale memories

Test Files  1 passed (1)
Tests  5 passed (5)
Duration  2.34s

Test Structure

File Organization

tests/
├── memory.test.ts           # Memory system tests
├── skills/
│   ├── web-search.test.ts   # Web search skill tests
│   ├── files.test.ts        # File operation tests
│   └── shell.test.ts        # Shell execution tests
├── core/
│   ├── agent.test.ts        # Agent loop tests
│   ├── decision.test.ts     # Decision pipeline tests
│   └── skills.test.ts       # Skills manager tests
└── utils/
    ├── parser.test.ts       # JSON parser tests
    └── logger.test.ts       # Logger tests

Test File Template

import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { YourComponent } from '../src/core/YourComponent';

describe('YourComponent', () => {
  let component: YourComponent;
  
  beforeEach(() => {
    // Setup
    component = new YourComponent();
  });
  
  afterEach(() => {
    // Cleanup
    component.cleanup();
  });
  
  it('should do something', async () => {
    // Arrange
    const input = 'test';
    
    // Act
    const result = await component.process(input);
    
    // Assert
    expect(result).toBe('expected');
  });
});

Testing Strategies

Unit Tests

Test individual functions and classes in isolation.

Example: Testing a Parser Function

import { parseDecisionOutput } from '../src/core/ParserLayer';

it('should parse valid JSON decision', () => {
  const input = `{"reasoning": "test", "tools": [], "completed": false}`;
  const result = parseDecisionOutput(input);
  
  expect(result).toEqual({
    reasoning: 'test',
    tools: [],
    completed: false
  });
});

it('should handle malformed JSON with fallback', () => {
  const input = `{reasoning: test, tools: []}`;
  const result = parseDecisionOutput(input);
  
  expect(result).toBeDefined();
  expect(result.reasoning).toContain('test');
});
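The fallback case above implies a parser that tries strict JSON first and then salvages what it can. A minimal sketch of that idea follows. `parseDecisionLenient` and the `Decision` shape are illustrative assumptions, not the actual `parseDecisionOutput` implementation in `ParserLayer`.

```typescript
// Hypothetical decision shape, inferred from the test inputs above.
interface Decision {
  reasoning: string;
  tools: unknown[];
  completed: boolean;
}

// Sketch only: strict JSON.parse first, then a regex-based salvage pass.
function parseDecisionLenient(raw: string): Decision {
  try {
    const parsed = JSON.parse(raw) as Partial<Decision>;
    return {
      reasoning: typeof parsed.reasoning === 'string' ? parsed.reasoning : '',
      tools: Array.isArray(parsed.tools) ? parsed.tools : [],
      completed: parsed.completed === true,
    };
  } catch {
    // Malformed input: pull out whatever follows a "reasoning" key,
    // quoted or not, up to the next comma, quote, or closing brace.
    const match = raw.match(/["']?reasoning["']?\s*:\s*["']?([^,"'}]+)/);
    return {
      reasoning: match ? match[1].trim() : raw.trim(),
      tools: [],
      completed: false,
    };
  }
}
```

A parser written this way never throws, which is why the malformed-input test can assert on `result.reasoning` directly instead of wrapping the call in `expect(...).toThrow()`.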

Integration Tests

Test multiple components working together.

Example: Testing Agent with Memory

import { Agent } from '../src/core/Agent';
import { MemoryManager } from '../src/memory/MemoryManager';

it('should save action results to memory', async () => {
  const agent = new Agent();
  const memory = agent.memory;
  
  await agent.processAction({
    actionId: 'test-1',
    task: 'Test task',
    priority: 5
  });
  
  const recent = memory.getRecentContext();
  expect(recent).toContain('Test task');
});

Mock Testing

Mock external dependencies (LLM, APIs) for fast, reliable tests.

Example: Mocking LLM Calls

import { vi } from 'vitest';
import { MultiLLM } from '../src/core/MultiLLM';

it('should handle LLM responses', async () => {
  const mockLLM = new MultiLLM({});
  
  // Mock the chat method
  vi.spyOn(mockLLM, 'chat').mockResolvedValue({
    content: '{"reasoning": "mocked", "tools": [], "completed": false}'
  });
  
  const response = await mockLLM.chat([
    { role: 'user', content: 'test' }
  ]);
  
  expect(response.content).toContain('mocked');
});
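When a component accepts its LLM client as a constructor argument, a small hand-written fake can stand in for `vi.spyOn`. The sketch below assumes a hypothetical `ChatClient` interface with the same `chat(messages)` shape used above. `FakeLLM` is not part of OrcBot.

```typescript
// Hypothetical types mirroring the chat(messages) call shown above.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatResponse {
  content: string;
}

interface ChatClient {
  chat(messages: ChatMessage[]): Promise<ChatResponse>;
}

// Returns scripted responses in order and records every call it receives.
class FakeLLM implements ChatClient {
  readonly calls: ChatMessage[][] = [];
  private readonly queue: string[];

  constructor(responses: string[]) {
    this.queue = [...responses];
  }

  async chat(messages: ChatMessage[]): Promise<ChatResponse> {
    this.calls.push(messages);
    const next = this.queue.shift();
    if (next === undefined) {
      // Fail loudly so a test never silently consumes more responses than scripted.
      throw new Error('FakeLLM: no scripted response left');
    }
    return { content: next };
  }
}
```

Scripting the responses up front makes multi-step agent loops deterministic, and `calls` lets a test assert on the prompts that were actually sent.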

End-to-End Tests

Test the full agent loop with real skills. Use these sparingly; they are slow.

Example: Testing Full Action Execution

import { Agent } from '../src/core/Agent';

it('should complete a simple task end-to-end', async () => {
  const agent = new Agent();
  
  const result = await agent.processAction({
    actionId: 'e2e-1',
    task: 'Read the file test.txt and tell me its contents',
    priority: 5
  });
  
  expect(result.completed).toBe(true);
  expect(result.observations).toContain('File contents:');
}, 30000); // 30s timeout for E2E tests

Testing Skills

Skill Test Template

import { describe, it, expect } from 'vitest';
import { yourSkill } from '../src/skills/yourSkill';

describe('yourSkill', () => {
  it('should have correct metadata', () => {
    expect(yourSkill.name).toBe('your_skill_name');
    expect(yourSkill.description).toBeDefined();
    expect(yourSkill.parameters).toBeDefined();
  });
  
  it('should execute successfully', async () => {
    const context = {
      actionId: 'test-1',
      userId: 'test-user',
      channelType: 'cli'
    };
    
    const result = await yourSkill.handler(
      { param1: 'value1' },
      context
    );
    
    expect(result).toContain('expected output');
  });
  
  it('should handle errors gracefully', async () => {
    const context = { actionId: 'test-1' };
    
    const result = await yourSkill.handler(
      { param1: 'invalid' },
      context
    );
    
    expect(result).toContain('error');
  });
});
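The template assumes a skill object with `name`, `description`, `parameters`, and an async `handler(args, context)` that reports failures in its return string rather than throwing. The following sketch spells out that assumed shape with a hypothetical `echoSkill`. The `Skill` and `SkillContext` interfaces are inferred from the template, not taken from OrcBot's source.

```typescript
// Assumed context shape, based on the fields used in the template above.
interface SkillContext {
  actionId: string;
  userId?: string;
  channelType?: string;
}

// Assumed skill shape: metadata plus an async handler returning a string.
interface Skill<Args> {
  name: string;
  description: string;
  parameters: Record<string, string>;
  handler: (args: Args, context: SkillContext) => Promise<string>;
}

// Hypothetical skill used only to illustrate the contract.
const echoSkill: Skill<{ text: string }> = {
  name: 'echo_text',
  description: 'Returns the given text in upper case.',
  parameters: { text: 'Non-empty string to echo' },
  handler: async (args, context) => {
    // Invalid input is reported in the result string, not thrown,
    // so the caller can pass the message back to the agent loop.
    if (typeof args.text !== 'string' || args.text.trim() === '') {
      return `error: "text" must be a non-empty string (action ${context.actionId})`;
    }
    return `echo: ${args.text.toUpperCase()}`;
  },
};
```

If a real skill throws instead of returning an error string, the third test in the template should use `await expect(...).rejects.toThrow()` rather than `toContain('error')`.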

Testing Web Skills

Mock HTTP Requests:

import { vi } from 'vitest';
import { webSearchSkill } from '../src/core/Agent';

it('should search the web', async () => {
  // Mock fetch
  global.fetch = vi.fn().mockResolvedValue({
    ok: true,
    json: async () => ({
      organic: [
        { title: 'Result 1', snippet: 'Description 1' }
      ]
    })
  });
  
  const result = await webSearchSkill.handler(
    { query: 'test query' },
    { actionId: 'test-1' }
  );
  
  expect(result).toContain('Result 1');
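  // Assumes the query travels in the request URL, where it may be URL-encoded.
  // If the skill sends it in a POST body, assert on the second argument instead.
  expect(fetch).toHaveBeenCalledTimes(1);
  const [url] = vi.mocked(fetch).mock.calls[0];
  expect(String(url)).toMatch(/test(%20|\+| )query/);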
});
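Assigning to `global.fetch` directly leaves the mock in place for every later test in the same worker. One way to avoid that is a stub that remembers the original and can restore it. The sketch below uses only the runtime's global object; `stubFetch` and `FetchStub` are hypothetical helper names, and the response object implements only the `ok`, `status`, and `json()` members used above.

```typescript
interface FetchStub {
  calls: string[];      // URLs requested through the stub, in order
  restore: () => void;  // puts the original global fetch back
}

// Replaces globalThis.fetch with a fake that always resolves to `payload`.
function stubFetch(payload: unknown, status = 200): FetchStub {
  const target = globalThis as unknown as { fetch?: unknown };
  const original = target.fetch;
  const calls: string[] = [];

  target.fetch = async (url: unknown) => {
    calls.push(String(url));
    return {
      ok: status >= 200 && status < 300,
      status,
      json: async () => payload,
    };
  };

  return {
    calls,
    restore: () => {
      target.fetch = original;
    },
  };
}
```

In a Vitest file, `stubFetch` would typically be called in `beforeEach` and `restore()` in `afterEach`, so that a failing assertion cannot leak the fake into other tests.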

Testing File Skills

Use Temporary Directories:

import fs from 'fs';
import path from 'path';
import os from 'os';
import { readFileSkill, writeFileSkill } from '../src/core/Agent';

it('should write and read files', async () => {
  const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'orcbot-test-'));
  const testFile = path.join(tempDir, 'test.txt');
  
  try {
    // Write
    await writeFileSkill.handler(
      { filePath: testFile, content: 'Hello' },
      { actionId: 'test-1' }
    );

    // Read
    const result = await readFileSkill.handler(
      { filePath: testFile },
      { actionId: 'test-1' }
    );

    expect(result).toContain('Hello');
  } finally {
    // Cleanup runs even when an assertion above fails
    fs.rmSync(tempDir, { recursive: true, force: true });
  }
});
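When several file tests need a scratch directory, the create and remove steps can be factored into one helper. This is a minimal sketch using Node's `fs`, `os`, and `path` modules; `createTempWorkspace` and `TempWorkspace` are hypothetical names, not OrcBot utilities.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

interface TempWorkspace {
  dir: string;                     // absolute path of the scratch directory
  file: (name: string) => string;  // builds a path inside the directory
  cleanup: () => void;             // removes the directory and its contents
}

// Creates a uniquely named directory under the OS temp folder.
function createTempWorkspace(prefix = 'orcbot-test-'): TempWorkspace {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  return {
    dir,
    file: (name) => path.join(dir, name),
    // force: true makes cleanup safe to call more than once.
    cleanup: () => fs.rmSync(dir, { recursive: true, force: true }),
  };
}
```

Calling `createTempWorkspace()` in `beforeEach` and `cleanup()` in `afterEach` gives each test its own directory, so file tests do not depend on execution order.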

Testing Memory System

Short Memory Tests

import { MemoryManager } from '../src/memory/MemoryManager';

it('should save and retrieve short memory', () => {
  const memory = new MemoryManager('/tmp/orcbot-test');
  
  memory.saveMemory({
    actionId: 'test-1',
    step: 1,
    content: 'Test observation'
  });
  
  const recent = memory.getRecentContext();
  expect(recent).toContain('Test observation');
});

Episodic Memory Tests

it('should consolidate episodic memory', async () => {
  const memory = new MemoryManager('/tmp/orcbot-test');
  
  // Add many short memories
  for (let i = 0; i < 50; i++) {
    memory.saveMemory({
      actionId: `test-${i}`,
      step: 1,
      content: `Memory ${i}`
    });
  }
  
  // Trigger consolidation
  await memory.consolidateIfNeeded();
  
  const episodic = memory.getEpisodicMemory();
  expect(episodic.length).toBeGreaterThan(0);
});

Vector Memory Tests

it('should perform semantic search', async () => {
  const memory = new MemoryManager('/tmp/orcbot-test');
  
  // Add searchable content
  await memory.saveToVectorMemory('OrcBot is an autonomous agent');
  await memory.saveToVectorMemory('TypeScript is a programming language');
  
  // Search
  const results = await memory.semanticSearch('agent');
  
  expect(results[0].content).toContain('autonomous');
});

Testing Decision Pipeline

Deduplication Tests

import { DecisionPipeline } from '../src/core/DecisionPipeline';

it('should deduplicate repeated tool calls', () => {
  const pipeline = new DecisionPipeline();
  
  const decision = {
    reasoning: 'test',
    tools: [
      { name: 'web_search', metadata: { query: 'test' } },
      { name: 'web_search', metadata: { query: 'test' } }
    ],
    completed: false
  };
  
  const filtered = pipeline.applyGuardrails(decision, [
    { name: 'web_search', metadata: { query: 'test' } }
  ]);
  
  expect(filtered.tools).toHaveLength(0); // Duplicate removed
});
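The test above expects both proposed calls to be dropped because an identical call already ran. One plausible reading of that rule is sketched below: calls are compared by name plus metadata, and anything already seen is filtered out. `dedupeToolCalls` and `callKey` are hypothetical and only illustrate the behavior under test; the real `applyGuardrails` may apply additional checks.

```typescript
// Tool call shape taken from the test above.
interface ToolCall {
  name: string;
  metadata: Record<string, unknown>;
}

// Builds a comparison key that ignores the order of top-level metadata keys.
function callKey(call: ToolCall): string {
  const entries = Object.keys(call.metadata)
    .sort()
    .map((key) => [key, call.metadata[key]]);
  return `${call.name}:${JSON.stringify(entries)}`;
}

// Drops calls that repeat an earlier call, either from previous steps
// or from earlier in the same proposed list.
function dedupeToolCalls(proposed: ToolCall[], alreadyRun: ToolCall[]): ToolCall[] {
  const seen = new Set(alreadyRun.map(callKey));
  const kept: ToolCall[] = [];
  for (const call of proposed) {
    const key = callKey(call);
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(call);
  }
  return kept;
}
```

Under this reading, the same two proposed calls with an empty history would leave one call, which is a useful second test case alongside the one above.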

Termination Review Tests

it('should block premature completion', () => {
  const pipeline = new DecisionPipeline();
  
  const decision = {
    reasoning: 'Done',
    tools: [],
    completed: true
  };
  
  const result = pipeline.reviewTermination(decision, {
    deepToolsUsed: ['web_search'],
    lastSentStep: 0,
    currentStep: 2
  });
  
  expect(result.shouldBlock).toBe(true);
  expect(result.auditCodes).toContain('UNSENT_RESULTS');
});
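The inputs in this test suggest a rule along these lines: if the agent declares completion after using a deep tool, but the last message was sent before the current step, the gathered results have not reached the user. The sketch below encodes only that single inferred rule. `reviewTerminationSketch` is hypothetical, and the real `reviewTermination` may check other conditions and emit other audit codes.

```typescript
// Shapes taken from the test above.
interface Decision {
  reasoning: string;
  tools: unknown[];
  completed: boolean;
}

interface LoopState {
  deepToolsUsed: string[];
  lastSentStep: number;
  currentStep: number;
}

interface TerminationReview {
  shouldBlock: boolean;
  auditCodes: string[];
}

// Assumed rule: completion is blocked when deep-tool results exist
// that were produced after the last message was sent.
function reviewTerminationSketch(decision: Decision, state: LoopState): TerminationReview {
  const auditCodes: string[] = [];
  const hasUnsentResults =
    state.deepToolsUsed.length > 0 && state.lastSentStep < state.currentStep;
  if (decision.completed && hasUnsentResults) {
    auditCodes.push('UNSENT_RESULTS');
  }
  return { shouldBlock: auditCodes.length > 0, auditCodes };
}
```

Worth covering next to the blocking case are the two boundary cases this rule implies: no deep tools used, and `lastSentStep` equal to `currentStep`. Neither should block.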

Testing Configuration

vitest.config.ts

import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    setupFiles: ['./tests/setup.ts'],
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json', 'html'],
      exclude: [
        'node_modules/',
        'dist/',
        'tests/',
        '**/*.test.ts'
      ]
    },
    testTimeout: 10000,
    hookTimeout: 10000
  }
});

Test Setup File

// tests/setup.ts
import { beforeAll, afterAll } from 'vitest';
import fs from 'fs';
import path from 'path';
import os from 'os';

let tempDir: string;

beforeAll(() => {
  // Create temp directory for tests
  tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'orcbot-test-'));
  process.env.ORCBOT_DATA_DIR = tempDir;
});

afterAll(() => {
  // Cleanup
  if (fs.existsSync(tempDir)) {
    fs.rmSync(tempDir, { recursive: true });
  }
});

Continuous Integration

GitHub Actions

# .github/workflows/test.yml
name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
      - run: npm install
      - run: npm run build
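      # Assumes the "test" script runs Vitest; --coverage produces the report uploaded below
      - run: npm test -- --coverage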
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/coverage-final.json

Best Practices

Write Fast Tests

Mock external APIs
Use in-memory storage
Keep test data small
Parallelize tests

Test Boundaries

Test error handling
Test edge cases
Test invalid inputs
Test resource limits

Keep Tests Isolated

No shared state
Clean up after each test
Use temp directories
Reset mocks

Maintain Coverage

Aim for 80%+ coverage
Cover critical paths
Test new features
Update tests with code changes

Troubleshooting

Tests Timing Out

Symptoms: Tests fail with a “Test timeout” error.

Solutions:

// Increase timeout for specific test
it('slow test', async () => {
  // test code
}, 30000); // 30 seconds

// Or in vitest.config.ts
export default defineConfig({
  test: {
    testTimeout: 20000
  }
});
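Raising the timeout hides which call is slow. An alternative is to bound the suspect call itself so the failure names it. The helper below is a generic sketch built on `Promise.race`; `withTimeout` is a hypothetical name, not a Vitest or OrcBot API.

```typescript
// Rejects with a labeled error if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms}ms`));
    }, ms);
  });

  // Whichever settles first wins; the timer is always cleared afterwards
  // so it cannot keep the test process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Wrapping a single call, for example `await withTimeout(agent.processAction(action), 5000, 'processAction')`, turns a generic test timeout into an error that points at the slow step. The underlying work is not cancelled, only abandoned.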

Flaky Tests

Symptoms: Tests pass sometimes, fail other times.

Common Causes:

Race conditions
Shared state between tests
Network/filesystem timing

Solutions:

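// Poll async assertions with vi.waitFor instead of fixed sleeps
// (memory is the MemoryManager instance under test)
import { vi } from 'vitest';

await vi.waitFor(() => {
  expect(memory.getRecentContext()).toContain('Test observation');
}, { timeout: 2000, interval: 50 });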

// Add proper cleanup
afterEach(async () => {
  await agent.cleanup();
  vi.clearAllMocks();
});
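Timing-related flakiness often comes from code that reads the system clock directly. Passing the clock in as a parameter lets a test control time exactly. The sketch below applies this to a stale-memory filter; `Clock`, `MemoryEntry`, and `keepFresh` are hypothetical names and do not describe how `MemoryManager` is actually implemented.

```typescript
// A clock is any function returning the current time in milliseconds.
type Clock = () => number;

interface MemoryEntry {
  content: string;
  savedAt: number; // epoch milliseconds
}

// Keeps entries whose age does not exceed maxAgeMs.
// Production code uses the default Date.now; tests pass a fake clock.
function keepFresh(
  entries: MemoryEntry[],
  maxAgeMs: number,
  now: Clock = Date.now
): MemoryEntry[] {
  const current = now();
  return entries.filter((entry) => current - entry.savedAt <= maxAgeMs);
}
```

A test can then advance a plain variable instead of sleeping, for example `let t = 1000; keepFresh(entries, 500, () => t)`, which removes the dependency on real elapsed time. Vitest's fake timers (`vi.useFakeTimers()`) are an alternative when the code under test cannot take a clock parameter.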

Memory Leaks

Symptoms: Tests slow down or crash after many runs.

Solutions:

afterEach(async () => {
  // Clear event listeners
  eventBus.removeAllListeners();
  
  // Close connections
  await memory.close();
  
  // Clear intervals
  clearInterval(heartbeatInterval);
});
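Listing every resource by hand in `afterEach` is easy to get out of sync with the test body. One alternative is to register each teardown at the point where the resource is created and run them all at the end. The `CleanupStack` class below is a hypothetical sketch of that pattern, not an OrcBot or Vitest API.

```typescript
type Teardown = () => void | Promise<void>;

// Collects teardown callbacks and runs them in reverse registration order.
class CleanupStack {
  private readonly teardowns: Teardown[] = [];

  defer(teardown: Teardown): void {
    this.teardowns.push(teardown);
  }

  // Runs every teardown even if some fail, then reports the failures together.
  async runAll(): Promise<void> {
    const errors: unknown[] = [];
    while (this.teardowns.length > 0) {
      const teardown = this.teardowns.pop() as Teardown;
      try {
        await teardown();
      } catch (error) {
        errors.push(error);
      }
    }
    if (errors.length > 0) {
      throw new Error(
        `${errors.length} teardown(s) failed: ${errors.map(String).join('; ')}`
      );
    }
  }
}
```

A test would call `stack.defer(() => clearInterval(handle))` right after creating the interval, and `afterEach(() => stack.runAll())` once per file. Reverse order matters when later resources depend on earlier ones, such as a listener attached to a connection.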

Vitest Documentation

Official Vitest testing framework docs

Agent Architecture

Understand the agent structure for better tests

Skills System

Learn how skills work to write skill tests

Contributing Guide

Guidelines for contributing tests
