Overview

OrcBot uses Vitest for unit and integration testing. This guide covers testing strategies, running tests, and writing new tests for skills and core components.

Quick Start

Running Tests

# Run all tests
npm test

# Run tests in watch mode
npm run test:watch

# Run a specific test file
npx vitest run tests/memory.test.ts

# Run tests matching a pattern
npx vitest run -t "MemoryManager"

Test Output

 ✓ tests/memory.test.ts (5)
   ✓ MemoryManager (5)
     ✓ should save and retrieve short memory
     ✓ should consolidate episodic memory
     ✓ should perform semantic search
     ✓ should respect memory limits
     ✓ should flush stale memories

Test Files  1 passed (1)
Tests  5 passed (5)
Duration  2.34s

Test Structure

File Organization

tests/
├── memory.test.ts           # Memory system tests
├── skills/
│   ├── web-search.test.ts   # Web search skill tests
│   ├── files.test.ts        # File operation tests
│   └── shell.test.ts        # Shell execution tests
├── core/
│   ├── agent.test.ts        # Agent loop tests
│   ├── decision.test.ts     # Decision pipeline tests
│   └── skills.test.ts       # Skills manager tests
└── utils/
    ├── parser.test.ts       # JSON parser tests
    └── logger.test.ts       # Logger tests

Test File Template

import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { YourComponent } from '../src/core/YourComponent';

describe('YourComponent', () => {
  let component: YourComponent;
  
  beforeEach(() => {
    // Setup
    component = new YourComponent();
  });
  
  afterEach(() => {
    // Cleanup
    component.cleanup();
  });
  
  it('should do something', async () => {
    // Arrange
    const input = 'test';
    
    // Act
    const result = await component.process(input);
    
    // Assert
    expect(result).toBe('expected');
  });
});

Testing Strategies

Unit Tests

Test individual functions and classes in isolation.

Example: Testing a Parser Function
import { parseDecisionOutput } from '../src/core/ParserLayer';

it('should parse valid JSON decision', () => {
  const input = `{"reasoning": "test", "tools": [], "completed": false}`;
  const result = parseDecisionOutput(input);
  
  expect(result).toEqual({
    reasoning: 'test',
    tools: [],
    completed: false
  });
});

it('should handle malformed JSON with fallback', () => {
  const input = `{reasoning: test, tools: []}`;
  const result = parseDecisionOutput(input);
  
  expect(result).toBeDefined();
  expect(result.reasoning).toContain('test');
});
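
To make the second test above concrete, here is a minimal sketch of what a "fallback" for malformed JSON could look like. It is not OrcBot's `ParserLayer`; the `Decision` shape is inferred from the test data, and `parseDecisionLenient` is a hypothetical name used only for illustration.

```typescript
// Hypothetical stand-in for a tolerant decision parser (NOT OrcBot's ParserLayer).
interface Decision {
  reasoning: string;
  tools: unknown[];
  completed: boolean;
}

function parseDecisionLenient(raw: string): Decision {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === 'object' && parsed !== null) {
      const obj = parsed as Record<string, unknown>;
      // Strict path: valid JSON object, with defaults for missing fields.
      return {
        reasoning: typeof obj['reasoning'] === 'string' ? obj['reasoning'] : '',
        tools: Array.isArray(obj['tools']) ? obj['tools'] : [],
        completed: obj['completed'] === true,
      };
    }
  } catch {
    // Not valid JSON: fall through to the text-based fallback below.
  }
  // Fallback: capture the text after "reasoning:" up to the next quote, comma, or brace.
  const match = /reasoning"?\s*:\s*"?([^",}]*)/.exec(raw);
  const reasoning = match?.[1]?.trim() || raw.trim();
  return { reasoning, tools: [], completed: false };
}
```

A unit test for a parser like this would typically cover both paths: one well-formed input that exercises the strict branch, and one malformed input that exercises the fallback.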

Integration Tests

Test multiple components working together.

Example: Testing Agent with Memory
import { Agent } from '../src/core/Agent';
import { MemoryManager } from '../src/memory/MemoryManager';

it('should save action results to memory', async () => {
  const agent = new Agent();
  const memory = agent.memory;
  
  await agent.processAction({
    actionId: 'test-1',
    task: 'Test task',
    priority: 5
  });
  
  const recent = memory.getRecentContext();
  expect(recent).toContain('Test task');
});

Mock Testing

Mock external dependencies (LLM, APIs) for fast, reliable tests.

Example: Mocking LLM Calls
import { vi } from 'vitest';
import { MultiLLM } from '../src/core/MultiLLM';

it('should handle LLM responses', async () => {
  const mockLLM = new MultiLLM({});
  
  // Mock the chat method
  vi.spyOn(mockLLM, 'chat').mockResolvedValue({
    content: '{"reasoning": "mocked", "tools": [], "completed": false}'
  });
  
  const response = await mockLLM.chat([
    { role: 'user', content: 'test' }
  ]);
  
  expect(response.content).toContain('mocked');
});
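
Where a component accepts its LLM client by injection, a hand-written fake is an alternative to spying on a real instance. The sketch below assumes a hypothetical `ChatClient` interface with a `chat` method shaped like the call above; `ScriptedChatClient` is an illustrative name, not part of OrcBot.

```typescript
// Hypothetical interfaces mirroring the chat() call shape used in the example above.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface ChatResponse {
  content: string;
}

interface ChatClient {
  chat(messages: ChatMessage[]): Promise<ChatResponse>;
}

// Returns pre-scripted replies in order and records every call for later assertions.
class ScriptedChatClient implements ChatClient {
  readonly calls: ChatMessage[][] = [];
  private readonly replies: string[];

  constructor(replies: string[]) {
    this.replies = [...replies];
  }

  async chat(messages: ChatMessage[]): Promise<ChatResponse> {
    this.calls.push(messages);
    const next = this.replies.shift();
    if (next === undefined) {
      // Failing loudly makes an unexpected extra LLM call visible in the test.
      throw new Error('ScriptedChatClient: no scripted reply left');
    }
    return { content: next };
  }
}
```

Scripting the replies in order makes multi-step loops deterministic: the first reply can request a tool and the second can mark the task completed.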

End-to-End Tests

Test the full agent loop with real skills. Use these tests sparingly because they are slow.

Example: Testing Full Action Execution
import { Agent } from '../src/core/Agent';

it('should complete a simple task end-to-end', async () => {
  const agent = new Agent();
  
  const result = await agent.processAction({
    actionId: 'e2e-1',
    task: 'Read the file test.txt and tell me its contents',
    priority: 5
  });
  
  expect(result.completed).toBe(true);
  expect(result.observations).toContain('File contents:');
}, 30000); // 30s timeout for E2E tests

Testing Skills

Skill Test Template

import { describe, it, expect } from 'vitest';
import { yourSkill } from '../src/skills/yourSkill';

describe('yourSkill', () => {
  it('should have correct metadata', () => {
    expect(yourSkill.name).toBe('your_skill_name');
    expect(yourSkill.description).toBeDefined();
    expect(yourSkill.parameters).toBeDefined();
  });
  
  it('should execute successfully', async () => {
    const context = {
      actionId: 'test-1',
      userId: 'test-user',
      channelType: 'cli'
    };
    
    const result = await yourSkill.handler(
      { param1: 'value1' },
      context
    );
    
    expect(result).toContain('expected output');
  });
  
  it('should handle errors gracefully', async () => {
    const context = { actionId: 'test-1' };
    
    const result = await yourSkill.handler(
      { param1: 'invalid' },
      context
    );
    
    expect(result).toContain('error');
  });
});

Testing Web Skills

Mock HTTP Requests:
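import { afterEach, expect, it, vi } from 'vitest';
import { webSearchSkill } from '../src/core/Agent';

afterEach(() => {
  // Restore the real fetch so the stub cannot leak into other tests
  vi.unstubAllGlobals();
});

it('should search the web', async () => {
  // Stub the global fetch with a canned search response
  const fetchMock = vi.fn().mockResolvedValue({
    ok: true,
    json: async () => ({
      organic: [
        { title: 'Result 1', snippet: 'Description 1' }
      ]
    })
  });
  vi.stubGlobal('fetch', fetchMock);

  const result = await webSearchSkill.handler(
    { query: 'test query' },
    { actionId: 'test-1' }
  );

  expect(result).toContain('Result 1');
  // Assumes fetch receives only a URL containing the raw query; loosen this
  // if the skill URL-encodes the query or passes request options.
  expect(fetchMock).toHaveBeenCalledWith(
    expect.stringContaining('test query')
  );
});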
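
The mechanics behind this kind of mock can be sketched without any test framework: a fake `fetch` that records the URLs it receives, plus a helper that swaps a global and returns a function to restore it. `createFakeFetch`, `stubGlobal`, and the `FakeResponse` shape are hypothetical names for illustration; in a Vitest suite, `vi.fn()` and `vi.stubGlobal()` cover the same ground.

```typescript
// Minimal response shape a skill is assumed to read (hypothetical).
interface FakeResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

interface FakeFetch {
  (url: string, init?: unknown): Promise<FakeResponse>;
  calls: string[];
}

// Builds a fetch replacement that always answers with `payload` and records each URL.
function createFakeFetch(payload: unknown, status = 200): FakeFetch {
  const calls: string[] = [];
  const fake = async (url: string): Promise<FakeResponse> => {
    calls.push(url);
    return { ok: status >= 200 && status < 300, status, json: async () => payload };
  };
  return Object.assign(fake, { calls });
}

// Replaces a global by name and returns a function that puts the original back.
function stubGlobal(name: string, value: unknown): () => void {
  const g = globalThis as unknown as Record<string, unknown>;
  const existed = Object.prototype.hasOwnProperty.call(g, name);
  const previous = g[name];
  g[name] = value;
  return () => {
    if (existed) {
      g[name] = previous;
    } else {
      delete g[name];
    }
  };
}
```

Calling the returned restore function in `afterEach` keeps one test's stub from affecting the next test.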

Testing File Skills

Use Temporary Directories:
import fs from 'fs';
import path from 'path';
import os from 'os';
import { readFileSkill, writeFileSkill } from '../src/core/Agent';

it('should write and read files', async () => {
  const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'orcbot-test-'));
  const testFile = path.join(tempDir, 'test.txt');

  try {
    // Write
    await writeFileSkill.handler(
      { filePath: testFile, content: 'Hello' },
      { actionId: 'test-1' }
    );

    // Read
    const result = await readFileSkill.handler(
      { filePath: testFile },
      { actionId: 'test-1' }
    );

    expect(result).toContain('Hello');
  } finally {
    // Cleanup runs even when an assertion above fails
    fs.rmSync(tempDir, { recursive: true, force: true });
  }
});
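
When several tests need a scratch directory, the create-and-remove steps can be wrapped in one helper. The following is a minimal sketch using only Node's standard library; `withTempDir` is a hypothetical helper name, not an OrcBot or Vitest API.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Creates a unique temp directory, passes it to `fn`, and always removes it afterwards.
async function withTempDir<T>(fn: (dir: string) => Promise<T> | T): Promise<T> {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'orcbot-test-'));
  try {
    return await fn(dir);
  } finally {
    // `force: true` avoids an error if the callback already deleted the directory.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

Because the removal sits in a `finally` block, the directory is deleted whether the callback returns normally or throws, so a failing assertion does not leave files behind.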

Testing Memory System

Short Memory Tests

import { MemoryManager } from '../src/memory/MemoryManager';

it('should save and retrieve short memory', () => {
  const memory = new MemoryManager('/tmp/orcbot-test');
  
  memory.saveMemory({
    actionId: 'test-1',
    step: 1,
    content: 'Test observation'
  });
  
  const recent = memory.getRecentContext();
  expect(recent).toContain('Test observation');
});

Episodic Memory Tests

it('should consolidate episodic memory', async () => {
  const memory = new MemoryManager('/tmp/orcbot-test');
  
  // Add many short memories
  for (let i = 0; i < 50; i++) {
    memory.saveMemory({
      actionId: `test-${i}`,
      step: 1,
      content: `Memory ${i}`
    });
  }
  
  // Trigger consolidation
  await memory.consolidateIfNeeded();
  
  const episodic = memory.getEpisodicMemory();
  expect(episodic.length).toBeGreaterThan(0);
});

Vector Memory Tests

it('should perform semantic search', async () => {
  const memory = new MemoryManager('/tmp/orcbot-test');
  
  // Add searchable content
  await memory.saveToVectorMemory('OrcBot is an autonomous agent');
  await memory.saveToVectorMemory('TypeScript is a programming language');
  
  // Search
  const results = await memory.semanticSearch('agent');
  
  expect(results[0].content).toContain('autonomous');
});

Testing Decision Pipeline

Deduplication Tests

import { DecisionPipeline } from '../src/core/DecisionPipeline';

it('should deduplicate repeated tool calls', () => {
  const pipeline = new DecisionPipeline();
  
  const decision = {
    reasoning: 'test',
    tools: [
      { name: 'web_search', metadata: { query: 'test' } },
      { name: 'web_search', metadata: { query: 'test' } }
    ],
    completed: false
  };
  
  const filtered = pipeline.applyGuardrails(decision, [
    { name: 'web_search', metadata: { query: 'test' } }
  ]);
  
  expect(filtered.tools).toHaveLength(0); // Duplicate removed
});
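
For readers unfamiliar with what such a guardrail does, here is a minimal sketch of one possible deduplication rule: drop any proposed tool call whose name and metadata match a call already in the history or earlier in the same decision. This is an assumption about the behavior the test checks, not OrcBot's `DecisionPipeline` code; `ToolCall`, `toolKey`, and `dedupeTools` are hypothetical names.

```typescript
// Hypothetical tool-call shape, inferred from the test data above.
interface ToolCall {
  name: string;
  metadata: Record<string, unknown>;
}

// Builds a stable signature so that metadata key order does not matter.
function toolKey(call: ToolCall): string {
  const sorted = Object.keys(call.metadata)
    .sort()
    .map((key) => [key, call.metadata[key]]);
  return `${call.name}:${JSON.stringify(sorted)}`;
}

// Keeps only calls that appear neither in `history` nor earlier in `proposed`.
function dedupeTools(proposed: ToolCall[], history: ToolCall[]): ToolCall[] {
  const seen = new Set(history.map(toolKey));
  const kept: ToolCall[] = [];
  for (const call of proposed) {
    const key = toolKey(call);
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(call);
  }
  return kept;
}
```

Under this rule, the two identical `web_search` calls in the test are both removed because the same call is already in the history. With an empty history, one of the two would be kept.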

Termination Review Tests

it('should block premature completion', () => {
  const pipeline = new DecisionPipeline();
  
  const decision = {
    reasoning: 'Done',
    tools: [],
    completed: true
  };
  
  const result = pipeline.reviewTermination(decision, {
    deepToolsUsed: ['web_search'],
    lastSentStep: 0,
    currentStep: 2
  });
  
  expect(result.shouldBlock).toBe(true);
  expect(result.auditCodes).toContain('UNSENT_RESULTS');
});

Testing Configuration

vitest.config.ts

import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    setupFiles: ['./tests/setup.ts'],
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json', 'html'],
      exclude: [
        'node_modules/',
        'dist/',
        'tests/',
        '**/*.test.ts'
      ]
    },
    testTimeout: 10000,
    hookTimeout: 10000
  }
});

Test Setup File

// tests/setup.ts
import { beforeAll, afterAll } from 'vitest';
import fs from 'fs';
import path from 'path';
import os from 'os';

let tempDir: string;

beforeAll(() => {
  // Create temp directory for tests
  tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'orcbot-test-'));
  process.env.ORCBOT_DATA_DIR = tempDir;
});

afterAll(() => {
  // Cleanup
  if (fs.existsSync(tempDir)) {
    fs.rmSync(tempDir, { recursive: true });
  }
});

Continuous Integration

GitHub Actions

# .github/workflows/test.yml
name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
      - run: npm install
      - run: npm run build
      - run: npm test -- --coverage  # assumes the test script runs Vitest; produces the report uploaded below
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          files: ./coverage/coverage-final.json

Best Practices

Write Fast Tests

  • Mock external APIs
  • Use in-memory storage
  • Keep test data small
  • Parallelize tests

Test Boundaries

  • Test error handling
  • Test edge cases
  • Test invalid inputs
  • Test resource limits

Keep Tests Isolated

  • No shared state
  • Clean up after each test
  • Use temp directories
  • Reset mocks

Maintain Coverage

  • Aim for 80%+ coverage
  • Cover critical paths
  • Test new features
  • Update tests with code changes

Troubleshooting

Tests Timing Out

Symptoms: Tests fail with “Test timeout” error.

Solutions:
// Increase timeout for specific test
it('slow test', async () => {
  // test code
}, 30000); // 30 seconds

// Or raise the default for every test in vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    testTimeout: 20000
  }
});
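
Raising the timeout hides which step is slow. A complementary approach is to wrap the suspect operation in its own, shorter deadline so the failure message names it. The sketch below uses only standard promises and timers; `withTimeout` is a hypothetical helper, not a Vitest API.

```typescript
// Rejects with a descriptive error if `work` has not settled within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} did not settle within ${ms}ms`));
    }, ms);
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([work, deadline]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

Inside a test, a call such as `await withTimeout(agent.processAction(action), 5000, 'processAction')` would then fail with the label of the slow step instead of a generic timeout.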

Flaky Tests

Symptoms: Tests pass sometimes, fail other times.

Common Causes:
  • Race conditions
  • Shared state between tests
  • Network/filesystem timing

Solutions:
// Poll an assertion until it passes instead of sleeping for a fixed time
// (vi.waitFor requires Vitest 0.34.5 or later)
import { vi } from 'vitest';

await vi.waitFor(() => {
  expect(memory.getRecentContext()).toContain('Test task');
}, { timeout: 2000 });

// Add proper cleanup
afterEach(async () => {
  await agent.cleanup();
  vi.clearAllMocks();
});
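
The polling idea behind waitFor-style helpers can be sketched in a few lines: re-check a condition at a short interval and give up after a deadline, rather than sleeping for a fixed time and hoping the work has finished. `waitUntil` and its option names are hypothetical and shown only to illustrate the technique.

```typescript
interface WaitOptions {
  timeoutMs?: number;  // total time to keep trying (default 1000)
  intervalMs?: number; // pause between checks (default 20)
}

// Resolves once `check` returns true; rejects if the deadline passes first.
async function waitUntil(
  check: () => boolean | Promise<boolean>,
  options: WaitOptions = {}
): Promise<void> {
  const timeoutMs = options.timeoutMs ?? 1000;
  const intervalMs = options.intervalMs ?? 20;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Polling makes a test finish as soon as the condition holds and fail with a clear message when it never does, which removes the guesswork of fixed delays.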

Memory Leaks

Symptoms: Tests slow down or crash after many runs.

Solutions:
afterEach(async () => {
  // Clear event listeners
  eventBus.removeAllListeners();

  // Close connections (await requires the async callback)
  await memory.close();

  // Clear intervals
  clearInterval(heartbeatInterval);
});
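
One way to avoid forgetting a cleanup step is to register it at the moment the resource is created and run all registered steps in a single hook. The sketch below shows that pattern; `CleanupStack` is a hypothetical helper, not part of OrcBot or Vitest.

```typescript
type Cleanup = () => void | Promise<void>;

// Collects cleanup steps and runs them in reverse order of registration.
class CleanupStack {
  private readonly tasks: Cleanup[] = [];

  defer(task: Cleanup): void {
    this.tasks.push(task);
  }

  async runAll(): Promise<void> {
    const errors: unknown[] = [];
    while (this.tasks.length > 0) {
      const task = this.tasks.pop() as Cleanup;
      try {
        await task();
      } catch (err) {
        // Keep going so one failing step does not block the remaining cleanup.
        errors.push(err);
      }
    }
    if (errors.length > 0) {
      throw new Error(`${errors.length} cleanup task(s) failed`);
    }
  }
}
```

A test can then call `cleanups.defer(() => clearInterval(timer))` right after starting an interval, and a single `afterEach(() => cleanups.runAll())` releases everything, last created first.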