OrcBot uses the ReAct (Reasoning + Acting) paradigm to autonomously solve tasks through iterative thought, action, and observation cycles.

What is ReAct?

ReAct is a prompt engineering pattern that gives language models the ability to:
  1. Think about what to do next (reasoning)
  2. Act by calling tools/functions
  3. Observe the results
  4. Re-reason based on observations
  5. Repeat until the task is complete
This is fundamentally different from single-shot LLM calls, where the model gives one answer and stops. ReAct enables agents to work through problems iteratively, just like a human would.

The Loop Implementation

The core loop is implemented in Agent.ts (lines 67-797). Here’s the simplified flow:
while (!actionCompleted && step < maxSteps) {
  // 1. THINK: Call LLM with task + memory + available skills
  const decision = await decisionEngine.decide(action);

  // 2. VALIDATE: Run guardrails (dedup, loop detection, safety)
  const validated = await decisionPipeline.process(decision);

  // 3. ACT: Execute tool calls
  for (const tool of validated.tools) {
    const result = await skills.execute(tool.name, tool.metadata);
    
    // 4. OBSERVE: Save result to memory
    memory.saveMemory({
      id: `${actionId}-step-${step}`,
      type: 'short',
      content: `Observation: Tool ${tool.name} returned ${result}`,
      metadata: { actionId, step, skill: tool.name }
    });
  }

  // 5. CHECK: Is the task complete?
  if (decision.completed) {
    actionCompleted = true;
  }

  step++;
}

Step-by-Step Example

Let’s trace a real task: “Find the weather in Paris and send it to me on Telegram”

Step 0: Pre-Task Simulation

Before entering the loop, SimulationEngine creates a plan:
INPUT: "Find the weather in Paris and send it to me on Telegram"

SIMULATED PLAN:
- Step 1: Use web_search to find current weather in Paris
- Step 2: Extract temperature and conditions from results
- Step 3: Use send_telegram to deliver weather report
- Expected outcome: User receives message with Paris weather
- Potential errors: Search API rate limit, invalid chatId
This plan is injected into the prompt to help the LLM stay on track.

Step 1: Reasoning

Prompt assembled by DecisionEngine:
SYSTEM:
You are OrcBot, an autonomous AI agent. Your goal is to complete tasks using available skills.

AVAILABLE SKILLS:
- web_search(query) — Search the web for information
- send_telegram(chatId, message) — Send a Telegram message
- browser_navigate(url) — Visit a URL and extract content
[... 40+ more skills]

MEMORY (Recent Context):
[2025-01-15 10:23] User: Find the weather in Paris and send it to me on Telegram

SIMULATION PLAN:
- Step 1: Use web_search to find current weather in Paris
- Step 2: Extract temperature and conditions
- Step 3: Use send_telegram to deliver report

TASK: Find the weather in Paris and send it to me on Telegram

RETURN FORMAT:
{
  "reasoning": "Your thought process",
  "tools": [{"name": "skill_name", "metadata": {"arg": "value"}}],
  "completed": false
}
LLM Response (Step 1):
{
  "reasoning": "I need to search for the current weather in Paris. I'll use web_search to find this information.",
  "tools": [
    {
      "name": "web_search",
      "metadata": {
        "query": "weather Paris current temperature"
      }
    }
  ],
  "completed": false
}
Action: Agent executes web_search("weather Paris current temperature").

Observation (saved to memory):
Observation: Tool web_search returned 10 results. Top result:
Title: Weather in Paris - Current Conditions
Snippet: Paris, France - Current weather: 8°C (46°F), partly cloudy, wind 15 km/h NW.
URL: https://weather.com/paris

Step 2: Re-Reasoning

Prompt (Step 2):
[Same system prompt as Step 1]

MEMORY (Recent Context):
[Step 1] Observation: Tool web_search returned... 8°C, partly cloudy
[Original] User: Find the weather in Paris and send it to me on Telegram

TASK: Find the weather in Paris and send it to me on Telegram
LLM Response (Step 2):
{
  "reasoning": "I found the weather in Paris: 8°C, partly cloudy. Now I need to send this to the user on Telegram. The chatId is in the action metadata.",
  "tools": [
    {
      "name": "send_telegram",
      "metadata": {
        "chatId": "123456789",
        "message": "Weather in Paris: 8°C (46°F), partly cloudy, wind 15 km/h NW."
      }
    }
  ],
  "completed": false
}
Action: Agent executes send_telegram(123456789, "Weather in Paris: 8°C...").

Observation (saved to memory):
Observation: Tool send_telegram succeeded. Message sent to chatId 123456789.

Step 3: Completion

Prompt (Step 3):
[Same system prompt]

MEMORY (Recent Context):
[Step 2] Observation: Tool send_telegram succeeded. Message sent to chatId 123456789.
[Step 1] Observation: Tool web_search returned... 8°C, partly cloudy
[Original] User: Find the weather in Paris and send it to me on Telegram

TASK: Find the weather in Paris and send it to me on Telegram
LLM Response (Step 3):
{
  "reasoning": "I successfully found the weather in Paris (8°C, partly cloudy) and sent it to the user on Telegram. The task is complete.",
  "tools": [],
  "completed": true,
  "summary": "Found current weather in Paris (8°C, partly cloudy) and delivered it to user via Telegram."
}
Action: Agent marks the action as completed and exits the loop.

Loop Mechanics

Entry Conditions

The loop starts when:
  1. An action is popped from the ActionQueue
  2. Agent.runActionLoop() is called
  3. The action status is 'pending' or 'running'
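The entry conditions above can be sketched as a simple guard (a minimal illustration; the Action shape and status values here are simplified assumptions, not OrcBot's real types):

```typescript
type ActionStatus = 'pending' | 'running' | 'completed' | 'cancelled';

interface Action {
  id: string;
  status: ActionStatus;
}

// Returns true when an action popped from the queue may (re)enter the loop.
// Only 'pending' or 'running' actions are eligible.
function canEnterLoop(action: Action | undefined): action is Action {
  return !!action && (action.status === 'pending' || action.status === 'running');
}
```

An action that was already completed or cancelled while sitting in the queue is skipped rather than re-run.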

Exit Conditions

The loop terminates when:
  1. Natural completion: LLM sets completed: true
  2. Max steps reached: Default is 15 steps (configurable via maxStepsPerAction)
  3. Max messages sent: Default is 10 (configurable via maxMessagesPerAction)
  4. Hard timeout: 30 minutes (configurable via actionTimeoutMs)
  5. Cancellation: User or system cancels the action
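The five exit conditions can be folded into a single predicate evaluated once per iteration. A sketch using the defaults listed above (the state and limit types here are illustrative, not the actual implementation):

```typescript
interface LoopState {
  completed: boolean;      // LLM set completed: true
  step: number;
  messagesSent: number;
  startedAt: number;       // epoch ms
  cancelled: boolean;      // user or system cancellation
}

interface LoopLimits {
  maxSteps: number;        // default 15 (maxStepsPerAction)
  maxMessages: number;     // default 10 (maxMessagesPerAction)
  timeoutMs: number;       // default 30 * 60 * 1000 (actionTimeoutMs)
}

// Returns the first matching exit reason, or null to keep looping.
function exitReason(s: LoopState, l: LoopLimits, now: number): string | null {
  if (s.cancelled) return 'cancelled';
  if (s.completed) return 'completed';
  if (s.step >= l.maxSteps) return 'max steps reached';
  if (s.messagesSent >= l.maxMessages) return 'max messages reached';
  if (now - s.startedAt >= l.timeoutMs) return 'timeout';
  return null;
}
```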

Step Budget (Dynamic)

OrcBot uses an LLM-based Task Complexity Classifier to adjust step budgets dynamically:
// src/core/DecisionEngine.ts - Task classification
const complexity = await classifyTaskComplexity(taskDescription);
// Returns: 'trivial' | 'simple' | 'standard' | 'complex'

const stepBudgets = {
  trivial: 3,   // "What's 2+2?"
  simple: 8,    // "Search for news"
  standard: 15, // "Research topic and summarize"
  complex: 25   // "Build a full app"
};

maxSteps = stepBudgets[complexity];
This prevents trivial tasks from wasting tokens on 15-step budgets, while giving complex tasks more room to work.

Memory Scope

Each step’s observations are saved with the action ID:
memory.saveMemory({
  id: `${actionId}-step-${stepNumber}`,
  type: 'short',
  content: observation,
  metadata: { actionId, step: stepNumber, skill: toolName }
});
This allows the agent to:
  • See its own progress within the current task
  • Filter out unrelated memories from other actions
  • Clean up step memories after task completion
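A sketch of how action-scoped retrieval and cleanup can work against such records (actionMemories and pruneAction are illustrative helpers, not OrcBot's actual memory API):

```typescript
interface StepMemory {
  id: string;
  content: string;
  metadata: { actionId: string; step: number; skill: string };
}

// Retrieve only this action's observations, ordered by step,
// so the prompt shows the current task's progress and nothing else.
function actionMemories(all: StepMemory[], actionId: string): StepMemory[] {
  return all
    .filter(m => m.metadata.actionId === actionId)
    .sort((a, b) => a.metadata.step - b.metadata.step);
}

// After completion, drop the action's step memories to keep the store lean.
function pruneAction(all: StepMemory[], actionId: string): StepMemory[] {
  return all.filter(m => m.metadata.actionId !== actionId);
}
```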

Step History Compaction

When step count exceeds 10, OrcBot automatically compacts history:
[Step 1] Observation: Tool web_search returned 10 results...
[Step 2] Observation: Tool browser_navigate succeeded...
  --- [8 middle steps compacted] ---
  ... web_search x3 (3 ok, 0 err)
  ... browser_navigate x5 (4 ok, 1 err)
  --- [recent steps below] ---
[Step 11] Observation: Tool send_telegram succeeded
[Step 12] Observation: Tool write_file succeeded
This prevents prompt bloat while preserving context continuity.
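The compacted format above can be produced by something like the following sketch (the function name and head/tail sizes are illustrative, not the actual implementation):

```typescript
interface StepObs { step: number; skill: string; ok: boolean; text: string }

// Keep the first `head` and last `tail` observations verbatim; collapse the
// middle into per-skill success/error counts, mirroring the format shown above.
function compact(history: StepObs[], head = 2, tail = 2): string[] {
  if (history.length <= head + tail) return history.map(o => o.text);
  const middle = history.slice(head, history.length - tail);
  const counts = new Map<string, { ok: number; err: number }>();
  for (const o of middle) {
    const c = counts.get(o.skill) ?? { ok: 0, err: 0 };
    o.ok ? c.ok++ : c.err++;
    counts.set(o.skill, c);
  }
  const summary = [
    `--- [${middle.length} middle steps compacted] ---`,
    ...[...counts].map(
      ([skill, c]) => `... ${skill} x${c.ok + c.err} (${c.ok} ok, ${c.err} err)`
    ),
    `--- [recent steps below] ---`,
  ];
  return [
    ...history.slice(0, head).map(o => o.text),
    ...summary,
    ...history.slice(-tail).map(o => o.text),
  ];
}
```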

Guardrails

Before each tool execution, DecisionPipeline applies safety checks:

1. Deduplication

Prevents repeated identical tool calls within the same action:
const dedupKey = `${toolName}:${JSON.stringify(args)}`;
if (recentToolCalls.has(dedupKey)) {
  return { blocked: true, reason: 'Duplicate tool call detected' };
}
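The excerpt above references a recentToolCalls set that isn't shown; a self-contained sketch of that bookkeeping (the set would be per-action and reset between actions; names beyond the excerpt are illustrative):

```typescript
// Tracks tool calls made within the current action.
const recentToolCalls = new Set<string>();

// Blocks a call if an identical tool + argument combination already ran;
// otherwise records it and lets it through.
function checkDuplicate(
  toolName: string,
  args: unknown
): { blocked: boolean; reason?: string } {
  const dedupKey = `${toolName}:${JSON.stringify(args)}`;
  if (recentToolCalls.has(dedupKey)) {
    return { blocked: true, reason: 'Duplicate tool call detected' };
  }
  recentToolCalls.add(dedupKey);
  return { blocked: false };
}
```

Note that keying on JSON.stringify means two calls with the same arguments in a different property order would not be treated as duplicates.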

2. Loop Detection

Blocks repetitive patterns; the simplest check catches the same tool called three times in a row:
const lastThree = toolHistory.slice(-3);
if (lastThree.every(t => t === toolName)) {
  return { blocked: true, reason: 'Loop detected: same tool 3 times in a row' };
}
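The excerpt above only catches identical repeats. An alternating pattern such as web_search → browser_navigate → web_search → browser_navigate needs a period-2 check; a sketch (the real DecisionPipeline heuristics may differ):

```typescript
// Detect a period-2 alternation (A → B → A → B) over the recent tool history.
function hasAlternatingLoop(history: string[], window = 4): boolean {
  if (history.length < window) return false;
  const recent = history.slice(-window);
  const [a, b] = recent;
  if (a === b) return false; // identical repeats are caught by the 3-in-a-row check
  return recent.every((t, i) => t === (i % 2 === 0 ? a : b));
}
```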

3. Cross-Channel Send Protection

Non-admin tasks can’t send to other channels:
if (action.source === 'telegram' && toolName === 'send_whatsapp' && !isAdmin) {
  return { blocked: true, reason: 'Cross-channel send blocked' };
}

4. Autonomy Delivery Policy

Heartbeat tasks can only send to allowed channels:
const allowedChannels = config.get('autonomyAllowedChannels'); // ['telegram']
if (isHeartbeat && toolName === 'send_discord' && !allowedChannels.includes('discord')) {
  return { blocked: true, reason: 'Autonomy sends to Discord are disabled' };
}

Termination Review

Before accepting completed: true, OrcBot runs a termination review to prevent premature exits:
if (decision.completed) {
  const review = await blockReviewer.review(action, memory);
  
  if (review.verdict === 'BLOCK') {
    logger.warn(`Termination blocked: ${review.reason}`);
    // Inject feedback and continue loop
    memory.saveMemory({
      id: `${actionId}-step-${step}-completion-audit-blocked`,
      type: 'short',
      content: `[SYSTEM: Completion blocked] ${review.reason}`,
      metadata: { actionId, step, auditCode: review.codes }
    });
    continue; // Don't exit yet
  }
}
Example termination block codes:
| Code | Meaning | Fix |
|------|---------|-----|
| NO_SEND | No user-visible reply sent for a channel task | Send a message before completing |
| UNSENT_RESULTS | Deep tool output exists after the last message | Send final results summary |
| ACK_ONLY | Only status updates sent, no substantive content | Deliver concrete findings |
| ERROR_UNRESOLVED | Tool errors without recovery/explanation | Retry or explain failure |
See decision-pipeline.mdx for more details.

Transparency Nudges

If the agent works silently for too long, the prompt injects a nudge:
⚡ TRANSPARENCY ALERT: You have been working for 5 steps without updating the user.
The user cannot see your internal work — they only see messages you send them.
You MUST send a brief progress update NOW. Examples:
- "I've found [X] so far. Still checking [Y]..."
- "Working on it — I've [done A and B], now [doing C]..."
This prevents the “silent failure” UX issue where users think the agent crashed.
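The trigger for this nudge can be sketched as a simple counter check (the threshold of 5 matches the example alert above; the helper name and signature are assumptions):

```typescript
// Decide whether to inject a transparency nudge into the next prompt:
// true once the agent has gone `threshold` or more steps without
// sending the user a message.
function needsNudge(
  step: number,
  lastMessageStep: number | null,
  threshold = 5
): boolean {
  const silentSteps = step - (lastMessageStep ?? 0);
  return silentSteps >= threshold;
}
```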

Special Loop Modes

Heartbeat Loop

Autonomous tasks (source: 'autonomy') skip redundant context loading:
const isHeartbeat = action.payload.isHeartbeat;

if (!isHeartbeat) {
  // Load journal/learning/thread context
  journalContent = readJournal();
  learningContent = readLearning();
  threadContext = getThreadContext();
} else {
  // Heartbeat prompts already include journal tail
  // Skip redundant loads to save tokens
}

Time Capsule Mode

High-intensity tasks with relaxed limits:
if (action.payload.isTimeCapsule) {
  maxSteps = 50;
  maxMessages = 20;
  disableTransparencyNudges = true;
}
Only available to admin users. Useful for complex, time-bounded goals.

Lean Mode

Skip expensive context retrieval for simple tasks:
if (action.payload.isLeanMode) {
  skipSemanticRecall = true;
  skipEpisodicRetrieval = true;
  skipRAGContext = true;
}
Automatically enabled for trivial tasks (e.g., “ping”).

Debugging the Loop

To trace loop execution:

1. Enable verbose logs:
# orcbot.config.yaml
logLevel: debug
2. Inspect step memories:
const stepMemories = memory.getActionMemories(actionId);
console.log(stepMemories.map(m => m.content));
3. Watch live in TUI:
orcbot ui
# Navigate to "Memory Viewer" and filter by action ID
4. Check pipeline blocks:
# DecisionPipeline logs all blocks at WARN level
grep "Pipeline blocked" ~/.orcbot/daemon.log

Performance Notes

Token costs per step:
  • System prompt: ~2,500 tokens (cached after step 1)
  • Step history: ~500-2,000 tokens (grows with step count, compacted at 10+)
  • Tool output: ~500-5,000 tokens (truncated if > 10 KB)
  • LLM response: ~200-500 tokens
Total per action (average): 8-12 steps × 5,000 tokens = 40,000-60,000 tokens.

Optimization tips:
  • Use compactSkillsPrompt: true to reduce skills list by 60%
  • Enable step compaction (default threshold: 10 steps)
  • Set memoryContentMaxLength to 1500 (default) to truncate large observations
  • Use lean mode for simple tasks
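Taken together, these knobs might be set in orcbot.config.yaml like this (the key names are the ones used on this page; the flat layout is an assumption):

```yaml
# orcbot.config.yaml — a sketch combining the tuning knobs mentioned above
logLevel: info
maxStepsPerAction: 15
maxMessagesPerAction: 10
actionTimeoutMs: 1800000        # 30 minutes
compactSkillsPrompt: true       # shrinks the skills list by ~60%
memoryContentMaxLength: 1500    # truncate large observations
autonomyAllowedChannels:
  - telegram
```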

Further Reading