Browser Automation

OrcBot’s browser automation system provides production-ready web scraping and interaction capabilities with built-in bot-detection evasion, CAPTCHA solving, and vision-based navigation.
Prefer high-level skills first. Always try web_search, http_fetch, or extract_article before using low-level browser automation. Browser operations are slower and more resource-intensive.

Architecture

Browser Engines

OrcBot supports three browser engines:
  1. Puppeteer (default) - Fast, reliable Chromium automation
  2. Playwright - Advanced Chromium with better stealth
  3. Lightpanda - Lightweight headless browser (experimental)
Switch engines with switch_browser_engine.

Stealth Features

All browser operations include:
  • The --disable-blink-features=AutomationControlled launch flag, which suppresses the navigator.webdriver automation signal
  • Realistic user agent strings (desktop or mobile)
  • Persistent profiles with cookies and localStorage
  • Resource blocking (ads, trackers, analytics)
  • Console log capture for debugging
  • Blank page detection and recovery

Persistent Profiles

Browser profiles are stored in ~/.orcbot/browser-profiles/ (Playwright) or ~/.orcbot/puppeteer-profiles/ (Puppeteer). Profiles persist:
  • Cookies and session storage
  • localStorage data
  • Browser history
  • Cached resources
Use switch_browser_profile to manage multiple identities.
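The two profile roots above can be sketched as a small lookup. This is only an illustration: profile_root is a hypothetical helper, not part of OrcBot, and it covers just the two roots documented here (no Lightpanda root is stated, and the layout of individual profiles inside each root is not assumed).

```python
from pathlib import Path

# Profile roots as documented above. Lightpanda has no documented root, so it is omitted.
PROFILE_ROOTS = {
    "playwright": Path("~/.orcbot/browser-profiles"),
    "puppeteer": Path("~/.orcbot/puppeteer-profiles"),
}


def profile_root(engine: str) -> Path:
    """Return the expanded profile root for an engine (hypothetical helper)."""
    if engine not in PROFILE_ROOTS:
        raise ValueError(f"no documented profile root for engine {engine!r}")
    return PROFILE_ROOTS[engine].expanduser()


print(profile_root("puppeteer"))
```

Checking which root is in use is mainly helpful when a login made under one engine seems to vanish after switching engines, since the two engines do not share a profile directory.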

Core Navigation Skills

browser_navigate

Navigate to a URL and extract a semantic snapshot. See Web Search for full documentation.

browser_examine_page

Get a semantic snapshot of the current page without navigating. Parameters: None. Return Value: Semantic snapshot identical to browser_navigate output, but for the current page. Example Usage:
{
  "skill": "browser_examine_page",
  "args": {}
}
Metadata:
  • isDeep: true
  • isResearch: false
Use browser_examine_page after clicking links or submitting forms instead of navigating again; it reads the already-loaded page, so no new page load is needed.

browser_back

Navigate back to the previous page in browser history. Parameters: None. Return Value: Confirmation message and new page URL. Example Usage:
{
  "skill": "browser_back",
  "args": {}
}

Interaction Skills

browser_click

Click an element by CSS selector or reference number. Parameters:
  • selector_or_ref (string | number, required): CSS selector (e.g., "#submit-btn") or reference number from the snapshot (e.g., 3)
Return Value: Confirmation message with element details. Example Usage:
{
  "skill": "browser_click",
  "args": {
    "selector_or_ref": "#login-button"
  }
}
Using References: The semantic snapshot includes numbered references for interactive elements:
--- INTERACTIVE ELEMENTS ---
[0] button: "Login" (id=login-btn)
[1] link: "Forgot password?" (href=/reset)
[2] input: email (name=email)
Click by reference:
{
  "skill": "browser_click",
  "args": {
    "selector_or_ref": 0
  }
}
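As a rough sketch of how a caller might go from snapshot text to a reference-based click, the Python below parses the sample lines shown above and builds the matching browser_click call. The line format is inferred only from that sample, and parse_elements and click_call_for are hypothetical helpers, not OrcBot functions.

```python
import json
import re

# Matches lines such as: [0] button: "Login" (id=login-btn)
ELEMENT_LINE = re.compile(r"^\[(\d+)\]\s+(\w+):\s+(.*)$")


def parse_elements(snapshot: str) -> dict[int, tuple[str, str]]:
    """Map each reference number to (element kind, remaining description)."""
    elements = {}
    for line in snapshot.splitlines():
        match = ELEMENT_LINE.match(line.strip())
        if match:
            ref, kind, rest = match.groups()
            elements[int(ref)] = (kind, rest)
    return elements


def click_call_for(snapshot: str, label: str) -> dict:
    """Build a browser_click call for the first element whose description contains label."""
    for ref, (_kind, rest) in sorted(parse_elements(snapshot).items()):
        if label in rest:
            return {"skill": "browser_click", "args": {"selector_or_ref": ref}}
    raise LookupError(f"no interactive element mentions {label!r}")


snapshot = """--- INTERACTIVE ELEMENTS ---
[0] button: "Login" (id=login-btn)
[1] link: "Forgot password?" (href=/reset)
[2] input: email (name=email)"""

print(json.dumps(click_call_for(snapshot, "Forgot password"), indent=2))
```

Reference numbers come from one specific snapshot, so they should be re-read after any navigation or page change rather than reused.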

browser_type

Type text into an input field. Parameters:
  • selector_or_ref (string | number, required): CSS selector or reference number for the input element
  • text (string, required): Text to type
  • slow (boolean, default: false): Use slow typing (100ms delay per character) to avoid bot detection
Example Usage:
{
  "skill": "browser_type",
  "args": {
    "selector_or_ref": "input[name='username']",
    "text": "user@example.com",
    "slow": true
  }
}
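The cost of slow typing is easy to estimate from the documented 100ms per-character delay. The sketch below computes a lower bound only; it ignores any overhead OrcBot or the browser adds, and slow_typing_ms is a hypothetical helper.

```python
SLOW_DELAY_MS = 100  # per-character delay documented for slow typing


def slow_typing_ms(text: str, delay_ms: int = SLOW_DELAY_MS) -> int:
    """Lower bound on typing time: one delay per character."""
    return len(text) * delay_ms


# "user@example.com" has 16 characters, so slow typing takes at least 1600 ms.
print(slow_typing_ms("user@example.com"))
```

For long inputs this adds up quickly, so slow typing is probably best reserved for fields where bot detection is a real concern.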

browser_press

Press a keyboard key or key combination. Parameters:
  • key (string, required): Key name or combination (e.g., "Enter", "Control+C", "Alt+Tab")
Supported Keys:
  • Single keys: Enter, Escape, Tab, Backspace, Delete, ArrowUp, etc.
  • Combinations: Control+C, Meta+V, Shift+Enter
Example Usage:
{
  "skill": "browser_press",
  "args": {
    "key": "Enter"
  }
}
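To make the combination syntax concrete, here is a minimal sketch of splitting a key string into modifiers and a final key. How OrcBot actually parses the string is not documented here; the modifier set is taken only from the examples above, and split_combo is a hypothetical helper.

```python
# Modifier names that appear in the examples above (an assumption, not an exhaustive list).
MODIFIERS = {"Control", "Meta", "Shift", "Alt"}


def split_combo(combo: str) -> tuple[list[str], str]:
    """Split a string such as "Control+C" into (["Control"], "C")."""
    *mods, key = combo.split("+")
    if not key:
        raise ValueError(f"missing key in {combo!r}")
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s) {unknown} in {combo!r}")
    return mods, key


print(split_combo("Shift+Enter"))
print(split_combo("Enter"))
```

Validating a combination this way before sending it can catch typos such as "Ctrl+C" early, assuming the skill expects the full "Control" name as in the examples.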

browser_hover

Hover over an element to trigger menus or tooltips. Parameters:
  • selector (string, required): CSS selector for the element to hover
Example Usage:
{
  "skill": "browser_hover",
  "args": {
    "selector": ".dropdown-menu"
  }
}

browser_select

Select an option in a dropdown by visible label. Parameters:
  • selector (string, required): CSS selector for the <select> element
  • value (string, required): Visible label text of the option to select
Example Usage:
{
  "skill": "browser_select",
  "args": {
    "selector": "#country-dropdown",
    "value": "United States"
  }
}

browser_scroll

Scroll the page up or down. Parameters:
  • direction (string, required): "up" or "down"
  • amount (number, default: 500): Pixels to scroll
Example Usage:
{
  "skill": "browser_scroll",
  "args": {
    "direction": "down",
    "amount": 1000
  }
}

Waiting & Timing

browser_wait

Wait for a specified duration. Parameters:
  • ms (number, required): Milliseconds to wait
Example Usage:
{
  "skill": "browser_wait",
  "args": {
    "ms": 2000
  }
}

browser_wait_for

Wait for an element to appear on the page. Parameters:
  • selector (string, required): CSS selector to wait for
  • timeout (number, default: 30000): Timeout in milliseconds
Example Usage:
{
  "skill": "browser_wait_for",
  "args": {
    "selector": ".search-results",
    "timeout": 10000
  }
}
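The timeout semantics of a wait like this can be illustrated with a generic polling loop. This is a sketch of the general technique, not OrcBot's implementation (which presumably delegates to the underlying engine); wait_until and its poll_ms parameter are hypothetical, and the "element" is simulated with an iterator.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_ms: int = 30000, poll_ms: int = 100) -> bool:
    """Poll condition until it is true or timeout_ms elapses; return whether it became true."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_ms / 1000)


# Simulated element that "appears" on the third check.
checks = iter([False, False, True])
print(wait_until(lambda: next(checks), timeout_ms=1000, poll_ms=10))
```

Unlike a fixed browser_wait, a condition-based wait returns as soon as the element exists, so a generous timeout costs nothing when the page is fast.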

Visual & Analysis Skills

browser_screenshot

Capture a screenshot of the current page. Parameters:
  • fullPage (boolean, default: false): Capture the entire page instead of just the viewport
Return Value: File path to the saved screenshot. Example Usage:
{
  "skill": "browser_screenshot",
  "args": {
    "fullPage": true
  }
}
Response Example:
Screenshot saved to: /home/user/.orcbot/downloads/screenshot_1704123456789.png

browser_vision

Analyze the current page using AI vision (GPT-4 Vision or Gemini). Parameters:
  • prompt (string, optional): Question or instruction. Defaults to “Describe what you see on this page.”
Return Value: AI-generated description of the visual content. Example Usage:
{
  "skill": "browser_vision",
  "args": {
    "prompt": "What is the main call-to-action button on this page?"
  }
}
Response Example:
The main call-to-action button is a large blue button in the center of the page labeled "Start Free Trial". It features white text and a subtle shadow effect. Below the button is small gray text reading "No credit card required".

browser_solve_captcha

Attempt to solve a detected CAPTCHA automatically. Parameters: None. Requires captchaApiKey (2captcha) in config. Return Value: Success or error message. Example Usage:
{
  "skill": "browser_solve_captcha",
  "args": {}
}
CAPTCHA solving requires an API key from 2captcha.com and can take 20-60 seconds.

Advanced Skills

browser_run_js

Execute custom JavaScript on the current page. Parameters:
  • script (string, required): JavaScript code to execute. Returns the result of the expression.
Return Value: JSON-serialized result of the script. Example Usage:
{
  "skill": "browser_run_js",
  "args": {
    "script": "document.querySelectorAll('a').length"
  }
}
Response Example:
Script result: 47
Complex Example:
{
  "skill": "browser_run_js",
  "args": {
    "script": "Array.from(document.querySelectorAll('h2')).map(h => h.textContent)"
  }
}

browser_run_script

Execute custom Puppeteer/Playwright code with access to page and browser objects. (Admin only) Parameters:
  • code (string, required): JavaScript code with access to page and browser variables
Return Value: Script output or error. Example Usage:
{
  "skill": "browser_run_script",
  "args": {
    "code": "await page.setViewport({ width: 1920, height: 1080 }); return 'Viewport updated';"
  }
}
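browser_run_script is restricted to admin users only. It provides full access to the browser automation API. The example above calls Puppeteer's page.setViewport; under Playwright the equivalent method is page.setViewportSize.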

Profile & Engine Management

switch_browser_profile

Switch to a different persistent browser profile. Parameters:
  • profileName (string, required): Name of the profile to switch to (created if it doesn’t exist)
  • profileDir (string, optional): Custom directory for profiles
Example Usage:
{
  "skill": "switch_browser_profile",
  "args": {
    "profileName": "work"
  }
}
Use Cases:
  • Manage multiple logged-in sessions
  • Isolate scraping tasks
  • Test with different browser states
  • Bypass rate limits (different cookies/fingerprints)

switch_browser_engine

Switch between Puppeteer, Playwright, and Lightpanda. Parameters:
  • engine (string, required): "puppeteer", "playwright", or "lightpanda"
  • endpoint (string, optional): For Lightpanda: CDP endpoint (default: ws://127.0.0.1:9222)
Example Usage:
{
  "skill": "switch_browser_engine",
  "args": {
    "engine": "playwright"
  }
}

Mobile Viewport

Switch between desktop and mobile viewports:
{
  "skill": "set_viewport_mode",
  "args": {
    "mode": "mobile"
  }
}
Mobile viewport specs:
  • Size: 375x812 (iPhone-class viewport; a real iPhone 13 reports 390x844)
  • User agent: iOS Safari 16.6
  • Device scale factor: 2x
  • Touch events enabled

State Tracking

The browser maintains state across skills:
  • Last navigated URL: Tracks the current page
  • Blank page counter: Auto-recovers from blank pages (max 3 attempts)
  • Console logs: Captures JavaScript errors and warnings
  • Intercepted APIs: Records XHR/fetch requests (when enabled)

Circuit Breaker Pattern

The browser implements automatic loop prevention:
  • Tracks consecutive blank page loads per domain
  • After 3 blank pages from the same domain, switches to headful mode
  • Clears counter after successful navigation
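The three rules above amount to a per-domain counter with a threshold. The sketch below models that logic in isolation; BlankPageBreaker is a hypothetical class for illustration, and OrcBot's internal implementation may differ (for example in how it derives the domain from a URL).

```python
from collections import defaultdict
from urllib.parse import urlparse

BLANK_PAGE_LIMIT = 3  # threshold documented above


class BlankPageBreaker:
    """Track consecutive blank pages per domain (illustrative model only)."""

    def __init__(self, limit: int = BLANK_PAGE_LIMIT):
        self.limit = limit
        self.blank_counts = defaultdict(int)

    def record(self, url: str, is_blank: bool) -> bool:
        """Record one navigation; return True when headful mode should be used."""
        domain = urlparse(url).hostname or url
        if not is_blank:
            # A successful navigation clears the counter for that domain.
            self.blank_counts.pop(domain, None)
            return False
        self.blank_counts[domain] += 1
        return self.blank_counts[domain] >= self.limit


breaker = BlankPageBreaker()
results = [breaker.record("https://example.com/a", True) for _ in range(3)]
print(results)  # [False, False, True]
```

Keying the counter by domain means one misbehaving site does not push unrelated sites into headful mode.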

Best Practices

Navigation strategy for reliability:
  1. Try web_search or http_fetch first
  2. If you must use the browser, start with browser_navigate
  3. Use references (numbered elements) instead of brittle CSS selectors
  4. Always call browser_wait_for before interacting with dynamic content
  5. Use browser_examine_page after interactions instead of re-navigating
Avoid these common mistakes:
  • Don’t navigate to the same URL twice in one action
  • Don’t follow browser_click with browser_navigate; click, then call browser_examine_page
  • Don’t guess selectors — read the snapshot first
  • Don’t skip browser_wait_for on SPAs (React, Vue, Angular)
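Put together, the guidance above suggests a call sequence like the one sketched below for a login form. Several details are assumptions: the url argument of browser_navigate is not documented on this page, the reference numbers are hard-coded from the sample snapshot in the browser_click section rather than read from a live snapshot, and how the calls are dispatched to OrcBot is out of scope, so the sketch only prints them.

```python
import json


def call(skill: str, **args) -> dict:
    """Build one skill call in the JSON shape used throughout this page."""
    return {"skill": skill, "args": args}


login_flow = [
    # Assumption: browser_navigate takes a "url" argument (see Web Search docs).
    call("browser_navigate", url="https://example.com/login"),
    # Wait for dynamic content before interacting.
    call("browser_wait_for", selector="input[name='email']", timeout=10000),
    # References 2 and 0 are taken from the sample snapshot, not a live page.
    call("browser_type", selector_or_ref=2, text="user@example.com", slow=True),
    call("browser_click", selector_or_ref=0),
    # Examine the resulting page instead of navigating again.
    call("browser_examine_page"),
]

for step in login_flow:
    print(json.dumps(step))
```

The sequence navigates exactly once and never pairs browser_click with a second browser_navigate, which matches the mistakes listed above.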

Troubleshooting

“Element not found”

  • Cause: Selector doesn’t match or element hasn’t loaded yet
  • Fix: Use browser_examine_page to verify the selector, then browser_wait_for before clicking

“Blank page detected”

  • Cause: Site blocked automation or JavaScript failed to render
  • Fix: Retry with headless: false or switch to a fresh profile

“CAPTCHA blocked navigation”

  • Cause: Site detected automation
  • Fix: Configure captchaApiKey or use switch_browser_profile to a clean profile

“Browser crashed”

  • Cause: Out of memory or GPU issues
  • Fix: Check browser args in config, disable extensions, or restart OrcBot