Browser Automation

OrcBot’s browser automation system provides production-ready web scraping and interaction capabilities with built-in countermeasures against bot detection, CAPTCHA solving, and vision-based navigation.

Prefer high-level skills first. Always try web_search, http_fetch, or extract_article before using low-level browser automation. Browser operations are slower and more resource-intensive.

Architecture

Browser Engines

OrcBot supports three browser engines:

Puppeteer (default) - Fast, reliable Chromium automation
Playwright - Advanced Chromium with better stealth
Lightpanda - Lightweight headless browser (experimental)

Switch engines with switch_browser_engine.

Stealth Features

All browser operations include:

Launch with --disable-blink-features=AutomationControlled, which hides the navigator.webdriver automation signal
Realistic user agent strings (desktop or mobile)
Persistent profiles with cookies and localStorage
Resource blocking (ads, trackers, analytics)
Console log capture for debugging
Blank page detection and recovery

Persistent Profiles

Browser profiles are stored in ~/.orcbot/browser-profiles/ (Playwright) or ~/.orcbot/puppeteer-profiles/ (Puppeteer). Profiles persist:

Cookies and session storage
localStorage data
Browser history
Cached resources

Use switch_browser_profile to manage multiple identities.
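As a rough illustration of the storage layout described above, the sketch below maps an engine and profile name to a directory. The two base directories come from this page; the idea that each profile lives in a subdirectory named after it is an assumption, and profile_path is a hypothetical helper, not part of OrcBot.

```python
from pathlib import Path

# Base directories as documented above.
# ASSUMPTION: each named profile is a subdirectory of its engine's base directory.
PROFILE_ROOTS = {
    "playwright": Path.home() / ".orcbot" / "browser-profiles",
    "puppeteer": Path.home() / ".orcbot" / "puppeteer-profiles",
}


def profile_path(engine: str, profile_name: str) -> Path:
    """Return the hypothetical on-disk location of a named profile."""
    if engine not in PROFILE_ROOTS:
        raise ValueError(f"no documented profile directory for engine {engine!r}")
    return PROFILE_ROOTS[engine] / profile_name


print(profile_path("puppeteer", "work"))
```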

Core Navigation Skills

browser_navigate

Navigate to a URL and extract a semantic snapshot. See Web Search for full documentation.

browser_examine_page

Get a semantic snapshot of the current page without navigating. Parameters: None. Return Value: Semantic snapshot identical to browser_navigate output, but for the current page. Example Usage:

{
  "skill": "browser_examine_page",
  "args": {}
}

Metadata:

isDeep: true
isResearch: false

Use browser_examine_page after clicking links or submitting forms instead of navigating again. It reads the page that is already loaded, so it avoids a second page load.

browser_back

Navigate back to the previous page in browser history. Parameters: None. Return Value: Confirmation message and new page URL. Example Usage:

{
  "skill": "browser_back",
  "args": {}
}

Interaction Skills

browser_click

Click an element by CSS selector or reference number. Parameters:

selector_or_ref (string | number, required) - CSS selector (e.g., "#submit-btn") or reference number from the snapshot (e.g., 3)

Return Value: Confirmation message with element details. Example Usage:

{
  "skill": "browser_click",
  "args": {
    "selector_or_ref": "#login-button"
  }
}

Using References: The semantic snapshot includes numbered references for interactive elements:

--- INTERACTIVE ELEMENTS ---
[0] button: "Login" (id=login-btn)
[1] link: "Forgot password?" (href=/reset)
[2] input: email (name=email)

Click by reference:

{
  "skill": "browser_click",
  "args": {
    "selector_or_ref": 0
  }
}
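To avoid hard-coding a reference number, a caller could look it up by label in the snapshot text first. The sketch below parses the sample snapshot shown above and builds the matching browser_click call. The line format is inferred from that sample only, and parse_elements and click_call_for are hypothetical helpers, so treat this as an assumption about the snapshot layout rather than a specification.

```python
import re

SNAPSHOT = """--- INTERACTIVE ELEMENTS ---
[0] button: "Login" (id=login-btn)
[1] link: "Forgot password?" (href=/reset)
[2] input: email (name=email)
"""

# Matches lines such as: [1] link: "Forgot password?" (href=/reset)
# ASSUMPTION: every element line follows the pattern seen in the sample snapshot.
ELEMENT_LINE = re.compile(r'^\[(\d+)\]\s+(\w+):\s+"?(.*?)"?\s+\((.*)\)$')


def parse_elements(snapshot: str) -> list[dict]:
    """Extract reference number, element kind, label, and attributes per line."""
    elements = []
    for line in snapshot.splitlines():
        match = ELEMENT_LINE.match(line.strip())
        if match:
            ref, kind, label, attrs = match.groups()
            elements.append({"ref": int(ref), "kind": kind, "label": label, "attrs": attrs})
    return elements


def click_call_for(snapshot: str, label: str) -> dict:
    """Build a browser_click call for the first element with the given label."""
    for element in parse_elements(snapshot):
        if element["label"] == label:
            return {"skill": "browser_click", "args": {"selector_or_ref": element["ref"]}}
    raise LookupError(f"no interactive element labelled {label!r}")


print(click_call_for(SNAPSHOT, "Login"))
```

Reference numbers describe one snapshot only, so look them up again after the page changes.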

browser_type

Type text into an input field. Parameters:

selector_or_ref (string | number, required) - CSS selector or reference number for the input element
text (string, required) - Text to type
slow (boolean, default: false) - Use slow typing (100ms delay per character) to avoid bot detection

Example Usage:

{
  "skill": "browser_type",
  "args": {
    "selector_or_ref": "input[name='username']",
    "text": "user@example.com",
    "slow": true
  }
}
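Slow typing trades speed for stealth, so it helps to estimate the cost before setting timeouts. The sketch below builds the call above as a Python dict and computes a lower bound on typing time from the documented 100ms per-character delay. type_call and slow_typing_floor_ms are hypothetical helpers, and real runs will take longer than this floor.

```python
SLOW_DELAY_MS = 100  # per-character delay documented for slow typing


def type_call(selector_or_ref, text: str, slow: bool = False) -> dict:
    """Build a browser_type skill call as a plain dict."""
    return {
        "skill": "browser_type",
        "args": {"selector_or_ref": selector_or_ref, "text": text, "slow": slow},
    }


def slow_typing_floor_ms(text: str) -> int:
    """Lower bound on typing time when slow=True: one delay per character."""
    return len(text) * SLOW_DELAY_MS


call = type_call("input[name='username']", "user@example.com", slow=True)
print(slow_typing_floor_ms(call["args"]["text"]))  # 16 characters -> 1600
```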

browser_press

Press a keyboard key or key combination. Parameters:

key (string, required) - Key name or combination (e.g., "Enter", "Control+C", "Alt+Tab")

Supported Keys:

Single keys: Enter, Escape, Tab, Backspace, Delete, ArrowUp, etc.
Combinations: Control+C, Meta+V, Shift+Enter

Example Usage:

{
  "skill": "browser_press",
  "args": {
    "key": "Enter"
  }
}
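Combinations are written as modifier names joined by "+", followed by the key. The sketch below shows one way such a string could be split and checked before it is sent. The modifier set is taken from the examples on this page, parse_key_combo is a hypothetical helper, and OrcBot's own parsing may differ.

```python
# Modifier names that appear in the examples above.
# ASSUMPTION: this set is illustrative, not an exhaustive list.
MODIFIERS = {"Control", "Meta", "Shift", "Alt"}


def parse_key_combo(combo: str) -> tuple[list[str], str]:
    """Split "Control+C" into (["Control"], "C"); a single key has no modifiers."""
    *modifiers, key = combo.split("+")
    unknown = [m for m in modifiers if m not in MODIFIERS]
    if unknown or not key:
        raise ValueError(f"unrecognized key combination: {combo!r}")
    return modifiers, key


print(parse_key_combo("Shift+Enter"))
```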

browser_hover

Hover over an element to trigger menus or tooltips. Parameters:

selector (string, required) - CSS selector for the element to hover

Example Usage:

{
  "skill": "browser_hover",
  "args": {
    "selector": ".dropdown-menu"
  }
}

browser_select

Select an option in a dropdown by visible label. Parameters:

selector (string, required) - CSS selector for the <select> element
value (string, required) - Visible label text of the option to select

Example Usage:

{
  "skill": "browser_select",
  "args": {
    "selector": "#country-dropdown",
    "value": "United States"
  }
}

browser_scroll

Scroll the page up or down. Parameters:

direction (string, required) - "up" or "down"
amount (number, default: 500) - Pixels to scroll

Example Usage:

{
  "skill": "browser_scroll",
  "args": {
    "direction": "down",
    "amount": 1000
  }
}

Waiting & Timing

browser_wait

Wait for a specified duration. Parameters:

ms (number, required) - Milliseconds to wait

Example Usage:

{
  "skill": "browser_wait",
  "args": {
    "ms": 2000
  }
}

browser_wait_for

Wait for an element to appear on the page. Parameters:

selector (string, required) - CSS selector to wait for
timeout (number, default: 30000) - Timeout in milliseconds

Example Usage:

{
  "skill": "browser_wait_for",
  "args": {
    "selector": ".search-results",
    "timeout": 10000
  }
}
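The difference from browser_wait is that the wait ends as soon as the condition holds, and the timeout is only an upper bound. The sketch below shows that behaviour with a generic polling loop. It is a minimal illustration that assumes polling, not OrcBot's implementation; wait_for and poll_ms are hypothetical names.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], timeout_ms: int = 30000, poll_ms: int = 100) -> bool:
    """Poll until condition() is true or the timeout elapses.

    Returns True if the condition held in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_ms / 1000)


# Stand-in for "selector is present": becomes true on the third check.
checks = iter([False, False, True])
print(wait_for(lambda: next(checks), timeout_ms=1000, poll_ms=1))
```

A fixed browser_wait of the same length would always pay the full delay, which is why browser_wait_for is usually the better choice for dynamic content.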

Visual & Analysis Skills

browser_screenshot

Capture a screenshot of the current page. Parameters:

fullPage (boolean, default: false) - Capture entire page or just viewport

Return Value: File path to the saved screenshot. Example Usage:

{
  "skill": "browser_screenshot",
  "args": {
    "fullPage": true
  }
}

Response Example:

Screenshot saved to: /home/user/.orcbot/downloads/screenshot_1704123456789.png

browser_vision

Analyze the current page using AI vision (GPT-4 Vision or Gemini). Parameters:

prompt (string, optional) - Question or instruction. Defaults to “Describe what you see on this page.”

Return Value: AI-generated description of the visual content. Example Usage:

{
  "skill": "browser_vision",
  "args": {
    "prompt": "What is the main call-to-action button on this page?"
  }
}

Response Example:

The main call-to-action button is a large blue button in the center of the page labeled "Start Free Trial". It features white text and a subtle shadow effect. Below the button is small gray text reading "No credit card required".

browser_solve_captcha

Attempt to solve a detected CAPTCHA automatically. Parameters: None. Requires captchaApiKey (2captcha) in config. Return Value: Success or error message. Example Usage:

{
  "skill": "browser_solve_captcha",
  "args": {}
}

CAPTCHA solving requires an API key from 2captcha.com and can take 20-60 seconds.

Advanced Skills

browser_run_js

Execute custom JavaScript on the current page. Parameters:

script (string, required) - JavaScript code to execute. Returns the result of the expression.

Return Value: JSON-serialized result of the script. Example Usage:

{
  "skill": "browser_run_js",
  "args": {
    "script": "document.querySelectorAll('a').length"
  }
}

Response Example:

Script result: 47

Complex Example:

{
  "skill": "browser_run_js",
  "args": {
    "script": "Array.from(document.querySelectorAll('h2')).map(h => h.textContent)"
  }
}
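Scripts that contain double quotes are easy to break when written into JSON by hand. One option is to let a JSON library do the escaping, as in the sketch below. It uses only Python's standard json module; run_js_call is a hypothetical helper and the selector is illustrative.

```python
import json


def run_js_call(script: str) -> str:
    """Serialize a browser_run_js call, escaping the script for JSON."""
    return json.dumps({"skill": "browser_run_js", "args": {"script": script}}, indent=2)


# The script contains double quotes, which must be escaped inside a JSON string.
script = "Array.from(document.querySelectorAll('a[href^=\"http\"]')).map(a => a.href)"
payload = run_js_call(script)
print(payload)
```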

browser_run_script

Execute custom Puppeteer/Playwright code with access to page and browser objects. (Admin only) Parameters:

code (string, required) - JavaScript code with access to page and browser variables

Return Value: Script output or error. Example Usage:

{
  "skill": "browser_run_script",
  "args": {
    "code": "await page.setViewport({ width: 1920, height: 1080 }); return 'Viewport updated';"
  }
}

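browser_run_script is restricted to admin users because it provides full access to the browser automation API. The example above uses Puppeteer's page.setViewport; with the Playwright engine the equivalent call is page.setViewportSize.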

Profile & Engine Management

switch_browser_profile

Switch to a different persistent browser profile. Parameters:

profileName (string, required) - Name of the profile to switch to (created if it doesn’t exist)
profileDir (string, optional) - Custom directory for profiles

Example Usage:

{
  "skill": "switch_browser_profile",
  "args": {
    "profileName": "work"
  }
}

Use Cases:

Manage multiple logged-in sessions
Isolate scraping tasks
Test with different browser states
Bypass rate limits (different cookies/fingerprints)

switch_browser_engine

Switch between Puppeteer, Playwright, and Lightpanda. Parameters:

engine (string, required) - "puppeteer", "playwright", or "lightpanda"
endpoint (string, optional) - For Lightpanda: CDP endpoint (default: ws://127.0.0.1:9222)

Example Usage:

{
  "skill": "switch_browser_engine",
  "args": {
    "engine": "playwright"
  }
}

Mobile Viewport

Switch between desktop and mobile viewports:

{
  "skill": "set_viewport_mode",
  "args": {
    "mode": "mobile"
  }
}

Mobile viewport specs:

Size: 375x812 (iPhone 13)
User agent: iOS Safari 16.6
Device scale factor: 2x
Touch events enabled
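The device scale factor multiplies the CSS viewport size to give the rendered pixel size, which is useful when estimating screenshot dimensions. The sketch below works through that arithmetic for the specs above. It assumes a viewport screenshot is rendered at the device scale factor, which is typical of Chromium automation but not confirmed here for OrcBot.

```python
# Mobile viewport specs as listed above.
MOBILE_VIEWPORT = {"width": 375, "height": 812, "device_scale_factor": 2}


def physical_pixels(viewport: dict) -> tuple[int, int]:
    """CSS size multiplied by the device scale factor."""
    scale = viewport["device_scale_factor"]
    return viewport["width"] * scale, viewport["height"] * scale


print(physical_pixels(MOBILE_VIEWPORT))  # (750, 1624)
```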

State Tracking

The browser maintains state across skills:

Last navigated URL: Tracks the current page
Blank page counter: Auto-recovers from blank pages (max 3 attempts)
Console logs: Captures JavaScript errors and warnings
Intercepted APIs: Records XHR/fetch requests (when enabled)

Circuit Breaker Pattern

The browser implements automatic loop prevention:

Tracks consecutive blank page loads per domain
After 3 blank pages from the same domain, switches to headful mode
Clears counter after successful navigation
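The three rules above can be sketched as a small per-domain counter. This is a minimal illustration of the pattern, not OrcBot's code: BlankPageBreaker and record are hypothetical names, and the switch to headful mode is left to the caller.

```python
from collections import defaultdict
from urllib.parse import urlparse

BLANK_PAGE_LIMIT = 3  # documented threshold


class BlankPageBreaker:
    """Counts consecutive blank page loads per domain."""

    def __init__(self, limit: int = BLANK_PAGE_LIMIT):
        self.limit = limit
        self.blank_counts = defaultdict(int)

    def record(self, url: str, is_blank: bool) -> bool:
        """Record one page load. Returns True when the breaker trips,
        i.e. when the caller should switch to headful mode."""
        domain = urlparse(url).hostname or url
        if not is_blank:
            # A successful navigation clears the counter for that domain.
            self.blank_counts.pop(domain, None)
            return False
        self.blank_counts[domain] += 1
        return self.blank_counts[domain] >= self.limit


breaker = BlankPageBreaker()
results = [breaker.record("https://example.com/a", True) for _ in range(3)]
print(results)  # [False, False, True]
```

Counting per domain means one misbehaving site does not change how other sites are handled.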

Best Practices

Navigation strategy for reliability:

Try web_search or http_fetch first
If you must use the browser, start with browser_navigate
Use references (numbered elements) instead of brittle CSS selectors
Always call browser_wait_for before interacting with dynamic content
Use browser_examine_page after interactions instead of re-navigating
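Put together, the strategy above might produce a sequence like the one sketched below for a login form. The reference numbers come from the sample snapshot under browser_click. The "url" argument name for browser_navigate is an assumption, since its parameters are documented on the Web Search page, and login_flow is a hypothetical helper.

```python
import json


def login_flow(url: str, email: str) -> list[dict]:
    """Ordered skill calls: navigate once, wait, interact by reference, then examine."""
    return [
        # ASSUMPTION: browser_navigate takes its target as "url".
        {"skill": "browser_navigate", "args": {"url": url}},
        {"skill": "browser_wait_for", "args": {"selector": "input[name='email']", "timeout": 10000}},
        # References 2 (email input) and 0 (Login button) come from the sample snapshot.
        {"skill": "browser_type", "args": {"selector_or_ref": 2, "text": email}},
        {"skill": "browser_click", "args": {"selector_or_ref": 0}},
        # Examine the resulting page instead of navigating again.
        {"skill": "browser_examine_page", "args": {}},
    ]


steps = login_flow("https://example.com/login", "user@example.com")
print(json.dumps(steps, indent=2))
```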

Avoid these common mistakes:

Don’t navigate to the same URL twice in one action
Don’t use browser_click + browser_navigate — just click and examine
Don’t guess selectors — read the snapshot first
Don’t skip browser_wait_for on SPAs (React, Vue, Angular)

Troubleshooting

“Element not found”

Cause: Selector doesn’t match or element hasn’t loaded yet
Fix: Use browser_examine_page to verify the selector, then browser_wait_for before clicking

“Blank page detected”

Cause: Site blocked automation or JavaScript failed to render
Fix: Retry with headless: false or switch to a fresh profile

“CAPTCHA blocked navigation”

Cause: Site detected automation
Fix: Configure captchaApiKey or use switch_browser_profile to a clean profile

“Browser crashed”

Cause: Out of memory or GPU issues
Fix: Check browser args in config, disable extensions, or restart OrcBot

Related Skills

Web Search - High-level search and navigation
Shell Execution - Execute system commands and scripts
