Web Search Skills

OrcBot’s web search skills provide web research with automatic fallback across multiple search providers. The system tries API-based providers first for speed and reliability, then falls back to browser-based search when API keys aren’t configured or the API providers fail.

web_search

Search the web using multiple engines with automatic fallback. This is the preferred high-level skill for web research.

Parameters

query (string, required): Search query. Can be natural language or keyword-based.

Return Value

results (string): Formatted search results with titles, URLs, and snippets. Returns up to 10 results ranked by relevance.

Fallback Chain

The search system tries providers in this order (configurable via searchProviderOrder):

Serper API - Fast, structured results (requires serperApiKey)
Brave Search API - Privacy-focused (requires braveSearchApiKey)
SearXNG - Self-hosted metasearch (requires searxngUrl)
Google - Browser-based fallback
Bing - Browser-based fallback
DuckDuckGo - Browser-based fallback

If no API keys or SearXNG URL are configured, the system automatically uses browser-based search. This is slower but requires no external services.
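The fallback behavior amounts to a loop over providers in order. The sketch below is a hypothetical illustration, not OrcBot’s implementation: each provider is modeled as a plain Python function (search_with_fallback and the provider functions are invented names), a provider that raises or returns nothing is skipped, and the first non-empty result set is capped at 10 results.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical provider signature: takes a query and returns a list of result
# dicts, or raises when the provider is unusable (missing key, network error,
# blocked page).
Provider = Callable[[str], List[dict]]

MAX_RESULTS = 10  # web_search returns up to 10 results


def search_with_fallback(
    query: str,
    providers: Sequence[Tuple[str, Provider]],
) -> Tuple[Optional[str], List[dict]]:
    """Try each provider in order and return the first non-empty result set."""
    for name, provider in providers:
        try:
            results = provider(query)
        except Exception:
            continue  # any provider error means "try the next one"
        if results:
            return name, results[:MAX_RESULTS]
    return None, []  # every provider failed or came back empty
```

In this sketch, the order of the providers list plays the role of searchProviderOrder.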

Example Usage

{
  "skill": "web_search",
  "args": {
    "query": "latest GPT-4 capabilities 2025"
  }
}

Response Example

Search results for "latest GPT-4 capabilities 2025":

1. OpenAI GPT-4 Technical Report 2025
   https://openai.com/research/gpt-4-2025
   OpenAI's GPT-4 now supports advanced reasoning, multimodal inputs including video, and extended context windows up to 128K tokens...

2. GPT-4 vs GPT-3.5: What's New in 2025
   https://aiblog.example.com/gpt4-updates
   The latest GPT-4 update brings significant improvements in coding, mathematics, and creative writing...

[8 more results]

Metadata

isDeep: true - Counts as substantive progress
isResearch: true - Higher repetition budget (up to 15 calls)
isSideEffect: false - No deduplication

Always try web_search before browser automation. It’s 10-50x faster than full browser navigation and uses fewer resources.

browser_navigate

Navigate to a URL and extract a semantic snapshot of the page content. Use when you need to read a specific page after finding it via search.

Parameters

url (string, required): Full URL to navigate to. Must include the protocol (https:// or http://).

headless (boolean, default: true): Run the browser in headless mode. Set to false for sites with anti-bot detection.
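Because browser_navigate expects a full URL with its protocol, a caller can reject bare hostnames before making the call. This is a minimal hypothetical pre-check (is_navigable_url is not an OrcBot skill) built on Python’s standard urllib.parse.

```python
from urllib.parse import urlparse


def is_navigable_url(url: str) -> bool:
    """Hypothetical helper: True if url has an explicit http(s) scheme and a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A bare "news.ycombinator.com" parses with an empty scheme and host, so it fails the check and should be prefixed with https:// first.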

Return Value

snapshot (string): Semantic snapshot of the page, including:

Page title and URL
Main text content extracted from semantic HTML
Interactive elements (buttons, links, forms) with references
Console logs and navigation state

Stealth Mode

browser_navigate applies several techniques to evade anti-bot detection:

--disable-blink-features=AutomationControlled
Realistic user agent strings
Persistent browser profiles with cookies and localStorage
Human-like timing and behavior
Optional CAPTCHA solving (via 2captcha if captchaApiKey is configured)

Some sites require visible browser mode (headless: false) to render correctly. If you get blank pages or CAPTCHAs, retry with headless: false.
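The retry advice can be expressed as a small wrapper. This is a sketch under stated assumptions: navigate stands in for a browser_navigate call taking (url, headless) and returning the snapshot text, and looks_blocked is a crude, hypothetical heuristic (empty text or a mention of "captcha"), not OrcBot’s own blank-page or CAPTCHA detection.

```python
from typing import Callable

# Hypothetical stand-in for a browser_navigate call: (url, headless) -> snapshot text.
Navigate = Callable[[str, bool], str]


def looks_blocked(snapshot: str) -> bool:
    """Crude heuristic: treat an empty page or one mentioning a CAPTCHA as blocked."""
    return not snapshot.strip() or "captcha" in snapshot.lower()


def navigate_with_retry(navigate: Navigate, url: str) -> str:
    """Try headless first; if the result looks blocked, retry with a visible browser."""
    snapshot = navigate(url, True)
    if looks_blocked(snapshot):
        snapshot = navigate(url, False)  # headless: false
    return snapshot
```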

Example Usage

{
  "skill": "browser_navigate",
  "args": {
    "url": "https://news.ycombinator.com",
    "headless": true
  }
}

Response Example

=== PAGE SNAPSHOT ===
URL: https://news.ycombinator.com/
Title: Hacker News

--- TEXT CONTENT ---

Hacker News
New | Past | Comments | Ask | Show | Jobs | Submit

1. GPT-5 Training Begins (openai.com)
   482 points by user123 4 hours ago | 203 comments

2. Rust 1.75 Released with Async Traits (rust-lang.org)
   312 points by rustacean 2 hours ago | 87 comments

[... more content ...]

--- INTERACTIVE ELEMENTS ---
[0] link: "New" (href=/newest)
[1] link: "Past" (href=/front)
[2] link: "Comments" (href=/newcomments)
[3] button: "Submit" (id=submit-btn)

--- STATE ---
Last navigated: https://news.ycombinator.com/
Blank pages: 0/3
Console logs: (none)

Vision Fallback

If the semantic snapshot is shorter than 500 characters and a vision analyzer is configured, the system automatically:

Captures a screenshot
Analyzes it with GPT-4 Vision or Gemini
Returns a visual description alongside the semantic content

This covers canvas-heavy UIs, image galleries, and custom components that expose little semantic HTML.
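The decision described above can be sketched as a threshold check. The 500-character threshold comes from this page; capture_screenshot and analyze are hypothetical callables standing in for the screenshot step and the vision model, and the "--- VISION ---" section label is invented for illustration.

```python
from typing import Callable, Optional

THIN_SNAPSHOT_CHARS = 500  # threshold stated above


def snapshot_with_vision(
    snapshot: str,
    capture_screenshot: Callable[[], bytes],
    analyze: Optional[Callable[[bytes], str]],
) -> str:
    """Append a visual description only when the snapshot is thin and an analyzer exists."""
    if len(snapshot) >= THIN_SNAPSHOT_CHARS or analyze is None:
        return snapshot  # no screenshot is taken in this case
    description = analyze(capture_screenshot())
    return f"{snapshot}\n\n--- VISION ---\n{description}"  # hypothetical label
```

Checking the length before capturing means pages with enough semantic text never pay for a screenshot or a vision call.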

Metadata

isDeep: true
isResearch: true
isSideEffect: false

http_fetch

Make a lightweight HTTP request without launching a browser. Use it before browser_navigate for APIs and simple pages.

Parameters

url (string, required): URL to fetch

method (string, default: "GET"): HTTP method: GET, POST, PUT, PATCH, or DELETE

headers (object): Custom HTTP headers as key-value pairs

body (string | object): Request body for POST, PUT, or PATCH. Objects are serialized to JSON automatically.

timeout (number, default: 30000): Timeout in milliseconds

Return Value

result (string): Response status and body, for example:

HTTP 200 OK
Content-Type: application/json

{"status": "success", "data": [...]}

Example Usage

{
  "skill": "http_fetch",
  "args": {
    "url": "https://api.github.com/repos/torvalds/linux",
    "method": "GET",
    "headers": {
      "Accept": "application/vnd.github.v3+json"
    }
  }
}
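To reproduce an http_fetch call outside OrcBot, the same parameters map onto Python’s standard urllib.request. This is a sketch, not OrcBot’s code: build_request is a hypothetical helper, and setting Content-Type to application/json for object bodies is an assumption about what http_fetch sends.

```python
import json
import urllib.request
from typing import Optional, Union


def build_request(
    url: str,
    method: str = "GET",
    headers: Optional[dict] = None,
    body: Union[str, dict, list, None] = None,
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a request mirroring http_fetch's parameters."""
    headers = dict(headers or {})
    data = None
    if body is not None:
        if isinstance(body, (dict, list)):
            # Objects are serialized to JSON, as http_fetch does with object bodies.
            body = json.dumps(body)
            # Assumption: declare the JSON content type unless the caller set one.
            headers.setdefault("Content-Type", "application/json")
        data = body.encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

To send it, pass the request to urllib.request.urlopen(req, timeout=30). Note that urlopen takes its timeout in seconds, while http_fetch’s timeout is in milliseconds.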

Metadata

isDeep: true
isResearch: false

Prefer http_fetch for APIs and JSON endpoints. It’s 100x faster than browser navigation and doesn’t require rendering.

extract_article

Extract clean article text from a URL using Mozilla Readability. Strips ads, navigation, and clutter.

Parameters

url (string): URL to extract. If omitted, extracts from the current browser page.

Return Value

article (string): Clean article text with:

Title
Author (if available)
Publication date (if available)
Main content without ads or navigation
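To show the idea of stripping clutter, here is a deliberately crude stand-in built on Python’s standard html.parser. It is not Mozilla Readability: Readability scores candidate content nodes with heuristics, whereas this sketch simply drops a fixed, hypothetical set of clutter tags (script, style, nav, aside, header, footer, form) and keeps the remaining text.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Tags whose contents this simplified sketch treats as clutter.
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer", "form"}


class ArticleTextParser(HTMLParser):
    """Collect the page title and visible text outside clutter tags."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a clutter tag
        self.in_title = False
        self.title = ""
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_title:
            self.title += text
        elif self.skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> Tuple[str, str]:
    """Return (title, body text) with clutter tags removed."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return parser.title, "\n".join(parser.chunks)
```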

Example Usage

{
  "skill": "extract_article",
  "args": {
    "url": "https://arstechnica.com/science/2025/01/ai-breakthrough/"
  }
}

Response Example

=== ARTICLE ===
Title: Breakthrough in AI Reasoning: GPT-5 Passes Advanced Math Olympiad
Author: John Doe
Date: January 15, 2025

OpenAI announced today that GPT-5 has successfully solved all problems from the 2024 International Mathematical Olympiad...

[Clean article text continues...]

Metadata

isDeep: true
isResearch: false

extract_article reuses the shared browser instance, so it’s fast if you just navigated to the page.

download_file

Download a file from the web to the agent’s local storage.

Parameters

url (string, required): URL of the file to download

filename (string): Optional filename. If omitted, it is inferred from the URL path or Content-Type.

Return Value

filePath (string): Absolute path to the downloaded file, followed by its size in KB

Limits

Max file size: 50 MB (enforced via streaming)
Timeout: 60 seconds
Storage location: ~/.orcbot/downloads/
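Enforcing a size cap "via streaming" means counting bytes as chunks arrive instead of trusting a declared size. This is a minimal sketch under assumptions: copy_with_limit and FileTooLarge are hypothetical names, the chunk size is arbitrary, and "50 MB" is read as 50 × 1024 × 1024 bytes.

```python
from typing import BinaryIO

MAX_BYTES = 50 * 1024 * 1024  # 50 MB cap (assumed to mean binary megabytes)


class FileTooLarge(Exception):
    """Raised when the stream passes the size cap."""


def copy_with_limit(
    src: BinaryIO,
    dst: BinaryIO,
    max_bytes: int = MAX_BYTES,
    chunk_size: int = 64 * 1024,
) -> int:
    """Copy src to dst in chunks, aborting once the running total exceeds max_bytes.

    Returns the number of bytes written. After FileTooLarge, dst may hold a
    partial file, so the caller should delete it.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            raise FileTooLarge(f"stream exceeded {max_bytes} bytes")
        dst.write(chunk)
```

The response object returned by urllib.request.urlopen supports read(n), so it can be passed as src. Its timeout argument applies to individual blocking socket operations rather than the whole transfer, so a total 60-second limit would need a separate deadline check.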

MIME → Extension Inference

If the URL has no file extension, the system infers one from the response’s Content-Type header:

Content-Type	Extension
`image/jpeg`	`.jpg`
`image/png`	`.png`
`application/pdf`	`.pdf`
`application/zip`	`.zip`
`audio/mpeg`	`.mp3`
`video/mp4`	`.mp4`
`text/csv`	`.csv`
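The inference rule can be sketched with the mappings from the table above. This is an illustration, not OrcBot’s code: infer_filename and the "download" fallback name are hypothetical, and stripping parameters such as "; charset=utf-8" before the lookup is an assumption.

```python
import posixpath
from urllib.parse import urlparse

# Content-Type to extension pairs from the table above.
MIME_EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "application/pdf": ".pdf",
    "application/zip": ".zip",
    "audio/mpeg": ".mp3",
    "video/mp4": ".mp4",
    "text/csv": ".csv",
}


def infer_filename(url: str, content_type: str = "", fallback: str = "download") -> str:
    """Hypothetical helper: keep the URL's extension, else append one from Content-Type."""
    name = posixpath.basename(urlparse(url).path) or fallback
    if posixpath.splitext(name)[1]:
        return name  # the URL already carries an extension
    mime = content_type.split(";")[0].strip().lower()  # drop parameters like charset
    return name + MIME_EXTENSIONS.get(mime, "")
```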

Example Usage

{
  "skill": "download_file",
  "args": {
    "url": "https://example.com/report.pdf"
  }
}

Response Example

File downloaded successfully to: /home/user/.orcbot/downloads/report.pdf (342.5 KB)

Error Handling

Timeout: Error: Download timed out after 60s
Too large: File too large: 75.2 MB (max 50 MB)
HTTP error: HTTP error! status: 404

Metadata

isDeep: false
isResearch: false

Best Practices

Search strategy for reliable results:

Try web_search with a focused query.
To read a specific result, use http_fetch for APIs or extract_article for articles.
Use browser_navigate only for complex pages that depend on JavaScript or forms.

Avoid repeated navigation to the same URL. The browser maintains state, so if you just navigated to a page, use browser_examine_page instead of navigating again.

Search results are cached for 5 minutes. Identical queries within the cache window return instant results.
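A five-minute result cache behaves like a simple time-to-live map. In this sketch, SearchCache is a hypothetical class and the injectable clock exists only to make expiry testable; whether OrcBot normalizes queries (case, whitespace) before lookup isn’t stated, so only exactly identical strings count as identical here.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 5 * 60  # 5-minute cache window


class SearchCache:
    """Minimal TTL cache keyed on the exact query string."""

    def __init__(self, ttl: float = CACHE_TTL_SECONDS, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, query: str, fetch: Callable[[str], str]) -> str:
        """Return a cached result inside the window; otherwise fetch and store it."""
        now = self.clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit
        result = fetch(query)
        self._entries[query] = (now, result)
        return result
```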

Troubleshooting

“No search results found”

Cause: All providers failed or returned no results
Fix: Try a simpler, more general query. Avoid quoted phrases unless you need an exact match.

“Browser returned blank page”

Cause: Site requires JavaScript or detects automation
Fix: Retry with headless: false, or use switch_browser_profile to switch to a fresh profile

“CAPTCHA detected”

Cause: Site has bot protection
Fix: Configure captchaApiKey (2captcha) or call browser_solve_captcha manually

“Download timed out”

Cause: File too large or slow network
Fix: Check the file size with http_fetch first, or use a download manager

Related Skills

Browser Tooling - Low-level browser automation for complex interactions

File Operations - Read and process downloaded files
