Web Search Skills

OrcBot’s web search skills provide resilient, production-ready web research capabilities with automatic fallback across multiple search providers. The system prefers API-based search for speed and reliability, and falls back to browser-based search when no API keys are configured.

web_search

Search the web using multiple engines with automatic fallback. This is the preferred high-level skill for web research.

Parameters

query
string
required
Search query. Can be natural language or keyword-based.

Return Value

results
string
Formatted search results with titles, URLs, and snippets. Returns up to 10 results ranked by relevance.

Fallback Chain

The search system tries providers in this order (configurable via searchProviderOrder):
  1. Serper API - Fast, structured results (requires serperApiKey)
  2. Brave Search API - Privacy-focused (requires braveSearchApiKey)
  3. SearXNG - Self-hosted metasearch (requires searxngUrl)
  4. Google - Browser-based fallback
  5. Bing - Browser-based fallback
  6. DuckDuckGo - Browser-based fallback
If no API keys are configured, the system automatically uses browser-based search. This is slower but requires no external services.
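
The fallback order above can be sketched as a simple loop. This is an illustrative sketch only, not OrcBot's actual implementation: the provider name strings, the `providers` mapping, and the `search_with_fallback` function are assumptions made for the example. Only the config keys (`searchProviderOrder`, `serperApiKey`, `braveSearchApiKey`, `searxngUrl`) come from the documentation.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical provider identifiers; the real values accepted by
# searchProviderOrder may differ.
DEFAULT_ORDER = ["serper", "brave", "searxng", "google", "bing", "duckduckgo"]

# Config keys that the API-based providers require, per the list above.
REQUIRED_KEY = {
    "serper": "serperApiKey",
    "brave": "braveSearchApiKey",
    "searxng": "searxngUrl",
}

SearchFn = Callable[[str], List[str]]


def search_with_fallback(
    query: str, config: dict, providers: Dict[str, SearchFn]
) -> Tuple[Optional[str], List[str]]:
    """Try each provider in order and return the first non-empty result list."""
    order = config.get("searchProviderOrder", DEFAULT_ORDER)
    for name in order:
        key = REQUIRED_KEY.get(name)
        if key and not config.get(key):
            continue  # API provider without credentials: skip it
        search = providers.get(name)
        if search is None:
            continue  # no implementation registered for this name
        try:
            results = search(query)
        except Exception:
            continue  # provider failed: fall through to the next one
        if results:
            return name, results[:10]  # the docs cap results at 10
    return None, []
```

With an empty config, the three API providers are skipped and the loop starts at the browser-based entries, which matches the behavior described above.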

Example Usage

{
  "skill": "web_search",
  "args": {
    "query": "latest GPT-4 capabilities 2025"
  }
}

Response Example

Search results for "latest GPT-4 capabilities 2025":

1. OpenAI GPT-4 Technical Report 2025
   https://openai.com/research/gpt-4-2025
   OpenAI's GPT-4 now supports advanced reasoning, multimodal inputs including video, and extended context windows up to 128K tokens...

2. GPT-4 vs GPT-3.5: What's New in 2025
   https://aiblog.example.com/gpt4-updates
   The latest GPT-4 update brings significant improvements in coding, mathematics, and creative writing...

[8 more results]

Metadata

  • isDeep: true - Counts as substantive progress
  • isResearch: true - Higher repetition budget (up to 15 calls)
  • isSideEffect: false - No deduplication
Always try web_search before browser automation. It’s 10-50x faster than full browser navigation and uses fewer resources.

browser_navigate

Navigate to a URL and extract a semantic snapshot of the page content. Use when you need to read a specific page after finding it via search.

Parameters

url
string
required
Full URL to navigate to (must include protocol: https:// or http://)
headless
boolean
default: true
Run browser in headless mode. Set to false for sites with anti-bot detection.

Return Value

snapshot
string
Semantic snapshot of the page including:
  • Page title and URL
  • Main text content extracted from semantic HTML
  • Interactive elements (buttons, links, forms) with references
  • Console logs and navigation state

Stealth Mode

browser_navigate applies several techniques to evade anti-bot detection:
  • --disable-blink-features=AutomationControlled
  • Realistic user agent strings
  • Persistent browser profiles with cookies and localStorage
  • Human-like timing and behavior
  • Optional CAPTCHA solving (via 2captcha if captchaApiKey is configured)
Some sites require visible browser mode (headless: false) to render correctly. If you get blank pages or CAPTCHAs, retry with headless: false.

Example Usage

{
  "skill": "browser_navigate",
  "args": {
    "url": "https://news.ycombinator.com",
    "headless": true
  }
}

Response Example

=== PAGE SNAPSHOT ===
URL: https://news.ycombinator.com/
Title: Hacker News

--- TEXT CONTENT ---

Hacker News
New | Past | Comments | Ask | Show | Jobs | Submit

1. GPT-5 Training Begins (openai.com)
   482 points by user123 4 hours ago | 203 comments

2. Rust 1.75 Released with Async Traits (rust-lang.org)
   312 points by rustacean 2 hours ago | 87 comments

[... more content ...]

--- INTERACTIVE ELEMENTS ---
[0] link: "New" (href=/newest)
[1] link: "Past" (href=/front)
[2] link: "Comments" (href=/newcomments)
[3] button: "Submit" (id=submit-btn)

--- STATE ---
Last navigated: https://news.ycombinator.com/
Blank pages: 0/3
Console logs: (none)

Vision Fallback

If the semantic snapshot is thin (< 500 chars) and a vision analyzer is configured, the system automatically:
  1. Captures a screenshot
  2. Analyzes it with GPT-4 Vision or Gemini
  3. Returns a visual description alongside the semantic content
This handles canvas-heavy UIs, image galleries, and custom components.
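
The trigger condition for the vision fallback can be expressed as a small predicate. This is a sketch under assumptions: the function name is hypothetical, and whether OrcBot measures the raw snapshot or only its text content is not stated in the docs, so the example simply uses the full string length.

```python
THIN_SNAPSHOT_CHARS = 500  # threshold quoted in the docs


def needs_vision_fallback(snapshot: str, vision_configured: bool) -> bool:
    """Return True when a screenshot analysis should supplement the snapshot."""
    # Both conditions must hold: the snapshot is thin AND an analyzer exists.
    return bool(vision_configured) and len(snapshot) < THIN_SNAPSHOT_CHARS
```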

Metadata

  • isDeep: true
  • isResearch: true
  • isSideEffect: false

http_fetch

Lightweight HTTP request without launching a browser. Use this before browser_navigate for APIs and simple pages.

Parameters

url
string
required
URL to fetch
method
string
default:"GET"
HTTP method: GET, POST, PUT, PATCH, or DELETE
headers
object
Custom HTTP headers as key-value pairs
body
string | object
Request body for POST/PUT/PATCH. Auto-stringifies objects to JSON.
timeout
number
default: 30000
Timeout in milliseconds
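
The parameter handling described above (default method, auto-stringified bodies, millisecond timeout) might look roughly like the following. This is a minimal sketch, not the skill's real code: `build_request` is a hypothetical helper, and two behaviors are assumptions for illustration, namely adding a `Content-Type: application/json` header for object bodies and ignoring the body for methods other than POST/PUT/PATCH.

```python
import json
from typing import Any, Dict, Optional

ALLOWED_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE"}
BODY_METHODS = {"POST", "PUT", "PATCH"}


def build_request(
    url: str,
    method: str = "GET",
    headers: Optional[Dict[str, str]] = None,
    body: Any = None,
    timeout: int = 30000,
) -> Dict[str, Any]:
    """Normalize http_fetch-style arguments into a plain request description."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"Unsupported HTTP method: {method}")
    headers = dict(headers or {})
    data = None
    if body is not None and method in BODY_METHODS:
        if isinstance(body, (dict, list)):
            data = json.dumps(body)  # objects are serialized to JSON
            # Assumption: only set Content-Type if the caller did not.
            if not any(k.lower() == "content-type" for k in headers):
                headers["Content-Type"] = "application/json"
        else:
            data = str(body)  # strings are passed through unchanged
    return {
        "url": url,
        "method": method,
        "headers": headers,
        "body": data,
        "timeout_s": timeout / 1000,  # milliseconds to seconds
    }
```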

Return Value

result
string
Response status and body:
HTTP 200 OK
Content-Type: application/json

{"status": "success", "data": [...]}

Example Usage

{
  "skill": "http_fetch",
  "args": {
    "url": "https://api.github.com/repos/torvalds/linux",
    "method": "GET",
    "headers": {
      "Accept": "application/vnd.github.v3+json"
    }
  }
}

Metadata

  • isDeep: true
  • isResearch: false
Prefer http_fetch for APIs and JSON endpoints. It’s 100x faster than browser navigation and doesn’t require rendering.

extract_article

Extract clean article text from a URL using Mozilla Readability. Strips ads, navigation, and clutter.

Parameters

url
string
URL to extract. If omitted, extracts from the current browser page.

Return Value

article
string
Clean article text with:
  • Title
  • Author (if available)
  • Publication date (if available)
  • Main content without ads or navigation

Example Usage

{
  "skill": "extract_article",
  "args": {
    "url": "https://arstechnica.com/science/2025/01/ai-breakthrough/"
  }
}

Response Example

=== ARTICLE ===
Title: Breakthrough in AI Reasoning: GPT-5 Passes Advanced Math Olympiad
Author: John Doe
Date: January 15, 2025

OpenAI announced today that GPT-5 has successfully solved all problems from the 2024 International Mathematical Olympiad...

[Clean article text continues...]

Metadata

  • isDeep: true
  • isResearch: false
extract_article reuses the shared browser instance, so it’s fast if you just navigated to the page.

download_file

Download a file from the web to the agent’s local storage.

Parameters

url
string
required
URL of the file to download
filename
string
Optional filename. If omitted, infers from URL path or Content-Type.

Return Value

filePath
string
Confirmation message containing the absolute path to the downloaded file and its size in KB

Limits

  • Max file size: 50 MB (enforced via streaming)
  • Timeout: 60 seconds
  • Storage location: ~/.orcbot/downloads/

MIME → Extension Inference

If the URL has no file extension, the system infers it from Content-Type:
Content-Type       Extension
image/jpeg         .jpg
image/png          .png
application/pdf    .pdf
application/zip    .zip
audio/mpeg         .mp3
video/mp4          .mp4
text/csv           .csv
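
The inference rule can be sketched as a lookup that only applies when the URL path has no extension. This is an illustrative sketch: `infer_filename` is a hypothetical helper, the `"download"` base name used for paths with no file name is an assumption, and the mapping contains only the entries from the table above.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

MIME_TO_EXT = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "application/pdf": ".pdf",
    "application/zip": ".zip",
    "audio/mpeg": ".mp3",
    "video/mp4": ".mp4",
    "text/csv": ".csv",
}


def infer_filename(
    url: str, content_type: Optional[str] = None, filename: Optional[str] = None
) -> str:
    """Pick a filename: explicit name, then URL path, then Content-Type."""
    if filename:
        return filename  # an explicit filename always wins
    # Assumption: fall back to "download" when the path has no final segment.
    name = PurePosixPath(urlparse(url).path).name or "download"
    if PurePosixPath(name).suffix:
        return name  # the URL already carries an extension
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime = (content_type or "").split(";")[0].strip().lower()
    return name + MIME_TO_EXT.get(mime, "")
```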

Example Usage

{
  "skill": "download_file",
  "args": {
    "url": "https://example.com/report.pdf"
  }
}

Response Example

File downloaded successfully to: /home/user/.orcbot/downloads/report.pdf (342.5 KB)

Error Handling

  • Timeout: Error: Download timed out after 60s
  • Too large: File too large: 75.2 MB (max 50 MB)
  • HTTP error: HTTP error! status: 404

Metadata

  • isDeep: false
  • isResearch: false

Best Practices

Search strategy for reliable results:
  1. Try web_search with a focused query
  2. If you need to read a specific result, use http_fetch (if it’s an API) or extract_article (if it’s an article)
  3. Only use browser_navigate for complex pages with JavaScript or forms
Avoid repeated navigation to the same URL. The browser maintains state, so if you just navigated to a page, use browser_examine_page instead of navigating again.
Search results are cached for 5 minutes. Identical queries within the cache window return instant results.
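
A five-minute result cache of the kind described above can be sketched as a dictionary with timestamps. This is a minimal sketch, not OrcBot's cache: the `SearchCache` class is hypothetical, and it assumes "identical" means an exact string match on the query with no normalization.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class SearchCache:
    """Cache search results per exact query string for a fixed time window."""

    def __init__(
        self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds  # 300 s = the 5-minute window from the docs
        self._clock = clock      # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, query: str) -> Optional[List[str]]:
        entry = self._entries.get(query)
        if entry is None:
            return None
        stored_at, results = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[query]  # expired: evict and report a miss
            return None
        return results

    def put(self, query: str, results: List[str]) -> None:
        self._entries[query] = (self._clock(), results)
```

A caller would check `get(query)` first and only run the provider chain, followed by `put(query, results)`, on a miss.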

Troubleshooting

“No search results found”

  • Cause: All providers failed or returned no results
  • Fix: Try a simpler, more general query. Avoid quotes unless needed.

“Browser returned blank page”

  • Cause: Site requires JavaScript or detects automation
  • Fix: Retry with headless: false or use switch_browser_profile to a fresh profile

“CAPTCHA detected”

  • Cause: Site has bot protection
  • Fix: Configure captchaApiKey (2captcha) or use browser_solve_captcha manually

“Download timed out”

  • Cause: File too large or slow network
  • Fix: Check file size via http_fetch first, or use a download manager