Migration Guide: Crawl4AI to Firecrawl
This guide helps you migrate from the Crawl4AI-based web extraction to the new Firecrawl implementation.
Why Switch to Firecrawl?
- Simpler API: Firecrawl provides a cleaner, more intuitive API
- Better reliability: Built-in retry logic and error handling
- No browser context issues: Firecrawl manages browser instances internally
- Optimized for LLMs: Returns clean markdown by default
Key Changes
API Changes
The extract_urls method signature has changed:
Old (Crawl4AI):
await web_tool.extract_urls(
    urls,
    extraction_type="markdown",  # or "structured", "css", "regex"
    schema=None,
    css_selector=None,
    regex_patterns=None,
    enable_virtual_scroll=False,
    enable_pdf=False,
    js_code=None,
    wait_for=None
)
New (Firecrawl):
await web_tool.extract_urls(
    urls,
    formats=["markdown", "html"],
    only_main_content=True,
    include_tags=None,
    exclude_tags=None,
    wait_for_selector=None,
    timeout=30000
)
Parameter Mapping
| Crawl4AI Parameter | Firecrawl Equivalent | Notes |
|---|---|---|
| extraction_type | formats | Now accepts a list of formats |
| css_selector | include_tags | Use HTML tags instead of CSS selectors |
| enable_virtual_scroll | N/A | Firecrawl handles scrolling automatically |
| js_code | N/A | Not directly supported |
| wait_for | wait_for_selector | Same functionality |
| N/A | only_main_content | New feature to exclude navigation, ads |
| N/A | exclude_tags | New feature to exclude specific tags |
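The mapping table can be applied mechanically to existing call sites. The following is a minimal sketch, not part of either library: `translate_kwargs` is a hypothetical helper, and it assumes that only `extraction_type="markdown"` converts cleanly to `formats=["markdown"]`. Anything the table marks as N/A, or does not list at all, produces a warning instead of a guess.

```python
import warnings

# Renames taken from the mapping table above.
RENAMED = {"wait_for": "wait_for_selector"}
# Parameters the table lists as having no Firecrawl equivalent.
DROPPED = {"enable_virtual_scroll", "js_code"}


def translate_kwargs(old_kwargs):
    """Hypothetical helper: convert Crawl4AI-style kwargs to Firecrawl-style kwargs."""
    new_kwargs = {}
    for name, value in old_kwargs.items():
        if value is None or value is False:
            continue  # unset in the old call, nothing to migrate
        if name == "extraction_type":
            if value == "markdown":
                new_kwargs["formats"] = ["markdown"]
            else:
                # Assumption: other extraction types need a manual decision.
                warnings.warn(f"extraction_type={value!r} needs manual migration")
        elif name in RENAMED:
            new_kwargs[RENAMED[name]] = value
        elif name in DROPPED:
            warnings.warn(f"{name} has no Firecrawl equivalent and was dropped")
        elif name == "css_selector":
            # include_tags takes HTML tag names, not CSS selectors.
            warnings.warn("css_selector must be rewritten by hand as include_tags")
        else:
            warnings.warn(f"{name} is not covered by the mapping table")
    return new_kwargs
```

Warnings are used rather than exceptions so that a whole codebase can be scanned in one pass and the flagged call sites reviewed afterwards.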
Environment Setup
You need to set up a Firecrawl API key:
# Add to your .env file
FIRECRAWL_API_KEY=your-api-key-here
Get your API key from: https://www.firecrawl.dev/app/sign-in
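A missing key is easier to diagnose at startup than as a 401 response later. The sketch below checks for the variable before any extraction runs. `get_firecrawl_api_key` is a hypothetical helper name, and the code assumes your application has already loaded the .env file into the process environment, since Python does not read .env files on its own.

```python
import os


def get_firecrawl_api_key(environ=None):
    """Return the Firecrawl API key, or raise a clear error if it is missing."""
    environ = os.environ if environ is None else environ
    key = environ.get("FIRECRAWL_API_KEY", "").strip()
    # Treat the placeholder from the .env template as "not set".
    if not key or key == "your-api-key-here":
        raise RuntimeError(
            "FIRECRAWL_API_KEY is not set; add it to your .env file or environment"
        )
    return key
```

Passing `environ` explicitly keeps the check easy to test without touching the real environment.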
Code Examples
Basic extraction:
# Old way
result = await web_tool.extract_urls(url, extraction_type="markdown")
# New way
result = await web_tool.extract_urls(url, formats=["markdown"])
Targeted extraction:
# Old way
result = await web_tool.extract_urls(
    url,
    extraction_type="css",
    css_selector=".article-content"
)
# New way
result = await web_tool.extract_urls(
    url,
    include_tags=["article", "main"],
    only_main_content=True
)
Waiting for dynamic content:
# Old way
result = await web_tool.extract_urls(
    url,
    wait_for=".content-loaded"
)
# New way
result = await web_tool.extract_urls(
    url,
    wait_for_selector=".content-loaded"
)
Backward Compatibility
The old web_crawl4ai.py file has been preserved as a backup. If you need to temporarily revert:
# Temporarily use the old implementation
from vibex.builtin_tools.web_crawl4ai import WebTool
However, we recommend migrating to Firecrawl as soon as possible for better reliability and performance.
Troubleshooting
- 401 Unauthorized errors: Make sure your FIRECRAWL_API_KEY is set correctly
- Timeout errors: Increase the timeout parameter (default is 30000 ms)
- Missing content: Try setting only_main_content=False to get full page content
- Different output format: Firecrawl returns cleaner markdown by default, which may differ from Crawl4AI's output
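The "Missing content" tip can be automated with a simple fallback. This is a sketch under stated assumptions: `extract_with_fallback` is a hypothetical wrapper, and because this guide does not specify the shape of the value returned by `extract_urls`, the caller supplies a `has_content` predicate that decides whether a result is usable.

```python
async def extract_with_fallback(web_tool, urls, has_content, **kwargs):
    """Try main-content extraction first, then retry with the full page."""
    # Avoid passing only_main_content twice if the caller also supplied it.
    kwargs.pop("only_main_content", None)
    result = await web_tool.extract_urls(urls, only_main_content=True, **kwargs)
    if has_content(result):
        return result
    # Main-content filtering may have removed what we wanted; retry without it.
    return await web_tool.extract_urls(urls, only_main_content=False, **kwargs)
```

The fallback costs a second request per failed page, so it is best applied only to URLs where the filtered result is known to come back empty.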