feat: add Tavily Extract as parallel scraping option alongside Firecrawl by tavily-integrations · Pull Request #29 · federicodeponte/opendraft

tavily-integrations · 2026-04-28T12:24:45Z

Summary

Adds Tavily Extract as a configurable page content extraction option alongside the existing Firecrawl provider. When TAVILY_API_KEY is set, Tavily Extract is tried first for page scraping, with Firecrawl as secondary fallback and the simple requests-based scraper as tertiary fallback.

Why: Tavily Extract provides an alternative content extraction API that can improve reliability and reduce dependency on a single scraping provider.

Changes

engine/utils/tavily_extract_client.py (new): TavilyExtractClient wrapping TavilyClient.extract(), returning the same {success, content, url, metadata} dict as FirecrawlClient.scrape_url()
engine/utils/fallback_services.py: Added scrape_page_with_tavily() function; updated scrape_page_with_firecrawl() to attempt Tavily Extract first when TAVILY_API_KEY is set
engine/requirements.txt: Added tavily-python>=0.5.0
.env.example: Documented TAVILY_API_KEY (also enables Extract)
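For reviewers who want a picture of the response contract, the following is a minimal sketch of a wrapper of this kind, not the code in this PR. It assumes that TavilyClient.extract() returns a dict with "results" (entries carrying "raw_content") and "failed_results" (entries carrying "error"). The optional client argument, the scrape_url() method name on this class, and the metadata keys are hypothetical and exist only to keep the example self-contained.

```python
import os
from typing import Any, Dict, Optional


class TavilyExtractClient:
    """Illustrative sketch only; the class added in this PR may differ."""

    def __init__(self, api_key: Optional[str] = None, client: Any = None):
        # `client` is a hypothetical injection point so the sketch can run without network access.
        self.client = client
        if self.client is not None:
            return
        key = api_key or os.environ.get("TAVILY_API_KEY")
        try:
            from tavily import TavilyClient  # provided by the tavily-python package
        except ImportError:
            return  # package missing: leave the client unset and fail softly later
        if key:
            self.client = TavilyClient(api_key=key)

    def scrape_url(self, url: str) -> Dict[str, Any]:
        def failure(reason: str) -> Dict[str, Any]:
            return {"success": False, "content": "", "url": url,
                    "metadata": {"error": reason}}

        if self.client is None:
            return failure("Tavily client unavailable")
        try:
            response = self.client.extract(urls=[url])
        except Exception as exc:  # network errors, auth errors, rate limits
            return failure(str(exc))

        results = response.get("results") or []
        if not results:
            failed = response.get("failed_results") or []
            reason = failed[0].get("error", "extraction failed") if failed else "empty results"
            return failure(reason)

        content = results[0].get("raw_content") or ""
        if not content:
            return failure("empty content")
        return {"success": True, "content": content, "url": url,
                "metadata": {"source": "tavily_extract"}}
```

Because every path returns the same four keys, a caller written against FirecrawlClient.scrape_url() should be able to consume the result without branching on the provider.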

Dependency changes

Added tavily-python>=0.5.0 to engine/requirements.txt

Environment variable changes

TAVILY_API_KEY: When set, enables Tavily Extract as the primary page scraping method (also used for web search if that migration unit is applied)

Notes for reviewers

Firecrawl and FIRECRAWL_API_KEY remain fully functional and unchanged
Tavily Extract only activates when TAVILY_API_KEY is present in the environment
The fallback chain is: Tavily Extract → Firecrawl → simple requests scraper
TavilyExtractClient handles a missing tavily-python package gracefully by catching ImportError
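The ordering described in these notes can be sketched as a small provider chain. This is an illustration under stated assumptions, not the body of scrape_page_with_firecrawl(): the function names build_providers and scrape_with_fallback are hypothetical, as is the assumption that Firecrawl is gated on FIRECRAWL_API_KEY in the same way Tavily Extract is gated on TAVILY_API_KEY.

```python
import os
from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple

Scraper = Callable[[str], Dict[str, Any]]


def build_providers(tavily: Scraper, firecrawl: Scraper, simple: Scraper,
                    env: Optional[Mapping[str, str]] = None) -> List[Tuple[str, Scraper]]:
    """Order providers as Tavily Extract -> Firecrawl -> simple requests scraper."""
    env = os.environ if env is None else env
    chain: List[Tuple[str, Scraper]] = []
    if env.get("TAVILY_API_KEY"):      # Tavily Extract only activates when the key is present
        chain.append(("tavily_extract", tavily))
    if env.get("FIRECRAWL_API_KEY"):   # assumption: Firecrawl is gated on its own key
        chain.append(("firecrawl", firecrawl))
    chain.append(("simple_requests", simple))  # always available as the last resort
    return chain


def scrape_with_fallback(url: str, providers: List[Tuple[str, Scraper]]) -> Dict[str, Any]:
    """Return the first successful result, recording why earlier providers failed."""
    errors: Dict[str, str] = {}
    for name, scrape in providers:
        try:
            result = scrape(url)
        except Exception as exc:  # a raising provider must not break the chain
            errors[name] = str(exc)
            continue
        if result.get("success") and result.get("content"):
            metadata = dict(result.get("metadata") or {})
            metadata["provider"] = name
            return {**result, "metadata": metadata}
        errors[name] = (result.get("metadata") or {}).get("error", "no content")
    return {"success": False, "content": "", "url": url, "metadata": {"errors": errors}}
```

Keeping the chain as data makes the activation rules easy to check in isolation: with neither key set, only the simple requests scraper runs, which matches the behavior before this change for environments without API keys.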

Automated Review

Passed after 1 attempt
Final review: The implementation correctly adds Tavily Extract as a primary page-scraping option with Firecrawl as fallback. The new TavilyExtractClient is well-structured, returns a drop-in compatible response dict, handles all error cases (empty results, failed_results, exceptions), and the fallback chain in scrape_page_with_firecrawl is implemented cleanly. All four planned files are changed, the tavily-python dependency is declared correctly with a minimum version (>=0.5.0), and TAVILY_API_KEY is documented in .env.example. Note: the migration plan stated TAVILY_API_KEY would already be in .env.example from the prerequisite unit, but neither that unit's PR description nor the git diff supports this; this unit adds it correctly regardless. Only minor issues found; no critical or major blockers.

feat: add Tavily Extract as parallel page-scraping option

aa9544f

tavily-integrations wants to merge 1 commit into federicodeponte:master from Tavily-FDE:feat/tavily-migration/add-tavily-extract-scraping
