Implement Hatchet-powered web scraping workflows #1

bigboateng · 2024-08-30T10:44:55Z

This pull request introduces the backend implementation for the Hatchet Scraper Example project, focusing on web scraping workflows for TechCrunch AI articles and Google News top stories. Key changes include:

- Implemented three main Hatchet workflows:
  - ScraperWorkflow: Orchestrates the overall scraping process
  - TechCrunchAIScraperWorkflow: Scrapes AI-related articles from TechCrunch
  - GoogleNewsScraperWorkflow: Scrapes top stories from Google News
- Set up FastAPI application with CORS middleware and endpoints for initiating scraping tasks and streaming results.
- Integrated Hatchet SDK for workflow management and execution.
- Implemented web scraping logic using BeautifulSoup for both TechCrunch and Google News.
- Added error handling and retry mechanisms in scraping workflows.
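
The description does not show how the retry mechanism in the scraping workflows is built, so the following is only a minimal sketch of one common approach: retrying a failing call with exponential backoff. The names `with_retries`, `max_attempts`, `base_delay`, and `flaky_scrape` are hypothetical and do not come from this PR or from the Hatchet SDK; the PR may instead rely on retry settings provided by Hatchet itself.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying on any exception with exponential backoff.

    Hypothetical helper for illustration; not part of this PR.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_scrape():
        # Stand-in for a scraping step: fails twice, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("simulated network failure")
        return ["Example headline"]

    print(with_retries(flaky_scrape, max_attempts=3, base_delay=0.1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. A scraping step would wrap its fetch-and-parse call in `with_retries` so that transient network failures do not fail the whole workflow run.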

boatengyeboah added 5 commits August 30, 2024 11:40

Create .gitignore

8886fe3

(backend): Initial backend implementation for Hatchet Scraper Example

8b64be5

(frontend): Initial frontend implementation for Hatchet Scraper Example

d04f2d0

Add readme and start_all script

37573d9

(frontend): Delete unused images

cb8e8bf
