Write a Scraper as a State Machine with Playwright and Kafka
Talk virtual playwright typescript xstate
Most web scrapers are fragile scripts held together by setTimeout and hope. I showed how to model a Glassdoor scraper as a finite state machine instead, starting from the state diagram and then implementing it with Playwright, XState, and KafkaJS in a pnpm monorepo.
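To give a feel for the diagram-first approach, here is a minimal, dependency-free TypeScript sketch. It is not the code from the talk: the state and event names (`ScrapeState`, `ScrapeEvent`, `loadingSearch`, and so on) are hypothetical, and a plain lookup table stands in for XState so the idea is visible without any library API.

```typescript
// Hypothetical states and events for illustration; the real diagram may differ.
type ScrapeState =
  | "idle"
  | "loadingSearch"
  | "parsingResults"
  | "publishing"
  | "done"
  | "failed";

type ScrapeEvent =
  | "START"
  | "PAGE_LOADED"
  | "PARSED"
  | "NEXT_PAGE"
  | "PUBLISHED"
  | "ERROR";

// The state diagram written as data: each state lists the events it accepts
// and the state each event leads to. Anything not listed is an invalid path.
const transitions: Record<ScrapeState, Partial<Record<ScrapeEvent, ScrapeState>>> = {
  idle: { START: "loadingSearch" },
  loadingSearch: { PAGE_LOADED: "parsingResults", ERROR: "failed" },
  parsingResults: { PARSED: "publishing", ERROR: "failed" },
  publishing: { NEXT_PAGE: "loadingSearch", PUBLISHED: "done", ERROR: "failed" },
  done: {},
  failed: {},
};

// Returns the next state, or throws if the diagram has no such edge.
function transition(state: ScrapeState, event: ScrapeEvent): ScrapeState {
  const next = transitions[state][event];
  if (next === undefined) {
    throw new Error(`Invalid transition: ${event} is not allowed in state ${state}`);
  }
  return next;
}

// Folds a sequence of events over the machine, starting from "idle" by default.
function run(events: ScrapeEvent[], initial: ScrapeState = "idle"): ScrapeState {
  let state: ScrapeState = initial;
  for (const event of events) {
    state = transition(state, event);
  }
  return state;
}
```

Because the diagram is plain data, an out-of-order step such as `transition("idle", "PARSED")` fails loudly instead of silently scraping the wrong page. In a real scraper, each state would presumably trigger the matching browser action.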
The architecture treats each scraping step as a state transition: Playwright handles browser automation, XState enforces valid navigation paths, and KafkaJS streams results between loosely coordinated microservices. Retry logic uses bounded exponential backoff, so transient failures don’t cascade.
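As an illustration of bounded exponential backoff, here is a small TypeScript sketch. It assumes "bounded" means both a cap on the delay and a cap on the number of attempts. The option names (`maxAttempts`, `baseDelayMs`, `maxDelayMs`) and the `withRetry` helper are hypothetical and not taken from the talk's code.

```typescript
// Hypothetical option names; tune the values to the target site and workload.
interface BackoffOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number;  // upper bound on any single delay
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelayMs.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * Math.pow(2, attempt));
}

// Default sleep based on setTimeout; injectable so tests can skip real waiting.
const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `task` until it succeeds or maxAttempts is exhausted, waiting between
// failures. The last error is rethrown so the caller can decide what to do
// (for example, move the state machine to a failure state).
async function withRetry<T>(
  task: () => Promise<T>,
  opts: BackoffOptions,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final failed attempt.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

A call such as `withRetry(() => page.goto(url), { maxAttempts: 5, baseDelayMs: 100, maxDelayMs: 1000 })` would wait 100, 200, 400, and 800 ms between its five attempts. The sketch omits jitter; adding a random component to each delay is a common refinement when many workers retry at once.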