Jun 28, 2026 · 4 min read · Dev Soufiane

How RecipeScrape Runs on Autopilot: CI/CD, Cron, and Zero-Touch Deployments

The full automation stack behind RecipeScrape — from daily scrapes on GitHub Actions to zero-downtime Vercel deployments.

RecipeScrape runs without any human operators. No one SSHes into a server. No one manually runs scrapers. No one deploys by hand. Here's how the automation works end to end.

Daily Scrape Pipeline (GitHub Actions)

Every day at 3:00 AM UTC, a GitHub Actions workflow kicks off the scraper:

yaml

name: daily-scrape
on:
  schedule:
    - cron: "0 3 * * *"
  workflow_dispatch:  # manual trigger for testing

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v3
      - run: uv sync
      - run: uv run python -m scrape_engine
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}

The scraper runs the spiders one after another, one per domain, with request concurrency capped inside each spider. Each run upserts new recipes and records a scrape_run audit entry. If a spider fails (e.g., the site changed its HTML structure), the others continue unaffected. The workflow_dispatch trigger also lets us start a rescan manually from the GitHub UI.
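The failure isolation, upsert, and audit steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual scrape_engine code: the table layout, the column names, the `run_all` function, and the use of an in-memory SQLite database in place of the production database behind DATABASE_URL are all assumptions made so the example runs on its own.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Iterable

# Hypothetical schema: the real tables and columns are not shown in the post.
SCHEMA = """
CREATE TABLE recipes (url TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE scrape_run (
    spider TEXT, started_at TEXT, status TEXT, recipes_seen INTEGER, error TEXT
);
"""

# A spider is modelled here as a callable that returns recipe dicts.
Spider = Callable[[], Iterable[dict]]


def run_all(conn: sqlite3.Connection, spiders: dict[str, Spider]) -> None:
    """Run spiders one after another; a failure in one never stops the rest."""
    for name, spider in spiders.items():
        started = datetime.now(timezone.utc).isoformat()
        status, error, seen = "ok", None, 0
        try:
            for recipe in spider():
                # Upsert: insert new rows, refresh the title of known URLs.
                conn.execute(
                    "INSERT INTO recipes (url, title) VALUES (:url, :title) "
                    "ON CONFLICT(url) DO UPDATE SET title = excluded.title",
                    recipe,
                )
                seen += 1
        except Exception as exc:  # e.g. the site changed its HTML structure
            status, error = "failed", repr(exc)
        # One audit row per spider, written whether it succeeded or failed.
        conn.execute(
            "INSERT INTO scrape_run VALUES (?, ?, ?, ?, ?)",
            (name, started, status, seen, error),
        )
        conn.commit()
```

To try it, open `sqlite3.connect(":memory:")`, call `conn.executescript(SCHEMA)`, and pass `run_all` a dict that maps spider names to callables. A spider that raises ends up as a `failed` row in scrape_run while the remaining spiders still run.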

Automated Tests

Before any scraper change reaches production, a separate CI workflow runs:

1. Fixture-based tests: Each spider has a saved HTML fixture from its target site. Tests verify that the parser still extracts the expected number of recipes, ingredients, and nutrition fields.
2. Schema validation: Normalized output is validated against a Pydantic model. If a field has the wrong type, the test fails.
3. Integration smoke test: A single recipe is fetched from the live database to confirm the connection is healthy.
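A fixture-based test combined with a type check might look something like the sketch below. Everything in it is illustrative: the fixture markup, the `RecipeParser` class, and the field names are hypothetical, and a stdlib dataclass with manual checks stands in for the Pydantic model so the example has no third-party dependencies.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional

# Hypothetical saved fixture; real fixtures would be full pages stored on disk.
FIXTURE_HTML = """
<article class="recipe"><h2>Lentil Soup</h2>
  <li class="ingredient">lentils</li><li class="ingredient">onion</li>
  <span class="calories">320</span></article>
<article class="recipe"><h2>Flatbread</h2>
  <li class="ingredient">flour</li>
  <span class="calories">210</span></article>
"""


@dataclass
class Recipe:
    title: str
    ingredients: list[str]
    calories: int

    def __post_init__(self) -> None:
        # Stand-in for Pydantic's type checks: reject wrongly typed fields.
        if not isinstance(self.title, str) or not self.title:
            raise TypeError("title must be a non-empty string")
        if not all(isinstance(i, str) for i in self.ingredients):
            raise TypeError("ingredients must be strings")
        if not isinstance(self.calories, int):
            raise TypeError("calories must be an int")


class RecipeParser(HTMLParser):
    """Collects one dict per <article class="recipe"> in the fixture."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list = []
        self._field: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        css = dict(attrs).get("class")
        if tag == "article" and css == "recipe":
            self.rows.append({"title": "", "ingredients": [], "calories": None})
        elif self.rows and tag == "h2":
            self._field = "title"
        elif self.rows and css in ("ingredient", "calories"):
            self._field = css

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        row = self.rows[-1]
        if self._field == "ingredient":
            row["ingredients"].append(text)
        elif self._field == "calories":
            row["calories"] = int(text)
        else:
            row["title"] = text
        self._field = None  # each field is a single text node in this fixture


def parse_recipes(html: str) -> list:
    parser = RecipeParser()
    parser.feed(html)
    return [Recipe(**row) for row in parser.rows]


def test_fixture_still_parses() -> None:
    recipes = parse_recipes(FIXTURE_HTML)
    assert len(recipes) == 2
    assert [len(r.ingredients) for r in recipes] == [2, 1]
    assert all(r.calories > 0 for r in recipes)
```

If the target site's markup changes and the fixture is refreshed, a parser that no longer finds the calories field leaves it as `None`, and constructing `Recipe` raises before the counts are even compared.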

Frontend Deployments (Vercel + GitHub)

The frontend repo deploys automatically on every push to main. Vercel's GitHub integration runs next build to build the Next.js app and promotes the deployment to production if all checks pass. The frontend uses its Drizzle schema for reads only; it never writes to the database.

Monitoring Without Dashboards

We use a lightweight check: a GitHub Actions workflow hits the /api/v1/health endpoint every 30 minutes and sends a notification if the status isn't ok or if the recipe count dropped by more than 5%. No Datadog, no Grafana — just a cron job, a webhook, and a phone notification.
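The alerting rule can be sketched as a small pure function plus a fetch helper. This is an assumption-heavy illustration: the post does not show the health payload, so the `status` and `recipe_count` field names, the `evaluate_health` and `fetch_health` helpers, and the idea of comparing against a count saved from the previous run are all hypothetical.

```python
import json
import urllib.request
from typing import Optional

MAX_DROP = 0.05  # alert when the recipe count falls by more than 5%


def evaluate_health(payload: dict, previous_count: Optional[int]) -> list:
    """Return human-readable alerts; an empty list means all is well."""
    alerts = []
    if payload.get("status") != "ok":
        alerts.append(f"status is {payload.get('status')!r}, expected 'ok'")
    count = payload.get("recipe_count")  # assumed field name
    if previous_count and isinstance(count, int):
        drop = (previous_count - count) / previous_count
        if drop > MAX_DROP:
            alerts.append(
                f"recipe count dropped {drop:.1%} ({previous_count} -> {count})"
            )
    return alerts


def fetch_health(base_url: str, timeout: float = 10.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    url = f"{base_url}/api/v1/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

In a scheduled job, the output of `fetch_health` would be passed to `evaluate_health` together with the last known count, and any returned alerts would be posted to the notification webhook. Keeping the rule separate from the network call makes the threshold easy to test in isolation.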

The Result

The system has been running for months with zero manual intervention. Recipes are refreshed daily, the frontend updates automatically, and the only time we touch a terminal is when we're adding a new source site or fixing a parser for a site redesign.