Every Recipe, One API: Architecture of a Serverless Recipe Platform
How a Python scraper and a Next.js frontend share a single Neon PostgreSQL database to serve 7,000+ recipes through a unified API.
Recipe sites are scattered across the web — each with its own design, its own quirks, and its own walled garden of content. RecipeScrape solves this by building a bridge between those sites and developers who want to build cooking apps.
The architecture is deliberately simple: two independent projects, one shared database, zero long-running servers to manage.
The Two-Repo Split
recipescrape-scraper/ ← Python, uv. Scrapes & writes. No server.
recipescrape-web/ ← Next.js + Bun. Reads & serves. UI + API.

The scraper is a pure CLI process. It has no web server, no open ports, nothing to deploy beyond a scheduled task. It discovers recipe URLs from sitemaps and category pages, feeds them to the recipe-scrapers library (which parses schema.org/Recipe ld+json markup), normalizes the data, and upserts it into the shared Neon database.
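The discovery and normalization steps can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function names, the `path_hint` filter, the `url` field, and the shape of the `parsed` dict (a stand-in for whatever the recipe-scrapers library returns) are all assumptions.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def discover_recipe_urls(sitemap_xml: str, path_hint: str = "/recipe") -> list[str]:
    """Return the <loc> URLs of a sitemap that contain path_hint.

    path_hint is a hypothetical filter; real sites likely need per-site rules.
    """
    root = ET.fromstring(sitemap_xml)
    urls = [(loc.text or "").strip() for loc in root.findall(".//sm:loc", SITEMAP_NS)]
    return [u for u in urls if path_hint in u]


def normalize(parsed: dict, url: str, source_site: str) -> dict:
    """Turn loosely typed parser output into a row that is ready to upsert.

    `parsed` is an assumed dict of already-extracted fields, not the real
    return type of the recipe-scrapers library.
    """
    total = parsed.get("total_time")
    return {
        "url": url,
        "source_site": source_site,
        "title": (parsed.get("title") or "").strip(),
        # Drop blank entries so the stored arrays stay clean.
        "ingredients": [i.strip() for i in (parsed.get("ingredients") or []) if i.strip()],
        "instructions": [s.strip() for s in (parsed.get("instructions") or []) if s.strip()],
        "total_time_minutes": int(total) if total is not None else None,
    }
```

Keeping both steps as pure functions means they can be tested without network access or a database; only the fetch and the final upsert touch the outside world.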
The web project is a Next.js app deployed on Vercel. It owns everything user-facing: the browsing UI, the recipe detail pages, and the public JSON API. It connects to Neon using Drizzle ORM over the @neondatabase/serverless HTTP driver.
One Database, Two Ecosystems
The clever part is that both repos talk to the same Neon PostgreSQL database:
- Python side writes via SQLAlchemy async + asyncpg
- TypeScript side reads via Drizzle ORM + @neondatabase/serverless
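The write side comes down to an upsert keyed on something unique per recipe. The sketch below shows the shape of that statement under some loud assumptions: an in-memory SQLite database stands in for Neon so the snippet runs anywhere, the standard-library `sqlite3` module stands in for SQLAlchemy async + asyncpg, and the `url` unique key and the four-column table are hypothetical. The `ON CONFLICT ... DO UPDATE` clause has the same form in PostgreSQL.

```python
import json
import sqlite3

# Same ON CONFLICT ... DO UPDATE shape that PostgreSQL accepts.
UPSERT_SQL = """
INSERT INTO recipes (url, source_site, title, ingredients)
VALUES (:url, :source_site, :title, :ingredients)
ON CONFLICT (url) DO UPDATE SET
    source_site = excluded.source_site,
    title       = excluded.title,
    ingredients = excluded.ingredients
"""


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory table (a stand-in for the Neon schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recipes ("
        " url TEXT PRIMARY KEY,"         # assumed unique key
        " source_site TEXT NOT NULL,"
        " title TEXT NOT NULL,"
        " ingredients TEXT NOT NULL)"    # JSONB in the real database
    )
    return conn


def upsert_recipe(conn: sqlite3.Connection, row: dict) -> None:
    """Insert the row, or overwrite the existing row with the same url."""
    params = {**row, "ingredients": json.dumps(row["ingredients"])}
    conn.execute(UPSERT_SQL, params)
    conn.commit()
```

Because a re-scrape of the same URL overwrites rather than duplicates, the scraper can be re-run safely after a partial failure.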
Schema ownership lives firmly in Python. Alembic manages all migrations. The Drizzle schema in the frontend repo is a read-only mirror: never run drizzle-kit push against production. This means every schema change starts as a single Alembic migration, which alters the tables that both the scraper's write models and the frontend's read models map onto.
Schema at a Glance
The recipes table stores 24 columns covering everything you'd expect: title, description, ingredients (JSONB), instructions (JSONB), nutrition data, ratings, timings, difficulty, cuisine, category, and full-text search vectors. A companion scrape_runs table audits every scrape job with status, counts, and timestamps.
Key indexes include a GIN index on a generated tsvector column for fast full-text search across titles and descriptions, plus B-tree indexes on source_site, cuisine, category, and total_time_minutes for filtered queries.
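To make the role of these indexes concrete, here is a hedged sketch of the kind of query they are meant to serve. The frontend would issue this through Drizzle; it is written in Python with asyncpg-style `$n` placeholders only to match the other sketches. The `search_vector` and `id` column names, the `'english'` text search configuration, and the `build_search_query` helper are assumptions, while `websearch_to_tsquery`, `ts_rank`, and the `@@` operator are standard PostgreSQL.

```python
from __future__ import annotations


def build_search_query(
    text: str,
    cuisine: str | None = None,
    max_total_minutes: int | None = None,
    limit: int = 20,
) -> tuple[str, list]:
    """Build a parameterized full-text search with optional filters.

    Only placeholder numbers are interpolated into the SQL string;
    user-supplied values always travel in the params list.
    """
    params: list = [text]
    # Served by the GIN index on the tsvector column (name assumed).
    where = ["search_vector @@ websearch_to_tsquery('english', $1)"]
    if cuisine is not None:
        params.append(cuisine)
        where.append(f"cuisine = ${len(params)}")  # B-tree index
    if max_total_minutes is not None:
        params.append(max_total_minutes)
        where.append(f"total_time_minutes <= ${len(params)}")  # B-tree index
    params.append(limit)
    sql = (
        "SELECT id, title FROM recipes WHERE "
        + " AND ".join(where)
        + " ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', $1)) DESC"
        + f" LIMIT ${len(params)}"
    )
    return sql, params
```

For example, `build_search_query("lemon tart", cuisine="french", max_total_minutes=45)` yields a statement with four placeholders and the parameter list `["lemon tart", "french", 45, 20]`.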
Why This Works
By decoupling the scraper from the API, each can be developed, deployed, and scaled independently. The scraper runs on a cron (GitHub Actions at 3am UTC), while the frontend scales with Vercel's edge network. No orchestration, no message queues, no complex infrastructure — just a database both sides can reach.