Jun 24, 2026 · 5 min read · Dev Soufiane

Every Recipe, One API: Architecture of a Serverless Recipe Platform

How a Python scraper and a Next.js frontend share a single Neon PostgreSQL database to serve 7,000+ recipes through a unified API.

Recipe sites are scattered across the web — each with its own design, its own quirks, and its own walled garden of content. RecipeScrape solves this by building a bridge between those sites and developers who want to build cooking apps.

The architecture is deliberately simple: two independent projects, one shared database, zero long-running HTTP servers to manage.

The Two-Repo Split

```text
recipescrape-scraper/    ← Python, uv. Scrapes & writes. No server.
recipescrape-web/        ← Next.js + Bun. Reads & serves. UI + API.
```

The scraper is a pure CLI process. It has no web server, no open ports, and nothing to deploy beyond a scheduled task. It discovers recipe URLs from sitemaps and category pages, passes each page to the recipe-scrapers library (which parses schema.org/Recipe JSON-LD markup), normalizes the data, and upserts it into the shared Neon database.
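The discovery step can start from a site's sitemap. Below is a minimal sketch using only the standard library, assuming the site follows the sitemaps.org protocol. The function names `extract_locs` and `looks_like_recipe` are hypothetical, and the `/recipe` URL filter is an assumption; real sites need per-site rules. Fetching each page and handing its HTML to recipe-scrapers is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for both file types.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(sitemap_xml: str) -> tuple[list[str], list[str]]:
    """Split a sitemap into (page_urls, child_sitemap_urls).

    A <urlset> lists pages directly; a <sitemapindex> lists further
    sitemaps that the crawler would fetch and parse in turn.
    """
    root = ET.fromstring(sitemap_xml)
    locs = [
        el.text.strip()
        for el in root.findall(".//sm:loc", SITEMAP_NS)
        if el.text and el.text.strip()
    ]
    if root.tag.endswith("sitemapindex"):
        return [], locs
    return locs, []


def looks_like_recipe(url: str, marker: str = "/recipe") -> bool:
    """Hypothetical URL filter; real sites need per-site rules."""
    return marker in url


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/recipes/pad-thai</loc></url>"
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    )
    pages, _ = extract_locs(sample)
    print([u for u in pages if looks_like_recipe(u)])
```

Returning child sitemaps separately lets the caller walk a sitemap index breadth-first without the parser itself doing any network I/O.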

The web project is a Next.js app deployed on Vercel. It owns everything user-facing: the browsing UI, the recipe detail pages, and the public JSON API. It connects to Neon using Drizzle ORM over the @neondatabase/serverless HTTP driver.

One Database, Two Ecosystems

The clever part is that both repos talk to the same Neon PostgreSQL database:

Python side writes via SQLAlchemy async + asyncpg
TypeScript side reads via Drizzle ORM + @neondatabase/serverless
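Because the Python side upserts, re-scraping a page refreshes its row instead of duplicating it. The sketch below illustrates that write pattern with the standard library's `sqlite3` standing in for Neon and asyncpg, so it runs anywhere. PostgreSQL accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` form, with `$1`-style placeholders under asyncpg instead of `?`. The conflict key `source_url`, the four-column table, and the `upsert_recipe` helper are assumptions for illustration, not the real 24-column schema or the scraper's SQLAlchemy code.

```python
import json
import sqlite3

# Stand-in table: sqlite3 replaces Neon/asyncpg so this runs anywhere.
# Column names (source_url, title, ingredients) are hypothetical.
SCHEMA = """
CREATE TABLE recipes (
    source_url  TEXT PRIMARY KEY,
    source_site TEXT NOT NULL,
    title       TEXT NOT NULL,
    ingredients TEXT NOT NULL  -- JSONB in PostgreSQL; JSON text here
)
"""

# Insert a new row, or overwrite the mutable fields if the URL exists.
UPSERT = """
INSERT INTO recipes (source_url, source_site, title, ingredients)
VALUES (?, ?, ?, ?)
ON CONFLICT (source_url) DO UPDATE SET
    title = excluded.title,
    ingredients = excluded.ingredients
"""


def upsert_recipe(conn: sqlite3.Connection, recipe: dict) -> None:
    """Insert a recipe, or refresh it if the URL was scraped before."""
    conn.execute(
        UPSERT,
        (
            recipe["source_url"],
            recipe["source_site"],
            recipe["title"],
            json.dumps(recipe["ingredients"]),
        ),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    pad_thai = {
        "source_url": "https://example.com/recipes/pad-thai",
        "source_site": "example.com",
        "title": "Pad Thai",
        "ingredients": ["rice noodles", "tamarind"],
    }
    upsert_recipe(conn, pad_thai)
    upsert_recipe(conn, {**pad_thai, "title": "Easy Pad Thai"})
    print(conn.execute("SELECT COUNT(*), title FROM recipes").fetchone())
```

Running the same upsert twice leaves one row carrying the newer title, which is what makes a nightly re-scrape safe to repeat.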

Schema ownership lives firmly in Python: Alembic manages all migrations. The Drizzle schema in the frontend repo is a read-only mirror, and drizzle-kit push is never run against production. The database therefore has a single source of truth: one Alembic migration changes the tables that both the scraper's write models and the frontend's read models point at, and the Drizzle mirror only has to be updated to match.

Schema at a Glance

The recipes table stores 24 columns covering everything you'd expect: title, description, ingredients (JSONB), instructions (JSONB), nutrition data, ratings, timings, difficulty, cuisine, category, and full-text search vectors. A companion scrape_runs table audits every scrape job with status, counts, and timestamps.
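The scrape_runs audit trail can be pictured as a row that is opened when a job starts and closed when it ends. Here is a minimal sketch, again with `sqlite3` standing in for Neon. The column names (`scraped`, `failed`, `started_at`, `finished_at`), the status strings, and the helpers `start_run` and `finish_run` are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# sqlite3 stands in for Neon; every column name here is hypothetical.
SCRAPE_RUNS_DDL = """
CREATE TABLE scrape_runs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    source_site TEXT NOT NULL,
    status      TEXT NOT NULL,          -- 'running', 'completed', 'failed'
    scraped     INTEGER NOT NULL DEFAULT 0,
    failed      INTEGER NOT NULL DEFAULT 0,
    started_at  TEXT NOT NULL,
    finished_at TEXT
)
"""


def _now() -> str:
    """UTC timestamp as ISO 8601 text (a timestamptz in PostgreSQL)."""
    return datetime.now(timezone.utc).isoformat()


def start_run(conn: sqlite3.Connection, source_site: str) -> int:
    """Open an audit row before scraping starts and return its id."""
    cur = conn.execute(
        "INSERT INTO scrape_runs (source_site, status, started_at) "
        "VALUES (?, 'running', ?)",
        (source_site, _now()),
    )
    conn.commit()
    return cur.lastrowid


def finish_run(
    conn: sqlite3.Connection,
    run_id: int,
    scraped: int,
    failed: int,
    status: str = "completed",
) -> None:
    """Close the audit row with final counts and an end timestamp."""
    conn.execute(
        "UPDATE scrape_runs "
        "SET status = ?, scraped = ?, failed = ?, finished_at = ? "
        "WHERE id = ?",
        (status, scraped, failed, _now(), run_id),
    )
    conn.commit()
```

A caller would wrap the scrape loop in `try`/`except` and call `finish_run(..., status="failed")` on error, so a row left in `running` signals a job that died without cleaning up.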

Key indexes include a GIN index on a generated tsvector column for fast full-text search across titles and descriptions, plus B-tree indexes on source_site, cuisine, category, and total_time_minutes for filtered queries.
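In the real project these queries go through Drizzle on the frontend. To keep every example in one language, here is a Python sketch of the SQL such a filtered search could produce, and which predicate each index can serve. The tsvector column name `search_vector`, the `'english'` text-search configuration, the selected columns, and the default limit are assumptions; `websearch_to_tsquery` is a built-in PostgreSQL function (version 11 and later).

```python
from typing import Any, Optional


def build_recipe_query(
    q: Optional[str] = None,
    source_site: Optional[str] = None,
    cuisine: Optional[str] = None,
    category: Optional[str] = None,
    max_minutes: Optional[int] = None,
    limit: int = 20,
) -> tuple[str, list[Any]]:
    """Build (sql, params) with asyncpg-style $n placeholders.

    Values are always passed as parameters, never spliced into the SQL.
    """
    clauses: list[str] = []
    params: list[Any] = []

    def add(template: str, value: Any) -> None:
        # Append the value, then number its placeholder by position.
        params.append(value)
        clauses.append(template.format(n=len(params)))

    if q:
        # Full-text match, eligible for the GIN index on the tsvector.
        # "search_vector" and the 'english' config are assumed names.
        add("search_vector @@ websearch_to_tsquery('english', ${n})", q)
    if source_site:
        add("source_site = ${n}", source_site)  # B-tree on source_site
    if cuisine:
        add("cuisine = ${n}", cuisine)  # B-tree on cuisine
    if category:
        add("category = ${n}", category)  # B-tree on category
    if max_minutes is not None:
        add("total_time_minutes <= ${n}", max_minutes)  # B-tree range scan

    sql = "SELECT title, cuisine, total_time_minutes FROM recipes"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    params.append(limit)
    sql += f" LIMIT ${len(params)}"
    return sql, params
```

With asyncpg, the result would be executed as `await conn.fetch(sql, *params)`. Whether PostgreSQL actually uses a given index, or combines several with a bitmap scan, is still up to the planner.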

Why This Works

Because the scraper is decoupled from the API, each can be developed, deployed, and scaled independently. The scraper runs on a cron schedule (a GitHub Actions workflow at 3am UTC), while the frontend scales with Vercel's edge network. No orchestration, no message queues, no complex infrastructure: just a database both sides can reach.