# Infrastructure blueprint (our stack)

This blueprint describes how ToolCall Store runs on the existing production stack: a single Ubuntu VPS reverse-proxied by **Caddy**, with **FastAPI** services, **SQLite today / Postgres on growth**, **Stalwart** self-hosted email, **OpenProvider** as the primary domain registrar, **Stripe** billing, and **PM2 + storectl** for process management. It deliberately avoids Cloudflare/Workers, managed Redis, and managed object storage until volume justifies them.

## Edge / site
- **Caddy** (NOT nginx — nginx is disabled) terminates TLS and reverse-proxies by domain to a localhost port. Add a `reverse_proxy 127.0.0.1:<port>` block for the API and reload Caddy.
- Static marketing/docs site (`index.html`, `pricing.html`, `docs/`, machine files) is served directly by Caddy as files.
- The `/v1` API is a **FastAPI** app (Python, uvicorn) on a dedicated localhost port. Register it with **storectl** (the port/registry manager) so that it is supervised, survives reboots (cron watchdog + `pm2 save`), and keeps a stable port assignment. Do not start it by hand with a bare `nohup`.

## Core services
- **Database**: start on **SQLite** (`/opt/factory/phonescale.db` pattern, or a dedicated `toolcallstore.db`) for tenants, API keys, projects, quotes, approvals, receipts, metering ledger, provider costs, and webhook records. SQLite is sufficient for the first phase (single writer, WAL mode). **Migration path to Postgres**: switch when concurrent writes, multi-process workers, or row-level locking on idempotency/spend-cap checks become a bottleneck. Keep all access behind a thin data layer (SQLAlchemy) so that the swap is a connection-string and dialect change, not a rewrite.
- **Async jobs**: no Redis yet. For media generation, DNS verification, and webhook retries, start with a **SQLite-backed job table polled by a FastAPI background worker** (or APScheduler / a small `asyncio` queue). Introduce Redis + RQ/Celery only when job volume or fan-out demands it. Design jobs to be idempotent and retryable from day one so the queue backend can change later.
- **Object storage**: no S3/R2 yet. Store generated images/videos, screenshots, and receipt artifacts on local disk under a served path (e.g. `/var/www/toolcallstore/assets/`) behind Caddy, with signed, expiring URLs issued by the API. Abstract this behind a `storage` interface so a future move to S3-compatible storage (Backblaze B2, R2, MinIO) is a one-adapter change.
- **Secrets**: provider keys and webhook signing secrets live in an env/secrets file readable only by the API user (`chmod 600`). Customer BYOK keys are stored in the DB, encrypted at rest (Fernet or AES-GCM) with a master key taken from the environment. Never return secrets in plaintext; store only `last4` for display.
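
The SQLite-backed job table described above could look roughly like the following minimal sketch. It uses only the standard-library `sqlite3` module. The table layout, the `dedupe_key` column, the function names (`connect`, `enqueue`, `claim_next`, `finish`), and the exponential backoff are illustrative assumptions, not the existing schema. `BEGIN IMMEDIATE` takes the write lock up front so that two pollers cannot claim the same row.

```python
import json
import sqlite3
import time

# Hypothetical schema; column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id          INTEGER PRIMARY KEY,
    kind        TEXT NOT NULL,
    payload     TEXT NOT NULL,
    dedupe_key  TEXT NOT NULL UNIQUE,
    status      TEXT NOT NULL DEFAULT 'queued',
    attempts    INTEGER NOT NULL DEFAULT 0,
    run_after   REAL NOT NULL DEFAULT 0
);
"""


def connect(path):
    # isolation_level=None means autocommit; transactions are opened explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")
    conn.executescript(SCHEMA)
    return conn


def enqueue(conn, kind, payload, dedupe_key):
    # INSERT OR IGNORE makes enqueueing idempotent on dedupe_key.
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs (kind, payload, dedupe_key) VALUES (?, ?, ?)",
        (kind, json.dumps(payload), dedupe_key),
    )
    return cur.rowcount == 1


def claim_next(conn, now=None):
    # Select and mark one due job inside a single write transaction.
    now = time.time() if now is None else now
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, kind, payload FROM jobs "
            "WHERE status = 'queued' AND run_after <= ? ORDER BY id LIMIT 1",
            (now,),
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running', attempts = attempts + 1 "
                "WHERE id = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else (row[0], row[1], json.loads(row[2]))


def finish(conn, job_id, ok, max_attempts=5, now=None):
    # Mark success, or requeue with exponential backoff until max_attempts.
    now = time.time() if now is None else now
    if ok:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
        return
    attempts = conn.execute(
        "SELECT attempts FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()[0]
    if attempts >= max_attempts:
        conn.execute("UPDATE jobs SET status = 'failed' WHERE id = ?", (job_id,))
    else:
        conn.execute(
            "UPDATE jobs SET status = 'queued', run_after = ? WHERE id = ?",
            (now + 2 ** attempts, job_id),
        )
```

A background worker would call `claim_next` in a loop, run the handler for `kind`, and then call `finish`. Because handlers only see `(id, kind, payload)`, the same handlers could later be driven by Redis + RQ/Celery without changing their contract.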

## Provider adapters
Adapters are thin, swappable classes behind a common interface per category. Primary choices reflect what we already operate:
- **Domains / DNS**: **OpenProvider is the PRIMARY domain registrar and DNS adapter** (toolcallstore.com itself is registered there, and we hold credentials). Reuse the existing domain-ordering pattern from the affiliate factory where useful. Keep **Porkbun** as a secondary/failover registrar (already integrated elsewhere in the stack). NameSilo / OpenSRS / GoDaddy reseller are optional future adapters, not required for launch.
- **Email**: **Stalwart self-hosted** is the email backend — provision mailboxes via Stalwart's JMAP admin API (the same method used across the other products: authenticate per-domain as the mailbox, never as the management `admin` account). Apply MX/SPF/DKIM/DMARC automatically via the DNS adapter. OpenSRS email / Titan / Zoho are NOT used.
- **LLM**: direct **OpenAI, Anthropic, Gemini** for the premium lane; **OpenRouter / Together** for breadth via a single key; **DeepSeek / Groq / Fireworks / Mistral / Cerebras / Cloudflare Workers AI** for the low-cost lane. Routing policies live in `models.json`. Pull live provider price tables at quote time.
- **Media**: **Leonardo.Ai, Gemini Image, fal.ai, Replicate, Runware, Stability** behind one media adapter with a cost ceiling per request.
- **Payments**: **Stripe** (live mode is already integrated in the ShiftDeck product — reuse the same checkout + webhook pattern) for subscriptions and metered usage; the metering ledger reconciles provider cost vs. customer price vs. margin per receipt.
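
As a rough illustration of "thin, swappable classes behind a common interface", the sketch below defines a registrar interface with `typing.Protocol` and tries adapters in priority order (primary first, then the secondary/failover registrar). Everything here is hypothetical: `RegistrarAdapter`, `FakeRegistrar`, `register_with_failover`, and the prices are placeholders, and a real adapter would call the provider's API rather than return canned data.

```python
from dataclasses import dataclass
from typing import Protocol


class RegistrarError(Exception):
    """Raised by an adapter when the upstream provider call fails."""


@dataclass(frozen=True)
class Registration:
    domain: str
    provider: str
    cost_usd: float


class RegistrarAdapter(Protocol):
    """Common interface for the domains category (an assumed shape)."""

    name: str

    def quote(self, domain: str) -> float: ...

    def register(self, domain: str) -> Registration: ...


class FakeRegistrar:
    """Stand-in adapter used only to exercise the interface."""

    def __init__(self, name, price_usd, healthy=True):
        self.name = name
        self.price_usd = price_usd
        self.healthy = healthy

    def quote(self, domain):
        return self.price_usd

    def register(self, domain):
        if not self.healthy:
            raise RegistrarError(f"{self.name} unavailable")
        return Registration(domain, self.name, self.price_usd)


def register_with_failover(adapters, domain):
    # Try each adapter in priority order; collect errors for the audit log.
    errors = []
    for adapter in adapters:
        try:
            return adapter.register(domain)
        except RegistrarError as exc:
            errors.append(str(exc))
    raise RegistrarError("all registrars failed: " + "; ".join(errors))
```

Calling `register_with_failover([openprovider, porkbun], "example.com")` returns the primary's result when it is healthy and falls through to the secondary otherwise. The same pattern would apply per category (email, LLM, media), each with its own protocol.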

## Required controls
- Tenant- and project-level **spend caps** (checked transactionally before every billable call; on breach return HTTP 402 `spend_cap_exceeded`).
- **Per-tool scopes** enforced on the API key (return 403 `insufficient_scope`; never auto-escalate).
- **Quote-before-execute** requirement for every billable or irreversible action.
- **Human approval** on high-risk actions (domain purchase, DNS write, email provisioning, video generation, public publish, high spend); execution requires a granted `approval_id`.
- **Idempotency keys** (`Idempotency-Key` header) on all register/apply/create/generate/execute calls; replays return the original receipt. Store keys for 24h.
- **Audit log** for every request (actor, scope, cost, decision, request_id).
- **Webhooks** (HMAC-SHA256 signed, `ToolCall-Signature` header) for quote/approval/execution/receipt/media events, with at-least-once delivery and retry.
- **Provider failover** with a cost ceiling: on provider error or degraded status, fall back within the same lane without exceeding `max_cost_usd`.
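
The spend-cap and idempotency controls above can be combined in one transaction, as in this minimal sketch. It assumes amounts are stored as integer cents and uses hypothetical `projects` and `receipts` tables. Tenant-level caps and the 24h key expiry are left out for brevity. `SpendCapExceeded` carries the 402 status so that an API-layer exception handler could translate it into the `spend_cap_exceeded` response.

```python
import sqlite3

# Hypothetical tables; names and columns are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS projects (
    id          TEXT PRIMARY KEY,
    cap_cents   INTEGER NOT NULL,
    spent_cents INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS receipts (
    idempotency_key TEXT PRIMARY KEY,
    project_id      TEXT NOT NULL,
    cost_cents      INTEGER NOT NULL
);
"""


class SpendCapExceeded(Exception):
    """Intended to map to HTTP 402 `spend_cap_exceeded` at the API layer."""

    status_code = 402
    code = "spend_cap_exceeded"


def open_db(path=":memory:"):
    # Autocommit mode; transactions are opened explicitly in charge().
    conn = sqlite3.connect(path, isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def _receipt(key, project_id, cost_cents):
    return {"idempotency_key": key, "project_id": project_id, "cost_cents": cost_cents}


def charge(conn, project_id, idempotency_key, cost_cents):
    """Return (receipt, replayed). Replays return the original receipt."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        prior = conn.execute(
            "SELECT project_id, cost_cents FROM receipts WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        if prior is not None:
            receipt, replayed = _receipt(idempotency_key, *prior), True
        else:
            row = conn.execute(
                "SELECT cap_cents, spent_cents FROM projects WHERE id = ?",
                (project_id,),
            ).fetchone()
            if row is None:
                raise KeyError(project_id)
            cap, spent = row
            if spent + cost_cents > cap:
                raise SpendCapExceeded(project_id)
            conn.execute(
                "UPDATE projects SET spent_cents = spent_cents + ? WHERE id = ?",
                (cost_cents, project_id),
            )
            conn.execute(
                "INSERT INTO receipts (idempotency_key, project_id, cost_cents) "
                "VALUES (?, ?, ?)",
                (idempotency_key, project_id, cost_cents),
            )
            receipt, replayed = _receipt(idempotency_key, project_id, cost_cents), False
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return receipt, replayed
```

Checking the key before the cap means a replayed request never counts against the budget twice, and a rejected call leaves `spent_cents` untouched because the transaction is rolled back.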
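
For the HMAC-SHA256 webhook signature, one possible scheme is sketched below. The header layout `t=<unix-ts>,v1=<hex-digest>`, the signed string `<timestamp>.<body>`, and the 300-second tolerance are assumptions chosen for illustration; the blueprint only fixes the algorithm and the `ToolCall-Signature` header name.

```python
import hashlib
import hmac
import time


def sign_webhook(secret: bytes, body: bytes, timestamp: int) -> str:
    """Build a ToolCall-Signature header value (assumed format)."""
    signed = str(timestamp).encode() + b"." + body
    digest = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


def verify_webhook(secret, body, header, tolerance_s=300, now=None):
    """Return True only if the signature matches and the timestamp is fresh."""
    now = int(time.time()) if now is None else now
    try:
        parts = dict(item.split("=", 1) for item in header.split(","))
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    if abs(now - timestamp) > tolerance_s:
        return False  # too old or too far in the future: possible replay
    expected = sign_webhook(secret, body, timestamp).split("v1=", 1)[1]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Including the timestamp in the signed string limits replay of captured deliveries. Since delivery is at-least-once, receivers would still need to deduplicate on an event identifier.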

## Deployment notes
- Build/run the API: `cd /var/www/toolcallstore/api && python3 -m uvicorn main:app --host 127.0.0.1 --port <port>`, supervised via storectl (do not rely on bare nohup over SSH).
- Register the API port with `storectl assign`, add the Caddy `reverse_proxy` block, reload Caddy, then `storectl start`.
- Keep `/llms.txt`, `/llms-full.txt`, `/openapi.json`, `/catalog.json`, `/pricing.json`, `/models.json`, and `/.well-known/*` publicly served by Caddy (no auth) so agents can discover the store.
