The spend cap is enforced at the proxy on every request, before the request reaches the provider. It is best-effort: each request is checked against a running spend total that refreshes every couple of minutes, so a small overage is possible before the total catches up.
How the cap is checked
The proxy keeps a running per-month total of billed spend for each key and checks it on every request:
read running total
if (total + estimated_next_call) >= cap:
    return 429
else:
    forward to provider
For LLM requests, the proxy estimates the next call's cost from the model and the request's input token count. For voice requests, the estimate is the character count × the per-character rate. The estimate is conservative: it slightly over-counts so that a request that would put the key over the cap is not let through.
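The check and the two estimates can be sketched in a few lines of Python. This is an illustration only, not the proxy's actual code: the rate values, the model name, the SAFETY_MARGIN factor, and every function name here are assumptions made up for the example (the real prices are the ones shown in the reference).

```python
# Hypothetical rates in cents; placeholders, not real prices.
LLM_CENTS_PER_1K_INPUT_TOKENS = {"example-model": 0.3}
VOICE_CENTS_PER_CHAR = 0.0015

# Assumed over-count factor that makes the estimate conservative.
# The documentation does not state how large the over-count is.
SAFETY_MARGIN = 1.05


def estimate_llm_cents(model: str, input_tokens: int) -> float:
    """Estimate an LLM call from the model and its input token count."""
    rate = LLM_CENTS_PER_1K_INPUT_TOKENS[model]
    return input_tokens / 1000 * rate * SAFETY_MARGIN


def estimate_voice_cents(char_count: int) -> float:
    """Estimate a voice call as character count x per-character rate."""
    return char_count * VOICE_CENTS_PER_CHAR * SAFETY_MARGIN


def should_block(running_total_cents: float,
                 estimate_cents: float,
                 cap_cents: float) -> bool:
    """True means answer 429; False means forward to the provider."""
    return running_total_cents + estimate_cents >= cap_cents
```

With these placeholder numbers, a key at 995 cents against a 1000-cent cap is blocked for a call estimated at 5 cents, because the comparison is "greater than or equal to" rather than "greater than".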
What counts toward the cap
Each call adds its billed amount — the per-token (or, for voice, per-character) price shown in the reference — to the key's running total. LLM and voice spend share the same total and the same cap.
After the request lands
After each request, the proxy records the usage. Within a couple of minutes, that usage is aggregated, shown on the dashboard, and added to your invoice. The running per-key total that the proxy enforces against is refreshed on the same cadence.
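To see why a refresh cadence allows a small overage, here is a toy model in Python. It is a sketch under stated assumptions, not the proxy's implementation: the SnapshotEnforcer class, its field names, and the 120-second interval are all hypothetical (the text above only says "a couple of minutes").

```python
class SnapshotEnforcer:
    """Toy model: the cap check reads a total that lags recorded usage."""

    def __init__(self, cap_cents: float, refresh_seconds: float = 120.0):
        self.cap_cents = cap_cents
        self.refresh_seconds = refresh_seconds  # assumed interval
        self.recorded_cents = 0.0   # usage recorded after each request
        self.snapshot_cents = 0.0   # total the cap check actually reads
        self.last_refresh = 0.0

    def handle(self, now: float, cost_cents: float) -> bool:
        """Return True if the request is forwarded, False if it gets a 429."""
        # Refresh the enforced total only when the interval has elapsed.
        if now - self.last_refresh >= self.refresh_seconds:
            self.snapshot_cents = self.recorded_cents
            self.last_refresh = now
        if self.snapshot_cents + cost_cents >= self.cap_cents:
            return False
        self.recorded_cents += cost_cents
        return True


enforcer = SnapshotEnforcer(cap_cents=100.0)
# Twelve 10-cent requests arrive before the first refresh at t=120.
allowed = [enforcer.handle(t, 10.0) for t in range(0, 120, 10)]
blocked_after_refresh = not enforcer.handle(120, 10.0)
```

In this run all twelve requests are forwarded because the enforced snapshot is still zero, so recorded spend ends at 120 cents against a 100-cent cap. The first request after the refresh is blocked. The size of the overage depends on traffic during one refresh interval, which is why it stays bounded.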
Display vs. billing rounding
The dashboard shows precise (sub-cent) spend, while your invoice rounds to whole cents. The two can differ by up to a cent per line per month.
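A small worked example of the difference, using Python's decimal module. The rounding mode is an assumption: the text above says the invoice rounds to whole cents but not how, so ROUND_HALF_UP is used here purely for illustration, and the amount is made up.

```python
from decimal import Decimal, ROUND_HALF_UP


def invoice_cents(precise_cents: Decimal) -> Decimal:
    """Round a sub-cent amount to whole cents (assumed half-up rounding)."""
    return precise_cents.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


dashboard_cents = Decimal("1234.567")          # what the dashboard would show
invoiced_cents = invoice_cents(dashboard_cents)  # whole cents on the invoice
difference = abs(invoiced_cents - dashboard_cents)
```

Here the invoice line comes to 1235 cents and differs from the dashboard figure by 0.433 cents, which is inside the one-cent bound stated above.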
Cap-reached error path
When the running total plus the next call's estimate reaches or exceeds the cap:
- The proxy returns HTTP 429 with body
  { "error": "Monthly spend limit reached", "limit": <cents>, "current": <cents> }.
- The dashboard's per-key card flips to a red banner.
- The proxy keeps returning 429 until either the cap is raised or the billing month rolls over (the new month's spend starts at zero; the cap value itself does not reset).
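The rollover behavior in the last bullet can be sketched by keying spend on the billing month while storing the cap separately. This is a hypothetical model, not the proxy's data layout: the key ID, the dictionaries, the function names, and the assumption that a billing month is a calendar month are all invented for the example. For brevity it compares the stored total alone to the cap and leaves out the per-request estimate.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

caps_cents = {"key_123": 5000}   # the cap value persists across months
spend_cents = defaultdict(int)   # keyed by (key_id, "YYYY-MM")


def billing_month(day: date) -> str:
    # Assumption: billing months are calendar months.
    return day.strftime("%Y-%m")


def cap_reached_body(key_id: str, day: date) -> Optional[dict]:
    """Return the 429 body if the key is at its cap, otherwise None."""
    current = spend_cents[(key_id, billing_month(day))]
    limit = caps_cents[key_id]
    if current >= limit:
        return {"error": "Monthly spend limit reached",
                "limit": limit, "current": current}
    return None


spend_cents[("key_123", "2024-05")] = 5000
may_body = cap_reached_body("key_123", date(2024, 5, 31))
june_body = cap_reached_body("key_123", date(2024, 6, 1))
```

On 31 May the key is blocked and the body reports limit and current as 5000. On 1 June the lookup uses a new month key whose spend starts at zero, so the same unchanged cap no longer blocks.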
Why the cap blocks instead of only warning
A cap that only warns creates surprise bills. CrabGlamp's stance is: when you hit the cap, requests stop — you stay in explicit control of the bill. Because the running total refreshes every couple of minutes rather than per request, a small overage is possible, but the cap stops runaway spend rather than merely warning about it.
What this means for retries
A 429 from a reached cap is not retryable until the cap is raised or the billing month rolls over; unlike an ordinary rate-limit 429, retrying in the meantime will not help. SDKs that auto-retry on 429 should treat this one as terminal (the "Monthly spend limit reached" error in the body distinguishes it). If you use a vendor SDK directly, do not configure it to retry this response with exponential backoff.
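One way a client could tell the two kinds of 429 apart is to inspect the body before deciding to retry. The sketch below is not part of any SDK: the function names are hypothetical, and it assumes the response body is available as a JSON string in the shape shown under "Cap-reached error path".

```python
import json

CAP_ERROR = "Monthly spend limit reached"


def is_spend_cap_429(status: int, body_text: str) -> bool:
    """True only for a 429 whose JSON body carries the cap error message."""
    if status != 429:
        return False
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return False  # not JSON, so not the cap response
    return isinstance(body, dict) and body.get("error") == CAP_ERROR


def should_retry(status: int, body_text: str) -> bool:
    """Retry ordinary rate-limit 429s; treat the spend-cap 429 as terminal."""
    if is_spend_cap_429(status, body_text):
        return False
    return status == 429
```

A retry loop would call should_retry before sleeping and backing off, and surface the cap error to the caller instead of retrying it.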