Spend-cap enforcement

How the CrabGlamp LLM proxy enforces a per-virtual-key spend cap at request time across LLM and voice requests. This page covers how the cap is checked on every request, what counts toward it, the cap-reached error path, and the cadence at which dashboard and invoice totals catch up to live spend. Read this before raising a cap.

The spend cap is enforced at the proxy on every request, before the request reaches the provider. It is best-effort: each request is checked against a running spend total that refreshes every couple of minutes, so a small overage is possible before the total catches up.

How the cap is checked

The proxy keeps a running per-month total of billed spend for each key and checks it on every request:

read running total
if (total + estimated_next_call) >= cap:
  return 429
else:
  forward to provider

For LLM requests, the proxy estimates the next call's cost from the model and the request's input token count. For voice requests, the estimate is character count × per-character rate. The estimate is conservative: it slightly over-counts to avoid letting through a request that would put the key over the cap.
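The check and the two estimates above can be sketched in Python as follows. This is an illustration only, not the proxy's actual code: the function names, the per-token and per-character rates, and the 5% over-count margin are all hypothetical placeholders, since this page does not state the real values.

```python
# Hypothetical placeholder rates in cents; these are NOT CrabGlamp prices.
HYPOTHETICAL_LLM_CENTS_PER_INPUT_TOKEN = {"example-model": 0.0003}
HYPOTHETICAL_VOICE_CENTS_PER_CHAR = 0.0015

# Assumed over-count factor. The page says the estimate is conservative
# but does not say by how much.
ASSUMED_SAFETY_MARGIN = 1.05


def estimate_llm_cents(model: str, input_tokens: int) -> float:
    """Estimate an LLM call's cost from the model and input token count."""
    rate = HYPOTHETICAL_LLM_CENTS_PER_INPUT_TOKEN[model]
    return input_tokens * rate * ASSUMED_SAFETY_MARGIN


def estimate_voice_cents(characters: int) -> float:
    """Estimate a voice call's cost as character count x per-character rate."""
    return characters * HYPOTHETICAL_VOICE_CENTS_PER_CHAR * ASSUMED_SAFETY_MARGIN


def should_block(running_total_cents: float,
                 estimate_cents: float,
                 cap_cents: float) -> bool:
    """Mirror the pseudocode: block when total + estimate reaches the cap."""
    return running_total_cents + estimate_cents >= cap_cents
```

Note that the comparison is `>=`, so a request whose estimate would land the key exactly on the cap is rejected rather than forwarded.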

What counts toward the cap

Each call adds its billed amount — the per-token (or, for voice, per-character) price shown in the reference — to the key's running total. LLM and voice spend share the same total and the same cap.

After the request lands

After each request, the proxy records the usage. Within a couple of minutes, that usage is aggregated, shown on the dashboard, and reported to your invoice. The running per-key total the proxy enforces against is refreshed on the same cadence.
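To see why a periodically refreshed total allows a small overage, here is a toy model in Python. It assumes the proxy checks each request against a snapshot that is updated only at refresh time, and it measures the refresh interval in number of calls rather than minutes to keep the example deterministic. The function and parameter names are hypothetical.

```python
def simulate_spend(cap_cents: int,
                   call_cost_cents: int,
                   calls_per_refresh: int,
                   attempted_calls: int) -> int:
    """Return the true spend reached before the cap starts blocking.

    Toy model: the enforced snapshot only catches up to true spend
    once every `calls_per_refresh` calls.
    """
    snapshot = 0  # total the proxy enforces against (may be stale)
    actual = 0    # true spend, including usage not yet aggregated
    for i in range(attempted_calls):
        if i % calls_per_refresh == 0:
            snapshot = actual  # periodic refresh of the running total
        if snapshot + call_cost_cents >= cap_cents:
            break  # the proxy would return 429 from here on
        actual += call_cost_cents
    return actual
```

With a cap of 100 cents, 10-cent calls, and a refresh every 4 calls, this model lets through 120 cents before blocking; with a refresh on every call it stops at 90. In this model the overage is bounded by what a key can spend inside one refresh window.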

Display vs. billing rounding

The dashboard shows precise (sub-cent) spend, while your invoice rounds to whole cents. The two can differ by up to a cent per line per month.
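As a small worked example of that gap, the Python sketch below rounds one sub-cent line to whole cents and measures the difference. The rounding mode used on invoices is not stated on this page, so the sketch tries several; the sample amount and the function name are made up for illustration.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP


def rounding_gap(precise_cents: Decimal, rounding: str) -> Decimal:
    """Difference between a whole-cent invoice line and the precise figure."""
    invoiced = precise_cents.quantize(Decimal("1"), rounding=rounding)
    return abs(invoiced - precise_cents)


# Hypothetical dashboard figure for one line, in cents.
line = Decimal("1234.567")

# Whatever mode is used, rounding to whole cents moves the line by < 1 cent.
gaps = {mode: rounding_gap(line, mode)
        for mode in (ROUND_HALF_UP, ROUND_CEILING, ROUND_FLOOR)}
```

Using `Decimal` rather than `float` avoids binary floating-point error when comparing sub-cent amounts.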

Cap-reached error path

When the running total plus the estimated cost of the next call reaches the cap:

The proxy returns HTTP 429 with body { "error": "Monthly spend limit reached", "limit": <cents>, "current": <cents> }.
The dashboard's per-key card flips to a red banner.
The proxy keeps returning 429 until either the cap is raised or the billing month rolls over (the new month's spend starts at zero; the cap value itself does not reset).
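For reference, a response with the shape described above could be built like this in Python. The field names and the error string come from this page; the function name is hypothetical, and treating `limit` and `current` as integer cents is an assumption.

```python
import json


def cap_reached_response(limit_cents: int, current_cents: int) -> tuple[int, str]:
    """Return (status, body) in the documented cap-reached shape."""
    body = {
        "error": "Monthly spend limit reached",
        "limit": limit_cents,      # the key's cap, in cents
        "current": current_cents,  # assumed to be the running total, in cents
    }
    return 429, json.dumps(body)
```

Clients can compare `current` with `limit` to report how far over the cap a key is when the error is surfaced.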

Why the cap blocks instead of only warning

A cap that only warns creates surprise bills. CrabGlamp's stance is: when you hit the cap, requests stop — you stay in explicit control of the bill. Because the running total refreshes every couple of minutes rather than per request, a small overage is possible, but the cap stops runaway spend rather than merely warning about it.

What this means for retries

A 429 from a reached cap is not retryable until the cap is raised or the billing month rolls over; unlike an ordinary rate-limit 429, retrying will not help. SDKs that auto-retry on 429 should treat this one as terminal (the "Monthly spend limit reached" error in the body distinguishes it). If you use a vendor SDK directly, do not configure exponential-backoff retries against this 429.
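A client-side check for the terminal case might look like the following Python sketch. It relies only on the status code and the error string documented above; the function names are hypothetical, and the retry policy is deliberately simplified to cover 429 responses only.

```python
import json

SPEND_CAP_ERROR = "Monthly spend limit reached"


def is_spend_cap_429(status: int, body_text: str) -> bool:
    """True when a 429 carries the spend-cap error rather than a rate limit."""
    if status != 429:
        return False
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return False  # unparseable body: not identifiable as a cap error
    return isinstance(body, dict) and body.get("error") == SPEND_CAP_ERROR


def should_retry(status: int, body_text: str) -> bool:
    """Retry ordinary rate-limit 429s; treat a reached cap as terminal."""
    return status == 429 and not is_spend_cap_429(status, body_text)
```

A retry wrapper would call `should_retry` before scheduling a backoff and surface the error to the caller immediately when it returns `False`.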
