Source: https://crabglamp.com/docs/llm-proxy/explanation/spend-cap-enforcement
Last updated: 2026-06-09
Type: explanation

The spend cap is enforced at the proxy on every request, before the request reaches the provider. It is best-effort: each request is checked against a running spend total that refreshes every couple of minutes, so a small overage is possible before the total catches up.

## How the cap is checked

The proxy keeps a running per-month total of billed spend for each key and checks it on every request:

```text
read running total
if (total + estimated_next_call) >= cap:
  return 429
else:
  forward to provider
```

For LLM requests, the proxy estimates the next call's cost from the model and the request's input token count. For voice requests, the estimate is character count × per-character rate. The estimate is conservative: it slightly over-counts, so that a request that would put the key over the cap is not let through.
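The check and the two estimates can be sketched in Python as follows. This is an illustration under stated assumptions, not the proxy's actual code: the prices, the model name, the `SAFETY_MARGIN` factor, and all function names are hypothetical, and how the proxy actually over-counts is not documented here.

```python
from decimal import Decimal

# Hypothetical per-unit prices in cents; real prices are on the reference page.
LLM_CENTS_PER_INPUT_TOKEN = {"example-model": Decimal("0.0003")}
VOICE_CENTS_PER_CHAR = Decimal("0.0015")

# Hypothetical factor that makes every estimate slightly over-count.
SAFETY_MARGIN = Decimal("1.05")


def estimate_llm_cents(model: str, input_tokens: int) -> Decimal:
    """Estimate an LLM call's cost from the model and input token count."""
    return LLM_CENTS_PER_INPUT_TOKEN[model] * input_tokens * SAFETY_MARGIN


def estimate_voice_cents(characters: int) -> Decimal:
    """Estimate a voice call's cost as character count x per-character rate."""
    return VOICE_CENTS_PER_CHAR * characters * SAFETY_MARGIN


def would_reach_cap(running_total_cents: Decimal,
                    estimate_cents: Decimal,
                    cap_cents: Decimal) -> bool:
    """True means the proxy answers 429; False means it forwards the request."""
    return running_total_cents + estimate_cents >= cap_cents
```

Note that the comparison is `>=`, matching the pseudocode above: a request whose estimate would land exactly on the cap is blocked as well.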

## What counts toward the cap

Each call adds its billed amount — the per-token (or, for voice, per-character) price shown in the [reference](/docs/llm-proxy/reference) — to the key's running total. LLM and voice spend share the same total and the same cap.

## After the request lands

After each request, the proxy records the usage. Within a couple of minutes, that usage is aggregated, shown on the dashboard, and added to your invoice. The running per-key total that the proxy enforces against is refreshed on the same cadence.
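To see why a refreshed total allows a small overage, consider a toy model in which the proxy checks each request against a snapshot that is only updated at each refresh. The function name, the fixed per-call cost, and the numbers are all assumptions made for illustration; the real aggregation pipeline is not described here.

```python
def simulate_spend(cap_cents: int, call_cost_cents: int,
                   calls_per_refresh: int, refreshes: int) -> int:
    """Return the true spend after a run of equally priced calls."""
    actual = 0    # true spend, known only after aggregation
    snapshot = 0  # the running total the proxy checks against
    for _ in range(refreshes):
        for _ in range(calls_per_refresh):
            if snapshot + call_cost_cents >= cap_cents:
                continue  # blocked with 429
            actual += call_cost_cents  # forwarded and billed
        snapshot = actual  # the running total catches up
    return actual


# Four 10-cent calls per refresh against a 100-cent cap: the snapshot still
# reads 80 while spend climbs to 120, so the key ends 20 cents over.
overshoot = simulate_spend(100, 10, 4, 5) - 100
```

In this model the overage is bounded by what a key can spend within one refresh interval, which is why it stays small relative to a runaway bill.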

## Display vs. billing rounding

The dashboard shows precise (sub-cent) spend, while your invoice rounds to whole cents. The two can differ by up to a cent per line per month.
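As a rough illustration of the difference, the sketch below rounds a sub-cent dashboard figure to whole cents. The half-up rounding mode and the function name are assumptions; the text above only states that the invoice rounds to whole cents.

```python
from decimal import Decimal, ROUND_HALF_UP


def invoice_cents(precise_cents: Decimal) -> Decimal:
    """Round a precise spend figure to whole cents (rounding mode assumed)."""
    return precise_cents.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


dashboard_cents = Decimal("1234.56")             # what the dashboard shows
invoiced_cents = invoice_cents(dashboard_cents)  # what the invoice line shows
difference = abs(invoiced_cents - dashboard_cents)
```

Here the invoice line shows 1235 cents against a dashboard figure of 1234.56, a difference of 0.44 cents.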

## Cap-reached error path

When the running total plus the estimated cost of the next call reaches the cap:

- The proxy returns HTTP 429 with body `{ "error": "Monthly spend limit reached", "limit": <cents>, "current": <cents> }`.
- The dashboard's per-key card flips to a red banner.
- The proxy keeps returning 429 until either the cap is raised or the billing month rolls over (the new month's spend starts at zero; the cap value itself does not reset).

## Why the cap blocks instead of only warning

A cap that only warns creates surprise bills. CrabGlamp's stance is: when you hit the cap, requests stop — you stay in explicit control of the bill. Because the running total refreshes every couple of minutes rather than per request, a small overage is possible, but the cap stops runaway spend rather than merely warning about it.

## What this means for retries

A 429 from a reached cap is not retryable until the cap is raised or the billing month rolls over. Unlike an ordinary rate-limit 429, retrying will not help. SDKs that auto-retry on 429 should treat this one as terminal (the `Monthly spend limit reached` body distinguishes it). If you use a vendor SDK directly, do not configure exponential-backoff retries against this response.
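A client-side check along these lines can separate the two kinds of 429. It is a minimal sketch: the function names are hypothetical, it relies only on the `error` field of the body shown in the cap-reached section, and it assumes an ordinary rate-limit 429 carries a different `error` value.

```python
import json

CAP_REACHED_MESSAGE = "Monthly spend limit reached"


def is_spend_cap_429(status: int, body: str) -> bool:
    """True when a response is the terminal cap-reached 429."""
    if status != 429:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # not the documented JSON body
    return isinstance(payload, dict) and payload.get("error") == CAP_REACHED_MESSAGE


def should_retry(status: int, body: str) -> bool:
    """Retry ordinary 429s only; a cap-reached 429 is terminal."""
    if is_spend_cap_429(status, body):
        return False
    return status == 429
```

A retry loop would call `should_retry` before each backoff and surface the cap-reached case to the caller immediately, since only raising the cap or the month rollover will clear it.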
