Rate limits
Vectros applies two independent rate limits, each tuned for a different purpose:
- A per-account business limit at the application layer — a fixed number of requests per minute, selected from your plan. This is the limit you pace against day to day, and the one this page is about.
- A per-IP ceiling at the network edge — a high, plan-independent flood-protection limit. A legitimate server-to-server backend never reaches it; it exists to cut off a single-source flood before it reaches application code.
The two are enforced at different layers and return slightly different responses (see Two layers, two responses below).
The per-account limit
- Window. A fixed one-minute window. Your request count resets at the top of each minute (it is not a rolling window).
- Scope. The limit is per account, shared across every key. All of your API keys and scoped tokens draw from the same per-minute budget — issuing more keys does not raise it.
- What counts. Writes, searches, and inference count against the limit. Reads (GET requests) do not: listing and fetching are not throttled by this limit.
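The window and counting rules above can be modeled in a few lines. This is a minimal client-side sketch, not the service's implementation: the function names are hypothetical, the window is assumed to align with clock minutes in epoch time, and treating every non-GET method as counted is an assumption drawn from the "What counts" rule.

```python
import math

WINDOW_SECONDS = 60  # fixed one-minute window, per the description above


def window_start(now: float) -> int:
    """Epoch second at which the current fixed window began (assumes clock-minute alignment)."""
    return int(now // WINDOW_SECONDS) * WINDOW_SECONDS


def seconds_until_reset(now: float) -> int:
    """Whole seconds until the next top-of-minute boundary, never less than 1."""
    return max(1, math.ceil(window_start(now) + WINDOW_SECONDS - now))


def counts_against_limit(method: str) -> bool:
    """Reads (GET) are not counted; every other method is assumed to count."""
    return method.upper() != "GET"
```

A client can use `seconds_until_reset(time.time())` to estimate when its budget refills, though clock skew between client and server makes this an estimate only.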
Limits by plan
| Plan | Requests per minute |
|---|---|
| Free | 60 |
| Starter | 300 |
| Pro | 600 |
| Scale | 1,200 |
| Enterprise | 1,200 (negotiable) |
An unrecognized or unset plan is treated as Free (60/min). Enterprise agreements can raise the ceiling beyond the Scale default by arrangement.
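For client-side pacing it can help to mirror the table and its fallback rule locally. The sketch below assumes lowercase plan identifiers, which are hypothetical; the numbers are the published defaults, and a negotiated Enterprise ceiling would need to be configured separately.

```python
from typing import Optional

# Default requests per minute by plan, copied from the table above.
PLAN_LIMITS = {
    "free": 60,
    "starter": 300,
    "pro": 600,
    "scale": 1200,
    "enterprise": 1200,  # default only; agreements can raise it
}


def requests_per_minute(plan: Optional[str]) -> int:
    """Return the per-minute budget, treating unset or unrecognized plans as Free."""
    if not plan:
        return PLAN_LIMITS["free"]
    return PLAN_LIMITS.get(plan.strip().lower(), PLAN_LIMITS["free"])
```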
When you exceed the limit
A request over the limit returns HTTP 429 with a JSON body and a set of headers that tell
you exactly how to recover:
{
  "message": "Rate limit exceeded. Please try again later.",
  "errorCode": "RATE_LIMITED",
  "requestId": "…"
}
| Header | Meaning |
|---|---|
| Retry-After | Seconds to wait before retrying. After this many seconds the window resets and the request will succeed (assuming you are within budget again). |
| X-RateLimit-Limit | Your plan's maximum requests per minute. |
| X-RateLimit-Remaining | Requests remaining in the current window; always 0 on a 429. |
Retry-After is the value to honor: it is the number of seconds until the current minute window
resets, so a client that sleeps for Retry-After seconds lands in a fresh window with a full
budget.
Handling 429s
- Honor Retry-After. Sleep for the advertised number of seconds, then retry. This is the single most effective thing a client can do.
- Back off exponentially if you retry without reading Retry-After, and add jitter so a fleet of workers doesn't retry in lockstep.
- Pace bulk work. A one-shot backfill (for example, ingesting a large corpus) is exactly the workload that hits this limit. Spread writes across minutes, or cap concurrency so your steady rate stays under your plan's per-minute budget.
- Branch on the 429 status, not the body. The status code is the contract; the errorCode and headers are there to help you recover.
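The first two recommendations and the last can be combined into one retry helper. This is a sketch under stated assumptions: `send` is a hypothetical callable that performs one request and returns a `(status, headers)` pair, and the base delay, cap, and attempt count are illustrative values, not documented ones. It honors Retry-After when the header is present and parseable, and otherwise falls back to exponential backoff with full jitter.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def retry_delay(headers: Mapping[str, str], attempt: int, base: float = 1.0,
                cap: float = 60.0, rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before the next attempt."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # unparseable header: fall through to backoff
    # Full jitter: a random delay in [0, min(cap, base * 2^attempt)).
    return rng() * min(cap, base * 2 ** attempt)


def send_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:  # branch on the status code, not the body
            return status, headers
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt, rng=rng))
    return status, headers  # still 429 after the final attempt
```

The `sleep` and `rng` parameters exist only so the helper can be exercised in tests without real delays.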
The limiter fails open: if the counter store is briefly unavailable, requests are allowed rather than rejected, so a limiter outage never blocks your traffic.
Two layers, two responses
Because the per-account limit and the per-IP edge ceiling live at different layers, a 429 can
come from either:
- Application limit (per-account, per-minute): the body above, with Retry-After and X-RateLimit-* headers and a requestId you can quote to support.
- Edge limit (per-IP, flood protection): a 429 from the firewall with the same errorCode: "RATE_LIMITED" discriminator, but no requestId (the edge has no per-request correlation id). If you contact support about an edge 429, quote the x-amz-cf-id response header instead.
In both cases the status is 429 and the body carries errorCode: "RATE_LIMITED", so a
single check on the status (and, if you want, the errorCode) handles either layer.
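Retry logic does not need to know which layer answered, but a support ticket does. The following sketch picks the identifier to quote; the function name and return shape are hypothetical, and it assumes the response body has already been parsed into a dictionary (or is `None` if parsing failed).

```python
from typing import Mapping, Optional


def classify_429(status: int, headers: Mapping[str, str],
                 body: Optional[dict]) -> Optional[dict]:
    """Return the layer and support identifier for a 429, or None for other statuses."""
    if status != 429:
        return None
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    request_id = body.get("requestId") if isinstance(body, dict) else None
    if request_id:
        # The application layer includes a requestId in the body.
        return {"layer": "application", "support_id": request_id}
    # No requestId: treat it as an edge response and quote x-amz-cf-id instead.
    return {"layer": "edge", "support_id": lowered.get("x-amz-cf-id")}
```

This is for diagnostics only; the decision to retry should still hinge on the status code alone.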
Notes & limits
- The per-minute window is fixed, not rolling: bursting right before a minute boundary and again right after briefly allows up to two windows' worth of requests in a short span. Pace to your average rate, not the instantaneous one.
- Reads are not counted by the per-account limit today. Plan your read-heavy workloads against the edge per-IP ceiling, not this limit.
- The limit is selected from your plan at request time; upgrading your plan raises it immediately on the next request.
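To make the fixed-window note concrete: on the Pro plan (600/min), 600 requests just before a minute boundary and 600 just after would all be accepted, 1,200 in a short span. Pacing to the average rate avoids depending on that. The sketch below spaces counted requests evenly at 60 / limit seconds apart; the `Pacer` class is a hypothetical client-side helper, and in practice leaving some headroom below the plan limit is prudent because other keys on the account share the same budget.

```python
import time
from typing import Callable


def max_boundary_burst(limit_per_minute: int) -> int:
    """Most requests a fixed window can admit across one minute boundary."""
    return 2 * limit_per_minute


class Pacer:
    """Spaces calls evenly so the steady rate stays at or below limit_per_minute."""

    def __init__(self, limit_per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / limit_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()

    def wait(self) -> None:
        """Block until the next send slot, then reserve the one after it."""
        now = self._clock()
        if now < self._next_slot:
            self._sleep(self._next_slot - now)
            now = self._next_slot
        # Schedule from the later of "now" and the slot, so idle time is never
        # banked and spent later as a burst.
        self._next_slot = now + self.interval
```

Calling `pacer.wait()` before each write, search, or inference request keeps a single worker at the paced rate; several workers would need to share one pacer or split the budget between them.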