Rate limits

Vectros applies two independent rate limits, each tuned for a different purpose:

  1. A per-account business limit at the application layer — a fixed number of requests per minute, selected from your plan. This is the limit you pace against day to day, and the one this page is about.
  2. A per-IP ceiling at the network edge — a high, plan-independent flood-protection limit. A legitimate server-to-server backend never reaches it; it exists to cut off a single-source flood before it reaches application code.

The two are enforced at different layers and return slightly different responses (see Two layers, two responses below).


The per-account limit

  • Window. A fixed one-minute window. Your request count resets at the top of each minute (it is not a rolling window).
  • Scope. The limit is per account, shared across every key. All of your API keys and scoped tokens draw from the same per-minute budget — issuing more keys does not raise it.
  • What counts. Writes, searches, and inference count against the limit. Reads (GET requests) do not — listing and fetching are not throttled by this limit.
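To make the window, scope, and counting rules above concrete, here is a minimal sketch of a fixed-window counter in Python. It is an illustration only, not Vectros's implementation: the class name, the in-memory dictionary, the injectable clock, and the choice to identify reads by the GET method are all assumptions.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Illustrative per-account fixed-window counter (hypothetical, not Vectros's code)."""

    def __init__(self, limit_per_minute, clock=time.time):
        self.limit = limit_per_minute
        self.clock = clock
        # Keyed by (account_id, minute index), never by API key,
        # so every key on an account draws from the same budget.
        self.counts = defaultdict(int)

    def allow(self, account_id, method):
        # Assumption: reads are identified by the GET method and are not counted.
        if method.upper() == "GET":
            return True
        window = int(self.clock() // 60)  # changes at the top of each minute
        key = (account_id, window)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

Because the key changes at the top of each minute, the count starts from zero in every new window instead of sliding. A real limiter would also expire old windows; this sketch never does.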

Limits by plan

Plan        Requests per minute
Free        60
Starter     300
Pro         600
Scale       1,200
Enterprise  1,200 (negotiable)

An unrecognized or unset plan is treated as Free (60/min). Enterprise agreements can raise the ceiling beyond the Scale default by arrangement.
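If a client wants to configure its own pacing from the table above, the lookup can be sketched as follows. The lowercase plan identifiers and the function name are assumptions for illustration, and an Enterprise account with a negotiated ceiling would need its own value.

```python
# Requests per minute by plan, copied from the table above.
# The lowercase keys are hypothetical identifiers, not confirmed API values.
PLAN_LIMITS = {
    "free": 60,
    "starter": 300,
    "pro": 600,
    "scale": 1200,
    "enterprise": 1200,  # default; may be higher by arrangement
}


def requests_per_minute(plan):
    """Return the per-minute limit, treating unset or unknown plans as Free."""
    if not plan:
        return PLAN_LIMITS["free"]
    return PLAN_LIMITS.get(plan.strip().lower(), PLAN_LIMITS["free"])
```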


When you exceed the limit

A request over the limit returns HTTP 429 with a JSON body and a set of headers that tell you exactly how to recover:

{
  "message": "Rate limit exceeded. Please try again later.",
  "errorCode": "RATE_LIMITED",
  "requestId": "…"
}

Header                 Meaning
Retry-After            Seconds to wait before retrying. After this many seconds the window resets, and a retry succeeds unless other traffic on your account has already used up the new window's budget.
X-RateLimit-Limit      Your plan's maximum requests per minute.
X-RateLimit-Remaining  Requests remaining in the current window; always 0 on a 429.

Retry-After is the value to honor: it is the number of seconds until the current minute window resets, so a client that sleeps for Retry-After seconds lands in a fresh window with a full budget.
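The arithmetic behind that value can be sketched as below, assuming windows are aligned to wall-clock minutes as described above. The function name is hypothetical, and a client should still prefer the Retry-After header over its own clock, which may be skewed relative to the server's.

```python
import math
import time


def seconds_until_window_reset(now=None):
    """Whole seconds until the next minute boundary (illustrative only)."""
    now = time.time() if now is None else now
    # Round up so that sleeping for the result always lands in the next window.
    return max(1, math.ceil(60 - (now % 60)))
```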

Handling 429s

  • Honor Retry-After. Sleep for the advertised number of seconds, then retry. This is the single most effective thing a client can do.
  • If you cannot read Retry-After, back off exponentially between retries, and add jitter so a fleet of workers doesn't retry in lockstep.
  • Pace bulk work. A one-shot backfill (for example, ingesting a large corpus) is exactly the workload that hits this limit. Spread writes across minutes, or cap concurrency so your steady rate stays under your plan's per-minute budget.
  • Branch on the 429 status, not the body. The status code is the contract; the errorCode and headers are there to help you recover.
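The guidance above can be combined into a small retry loop. This is a sketch under stated assumptions rather than an official client: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, and the attempt count and delay bounds are arbitrary defaults.

```python
import random
import time


def _retry_after_seconds(headers):
    """Parse Retry-After as seconds; return None if absent or malformed."""
    value = {k.lower(): v for k, v in headers.items()}.get("retry-after")
    try:
        return max(0.0, float(value))
    except (TypeError, ValueError):
        return None


def call_with_retries(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        # Branch on the status code only; the body is not inspected.
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break
        delay = _retry_after_seconds(headers)
        if delay is None:
            # No usable header: exponential backoff with full jitter.
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
    return status, headers, body
```

The `sleep` parameter is injectable so the loop can be tested without waiting. The final 429 is returned to the caller rather than raised, which leaves the give-up policy to the calling code.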

The limiter fails open: if the counter store is briefly unavailable, requests are allowed rather than rejected, so a limiter outage never blocks your traffic.
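Failing open can be pictured with the following sketch. It is not Vectros's implementation: `check_budget` is a hypothetical function that consults the counter store, and the assumption is that it raises `ConnectionError` when the store is unreachable.

```python
def allow_or_fail_open(check_budget, account_id):
    """Return the limiter's decision, or allow the request if the store is down.

    check_budget(account_id) -> bool is hypothetical and is assumed to raise
    ConnectionError when the counter store cannot be reached.
    """
    try:
        return check_budget(account_id)
    except ConnectionError:
        # Counter store unreachable: allow the request instead of rejecting it.
        return True
```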


Two layers, two responses

Because the per-account limit and the per-IP edge ceiling live at different layers, a 429 can come from either:

  • Application limit (per-account, per-minute): the body above, with Retry-After and X-RateLimit-* headers and a requestId you can quote to support.
  • Edge limit (per-IP, flood protection): a 429 from the firewall with the same errorCode: "RATE_LIMITED" discriminator, but no requestId (the edge has no per-request correlation id). If you contact support about an edge 429, quote the x-amz-cf-id response header instead.

In both cases the status is 429 and the body carries errorCode: "RATE_LIMITED", so a single check on the status (and, if you want, the errorCode) handles either layer.
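For logging or support tickets, a client may still want to know which layer answered. A minimal sketch, assuming the parsed JSON body is a dict and that the presence of requestId is what separates the two layers; the function name and return shape are hypothetical.

```python
def classify_429(status, body, headers):
    """Return which layer produced a 429 and the id to quote to support."""
    if status != 429:
        return None
    body = body if isinstance(body, dict) else {}
    lowered = {k.lower(): v for k, v in headers.items()}
    if body.get("requestId"):
        # Application limit: the body carries a per-request correlation id.
        return {"layer": "application", "support_id": body["requestId"]}
    # Edge limit: no requestId, so fall back to the x-amz-cf-id header.
    return {"layer": "edge", "support_id": lowered.get("x-amz-cf-id")}
```

Recovery does not depend on the result: either way the client treats the response as a 429 and waits before retrying.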


Notes & limits

  • The per-minute window is fixed, not rolling: a burst just before a minute boundary followed by another just after it can briefly let through up to two windows' worth of requests (for example, 1,200 requests within a few seconds on Pro). Do not plan around that burst; pace to a steady rate that stays under your per-minute budget.
  • Reads are not counted by the per-account limit today. Plan your read-heavy workloads against the edge per-IP ceiling, not this limit.
  • The limit is selected from your plan at request time, so a plan upgrade raises it on the very next request.
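One way to hold a steady rate, for a backfill or any other bulk job, is to space counted requests evenly instead of sending them in bursts. The sketch below is one possible approach, not a Vectros-provided client; the class name and injectable clock are assumptions. Because the budget is shared across every key on the account, each concurrent worker should be given only its share of the per-minute limit.

```python
import time


class Pacer:
    """Space calls evenly so the rate never exceeds requests_per_minute."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()

    def wait(self):
        """Block until the next send slot, then reserve the one after it."""
        delay = self.next_slot - self.clock()
        if delay > 0:
            self.sleep(delay)
        # If the caller fell behind, restart from now rather than bursting to catch up.
        self.next_slot = max(self.clock(), self.next_slot) + self.interval
```

Calling `pacer.wait()` before each write keeps at most one request per interval in flight, so no single minute can exceed the configured rate regardless of where the window boundaries fall.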