Rate limits help manage how often you can call the API within set time intervals, ensuring stable performance, equitable access, and reliable inference.
Rate limit tiers
MARA Cloud offers three rate limit tiers:
| Tier | Activation | Limits |
|---|---|---|
| Free | No payment method linked | RPM + RPD + TPD |
| Developer | Payment method linked | RPM + RPD |
| Enterprise | Contact sales team | Custom |
Where:
- RPM - Requests per Minute
- RPD - Requests per Day
- TPD - Tokens per Day (Free tier only)
How rate limits work
- Rate limits are measured per model.
- Whichever limit you reach first applies: RPM or RPD (or TPD on the Free tier).
- Every API response includes headers indicating your remaining quota and reset times.
- If you exceed a rate limit, the API returns a 429 Too Many Requests error. See Error Codes for details.
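The rules above can be mirrored on the client side with a small per-model counter. This is a rough sketch, not MARA Cloud's actual accounting: the class name `LocalQuota`, the fixed-window bookkeeping, and the window boundaries are assumptions made for illustration, and the response headers returned by the API remain the authoritative source.

```python
import time
from collections import defaultdict


class LocalQuota:
    """Client-side approximation of per-model RPM and RPD limits (hypothetical helper)."""

    def __init__(self, rpm, rpd):
        self.rpm = rpm
        self.rpd = rpd
        # model ID -> [window_start, count]; limits are tracked per model
        self._minute = defaultdict(lambda: [0.0, 0])
        self._day = defaultdict(lambda: [0.0, 0])

    @staticmethod
    def _roll(window, now, length):
        # Start a fresh window once the current one has expired.
        if now - window[0] >= length:
            window[0], window[1] = now, 0

    def try_acquire(self, model, now=None):
        """Return True and count the request if neither limit is exhausted."""
        now = time.time() if now is None else now
        minute, day = self._minute[model], self._day[model]
        self._roll(minute, now, 60)
        self._roll(day, now, 86_400)
        # Whichever limit is reached first blocks the request.
        if minute[1] >= self.rpm or day[1] >= self.rpd:
            return False
        minute[1] += 1
        day[1] += 1
        return True
```

Calling `try_acquire("MiniMax-M2.5")` before each request lets a client skip calls that would likely be rejected, although the server may count windows differently than this sketch does.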
Production model rate limits
Developer Tier
| Model | Model ID | RPM | RPD |
|---|---|---|---|
| GPT OSS 120B | gpt-oss-120B | 150 | 50K |
| DeepSeek V3.1 | DeepSeek-V3.1 | 60 | 15K |
| MiniMax M2.5 | MiniMax-M2.5 | 200 | 50K |
Free Tier
| Model | Model ID | RPM | RPD | TPD |
|---|---|---|---|---|
| GPT OSS 120B | gpt-oss-120B | 20 | 20 | 200K |
| DeepSeek V3.1 | DeepSeek-V3.1 | 20 | 20 | 200K |
| MiniMax M2.5 | MiniMax-M2.5 | 20 | 20 | 200K |
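To make the numbers in these tables concrete, the sketch below computes the request spacing that stays within a model's RPM limit and how long the daily quota would last at that sustained rate. The helper names are hypothetical, and the example assumes that "50K" means 50,000 requests.

```python
def min_interval_seconds(rpm):
    """Smallest average gap between requests that stays within the RPM limit."""
    return 60 / rpm


def minutes_until_daily_cap(rpm, rpd):
    """Minutes of sending at the full RPM rate before the RPD limit is reached."""
    return rpd / rpm


# Developer tier, GPT OSS 120B: 150 RPM, 50K RPD (assumed to mean 50,000)
print(min_interval_seconds(150))             # 0.4
print(minutes_until_daily_cap(150, 50_000))  # about 333.3 minutes (roughly 5.6 hours)

# Free tier: 20 RPM, 20 RPD, so the daily limit is reached within the first minute
print(minutes_until_daily_cap(20, 20))       # 1.0
```

On the Developer tier, a client that runs at the full per-minute rate will exhaust its daily quota well before the day ends, so long-running jobs may want to pace themselves against RPD rather than RPM.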
Rate limit response headers
Every API response includes headers indicating your current rate limit status.
Requests per minute (RPM):
| Header | Description |
|---|---|
| x-ratelimit-limit-requests | Maximum requests allowed per minute |
| x-ratelimit-remaining-requests | Requests remaining in the current minute |
| x-ratelimit-reset-requests | Time (epoch) until the per-minute quota resets |
Requests per day (RPD):
| Header | Description |
|---|---|
| x-ratelimit-limit-requests-day | Maximum requests allowed per day |
| x-ratelimit-remaining-requests-day | Requests remaining in the current day |
| x-ratelimit-reset-requests-day | Time (epoch) until the per-day quota resets |
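As an illustration of how these headers might be consumed, the sketch below reads them from a plain dictionary of response headers and computes how long to wait before the next request. It assumes the reset headers carry an epoch timestamp in seconds, which is one reading of the tables above. The function names and the dictionary input are hypothetical, and how you obtain the headers depends on your HTTP client.

```python
import time


def parse_rate_limit_headers(headers):
    """Extract the documented rate limit headers into a nested dict of numbers."""
    # Header names are matched case-insensitively.
    lowered = {k.lower(): v for k, v in headers.items()}
    result = {}
    for window, suffix in (("minute", ""), ("day", "-day")):
        result[window] = {
            "limit": int(lowered[f"x-ratelimit-limit-requests{suffix}"]),
            "remaining": int(lowered[f"x-ratelimit-remaining-requests{suffix}"]),
            # Assumption: the reset value is an epoch timestamp in seconds.
            "reset": float(lowered[f"x-ratelimit-reset-requests{suffix}"]),
        }
    return result


def seconds_to_wait(status, now=None):
    """Return 0 if quota remains in both windows, else seconds until the latest blocking reset."""
    now = time.time() if now is None else now
    waits = [
        max(0.0, w["reset"] - now)
        for w in status.values()
        if w["remaining"] <= 0
    ]
    return max(waits, default=0.0)
```

A missing header raises `KeyError` in this sketch. Production code would probably fall back to a default delay instead.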
Handling rate limits
If you receive a 429 error, wait for the reset window indicated in the response headers before retrying. A simple retry strategy with exponential backoff:

```python
import time


def make_request_with_retry(client, max_retries=3):
    """Call the chat completions endpoint, retrying on 429 responses."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="MiniMax-M2.5",
                messages=[{"role": "user", "content": "Hello"}],
            )
        except Exception as e:
            # Retry only on rate limit errors, and only while attempts remain.
            if "429" in str(e) and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
            else:
                raise
```

Need higher limits?
Contact our sales team to discuss Enterprise tier plans with custom rate limits.