MARA Cloud Documentation

Rate limits cap how often you can call the API within a set time interval. They keep performance stable, access equitable, and inference reliable.

Rate limit tiers

MARA Cloud offers three rate limit tiers:

Tier	Activation	Limits
Free	No payment method linked	RPM + RPD + TPD
Developer	Payment method linked	RPM + RPD
Enterprise	Contact sales team	Custom

Where:

RPM - Requests per Minute
RPD - Requests per Day
TPD - Tokens per Day (Free tier only)

How rate limits work

Rate limits are measured per model.
A request is rejected as soon as any limit that applies to your tier is exhausted, whichever is reached first: RPM or RPD, plus TPD on the Free tier.
Every API response includes headers indicating your remaining quota and reset times.
If you exceed a rate limit, the API returns a 429 Too Many Requests error. See Error Codes for details.

Production model rate limits

Developer Tier

Model	Model ID	RPM	RPD
GPT OSS 120B	`gpt-oss-120B`	150	50K
DeepSeek V3.1	`DeepSeek-V3.1`	60	15K
MiniMax M2.5	`MiniMax-M2.5`	200	50K

Free Tier

Model	Model ID	RPM	RPD	TPD
GPT OSS 120B	`gpt-oss-120B`	20	20	200K
DeepSeek V3.1	`DeepSeek-V3.1`	20	20	200K
MiniMax M2.5	`MiniMax-M2.5`	20	20	200K
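
To see how the per-minute and per-day limits interact, the following sketch computes how long a client sending at the full RPM rate would take to use up its daily request quota. The numbers are copied from the tables above, reading `K` as 1,000 (an assumption). The `LIMITS` dictionary and the `minutes_to_exhaust_rpd` function are hypothetical names used only for illustration, not part of the MARA Cloud API.

```python
# Illustrative only: limits copied from the tables above, assuming "K" means 1,000.
LIMITS = {
    ("developer", "gpt-oss-120B"): {"rpm": 150, "rpd": 50_000},
    ("developer", "DeepSeek-V3.1"): {"rpm": 60, "rpd": 15_000},
    ("developer", "MiniMax-M2.5"): {"rpm": 200, "rpd": 50_000},
    ("free", "gpt-oss-120B"): {"rpm": 20, "rpd": 20},
    ("free", "DeepSeek-V3.1"): {"rpm": 20, "rpd": 20},
    ("free", "MiniMax-M2.5"): {"rpm": 20, "rpd": 20},
}


def minutes_to_exhaust_rpd(tier: str, model_id: str) -> float:
    """Minutes of sending at the full RPM rate before the RPD quota runs out."""
    limits = LIMITS[(tier, model_id)]
    return limits["rpd"] / limits["rpm"]


if __name__ == "__main__":
    for tier, model_id in LIMITS:
        minutes = minutes_to_exhaust_rpd(tier, model_id)
        print(f"{tier:<10} {model_id:<14} {minutes:8.1f} min")
```

Under these assumptions, a Developer tier client calling `DeepSeek-V3.1` at a sustained 60 RPM would reach the 15K RPD limit after 250 minutes, while on the Free tier the daily limit of 20 requests can be used up within a single minute.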

Rate limit response headers

Every API response includes headers indicating your current rate limit status.

Requests per minute (RPM):

Header	Description
`x-ratelimit-limit-requests`	Maximum requests allowed per minute
`x-ratelimit-remaining-requests`	Requests remaining in the current minute
`x-ratelimit-reset-requests`	Time (epoch timestamp) at which the per-minute quota resets

Requests per day (RPD):

Header	Description
`x-ratelimit-limit-requests-day`	Maximum requests allowed per day
`x-ratelimit-remaining-requests-day`	Requests remaining in the current day
`x-ratelimit-reset-requests-day`	Time (epoch timestamp) at which the per-day quota resets
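
As a rough illustration of how these headers might be consumed, the sketch below reads one window (per-minute or per-day) from a header mapping into a small structure. It assumes the header values are plain numeric strings, which this page does not state explicitly. `WindowStatus` and `read_window` are hypothetical helper names, and how you obtain the header mapping depends on your HTTP client.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class WindowStatus:
    limit: Optional[int]          # maximum requests in the window
    remaining: Optional[int]      # requests left in the current window
    reset_epoch: Optional[float]  # reset time, assumed to be a numeric epoch value


def _to_number(value, cast):
    """Convert a header value, returning None if it is missing or malformed."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return None


def read_window(headers: Mapping[str, str], suffix: str = "") -> WindowStatus:
    """Read one window: suffix "" for per-minute, "-day" for per-day."""
    # Header names are case-insensitive in HTTP, so normalize the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    return WindowStatus(
        limit=_to_number(lowered.get(f"x-ratelimit-limit-requests{suffix}"), int),
        remaining=_to_number(lowered.get(f"x-ratelimit-remaining-requests{suffix}"), int),
        reset_epoch=_to_number(lowered.get(f"x-ratelimit-reset-requests{suffix}"), float),
    )
```

Calling `read_window(headers)` gives the per-minute status and `read_window(headers, "-day")` the per-day status. Missing or malformed values come back as `None` rather than raising, so callers can decide how to handle incomplete headers.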

Handling rate limits

If you receive a 429 error, wait until the reset time indicated in the response headers before retrying. When you cannot read the headers, a simple fallback is to retry with exponential backoff:

python

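import time

def make_request_with_retry(client, max_retries=3):
    """Call the chat completions endpoint, retrying on 429 responses."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="MiniMax-M2.5",
                messages=[{"role": "user", "content": "Hello"}],
            )
        except Exception as e:
            # Use the exception's status_code attribute if the client exposes one
            # (an assumption); otherwise fall back to matching "429" in the message.
            is_rate_limited = getattr(e, "status_code", None) == 429 or "429" in str(e)
            if is_rate_limited and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff: 1 s, 2 s, 4 s, ...
            else:
                raise  # not a rate limit error, or retries exhausted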
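
The retry example above backs off for a fixed, growing delay. As a complement, the sketch below estimates the wait from the rate limit headers instead. It assumes that the reset headers hold Unix epoch timestamps in seconds and that a 429 response carries the same headers as a successful one. `seconds_to_wait` is a hypothetical helper name, and extracting the headers from a failed call depends on the client library you use.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None,
                    fallback: float = 1.0) -> float:
    """Estimate how long to pause after a 429, based on the rate limit headers."""
    now = time.time() if now is None else now
    lowered = {key.lower(): value for key, value in headers.items()}

    def number(name: str) -> Optional[float]:
        # Return None when the header is missing or not numeric.
        try:
            return float(lowered[name])
        except (KeyError, TypeError, ValueError):
            return None

    # If the daily quota is used up, the per-day reset is the one that matters;
    # otherwise assume the per-minute window was exceeded.
    day_remaining = number("x-ratelimit-remaining-requests-day")
    if day_remaining is not None and day_remaining <= 0:
        reset = number("x-ratelimit-reset-requests-day")
    else:
        reset = number("x-ratelimit-reset-requests")

    if reset is None:
        return fallback  # no usable header: let the caller's default apply
    return max(0.0, reset - now)  # assumes reset is epoch seconds
```

One way to combine the two approaches is to sleep for `seconds_to_wait(...)` when headers are available and keep the exponential backoff as the fallback. When the daily quota is exhausted the computed wait can be hours long, so a caller may prefer to surface the error rather than sleep.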

Need higher limits?

Contact our sales team to discuss Enterprise tier plans with custom rate limits.
