Rate Limits

Rate limits cap how often you can call the API within a given time window. They keep performance stable, access equitable, and inference reliable.

Rate limit tiers

MARA Cloud offers three rate limit tiers:

Tier        Activation                Limits
Free        No payment method linked  RPM + RPD + TPD
Developer   Payment method linked     RPM + RPD
Enterprise  Contact sales team        Custom

Where:
  • RPM - Requests per Minute
  • RPD - Requests per Day
  • TPD - Tokens per Day (Free tier only)

How rate limits work

  • Rate limits are measured per model.
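  • Each limit is enforced independently: requests are rejected as soon as you reach any one of the limits that apply to your tier (RPM or RPD, plus TPD on the Free tier).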
  • Every API response includes headers indicating your remaining quota and reset times.
  • If you exceed a rate limit, the API returns a 429 Too Many Requests error. See Error Codes for details.
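
As a rough illustration of the rules above, the sketch below tracks RPM and RPD locally, per model, and reports which limit would be reached first. The class name `LocalQuota` and its fixed-window accounting are assumptions made for illustration; how the service actually measures its windows is not described here, so the response headers remain the source of truth.

```python
import time
from collections import defaultdict


class LocalQuota:
    """Client-side estimate of per-model RPM/RPD usage (hypothetical helper)."""

    def __init__(self, rpm, rpd, clock=time.time):
        self.rpm, self.rpd = rpm, rpd
        self.clock = clock
        # model -> {window kind -> [window index, request count]}
        self.counts = defaultdict(lambda: {"minute": [None, 0], "day": [None, 0]})

    def _count(self, model, kind, seconds):
        """Return the counter for the current window, resetting it on rollover."""
        window = int(self.clock() // seconds)
        slot = self.counts[model][kind]
        if slot[0] != window:
            slot[0], slot[1] = window, 0
        return slot

    def blocked_by(self, model):
        """Return "RPM", "RPD", or None if another request fits within both limits."""
        if self._count(model, "minute", 60)[1] >= self.rpm:
            return "RPM"
        if self._count(model, "day", 86400)[1] >= self.rpd:
            return "RPD"
        return None

    def record(self, model):
        """Count one request against both windows for this model."""
        self._count(model, "minute", 60)[1] += 1
        self._count(model, "day", 86400)[1] += 1
```

A caller would check `blocked_by(model_id)` before each request and call `record(model_id)` after sending it. Because the counters are keyed by model, using up the quota for one model does not affect another.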

Production model rate limits

Developer Tier

Model          Model ID       RPM  RPD
GPT OSS 120B   gpt-oss-120B   150  50K
DeepSeek V3.1  DeepSeek-V3.1  60   15K
MiniMax M2.5   MiniMax-M2.5   200  50K
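
To see how the two limits interact, the short calculation below estimates how long a client sending at the full RPM would take to use up its daily quota. It uses the Developer tier figures from the table above and assumes "K" means thousands; the name `DEVELOPER_LIMITS` is only for illustration.

```python
# Developer tier limits from the table above: model ID -> (RPM, RPD).
# Assumes "50K" means 50,000 requests.
DEVELOPER_LIMITS = {
    "gpt-oss-120B": (150, 50_000),
    "DeepSeek-V3.1": (60, 15_000),
    "MiniMax-M2.5": (200, 50_000),
}


def minutes_to_exhaust_rpd(rpm, rpd):
    """Minutes of sustained traffic at the full RPM before RPD is used up."""
    return rpd / rpm


for model_id, (rpm, rpd) in DEVELOPER_LIMITS.items():
    hours = minutes_to_exhaust_rpd(rpm, rpd) / 60
    print(f"{model_id}: daily quota lasts about {hours:.1f} hours at {rpm} RPM")
```

Under these assumptions, each model's daily quota would be gone in roughly four to six hours of traffic at the full per-minute rate, so a client that runs all day needs to pace itself well below the RPM limit.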

Free Tier

Model          Model ID       RPM  RPD  TPD
GPT OSS 120B   gpt-oss-120B   20   20   200K
DeepSeek V3.1  DeepSeek-V3.1  20   20   200K
MiniMax M2.5   MiniMax-M2.5   20   20   200K

Rate limit response headers

Every API response includes headers indicating your current rate limit status.

Requests per minute (RPM):

Header                              Description
x-ratelimit-limit-requests          Maximum requests allowed per minute
x-ratelimit-remaining-requests      Requests remaining in the current minute
x-ratelimit-reset-requests          Time (epoch) until the per-minute quota resets

Requests per day (RPD):

Header                              Description
x-ratelimit-limit-requests-day      Maximum requests allowed per day
x-ratelimit-remaining-requests-day  Requests remaining in the current day
x-ratelimit-reset-requests-day      Time (epoch) until the per-day quota resets
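
The headers listed above can be gathered into a small structure for logging or throttling decisions. The following is a minimal sketch: `RateLimitStatus` and `read_rate_limits` are hypothetical names, the headers are assumed to arrive as a plain mapping with lowercase keys, and the reset value is kept as a raw number because its exact format is not spelled out here.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: Optional[int]
    remaining: Optional[int]
    reset: Optional[float]  # raw header value; interpretation is left to the caller


def _number(headers: Mapping[str, str], name: str) -> Optional[float]:
    """Return the header as a float, or None if it is missing or not numeric."""
    try:
        return float(headers[name])
    except (KeyError, TypeError, ValueError):
        return None


def read_rate_limits(headers: Mapping[str, str]) -> dict:
    """Collect the per-minute and per-day headers into two RateLimitStatus records."""
    # Lookup is case-sensitive here; lowercase the keys first if your HTTP client does not.
    result = {}
    for window, suffix in (("minute", ""), ("day", "-day")):
        limit = _number(headers, f"x-ratelimit-limit-requests{suffix}")
        remaining = _number(headers, f"x-ratelimit-remaining-requests{suffix}")
        result[window] = RateLimitStatus(
            limit=None if limit is None else int(limit),
            remaining=None if remaining is None else int(remaining),
            reset=_number(headers, f"x-ratelimit-reset-requests{suffix}"),
        )
    return result
```

For example, `read_rate_limits(headers)["day"].remaining` gives the requests left in the current day, or `None` when the header is absent.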

Handling rate limits

If you receive a 429 error, wait for the reset window indicated in the response headers before retrying. When reading the headers is not practical, a simple alternative is to retry with exponential backoff:
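```python
import time


def make_request_with_retry(client, max_retries=3):
    """Call the chat completions endpoint, retrying on 429 responses."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="MiniMax-M2.5",
                messages=[{"role": "user", "content": "Hello"}],
            )
        except Exception as e:
            # Prefer a numeric status code when the exception exposes one;
            # fall back to matching "429" in the error message otherwise.
            is_rate_limited = getattr(e, "status_code", None) == 429 or "429" in str(e)
            if is_rate_limited and attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
            else:
                raise  # not a rate limit error, or out of retries
```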
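
The retry loop above sleeps on a fixed backoff schedule. To wait for the reset window instead, a client could derive the delay from `x-ratelimit-reset-requests`. The sketch below is one way to do that; `seconds_until_reset` is a hypothetical helper, and since the exact format of the reset value is not spelled out here, it assumes that large values are Unix epoch timestamps in seconds and small values are a number of seconds to wait.

```python
import time
from typing import Mapping, Optional

# Assumption: values above this are epoch timestamps, values below are durations in seconds.
EPOCH_THRESHOLD = 1_000_000_000


def seconds_until_reset(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    fallback: float = 1.0,
    max_wait: float = 60.0,
) -> float:
    """Return how long to sleep before retrying after a 429 response."""
    now = time.time() if now is None else now
    try:
        reset = float(headers["x-ratelimit-reset-requests"])
    except (KeyError, TypeError, ValueError):
        return fallback  # header missing or unreadable
    wait = reset - now if reset > EPOCH_THRESHOLD else reset
    # Clamp so a stale or far-future value cannot produce a negative or very long sleep.
    return min(max(wait, 0.0), max_wait)
```

This only looks at the per-minute header. If the daily quota is what ran out, the `-day` header is the relevant one, and the wait may be far longer than a retry loop should block for.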

Need higher limits?

Contact our sales team to discuss Enterprise tier plans with custom rate limits.