MARA Cloud provides access to high-performance open-source models optimized for enterprise inference workloads. Models are deployed across dedicated node clusters for consistent, low-latency performance.
## Production models
Production models are intended for use in production environments and meet our high standards for speed, quality, and reliability.
| Model | Model ID | Price per 1M tokens | Context Window | Hugging Face |
|---|---|---|---|---|
| GPT OSS 120B | gpt-oss-120B | $0.15 input / $0.75 output | 128K | Model card |
| DeepSeek V3.1 | DeepSeek-V3.1 | $0.60 input / $1.70 output | 128K | Model card |
| MiniMax M2.5 | MiniMax-M2.5 | $0.30 input / $1.20 output | 160K | Model card |
Note: Pricing is per 1 million tokens. Input and output tokens are priced separately.
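To estimate the cost of a single request, multiply the input and output token counts by their per-million rates from the table above. A minimal sketch (the example rates are MiniMax M2.5's from the table; `request_cost` is an illustrative helper, not part of the API):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Return cost in USD, with rates quoted per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# MiniMax M2.5: $0.30 input / $1.20 output per 1M tokens
cost = request_cost(2_000, 500, 0.30, 1.20)
print(f"${cost:.4f}")  # 2000 * 0.30/1M + 500 * 1.20/1M = $0.0012
```

Because output tokens are priced higher on every model, response length usually dominates the bill for long generations.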
## Usage example
Use the model ID when making API requests:
```python
from openai import OpenAI  # assumes an OpenAI-compatible client; adjust if MARA Cloud ships its own SDK

# Base URL and API key below are placeholders -- substitute your MARA Cloud values.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

completion = client.chat.completions.create(
    model="MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ],
)
print(completion.choices[0].message.content)
```

## Next steps
- Rate Limits - Understand request and token limits for each model.
- Model Deprecations - Stay informed about upcoming model changes.