MARA Cloud provides access to high-performance open-source models optimized for enterprise inference workloads. Models are deployed across dedicated node clusters for consistent, low-latency performance.
## Production models
Production models are intended for use in production environments and meet our high standards for speed, quality, and reliability.
| Model | Model ID | Price per 1M tokens | Context Window | Hugging Face |
|---|---|---|---|---|
| GPT OSS 120B | gpt-oss-120B | $0.15 input / $0.75 output | 128K | Model card |
| DeepSeek V3.1 | DeepSeek-V3.1 | $0.60 input / $1.70 output | 128K | Model card |
| MiniMax M2.5 | MiniMax-M2.5 | $0.30 input / $1.20 output | 160K | Model card |
Note: Pricing is per 1 million tokens. Input and output tokens are priced separately.
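To estimate the cost of a single request, multiply the input and output token counts by their per-million rates from the table above. A minimal sketch (the example rates are MiniMax M2.5's from the table; `request_cost` is an illustrative helper, not part of the API):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Return cost in USD, with rates quoted per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# MiniMax M2.5: $0.30 input / $1.20 output per 1M tokens
cost = request_cost(2_000, 500, 0.30, 1.20)
print(f"${cost:.4f}")  # 2000 * 0.30/1M + 500 * 1.20/1M = $0.0012
```

Because output tokens are priced higher on every model, response length usually dominates the bill for long generations.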
## Usage example
Use the model ID when making API requests:
```python
from openai import OpenAI  # assumes an OpenAI-compatible client; adjust if MARA Cloud ships its own SDK

# Base URL and API key below are placeholders -- substitute your MARA Cloud values.
client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

completion = client.chat.completions.create(
    model="MiniMax-M2.5",
    messages=[
        {"role": "user", "content": "Hello, world!"}
    ],
)
print(completion.choices[0].message.content)
```

## Next steps
- Rate Limits - Understand request and token limits for each model.
- Model Deprecations - Stay informed about upcoming model changes.