MARA Cloud Documentation

A reference of common terms used throughout the MARA Cloud documentation.

General

API Key A unique secret credential used to authenticate your requests to the MARA Cloud API. Generate and manage your keys from the MARA Cloud portal. See API Keys and URLs for details.

Base URL The root endpoint for all API requests. For MARA Cloud: https://bczfskny6zqw.poweredby.snova.ai/v1

Chat Completion The primary API endpoint for generating text. You send a list of messages (conversation history) and receive a model-generated response. See Chat Completions.

Context Window The maximum number of tokens a model can process in a single request, including both input and output tokens. Check the Model Catalog for each model's context window.

Endpoint A specific URL path that performs a particular action. For example, /v1/chat/completions is the endpoint for generating chat completions.

Inference The process of generating a response from a model based on your input. When you send a prompt to the API, the model runs inference to produce the output.

OpenAI Compatibility MARA Cloud's API is designed to be compatible with OpenAI's client libraries. This means you can use the OpenAI Python or JavaScript SDK by changing the base URL and API key. See OpenAI Compatibility.

Rate Limit A cap on how many requests or tokens you can use within a given time period. Measured in RPM (requests per minute) and RPD (requests per day). See Rate Limits.

Streaming A mode where the API returns tokens progressively as they are generated, rather than waiting for the full response. Useful for real-time applications and chat interfaces.

Token The basic unit of text that models process. A token can be a word, part of a word, or a punctuation mark. As a rough guide, 1 token is approximately 4 characters in English.

Roles

System message A message with the system role that sets the model's behavior and personality for the conversation. For example: "You are a helpful assistant that responds in JSON."

User message A message with the user role that represents the end user's input or question.

Assistant message A message with the assistant role that contains the model's previous response. Used in multi-turn conversations to give the model context of prior exchanges.

Tool message A message with the tool role that contains the result of a function call. Used in function calling workflows to pass execution results back to the model.

Model Parameters

temperature Controls the randomness of the model's output. Lower values (e.g., 0.2) produce more focused and deterministic responses. Higher values (e.g., 0.8) produce more varied and creative output. MARA Cloud supports values between 0 and 1.

max_tokens The maximum number of tokens the model will generate in its response. The combined total of input tokens and output tokens cannot exceed the model's context window.

top_p Also known as nucleus sampling. The model considers only the smallest set of tokens whose cumulative probability exceeds the top_p value. For example, top_p: 0.1 means only tokens in the top 10% probability mass are considered. Range: 0 to 1.

top_k Limits token selection to the K most probable next tokens. For example, top_k: 10 means only the 10 most likely tokens are considered at each step. Range: 1 to 100.

stop One or more sequences that tell the model to stop generating. When the model produces a stop sequence, it ends the response. The stop sequence itself is not included in the output.

stream A boolean parameter. When set to true, the API returns tokens as server-sent events as they are generated. When false (default), the full response is returned at once.

response_format Controls the format of the model's output. Options include plain text (default), JSON Mode (json_object), and JSON Schema (json_schema) for structured outputs. See Function Calling and JSON Mode.

tool_choice Controls whether and how the model uses the tools you've defined. Options: auto (model decides), required (must call a tool), none (no tool calls), or a specific function name. See Function Calling and JSON Mode.

Get Started

Models

Features

Platform

Data Privacy

Resources

Glossary

General

Roles

Model Parameters