Mixtral 8x7B Instruct

An LLM with a mixture of experts architecture for efficient inference on general chat tasks.

Model details

Developed by: Mistral AI
Model family: Mistral
Use case: large language
Version: v1
Size: 8x7B
Optimization: TRT-LLM
Hardware: H100
License: Apache 2.0
Example usage

Mixtral uses the standard Llama-style multi-turn messaging format, with system and user prompts passed as a list of messages.
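As a minimal sketch of what a multi-turn conversation might look like in this format, the snippet below builds a messages list that carries an earlier exchange forward. The `add_turn` helper is hypothetical, and the `assistant` role is an assumption: it follows the common Llama-style chat convention and is not confirmed by this page.

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(messages: List[Message], role: str, content: str) -> List[Message]:
    """Return a new list with one more message appended (hypothetical helper)."""
    # "assistant" is an assumed role name; only "system" and "user" appear on this page
    allowed = {"system", "user", "assistant"}
    if role not in allowed:
        raise ValueError(f"unsupported role: {role!r}")
    return messages + [{"role": role, "content": content}]


messages: List[Message] = []
messages = add_turn(messages, "system", "You are a knowledgeable, engaging geology teacher.")
messages = add_turn(messages, "user", "What is the Mistral wind?")
messages = add_turn(messages, "assistant", "A strong, cold northwesterly wind in southern France.")
messages = add_turn(messages, "user", "How does it affect the climate there?")

# Request body in the same shape as the example input on this page
payload = {"messages": messages, "max_new_tokens": 512, "temperature": 0.9}
```

Each follow-up question is appended after the previous reply, so the model sees the whole conversation on every request.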

Input

import codecs
import os

import requests

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "messages": [
        {"role": "system", "content": "You are a knowledgeable, engaging geology teacher."},
        {"role": "user", "content": "What is the impact of the Mistral wind on the French climate?"},
    ],
    "stream": True,
    "max_new_tokens": 512,
    "temperature": 0.9,
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True,
)

# Print the generated tokens as they get streamed. The incremental decoder
# handles multi-byte UTF-8 characters that are split across chunks.
decoder = codecs.getincrementaldecoder("utf-8")()
for chunk in res.iter_content(chunk_size=None):
    print(decoder.decode(chunk), end="", flush=True)

JSON output

[
    "streaming",
    "output",
    "text"
]
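The output is shown as a series of streamed text fragments. If the full reply is needed as one string instead of being printed as it arrives, the fragments can be decoded and joined. A minimal sketch, assuming the stream is UTF-8 text as in the input example; `collect_stream` is a hypothetical helper, and its input is any iterable of byte chunks, such as the result of `iter_content` on a `requests` response:

```python
import codecs
from typing import Iterable


def collect_stream(chunks: Iterable[bytes]) -> str:
    """Decode streamed byte chunks into a single string (hypothetical helper)."""
    # An incremental decoder buffers incomplete multi-byte sequences between chunks
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises if the stream ended in the middle of a character
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Simulated chunks; the two bytes of "é" (0xC3 0xA9) are deliberately split
fake_chunks = [b"stream", b"ing caf\xc3", b"\xa9 output"]
full_text = collect_stream(fake_chunks)
```

Decoding each chunk independently would fail on the split character here, which is why the sketch keeps one decoder for the whole stream.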
