Phi 3.5 Mini Instruct
A highly capable lightweight LLM from Microsoft
Deploy Phi 3.5 Mini Instruct behind an API endpoint in seconds.
Example usage
Phi 3.5 Mini Instruct accepts the standard set of LLM parameters and supports optional streaming output.
Input
import requests
import os

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
data = {
    "messages": messages,
    "stream": True,
    "temperature": 0.5
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True
)

# Print the generated tokens as they get streamed
for content in res.iter_content():
    print(content.decode("utf-8"), end="", flush=True)
JSON output
[
    "arrrg",
    "me hearty",
    "I",
    "be",
    "doing",
    "..."
]
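Since streaming is optional, the same endpoint can also be called with "stream": False to receive the full completion in one response. A minimal sketch of how the non-streaming payload might be assembled (the build_request helper and the placeholder model id are illustrative, not part of the Baseten API; the exact response body format may vary by deployment):

```python
import os

# Hypothetical helper: builds the URL, headers, and JSON payload for a
# non-streaming call, assuming the same endpoint and auth scheme as above.
def build_request(model_id, api_key, messages, temperature=0.5):
    url = f"https://model-{model_id}.api.baseten.co/production/predict"
    headers = {"Authorization": f"Api-Key {api_key}"}
    data = {
        "messages": messages,
        "stream": False,  # request the complete output in one response
        "temperature": temperature,
    }
    return url, headers, data

url, headers, data = build_request(
    "abc123",  # placeholder model id
    os.environ.get("BASETEN_API_KEY", "demo-key"),
    [{"role": "user", "content": "Who are you?"}],
)

# With stream disabled, requests.post(url, headers=headers, json=data)
# would return the whole completion at once rather than token chunks.
```

Disabling streaming simplifies client code (no chunk iteration) at the cost of waiting for the full generation before any output arrives.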