Llama 3.1 70B Instruct
Formerly SOTA midsize LLM from Meta (try Llama 3.3 70B instead)
Deploy Llama 3.1 70B Instruct behind an API endpoint in seconds.
Example usage
Llama uses a standard multi-turn messaging framework with system and user prompts and has recommended values for temperature, top_p, top_k, and frequency_penalty.
Input
import os

import requests

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

data = {
    "messages": [
        {"role": "system", "content": "You are a knowledgeable, engaging history teacher."},
        {"role": "user", "content": "What was the role of Llamas in the Inca empire?"},
    ],
    "stream": True,
    "max_new_tokens": 512,
    "temperature": 0.6,
    "top_p": 1.0,
    "top_k": 40,
    "frequency_penalty": 1,
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True,
)

# Print the generated tokens as they get streamed.
# Setting the encoding lets requests decode incrementally, so a multi-byte
# UTF-8 character split across chunks does not raise a decode error.
res.encoding = "utf-8"
for content in res.iter_content(chunk_size=None, decode_unicode=True):
    print(content, end="", flush=True)
JSON output
[
  "streaming",
  "output",
  "text"
]
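Because the framework is multi-turn, a follow-up request would typically resend the earlier messages plus the model's reply. The sketch below shows one way this might be done: it joins streamed byte chunks into a single string and appends that reply to the message list. The helper names collect_stream and add_assistant_turn are hypothetical, the "assistant" role name is an assumption based on common chat conventions rather than something stated on this page, and the byte chunks are simulated stand-ins for what res.iter_content(chunk_size=None) might yield.

```python
import codecs
from typing import Iterable


def collect_stream(chunks: Iterable[bytes]) -> str:
    """Join streamed byte chunks into one string.

    An incremental decoder buffers incomplete UTF-8 sequences, so a
    character split across two chunks is still decoded correctly.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush any buffered bytes
    return "".join(parts)


def add_assistant_turn(messages: list[dict], reply: str, follow_up: str) -> list[dict]:
    """Return a new message list with the reply and the next user prompt appended.

    The "assistant" role name is an assumption, not confirmed by the source.
    """
    return messages + [
        {"role": "assistant", "content": reply},
        {"role": "user", "content": follow_up},
    ]


# Simulated chunks (hypothetical data); the "é" is deliberately split in two.
fake_chunks = [b"Llamas carried goods; caf", b"\xc3", b"\xa9 aside."]
reply = collect_stream(fake_chunks)

history = [
    {"role": "system", "content": "You are a knowledgeable, engaging history teacher."},
    {"role": "user", "content": "What was the role of Llamas in the Inca empire?"},
]
next_messages = add_assistant_turn(
    history, reply, "How far could a llama caravan travel in a day?"
)
```

In a real call, next_messages would replace the "messages" value in the request body for the second turn; the original history list is left unmodified so it can be reused or logged.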