Llama 3.1 Nemotron 70B
Llama 3.1 70B fine-tuned by NVIDIA to beat GPT-4o on benchmarks
Deploy Llama 3.1 Nemotron 70B behind an API endpoint in seconds.
Example usage
Input
import os

import requests

# Replace the empty string with your model id below
model_id = ""
# Read the API key from the environment rather than hard-coding it
baseten_api_key = os.environ["BASETEN_API_KEY"]

messages = [
    {"role": "user", "content": "How many r in strawberry?"},
]
data = {
    "messages": messages,
    "stream": True,
    "max_new_tokens": 512,
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True,
)

# Print the generated tokens as they get streamed.
# Decoding whole chunks incrementally avoids splitting multi-byte UTF-8 characters.
res.encoding = "utf-8"
for content in res.iter_content(chunk_size=None, decode_unicode=True):
    print(content, end="", flush=True)
JSON output
[
  "A sweet question!",
  "Let's count the 'R's in 'strawberry':",
  "1. S",
  "2. T",
  "3. R",
  "4. A",
  "5. W",
  "6. B",
  "7. E",
  "8. R",
  "9. R",
  "10. Y",
  "There are **3 'R's** in the word 'strawberry'."
]
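As a quick sanity check on the answer above, the same letter-by-letter count can be reproduced locally in plain Python. This is only an illustrative sketch and is not part of the Baseten API; `count_letter` and `letter_positions` are hypothetical helper names introduced here.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


def letter_positions(word: str, letter: str) -> list[int]:
    """Return the 1-based positions at which the letter appears in the word."""
    target = letter.lower()
    return [i for i, ch in enumerate(word.lower(), start=1) if ch == target]


if __name__ == "__main__":
    print(count_letter("strawberry", "r"))      # 3
    print(letter_positions("strawberry", "r"))  # [3, 8, 9]
```

The positions 3, 8, and 9 match the numbered letters in the model's output, so the streamed answer of three is consistent.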