Llama 3 8B Instruct TRT-LLM
Llama 3 8B optimized for performance
Deploy Llama 3 8B Instruct TRT-LLM behind an API endpoint in seconds.
Example usage: calling Llama 3 8B Instruct from Python
Input
import os
import requests

# Replace the empty string with your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Chat-style prompt: a system message plus the user's turn
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

data = {
    "messages": messages,
    "stream": True,
    "max_new_tokens": 512,
    "temperature": 0.5
}

# Call model endpoint
res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
    stream=True
)

# Print the generated tokens as they get streamed
for content in res.iter_content():
    print(content.decode("utf-8"), end="", flush=True)
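
If you prefer to receive the whole completion at once instead of token by token, the same endpoint can be called without streaming. The sketch below is an assumption-laden variant of the request above: it assumes that setting "stream": False returns the generated text in a single JSON payload, so check your deployment's docs for the exact response shape.

# Non-streaming sketch (assumes "stream": False returns one JSON payload)
data = {
    "messages": messages,
    "stream": False,
    "max_new_tokens": 512,
    "temperature": 0.5
}

res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
)

# Print the complete response body
print(res.json())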
JSON output
[
    "arrrg",
    "me hearty",
    "I",
    "be",
    "doing",
    "..."
]
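
The streamed fragments shown above can also be accumulated client-side to reconstruct the full reply rather than printing them as they arrive. A minimal sketch, assuming each chunk decodes to a text fragment exactly as in the streaming loop above:

# Collect the streamed fragments into one string instead of printing them
chunks = []
for content in res.iter_content():
    chunks.append(content.decode("utf-8"))

full_reply = "".join(chunks)
print(full_reply)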