Whisper V3 Turbo
A low-latency Whisper V3 Turbo deployment optimized for shorter audio clips
Deploy Whisper V3 Turbo behind an API endpoint in seconds.
Example usage
For a full list of supported parameters, see the TensorRT-LLM Engine Builder documentation for Whisper.
Input
import os

import requests

# Model ID for the production deployment
model_id = ""
# Read secrets from environment variables
baseten_api_key = os.environ["BASETEN_API_KEY"]

# Call the model endpoint with a link to the audio file to transcribe
resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={
        "url": "https://www2.cs.uic.edu/~i101/SoundFiles/gettysburg10.wav",
    },
)

print(resp.content.decode("utf-8"))
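The request body above passes only the source url. Additional decoding options, documented in the Engine Builder docs linked above, go in the same JSON payload. As a hedged sketch, a request with a language hint might look like the following; the language field name is an assumption, so verify it against those docs before relying on it.

# Same call as above, with an optional decoding parameter added.
# "language" is an assumed field name; check the Whisper Engine
# Builder docs for the parameters your deployment actually accepts.
resp = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json={
        "url": "https://www2.cs.uic.edu/~i101/SoundFiles/gettysburg10.wav",
        "language": "en",  # hypothetical language hint
    },
)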
JSON output
{
  "segments": [
    {
      "start": 0,
      "end": 9.8,
      "text": "Four score and seven years ago, our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal."
    }
  ],
  "language_code": "en"
}
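Rather than printing the raw bytes, you will usually want to work with the parsed response. A minimal sketch of consuming it, assuming the JSON shape shown above (a segments list plus a language_code field):

# Sketch: turn the segment list into a single transcript string.
# Assumes the response shape shown in the JSON output above.
resp.raise_for_status()  # surface HTTP errors early
data = resp.json()

# Join segment texts into one transcript
transcript = " ".join(seg["text"].strip() for seg in data["segments"])
print(f"[{data['language_code']}] {transcript}")

# Segment timestamps (in seconds) are handy for captions or subtitles
for seg in data["segments"]:
    print(f"{seg['start']:.1f}-{seg['end']:.1f}s: {seg['text']}")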