Orpheus TTS


An incredibly lifelike speech synthesis model by Canopy Labs.

Deploy Orpheus TTS behind an API endpoint in seconds.

Example usage

Orpheus TTS must generate ~83 tokens/second for real-time streaming. This implementation supports streaming and, on an H100 MIG GPU, can produce seven concurrent real-time streams.
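
As a rough sanity check on that figure, the snippet below estimates the real-time factor of a stream from token counts alone. This is a sketch, not part of the API: the realtime_factor helper is hypothetical, and the 83 tokens-per-audio-second constant comes from the statement above.

import sys

TOKENS_PER_AUDIO_SECOND = 83  # tokens needed per second of speech (from above)

def realtime_factor(tokens_generated: int, wall_seconds: float) -> float:
    """Seconds of audio produced per wall-clock second; >= 1.0 keeps up with playback."""
    return (tokens_generated / TOKENS_PER_AUDIO_SECOND) / wall_seconds

# e.g. 1,000 tokens generated in 10 s of wall time:
print(f"{realtime_factor(1_000, 10.0):.2f}x real time")  # -> 1.20x real time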

We'll be releasing a new version of this model soon with even better performance.

Input
import asyncio
import time
import uuid

import aiohttp

BASETEN_HOST = "YOUR_PREDICT_URL"
BASETEN_API_KEY = "YOUR_API_KEY"

base_request_payload = {
    "text": "Hey there! My name is Tara. How are you doing today?",
    "max_tokens": 10000,
    "voice": "tara",
}

async def stream_to_buffer(session, stream_label, payload):
    """Send a streaming request and accumulate the response in a buffer."""
    unique_id = str(uuid.uuid4())
    payload_with_id = payload.copy()
    payload_with_id["request_id"] = unique_id

    print(f"[{stream_label}] Starting request with request_id: {unique_id}")
    start_time = time.time()

    async with session.post(
        BASETEN_HOST,
        json=payload_with_id,
        headers={"Authorization": f"Api-Key {BASETEN_API_KEY}"},
    ) as resp:
        if resp.status != 200:
            print(f"[{stream_label}] Error: Received status code {resp.status}")
            return b""

        # Accumulate the streamed audio, logging per-chunk latency as it arrives.
        buffer = b""
        chunk_count = 0
        async for chunk in resp.content.iter_chunked(4096):
            chunk_count += 1
            execution_time_ms = (time.time() - start_time) * 1000
            print(f"[{stream_label}] Received chunk {chunk_count} ({len(chunk)} bytes) after {execution_time_ms:.2f}ms")
            buffer += chunk

        total_time = time.time() - start_time
        print(f"[{stream_label}] Completed receiving stream. Total size: {len(buffer)} bytes in {total_time:.2f}s")
        return buffer

async def main():
    n = 1  # number of streams to run in parallel
    stream_labels = [f"Stream{chr(65 + i)}" for i in range(n)]  # StreamA, StreamB, ...

    async with aiohttp.ClientSession() as session:
        tasks = [
            stream_to_buffer(session, label, base_request_payload)
            for label in stream_labels
        ]
        results = await asyncio.gather(*tasks)

        for label, buffer in zip(stream_labels, results):
            filename = f"output_{label}.wav"
            with open(filename, "wb") as f:
                f.write(buffer)
            print(f"Saved {filename}")

if __name__ == "__main__":
    asyncio.run(main())
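
Because the endpoint streams audio bytes incrementally, you don't have to buffer the whole file before listening. The sketch below is a hedged variation on the client above that pipes chunks straight into a media player; it assumes ffplay (from FFmpeg) is on your PATH and reuses BASETEN_HOST, BASETEN_API_KEY, and base_request_payload from the script.

import asyncio
import subprocess

import aiohttp

async def stream_to_player(payload: dict) -> None:
    """Pipe streamed audio bytes into ffplay as they arrive."""
    player = subprocess.Popen(
        ["ffplay", "-autoexit", "-nodisp", "-loglevel", "quiet", "-"],
        stdin=subprocess.PIPE,
    )
    async with aiohttp.ClientSession() as session:
        async with session.post(
            BASETEN_HOST,
            json=payload,
            headers={"Authorization": f"Api-Key {BASETEN_API_KEY}"},
        ) as resp:
            resp.raise_for_status()
            async for chunk in resp.content.iter_chunked(4096):
                player.stdin.write(chunk)  # playback starts once the header lands
    player.stdin.close()
    player.wait()

# asyncio.run(stream_to_player(base_request_payload))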

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models.

$ truss init -- example stable-diffusion-2-1-base ./my-sd-truss
$ cd ./my-sd-truss
$ export BASETEN_API_KEY=MdNmOCXc.YBtEZD0WFOYKso2A6NEQkRqTe
$ truss push
INFO  Serializing Stable Diffusion 2.1 truss.
INFO  Making contact with Baseten 👋 👽
INFO  🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G
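
Once the upload finishes and the deployment goes live, the model sits behind a predict endpoint like the one used above. Here's a minimal sketch of calling it; the model URL and API key are placeholders, and the request and response schemas depend on the deployed model's predict function.

import requests

resp = requests.post(
    "https://model-XXXXXX.api.baseten.co/production/predict",  # placeholder URL
    headers={"Authorization": "Api-Key YOUR_API_KEY"},
    json={"prompt": "a photo of an astronaut riding a horse"},  # schema is model-specific
)
resp.raise_for_status()
print(resp.json())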