MARS6

MARS6 is a frontier text-to-speech model by CAMB.AI with voice and prosody cloning capabilities in 10 languages. MARS6 must be licensed for commercial use; we can help!

Deploy MARS6 behind an API endpoint in seconds.

Example usage

This model takes four main inputs:

  1. text: The input text to be spoken

  2. audio_ref: A base64-encoded audio file containing the voice of a single speaker

  3. ref_text: The transcript of what is spoken in audio_ref (optional; can be left as None)

  4. language: The language code for the target language, for example en-us
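The inputs listed above can be assembled into a request body with a small helper. This is a minimal sketch using only the standard library; `build_payload` is a hypothetical name, not part of any MARS6 or Baseten SDK, and it assumes the reference file is already a WAV file (the format the full example on this page sends).

```python
import base64
from pathlib import Path
from typing import Optional


def build_payload(text: str, audio_path: str, language: str,
                  ref_text: Optional[str] = None) -> dict:
    """Assemble the JSON body for a MARS6 request (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Assumption: the file at audio_path is already a WAV file, so its raw
    # bytes can be base64-encoded and sent without re-encoding.
    audio_bytes = Path(audio_path).read_bytes()
    return {
        "text": text,
        "audio_ref": base64.b64encode(audio_bytes).decode("utf-8"),
        "ref_text": ref_text,  # may be None if no transcript is available
        "language": language,  # e.g. "en-us"
    }
```

The returned dictionary can be passed as the `json` argument of an HTTP POST to the prediction endpoint. Optional sampling settings such as `top_p` or `temperature` can be added to it as extra keys.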

The model attempts to generate audio of the input text spoken in the style of the reference audio. The output is a base64-encoded string, so it must be decoded into an audio format before it can be played.

Input
import httpx
import base64
import time
import torchaudio
import IPython.display as ipd
import librosa
import torch
import io

# Step 1: set endpoint url and api key:
url = "<YOUR PREDICTION ENDPOINT>"
headers = {"Authorization": "Api-Key <YOUR API KEY>"}

# Step 2: pick reference audio to clone, encode it as base64
file_path = 'ref_debug.flac' # any valid audio filepath, ideally between 6s-90s.
wav, sr = librosa.load(file_path, sr=None, mono=True, offset=0, duration=5)
io_data = io.BytesIO()
# Re-encode the mono waveform as WAV into an in-memory buffer
torchaudio.save(io_data, torch.from_numpy(wav)[None], sample_rate=sr, format='wav')
io_data.seek(0)
encoded_data = base64.b64encode(io_data.read())
encoded_str = encoded_data.decode("utf-8")
# OPTIONAL: specify the transcript of the reference/prompt (slightly speeds up inference, and may make it sound a bit better).
prompt_txt = None # if unspecified, can be left as None

# Step 3: define other inference settings:
data = {"text": "The quick brown fox jumps over the lazy dog",
        "audio_ref": encoded_str,
        "ref_text": prompt_txt,
        "language": 'en-us', # Target language, in this case English.
        # "top_p": 0.7, # Optionally specify a top_p (default 0.7)
        # "temperature": 0.7, # Optionally specify a temperature (default 0.7)
        # "chunk_length": 200, # Optional text chunk length for splitting long pieces of input text. Default 200
        # "max_new_tokens": 0, # Optional limit on max number of new tokens, default is zero (unlimited)
        # "repetition_penalty": 1.5 # Optional rep penalty, default 1.5
}

# Step 4: Send the POST request (note the first request might be a bit slow, but following requests should be fast)
st = time.time()
response = httpx.post(url, headers=headers, json=data, timeout=120)
et = time.time()

print(f"Runtime: {et-st:.2f} seconds")
# Check the response status code
if response.status_code == 200:
    print("Request successful!")
else:
    print("Request failed with status code", response.status_code, response.content)

# Step 5: decode base64 output back to audio
# torchaudio.load needs a file-like object, so wrap the decoded bytes in BytesIO
wav, sr = torchaudio.load(io.BytesIO(base64.b64decode(response.json()['result'])))
ipd.display(ipd.Audio(wav.numpy(), rate=sr)) # play the result in a notebook
JSON output
{
    "result": "iVBORw0KGgoAAAANSUhEU"
}
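Outside a notebook, the base64 string in "result" can be decoded and written straight to disk. The following is a minimal sketch under one assumption: that "result" holds a complete encoded audio file, which is what the torchaudio.load call in the example above implies. `save_result_audio` is a hypothetical helper name, and since the container format is not stated on this page, any file extension you choose (such as .wav) is a guess.

```python
import base64


def save_result_audio(response_json: dict, out_path: str) -> int:
    """Decode the base64 "result" field and write it to out_path.

    Returns the number of bytes written. Hypothetical helper.
    """
    # Assumption: "result" is a complete encoded audio file, so the decoded
    # bytes can be written to disk unchanged.
    audio_bytes = base64.b64decode(response_json["result"])
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)
```

For example, `save_result_audio(response.json(), "output.wav")` would write the generated speech to a file that a regular audio player can open, provided the assumption about the format holds.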

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models.

$ truss init --example stable-diffusion-2-1-base ./my-sd-truss
$ cd ./my-sd-truss
$ export BASETEN_API_KEY=MdNmOCXc.YBtEZD0WFOYKso2A6NEQkRqTe
$ truss push
INFO Serializing Stable Diffusion 2.1 truss.
INFO Making contact with Baseten 👋 👽
INFO 🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G