XTTS V2
State-of-the-art text-to-speech model
Deploy XTTS V2 behind an API endpoint in seconds.
Example usage
This model requires at least two inputs:
text: The input text that needs to be spoken
speaker_voice: An audio file containing the audio of a single person
The model tries to produce audio of the input text spoken in the speaker's style. The output is a base64-encoded string, so it must be decoded and written to an audio file before it can be played.
import base64
import os

import requests

# Paste your model id below
model_id = ""
baseten_api_key = os.environ["BASETEN_API_KEY"]


def wav_to_base64(file_path):
    # Read the speaker sample and encode it as a base64 string for the JSON payload
    with open(file_path, "rb") as wav_file:
        binary_data = wav_file.read()
    base64_data = base64.b64encode(binary_data)
    base64_string = base64_data.decode("utf-8")
    return base64_string


def base64_to_wav(base64_string, output_file_path):
    # Decode the model's base64 output and write the raw bytes to disk
    binary_data = base64.b64decode(base64_string)
    with open(output_file_path, "wb") as wav_file:
        wav_file.write(binary_data)


voice = wav_to_base64("/path/to/wav/file/voice.wav")
text = "Listen up, people. Life's a wild ride, and sometimes you gotta grab it by the horns and steer it where you want to go. You can't just sit around waiting for things to happen – you gotta make 'em happen. Yeah, it's gonna get tough, but that's when you dig deep, find that inner badass, and come out swinging. Remember, success ain't handed to you on a silver platter; you gotta snatch it like it owes you money. So, lace up your boots, square those shoulders, and let the world know that you're here to play, and you're playing for keeps"
data = {"text": text, "speaker_voice": voice, "language": "en"}

res = requests.post(
    f"https://model-{model_id}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {baseten_api_key}"},
    json=data,
)
# Fail early on HTTP errors instead of trying to decode an error body
res.raise_for_status()

res = res.json()
# base64_to_wav returns nothing; it writes the decoded audio to output.wav
base64_to_wav(res.get("output"), "output.wav")
{
  "output": "iVBORw0KGgoAAAANSUhEU"
}
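A response shaped like the sample above could be handled a little more defensively than in the basic example. The following is a minimal sketch, not Baseten's own client code. It assumes the response JSON carries the audio under an "output" key, as in the sample. The helper names save_predicted_audio and make_silent_wav are hypothetical, and the silent WAV is stand-in data so the snippet runs without calling the endpoint.

```python
import base64
import binascii
import io
import wave


def save_predicted_audio(response_json, output_file_path):
    """Decode the base64 "output" field and write the raw bytes to disk.

    Returns the number of bytes written. Raises ValueError when the field
    is missing or is not valid base64. (Hypothetical helper.)
    """
    encoded = response_json.get("output")
    if not encoded:
        raise ValueError('response has no "output" field')
    try:
        # validate=True rejects characters outside the base64 alphabet
        audio_bytes = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("output is not valid base64") from exc
    with open(output_file_path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return len(audio_bytes)


def make_silent_wav(num_frames=800, sample_rate=8000):
    """Build a tiny mono 16-bit WAV in memory, used only as stand-in data."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in for res.json(); a real response would come from the endpoint
    fake_response = {
        "output": base64.b64encode(make_silent_wav()).decode("utf-8")
    }
    written = save_predicted_audio(fake_response, "output.wav")
    print(f"wrote {written} bytes to output.wav")
```

Raising ValueError for a missing or malformed field keeps a failed request from silently producing an empty or corrupt output.wav.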