Pricing

The fastest, most scalable model inference in our cloud or yours

Inference for custom-built LLMs could be a major headache. Thanks to Baseten, we’re getting cost-effective high-performance model serving without any extra burden on our internal engineering teams. Instead, we get to focus our expertise on creating the best possible domain-specific LLMs for our customers.
Waseem Alshikh, CTO and Co-Founder of Writer

Enterprise

Unparalleled control over AI model deployments

Included in Enterprise:

Meet strict data residency requirements

Align with industry compliance standards

Customize hardware and GPU usage

Use existing cloud commitments

Get a custom quote

Trusted by top engineering and machine learning teams

The best hardware on the market

Only pay for the compute you use, down to the minute

Best-in-class model performance, effortless autoscaling, and blazing fast cold starts mean you get the most out of each GPU, saving money along the way.

Get started for free

Select an instance type

Volume discounts available

T4x4x16

16 GiB VM, 4 vCPUs, 16 GiB RAM

$0.01052

L4x4x16

24 GiB VRAM, 4 vCPUs, 16 GiB RAM

$0.01414

A10Gx4x16

24 GiB VM, 4 vCPUs, 16 GiB RAM

$0.02012

A100x12x144

80 GiB VRAM, 12 vCPUs, 144 GiB RAM

$0.06667

H100x26x234

80 GiB VRAM, 26 vCPUs, 234 GiB RAM

$0.10833

H100MIG:3x13x117

40 GiB VRAM, 13 vCPUs, 117 GiB RAM

$0.06250

B200x28x384

180 GiB VRAM, 28 vCPUs, 384 GiB RAM

$0.16633

1x2

1 vCPU, 2GiB RAM

$0.00058

1x4

1 vCPU, 4GiB RAM

$0.00086

2x8

2 vCPUs, 8GiB RAM

$0.00173

4x16

4 vCPUs, 16GiB RAM

$0.00346

8x32

8 vCPUs, 32GiB RAM

$0.00691

16x64

16 vCPUs, 64GiB RAM

$0.01382

With Baseten, we gained a lot of control over our entire inference pipeline and worked with Baseten’s team to optimize each step.
Sahaj Garg, Co-Founder and CTO

You guys have literally enabled us to hit insane revenue numbers without ever thinking about GPUs and scaling. I know I ask for a lot so I just wanted to let you guys know that I am so blown away by everything Baseten.
Isaiah Granet, CEO and Co-Founder of Bland AI

Rime’s state-of-the-art p99 latency and 100% uptime is driven by our shared laser focus on fundamentals, and we’re excited to push the frontier even further with Baseten.

Lily Clifford, Co-founder and CEO of Rime

Baseten enabled us to achieve something remarkable—delivering real-time AI phone calls with sub-400 millisecond response times. That level of speed set us apart from every competitor.
Isaiah Granet, CEO and Co-Founder of Bland AI

Vincent Wilmet, Co-founder and CTO @ toby

A week ago we reached out with a hefty goal and within days your team helped us get set up and stable for a launch. It went smoothly, entirely thanks to you guys. 100% couldn’t have gone live without the software and hardware support you guys worked through the weekend to get us. The custom optimized Whisper on Baseten’s autoscaling L4 GPUs saved us.
Vincent Wilmet, Co-founder and CTO @ toby

Inference for custom-built LLMs could be a major headache. Thanks to Baseten, we’re getting cost-effective high-performance model serving without any extra burden on our internal engineering teams. Instead, we get to focus our expertise on creating the best possible domain-specific LLMs for our customers.
Waseem Alshikh, CTO and Co-Founder of Writer

You guys have literally enabled us to hit insane revenue numbers without ever thinking about GPUs and scaling. We would be stuck in GPU AWS land without y'all. Truss files are amazing, y'all are on top of it always, and the product is well thought out. I know I ask for a lot so I just wanted to let you guys know that I am so blown away by everything Baseten.
Isaiah Granet

Nikhil Harithas, Senior ML Engineer at Patreon

Baseten gets the stuff we don't want to do out of the way. Now, our small, scrappy team can punch above our weight. It's everything from model serving, to auto-scaling, to iterating on products around those models, so we can deliver value to our customers and not worry about ML infrastructure.
Nikhil Harithas, Senior ML Engineer at Patreon

Faaez Ul Haq, Head of Data Science at Pipe

Baseten provides an easy way for us to host our models, iterate on them, and experiment without worrying about any of the DevOps involved.
Faaez Ul Haq, Head of Data Science at Pipe

Andrew Ward, VP of Machine Learning at Laurel

Baseten has allowed us to efficiently build an entirely new machine learning platform in just 4 months. By not needing to worry about managing our model infrastructure, Laurel has been able to drastically reduce our time to develop new predictive features and maintain more than double the number of models from our old platform.
Andrew Ward, VP of Machine Learning at Laurel

Commonly asked questions

You can deploy open source and custom models on Baseten. Start with an off-the-shelf model from our model library. Or deploy any model using Truss, our open source standard for packaging and serving models built in any framework.
You have control over what GPUs your models use. See our instance type reference for a full list of the GPUs currently available on Baseten. Reach out to us to request additional GPU types.
Yes, new Baseten accounts come with $30 of free credit so that you can start running models for free.
Yes, Baseten is SOC 2 Type II certified and HIPAA compliant. You can read more about our SOC 2 Type II certification here. And you can read more about our HIPAA compliance here.
No, you do not pay for idle time – you only pay for the time your model is using compute on Baseten. This includes the time your model is actively deploying, scaling up or down, or making predictions. And you have full control over how your model scales up or down.
Customer support levels vary by plan. We offer email, in-app chat, Slack, and Zoom support. We also offer dedicated forward-deployed engineering support. Talk to our Sales team to figure out a customer support level that works for your needs.
Yes, discounts on compute can be negotiated as part of our Pro plan. Talk to our Sales team to learn more.
Yes, you can self-host Baseten in order to manage security and use your own cloud commitments. Talk to our Sales team to learn more.

Explore Baseten today

We love partnering with companies developing innovative AI products by providing the most customizable model deployment with the lowest latency.

Get started free Talk to sales