Pricing 

The fastest, most scalable model inference in our cloud or yours

Inference for custom-built LLMs can be a major headache. Thanks to Baseten, we’re getting cost-effective, high-performance model serving without any extra burden on our internal engineering teams. Instead, we get to focus our expertise on creating the best possible domain-specific LLMs for our customers.

Waseem Alshikh, CTO and Co-Founder of Writer
Enterprise

Unparalleled control over AI model deployments

Included in Enterprise:

Meet strict data residency requirements
Align with industry compliance standards
Customize hardware and GPU usage
Use existing cloud commitments
Trusted by top engineering and machine learning teams

The best hardware on the market

Only pay for the compute you use, down to the minute

Best-in-class model performance, effortless autoscaling, and blazing fast cold starts mean you get the most out of each GPU, saving money along the way.

Instance types

GPU instances:
T4x4x16           1 T4 GPU, 16 GiB VRAM      4 vCPUs, 16 GiB RAM     $0.01052/min
L4x4x16           1 L4 GPU, 24 GiB VRAM      4 vCPUs, 16 GiB RAM     $0.01414/min
A10Gx4x16         1 A10G GPU, 24 GiB VRAM    4 vCPUs, 16 GiB RAM     $0.02012/min
A100x12x144       1 A100 GPU, 80 GiB VRAM    12 vCPUs, 144 GiB RAM   $0.10240/min
H100x26x234       1 H100 GPU, 80 GiB VRAM    26 vCPUs, 234 GiB RAM   $0.16640/min
H100MIG:3x13x117  1 H100 MIG, 40 GiB VRAM    13 vCPUs, 117 GiB RAM   $0.08250/min

CPU-only instances:
1x2               1 vCPU, 2 GiB RAM                                  $0.00058/min
1x4               1 vCPU, 4 GiB RAM                                  $0.00080/min
2x8               2 vCPUs, 8 GiB RAM                                 $0.00173/min
4x16              4 vCPUs, 16 GiB RAM                                $0.00346/min
8x32              8 vCPUs, 32 GiB RAM                                $0.00691/min
16x64             16 vCPUs, 64 GiB RAM                               $0.01382/min

Commonly asked questions

  • You can deploy open source and custom models on Baseten. Start with an off-the-shelf model from our model library. Or deploy any model using Truss, our open source standard for packaging and serving models built in any framework.
  • You have control over what GPUs your models use. See our instance type reference for a full list of the GPUs currently available on Baseten. Reach out to us to request additional GPU types.
  • Yes, new Baseten accounts come with $30 of free credit so that you can start running models for free.
  • No, you do not pay for idle time – you only pay for the time your model is using compute on Baseten. This includes the time your model is actively deploying, scaling up or down, or making predictions. And you have full control over how your model scales up or down.
  • Customer support levels vary by plan. We offer email, in-app chat, Slack, and Zoom support. We also offer dedicated forward-deployed engineering support. Talk to our Sales team to figure out a customer support level that works for your needs.
  • Yes, discounts on compute can be negotiated as part of our Pro plan. Talk to our Sales team to learn more.
  • Yes, you can self-host Baseten in order to manage security and use your own cloud commitments. Talk to our Sales team to learn more.

Explore Baseten today

We love partnering with companies building innovative AI products, providing the most customizable model deployment with the lowest latency.