
Introducing Baseten Hybrid: control and flexibility in your cloud and ours

Phil Howes

Mike Bilodeau

Rachel Rapp

A GIF showing Baseten Hybrid: inference is run in your VPC, with optional spillover to Baseten Cloud.

TL;DR

With Baseten Hybrid, you can run sensitive workloads in your own VPC, with flex compute available on demand on Baseten Cloud. As part of our launch of Baseten Hybrid early access, we're offering special pricing from now until the end of October for companies looking to scale on their own VPC: deploy self-hosted workloads at a flat rate of $25K per region, with no additional fees for GPU utilization. Get in touch to learn more!

Baseten Hybrid enables you to run inference in your own VPC, with optional spillover to Baseten Cloud for on-demand flex compute.

We’re excited to introduce early access to Baseten Hybrid, a multi-cloud solution that enables you to run inference in your cloud—with the ability to add flex capacity in ours.

We’ve seen it time and again: AI builders want to self-host some of their workloads for compliance reasons, but when traffic picks up, they need access to additional compute. With Baseten Hybrid, you have complete control over your policies and workloads while gaining true cloud elasticity: seamlessly spill over to Baseten Cloud, with zero engineering effort required.

Whether you’re using your cloud credits or negotiating your GPU commits with different cloud providers, you can bring them to Baseten while gaining the performance, scalability, and security we excel in.

No other solution on the market offers this level of multi-cloud flexibility.

How Baseten Hybrid works

Baseten Hybrid combines our two solutions for ML inference: Self-hosted and Cloud. Keep sensitive workloads securely on your own cloud to meet specific data residency requirements or fully utilize in-house resources. When you need extra compute, effortlessly spill over to ours. Specify regions to reduce latency and meet compliance requirements, with no extra work needed to make your workloads compatible with different platforms.

Many companies label their multi-cloud solutions as "hybrid," but in reality, they just shuffle all of your workloads between public cloud providers. Rather than making you wait for enough compute to run everything on one cloud, we dynamically distribute your workloads across any available capacity. You get true cloud elasticity and agnosticism with zero time investment required; our infra takes care of that.
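To make the spillover idea concrete, here is a minimal sketch of the routing decision in Python. It is an illustration only, not Baseten's implementation or API: the Pool class, the route function, the capacity numbers, and the rule that sensitive requests wait rather than leave the VPC are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A pool of replicas in one environment (hypothetical model, not a Baseten API)."""
    name: str
    capacity: int       # max concurrent requests the pool can serve
    in_flight: int = 0  # requests currently being served

    def has_room(self) -> bool:
        return self.in_flight < self.capacity


def route(sensitive: bool, vpc: Pool, cloud: Pool) -> str:
    """Pick a pool for one request: prefer the VPC, spill over only when allowed."""
    if vpc.has_room():
        vpc.in_flight += 1
        return vpc.name
    # Assumption: sensitive requests never leave the VPC; they wait for capacity.
    if sensitive:
        return "queued"
    if cloud.has_room():
        cloud.in_flight += 1
        return cloud.name
    return "queued"


vpc = Pool("your-vpc", capacity=2)
cloud = Pool("baseten-cloud", capacity=2)

# Three non-sensitive requests, then one sensitive request while the VPC is full.
decisions = [route(False, vpc, cloud) for _ in range(3)]
decisions.append(route(True, vpc, cloud))
print(decisions)  # ['your-vpc', 'your-vpc', 'baseten-cloud', 'queued']
```

In this sketch the first two requests fill the VPC pool, the third spills over, and the sensitive request is held back until VPC capacity frees up.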

Gain full support and management from our team of expert engineers, with zero-downtime deployments and no-disruption infrastructure updates. Eliminate vendor lock-in while utilizing your existing GPU allocation, spend commit, and credits with cloud providers like AWS and GCP.

Hybrid cloud use cases

Gain early access to Baseten Hybrid for:

Meeting specific security needs. Self-host workloads to satisfy strict data residency requirements, IP protection needs, customer requirements, or regulations that mandate that inference run on your cloud.
Multi-cloud elasticity and spillover. Enjoy the same experience as Baseten Cloud, while defining which workloads run where. Use your GPUs whenever compute is available, and spill over to our compute to maintain SLAs during traffic spikes.
Spend down cloud commits and avoid vendor lock-in. Our multi-cloud solution enables enterprises to flexibly use any cloud vendor while utilizing existing GPU allocation, spend commit, and credits.
Blazing-fast inference. Our ML infra is optimized for ultra-low-latency inference with elastic scale. We use the best inference optimization techniques (including TensorRT/TensorRT-LLM and vLLM) and model serving tooling, with instance- and network-level improvements for blazing-fast cold starts and end-to-end latency.
Compound AI systems. Baseten Self-hosted works with any setup, including compound AI. Build modular, scalable, efficient pipelines using multiple models and processing steps, while optimizing GPU utilization and reducing latency.

A GIF showing the flow of data through a modularized workflow built with Baseten Chains

A speech-to-text Chain with custom autoscaling for each step.
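As a rough illustration of what per-step autoscaling means for a pipeline like this, the sketch below sizes each step independently from its own load. It does not use the Baseten Chains API: the Step class, the step names, the concurrency targets, and the load figures are hypothetical values chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    """One stage of a pipeline with its own scaling limits (hypothetical model)."""
    name: str
    target_concurrency: int  # requests one replica should handle at once
    min_replicas: int
    max_replicas: int

    def desired_replicas(self, in_flight: int) -> int:
        # Enough replicas to cover the load, clamped to this step's own bounds.
        wanted = math.ceil(in_flight / self.target_concurrency)
        return max(self.min_replicas, min(self.max_replicas, wanted))


steps = [
    Step("chunk_audio", target_concurrency=8, min_replicas=1, max_replicas=4),
    Step("transcribe", target_concurrency=2, min_replicas=1, max_replicas=16),
    Step("postprocess", target_concurrency=16, min_replicas=1, max_replicas=2),
]

# Assumed number of in-flight requests at each step during a traffic spike.
load = {"chunk_audio": 10, "transcribe": 40, "postprocess": 10}

plan = {s.name: s.desired_replicas(load[s.name]) for s in steps}
print(plan)  # {'chunk_audio': 2, 'transcribe': 16, 'postprocess': 1}
```

Because each step scales on its own, the GPU-heavy transcription step can grow to its cap while the lighter steps stay small, which is the utilization benefit the compound AI use case describes.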

Baseten Hybrid vs. Baseten Self-hosted vs. Baseten Cloud

Baseten Hybrid combines the best of Baseten Self-hosted and Baseten Cloud:

For a more detailed comparison, check out our guide to choosing the right deployment option for your inference workloads.

Our mission is to support companies with highly performant, reliable, and secure AI infrastructure customized to their needs. We built Baseten Hybrid to enable you to do inference in your own cloud and meet stringent security requirements—without compromising on performance. Future-proof your infrastructure against traffic bursts and maintain SLAs without additional capital expenditure.

If we can help you meet your security and compliance needs while providing necessary resources, scalability, and performance, get in touch!
