Create custom environments for deployments on Baseten
Test and deploy ML models reliably with production-ready custom environments, persistent endpoints, and seamless CI/CD.
Introducing canary deployments on Baseten
Our canary deployments feature lets you roll out new model deployments with minimal risk to your end-user experience.
Using asynchronous inference in production
Learn how async inference works, how it protects against common inference failures, how it's applied in common use cases, and more.
Baseten Chains explained: building multi-component AI workflows at scale
A delightful developer experience for building and deploying compound ML inference workflows.
New in May 2024
AI events, multicluster model serving architecture, tokenizer efficiency, and forward-deployed engineering.
New in April 2024
Use four new best-in-class LLMs, stream synthesized speech with XTTS, and deploy models with CI/CD.
New in March 2024
Fast Mistral 7B, fractional H100 GPUs, FP8 quantization, and API endpoints for model management.
New in February 2024
3x throughput with H100 GPUs, 40% lower SDXL latency with TensorRT, and multimodal open source models.
New in January 2024
A library for open source models, general availability for L4 GPUs, and performance benchmarking for ML inference.
New in December 2023
Faster Mixtral inference, Playground v2 image generation, and ComfyUI pipelines as API endpoints.