See our latest feature releases, product improvements and bug fixes

Oct 31, 2024
Changes to instance type management
As part of ongoing improvements to Baseten’s infrastructure platform, we’re working on giving you more flexibility in how resources are provisioned for each model deployment. In the interim, we’re...
Oct 15, 2024
We're excited to introduce canary deployments on Baseten, designed to phase in new deployments with minimal impact on production latency and uptime. When enabled for a model, Baseten gradually shifts...
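The traffic-shifting idea behind a canary rollout can be sketched in a few lines. This is an illustrative model of weighted routing, not Baseten's implementation; the function name and the 10% fraction are placeholders.

```python
import random

# Minimal sketch of canary traffic shifting: a configurable fraction of
# incoming requests is routed to the new (canary) deployment, the rest to
# the current production deployment.
def route_request(canary_fraction: float, rng: random.Random) -> str:
    """Route one request to 'canary' or 'production' by weighted coin flip."""
    return "canary" if rng.random() < canary_fraction else "production"

rng = random.Random(0)
# With a 10% canary fraction, roughly 1 in 10 requests hits the canary.
sample = [route_request(0.1, rng) for _ in range(1000)]
print(sample.count("canary"))
```

Gradually raising the fraction toward 1.0 completes the rollout; dropping it back to 0.0 is an instant rollback.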
Oct 11, 2024
Today we’re excited to introduce custom environments to help manage your model’s release cycles. Environments provide a way to ensure quality, stability, and scalability before your model reaches end...
Oct 9, 2024
We've introduced three new request metrics to enhance model monitoring. You can now view percentiles and averages for the following:
- Request size: Tracks the distribution of request sizes, serving...
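The aggregates these metrics expose can be illustrated with the standard library. The byte counts below are made-up sample data, not real traffic, and the percentile math here is Python's `statistics.quantiles`, not Baseten's metrics pipeline.

```python
import statistics

# Illustrative computation of an average and percentiles over a sample of
# request sizes (bytes; values are fabricated for the example).
request_sizes = [120, 340, 560, 780, 900, 1500, 2200, 3100, 4800, 9000]

average = statistics.mean(request_sizes)
cuts = statistics.quantiles(request_sizes, n=100)  # 99 percentile cut points
p50, p90 = cuts[49], cuts[89]
print(average, p50, p90)
```

Percentiles matter here because a handful of very large requests (like the 9000-byte outlier above) can dominate an average while leaving the median untouched.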
Oct 2, 2024
You can now easily export model inference metrics to your favorite observability platforms, including Prometheus, Datadog, Grafana Cloud, and New Relic! Seamlessly sync inference request counts,...
Sep 26, 2024
Today we introduced early access to Baseten Hybrid , a multi-cloud solution that enables you to self-host inference with seamless flex capacity on Baseten Cloud. Check out our announcement blog to...
Sep 16, 2024
Our OpenAI Bridge is now compatible with vLLM models out of the box! Deploy your vLLM model with Truss and let the docs guide you to an easy integration using the OpenAI completions SDK.
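As a rough sketch of what the bridge handles: the OpenAI completions SDK sends a JSON body like the one assembled below, which the bridge translates for the deployed vLLM model. The model ID, prompt, and helper function are placeholders for illustration, not values from the docs.

```python
import json

# Assemble an OpenAI-style completions payload of the kind the OpenAI SDK
# would send to an OpenAI-compatible endpoint such as the bridge.
def build_completion_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

payload = build_completion_request("my-vllm-model", "Say hello")
print(json.dumps(payload))
```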
Sep 15, 2024
As of Truss version 0.9.34, you can now promote Chains to a production environment, bringing the same deployment workflow used for Truss Models to Chains. To promote a Chain, simply use the --promote...
Sep 13, 2024
We improved truss watch for more reliable live reloads, so you can test changes on production hardware in seconds without manually pushing via truss push or creating new deployments each time. Adding...
Sep 12, 2024
Models deployed with the TensorRT-LLM Engine Builder now support function calling (aka tool use) and structured output (aka JSON mode). Learn more:
- Launch announcement blog post
- Engineering deep dive...
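The two capabilities map to familiar OpenAI-style request shapes, sketched below. The tool name, JSON schema, and model ID are placeholders for illustration; see the linked posts for Baseten's actual interface.

```python
import json

# Function calling: declare a tool the model may choose to invoke,
# described by a JSON Schema for its parameters.
tool_call_request = {
    "model": "my-engine-builder-model",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

# Structured output (JSON mode): constrain the response to valid JSON.
json_mode_request = {
    "model": "my-engine-builder-model",
    "messages": [{"role": "user", "content": "List three colors as JSON."}],
    "response_format": {"type": "json_object"},
}

print(len(json.dumps(tool_call_request)))
```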