Baseten Blog | Page 3
Introducing Baseten Self-hosted
Gain granular control over data locality, align with strict compliance standards, meet specific performance requirements, and more with Baseten Self-hosted.
Compound AI systems explained
Compound AI systems combine multiple models and processing steps, and are forming the next generation of AI products.
Introducing automatic LLM optimization with TensorRT-LLM Engine Builder
The TensorRT-LLM Engine Builder empowers developers to deploy extremely efficient and performant inference servers for open source and fine-tuned LLMs.
Deploying custom ComfyUI workflows as APIs
Easily package your ComfyUI workflow to use any custom node or model checkpoint.
Ten reasons to join Baseten
Baseten is a Series B startup building infrastructure for AI. We're actively hiring for many roles — here are ten reasons to join the Baseten team.
How to serve 10,000 fine-tuned LLMs from a single GPU
LoRA swapping with TRT-LLM supports in-flight batching and loads LoRA weights in 1-2 ms, enabling each request to hit a different fine-tune.
Using asynchronous inference in production
Learn how async inference works, how it protects against common inference failures, where it applies in common use cases, and more.
Baseten Chains explained: building multi-component AI workflows at scale
A delightful developer experience for building and deploying compound ML inference workflows.
Introducing Baseten Chains
Learn about Baseten's new Chains framework for deploying complex ML inference workflows across compound AI systems using multiple models and components.