Machine learning infrastructure that just works
Baseten provides all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently.
Stable Diffusion XL's performance on latency, throughput, and cost depends on factors ranging from hardware to model variant to inference configuration, so benchmarks must account for all three.
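As a minimal sketch of what such a latency benchmark might look like, the snippet below times SDXL image generation with the diffusers library on a GPU. The model ID, step count, and run count are illustrative; real benchmarks would also vary hardware and inference settings.

```python
# Illustrative latency benchmark for SDXL using diffusers (requires a CUDA GPU).
import time
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a photo of an astronaut riding a horse"
latencies = []
for _ in range(5):
    start = time.perf_counter()
    pipe(prompt, num_inference_steps=30)  # inference config directly affects latency
    latencies.append(time.perf_counter() - start)

print(f"mean latency: {sum(latencies) / len(latencies):.2f}s per image")
```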
This guide helps you interpret LLM performance metrics to make direct comparisons on latency, throughput, and cost.
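To make those comparisons concrete, here is a small worked example of how common LLM metrics relate to each other. Every number below is made up purely for illustration.

```python
# Illustrative arithmetic only: relating latency, throughput, and cost for an LLM.
time_to_first_token = 0.25   # seconds until the first token arrives (made up)
total_time = 5.0             # seconds for the full response (made up)
output_tokens = 475          # tokens generated (made up)

# Perceived generation speed, excluding the initial wait
tps = output_tokens / (total_time - time_to_first_token)

# Cost per million output tokens at a hypothetical $2.00/hr instance
# serving one request at a time at this throughput
cost_per_hour = 2.00
tokens_per_hour = tps * 3600
cost_per_million_tokens = cost_per_hour / tokens_per_hour * 1_000_000

print(f"{tps:.1f} tokens/sec, ${cost_per_million_tokens:.2f} per 1M output tokens")
```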
Mixtral 8x7B's architecture gives it faster inference than the similarly powerful Llama 2 70B, and we can make it faster still with TensorRT-LLM and int8 quantization.
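For intuition, the toy snippet below shows the basic idea behind int8 weight quantization: storing weights as int8 values plus a scale factor roughly halves memory versus fp16 at a small accuracy cost. This is not TensorRT-LLM's implementation, just an illustration of the underlying technique.

```python
# Toy per-tensor symmetric int8 quantization of a stand-in weight matrix.
import torch

w = torch.randn(4096, 4096, dtype=torch.float16)    # stand-in fp16 weight matrix

scale = w.abs().max() / 127.0                        # per-tensor symmetric scale
w_int8 = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)

w_dequant = w_int8.to(torch.float16) * scale         # reconstructed at inference time
max_err = (w - w_dequant).abs().max().item()

print(f"fp16 bytes: {w.numel() * 2}, int8 bytes: {w_int8.numel()}, max abs error: {max_err:.4f}")
```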
Playground v2, a new text-to-image model, matches SDXL's speed and quality while offering a distinctive AAA game-style aesthetic. The right choice depends on your use case and artistic taste.
This guide covers deploying ComfyUI image generation pipelines behind an API for integration into your applications, using Truss for packaging and production deployment; a client-side sketch follows below.
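As a rough sketch of what calling such a deployment could look like, the snippet below posts a prompt to a hosted model endpoint and saves the returned image. The endpoint URL, auth header, and response shape are assumptions for illustration; your deployment's interface may differ.

```python
# Hypothetical client for a ComfyUI workflow deployed behind an API.
import base64
import requests

resp = requests.post(
    "https://model-XXXXXXX.api.baseten.co/production/predict",  # placeholder model ID
    headers={"Authorization": "Api-Key YOUR_API_KEY"},           # placeholder API key
    json={"prompt": "a watercolor painting of a lighthouse at dusk"},
)
resp.raise_for_status()

# Assumes the deployment returns a base64-encoded image under "result"
image_b64 = resp.json()["result"]
with open("output.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```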
The A10, an Ampere-series GPU, excels at tasks like running 7B-parameter LLMs. AWS's A10G variant, with similar GPU memory and bandwidth, is mostly interchangeable.
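A quick back-of-the-envelope calculation shows why a 7B-parameter model fits comfortably: fp16 weights take 2 bytes per parameter, and both the A10 and A10G offer 24 GB of GPU memory.

```python
# Why a 7B-parameter LLM fits on an A10/A10G (24 GB of GPU memory).
params = 7_000_000_000
bytes_per_param_fp16 = 2

weights_gb = params * bytes_per_param_fp16 / 1e9   # ~14 GB of fp16 weights
gpu_memory_gb = 24

print(f"weights: ~{weights_gb:.0f} GB, "
      f"headroom for KV cache and activations: ~{gpu_memory_gb - weights_gb:.0f} GB")
```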
Use the ChatCompletions API to test open-source LLMs like Llama in your AI app with just three minor code changes.
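The three changes are typically the base URL, the API key, and the model name passed to the OpenAI client. The sketch below assumes a ChatCompletions-compatible endpoint; the URL, key, and model name are placeholders.

```python
# Pointing an OpenAI-client app at an open-source LLM behind a
# ChatCompletions-compatible endpoint: three small changes.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-llm-endpoint.example.com/v1",  # change 1: base URL (placeholder)
    api_key="YOUR_PROVIDER_API_KEY",                       # change 2: API key (placeholder)
)

response = client.chat.completions.create(
    model="llama-3-8b-instruct",                           # change 3: model name (placeholder)
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```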
Building on open-source models gives you access to a wide range of capabilities that you wouldn't get from a black-box endpoint provider.