Baseten Blog | Page 4
Comparing few-step image generation models
Few-step image generation models like LCMs, SDXL Turbo, and SDXL Lightning can generate images fast, but there's a tradeoff when it comes to speed vs quality.
How latent consistency models work
Latent Consistency Models (LCMs) improve on generative AI methods to produce high-quality images in just 2-4 steps, taking less than a second for inference.
New in May 2024
AI events, multicluster model serving architecture, tokenizer efficiency, and forward-deployed engineering
What I learned as a forward-deployed engineer working at an AI startup
My first six months at Baseten exposed me to a huge range of exciting engineering challenges as I learned how to make an impact as a forward-deployed engineer.
Control plane vs workload plane in model serving infrastructure
A separation of concerns between a control plane and workload planes enables multi-cloud, multi-region model serving and self-hosted inference.
Comparing tokens per second across LLMs
To accurately compare tokens per second between different large language models, we need to adjust for tokenizer efficiency.
New in April 2024
Use four new best in class LLMs, stream synthesized speech with XTTS, and deploy models with CI/CD
CI/CD for AI model deployments
In this article, we outline a continuous integration and continuous deployment (CI/CD) pipeline for using AI models in production.
Streaming real-time text to speech with XTTS V2
In this tutorial, we'll build a streaming endpoint for the XTTS V2 text to speech model with real-time narration and 200 ms time to first chunk.