Michael Feil, Model Performance Engineer

Model performance

How we built high-throughput embedding, reranker, and classifier inference with TensorRT-LLM

Discover how we optimized embedding, reranker, and classifier inference using TensorRT-LLM, doubling throughput and achieving ultra-low latency at scale.

Product

Introducing Baseten Embeddings Inference: The fastest embeddings solution available

Baseten Embeddings Inference (BEI) delivers 2x higher throughput and 10% lower latency for production embedding, reranker, and classification models at scale.
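To make the teaser concrete, here is a minimal sketch of what querying a deployed embedding model can look like from the client side, assuming an OpenAI-compatible /v1/embeddings route. The base URL, model name, and API key below are placeholders for illustration, not Baseten's actual values.

```python
# Minimal sketch: querying a deployed embedding model from Python.
# Assumes an OpenAI-compatible embeddings route; the URL, model name,
# and API key are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",  # placeholder credential
    base_url="https://model-xxxxxx.api.baseten.co/sync/v1",  # hypothetical endpoint
)

response = client.embeddings.create(
    model="my-embedding-model",  # hypothetical deployed model name
    input=["What is TensorRT-LLM?", "Baseten Embeddings Inference"],
)

for item in response.data:
    print(len(item.embedding))  # dimensionality of each embedding vector
```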

Machine learning infrastructure that just works

Baseten provides all the infrastructure you need to deploy and serve ML models performantly, scalably, and cost-efficiently.