Model performance
How we built high-throughput embedding, reranker, and classifier inference with TensorRT-LLM
Discover how we optimized embedding, reranker, and classifier inference using TensorRT-LLM, doubling throughput and achieving ultra-low latency at scale.
Product
Introducing Baseten Embeddings Inference: The fastest embeddings solution available
Baseten Embeddings Inference (BEI) delivers 2x higher throughput and 10% lower latency for production embedding, reranker, and classification models at scale.