Baseten Blog | Page 1

Topics

Latest Model performance Hacks & projects GPU guides ML models Glossary Community Product News

Accelerating inference with NVIDIA B200 GPUs

NVIDIA B200 GPUs improve cost, throughput, and latency for use cases like code generation, search, reasoning, agents, and more.

Philip Kiely

High-performance AI inference with NVIDIA B200 GPUs

Community

Building performant embedding workflows with Chroma and Baseten

Integrate Chroma’s open-source vector database with Baseten’s fast inference engine for efficient, real-time embedding inference in your AI-native apps.

Philip Kiely

Build performant embedding workflows with Chroma and Baseten

ML models

The best open-source embedding models

Discover the best open-source embedding models for search, RAG, and recommendations—curated picks for performance, speed, and cost-efficiency.

Philip Kiely

Model performance

How we built high-throughput embedding, reranker, and classifier inference with TensorRT-LLM

Discover how we optimized embedding, reranker, and classifier inference using TensorRT-LLM, doubling throughput and achieving ultra-low latency at scale.

Michael Feil

1 other

A library -- it's classical with dark wooden shelves and glowing golden lights and grand architectural design. However, the books -- which fly on and off the shelves themselves -- are ghostly glowing blue holograms.

Product

Introducing Baseten Embeddings Inference: The fastest embeddings solution available

Baseten Embeddings Inference (BEI) delivers 2x higher throughput and 10% lower latency for production embedding, reranker and classification models at scale.