BEI models
BEI is Baseten's solution for production-grade deployment of (text) embedding, reranking, and prediction models via TensorRT-LLM. With BEI you get the following benefits:
- Lowest-latency inference of any embedding solution (vLLM, SGLang, Infinity, TEI, Ollama)
- Highest-throughput inference of any embedding solution (vLLM, SGLang, Infinity, TEI, Ollama), thanks to XQA kernels, FP8 quantization, and dynamic batching
- High parallelism: up to 1400 client embeddings per second (see the request sketch after this list)
- Cached model weights for fast vertical scaling and high availability, with no Hugging Face Hub dependency at runtime
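
To make the parallelism point concrete, here is a minimal sketch of sending concurrent embedding requests to a BEI deployment. It assumes the deployment exposes an OpenAI-compatible `/v1/embeddings` endpoint; the base URL, model name, and the `BASETEN_API_KEY` environment variable are placeholders to replace with your own deployment's values.

```python
# Minimal sketch: concurrent embedding requests against a BEI deployment,
# assuming an OpenAI-compatible endpoint. URL and model name are placeholders.
import asyncio
import os

from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key=os.environ["BASETEN_API_KEY"],
    # Hypothetical deployment URL; substitute your model's actual endpoint.
    base_url="https://model-xxxxxxxx.api.baseten.co/environments/production/sync/v1",
)


async def embed(texts: list[str]) -> list[list[float]]:
    """Embed one batch of texts in a single request."""
    response = await client.embeddings.create(
        model="my-embedding-model",  # placeholder model name
        input=texts,
    )
    return [item.embedding for item in response.data]


async def main() -> None:
    # Fire many small batches concurrently; the server-side dynamic batching
    # mentioned above is what makes this client-side parallelism pay off.
    batches = [[f"document {i}-{j}" for j in range(8)] for i in range(16)]
    results = await asyncio.gather(*(embed(batch) for batch in batches))
    print(sum(len(r) for r in results), "embeddings computed")


if __name__ == "__main__":
    asyncio.run(main())
```

Because requests are batched dynamically on the server, keeping many small requests in flight (as `asyncio.gather` does here) is how a single client approaches the throughput figures listed above.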