5 BEI models

  • Allen AI: Tulu 3 8B Reward | Text embeddings | V3, Reward, BEI, H100 MIG 40GB

  • BAAI: BGE Reranker M3 | Text embeddings | BEI, H100

  • BAAI: BGE Embedding ICL | Text embeddings | BEI, H100

  • Mixedbread: Mixedbread Embed Large V1 | Text embeddings | V1, Embedding, BEI, L4

  • Nomic AI: Nomic Embed Code | Text embeddings | BEI, H100 MIG 40GB

BEI is Baseten's solution for production-grade deployment of (text) embedding, reranking, and prediction models, built on TensorRT-LLM. With BEI you get the following benefits:

  • Lowest-latency inference of any embedding solution (vLLM, SGLang, Infinity, TEI, Ollama)

  • Highest-throughput inference of any embedding solution (vLLM, SGLang, Infinity, TEI, Ollama), thanks to XQA kernels, FP8, and dynamic batching

  • High parallelism: up to 1400 client embeddings per second

  • Cached model weights for fast vertical scaling and high availability - no Hugging Face hub dependency at runtime
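To make the usage pattern concrete, here is a minimal sketch of calling a BEI deployment through an OpenAI-compatible embeddings endpoint. The model ID, model name, and URL shape are placeholder assumptions, not values from this page; check your deployment's dashboard for the real endpoint.

```python
import json

# Hypothetical deployment details -- substitute your own model ID and API key.
MODEL_ID = "abc123"  # placeholder, not a real deployment
BASE_URL = f"https://model-{MODEL_ID}.api.baseten.co/environments/production/sync"

def embeddings_payload(texts, model="bei-embedding"):
    """Build an OpenAI-style /v1/embeddings request body for a batch of texts.

    BEI applies dynamic batching server-side, so sending many inputs in a
    single request (rather than one request per text) is the intended pattern.
    """
    return {"model": model, "input": list(texts)}

payload = embeddings_payload(["What is BEI?", "TensorRT-LLM embeddings"])
print(json.dumps(payload))
# To actually send it (requires the `requests` package and a valid key):
# requests.post(f"{BASE_URL}/v1/embeddings",
#               headers={"Authorization": "Api-Key <BASETEN_API_KEY>"},
#               json=payload)
```

Batching inputs client-side like this is what lets a single deployment sustain the high request parallelism described above.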

Deploy any model in just a few commands

Avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models and take advantage of optimized serving for your own models.

$ truss init --example stable-diffusion-2-1-base ./my-sd-truss
$ cd ./my-sd-truss
$ export BASETEN_API_KEY=MdNmOCXc.YBtEZD0WFOYKso2A6NEQkRqTe
$ truss push
INFO: Serializing Stable Diffusion 2.1 truss.
INFO: Making contact with Baseten 👋 👽
INFO: 🚀 Uploading model to Baseten 🚀
Upload progress: 0% | | 0.00G/2.39G
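Once the push completes, the model is invoked over Baseten's REST API. The sketch below only builds the request; the endpoint path and the `prompt` input schema are assumptions for illustration — the real URL and schema come from your model's dashboard.

```python
import os

def build_request(model_id: str, prompt: str):
    """Construct URL, headers, and body for a hypothetical deployed model.

    The model ID here is a placeholder; the API key is read from the same
    BASETEN_API_KEY environment variable exported in the terminal session above.
    """
    url = f"https://model-{model_id}.api.baseten.co/environments/production/predict"
    headers = {"Authorization": f"Api-Key {os.environ.get('BASETEN_API_KEY', '')}"}
    body = {"prompt": prompt}
    return url, headers, body

url, headers, body = build_request("abc123", "an astronaut riding a horse")
print(url)
# To actually send it (requires the `requests` package):
# requests.post(url, headers=headers, json=body)
```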