Product

Introducing Baseten Embeddings Inference: The fastest embeddings solution available

Baseten Embeddings Inference (BEI) delivers 2x higher throughput and 10% lower latency for production embedding, reranker, and classification models at scale.

Michael Feil and 1 other

Baseten Chains is now GA for production compound AI systems

Baseten Chains delivers ultra-low-latency compound AI at scale, with custom hardware per model and simplified model orchestration.

Marius Killinger and 2 others

New observability features: activity logging, LLM metrics, and metrics dashboard customization

We added three new observability features for improved monitoring and debugging: an activity log, LLM metrics, and customizable metrics dashboards.

Suren Atoyan and 4 others

Introducing our Speculative Decoding Engine Builder integration for ultra-low-latency LLM inference

Our new Speculative Decoding integration can cut latency in half for production LLM workloads.

Justin Yi and 3 others

Introducing Custom Servers: Deploy production-ready model servers from Docker images

Deploy production-ready model servers on Baseten directly from any Docker image using just a YAML file.

Tianshu Cheng and 2 others

Create custom environments for deployments on Baseten

Test and deploy ML models reliably with production-ready custom environments, persistent endpoints, and seamless CI/CD.

Samiksha Pal and 3 others

Introducing canary deployments on Baseten

Our canary deployments feature lets you roll out new model deployments with minimal risk to your end-user experience.

Sid Shanker and 3 others

Using asynchronous inference in production

Learn how async inference works, how it protects against common inference failures, how it applies to common use cases, and more.

Samiksha Pal and 2 others

Baseten Chains explained: building multi-component AI workflows at scale

A delightful developer experience for building and deploying compound ML inference workflows.

Marius Killinger and 1 other

New in May 2024

AI events, multicluster model serving architecture, tokenizer efficiency, and forward-deployed engineering

Baseten