GPU guides
Evaluating NVIDIA H200 Tensor Core GPUs for LLM inference
Are NVIDIA H200 GPUs cost-effective for model inference? We tested an 8xH200 cluster provided by Lambda to identify the inference workload profiles the H200 suits best.
Using fractional H100 GPUs for efficient model serving
Multi-Instance GPU (MIG) lets you split a single H100 GPU into two model serving instances, each delivering performance that matches or beats an A100 GPU at a 20% lower cost.
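As a rough illustration of the fractional-GPU pattern, here is a Python sketch that assumes MIG mode is already enabled and two slices have been created on the card: it reads the MIG device UUIDs from `nvidia-smi -L` and pins one server process per slice via `CUDA_VISIBLE_DEVICES`. The `my_model_server` module is hypothetical.

```python
import os
import subprocess

# List MIG device UUIDs on the host. Assumes MIG mode is enabled and two
# slices (e.g. 3g.40gb on an 80 GB H100) have already been created.
out = subprocess.run(
    ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
).stdout
mig_uuids = [
    line.split("UUID: ")[1].rstrip(")")
    for line in out.splitlines()
    if "MIG" in line and "UUID: " in line
]

# Pin each model server to its own slice by exposing exactly one MIG device
# through CUDA_VISIBLE_DEVICES; "my_model_server" is a hypothetical module.
for port, uuid in zip((8000, 8001), mig_uuids):
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": uuid}
    subprocess.Popen(
        ["python", "-m", "my_model_server", "--port", str(port)], env=env
    )
```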
NVIDIA A10 vs A10G for ML model inference
The A10, an Ampere-series GPU, excels at tasks like running 7B-parameter LLMs. AWS's A10G variant, which offers similar GPU memory and bandwidth, is mostly interchangeable.
NVIDIA A10 vs A100 GPUs for LLM and Stable Diffusion inference
This article compares two popular GPUs—the NVIDIA A10 and A100—for model inference and discusses the option of using multi-GPU instances for larger models.
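To make the single- vs multi-GPU decision concrete, here is a back-of-the-envelope sketch in Python. The 1.2x overhead factor and the example model sizes are illustrative assumptions, not measurements from the article.

```python
import math

def min_gpus_for_weights(params_billion: float, bytes_per_param: int,
                         gpu_vram_gb: int, overhead: float = 1.2) -> int:
    """Rough count of GPUs needed just to hold a model's weights.

    The 1.2x overhead is a loose allowance for KV cache and activations;
    real requirements depend on batch size and sequence length.
    """
    weights_gb = params_billion * bytes_per_param  # 1B params at 1 byte/param ~ 1 GB
    return math.ceil(weights_gb * overhead / gpu_vram_gb)

# A 7B model in fp16 (2 bytes/param) fits comfortably on a 24 GB A10 ...
print(min_gpus_for_weights(7, 2, 24))   # -> 1
# ... while a 70B model in fp16 needs several 80 GB A100s.
print(min_gpus_for_weights(70, 2, 80))  # -> 3
```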
Understanding NVIDIA’s Datacenter GPU line
This guide helps you navigate NVIDIA’s datacenter GPU lineup and map it to your model serving needs.
Comparing GPUs across architectures and tiers
What are reliable metrics for comparing GPUs across architectures and tiers? We consider core count, FLOPS, VRAM, and TDP.
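As a small illustration, the snippet below compares FLOPS per watt using approximate spec-sheet numbers (FP16 Tensor Core throughput without sparsity); treat the figures as ballpark values and check NVIDIA's datasheets before relying on them.

```python
# Approximate spec-sheet numbers: FP16 Tensor Core TFLOPS (no sparsity),
# VRAM in GB, TDP in watts. Verify against NVIDIA datasheets before use.
GPUS = {
    "T4":             {"fp16_tflops": 65,  "vram_gb": 16, "tdp_w": 70},
    "A10":            {"fp16_tflops": 125, "vram_gb": 24, "tdp_w": 150},
    "A100 80GB PCIe": {"fp16_tflops": 312, "vram_gb": 80, "tdp_w": 300},
}

# FLOPS per watt is one normalized way to compare efficiency across tiers;
# VRAM still decides which models fit on the card at all.
for name, spec in GPUS.items():
    eff = spec["fp16_tflops"] / spec["tdp_w"]
    print(f"{name:>14}: {eff:.2f} TFLOPS/W, {spec['vram_gb']} GB VRAM")
```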
Comparing NVIDIA GPUs for AI: T4 vs A10
We compare the NVIDIA T4 and A10 on price and specs to determine the best GPU for ML workloads, from AI training to AI art generation.
Choosing the right horizontal scaling setup for high-traffic models
Horizontal scaling via replicas with load balancing is an important technique for handling high traffic to an ML model.
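For intuition, here is a minimal round-robin sketch in Python. The replica URLs and the `/predict` route are hypothetical, and in production this job is usually handled by a managed load balancer or a Kubernetes Service rather than hand-rolled client code.

```python
import itertools
import requests  # third-party: pip install requests

# Hypothetical replica endpoints, each running an identical copy of the model.
REPLICAS = ["http://replica-0:8000", "http://replica-1:8000", "http://replica-2:8000"]
_next_replica = itertools.cycle(REPLICAS)

def predict(payload: dict) -> dict:
    """Send the request to replicas in round-robin order, skipping failures."""
    for _ in range(len(REPLICAS)):
        url = next(_next_replica)
        try:
            resp = requests.post(f"{url}/predict", json=payload, timeout=10)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            continue  # this replica is down or overloaded; try the next one
    raise RuntimeError("all replicas failed")
```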
How to choose the right instance size for your ML models
This post simplifies instance sizing with heuristics to choose an optimal size for your model, balancing performance and compute cost.
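A minimal sketch of that kind of heuristic, assuming a hypothetical instance catalog with illustrative prices: estimate the memory footprint from parameter count and precision, add headroom for the KV cache and activations, and pick the cheapest instance that fits.

```python
# Hypothetical instance catalog with illustrative prices; substitute the
# real offerings and rates of your cloud or inference platform.
INSTANCES = [
    {"name": "small",  "gpu": "T4",   "vram_gb": 16, "usd_per_hr": 0.60},
    {"name": "medium", "gpu": "A10",  "vram_gb": 24, "usd_per_hr": 1.20},
    {"name": "large",  "gpu": "A100", "vram_gb": 80, "usd_per_hr": 4.00},
]

def pick_instance(params_billion: float, bytes_per_param: int = 2,
                  headroom: float = 1.3) -> dict:
    """Cheapest instance whose VRAM covers the weights plus headroom for
    the KV cache and activations. The headroom factor is a rough assumption."""
    needed_gb = params_billion * bytes_per_param * headroom
    for inst in sorted(INSTANCES, key=lambda i: i["usd_per_hr"]):
        if inst["vram_gb"] >= needed_gb:
            return inst
    raise ValueError("model likely needs a multi-GPU instance")

print(pick_instance(7))  # 7B in fp16 needs ~18 GB -> the 24 GB A10 instance
```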