GPU guides

Using fractional H100 GPUs for efficient model serving

Multi-Instance GPU (MIG) technology lets you split a single H100 GPU across two model serving instances, matching or beating A100 performance at roughly 20% lower cost.


NVIDIA A10 vs A10G for ML model inference

The A10, an Ampere-series GPU, excels at tasks like running 7B-parameter LLMs. AWS's A10G variant offers similar GPU memory and bandwidth, making the two mostly interchangeable for inference.

NVIDIA A10 vs A100 GPUs for LLM and Stable Diffusion inference

This article compares two popular GPUs—the NVIDIA A10 and A100—for model inference and discusses the option of using multi-GPU instances for larger models.

Understanding NVIDIA’s Datacenter GPU line

This guide helps you navigate NVIDIA’s datacenter GPU lineup and map it to your model serving needs.

Comparing GPUs across architectures and tiers

What are reliable metrics for comparing GPUs across architectures and tiers? We consider core count, FLOPS, VRAM, and TDP.

Comparing NVIDIA GPUs for AI: T4 vs A10

Comparing NVIDIA T4 and A10 GPUs for AI training and generative art: we analyze price and specs to determine the best GPU for your ML workload.

Choosing the right horizontal scaling setup for high-traffic models

Horizontal scaling, running multiple replicas behind a load balancer, is an essential technique for handling high traffic to an ML model.

How to choose the right instance size for your ML models

This post simplifies instance sizing with heuristics for choosing an optimal instance for your model, balancing performance against compute cost.