GPU guides
Evaluating NVIDIA H200 Tensor Core GPUs for LLM inference
Are NVIDIA H200 GPUs cost-effective for model inference? We tested an 8xH200 cluster provided by Lambda to identify the inference workload profiles the H200 suits best.
Using fractional H100 GPUs for efficient model serving
Multi-Instance GPU (MIG) lets you split a single H100 GPU into two model serving instances, each delivering performance that matches or beats an A100 GPU at a 20% lower cost.
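As a rough illustration of the fractional-GPU pattern, here is a Python sketch that assumes MIG mode is already enabled and two slices have been created on the card: it reads the MIG device UUIDs from `nvidia-smi -L` and pins one server process per slice via `CUDA_VISIBLE_DEVICES`. The `my_model_server` module is hypothetical.

```python
import os
import subprocess

# List MIG device UUIDs on the host. Assumes MIG mode is enabled and two
# slices (e.g. 3g.40gb on an 80 GB H100) have already been created.
out = subprocess.run(
    ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
).stdout
mig_uuids = [
    line.split("UUID: ")[1].rstrip(")")
    for line in out.splitlines()
    if "MIG" in line and "UUID: " in line
]

# Pin each model server to its own slice by exposing exactly one MIG device
# through CUDA_VISIBLE_DEVICES; "my_model_server" is a hypothetical module.
for port, uuid in zip((8000, 8001), mig_uuids):
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": uuid}
    subprocess.Popen(
        ["python", "-m", "my_model_server", "--port", str(port)], env=env
    )
```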
NVIDIA A10 vs A10G for ML model inference
The A10, an Ampere-series GPU, excels at tasks like running 7B-parameter LLMs. AWS's A10G variant, which offers similar GPU memory and bandwidth, is mostly interchangeable.
NVIDIA A10 vs A100 GPUs for LLM and Stable Diffusion inference
This article compares two popular GPUs—the NVIDIA A10 and A100—for model inference and discusses the option of using multi-GPU instances for larger models.
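To make the single- vs multi-GPU decision concrete, here is a back-of-the-envelope sketch in Python. The 1.2x overhead factor and the example model sizes are illustrative assumptions, not measurements from the article.

```python
import math

def min_gpus_for_weights(params_billion: float, bytes_per_param: int,
                         gpu_vram_gb: int, overhead: float = 1.2) -> int:
    """Rough count of GPUs needed just to hold a model's weights.

    The 1.2x overhead is a loose allowance for KV cache and activations;
    real requirements depend on batch size and sequence length.
    """
    weights_gb = params_billion * bytes_per_param  # 1B params at 1 byte/param ~ 1 GB
    return math.ceil(weights_gb * overhead / gpu_vram_gb)

# A 7B model in fp16 (2 bytes/param) fits comfortably on a 24 GB A10 ...
print(min_gpus_for_weights(7, 2, 24))   # -> 1
# ... while a 70B model in fp16 needs several 80 GB A100s.
print(min_gpus_for_weights(70, 2, 80))  # -> 3
```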
Understanding NVIDIA’s Datacenter GPU line
This guide helps you navigate NVIDIA’s datacenter GPU lineup and map it to your model serving needs.
Comparing GPUs across architectures and tiers
What are reliable metrics for comparing GPUs across architectures and tiers? We consider core count, FLOPS, VRAM, and TDP.
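As a small illustration, the snippet below compares FLOPS per watt using approximate spec-sheet numbers (FP16 Tensor Core throughput without sparsity); treat the figures as ballpark values and check NVIDIA's datasheets before relying on them.

```python
# Approximate spec-sheet numbers: FP16 Tensor Core TFLOPS (no sparsity),
# VRAM in GB, TDP in watts. Verify against NVIDIA datasheets before use.
GPUS = {
    "T4":             {"fp16_tflops": 65,  "vram_gb": 16, "tdp_w": 70},
    "A10":            {"fp16_tflops": 125, "vram_gb": 24, "tdp_w": 150},
    "A100 80GB PCIe": {"fp16_tflops": 312, "vram_gb": 80, "tdp_w": 300},
}

# FLOPS per watt is one normalized way to compare efficiency across tiers;
# VRAM still decides which models fit on the card at all.
for name, spec in GPUS.items():
    eff = spec["fp16_tflops"] / spec["tdp_w"]
    print(f"{name:>14}: {eff:.2f} TFLOPS/W, {spec['vram_gb']} GB VRAM")
```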
Comparing NVIDIA GPUs for AI: T4 vs A10
We compare the NVIDIA T4 and A10 on price and specs to determine the best GPU for ML workloads, from AI training to AI art generation.
Choosing the right horizontal scaling setup for high-traffic models
Horizontal scaling via replicas with load balancing is an important technique for handling high traffic to an ML model.
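For intuition, here is a minimal round-robin sketch in Python. The replica URLs and the `/predict` route are hypothetical, and in production this job is usually handled by a managed load balancer or a Kubernetes Service rather than hand-rolled client code.

```python
import itertools
import requests  # third-party: pip install requests

# Hypothetical replica endpoints, each running an identical copy of the model.
REPLICAS = ["http://replica-0:8000", "http://replica-1:8000", "http://replica-2:8000"]
_next_replica = itertools.cycle(REPLICAS)

def predict(payload: dict) -> dict:
    """Send the request to replicas in round-robin order, skipping failures."""
    for _ in range(len(REPLICAS)):
        url = next(_next_replica)
        try:
            resp = requests.post(f"{url}/predict", json=payload, timeout=10)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            continue  # this replica is down or overloaded; try the next one
    raise RuntimeError("all replicas failed")
```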
How to choose the right instance size for your ML models
This post simplifies instance sizing with heuristics to choose an optimal size for your model, balancing performance and compute cost.
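A minimal sketch of that kind of heuristic, assuming a hypothetical instance catalog with illustrative prices: estimate the memory footprint from parameter count and precision, add headroom for the KV cache and activations, and pick the cheapest instance that fits.

```python
# Hypothetical instance catalog with illustrative prices; substitute the
# real offerings and rates of your cloud or inference platform.
INSTANCES = [
    {"name": "small",  "gpu": "T4",   "vram_gb": 16, "usd_per_hr": 0.60},
    {"name": "medium", "gpu": "A10",  "vram_gb": 24, "usd_per_hr": 1.20},
    {"name": "large",  "gpu": "A100", "vram_gb": 80, "usd_per_hr": 4.00},
]

def pick_instance(params_billion: float, bytes_per_param: int = 2,
                  headroom: float = 1.3) -> dict:
    """Cheapest instance whose VRAM covers the weights plus headroom for
    the KV cache and activations. The headroom factor is a rough assumption."""
    needed_gb = params_billion * bytes_per_param * headroom
    for inst in sorted(INSTANCES, key=lambda i: i["usd_per_hr"]):
        if inst["vram_gb"] >= needed_gb:
            return inst
    raise ValueError("model likely needs a multi-GPU instance")

print(pick_instance(7))  # 7B in fp16 needs ~18 GB -> the 24 GB A10 instance
```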