San Jose Sharks Vs Carolina Hurricanes @ GTC
Join us in a private suite for the Hurricanes vs Sharks game during NVIDIA GTC.
Enjoy hockey & refreshments while you network with fellow AI experts.
Host
Baseten
Related resources
How we built high-throughput embedding, reranker, and classifier inference with TensorRT-LLM
Discover how we optimized embedding, reranker, and classifier inference using TensorRT-LLM, doubling throughput and achieving ultra-low latency at scale.
Introducing Baseten Embeddings Inference: The fastest embeddings solution available
Baseten Embeddings Inference (BEI) delivers 2x higher throughput and 10% lower latency for production embedding, reranker, and classification models at scale.
Announcing Baseten’s $75M Series C
Baseten raised a $75M Series C to power mission-critical AI inference for leading AI companies.
How multi-node inference works for massive LLMs like DeepSeek-R1
Running DeepSeek-R1 on H100 GPUs requires multi-node inference to connect the 16 H100s needed to hold the model weights.
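The GPU count above can be checked with back-of-the-envelope arithmetic. The sketch below is a rough illustration, not from the linked post: it assumes DeepSeek-R1's ~671B parameters stored in FP8 (~1 byte per parameter), 80 GB of memory per H100, and the common 8-GPU node size.

```python
import math

# Assumptions (round illustrative numbers, not from the original post):
PARAMS = 671e9          # DeepSeek-R1 parameter count (~671B)
BYTES_PER_PARAM = 1     # FP8 weights: ~1 byte per parameter
H100_MEMORY_GB = 80     # per-GPU memory on an H100
GPUS_PER_NODE = 8       # typical H100 node size

# Total weight footprint in GB
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9

# Minimum GPUs just to hold the weights (ignoring KV cache and activations)
min_gpus = math.ceil(weights_gb / H100_MEMORY_GB)

# Round up to whole nodes: the weights alone exceed one 8-GPU node,
# so inference must span two nodes, i.e. 16 H100s.
nodes = math.ceil(min_gpus / GPUS_PER_NODE)
print(weights_gb, min_gpus, nodes * GPUS_PER_NODE)
```

In practice the KV cache and activation memory push the requirement well past the bare minimum of 9 GPUs, which is why the full two-node, 16-GPU configuration is used.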
Testing Llama 3.3 70B inference performance on NVIDIA GH200 in Lambda Cloud
The NVIDIA GH200 Superchip combines an NVIDIA Hopper GPU with an ARM CPU via a high-bandwidth interconnect.
Baseten Chains is now GA for production compound AI systems
Baseten Chains delivers ultra-low-latency compound AI at scale, with custom hardware per model and simplified model orchestration.
Live podcast: What we learned building compound AI from our customers
Learn how to deploy ultra-low-latency compound AI with seamless model orchestration, custom autoscaling, and optimized hardware.