We raised a $75m series C to build the future of inference

Baseten Blog | Page 4

Topics

Latest Model performance Hacks & projects GPU guides ML models Glossary Community Product News

1 2 3 4 5…14

SPC hackathon winners build with Llama 3.1 on Baseten

SPC hackathon winner TestNinja and finalist VibeCheck used Baseten to power apps for test generation and mood board creation.

Philip Kiely

Introducing Baseten Self-hosted

Gain granular control over data locality, align with strict compliance standards, meet specific performance requirements, and more with Baseten Self-hosted.

Rachel Rapp

1 other

Baseten Self-hosted workflow with inference running in a customer's VPC

Compound AI systems explained

Compound AI systems combine multiple models and processing steps, and are forming the next generation of AI products.

Rachel Rapp

An AI-generated image representing a compound AI system with multiple components.

Introducing automatic LLM optimization with TensorRT-LLM Engine Builder

The TensorRT-LLM Engine Builder empowers developers to deploy extremely efficient and performant inference servers for open source and fine-tuned LLMs.

Philip Kiely

Abu Qader

1 other

TensorRT-LLM Engine Builder header

Hacks & projects

Deploying custom ComfyUI workflows as APIs

Easily package your ComfyUI workflow to use any custom node or model checkpoint.

Rachel Rapp

Het Trivedi

1 other

Ten reasons to join Baseten

Baseten is a Series B startup building infrastructure for AI. We're actively hiring for many roles — here are ten reasons to join the Baseten team.

Philip Kiely

Dustin Michaels

Dustin Michaels

1 other

Team photo, Q3 2023

Model performance

How to serve 10,000 fine-tuned LLMs from a single GPU

LoRA swapping with TRT-LLM supports in-flight batching and loads LoRA weights in 1-2 ms, enabling each request to hit a different fine-tune.

Philip Kiely

Pankaj Gupta

1 other

Prompt: Different-colored friendly robots standing in a field

Using asynchronous inference in production

Learn how async inference works, protects against common inference failures, is applied in common use cases, and more.

Rachel Rapp

Helen Yang

Samiksha Pal

2 others

The overall asynchronous inference workflow.

Baseten Chains explained: building multi-component AI workflows at scale

A Delightful Developer Experience for Building and Deploying Compound ML Inference Workflows

Rachel Rapp

Marius Killinger

Marius Killinger

1 other

Baseten Chains explained image

1 2 3 4 5…14