New in November 2023
TL;DR
New this month: Resources to make it easier to switch from closed source inference endpoints to open source ML models, a guide to model inference math, and Stability AI's new generative video model
Migrating to open source models in 3 lines of code
Switching from closed source inference endpoints to open source ML models can seem like an intimidating leap. To make it easier, we created a checklist for going open source and built some tooling to make the migration seamless.
If you’re using the Chat Completions API and want to experiment with open source LLMs for your generative AI application, we’ve built a bridge that lets you try out models like Mistral 7B with just three tiny code changes.
This special inference endpoint lets you use open source models like Mistral 7B without ripping out the OpenAI client:
from openai import OpenAI
import os

client = OpenAI(
    # api_key=os.environ["OPENAI_API_KEY"],
    api_key=os.environ["BASETEN_API_KEY"],
    # Add base_url
    base_url="https://bridge.baseten.co/{model_id}/v1"
)

response = client.chat.completions.create(
    # model="gpt-3.5-turbo",
    model="mistral-7b",
    messages=[
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)

print(response.choices[0].message.content)
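That’s the whole migration: swap in your Baseten API key, point base_url at the bridge (with your model’s ID), and change the model name.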
Get started with our written tutorial or video walkthrough.
And when you switch to open source, you get a level of customization, security, and independence not offered by closed source inference endpoints.
Calculating LLM performance
When you’re running LLMs on GPUs, it’s important to make inference as performant as possible to capture the maximum value from the expensive hardware required to run these models.
The first step to optimization is understanding where the bottleneck exists: is inference compute-bound or memory-bound?
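As a rough illustration of the kind of math involved (a minimal back-of-the-envelope sketch, not the full guide; the A10 specs and 7B model size are assumptions chosen for the example):

# Bottleneck check for single-batch LLM decoding.
# Assumed hardware: NVIDIA A10 (~125 TFLOPS FP16, ~600 GB/s memory bandwidth).
# Assumed model: 7B parameters served in 16-bit precision.

compute = 125e12   # peak FP16 throughput, FLOPs per second
memory_bw = 600e9  # memory bandwidth, bytes per second

# The GPU's ops:byte ratio: how many FLOPs it can do per byte it moves
ops_per_byte = compute / memory_bw  # ~208

# At batch size 1, generating a token reads every weight once (2 bytes in
# FP16) and does ~2 FLOPs per weight (multiply + add): ~1 FLOP per byte
arithmetic_intensity = 2 / 2

if arithmetic_intensity < ops_per_byte:
    print("Decoding is memory-bound: limited by bandwidth, not FLOPs")

# Upper bound on decode speed: how fast can we stream 14 GB of weights?
weights_bytes = 7e9 * 2
print(f"At most ~{memory_bw / weights_bytes:.0f} tokens/second per stream")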
In our new guide to LLM inference and performance, we dive deep into the math behind LLM inference to figure out the limiting factors on model speed and GPU usage. The results may be surprising — join the discussion on Hacker News and share your thoughts!
Introducing Stable Video Diffusion
Stable Video Diffusion is a new image-to-video model from Stability AI, released as a research preview. The model generates short video clips by adding motion to a still image.
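If you want a quick local test, one way to try the model is through its Hugging Face diffusers integration (a minimal sketch; assumes diffusers ≥ 0.24, a CUDA GPU with enough VRAM, and a placeholder input image path):

import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the image-to-video pipeline in half precision to save VRAM
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

# Any still image works as the starting frame; the path is a placeholder
image = load_image("input.png").resize((1024, 576))

# Generate a short clip; decode_chunk_size trades VRAM for speed
frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "generated.mp4", fps=7)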
Here’s an introduction to Stable Video Diffusion with more details on the model. If you want to deploy Stable Video Diffusion, let us know and we can get you set up with this model.
We’ll be back next month with more models, guides, and open source projects!
Thanks for reading!
— The team at Baseten