Export your model inference metrics to your favorite observability tool

Today, we’re excited to announce our new export metrics integration, which lets you export model inference metrics like response time, replica count, and hardware utilization to a long list of observability platforms including Grafana, New Relic, Datadog, and Prometheus.

[Screenshot: response time shown side-by-side in Baseten and Grafana]

Exporting metrics unlocks key improvements to production model management workflows:

  • Single source of truth: unify data within your observability stack and build dashboards that integrate every component of your AI-powered applications.

  • Fine-grained metrics and control: dive deeper into your metrics with the power of dedicated tooling.

  • Custom alerts: wire in your inference metrics to get alerts for unexpected traffic, latency, or status codes (see the sample alert rule after this list).
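
Here's a sketch of what such an alert might look like once the metrics land in Prometheus. The metric and label names below are illustrative placeholders, not confirmed export names – check the metrics support matrix in the documentation for the exact names:

groups:
  - name: baseten-inference-alerts
    rules:
      - alert: HighInferenceErrorRate
        # Fires when more than 5% of inference requests return a 5xx status
        # over 10 minutes. "baseten_inference_requests_total" is a
        # placeholder metric name.
        expr: |
          sum(rate(baseten_inference_requests_total{status=~"5.."}[5m]))
            / sum(rate(baseten_inference_requests_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Inference error rate above 5% for 10 minutes"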

Integrating metrics from model inference into your broader observability stack is essential for operating AI-powered applications in production.

Rime’s model serving infrastructure is mission-critical – our stellar uptime relies on Baseten’s stellar uptime. We have complex deployments combining multiple models with key infrastructure components, all of which need to be monitored. Adding our Baseten metrics to our Grafana dashboards lets us apply the same DevOps best practices to our AI models that we do to our other critical components.

Lily Clifford, CEO of Rime

To start integrating your inference metrics with your existing observability tools, check out the documentation or read on for answers to common questions.

What metrics are supported?

Almost everything that you see in the “Metrics” tab for a model in your Baseten workspace can be exported via this integration.

[Screenshot: inference volume shown side-by-side in Baseten and Grafana]

Available metrics include:

  • Inference request count, with labels for status code, environment (production vs. development), and more.

  • End-to-end response time, with the same status and environment labels.

  • Replica count, both active and starting up.

  • Hardware usage, both compute and memory, for CPU and GPU resources.

For a full reference, see the metrics support matrix in the documentation.
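
Once scraped into Prometheus, these metrics can be turned into dashboard-ready series with recording rules. A minimal sketch – the metric names below are illustrative placeholders, so substitute the exact names from the support matrix:

groups:
  - name: baseten-recording-rules
    rules:
      # p95 end-to-end response time for production traffic.
      # "baseten_response_time_seconds_bucket" is a placeholder name.
      - record: baseten:response_time_seconds:p95
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(baseten_response_time_seconds_bucket{environment="production"}[5m])))
      # Requests per second over the last 5 minutes, broken out by status.
      - record: baseten:requests:rate5m
        expr: sum by (status) (rate(baseten_inference_requests_total[5m]))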

How are the metrics exported?

Our metrics export is built on the vendor-neutral OpenTelemetry standard to maximize compatibility across platforms. Metrics are pulled in by configuring a Prometheus-compatible receiver, either in Prometheus itself or in an OpenTelemetry Collector.

Metrics are available via a /metrics endpoint, which observability tools can scrape at a set interval (we generally recommend every 60 seconds). For example, here’s a scrape configuration for Prometheus:

global:
  scrape_interval: 60s
scrape_configs:
  - job_name: 'baseten'
    metrics_path: '/metrics'
    authorization:
      type: "Api-Key"
      credentials: "{BASETEN_API_KEY}"
    static_configs:
      - targets: ['app.baseten.co']
    scheme: https
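
If you run an OpenTelemetry Collector rather than Prometheus itself, the same scrape configuration nests under the Collector’s Prometheus receiver. Here’s a minimal sketch, with a generic OTLP exporter standing in for your backend of choice (the endpoint is a placeholder):

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'baseten'
          scrape_interval: 60s
          metrics_path: '/metrics'
          scheme: https
          authorization:
            type: "Api-Key"
            credentials: "${BASETEN_API_KEY}"
          static_configs:
            - targets: ['app.baseten.co']

exporters:
  otlp:
    # Placeholder: your backend's OTLP ingest endpoint.
    endpoint: "${OTLP_ENDPOINT}"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [otlp]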

See the export metrics documentation for details on receiver configuration.

What observability platforms are supported?

We built our export feature on OpenTelemetry for the broad compatibility it provides across observability tools. Any platform with a compatible receiver in the OpenTelemetry registry – over 800 tools at the time of writing – can consume Baseten inference metrics.
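
As a concrete example, pointing the Collector pipeline sketched above at Datadog is just a matter of swapping the exporter. This assumes the Datadog exporter from the Collector’s contrib distribution; the API key is a placeholder:

exporters:
  datadog:
    api:
      # Placeholder: your Datadog API key.
      key: "${DATADOG_API_KEY}"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [datadog]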

In addition to this broad support, we’ve ensured compatibility with and written specific documentation for a set of popular platforms:

  • Grafana

  • New Relic

  • Datadog

  • Prometheus

To confirm support or add documentation for your favorite observability platform, get in touch and we’ll work with you to build and test the integration!