Measure end-to-end response time vs inference time
On the model metrics tab, you can now use the dropdown menu to toggle between two views of model latency:
End-to-end response time includes time for cold starts, queuing, and inference (but not client-side latency). This most closely mirrors the performance of your model as experienced by your users.
Inference time covers only the time spent executing the model, including any pre- and post-processing. This is useful for optimizing the performance of your model code at the single-replica level.
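To see how these two views relate, it can help to compare them against a client-side timer. A minimal sketch, assuming a hypothetical predict endpoint URL: the time measured around the HTTP call adds network latency on top of cold starts, queuing, and inference, so it should always be at least as large as the end-to-end response time shown on the metrics tab, which in turn is larger than inference time alone.

```python
import time
import requests

# Hypothetical endpoint; substitute your deployment's actual predict URL.
MODEL_URL = "https://example.com/model/predict"
payload = {"inputs": ["hello world"]}

start = time.perf_counter()
resp = requests.post(MODEL_URL, json=payload, timeout=60)
resp.raise_for_status()
client_observed_s = time.perf_counter() - start

# client_observed_s = network latency + cold start + queuing + inference,
# so it upper-bounds the end-to-end response time reported on the metrics tab.
print(f"Client-observed end-to-end: {client_observed_s * 1000:.1f} ms")
```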