New request metrics

We've introduced three new request metrics to enhance model monitoring. You can now view percentiles and averages for the following:

  • Request size: Tracks the distribution of request sizes, serving as a proxy for input tokens.

  • Response size: Monitors the distribution of response sizes, acting as a proxy for generated tokens.

  • Time to first byte: Measures time to first byte (TTFB), including any queuing and routing time, giving insight into overall latency.
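
To illustrate why both percentiles and averages are useful for these metrics, here is a minimal sketch that summarizes a handful of TTFB samples. It assumes the nearest-rank percentile definition, which may differ from how the dashboard computes its values. The function names and the sample data are hypothetical and are not part of any product API.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. p is an integer in the range 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples):
    """Return the average plus a few commonly watched percentiles."""
    return {
        "avg": mean(samples),
        "p50": percentile(samples, 50),
        "p90": percentile(samples, 90),
        "p99": percentile(samples, 99),
    }


# Hypothetical TTFB samples in milliseconds; one request was queued for a long time.
ttfb_ms = [120, 95, 110, 480, 105, 100, 130, 98, 102, 115]
print(summarize(ttfb_ms))
# {'avg': 145.5, 'p50': 105, 'p90': 130, 'p99': 480}
```

In this sample a single slow request pulls the average (145.5 ms) well above the median (105 ms), while p99 surfaces the outlier directly. The same summary applies to request size and response size.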