
HTTP Scaler Metrics Architecture

This document explains how HTTP traffic metrics are collected, aggregated, and used for scaling decisions in Kedify.

Kedify uses a distributed metrics collection system where:

  1. kedify-proxy (Envoy) handles traffic and collects metrics
  2. Interceptor aggregates metrics and pushes them to the scaler via gRPC bridge
  3. Scaler receives metrics and reports to KEDA
  4. KEDA makes scaling decisions based on metrics

(Diagram: HTTP Scaler Metrics Architecture)

The kedify-proxy is an Envoy-based proxy that:

  • Routes HTTP traffic to backend services
  • Collects traffic metrics (RPS, Concurrency)
  • Pushes metrics to the interceptor via gRPC

Envoy Configuration:

stats_flush_interval: 1s
stats_sinks:
  - name: kedify_metrics_sink
    typed_config:
      '@type': type.googleapis.com/envoy.config.metrics.v3.MetricsServiceConfig
      transport_api_version: V3
      report_counters_as_deltas: true
      grpc_service:
        envoy_grpc:
          cluster_name: kedify_metrics_service

The interceptor runs a gRPC server on port 9901 that receives metrics from kedify-proxy. This server is exposed through a Kubernetes Service pointing to port 9901 on the interceptor.

Key Components:

  • MetricsServiceServer: Implements Envoy’s MetricsService gRPC interface
  • externalQueues: In-memory storage for metrics received from Envoy
  • Cluster-to-HSO mapping: Maps Envoy cluster names to HTTPScaledObjects

Service: keda-add-ons-http-interceptor-kedify-proxy-metric-sink

ports:
  - name: proxy
    port: 9901
    targetPort: 9901
  - name: control-plane
    port: 5678
    targetPort: 5678

The interceptor pushes metrics to the scaler via a gRPC bridge (enabled by default). This is a streaming connection where the interceptor periodically sends metric batches.

How It Works:

  • Interceptor acts as gRPC client, connects to scaler
  • Scaler acts as gRPC server, receives metric streams
  • Metrics are pushed periodically

Note: The /queue REST endpoint still exists for debugging. See Debugging section.

The scaler:

  • Receives metrics from interceptors via gRPC bridge (see above)
  • Implements KEDA’s External Scaler gRPC interface
  • Aggregates metrics across all interceptor replicas when KEDA queries it

KEDA uses the scaler’s metrics to:

  1. Determine if workload should be active (scale from zero)
  2. Calculate desired replica count
  3. Update the HorizontalPodAutoscaler (HPA)
Two scaling metrics are supported:

Request rate:

  • Source: cluster.upstream_rq_total from Envoy
  • Use case: scalingMetric: requestRate

Concurrency:

  • Source: cluster.upstream_rq_active from Envoy
  • Use case: scalingMetric: concurrency

Metrics are keyed by namespace/httpscaledobject-name. This means:

  • All traffic matching an HTTPScaledObject is aggregated together
  • Different paths, hosts, or query parameters within the same HSO share metrics
For debugging, the interceptor's admin service exposes REST endpoints. To dump the current metric queue:

kubectl get --raw /api/v1/namespaces/keda/services/keda-add-ons-http-interceptor-admin:9090/proxy/queue

To dump the Envoy cluster-to-HSO mapping:

kubectl get --raw /api/v1/namespaces/keda/services/keda-add-ons-http-interceptor-admin:9090/proxy/envoy-metrics-map

For inference workloads using inferencePool:

  • Traffic flows through port 9002 on kedify-proxy
  • External Processing (ext_proc) filter integrates with Endpoint Picker
  • Metrics are collected the same way as regular traffic
  • Key format remains namespace/httpscaledobject-name

See HTTP Scaler for Inference for more details.