
HTTP Scaler Metrics Architecture

This document explains how HTTP traffic metrics are collected, aggregated, and used for scaling decisions in Kedify.

Kedify uses a distributed metrics collection system where:

  1. kedify-proxy (Envoy) handles traffic and collects metrics
  2. Interceptor aggregates metrics and pushes them to the scaler via gRPC bridge
  3. Scaler receives metrics and reports to KEDA
  4. KEDA makes scaling decisions based on metrics

(Diagram: HTTP Scaler Metrics Architecture)

The kedify-proxy is an Envoy-based proxy that:

  • Routes HTTP traffic to backend services
  • Collects traffic metrics (RPS, Concurrency)
  • Pushes metrics to the interceptor via gRPC

Envoy Configuration:

stats_flush_interval: 1s
stats_sinks:
  - name: kedify_metrics_sink
    typed_config:
      '@type': type.googleapis.com/envoy.config.metrics.v3.MetricsServiceConfig
      transport_api_version: V3
      report_counters_as_deltas: true
      grpc_service:
        envoy_grpc:
          cluster_name: kedify_metrics_service

The interceptor runs a gRPC server on port 9901 that receives metrics from kedify-proxy. This server is exposed through a Kubernetes Service pointing to port 9901 on the interceptor.

Key Components:

  • MetricsServiceServer: Implements Envoy’s MetricsService gRPC interface
  • externalQueues: In-memory storage for metrics received from Envoy
  • Cluster-to-HSO mapping: Maps Envoy cluster names to HTTPScaledObjects

Service: keda-add-ons-http-interceptor-kedify-proxy-metric-sink

ports:
  - name: proxy
    port: 9901
    targetPort: 9901
  - name: control-plane
    port: 5678
    targetPort: 5678

The interceptor pushes metrics to the scaler via a gRPC bridge (enabled by default). This is a streaming connection where the interceptor periodically sends metric batches.

How It Works:

  • Interceptor acts as gRPC client, connects to scaler
  • Scaler acts as gRPC server, receives metric streams
  • Metrics are pushed periodically

Note: The /queue REST endpoint still exists for debugging. See Debugging section.

The scaler:

  • Receives metrics from interceptors via gRPC bridge (see above)
  • Implements KEDA’s External Scaler gRPC interface
  • Aggregates metrics across all interceptor replicas when KEDA queries it

KEDA uses the scaler’s metrics to:

  1. Determine if workload should be active (scale from zero)
  2. Calculate desired replica count
  3. Update the HorizontalPodAutoscaler (HPA)
Two scaling metrics are supported:

Request rate:

  • Source: cluster.upstream_rq_total from Envoy
  • Use case: scalingMetric: requestRate

Concurrency:

  • Source: cluster.upstream_rq_active from Envoy
  • Use case: scalingMetric: concurrency

Metrics are keyed by namespace/httpscaledobject-name. This means:

  • All traffic matching an HTTPScaledObject is aggregated together
  • Different paths, hosts, or query parameters within the same HSO share metrics
For debugging, the interceptor's admin service exposes REST endpoints. To dump the current metric queue:

kubectl get --raw /api/v1/namespaces/keda/services/keda-add-ons-http-interceptor-admin:9090/proxy/queue

To dump the Envoy cluster-to-HSO mapping:

kubectl get --raw /api/v1/namespaces/keda/services/keda-add-ons-http-interceptor-admin:9090/proxy/envoy-metrics-map

For inference workloads using inferencePool:

  • Traffic flows through port 9002 on kedify-proxy
  • External Processing (ext_proc) filter integrates with Endpoint Picker
  • Metrics are collected the same way as regular traffic
  • Key format remains namespace/httpscaledobject-name

See HTTP Scaler for Inference for more details.