
Kedify High Availability Configuration

Each part of Kedify deployment can be configured to run with multiple replicas, although it has different implications for every component.

You can simply increase the number of replicas in the Helm chart values.

1. For keda-operator in the keda chart:

   ```yaml
   operator:
     replicaCount: 2
   ```

2. For kedify-agent in the kedify-agent chart:

   ```yaml
   agent:
     replicas: 2
   ```

Both of these components run with leader election enabled by default, so only one replica is active at a time and reconciles resources. The other replicas stand by and take over if the active one fails to maintain the leader lease. Running 2 replicas slightly shortens the failover delay, because a standby replica is already running when a new leader is elected; running more than 2 replicas provides no additional benefit and only wastes resources on idle compute.
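To check which replica currently holds the lease, you can list the coordination Lease objects in the installation namespace. This is an illustrative sketch; the `keda` namespace is an assumption, so substitute the namespace you installed into:

```shell
# Leader-election leases live in the namespace where the components run
# (namespace assumed to be "keda" here)
kubectl get lease -n keda

# The holder identity in the Lease spec identifies the pod that is the
# active leader; standby replicas are not listed as holders
kubectl describe lease -n keda
```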

Scaling the metrics-apiserver is also just a matter of increasing the number of replicas in the keda Helm chart values.

```yaml
metricsServer:
  replicaCount: 2
```

The metrics-apiserver is a stateless component that serves metrics to the HPA through the v1beta1.external.metrics.k8s.io APIService endpoint and fetches the metrics from keda-operator over a gRPC connection. Requests from the HPA are load-balanced across multiple metrics-apiserver replicas, but each replica fetches the metrics from the same keda-operator instance, because only one operator can be the active leader at a time. Running more than 2 replicas therefore has negligible benefit.
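You can verify that the APIService endpoint mentioned above is registered and healthy; a sketch using standard kubectl commands:

```shell
# Show the external metrics APIService and whether it is Available
kubectl get apiservice v1beta1.external.metrics.k8s.io

# Query the external metrics API directly through the aggregation layer
kubectl get --raw /apis/external.metrics.k8s.io/v1beta1
```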

You can increase the number of replicas in the keda Helm chart values.

```yaml
webhooks:
  replicaCount: 2
```

The keda-admission-webhooks component validates ScaledObjects and other KEDA resources for misconfigurations. It scales horizontally without limitations, but it is also typically not a bottleneck in the system.
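Taken together, an HA-oriented values file for the keda chart might look like the following sketch; it simply combines the per-component settings shown above:

```yaml
operator:
  replicaCount: 2       # leader election: one active, one standby
metricsServer:
  replicaCount: 2       # stateless, requests load-balanced via the APIService
webhooks:
  replicaCount: 2       # stateless admission webhooks
```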

You can increase the number of replicas in the http-add-on Helm chart values.

```yaml
scaler:
  replicas: 2
```

The external-scaler is a stateful cache for traffic metrics. It aggregates metrics from all interceptor replicas and serves them over a gRPC stream to keda-operator. Running multiple replicas load-balances requests only in a limited way, because a gRPC stream is bound to a specific replica for its whole lifetime. Multiple replicas do provide redundancy in case one replica fails. However, they also have a slight negative performance impact on each interceptor replica, because every interceptor has to maintain a gRPC stream to each external-scaler replica and duplicate its metrics to all of them.

This component has autoscaling enabled by default. You can control the bounds of replicas in the http-add-on Helm chart values.

```yaml
interceptor:
  replicas:
    min: 3
    max: 10
```

Each interceptor instance calculates partial traffic metrics and sends them over a gRPC stream to all replicas of the external-scaler, where they are aggregated. It also configures the kedify-proxy Envoy fleet and handles cold starts for applications that have scale to zero enabled. The interceptor scales horizontally without limitations.
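Combining the two http-add-on settings above, a values sketch for that chart could look like:

```yaml
scaler:
  replicas: 2           # redundancy; each interceptor streams metrics to both
interceptor:
  replicas:
    min: 3              # lower bound for the autoscaled interceptor fleet
    max: 10             # upper bound
```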

The proxy has full support for horizontal autoscaling, but users are advised to configure the parameters depending on their traffic patterns and expectations. The configuration can be done in the kedify-agent Helm chart values either globally or per namespace.

```yaml
agent:
  kedifyProxy:
    globalValues:
      deployment:
        replicas: 2 # static configuration
    namespacedValues:
      namespace-1: # different configuration in the specific `namespace-1` namespace
        autoscaling: # with autoscaling
          enabled: true
          minReplicaCount: 3
          maxReplicaCount: 10
```

Because the kedify-proxy fleet routes traffic for all autoscaled applications, ensuring its availability and low latency is very important. Each instance maintains a gRPC stream to a particular interceptor instance, which processes the traffic metrics further. The proxy fleet scales horizontally without limitations and is by default deployed per namespace.
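To inspect the per-namespace proxy fleet, you can list its Deployments across namespaces. The `app=kedify-proxy` label selector below is an assumption; adjust it to the labels your installation actually uses:

```shell
# List kedify-proxy Deployments in all namespaces
# (label selector is an assumption, not a documented Kedify label)
kubectl get deployments --all-namespaces -l app=kedify-proxy
```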