Kedify High Availability Configuration
Enable High Availability Mode for Kedify
Each part of the Kedify deployment can be configured to run with multiple replicas, although the implications differ for every component.
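All of the replica counts below are plain Helm values, so they can be applied to an existing installation with `helm upgrade`. A minimal sketch; the release name, namespace, and chart reference are assumptions and must be adjusted to your installation:

```shell
# Bump keda-operator to 2 replicas on an existing release,
# keeping all other configured values intact.
# (release name "keda", namespace "keda", and the chart reference
# are assumptions; adjust them to your install)
helm upgrade keda kedifyio/keda \
  --namespace keda \
  --reuse-values \
  --set operator.replicaCount=2
```

The same pattern applies to the other components and charts described below.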
keda-operator and kedify-agent
You can simply increase the number of replicas in the Helm chart values.

- For `keda-operator` in the `keda` chart:

```yaml
operator:
  replicaCount: 2
```

- For `kedify-agent` in the `kedify-agent` chart:

```yaml
agent:
  replicas: 2
```

Both of these components run with leader election enabled by default, so only one replica is active at a time and reconciles resources. The other replicas stand by and take over if the active one fails to maintain the leader lease. Running 2 replicas slightly shortens the delay between a failure and the election of a new leader; running more than 2 replicas provides no additional benefit and only wastes resources on idle compute.
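You can observe which replica currently holds the leadership by inspecting the coordination leases. A quick check, assuming the components run in the `keda` namespace (the lease name shown, `operator.keda.sh`, is an assumption and may differ between versions):

```shell
# List the leader-election leases in the namespace
kubectl get lease -n keda

# Show which pod currently holds the keda-operator lease
# (namespace and lease name are assumptions; adjust to your install)
kubectl get lease operator.keda.sh -n keda \
  -o jsonpath='{.spec.holderIdentity}'
```

During a failover you should see `holderIdentity` switch to one of the standby replicas.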
keda-operator-metrics-apiserver
This is also just a matter of increasing the number of replicas in the `keda` Helm chart values.

```yaml
metricsServer:
  replicaCount: 2
```

The metrics-apiserver is a stateless component that serves metrics to the HPA through the `v1beta1.external.metrics.k8s.io` APIService endpoint and fetches the metrics from keda-operator over a gRPC connection.

Requests from the HPA are load-balanced across the metrics-apiserver replicas, but each replica fetches the metrics from the same keda-operator instance, because only one operator can be the active leader at a time.

Running more than 2 replicas therefore has negligible benefit.
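To confirm that the metrics-apiserver replicas are healthy behind the aggregation layer, you can inspect the APIService registration and its backing endpoints with standard Kubernetes commands (the service name and namespace below are assumptions; adjust them to your install):

```shell
# The APIService should report Available=True as long as at least
# one metrics-apiserver replica is serving
kubectl get apiservice v1beta1.external.metrics.k8s.io

# Each ready replica should appear as an endpoint address here
# (service name and namespace are assumptions)
kubectl get endpoints -n keda keda-operator-metrics-apiserver
```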
keda-admission-webhooks
You can increase the number of replicas in the `keda` Helm chart values.

```yaml
webhooks:
  replicaCount: 2
```

The keda-admission-webhooks component validates ScaledObjects and other KEDA resources for misconfigurations. It scales horizontally without limitations, but it is also typically not a bottleneck in the system.
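With multiple replicas you may also want to protect against both webhook pods being evicted at once during node maintenance, since an unavailable validating webhook can block changes to KEDA resources. One way to do this is a PodDisruptionBudget; the manifest below is a sketch, and the label selector is an assumption that must match the labels your `keda` chart actually sets:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: keda-admission-webhooks
  namespace: keda
spec:
  minAvailable: 1
  selector:
    matchLabels:
      # assumption: adjust to the pod labels used by your keda chart
      app: keda-admission-webhooks
```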
keda-add-ons-http-external-scaler
You can increase the number of replicas in the `http-add-on` Helm chart values.

```yaml
scaler:
  replicas: 2
```

The external-scaler is a stateful cache for traffic metrics. It aggregates metrics from all interceptor replicas and serves them over a gRPC stream to keda-operator.

Running multiple replicas load-balances the requests only in a limited way, because a gRPC stream stays bound to a specific replica for its whole lifetime. Multiple replicas do, however, provide redundancy in case one replica fails.

Note that running multiple replicas has a slight negative performance impact on each interceptor replica, because every interceptor has to maintain a gRPC stream connection to each external-scaler replica and duplicate its metrics to all of them.
keda-add-ons-http-interceptor
This component has autoscaling enabled by default. You can control the replica bounds in the `http-add-on` Helm chart values.

```yaml
interceptor:
  replicas:
    min: 3
    max: 10
```

Each interceptor instance calculates partial traffic metrics and sends them over a gRPC stream to all external-scaler replicas, where they are aggregated. It also configures the kedify-proxy Envoy fleet and handles cold starts for applications that have scale to zero enabled.

The interceptor scales horizontally without limitations.
kedify-proxy
The proxy fully supports horizontal autoscaling, but users are advised to tune the parameters to their traffic patterns and expectations. The configuration lives in the `kedify-agent` Helm chart values, either globally or per namespace.

```yaml
agent:
  kedifyProxy:
    globalValues:
      deployment:
        replicas: 2           # static configuration
    namespacedValues:
      namespace-1:            # different configuration in the specific `namespace-1` namespace
        autoscaling:          # with autoscaling
          enabled: true
          minReplicaCount: 3
          maxReplicaCount: 10
```

Because the kedify-proxy fleet routes the traffic for all autoscaled applications, ensuring its availability and low latency is very important. Each instance maintains a gRPC stream for traffic metrics to a particular interceptor instance for further processing.

The proxy fleet scales horizontally without limitations and is deployed per namespace by default.