# Kedify High Availability Configuration

## Enable High Availability Mode for Kedify

Each component of a Kedify deployment can be configured to run with multiple replicas, although the implications differ for each component.
### keda-operator and kedify-agent

You can simply increase the number of replicas in the Helm chart values.
- For `keda-operator` in the `keda` chart:

  ```yaml
  operator:
    replicaCount: 2
  ```
- For `kedify-agent` in the `kedify-agent` chart:

  ```yaml
  agent:
    replicas: 2
  ```
Both of these run with leader election enabled by default, so only one replica is active at a time and reconciles resources. The other replicas stand by and take over if the active one fails to renew the leader lease. Running 2 replicas slightly shortens the delay between failure and a new leader taking over; running more than 2 replicas provides no additional benefit and only wastes resources on idle compute.
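When running multiple replicas for availability, it is also worth ensuring they do not land on the same node. A minimal sketch, assuming your chart version exposes a standard `affinity` value and that the operator pods carry an `app: keda-operator` label (both are assumptions to verify against your chart):

```yaml
operator:
  replicaCount: 2
  # Hypothetical anti-affinity: prefer scheduling the replicas on
  # different nodes so a single node failure cannot take both down.
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: keda-operator # assumed pod label; check your chart
```

The `preferred` (rather than `required`) rule keeps the second replica schedulable even on a single-node cluster.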
### keda-operator-metrics-apiserver

This is also just a matter of increasing the number of replicas in the `keda` Helm chart values.

```yaml
metricsServer:
  replicaCount: 2
```
The `metrics-apiserver` is a stateless component that serves metrics to the HPA through the `v1beta1.external.metrics.k8s.io` APIService endpoint and fetches them from `keda-operator` over a gRPC connection.
Requests from the HPA are load-balanced across the `metrics-apiserver` replicas, but every replica fetches metrics from the same `keda-operator` instance, because only one operator can be the active leader at a time.
Running more than 2 replicas therefore has negligible benefit.
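For context, HPA requests reach the replicas through the `v1beta1.external.metrics.k8s.io` APIService, which points at a regular Kubernetes Service; it is the Service that spreads requests across replicas. A sketch of that registration, with an illustrative Service name and namespace (both depend on your install):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  version: v1beta1
  service:
    name: keda-operator-metrics-apiserver # illustrative; depends on your install
    namespace: keda # assumed install namespace
  groupPriorityMinimum: 100
  versionPriority: 100
```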
### keda-admission-webhooks

You can increase the number of replicas in the `keda` Helm chart values.

```yaml
webhooks:
  replicaCount: 2
```
The `keda-admission-webhooks` component validates `ScaledObjects` and other KEDA resources for misconfigurations. It scales horizontally without limitations, but it is also typically not a bottleneck in the system.
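Because the admission webhooks sit in the request path for creating and updating KEDA resources, you may also want to guarantee that at least one replica survives voluntary disruptions such as node drains. A hedged sketch using a standalone `PodDisruptionBudget` (the namespace and label selector are assumptions; match them to your actual deployment):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: keda-admission-webhooks
  namespace: keda # assumed install namespace
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: keda-admission-webhooks # assumed pod label; check your deployment
```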
### keda-add-ons-http-external-scaler

You can increase the number of replicas in the `http-add-on` Helm chart values.

```yaml
scaler:
  replicas: 2
```
The `external-scaler` is a stateful cache for traffic metrics. It aggregates metrics from all `interceptor` replicas and serves them to `keda-operator` over a gRPC stream.
Running multiple replicas load-balances requests only to a limited degree, because each gRPC stream stays bound to one replica for the stream's whole lifetime. Multiple replicas do, however, provide redundancy in case one replica fails.
However, running multiple replicas has a slight negative performance impact on each `interceptor` replica: every `interceptor` has to maintain a gRPC stream to every `external-scaler` replica and duplicate its metrics to each of them, so with N interceptors and M scalers the system maintains N × M streams.
### keda-add-ons-http-interceptor

This component has autoscaling enabled by default. You can control the replica bounds in the `http-add-on` Helm chart values.

```yaml
interceptor:
  replicas:
    min: 3
    max: 10
```
Each `interceptor` instance calculates partial traffic metrics and sends them to all `external-scaler` replicas over a gRPC stream, where they are aggregated. It also configures the `kedify-proxy` Envoy fleet and handles cold starts for applications with scale to zero enabled.
The `interceptor` scales horizontally without limitations.
### kedify-proxy

The proxy has full support for horizontal autoscaling, but users are advised to tune the parameters to their traffic patterns and expectations. The configuration lives in the `kedify-agent` Helm chart values, either globally or per namespace.

```yaml
agent:
  kedifyProxy:
    globalValues:
      deployment:
        replicas: 2 # static configuration
    namespacedValues:
      namespace-1: # different configuration in the specific `namespace-1` namespace
        autoscaling: # with autoscaling
          enabled: true
          minReplicaCount: 3
          maxReplicaCount: 10
```
Because the `kedify-proxy` fleet routes traffic for all autoscaled applications, ensuring its availability and low latency is critical. Each instance maintains a gRPC stream for traffic metrics to a particular `interceptor` instance for further processing.
The proxy fleet scales horizontally without limitations and is by default deployed per namespace.