Autoscaling Checks
Autoscaling Checks are small canary workloads and runner Pods that validate the autoscaling control plane in a real cluster. They check that signal generation, metric delivery, KEDA, HPA, and Deployment replica changes work together. They are not intended to test application business logic.
The checks expose kedify_autoscaling_check_duration_seconds, a gauge with the latest complete check iteration duration in seconds. Use it for alerts when autoscaling takes longer than your operating threshold.
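For a quick manual spot check, the same threshold comparison can be run against a runner's /metrics output. The sketch below is illustrative only: slow_checks is a hypothetical helper (not part of the chart), the 300-second limit is an example value, and it assumes the Prometheus text exposition format without trailing timestamps, so the sample value is the last field on each line.

```shell
# slow_checks LIMIT: read Prometheus text exposition on stdin and print every
# kedify_autoscaling_check_duration_seconds sample whose value exceeds LIMIT.
# Hypothetical helper for illustration; not shipped with the chart.
slow_checks() {
  awk -v limit="$1" '
    # Match the gauge name followed by "{" (labels) or a space (no labels),
    # which also skips "# HELP" and "# TYPE" comment lines.
    /^kedify_autoscaling_check_duration_seconds[{ ]/ && $NF + 0 > limit + 0 { print }
  '
}

# Example against a port-forwarded runner (port is an assumption):
#   curl -s localhost:8080/metrics | slow_checks 300
```

No output means every check finished within the limit in its latest complete iteration.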
What Gets Installed
The Helm chart installs separate runner and target apps for each enabled check. HTTP and CPU are enabled by default. Prometheus and memory are opt-in because they depend on additional cluster setup or take longer to settle.
| Check | Signal source | Validates |
|---|---|---|
| HTTP | In-cluster HTTP requests | Kedify HTTP scaler metric, HPA metric, HPA activation, scale-up/down |
| Prometheus | Work-simulator sample app metric | Source Prometheus metric, KEDA metric, HPA metric, scale-up/down |
| CPU | Load-generator sample app CPU profile | metrics.k8s.io, HPA CPU metric, HPA activation, scale-up/down |
| Memory | Load-generator sample app memory load | metrics.k8s.io, HPA memory metric, HPA activation, scale-up/down |
Architecture
Each check has its own runner and target. The runner creates the autoscaling signal, validates the metric and HPA path, waits for the target to scale up and back down, and exposes kedify_autoscaling_check_* metrics.
HTTP scaler check:

    check-http-runner -> Kedify HTTP proxy -> check-http-target -> KEDA HTTP external metric -> KEDA-generated HPA -> Deployment replicas -> kedify_autoscaling_check_* metrics

Prometheus scaler check:

    check-prometheus-runner -> check-prometheus-target work requests -> work_simulator_inprogress_tasks -> Prometheus scrape and query -> KEDA Prometheus scaler -> KEDA-generated HPA -> Deployment replicas -> kedify_autoscaling_check_* metrics

CPU resource check:

    check-cpu-runner -> check-cpu-target CPU profile -> metrics.k8s.io CPU samples -> KEDA CPU scaler -> KEDA-generated HPA -> Deployment replicas -> kedify_autoscaling_check_* metrics

Memory resource check:

    check-memory-runner -> check-memory-target memory profile -> metrics.k8s.io memory samples -> KEDA memory scaler -> KEDA-generated HPA -> Deployment replicas -> kedify_autoscaling_check_* metrics

Prerequisites
- Kedify/KEDA is installed in the cluster.
- The Kedify HTTP scaler is installed if the HTTP check is enabled.
- metrics.k8s.io is available if the CPU or memory check is enabled.
- Prometheus is reachable from the check runner Pods if the Prometheus check is enabled.
- Prometheus scrapes the bundled work-simulator target if the Prometheus check is enabled.
- The Kedify Agent version supports autoscaling-check Service discovery.
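The metrics.k8s.io prerequisite can be checked before enabling the CPU or memory check. The sketch below is a minimal example, assuming the API group list comes from kubectl api-versions; has_metrics_api is a hypothetical helper name. The match is anchored at the start of the line so that related groups such as external.metrics.k8s.io do not count as the resource metrics API.

```shell
# has_metrics_api: succeed if the API group/version list on stdin contains
# the resource metrics API (metrics.k8s.io/<version>).
# Hypothetical helper for illustration; not shipped with the chart.
has_metrics_api() {
  grep -q '^metrics\.k8s\.io/'
}

# Example against a live cluster:
#   kubectl api-versions | has_metrics_api || echo "metrics.k8s.io is not available"
```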
Install
Install the OCI chart from GHCR. If the package is still private, authenticate first:

    echo "$GITHUB_TOKEN" | helm registry login ghcr.io --username <github-user> --password-stdin

Install the default HTTP and CPU checks:

    helm upgrade --install autoscaling-checks oci://ghcr.io/kedify/charts/autoscaling-checks \
      --version <version> \
      --namespace autoscaling-checks \
      --create-namespace

Enable the memory check when you want to cover memory resource scaling too:

    helm upgrade --install autoscaling-checks oci://ghcr.io/kedify/charts/autoscaling-checks \
      --version <version> \
      --namespace autoscaling-checks \
      --create-namespace \
      --set memory.enabled=true

If you are testing from a private checkout, use the local chart path and set the image tag explicitly:

    helm upgrade --install autoscaling-checks ./charts/autoscaling-checks \
      --namespace autoscaling-checks \
      --create-namespace \
      --set image.repository=ghcr.io/kedify/autoscaling-checks \
      --set image.tag=<version>

Enable or disable checks explicitly:

    helm upgrade --install autoscaling-checks oci://ghcr.io/kedify/charts/autoscaling-checks \
      --version <version> \
      --namespace autoscaling-checks \
      --create-namespace \
      --set prometheus.enabled=true \
      --set memory.enabled=true \
      --set cpu.enabled=false

The autoscaling-checks namespace is recommended. Other namespaces work as long as the runner Services keep the chart labels and the Kedify Agent can read Services.
Configure Prometheus
Set the Prometheus HTTP API address and the source query for the Prometheus check:

    helm upgrade --install autoscaling-checks oci://ghcr.io/kedify/charts/autoscaling-checks \
      --version <version> \
      --namespace autoscaling-checks \
      --create-namespace \
      --set prometheus.enabled=true \
      --set prometheus.serverAddress=http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090 \
      --set 'prometheus.sourceQuery=sum(work_simulator_inprogress_tasks)'

If you use Prometheus Operator, enable ServiceMonitors for the runner metrics and the Prometheus target sample app:

    helm upgrade --install autoscaling-checks oci://ghcr.io/kedify/charts/autoscaling-checks \
      --version <version> \
      --namespace autoscaling-checks \
      --create-namespace \
      --set monitoring.serviceMonitor.enabled=true \
      --set prometheus.targetServiceMonitor.enabled=true

If your Prometheus uses pod annotations, the chart enables scrape annotations by default with monitoring.scrapeAnnotations=true.
Dashboard Visibility
The Kedify Agent discovers autoscaling-check runner Services automatically. No extra metrics endpoint setting is required.
The runner Services must have these labels:

    app.kubernetes.io/name=autoscaling-checks
    app.kubernetes.io/part-of=autoscaling-checks
    autoscaling-checks.kedify.io/role=runner

The Helm chart sets these labels and exposes a Service port named metrics. The dashboard shows the results in the cluster detail view under the Autoscaling Checks tab.
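To confirm which Services carry all three labels, they can be combined into a single label selector. This is a small sketch rather than anything the chart provides: runner_selector is a hypothetical helper, and the namespace in the example assumes the recommended autoscaling-checks namespace.

```shell
# runner_selector: print a comma-joined label selector that matches only
# Services carrying all three runner labels.
# Hypothetical helper for illustration; not shipped with the chart.
runner_selector() {
  printf '%s,%s,%s\n' \
    'app.kubernetes.io/name=autoscaling-checks' \
    'app.kubernetes.io/part-of=autoscaling-checks' \
    'autoscaling-checks.kedify.io/role=runner'
}

# Example against a live cluster:
#   kubectl -n autoscaling-checks get svc -l "$(runner_selector)"
```

A runner Service missing from that listing is missing at least one of the labels.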
Alerting
Use kedify_autoscaling_check_duration_seconds for the primary alert. It records the latest complete iteration duration and includes bounded labels for the overall result and each step.
Example Prometheus alert:

    groups:
      - name: kedify-autoscaling-checks
        rules:
          - alert: KedifyAutoscalingCheckSlow
            expr: kedify_autoscaling_check_duration_seconds > 300
            for: 2m
            labels:
              severity: warning
            annotations:
              summary: Kedify autoscaling check is slow
              description: Check {{ $labels.check }} took {{ $value }}s in the latest iteration.

Use the step labels and kedify_autoscaling_check_step_duration_seconds to identify whether the delay is in signal generation, source metrics, KEDA metrics, HPA metrics, HPA activation, scale-up, or scale-down.
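When an alert fires, one way to narrow it down is to pick the largest step duration out of a runner's /metrics output. The sketch below is an assumption-laden example: slowest_step is a hypothetical helper, the exact label names on the step metric are not specified here, and the parsing assumes Prometheus text exposition without trailing timestamps.

```shell
# slowest_step: read Prometheus text exposition on stdin and print the
# kedify_autoscaling_check_step_duration_seconds sample with the largest value.
# Prints nothing if no step samples are present.
# Hypothetical helper for illustration; not shipped with the chart.
slowest_step() {
  awk '
    /^kedify_autoscaling_check_step_duration_seconds[{ ]/ {
      # Keep the first sample seen, then replace it whenever a larger one appears.
      if (!seen || $NF + 0 > max) { max = $NF + 0; line = $0; seen = 1 }
    }
    END { if (seen) print line }
  '
}

# Example against a port-forwarded runner (port is an assumption):
#   curl -s localhost:8080/metrics | slowest_step
```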
Verify
Wait for the runner and target Deployments:

    kubectl -n autoscaling-checks rollout status deploy/autoscaling-checks-check-http-runner
    kubectl -n autoscaling-checks rollout status deploy/autoscaling-checks-check-http-target
    kubectl -n autoscaling-checks rollout status deploy/autoscaling-checks-check-cpu-runner
    kubectl -n autoscaling-checks rollout status deploy/autoscaling-checks-check-cpu-target

If optional checks are enabled, wait for those runners and targets too:

    kubectl -n autoscaling-checks rollout status deploy/autoscaling-checks-check-prometheus-runner
    kubectl -n autoscaling-checks rollout status deploy/autoscaling-checks-check-prometheus-target
    kubectl -n autoscaling-checks rollout status deploy/autoscaling-checks-check-memory-runner
    kubectl -n autoscaling-checks rollout status deploy/autoscaling-checks-check-memory-target

Inspect one runner directly:

    kubectl -n autoscaling-checks port-forward svc/autoscaling-checks-check-http-runner 8080:8080
    curl -s localhost:8080/healthz
    curl -s localhost:8080/metrics

Check the generated autoscaling resources:

    kubectl -n autoscaling-checks get scaledobject,hpa,deploy,svc -l app.kubernetes.io/part-of=autoscaling-checks

Troubleshooting
If the dashboard tab is empty, verify that the agent can read Services and that the runner Services have the labels shown above and a metrics port.
If the Prometheus check fails, query Prometheus directly with the configured prometheus.sourceQuery and confirm it returns the expected value during a check run.
If the CPU or memory check fails before scaling up, confirm that metrics.k8s.io returns Pod resource metrics in the autoscaling-checks namespace.
If scale-down takes too long, check the ScaledObject cooldown values and the chart common.scaleDownTimeout setting.
Uninstall
    helm uninstall autoscaling-checks --namespace autoscaling-checks