Vertical Scaling with PodResourceAutoscaler

This guide demonstrates vertical scaling with PodResourceAutoscaler (PRA). For reproducibility, it uses Kedify’s load-generator sample app as the workload driver, but the same setup applies to your own applications.

Prerequisites

  • A running Kubernetes cluster.
  • The kubectl command-line utility installed and accessible.
  • Your cluster connected in the Kedify Dashboard.
  • Kedify Agent installed with Pod Resource Autoscaler feature enabled:
    • agent.features.podResourceAutoscalersEnabled=true
    • Installed Agent version: v0.4.14 or later (see Versions & Compatibility).
  • Kubernetes support for in-place pod resize.
    • This is enabled by default in Kubernetes v1.33+.
    • If using an older cluster, enable the InPlacePodVerticalScaling feature gate (a quick version check follows this list).
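
To sanity-check the in-place resize prerequisite, you can query the cluster’s server version with plain kubectl (nothing PRA-specific is assumed here):

Terminal window
# Expect Server Version v1.33 or newer, or the feature gate enabled on older clusters
kubectl version | grep -i 'server version'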

Step 1: Deploy the sample application

This guide uses Kedify’s sample load-generator application. Source code: github.com/kedify/examples/tree/pra/samples/load-generator

Apply the following Deployment and Service:

Terminal window
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-generator
  labels:
    app: load-generator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: load-generator
  template:
    metadata:
      labels:
        app: load-generator
      annotations:
        pra.kedify.io/reconcile: enabled
    spec:
      containers:
        - name: load-generator
          image: ghcr.io/kedify/sample-load-generator:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: BASELINE_CPU_MILLICORES
              value: "100"
            - name: BASELINE_MEMORY_MIB
              value: "96"
          resources:
            requests:
              cpu: 100m
              memory: 96Mi
            limits:
              cpu: 300m
              memory: 600Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: http
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: load-generator
  labels:
    app: load-generator
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 8080
      targetPort: http
  selector:
    app: load-generator
EOF

Verify the Pod is ready:

Terminal window
kubectl rollout status deploy/load-generator
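
Optionally, confirm that the pra.kedify.io/reconcile annotation from the Pod template landed on the running Pod (a plain kubectl query; the dots in the annotation key must be escaped in jsonpath):

Terminal window
# Should print: enabled
kubectl get pod -l app=load-generator \
  -o jsonpath='{.items[0].metadata.annotations.pra\.kedify\.io/reconcile}'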

Step 2: Create the PodResourceAutoscaler

Apply a PRA configuration that manages both CPU and memory requests and limits:

Terminal window
cat <<EOF | kubectl apply -f -
apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler
metadata:
  name: load-generator
spec:
  target:
    kind: deployment
    name: load-generator
    containerName: load-generator
  policy:
    pollInterval: 5s
    consecutiveSamples: 2
    cooldown: 30s
    after: containerReady
    delay: 5s
    cpu:
      requests:
        scaleUpThreshold: 75
        scaleDownThreshold: 45
        targetUtilization: 60
      limits:
        scaleUpThreshold: 85
        scaleDownThreshold: 60
        targetUtilization: 70
    memory:
      requests:
        scaleUpThreshold: 75
        scaleDownThreshold: 50
        targetUtilization: 60
      limits:
        scaleUpThreshold: 85
        scaleDownThreshold: 60
        targetUtilization: 70
  bounds:
    cpu:
      requests:
        min: 100m
        max: "2"
        step: 200m
        stepPercent: 50
      limits:
        min: 300m
        max: "3"
        step: 300m
        stepPercent: 50
    memory:
      requests:
        min: 96Mi
        max: 2Gi
        step: 128Mi
        stepPercent: 50
      limits:
        min: 600Mi
        max: 3Gi
        step: 256Mi
        stepPercent: 50
EOF
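
Once applied, the object should show up under its pra short name (used again later in this guide):

Terminal window
kubectl get pra load-generator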

Step 3: Watch container resources

Open a terminal and watch the container’s resources:

Terminal window
watch "kubectl get pod -l app=load-generator -o jsonpath='{.items[0].spec.containers[?(@.name==\"load-generator\")].resources}' | jq"
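
If the watch utility is not available on your machine, a plain shell loop gives the same view (polling every 5 seconds to roughly match the PRA pollInterval above):

Terminal window
while true; do
  kubectl get pod -l app=load-generator \
    -o jsonpath='{.items[0].spec.containers[?(@.name=="load-generator")].resources}' | jq
  sleep 5
done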

You should see output similar to:

{
  "limits": {
    "cpu": "300m",
    "memory": "600Mi"
  },
  "requests": {
    "cpu": "169m",
    "memory": "196Mi"
  }
}

Step 4: Set the workload to idle

Start a port-forward in another terminal:

Terminal window
kubectl port-forward svc/load-generator 8080:8080
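
Before changing profiles, you can confirm the tunnel works by hitting the same /healthz endpoint the probes use (a plain curl check; expect an HTTP 2xx status code):

Terminal window
curl -s -o /dev/null -w '%{http_code}\n' localhost:8080/healthz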

Set the workload to the idle (low footprint) profile first:

Terminal window
curl -s -X POST localhost:8080/profile/idle | jq

Check the current profile:

Terminal window
curl -s localhost:8080/status | jq

You should see output similar to:

{
  "baseline": {
    "cpuMillicores": 100,
    "memoryMiB": 96
  },
  "desired": {
    "cpuMillicores": 100,
    "memoryMiB": 96
  },
  "current": {
    "cpuMillicores": 100,
    "memoryMiB": 96
  },
  "activeProfile": "idle",
  "schedule": {
    "active": false
  },
  "cpu": {
    "workers": 2,
    "targetMillicores": 100,
    "maxMillicores": 2000,
    "requestedMillicores": 100
  },
  "memory": {
    "targetMiB": 96,
    "allocatedMiB": 96
  },
  "uptimeSeconds": 26676
}
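
The full status payload is fairly verbose; if you only need the active profile plus the desired and current footprint, jq can trim it down (field names taken straight from the output above):

Terminal window
curl -s localhost:8080/status | jq '{profile: .activeProfile, desired, current}'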

Step 5: Trigger higher load

Switch the workload to the high profile:

Terminal window
curl -s -X POST localhost:8080/profile/high | jq

Verify in the watch output from Step 3 that requests and limits go up.

Then return to idle:

Terminal window
curl -s -X POST localhost:8080/profile/idle | jq

Verify in the watch output that requests and limits go down again. The replica count should remain 1 for the whole flow.
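
Because the resize happens in place, the Pod is not recreated during these changes. You can double-check the replica count with plain kubectl and, depending on your Kubernetes version, watch the resize surface in the Pod status (v1.33+ reports PodResizePending/PodResizeInProgress conditions; older clusters with the feature gate expose a status.resize field):

Terminal window
# READY should stay 1/1 for the whole flow
kubectl get deploy load-generator
kubectl get pod -l app=load-generator -o jsonpath='{.items[0].status.conditions}' | jq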

Step 6: Inspect PRA status and events

Check the PRA status and the latest scaling action:

Terminal window
kubectl describe pra load-generator
kubectl get pra load-generator -o jsonpath='{.status.lastScaleAction}{"\n"}{.status.lastScaleResourceChange}{"\n"}'

Inspect related Kubernetes events:

Terminal window
kubectl get events --sort-by=.lastTimestamp | grep PodResourceAutoscaler

You should see events such as PodResourceAutoscalerScaleUp and PodResourceAutoscalerScaleDown.
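
If the grep approach is too broad, events can also be filtered server-side by reason (reusing one of the event reasons named above):

Terminal window
kubectl get events --field-selector reason=PodResourceAutoscalerScaleUp --sort-by=.lastTimestamp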

Cleanup

Remove the resources created in this guide:

Terminal window
kubectl delete pra load-generator
kubectl delete svc load-generator
kubectl delete deploy load-generator