# Vertical Scaling with PodResourceAutoscaler
This guide demonstrates vertical scaling with PodResourceAutoscaler (PRA).
For reproducibility it uses Kedify’s load-generator sample app as the workload driver, but the same setup applies to your own applications.
## Prerequisites

- A running Kubernetes cluster.
- The `kubectl` command-line utility installed and accessible.
- Connect your cluster in the Kedify Dashboard.
  - If you do not have a connected cluster, see the installation documentation.
- Kedify Agent installed with the Pod Resource Autoscaler feature enabled: `agent.features.podResourceAutoscalersEnabled=true`
  - Installed Agent version: `v0.4.14` or later (see Versions & Compatibility).
- Kubernetes support for in-place pod resize (see the quick check after this list).
  - This is enabled by default in Kubernetes `v1.33+`.
  - If using an older cluster, enable the `InPlacePodVerticalScaling` feature gate.
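If you want to verify the last two prerequisites up front, a quick check along these lines should suffice; the CRD name is an assumption inferred from the `keda.kedify.io/v1alpha1` API group used later in this guide:

```sh
# In-place pod resize is enabled by default on Kubernetes v1.33+
kubectl version

# Assumed CRD name (plural + API group); its presence confirms the Agent
# installed PRA support
kubectl get crd podresourceautoscalers.keda.kedify.io
```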
## Step 1: Deploy the Example Workload

This guide uses Kedify’s sample load-generator application.
Source code: github.com/kedify/examples/tree/pra/samples/load-generator
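If you want to inspect or modify the sample before deploying it, you can clone the repository; the branch (`pra`) and path follow from the URL above:

```sh
git clone --branch pra https://github.com/kedify/examples.git
cd examples/samples/load-generator
```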
Apply the following Deployment and Service:
```sh
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-generator
  labels:
    app: load-generator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: load-generator
  template:
    metadata:
      labels:
        app: load-generator
      annotations:
        pra.kedify.io/reconcile: enabled
    spec:
      containers:
        - name: load-generator
          image: ghcr.io/kedify/sample-load-generator:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: http
          env:
            - name: BASELINE_CPU_MILLICORES
              value: "100"
            - name: BASELINE_MEMORY_MIB
              value: "96"
          resources:
            requests:
              cpu: 100m
              memory: 96Mi
            limits:
              cpu: 300m
              memory: 600Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: http
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: load-generator
  labels:
    app: load-generator
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 8080
      targetPort: http
  selector:
    app: load-generator
EOF
```
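Note the `pra.kedify.io/reconcile: enabled` annotation on the Pod template, which appears to be how the workload opts in to PRA reconciliation. A quick way to double-check it landed (plain `kubectl`, nothing PRA-specific):

```sh
# Print the Pod template annotations; expect pra.kedify.io/reconcile: enabled
kubectl get deploy load-generator -o jsonpath='{.spec.template.metadata.annotations}{"\n"}'
```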
Verify the Pod is ready:

```sh
kubectl rollout status deploy/load-generator
```

## Step 2: Create PodResourceAutoscaler

Apply a PRA configuration that manages both CPU and memory requests/limits:
```sh
cat <<EOF | kubectl apply -f -
apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler
metadata:
  name: load-generator
spec:
  target:
    kind: deployment
    name: load-generator
    containerName: load-generator
  policy:
    pollInterval: 5s
    consecutiveSamples: 2
    cooldown: 30s
    after: containerReady
    delay: 5s
    cpu:
      requests:
        scaleUpThreshold: 75
        scaleDownThreshold: 45
        targetUtilization: 60
      limits:
        scaleUpThreshold: 85
        scaleDownThreshold: 60
        targetUtilization: 70
    memory:
      requests:
        scaleUpThreshold: 75
        scaleDownThreshold: 50
        targetUtilization: 60
      limits:
        scaleUpThreshold: 85
        scaleDownThreshold: 60
        targetUtilization: 70
  bounds:
    cpu:
      requests:
        min: 100m
        max: "2"
        step: 200m
        stepPercent: 50
      limits:
        min: 300m
        max: "3"
        step: 300m
        stepPercent: 50
    memory:
      requests:
        min: 96Mi
        max: 2Gi
        step: 128Mi
        stepPercent: 50
      limits:
        min: 600Mi
        max: 3Gi
        step: 256Mi
        stepPercent: 50
EOF
```
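Confirm the PRA object was admitted before moving on:

```sh
kubectl get pra load-generator
```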
## Step 3: Watch In-Place Resource Resizing

Open a terminal and watch the container’s resources:
watch "kubectl get pod -l app=load-generator -ojsonpath=\"{.items[0].spec.containers[?(.name=='load-generator')].resources}\" | jq"You should see similar output:
{ "limits": { "cpu": "300m", "memory": "600Mi" }, "requests": { "cpu": "169m", "memory": "196Mi" }}Step 4: Change Runtime Load Profiles
## Step 4: Change Runtime Load Profiles

Start a port-forward in another terminal:

```sh
kubectl port-forward svc/load-generator 8080:8080
```

Set the workload to the idle (low-footprint) profile first:
```sh
curl -s -X POST localhost:8080/profile/idle | jq
```

Check the current profile:

```sh
curl -s localhost:8080/status | jq
```

You should see output similar to:
{ "baseline": { "cpuMillicores": 100, "memoryMiB": 96 }, "desired": { "cpuMillicores": 100, "memoryMiB": 96 }, "current": { "cpuMillicores": 100, "memoryMiB": 96 }, "activeProfile": "idle", "schedule": { "active": false }, "cpu": { "workers": 2, "targetMillicores": 100, "maxMillicores": 2000, "requestedMillicores": 100 }, "memory": { "targetMiB": 96, "allocatedMiB": 96 }, "uptimeSeconds": 26676}Trigger higher load:
```sh
curl -s -X POST localhost:8080/profile/high | jq
```

Verify in the watch output from Step 3 that requests/limits go up. Then return to idle:

```sh
curl -s -X POST localhost:8080/profile/idle | jq
```

Verify in the watch output that requests/limits go down again. The replica count should remain 1 for the whole flow.
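Because the resize happens in place, the Pod should be neither recreated nor restarted; containers default to a `resizePolicy` of `NotRequired` unless configured otherwise. A quick way to confirm both after the profile changes (standard `kubectl` fields, not PRA-specific):

```sh
# Replica count should still be 1
kubectl get deploy load-generator -o jsonpath='{.status.replicas}{"\n"}'

# Restart count should not have increased across the resizes
kubectl get pod -l app=load-generator \
  -o jsonpath='{.items[0].status.containerStatuses[0].restartCount}{"\n"}'
```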
## Step 5: Verify PRA Status and Events

Check the PRA status and latest scaling action:
```sh
kubectl describe pra load-generator
kubectl get pra load-generator -o jsonpath='{.status.lastScaleAction}{"\n"}{.status.lastScaleResourceChange}{"\n"}'
```

Inspect related Kubernetes events:

```sh
kubectl get events --sort-by=.lastTimestamp | grep PodResourceAutoscaler
```

You should see events similar to `PodResourceAutoscalerScaleUp` and `PodResourceAutoscalerScaleDown`.
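On a busy cluster the `grep` can be noisy; a field selector narrows the listing to objects named `load-generator` (standard event fields, though events attached to the Pod itself carry the generated Pod name instead):

```sh
# Events for objects named load-generator, newest last
kubectl get events \
  --field-selector involvedObject.name=load-generator \
  --sort-by=.lastTimestamp
```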
## Cleanup

```sh
kubectl delete pra load-generator
kubectl delete svc load-generator
kubectl delete deploy load-generator
```