
Pod Resource Autoscaler

Pod Resource Autoscaler (PRA) continuously adjusts CPU and/or memory requests/limits for a single container in matching pods, based on observed utilization. Unlike horizontal autoscaling, PRA does not change replica count; it resizes container resources in place.

PRA applies resource changes through the pods/resize subresource. If in-place resize is not supported by your cluster, resize attempts will fail and PRA will emit warning events.

Relevant Kubernetes docs:

  • Kubernetes API access to nodes/proxy (PRA calls kubelet /stats/summary through the apiserver node proxy).
  • Cluster support for in-place pod resize (required for updates to take practical effect).

PRA is available via the Kedify Agent and can be enabled or disabled using the environment variable PRA_ENABLED (defaults to false).

By default, all pods matched by a PodResourceAutoscaler are eligible for reconciliation. If you want explicit per-pod opt-in, set the PRA_REQUIRES_ANNOTATED_PODS environment variable on the Kedify Agent to true.

Both flags can be set in the Kedify Agent Helm values file:

# to enable PRA set:
agent.features.podResourceAutoscalersEnabled=true
# to require annotated pods set:
agent.features.praRequiresAnnotatedPods=true
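
As a values.yaml equivalent, the same keys nest under agent.features (a minimal sketch; only the keys shown above are assumed to exist):

agent:
  features:
    podResourceAutoscalersEnabled: true
    praRequiresAnnotatedPods: true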

When the annotation requirement is enabled, pods must carry the annotation:

pra.kedify.io/reconcile: enabled
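
For example, a Deployment can opt its pods in through the pod template (a sketch; all other Deployment fields are omitted):

spec:
  template:
    metadata:
      annotations:
        pra.kedify.io/reconcile: "enabled"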

The custom resource uses:

apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler

At a high level, a PRA describes:

  • which pods to manage (.spec.selector or .spec.target)
  • which container to resize (.spec.containerName)
  • how to evaluate and apply resize actions (.spec.policy)
  • optional bounds/step caps (.spec.bounds)

You must specify exactly one of .spec.selector or .spec.target.

Spec fields:

  • paused: optional bool (default false)
  • priority: optional int (default 0)
  • target: optional workload reference (XOR with selector)
  • selector: optional label selector (XOR with target)
  • containerName: required string; the single container per pod to resize
  • policy: required; evaluation and sizing rules
  • bounds: optional; per-resource bounds and step caps for requests and/or limits

Status fields:

  • conditions: Ready, Active, Scaling, Metrics
  • observedGeneration
  • effectiveSelector
  • lastScaleTime
  • lastScaleAction: scaleUp|scaleDown
  • lastScaleResourceChange: compact before/after delta from the last scale action
  • scaling: boolean derived from lastScaleTime + the cooldown window (cleared by a periodic runtime sync once the cooldown elapses)

A complete example:

apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler
metadata:
  name: load-generator
spec:
  target:
    kind: deployment
    name: load-generator
  containerName: load-generator
  policy:
    pollInterval: 5s
    consecutiveSamples: 3
    cooldown: 2m
    after: containerReady
    delay: 10s
    cpu:
      requests:
        scaleUpThreshold: 75
        scaleDownThreshold: 60
        targetUtilization: 70
      limits:
        scaleUpThreshold: 85
        scaleDownThreshold: 70
        targetUtilization: 75
    memory:
      requests:
        scaleUpThreshold: 75
        scaleDownThreshold: 60
        targetUtilization: 70
      limits:
        scaleUpThreshold: 85
        scaleDownThreshold: 70
        targetUtilization: 75
  bounds:
    cpu:
      requests:
        min: 100m
        max: "2"
        step: 100m
        stepPercent: 25
      limits:
        min: 100m
        max: "3"
        step: 100m
        stepPercent: 25
    memory:
      requests:
        min: 96Mi
        max: 2Gi
        step: 64Mi
        stepPercent: 25
      limits:
        min: 300Mi
        max: 3Gi
        step: 64Mi
        stepPercent: 25

You can target pods in two mutually exclusive ways:

  • .spec.selector
  • .spec.target

Exactly one of these must be set.

Allowed target kinds:

  • deployment
  • statefulset
  • daemonset

When using .spec.selector, the selector uses the same shape as deployment.spec.selector.
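
For example, a selector-based spec (a sketch; the labels and container name are illustrative, and policy/bounds are configured exactly as in the full example above):

spec:
  selector:
    matchLabels:
      app: load-generator
  containerName: load-generator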

If multiple PRAs match the same pod, the winner selection is deterministic:

  • highest priority, then earliest next-eligible time, then lexicographic PRA name

How the controller evaluates pods:

  • Watches PodResourceAutoscaler and Pod.
  • Tracks matching pods keyed by pod UID.
  • Groups tracked pods by node and runs one poller per node.
  • The poller fetches kubelet /stats/summary once per cycle and evaluates only tracked pods on that node.
  • Node poll interval is the minimum pollInterval among tracked pods on that node.
  • Even when node polling is more frequent, each pod is evaluated only when its own pollInterval has elapsed.
  • Apply gating (after/delay) controls patch application, not metric sampling.

Top-level policy fields:

  • pollInterval: per-pod evaluation interval (node poll interval becomes the minimum interval among tracked pods on a node).
  • consecutiveSamples: required, must be >= 1
  • cooldown: required duration (minimum time between resize actions for the pod)
  • after: optional apply gate, one of running|containerReady|podReady (default containerReady)
  • delay: optional duration before apply (default 15s)

Resource policies are optional and can be configured per resource and per “side”:

  • policy.cpu.requests / policy.cpu.limits
  • policy.memory.requests / policy.memory.limits

Each utilization policy includes:

  • scaleUpThreshold (percent)
  • scaleDownThreshold (percent)
  • targetUtilization (percent)
  • consecutiveSamples (optional override)
  • cooldown (optional override)

Notes:

  • If you specify policy.cpu, you must configure requests and/or limits under it.
  • If you specify policy.memory, you must configure requests and/or limits under it.
  • scaleUpThreshold must be greater than scaleDownThreshold.

At least one of policy.cpu.requests|policy.cpu.limits|policy.memory.requests|policy.memory.limits must be set.
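
A minimal policy that manages only CPU requests and overrides the cooldown for that side might look like this (a sketch; all values are illustrative):

policy:
  pollInterval: 10s
  consecutiveSamples: 3
  cooldown: 1m
  cpu:
    requests:
      scaleUpThreshold: 80
      scaleDownThreshold: 50
      targetUtilization: 65
      cooldown: 5m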

  • CPU usage is derived from kubelet cumulative counters (rate over time). Memory usage is derived from kubelet working set bytes.
  • For each configured side, utilization is computed as usage / baseline * 100, where the baseline is the container’s request or limit for that side.
  • Utilization is evaluated independently for each configured side.
  • Scale up if any configured side requests scale up.
  • Scale down only if no side requests scale up and at least one side requests scale down.
  • Desired values are computed from targetUtilization and current usage (separately per side), using desired = usage / (targetUtilization / 100).
  • requests <= limits is always enforced.
  • Resize apply is gated by after and delay, and skipped while a pod resize is pending/in progress.
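
As a worked example (numbers are illustrative): with cpu.requests configured for targetUtilization: 70 and scaleUpThreshold: 75, a container requesting 500m and observed using 450m has a utilization of 450 / 500 * 100 = 90%, which meets the scale-up threshold; once the required consecutiveSamples have been seen, the desired request becomes 450m / 0.70 ≈ 643m, subject to any configured bounds and step caps.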

Additional details:

  • CPU needs an initial warm-up sample before a first rate can be computed.

  • Threshold checks are inclusive (>= for up, <= for down).

  • CPU cores used are computed as:

    cores_used = ((cpu_ns_now - cpu_ns_prev) / 1e9) / dt_seconds
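
    For instance (illustrative numbers): if the cumulative counter advances from 4.0e9 ns to 4.6e9 ns across samples taken 5 s apart, cores_used = ((4.6e9 - 4.0e9) / 1e9) / 5 = 0.12 cores, i.e. roughly 120m.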

.spec.bounds is optional. If bounds are omitted for a managed side, scaling still works without extra clamping or step caps.

Bounds are configured per resource and per side:

  • bounds.cpu.requests / bounds.cpu.limits
  • bounds.memory.requests / bounds.memory.limits

Each side supports:

  • min
  • max
  • step (absolute max delta per action)
  • stepPercent (relative max delta per action)

If both step and stepPercent are set, the smaller cap is applied.
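
For example, the following bounds cap each memory-request change to the smaller of 128Mi and 25% of the current value, within 128Mi–1Gi (a sketch; values are illustrative):

bounds:
  memory:
    requests:
      min: 128Mi
      max: 1Gi
      step: 128Mi
      stepPercent: 25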

To pause an autoscaler, set:

spec:
  paused: true

When paused, PRA stops tracking/scaling for that autoscaler and resumes on unpause.
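
Equivalently, from the command line (a sketch; the resource name comes from the example above):

kubectl patch pra load-generator --type merge -p '{"spec":{"paused":true}}'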

While paused, PRA status conditions are set to:

  • Ready=True (reason=Paused)
  • Active=False (reason=Paused)
  • Scaling=False (reason=Paused)

PodResourceProfile (PRP) and PRA can both write pod resources. If both target the same container/resource fields, regular patch semantics apply (last writer wins), which can cause control-loop fights.

Recommendation:

  • Do not overlap ownership of the same container requests/limits fields between PRP and PRA.

PRA patches the pods/resize subresource using a merge-style patch (not server-side apply). If another controller (or manual apply) writes the same fields, last writer wins.

  • Pod QoS class must not change during in-place resize; candidate updates that would change QoS are skipped and reported via event.
  • If memory resize may restart a container (for example resizePolicy.restartPolicy=RestartContainer), PRA emits an event to signal a restart may occur.
  • Scale up is skipped while resize is pending/deferred/infeasible; any scaling is skipped while resize is actively in progress.
  • Apply is gated by policy.after and policy.delay:
    • after=containerReady (default): applies only when the target container is ready
    • after=running: applies as soon as the pod is Running
    • after=podReady: requires PodReady instead of container readiness

PRA emits Kubernetes events for validation, metrics health, and scaling lifecycle. Common events include:

  • PodResourceAutoscalerMetricsUnavailable / PodResourceAutoscalerMetricsRecovered
  • PodResourceAutoscalerBaselineUnavailable / PodResourceAutoscalerBaselineAvailable
  • PodResourceAutoscalerScaleUp / PodResourceAutoscalerScaleDown
  • PodResourceAutoscalerResizeInProgress / PodResourceAutoscalerResizePending
  • PodResourceAutoscalerQoSInvariantBlocked
  • PodResourceAutoscalerResizeMayRestart
  • PodResourceAutoscalerScaleFailed

CRD-level validation enforces:

  • XOR target / selector
  • at least one side among cpu.requests|cpu.limits|memory.requests|memory.limits
  • scaleUpThreshold > scaleDownThreshold
  • percent range checks (thresholds and stepPercent)
  • consecutiveSamples >= 1
  • bounds supports only cpu and memory keys

Runtime validation/guards include:

  • parseable pollInterval, cooldown, and delay (if set)
  • after must be one of running|containerReady|podReady
  • selected container exists
  • baseline resources exist and are non-zero for configured sides
  • counter resets / missing samples are handled (skipped)

1. Create a cluster with in-place resize enabled

k3d cluster create dyn-resources --no-lb \
--k3s-arg "--disable=traefik,servicelb@server:*" \
--k3s-arg "--kube-apiserver-arg=feature-gates=InPlacePodVerticalScaling=true@server:*"

2. Deploy a sample workload

kubectl create deployment my-app --image=ghcr.io/kedify/sample-minute-metrics:latest
kubectl rollout status deploy/my-app

3. Create a PodResourceAutoscaler

cat <<EOF | kubectl apply -f -
apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler
metadata:
  name: my-app-pra
spec:
  target:
    kind: deployment
    name: my-app
  containerName: my-app
  policy:
    pollInterval: 5s
    consecutiveSamples: 2
    cooldown: 30s
    cpu:
      requests:
        scaleUpThreshold: 1
        scaleDownThreshold: 0
        targetUtilization: 50
  bounds:
    cpu:
      requests:
        min: 100m
        max: "2"
        step: 100m
EOF

4. Inspect status, events, and applied resources

kubectl get pra -o wide
kubectl get events --sort-by=.lastTimestamp | grep PodResourceAutoscaler
kubectl get pod -l app=my-app -o jsonpath="{.items[0].spec.containers[0].resources}"
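
On clusters where the kubelet reports applied container resources in pod status (newer Kubernetes versions with in-place resize), you can also compare the pod spec with what has actually been applied (a sketch):

kubectl get pod -l app=my-app -o jsonpath="{.items[0].status.containerStatuses[0].resources}"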

PRA publishes internal Prometheus metrics (names prefixed with kedify_agent_pra_), including:

  • kedify_agent_pra_reconcile_count
  • kedify_agent_pra_resource_updates
  • kedify_agent_pra_total
  • kedify_agent_pra_error_total
  • kedify_agent_pra_scaling_active

Additional decision and sizing metrics are also exported:

  • kedify_agent_pra_scale_decisions_total
  • kedify_agent_pra_scale_blocked_total
  • kedify_agent_pra_scale_delta_absolute
  • kedify_agent_pra_scale_delta_percent
  • kedify_agent_pra_evaluation_duration_seconds
  • kedify_agent_pra_time_to_apply_seconds

Notes:

  • kedify_agent_pra_scaling_active is derived from lastScaleTime + cooldown window.
  • Delta histogram units: CPU absolute delta is measured in millicores; memory absolute delta is measured in MiB.
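
For example, in Prometheus (a sketch; it assumes these metrics are scraped from the Kedify Agent and that the first is a counter and the second a gauge):

# scale decisions per second over the last 15 minutes, across all PRAs
sum(rate(kedify_agent_pra_scale_decisions_total[15m]))

# autoscalers currently reported as actively scaling (within the cooldown window)
kedify_agent_pra_scaling_active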

PRA requires (at minimum):

  • Pods: get/list/watch
  • Pods/resize subresource: patch
  • Nodes proxy: get
  • PodResourceAutoscalers: get/list/watch/update/patch + status get/update/patch
  • Deployments/DaemonSets/StatefulSets: get/list/watch
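
As a reference, a ClusterRole granting these permissions could look like the sketch below (the Kedify Agent Helm chart normally provisions RBAC for you; the role name is illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kedify-agent-pra-minimal
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/resize"]
    verbs: ["patch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: ["keda.kedify.io"]
    resources: ["podresourceautoscalers"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["keda.kedify.io"]
    resources: ["podresourceautoscalers/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "daemonsets", "statefulsets"]
    verbs: ["get", "list", "watch"]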