
Pod Resource Autoscaler

Pod Resource Autoscaler (PRA) continuously adjusts CPU and/or memory requests/limits for a single container in matching pods, based on observed utilization. Unlike horizontal autoscaling, PRA does not change replica count; it resizes container resources in place.

PRA applies resource changes through the pods/resize subresource. If in-place resize is not supported by your cluster, resize attempts will fail and PRA will emit warning events.

Relevant Kubernetes docs:

  • Kubernetes API access to nodes/proxy (PRA calls kubelet /stats/summary through the apiserver node proxy).
  • Cluster support for in-place pod resize (required for updates to take practical effect).

PRA is available via the Kedify Agent and can be enabled or disabled using the environment variable PRA_ENABLED (defaults to false).

By default, all pods matched by a PodResourceAutoscaler are eligible for reconciliation. If you want explicit per-pod opt-in, set the PRA_REQUIRES_ANNOTATED_PODS environment variable on the Kedify Agent to true.

Both flags can be set in the Kedify Agent Helm values file:

# to enable PRA set:
agent.features.podResourceAutoscalersEnabled=true
# to require annotated pods set:
agent.features.praRequiresAnnotatedPods=true
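
As a values.yaml equivalent, the same keys nest under agent.features (a minimal sketch; only the keys shown above are assumed to exist):

agent:
  features:
    podResourceAutoscalersEnabled: true
    praRequiresAnnotatedPods: true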

When the annotation requirement is enabled, pods must carry the annotation:

pra.kedify.io/reconcile: enabled
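
For example, a Deployment can opt its pods in through the pod template (a sketch; all other Deployment fields are omitted):

spec:
  template:
    metadata:
      annotations:
        pra.kedify.io/reconcile: "enabled"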

The custom resource uses:

apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler

At a high level, a PRA describes:

  • which pods to manage (.spec.selector or .spec.target)
  • which container to resize (.spec.containerName)
  • how to evaluate and apply resize actions (.spec.policy)
  • optional bounds/step caps (.spec.bounds)

You must specify exactly one of .spec.selector or .spec.target.

Spec fields:

  • paused: optional bool (default false)
  • priority: optional int (default 0)
  • target: optional workload reference (XOR with selector)
  • selector: optional label selector (XOR with target)
  • containerName: required string; the single container per pod to resize
  • policy: required; evaluation and sizing rules
  • bounds: optional; per-resource bounds and step caps for requests and/or limits

Status fields:

  • conditions: Ready, Active, Scaling, Metrics
  • observedGeneration
  • effectiveSelector
  • lastScaleTime
  • lastScaleAction: scaleUp|scaleDown
  • lastScaleResourceChange: compact before/after delta from the last scale action
  • scaling: boolean derived from lastScaleTime + the cooldown window (cleared by a periodic runtime sync once the cooldown elapses)

A complete example:

apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler
metadata:
  name: load-generator
spec:
  target:
    kind: deployment
    name: load-generator
  containerName: load-generator
  policy:
    pollInterval: 5s
    consecutiveSamples: 3
    cooldown: 2m
    after: containerReady
    delay: 10s
    cpu:
      requests:
        scaleUpThreshold: 75
        scaleDownThreshold: 60
        targetUtilization: 70
      limits:
        scaleUpThreshold: 85
        scaleDownThreshold: 70
        targetUtilization: 75
    memory:
      requests:
        scaleUpThreshold: 75
        scaleDownThreshold: 60
        targetUtilization: 70
      limits:
        scaleUpThreshold: 85
        scaleDownThreshold: 70
        targetUtilization: 75
  bounds:
    cpu:
      requests:
        min: 100m
        max: "2"
        step: 100m
        stepPercent: 25
      limits:
        min: 100m
        max: "3"
        step: 100m
        stepPercent: 25
    memory:
      requests:
        min: 96Mi
        max: 2Gi
        step: 64Mi
        stepPercent: 25
      limits:
        min: 300Mi
        max: 3Gi
        step: 64Mi
        stepPercent: 25

You can target pods in two mutually exclusive ways:

  • .spec.selector
  • .spec.target

Exactly one of these must be set.

Allowed target kinds:

  • deployment
  • statefulset
  • daemonset

When using .spec.selector, the selector uses the same shape as deployment.spec.selector.
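
For example, a selector-based spec (a sketch; the labels and container name are illustrative, and policy/bounds are configured exactly as in the full example above):

spec:
  selector:
    matchLabels:
      app: load-generator
  containerName: load-generator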

If multiple PRAs match the same pod, the winner selection is deterministic:

  • highest priority, then earliest next-eligible time, then lexicographic PRA name

How the controller evaluates pods:

  • Watches PodResourceAutoscaler and Pod.
  • Tracks matching pods keyed by pod UID.
  • Groups tracked pods by node and runs one poller per node.
  • The poller fetches kubelet /stats/summary once per cycle and evaluates only tracked pods on that node.
  • Node poll interval is the minimum pollInterval among tracked pods on that node.
  • Even when node polling is more frequent, each pod is evaluated only when its own pollInterval has elapsed.
  • Apply gating (after/delay) controls patch application, not metric sampling.

Top-level policy fields:

  • pollInterval: per-pod evaluation interval (node poll interval becomes the minimum interval among tracked pods on a node).
  • consecutiveSamples: required, must be >= 1
  • cooldown: required duration (minimum time between resize actions for the pod)
  • after: optional apply gate, one of running|containerReady|podReady (default containerReady)
  • delay: optional duration before apply (default 15s)

Resource policies are optional and can be configured per resource and per “side”:

  • policy.cpu.requests / policy.cpu.limits
  • policy.memory.requests / policy.memory.limits

Each utilization policy includes:

  • scaleUpThreshold (percent)
  • scaleDownThreshold (percent)
  • targetUtilization (percent)
  • consecutiveSamples (optional override)
  • cooldown (optional override)

Notes:

  • If you specify policy.cpu, you must configure requests and/or limits under it.
  • If you specify policy.memory, you must configure requests and/or limits under it.
  • scaleUpThreshold must be greater than scaleDownThreshold.

At least one of policy.cpu.requests|policy.cpu.limits|policy.memory.requests|policy.memory.limits must be set.
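
A minimal policy that manages only CPU requests and overrides the cooldown for that side might look like this (a sketch; all values are illustrative):

policy:
  pollInterval: 10s
  consecutiveSamples: 3
  cooldown: 1m
  cpu:
    requests:
      scaleUpThreshold: 80
      scaleDownThreshold: 50
      targetUtilization: 65
      cooldown: 5m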

  • CPU usage is derived from kubelet cumulative counters (rate over time). Memory usage is derived from kubelet working set bytes.
  • For each configured side, utilization is computed as usage / baseline * 100, where the baseline is the container’s request or limit for that side.
  • Utilization is evaluated independently for each configured side.
  • Scale up if any configured side requests scale up.
  • Scale down only if no side requests scale up and at least one side requests scale down.
  • Desired values are computed from targetUtilization and current usage (separately per side), using desired = usage / (targetUtilization / 100).
  • requests <= limits is always enforced.
  • Resize apply is gated by after and delay, and skipped while a pod resize is pending/in progress.
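
As a worked example (numbers are illustrative): with cpu.requests configured for targetUtilization: 70 and scaleUpThreshold: 75, a container requesting 500m and observed using 450m has a utilization of 450 / 500 * 100 = 90%, which meets the scale-up threshold; once the required consecutiveSamples have been seen, the desired request becomes 450m / 0.70 ≈ 643m, subject to any configured bounds and step caps.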

Additional details:

  • CPU needs an initial warm-up sample before a first rate can be computed.

  • Threshold checks are inclusive (>= for up, <= for down).

  • CPU cores used are computed as:

    cores_used = ((cpu_ns_now - cpu_ns_prev) / 1e9) / dt_seconds
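
    For instance (illustrative numbers): if the cumulative counter advances from 4.0e9 ns to 4.6e9 ns across samples taken 5 s apart, cores_used = ((4.6e9 - 4.0e9) / 1e9) / 5 = 0.12 cores, i.e. roughly 120m.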

.spec.bounds is optional. If bounds are omitted for a managed side, scaling still works without extra clamping or step caps.

Bounds are configured per resource and per side:

  • bounds.cpu.requests / bounds.cpu.limits
  • bounds.memory.requests / bounds.memory.limits

Each side supports:

  • min
  • max
  • step (absolute max delta per action)
  • stepPercent (relative max delta per action)

If both step and stepPercent are set, the smaller cap is applied.
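
For example, the following bounds cap each memory-request change to the smaller of 128Mi and 25% of the current value, within 128Mi–1Gi (a sketch; values are illustrative):

bounds:
  memory:
    requests:
      min: 128Mi
      max: 1Gi
      step: 128Mi
      stepPercent: 25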

To pause an autoscaler, set:

spec:
  paused: true

When paused, PRA stops tracking/scaling for that autoscaler and resumes on unpause.
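
Equivalently, from the command line (a sketch; the resource name comes from the example above):

kubectl patch pra load-generator --type merge -p '{"spec":{"paused":true}}'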

While paused, PRA status conditions are set to:

  • Ready=True (reason=Paused)
  • Active=False (reason=Paused)
  • Scaling=False (reason=Paused)

PodResourceProfile (PRP) and PRA can both write pod resources. If both target the same container/resource fields, regular patch semantics apply (last writer wins), which can cause control-loop fights.

Recommendation:

  • Do not overlap ownership of the same container requests/limits fields between PRP and PRA.

PRA patches the pods/resize subresource using a merge-style patch (not server-side apply). If another controller (or manual apply) writes the same fields, last writer wins.

  • Pod QoS class must not change during in-place resize; candidate updates that would change QoS are skipped and reported via event.
  • If memory resize may restart a container (for example resizePolicy.restartPolicy=RestartContainer), PRA emits an event to signal a restart may occur.
  • Scale up is skipped while resize is pending/deferred/infeasible; any scaling is skipped while resize is actively in progress.
  • Apply is gated by policy.after and policy.delay:
    • after=containerReady (default): applies only when the target container is ready
    • after=running: applies as soon as the pod is Running
    • after=podReady: requires PodReady instead of container readiness

PRA emits Kubernetes events for validation, metrics health, and scaling lifecycle. Common events include:

  • PodResourceAutoscalerMetricsUnavailable / PodResourceAutoscalerMetricsRecovered
  • PodResourceAutoscalerBaselineUnavailable / PodResourceAutoscalerBaselineAvailable
  • PodResourceAutoscalerScaleUp / PodResourceAutoscalerScaleDown
  • PodResourceAutoscalerResizeInProgress / PodResourceAutoscalerResizePending
  • PodResourceAutoscalerQoSInvariantBlocked
  • PodResourceAutoscalerResizeMayRestart
  • PodResourceAutoscalerScaleFailed

CRD-level validation enforces:

  • XOR target / selector
  • at least one side among cpu.requests|cpu.limits|memory.requests|memory.limits
  • scaleUpThreshold > scaleDownThreshold
  • percent range checks (thresholds and stepPercent)
  • consecutiveSamples >= 1
  • bounds supports only cpu and memory keys

Runtime validation/guards include:

  • parseable pollInterval, cooldown, and delay (if set)
  • after must be one of running|containerReady|podReady
  • selected container exists
  • baseline resources exist and are non-zero for configured sides
  • counter resets / missing samples are handled (skipped)

1. Create a cluster with in-place resize enabled

k3d cluster create dyn-resources --no-lb \
--k3s-arg "--disable=traefik,servicelb@server:*" \
--k3s-arg "--kube-apiserver-arg=feature-gates=InPlacePodVerticalScaling=true@server:*"

2. Deploy a sample workload

kubectl create deployment my-app --image=ghcr.io/kedify/sample-minute-metrics:latest
kubectl rollout status deploy/my-app

3. Create a PodResourceAutoscaler

cat <<EOF | kubectl apply -f -
apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler
metadata:
  name: my-app-pra
spec:
  target:
    kind: deployment
    name: my-app
  containerName: my-app
  policy:
    pollInterval: 5s
    consecutiveSamples: 2
    cooldown: 30s
    cpu:
      requests:
        scaleUpThreshold: 1
        scaleDownThreshold: 0
        targetUtilization: 50
  bounds:
    cpu:
      requests:
        min: 100m
        max: "2"
        step: 100m
EOF

4. Inspect status, events, and applied resources

kubectl get pra -o wide
kubectl get events --sort-by=.lastTimestamp | grep PodResourceAutoscaler
kubectl get pod -l app=my-app -o jsonpath="{.items[0].spec.containers[0].resources}"
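
On clusters where the kubelet reports applied container resources in pod status (newer Kubernetes versions with in-place resize), you can also compare the pod spec with what has actually been applied (a sketch):

kubectl get pod -l app=my-app -o jsonpath="{.items[0].status.containerStatuses[0].resources}"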

PRA publishes internal Prometheus metrics (names prefixed with kedify_agent_pra_), including:

  • kedify_agent_pra_reconcile_count
  • kedify_agent_pra_resource_updates
  • kedify_agent_pra_total
  • kedify_agent_pra_error_total
  • kedify_agent_pra_scaling_active

Additional decision and sizing metrics are also exported:

  • kedify_agent_pra_scale_decisions_total
  • kedify_agent_pra_scale_blocked_total
  • kedify_agent_pra_scale_delta_absolute
  • kedify_agent_pra_scale_delta_percent
  • kedify_agent_pra_evaluation_duration_seconds
  • kedify_agent_pra_time_to_apply_seconds

Notes:

  • kedify_agent_pra_scaling_active is derived from lastScaleTime + cooldown window.
  • Delta histogram units: CPU absolute delta is measured in millicores; memory absolute delta is measured in MiB.
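
For example, in Prometheus (a sketch; it assumes these metrics are scraped from the Kedify Agent and that the first is a counter and the second a gauge):

# scale decisions per second over the last 15 minutes, across all PRAs
sum(rate(kedify_agent_pra_scale_decisions_total[15m]))

# autoscalers currently reported as actively scaling (within the cooldown window)
kedify_agent_pra_scaling_active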

PRA requires (at minimum):

  • Pods: get/list/watch
  • Pods/resize subresource: patch
  • Nodes proxy: get
  • PodResourceAutoscalers: get/list/watch/update/patch + status get/update/patch
  • Deployments/DaemonSets/StatefulSets: get/list/watch
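
As a reference, a ClusterRole granting these permissions could look like the sketch below (the Kedify Agent Helm chart normally provisions RBAC for you; the role name is illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kedify-agent-pra-minimal
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/resize"]
    verbs: ["patch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: ["keda.kedify.io"]
    resources: ["podresourceautoscalers"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["keda.kedify.io"]
    resources: ["podresourceautoscalers/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "daemonsets", "statefulsets"]
    verbs: ["get", "list", "watch"]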