Pod Resource Autoscaler
Pod Resource Autoscaler (PRA) continuously adjusts CPU and/or memory requests/limits for a single container in matching pods, based on observed utilization. Unlike horizontal autoscaling, PRA does not change replica count; it resizes container resources in place.
In-place Updates
PRA applies resource changes through the `pods/resize` subresource. If in-place resize is not supported by your cluster, resize attempts will fail and PRA will emit warning events.
Relevant Kubernetes docs:
- https://kubernetes.io/blog/2023/05/12/in-place-pod-resize-alpha/
- https://kubernetes.io/docs/concepts/workloads/autoscaling/#requirements-for-in-place-resizing
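If you want to confirm that your cluster supports in-place resize before enabling PRA, you can try a manual resize through the same subresource PRA uses. This is a sketch; the pod name `my-pod` and container name `app` are placeholders:

```shell
# Manually bump CPU requests via the pods/resize subresource.
# This fails on clusters without in-place pod resize support.
kubectl patch pod my-pod --subresource resize \
  --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"200m"}}}]}}'
```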
Prerequisites
- Kubernetes API access to `nodes/proxy` (PRA calls the kubelet `/stats/summary` endpoint through the apiserver node proxy).
- Cluster support for in-place pod resize (required for updates to take practical effect).
Enablement and Pod Opt-in
PRA is available via the Kedify Agent and can be enabled or disabled using the environment variable `PRA_ENABLED` (defaults to `false`).
By default, all pods matched by a PodResourceAutoscaler are eligible for reconciliation.
If you want explicit per-pod opt-in, set the environment variable `PRA_REQUIRES_ANNOTATED_PODS` on the Kedify Agent to `true`.
Both flags can be set in the Kedify Agent Helm values file:

```
# to enable PRA set:
agent.features.podResourceAutoscalersEnabled=true
# to require annotated pods set:
agent.features.praRequiresAnnotatedPods=true
```

When the annotation requirement is enabled, pods must include:
```
pra.kedify.io/reconcile: enabled
```
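For example, with the opt-in requirement enabled, the annotation goes on the pod template metadata of the owning workload (illustrative sketch reusing the Quick Start image):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        pra.kedify.io/reconcile: enabled   # opt this pod in to PRA reconciliation
    spec:
      containers:
        - name: my-app
          image: ghcr.io/kedify/sample-minute-metrics:latest
```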
Pod Resource Autoscaler (PRA) CRD
The custom resource uses:
```yaml
apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler
```

Spec Overview
At a high level, a PRA describes:
- which pods to manage (`.spec.selector` or `.spec.target`)
- which container to resize (`.spec.containerName`)
- how to evaluate and apply resize actions (`.spec.policy`)
- optional bounds/step caps (`.spec.bounds`)

You must specify exactly one of `.spec.selector` or `.spec.target`.
Spec fields
- `paused`: optional bool (default `false`)
- `priority`: optional int (default `0`)
- `target`: optional workload reference (XOR with `selector`)
- `selector`: optional label selector (XOR with `target`)
- `containerName`: required string; the single container per pod to resize
- `policy`: required; evaluation and sizing rules
- `bounds`: optional; per-resource bounds and step caps for `requests` and/or `limits`
Status fields
- `conditions`: `Ready`, `Active`, `Scaling`, `Metrics`
- `observedGeneration`
- `effectiveSelector`
- `lastScaleTime`
- `lastScaleAction`: `scaleUp` | `scaleDown`
- `lastScaleResourceChange`: compact before/after delta from the last scale action
- `scaling`: boolean derived from the `lastScaleTime + cooldown` window (cleared after the cooldown elapses by periodic runtime sync)
Example
Section titled “Example”apiVersion: keda.kedify.io/v1alpha1kind: PodResourceAutoscalermetadata: name: load-generatorspec: target: kind: deployment name: load-generator containerName: load-generator
policy: pollInterval: 5s consecutiveSamples: 3 cooldown: 2m after: containerReady delay: 10s
cpu: requests: scaleUpThreshold: 75 scaleDownThreshold: 60 targetUtilization: 70 limits: scaleUpThreshold: 85 scaleDownThreshold: 70 targetUtilization: 75 memory: requests: scaleUpThreshold: 75 scaleDownThreshold: 60 targetUtilization: 70 limits: scaleUpThreshold: 85 scaleDownThreshold: 70 targetUtilization: 75
bounds: cpu: requests: min: 100m max: "2" step: 100m stepPercent: 25 limits: min: 100m max: "3" step: 100m stepPercent: 25 memory: requests: min: 96Mi max: 2Gi step: 64Mi stepPercent: 25 limits: min: 300Mi max: 3Gi step: 64Mi stepPercent: 25Addressing Pods
You can target pods in two mutually exclusive ways:
- `.spec.selector`
- `.spec.target`
Exactly one of these must be set.
Allowed target kinds:
- `deployment`
- `statefulset`
- `daemonset`
When using `.spec.selector`, the selector uses the same shape as `deployment.spec.selector`.
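For example, a selector-based PRA could look like this (a sketch; the labels, container name, and thresholds are placeholders):

```yaml
apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler
metadata:
  name: web-pra
spec:
  selector:
    matchLabels:
      app: web        # placeholder label
  containerName: web  # placeholder container name
  policy:
    pollInterval: 10s
    consecutiveSamples: 3
    cooldown: 2m
    cpu:
      requests:
        scaleUpThreshold: 75
        scaleDownThreshold: 60
        targetUtilization: 70
```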
If multiple PRAs match the same pod, winner selection is deterministic:
- highest `priority`, then earliest next-eligible time, then lexicographic PRA name
Controller Behavior
- Watches `PodResourceAutoscaler` and `Pod`.
- Tracks matching pods keyed by pod UID.
- Groups tracked pods by node and runs one poller per node.
- The poller fetches kubelet `/stats/summary` once per cycle and evaluates only tracked pods on that node.
Polling Semantics
- Node poll interval is the minimum `pollInterval` among tracked pods on that node.
- Even when node polling is more frequent, each pod is evaluated only when its own `pollInterval` has elapsed.
- Apply gating (`after`/`delay`) controls patch application, not metric sampling.
Scaling Policy
Top-level policy fields:
- `pollInterval`: per-pod evaluation interval (node poll interval becomes the minimum interval among tracked pods on a node)
- `consecutiveSamples`: required, must be `>= 1`
- `cooldown`: required duration (minimum time between resize actions for the pod)
- `after`: optional apply gate, one of `running` | `containerReady` | `podReady` (default `containerReady`)
- `delay`: optional duration before apply (default `15s`)
Resource policies are optional and can be configured per resource and per “side”:
- `policy.cpu.requests` / `policy.cpu.limits`
- `policy.memory.requests` / `policy.memory.limits`
Each utilization policy includes:
- `scaleUpThreshold` (percent)
- `scaleDownThreshold` (percent)
- `targetUtilization` (percent)
- `consecutiveSamples` (optional override)
- `cooldown` (optional override)
Notes:
- If you specify `policy.cpu`, you must configure `requests` and/or `limits` under it.
- If you specify `policy.memory`, you must configure `requests` and/or `limits` under it.
- `scaleUpThreshold` must be greater than `scaleDownThreshold`.
At least one of `policy.cpu.requests|policy.cpu.limits|policy.memory.requests|policy.memory.limits` must be set.
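For instance, a policy that manages only CPU requests is valid (a minimal sketch; the values are placeholders):

```yaml
policy:
  pollInterval: 15s
  consecutiveSamples: 3
  cooldown: 5m
  cpu:
    requests:
      scaleUpThreshold: 80
      scaleDownThreshold: 50
      targetUtilization: 65
```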
How scaling decisions work
- CPU usage is derived from kubelet cumulative counters (rate over time). Memory usage is derived from kubelet working set bytes.
- For each configured side, utilization is computed as `usage / baseline * 100`, where the baseline is the container's request or limit for that side.
- Utilization is evaluated independently for each configured side.
- Scale up if any configured side requests scale up.
- Scale down only if no side requests scale up and at least one side requests scale down.
- Desired values are computed from `targetUtilization` and current usage (separately per side), using `desired = usage / (targetUtilization / 100)`.
- `requests <= limits` is always enforced.
- Resize apply is gated by `after` and `delay`, and skipped while a pod resize is pending/in progress.
Additional details:
- CPU needs an initial warm-up sample before a first rate can be computed.
- Threshold checks are inclusive (`>=` for up, `<=` for down).
- CPU cores used are computed as:

```
cores_used = ((cpu_ns_now - cpu_ns_prev) / 1e9) / dt_seconds
```
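As a worked example (the numbers are illustrative, not from the source): suppose the CPU counter advanced by 4.5e9 ns over a 5 s window, the container requests 1 CPU, and `cpu.requests` is configured with `scaleUpThreshold: 75` and `targetUtilization: 70`:

```
cores_used  = (4.5e9 / 1e9) / 5              = 0.9 cores
utilization = usage / baseline * 100
            = 0.9 / 1.0 * 100                = 90%    # >= 75, so this side requests scale up
desired     = usage / (targetUtilization / 100)
            = 0.9 / 0.70                     ≈ 1.29 cores (bounds and step caps then apply)
```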
Resource Bounds and Step Control
`.spec.bounds` is optional. If bounds are omitted for a managed side, scaling still works without extra clamping or step caps.
Bounds are configured per resource and per side:
- `bounds.cpu.requests` / `bounds.cpu.limits`
- `bounds.memory.requests` / `bounds.memory.limits`
Each side supports:
- `min`
- `max`
- `step` (absolute max delta per action)
- `stepPercent` (relative max delta per action)
If both `step` and `stepPercent` are set, the smaller cap is applied.
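As an illustration of the combined caps (numbers are hypothetical): with a current CPU request of 1000m, `stepPercent: 25` allows a delta of up to 250m while `step: 100m` allows 100m, so the smaller cap of 100m applies:

```
current request = 1000m
step            = 100m                  # absolute cap
stepPercent     = 25   -> 250m          # relative cap (25% of 1000m)
applied cap     = min(100m, 250m) = 100m
desired 1500m   -> applied 1100m on this action
```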
Pause / Resume
Set:

```yaml
spec:
  paused: true
```

When paused, PRA stops tracking/scaling for that autoscaler and resumes on unpause.
While paused, PRA status conditions are set to:
- `Ready=True` (reason `Paused`)
- `Active=False` (reason `Paused`)
- `Scaling=False` (reason `Paused`)
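A quick way to toggle this from the command line (a sketch assuming a PRA named `my-app-pra` and the `pra` short name used in the Quick Start):

```shell
# Pause the autoscaler
kubectl patch pra my-app-pra --type merge -p '{"spec":{"paused":true}}'
# Resume it
kubectl patch pra my-app-pra --type merge -p '{"spec":{"paused":false}}'
```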
PRP and PRA Together
PRP and PRA can both write pod resources. If both target the same container/resource fields, regular patch semantics apply (last writer wins), which can cause control-loop fights.
Recommendation:
- Do not overlap ownership of the same container `requests`/`limits` fields between PRP and PRA.
PRA patches the `pods/resize` subresource using a merge-style patch (not server-side apply). If another controller (or manual apply) writes the same fields, last writer wins.
Resize Safety and Application
- Pod QoS class must not change during in-place resize; candidate updates that would change QoS are skipped and reported via event.
- If a memory resize may restart a container (for example `resizePolicy.restartPolicy=RestartContainer`), PRA emits an event to signal that a restart may occur.
- Scale up is skipped while a resize is pending/deferred/infeasible; any scaling is skipped while a resize is actively in progress.
- Apply is gated by `policy.after` and `policy.delay`:
  - `after=containerReady` (default): applies only when the target container is ready
  - `after=running`: applies as soon as the pod is Running
  - `after=podReady`: requires PodReady instead of container readiness
Events
PRA emits Kubernetes events for validation, metrics health, and scaling lifecycle. Common events include:
- `PodResourceAutoscalerMetricsUnavailable` / `PodResourceAutoscalerMetricsRecovered`
- `PodResourceAutoscalerBaselineUnavailable` / `PodResourceAutoscalerBaselineAvailable`
- `PodResourceAutoscalerScaleUp` / `PodResourceAutoscalerScaleDown`
- `PodResourceAutoscalerResizeInProgress` / `PodResourceAutoscalerResizePending`
- `PodResourceAutoscalerQoSInvariantBlocked`
- `PodResourceAutoscalerResizeMayRestart`
- `PodResourceAutoscalerScaleFailed`
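If you prefer not to grep, events can also be filtered by reason (a sketch; any of the reason names above works):

```shell
kubectl get events --field-selector reason=PodResourceAutoscalerScaleUp --sort-by=.lastTimestamp
kubectl get events --field-selector reason=PodResourceAutoscalerScaleFailed
```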
Validation
Section titled “Validation”CRD-level validation enforces:
- XOR of `target`/`selector`
- at least one side among `cpu.requests|cpu.limits|memory.requests|memory.limits`
- `scaleUpThreshold > scaleDownThreshold`
- percent range checks (thresholds and `stepPercent`)
- `consecutiveSamples >= 1`
- `bounds` supports only `cpu` and `memory` keys
Runtime validation/guards include:
- parseable `pollInterval`, `cooldown`, and `delay` (if set)
- `after` must be one of `running` | `containerReady` | `podReady`
- selected container exists
- baseline resources exist and are non-zero for configured sides
- counter resets / missing samples are handled (skipped)
Quick Start
1. Create a cluster with in-place resize enabled
```shell
k3d cluster create dyn-resources --no-lb \
  --k3s-arg "--disable=traefik,servicelb@server:*" \
  --k3s-arg "--kube-apiserver-arg=feature-gates=InPlacePodVerticalScaling=true@server:*"
```

2. Deploy an app
```shell
kubectl create deployment my-app --image=ghcr.io/kedify/sample-minute-metrics:latest
kubectl rollout status deploy/my-app
```

3. Create PRA
```shell
cat <<EOF | kubectl apply -f -
apiVersion: keda.kedify.io/v1alpha1
kind: PodResourceAutoscaler
metadata:
  name: my-app-pra
spec:
  target:
    kind: deployment
    name: my-app
  containerName: my-app
  policy:
    pollInterval: 5s
    consecutiveSamples: 2
    cooldown: 30s
    cpu:
      requests:
        scaleUpThreshold: 1
        scaleDownThreshold: 0
        targetUtilization: 50
  bounds:
    cpu:
      requests:
        min: 100m
        max: "2"
        step: 100m
EOF
```

4. Verify
```shell
kubectl get pra -o wide
kubectl get events --sort-by=.lastTimestamp | grep PodResourceAutoscaler
kubectl get pod -l app=my-app -o jsonpath="{.items[0].spec.containers[0].resources}"
```

Metrics
PRA publishes internal Prometheus metrics (names prefixed with `kedify_agent_pra_`), including:
- `kedify_agent_pra_reconcile_count`
- `kedify_agent_pra_resource_updates`
- `kedify_agent_pra_total`
- `kedify_agent_pra_error_total`
- `kedify_agent_pra_scaling_active`
Additional decision and sizing metrics are also exported:
- `kedify_agent_pra_scale_decisions_total`
- `kedify_agent_pra_scale_blocked_total`
- `kedify_agent_pra_scale_delta_absolute`
- `kedify_agent_pra_scale_delta_percent`
- `kedify_agent_pra_evaluation_duration_seconds`
- `kedify_agent_pra_time_to_apply_seconds`
Notes:
- `kedify_agent_pra_scaling_active` is derived from the `lastScaleTime + cooldown` window.
- Delta histogram units: CPU absolute delta is measured in millicores; memory absolute delta is measured in MiB.
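Assuming the agent's metrics endpoint is scraped by Prometheus, queries like the following can serve as a starting point (illustrative sketch; label sets are not documented here):

```
# Scale decisions per second over the last 5 minutes
rate(kedify_agent_pra_scale_decisions_total[5m])

# Error rate over the last 5 minutes
rate(kedify_agent_pra_error_total[5m])

# PRAs currently inside their cooldown window
kedify_agent_pra_scaling_active
```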
PRA requires (at minimum) the following RBAC permissions:
- Pods: `get`/`list`/`watch`
- Pods/resize subresource: `patch`
- Nodes proxy: `get`
- PodResourceAutoscalers: `get`/`list`/`watch`/`update`/`patch` + status `get`/`update`/`patch`
- Deployments/DaemonSets/StatefulSets: `get`/`list`/`watch`
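A ClusterRole granting these permissions might look like the following sketch (illustrative; the role name is hypothetical and scoping should match your deployment):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kedify-agent-pra   # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/resize"]
    verbs: ["patch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: ["keda.kedify.io"]
    resources: ["podresourceautoscalers"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["keda.kedify.io"]
    resources: ["podresourceautoscalers/status"]
    verbs: ["get", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "daemonsets", "statefulsets"]
    verbs: ["get", "list", "watch"]
```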