-
Overview
- GKE's cluster autoscaler automatically resizes the number of nodes in a given node pool, based on the resource requests of Pods rather than their actual resource utilization
- Cluster autoscaler periodically checks the status of Pods and nodes, and takes action
- GKE's cluster autoscaler eliminates the need to manually add or remove nodes or over-provision in node pools
- Users specify a minimum and maximum size for the node pool, and the rest is automatic (a command sketch follows this list)
- Do not enable Compute Engine autoscaling for managed instance groups for cluster nodes
- GKE's cluster autoscaler is separate from Compute Engine autoscaling
- Users and administrators should not manage nodes manually; cluster autoscaler can override any manual node management operations
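A minimal sketch of enabling autoscaling on an existing node pool; the cluster name my-cluster, pool name default-pool, and the size bounds are placeholder assumptions:

```sh
# Enable cluster autoscaler on an existing node pool (placeholder
# names and bounds); the pool is then resized between --min-nodes
# and --max-nodes based on Pod resource requests.
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool=default-pool \
  --min-nodes=1 \
  --max-nodes=5
```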
-
Interruption
- If resources are deleted or moved when autoscaling a cluster, workloads might experience transient disruption
- If a workload consists of a controller with a single replica, that replica's Pod might be rescheduled onto a different node if its current node is deleted
- Before enabling cluster autoscaler, design workloads to tolerate potential disruption or ensure that critical Pods are not interrupted
- To increase a workload's tolerance to interruption, consider deploying it using a controller with multiple replicas, such as a Deployment (see the sketch after this list)
- The autoscaler assumes all replicated Pods can be restarted on some other node, possibly causing a brief disruption
- If services are not disruption-tolerant, using cluster autoscaler is not recommended
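A minimal sketch of a disruption-tolerant workload; the name and image are hypothetical. With several replicas, losing one Pod while its node is deleted leaves the others serving:

```sh
# Hypothetical multi-replica Deployment: if the autoscaler removes a
# node, the remaining replicas keep serving while the evicted Pod is
# rescheduled onto another node.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25
EOF
```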
-
Scheduling
- If Pods are unschedulable because there are not enough nodes in the node pool, cluster autoscaler adds nodes, up to the maximum size of the node pool
- If nodes are under-utilized and all Pods could be scheduled even with fewer nodes in the node pool, cluster autoscaler removes nodes, down to the minimum size of the node pool
- If the node cannot be drained gracefully after a timeout period (currently 10 minutes), the node is forcibly terminated
- The grace period is not configurable for GKE clusters
- If Pods have requested too few resources (or haven't changed the defaults, which might be insufficient) and nodes are experiencing shortages, cluster autoscaler does not correct the situation
- To ensure cluster autoscaler works as accurately as possible, set explicit resource requests for all workloads (example below)
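A sketch of explicit resource requests; the name, image, and request values are illustrative assumptions. Cluster autoscaler scales on requests like these, so leaving them unset or too small hides real demand from it:

```sh
# Hypothetical Deployment with explicit requests; cluster autoscaler
# sums Pod requests like these when deciding whether to add nodes.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: gcr.io/example-project/api:1.0  # hypothetical image
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
EOF
```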
-
Node pools
- All nodes in a single node pool have the same set of labels
- Cluster autoscaler considers the relative cost of the instance types in the various pools, and attempts to expand the least expensive possible node pool
- To reduce cost, node pools containing preemptible VMs are taken into account
- Labels manually added after initial cluster or node pool creation are not tracked
- Nodes created by cluster autoscaler are assigned labels specified with --node-labels at the time of node pool creation
- If node pool contains multiple managed instance groups with the same instance type, cluster autoscaler attempts to keep these managed instance group sizes balanced when scaling up
- This can help prevent an uneven distribution of nodes among managed instance groups in multiple zones of a node pool
- Cluster autoscaler only balances across zones during a scale-up event
- Cluster autoscaler scales down underutilized nodes regardless of the relative sizes of underlying managed instance groups in a node pool, which can cause nodes to be distributed unevenly across zones
- The minimum and maximum size for each node pool in a cluster can be specified, and cluster autoscaler makes rescaling decisions within these boundaries (a creation sketch follows this list)
- If the current node pool size is below the specified minimum or above the specified maximum when autoscaling is enabled, the autoscaler does not take effect until a new node is needed in the node pool or until a node can be safely deleted from the node pool
- If a minimum of zero nodes is specified, an idle node pool can scale down completely
- At least one node must always be available in the cluster to run system Pods
- When clusters are autoscaled, node pool scaling limits are determined by zone availability
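A sketch of creating an autoscaled node pool; the names, bounds, and label are assumptions. Labels passed with --node-labels at creation are applied to nodes the autoscaler adds later, and --min-nodes=0 lets an idle pool scale away completely:

```sh
# Hypothetical autoscaled pool: --node-labels is recorded at creation
# time, so autoscaler-created nodes carry the same labels; min 0
# allows the pool to scale down to nothing when idle.
gcloud container node-pools create batch-pool \
  --cluster=my-cluster \
  --enable-autoscaling \
  --min-nodes=0 \
  --max-nodes=10 \
  --node-labels=workload=batch
```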
-
Scaling
- The decision of when to remove a node is a trade-off between optimizing for utilization and preserving the availability of resources
- Removing underutilized nodes improves cluster utilization, but new workloads might have to wait for resources to be provisioned again before they can run
- Balanced is the default autoscaling profile (a profile-switch sketch follows this list)
- The optimize-utilization profile prioritizes optimizing utilization over keeping spare resources in the cluster
- When optimize-utilization is enabled, cluster autoscaler scales down the cluster more aggressively
- This profile has been optimized for use with batch workloads that are not sensitive to start-up latency
- Using the optimize-utilization profile with serving workloads is not recommended
- When scaling down, cluster autoscaler respects scheduling and eviction rules set on Pods
- Scheduling and eviction restrictions can prevent a node from being deleted by the autoscaler
- A node's deletion can be prevented if:
  - The Pod's affinity or anti-affinity rules prevent rescheduling
  - The Pod has local storage
  - The Pod is not managed by a controller such as a Deployment, StatefulSet, Job, or ReplicaSet
- An application's PodDisruptionBudget can prevent autoscaling if deleting nodes would cause the budget to be exceeded
- Cluster autoscaler supports up to 1000 nodes running 30 Pods each
- When scaling down, cluster autoscaler honors a graceful termination period of 10 minutes for rescheduling the node's Pods onto a different node before forcibly terminating the node
- If there is no trigger for system Pods to be moved to a different node, cluster autoscaler might not scale down completely, and an extra node can remain after scaling down
- To work around this limitation, configure a Pod disruption budget for the system Pods (see the kubectl sketch after this list)
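A sketch of switching the autoscaling profile; the cluster name is a placeholder, and depending on the gcloud version the flag may require the beta track:

```sh
# Trade spare capacity for cost: scale down more aggressively.
# Use --autoscaling-profile=balanced to return to the default.
gcloud container clusters update my-cluster \
  --autoscaling-profile=optimize-utilization
```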
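A sketch of the Pod disruption budget workaround for system Pods, using kube-dns as an example; verify the namespace and label selector against the actual cluster before applying:

```sh
# Allow eviction of one kube-dns replica at a time so a node hosting
# it can still be drained during scale-down.
kubectl create poddisruptionbudget kube-dns-pdb \
  --namespace=kube-system \
  --selector=k8s-app=kube-dns \
  --max-unavailable=1
```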
-
VPA
- Vertical pod autoscaling (VPA) frees users from having to think about what values to specify for a container's CPU and memory requests
- Vertical pod autoscaler can recommend values for CPU and memory requests and limits, or it can automatically update the values (cluster-level enablement and a manifest are sketched at the end of this section)
- Cluster nodes are used efficiently, because Pods use exactly what they need
- Pods are scheduled onto nodes that have the appropriate resources available
- No need to run time-consuming benchmarking tasks to determine the correct values for CPU and memory requests
- Maintenance time is reduced, because the autoscaler can adjust CPU and memory requests over time without any action on the operator's part
- Vertical Pod Autoscaling is supported on regional clusters
- Vertical Pod Autoscaling should not be used with Horizontal Pod Autoscaling (HPA) on CPU or memory
- VPA can be used with HPA on custom and external metrics
- Vertical Pod Autoscaler is not yet ready for use with Java workloads due to limited visibility into actual memory usage of the workload
- Vertical Pod Autoscaler cannot automatically apply recommendations for injected sidecars
- For a Pod with injected sidecars and an updateMode other than "Off", the sidecars must be opted out using the container resource policy in the VPA object
- Opt out the Istio sidecar by extending the VerticalPodAutoscaler spec definition (see the manifest sketch at the end of this section)
- Due to Kubernetes limitations, the only way to modify the resource requests of a running Pod is to recreate the Pod
- With an updateMode of "Auto", the VerticalPodAutoscaler evicts a Pod if it needs to change the Pod's resource requests
- To limit the number of Pod restarts, use a Pod disruption budget
- To make sure that a cluster can handle the new sizes of workloads, use cluster autoscaler and node auto-provisioning
- Vertical Pod Autoscaler notifies Cluster Autoscaler ahead of the update, and the resources needed for the resized workload are provided before recreating it, to minimize the disruption time
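A sketch of enabling vertical Pod autoscaling at the cluster level; the cluster name is a placeholder:

```sh
# Turn on the vertical Pod autoscaler feature for an existing cluster.
gcloud container clusters update my-cluster \
  --enable-vertical-pod-autoscaling
```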
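A sketch of a VerticalPodAutoscaler in Auto mode with an injected Istio sidecar opted out via the container resource policy; the target Deployment name is an assumption, and istio-proxy is the conventional name of the injected sidecar container:

```sh
# Hypothetical VPA: updateMode "Auto" lets it evict and recreate Pods
# to apply new requests; the containerPolicies entry sets the injected
# istio-proxy sidecar to "Off" so its requests are left untouched.
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: istio-proxy
      mode: "Off"
EOF
```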