1. Overview
    1. GKE's cluster autoscaler automatically resizes the number of nodes in a given node pool based on the resource requests of Pods rather than their actual resource utilization
    2. Cluster autoscaler periodically checks the status of Pods and nodes, and takes action
    3. GKE's cluster autoscaler eliminates the need to manually add or remove nodes or to over-provision node pools
    4. Users specify a minimum and maximum size for the node pool, and the rest is automatic (see the gcloud sketch after this list)
    5. Do not enable Compute Engine autoscaling for managed instance groups for cluster nodes
    6. GKE's cluster autoscaler is separate from Compute Engine autoscaling
    7. Do not manually manage nodes in an autoscaled node pool; cluster autoscaler can override any manual node management operations
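    A minimal sketch of enabling this, assuming hypothetical cluster, node pool, and zone names:

    ```sh
    # Enable autoscaling on an existing node pool with explicit bounds
    # (all names here are placeholders).
    gcloud container clusters update example-cluster \
        --zone us-central1-a \
        --node-pool default-pool \
        --enable-autoscaling \
        --min-nodes 1 \
        --max-nodes 10
    ```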
  2. Interruption
    1. If resources are deleted or moved when autoscaling a cluster, workloads might experience transient disruption
    2. If a workload consists of a controller with a single replica, that replica's Pod might be rescheduled onto a different node if its current node is deleted
    3. Before enabling cluster autoscaler, design workloads to tolerate potential disruption or ensure that critical Pods are not interrupted
    4. To increase a workload's tolerance to interruption, deploy it using a controller with multiple replicas, such as a Deployment (see the sketch after this list)
    5. The autoscaler assumes that all replicated Pods can be restarted on some other node, possibly causing a brief disruption
    6. If services are not disruption-tolerant, using cluster autoscaler is not recommended
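    A minimal sketch of a disruption-tolerant workload, assuming a hypothetical web app; with three replicas, removing the node that hosts one Pod leaves the others serving while the evicted Pod is rescheduled:

    ```yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web                # hypothetical workload name
    spec:
      replicas: 3              # multiple replicas tolerate a single node's removal
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: nginx:1.25  # placeholder image
    ```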
  3. Scheduling
    1. If Pods are unschedulable because there are not enough nodes in the node pool, cluster autoscaler adds nodes, up to the maximum size of the node pool
    2. If nodes are under-utilized, and all Pods could be scheduled even with fewer nodes in the node pool, cluster autoscaler removes nodes, down to the minimum size of the node pool
    3. If the node cannot be drained gracefully after a timeout period (currently 10 minutes), the node is forcibly terminated
    4. The grace period is not configurable for GKE clusters
    5. If Pods have requested too few resources (or haven't changed the defaults, which might be insufficient) and nodes are experiencing shortages, cluster autoscaler does not correct the situation
    6. To ensure cluster autoscaler works as accurately as possible, make explicit resource requests for all workloads, as sketched below
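    A sketch of explicit requests in a Pod spec; the autoscaler sizes the node pool from these numbers, not from observed usage:

    ```yaml
    # Fragment of a Pod template; container name, image, and values are placeholders.
    spec:
      containers:
      - name: app
        image: example/app:1.0
        resources:
          requests:
            cpu: 250m          # scheduling and autoscaling decisions use these
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
    ```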
  4. Node pools
    1. All nodes in a single node pool have the same set of labels
    2. Cluster autoscaler considers the relative cost of the instance types in the various pools, and attempts to expand the least expensive possible node pool
    3. The lower cost of node pools containing preemptible VMs is taken into account when choosing which pool to expand (see the sketch after this list)
    4. Labels manually added after initial cluster or node pool creation are not tracked
    5. Nodes created by cluster autoscaler are assigned labels specified with --node-labels at the time of node pool creation
    6. If node pool contains multiple managed instance groups with the same instance type, cluster autoscaler attempts to keep these managed instance group sizes balanced when scaling up
    7. This can help prevent an uneven distribution of nodes among managed instance groups in multiple zones of a node pool
    8. Cluster autoscaler only balances across zones during a scale-up event
    9. Cluster autoscaler scales down underutilized nodes regardless of the relative sizes of the underlying managed instance groups in a node pool, which can cause nodes to be distributed unevenly across zones
    10. The minimum and maximum size for each node pool in a cluster can be specified, and cluster autoscaler makes rescaling decisions within these boundaries
    11. If the current node pool size is below the specified minimum or above the specified maximum when autoscaling is enabled, the autoscaler takes effect only once a new node is needed in the node pool or a node can be safely deleted from it
    12. If a minimum of zero nodes is specified, an idle node pool can scale down completely.
    13. At least one node must always be available in the cluster to run system Pods
    14. When clusters are autoscaled, node pool scaling limits are determined by zone availability
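    A sketch tying these pieces together, with hypothetical names: a preemptible pool created with autoscaling bounds (including a minimum of zero) and labels that the autoscaler applies to every node it creates:

    ```sh
    # Create an autoscaled, preemptible pool that can scale down to zero
    # (cluster, pool, zone, and label are placeholders).
    gcloud container node-pools create batch-pool \
        --cluster example-cluster \
        --zone us-central1-a \
        --preemptible \
        --node-labels=workload=batch \
        --enable-autoscaling \
        --min-nodes 0 \
        --max-nodes 5
    ```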
  5. Scaling
    1. The decision of when to remove a node is a trade-off between optimizing for utilization or the availability of resources
    2. Removing underutilized nodes improves cluster utilization, but new workloads might have to wait for resources to be provisioned again before they can run
    3. Balanced is the default autoscaling profile
    4. The optimize-utilization profile prioritizes utilization over keeping spare resources in the cluster
    5. When the optimize-utilization profile is enabled, cluster autoscaler scales down the cluster more aggressively
    6. This profile has been optimized for use with batch workloads that are not sensitive to start-up latency
    7. It is not recommended to use the optimize-utilization profile with serving workloads
    8. When scaling down, cluster autoscaler respects scheduling and eviction rules set on Pods
    9. Scheduling and eviction restrictions can prevent a node from being deleted by the autoscaler
    10. A node's deletion can be prevented if a Pod's affinity or anti-affinity rules prevent it from being rescheduled
    11. A node's deletion can be prevented if a Pod on it uses local storage
    12. A node's deletion can be prevented if a Pod on it is not managed by a controller such as a Deployment, StatefulSet, Job, or ReplicaSet
    13. An application's PodDisruptionBudget can prevent autoscaling if deleting nodes would cause the budget to be exceeded
    14. Cluster autoscaler supports up to 1000 nodes running 30 Pods each
    15. When scaling down, cluster autoscaler honors a graceful termination period of 10 minutes for rescheduling the node's Pods onto a different node before forcibly terminating the node
    16. If required system Pods are scheduled onto a node and there is no trigger for them to be moved elsewhere, cluster autoscaler might not scale down completely, and an extra node may remain after scaling down
    17. To work around this limitation, configure a Pod disruption budget, as sketched below
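    A minimal PodDisruptionBudget sketch for the hypothetical web Deployment above; the autoscaler will not drain a node if evicting its Pods would drop the workload below minAvailable, and the same mechanism can be used to make system Pods movable:

    ```yaml
    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: web-pdb            # hypothetical name
    spec:
      minAvailable: 2          # keep at least 2 of the 3 replicas running during drains
      selector:
        matchLabels:
          app: web
    ```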
  6. VPA
    1. Vertical pod autoscaling (VPA) frees users from having to think about what values to specify for a container's CPU and memory requests
    2. Vertical pod autoscaler can recommend values for CPU and memory requests and limits, or it can automatically update the values
    3. Cluster nodes are used efficiently, because Pods use exactly what they need
    4. Pods are scheduled onto nodes that have the appropriate resources available
    5. No need to run time-consuming benchmarking tasks to determine the correct values for CPU and memory requests
    6. Maintenance time is reduced, because the autoscaler can adjust CPU and memory requests over time without any action on the operator’s part
    7. Vertical Pod Autoscaling is supported on regional clusters
    8. Vertical Pod Autoscaling should not be used with Horizontal Pod Autoscaling (HPA) on CPU or memory
    9. VPA can be used with HPA on custom and external metrics
    10. Vertical Pod Autoscaler is not yet ready for use with Java workloads due to limited visibility into actual memory usage of the workload
    11. Vertical Pod Autoscaler cannot automatically apply recommendations for injected sidecars
    12. For a Pod with injected sidecars and an updateMode other than "Off", the sidecars must be opted out using the container resource policy in the VPA object
    13. Opt out the Istio sidecar by extending the VerticalPodAutoscaler spec definition (see the manifest sketch at the end of this section)
    14. Due to Kubernetes limitations, the only way to modify the resource requests of a running Pod is to recreate the Pod
    15. With an updateMode of "Auto", the VerticalPodAutoscaler evicts a Pod if it needs to change the Pod's resource requests
    16. To limit the amount of Pod restarts, use a pod disruption budget
    17. To make sure that a cluster can handle the new sizes of workloads, use Cluster Autoscaler and Node Autoprovisioning
    18. Vertical Pod Autoscaler notifies Cluster Autoscaler ahead of the update, and the resources needed for the resized workload are provided before recreating it, to minimize the disruption time
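    A VerticalPodAutoscaler sketch combining the points above, assuming the hypothetical web Deployment and an injected istio-proxy sidecar; updateMode "Auto" lets VPA evict and recreate Pods to apply new requests, while the container policy opts the sidecar out:

    ```yaml
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: web-vpa            # hypothetical name
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      updatePolicy:
        updateMode: "Auto"     # VPA may evict Pods to apply new requests
      resourcePolicy:
        containerPolicies:
        - containerName: istio-proxy
          mode: "Off"          # opt the injected sidecar out of VPA updates
    ```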