1. Overview
    1. A single-zone cluster has a single control plane (master) running in one zone
    2. A single-zone cluster control plane manages workloads on nodes running in the same zone
    3. A multi-zonal cluster has a single replica of the control plane running in a single zone, and has nodes running in multiple zones
    4. During an upgrade of the cluster or an outage of the zone where the control plane runs, workloads still run
    5. The cluster, its nodes, and its workloads cannot be configured until the control plane is available
    6. Multi-zonal clusters balance availability and cost for consistent workloads
    7. Use regional clusters to maintain availability when the number of nodes and node pools changes frequently
    8. A regional cluster has multiple replicas of the control plane, running in multiple zones within a given region
    9. Nodes also run in each zone where a replica of the control plane runs
    10. Because a regional cluster replicates the control plane and nodes, it consumes more Compute Engine resources than a single-zone or multi-zonal cluster
    11. Choose the cluster's specific Kubernetes version or make choices about its overall mix of stability and features
    13. Enroll the cluster in a release channel that provides the required stability (see the example commands after this list)
    13. Google automatically upgrades the cluster and its nodes when an update is available in that release channel.
    14. The Rapid channel receives multiple updates a month, while the Stable channel only receives a few updates a year
    15. Where no release channel is used and no specific cluster version is selected, the current default version is used
    16. The default version is selected based on stability and real-world performance, and is changed regularly
    17. A specific supported version of Kubernetes can be specified for a given workload when creating the cluster
    18. Where there is no need to control the specific patch version, enroll the cluster in a release channel instead of managing its version directly
    19. An alpha cluster has all Kubernetes alpha APIs (feature gates) enabled
    20. Alpha clusters can be used for early testing and validation of Kubernetes features
    21. Alpha clusters are not supported for production workloads
    22. Alpha clusters cannot be upgraded, and expire within 30 days
    23. GKE clusters can be distinguished according to the way they route traffic from one Pod to another Pod
    24. A cluster that uses Alias IPs is called a VPC-native cluster
    25. A cluster that uses Google Cloud Routes is called a routes-based cluster
    26. VPC-native is the recommended network mode for new clusters
    27. The default cluster network mode depends on the way the cluster is created
    28. Access from public networks to the cluster's workloads can be configured
    29. In VPC-native clusters, routes are not created automatically, because Pod traffic is routed natively using alias IPs
    30. Private clusters assign internal RFC 1918 IP addresses to Pods and nodes, and workloads are completely isolated from public networks
    31. Binary Authorization provides software supply-chain security to GKE workloads
    32. Binary Authorization works with images deployed to GKE from Container Registry or another container image registry
    33. Binary Authorization can be used to ensure that internal processes that safeguard the quality and integrity of software have successfully completed before an application is deployed to a production environment
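    A minimal sketch tying these choices together (the cluster names and locations below are placeholder values, not taken from these notes):

        # Zonal cluster enrolled in the Regular release channel, so Google manages its upgrades
        gcloud container clusters create demo-zonal \
            --zone us-central1-a \
            --release-channel regular

        # Regional, VPC-native cluster; --num-nodes is per zone, so this creates three nodes
        gcloud container clusters create demo-regional \
            --region us-central1 \
            --enable-ip-alias \
            --num-nodes 1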
  2. Alpha Clusters
    1. Alpha clusters can be used just like normal GKE clusters
    2. Alpha clusters are short-lived clusters that run stable Kubernetes releases with all Kubernetes APIs and features enabled
    3. Alpha clusters are designed for advanced users and early adopters to experiment with workloads that take advantage of new features before those features are production-ready
    4. Alpha clusters default to running the current default version of Kubernetes
    5. Do not use Alpha clusters or alpha features for production workloads
    6. Alpha clusters expire after thirty days and do not receive security updates
    7. Users must migrate data from alpha clusters before they expire
    8. GKE does not automatically save data stored on alpha clusters
    9. Users can experiment with Kubernetes alpha features by creating an alpha cluster (see the example command after this list)
    10. Users can specify a different version during cluster creation
    11. Alpha clusters are not covered by the GKE SLA
    12. Alpha clusters cannot be upgraded
    13. Node auto-upgrade and auto-repair are disabled on alpha clusters
    14. Alpha clusters are automatically deleted after 30 days
    15. Alpha clusters do not receive security updates
    16. Alpha clusters do not necessarily run "alpha" versions of GKE
    17. The term alpha cluster means that alpha APIs are enabled, both for Kubernetes and GKE, regardless of the version of Kubernetes the cluster runs
    18. Periodically, Google offers customers the ability to test GKE versions that are not generally available, for testing and validation
    19. Early-access GKE versions can be run as alpha clusters or as clusters without the Kubernetes alpha APIs enabled
    20. Most Kubernetes releases contain new Alpha features that can be tested in alpha clusters
    21. New Kubernetes features are introduced as preview or generally available
    22. To ensure stability and production quality, normal GKE clusters only enable features that are beta or higher
    23. Alpha features are not enabled on normal clusters because they are not production-ready or upgradeable
    24. Since GKE automatically upgrades the Kubernetes control plane, enabling alpha features in production could jeopardize the reliability of the cluster if there are breaking changes in a new version
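    Example command for the experiment described above (a sketch only; the cluster name and zone are placeholders, and node auto-upgrade and auto-repair are passed explicitly here to match alpha cluster behavior):

        gcloud container clusters create alpha-sandbox \
            --zone us-central1-a \
            --enable-kubernetes-alpha \
            --no-enable-autoupgrade \
            --no-enable-autorepair
        # The cluster expires after 30 days and cannot be upgraded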
  3. Regional clusters
    1. By default, a cluster's control plane (master) and nodes all run in a single compute zone, specified when the cluster is created
    2. Regional clusters increase the availability of both a cluster's control plane (master) and its nodes by replicating them across multiple zones of a region
    3. Regional clusters provide the advantages of multi-zonal clusters
    4. If one or more (but not all) zones in a region experience an outage, the cluster's control plane remains accessible as long as one replica of the control plane remains available
    5. During cluster maintenance such as a cluster upgrade, only one replica of the control plane is unavailable at a time, and the cluster is still operational
    6. By default, the control plane and each node pool are replicated across three zones of a region, but users can customize the number of replicas
    7. It is not possible to modify whether a cluster is zonal, multi-zonal, or regional after creating the cluster
    8. Regional clusters replicate cluster masters and nodes across multiple zones within a single region
    9. In the event of an infrastructure outage, workloads continue to run, and nodes can be rebalanced manually or by using the cluster autoscaler
    10. Regional clusters are available across a region rather than a single zone within a region
    11. If a single zone becomes unavailable, the Kubernetes control plane and resources are not impacted
    12. Regional clusters offer zero-downtime master upgrades and master resizes, and reduced downtime from master failures
    13. Regional clusters provide a high availability control plane, so users can access the control plane even during upgrades
    14. By default, regional clusters consist of nine nodes spread evenly across three zones in a region.
    15. This consumes nine IP addresses
    16. The number of nodes can be reduced down to one per zone, if desired
    17. Newly created Google Cloud accounts are granted only eight IP addresses per region, so it may be necessary to request an increase in quotas for regional in-use IP addresses, depending on the size of the regional cluster
    18. If there are too few available in-use IP addresses, cluster creation fails
    19. For regional clusters that run GPUs, users must choose a region or zones that have GPUs
    20. Node pools cannot be created in zones outside of the cluster's zones
    21. Changing a cluster's zones causes all new and existing nodes to span the new zones
    22. Regional clusters are offered at no additional charge
    23. Using regional clusters requires more of a project's regional quotas than a similar zonal or multi-zonal cluster
    24. Understand quotas and Google Kubernetes Engine pricing before using regional clusters
    25. An "Insufficient regional quota to satisfy request for resource" error implies that the request exceeds the available quota in the current region
    26. Node-to-node traffic across zones is charged
    27. If a workload running in one zone needs to communicate with a workload in a different zone, the cross-zone traffic incurs cost
    28. Persistent storage disks are zonal resources
    29. When persistent storage is added to a cluster, unless a zone is specified, GKE assigns the disk to a single zone
    30. GKE chooses the zone at random
    31. When using a StatefulSet, the provisioned persistent disks for each replica are spread across zones
    32. Once a persistent disk is provisioned, any Pods referencing the disk are scheduled to the same zone as the disk
    33. A read-write persistent disk cannot be attached to multiple nodes
    34. To maintain capacity in the unlikely event of zonal failure, users can allow GKE to overprovision scaling limits, to guarantee a minimum level of availability even when some zones are unavailable
    35. For example, if a cluster needs 12 nodes (four per zone across three zones), specify an autoscaling maximum of six nodes per zone rather than four; if one zone fails, the cluster can still scale to 12 nodes across the two remaining zones (see the overprovisioning example after this list)
    36. Similarly, if a two-zone cluster is overprovisioned to 200%, 100% of traffic is rerouted if half of the cluster's capacity is lost
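    Example commands for the node-location and overprovisioning points above (a sketch; the names, region, zones, and node counts are placeholder values):

        # Regional cluster with one node per zone, restricted to two zones of the region
        gcloud container clusters create demo-regional \
            --region us-central1 \
            --node-locations us-central1-a,us-central1-b \
            --num-nodes 1

        # Overprovisioned autoscaling: for regional clusters the limits apply per zone,
        # so --max-nodes 6 lets the remaining zones scale up to 6 nodes each if a zone fails
        gcloud container clusters create demo-overprovisioned \
            --region us-central1 \
            --enable-autoscaling \
            --num-nodes 4 \
            --min-nodes 1 \
            --max-nodes 6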
  4. Private clusters
    1. Private clusters provide the ability to isolate nodes from inbound and outbound connectivity to the public internet
    2. This isolation is achieved because the nodes have internal RFC 1918 IP addresses only
    3. Outbound internet access for certain private nodes can be provided using Cloud NAT or a self-managed NAT gateway
    4. Even though the node IP addresses are private, external clients can reach Services in the cluster
    5. A Service of type LoadBalancer can be used to enable external clients to call the IP address of the load balancer
    6. A Service of type NodePort can be created and used as the backend for an Ingress
    7. GKE uses information in the Service and the Ingress to configure an HTTP(S) load balancer
    8. External clients can call the external IP address of the HTTP(S) load balancer
    9. By default, Private Google Access is enabled
    10. Private Google Access provides private nodes and their workloads with limited outbound access to Google Cloud APIs and services over Google's private network
    11. Private Google Access makes it possible for private nodes to pull container images from Google Container Registry, and to send logs to the Cloud Operations stack
    12. Every GKE cluster has a Kubernetes API server called the master
    13. The master is in a Google-owned project that is separate from the user project
    14. The master runs on a VM that is in a VPC network in the Google-owned project
    15. A regional cluster has multiple masters, each of which runs on its own VM
    16. In private clusters, the master's VPC network is connected to the cluster's VPC network with VPC Network Peering
    17. The user's VPC network contains the cluster nodes, and a separate Google Cloud VPC network contains the cluster's master
    18. The master's VPC network is located in a project controlled by Google
    19. The user's VPC network and the master's VPC network are connected using VPC Network Peering
    20. Traffic between nodes and the master is routed entirely using internal IP addresses
    21. All newly created private clusters automatically reuse existing VPC Network Peering connections
    22. The first zonal or regional private cluster generates a new VPC Network Peering connection
    23. Additional private clusters in the same zone or region and network can use the same peering, without the need to create any additional VPC Network Peering connections
    24. The master for a private cluster has a private endpoint in addition to a public endpoint
    25. The master for a non-private cluster only has a public endpoint
    26. The private endpoint is an internal IP address in the master's VPC network
    27. In a private cluster, nodes always communicate with the master's private endpoint
    28. Depending on the configuration, the cluster can be managed with tools like kubectl that connect to the private endpoint
    29. Any VM that uses the same subnet that the private cluster uses can also access the private endpoint
    30. Public endpoint is the external IP address of the master
    31. By default, tools like kubectl communicate with the master on its public endpoint
    32. Access to the master endpoint can be controlled using master authorized networks, or users can disable access to the public endpoint (see the example after this list)
    33. Disabling public endpoint access is the most secure option as it prevents all internet access to the master
    34. This is a good choice where the on-premises network has been connected to Google Cloud using Cloud Interconnect and Cloud VPN
    35. Cloud Interconnect and Cloud VPN connect a company network to the VPC without the traffic having to traverse the public internet
    36. With public endpoint access disabled, master authorized networks must be configured for the private endpoint
    37. Without master authorized networks, users can only connect to the private endpoint from cluster nodes or VMs in the same subnet as the cluster
    38. Master authorized networks must be RFC 1918 IP addresses
    39. Even if a customer disables access to the public endpoint, Google can use the master's public endpoint for cluster management purposes, such as scheduled maintenance and automatic master upgrades
    40. Another option is to enable public endpoint access with master authorized networks enabled
    41. Using private clusters with master authorized networks enabled restricts access to the master to a specified set of source IP address ranges
    42. Master authorized networks are a good choice where there is no existing VPN infrastructure, or where remote users or branch offices connect over the public internet rather than over Cloud Interconnect or Cloud VPN
    43. Public endpoint access enabled, master authorized networks disabled is the default and least restrictive option
    44. Since master authorized networks are not enabled, the cluster can be administered from any source IP address as long as the user is authenticated
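    Example of the endpoint options above (an illustrative sketch; the cluster name and CIDR ranges are placeholder values):

        # Private nodes, public endpoint kept but restricted to a single authorized range
        gcloud container clusters create demo-private \
            --zone us-central1-a \
            --enable-ip-alias \
            --enable-private-nodes \
            --master-ipv4-cidr 172.16.0.32/28 \
            --enable-master-authorized-networks \
            --master-authorized-networks 203.0.113.0/24
        # Most restrictive variant: also pass --enable-private-endpoint to disable the
        # public endpoint entirely; authorized networks must then be RFC 1918 addresses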
  5. Scalability
    1. In a Kubernetes cluster, scalability refers to the ability of the cluster to grow while staying within its service-level objectives (SLOs)
    2. Kubernetes also has its own set of SLOs
    3. Regional clusters are better suited for high availability
    4. Regional clusters have multiple master nodes across multiple compute zones in a region
    5. Zonal clusters have one master node in a single compute zone
    6. If a zonal cluster is upgraded, the single master VM experiences downtime during which the Kubernetes API is not available until the upgrade is complete
    7. In regional clusters, the control plane remains available during cluster maintenance like rotating IPs, upgrading master VMs, or resizing clusters or node pools
    8. When upgrading a regional cluster, two out of three master VMs are always running during the rolling upgrade, so the Kubernetes API is still available
    9. A single-zone outage won't cause any downtime in the regional control plane
    10. Changes to the cluster's configuration take longer because they must propagate across all masters in a regional cluster instead of the single control plane in zonal clusters
    11. In regional clusters, if VMs cannot be created in one of the zones, whether from a lack of capacity or another transient problem, the cluster cannot be created or upgraded
    12. Use zonal clusters to create or upgrade clusters rapidly when availability is less of a concern
    13. Use regional clusters when availability is more important than flexibility
    14. Carefully select the cluster type when creating a cluster because it cannot be changed after the cluster is created
    15. Migrating production traffic between clusters is possible but difficult at scale
    16. Use regional clusters for production workloads, as they offer higher availability than zonal clusters
    17. To achieve high availability, the Kubernetes control plane and its nodes need to be spread across different zones
    18. GKE offers zonal and multi-zonal node pools
    19. To deploy a highly available application, distribute workloads across multiple compute zones in a region by using multi-zonal node pools which distribute nodes uniformly across zones
    20. When using cluster autoscaler with multi-zonal node pools, nodes are not guaranteed to be spread equally among zones
    21. If all nodes are in the same zone, Pods can’t be scheduled if that zone becomes unreachable
    22. GPUs are available only in specific zones. It may not be possible to get them in all zones in the region
    23. Round-trip latency between locations within a single region is expected to stay below 1ms on the 95th percentile
    24. The difference in latency between intra-zonal and inter-zonal traffic should be negligible
    25. The price of egress traffic between zones in the same region is available on the Compute Engine pricing page
    26. Kubernetes workloads require networking, compute, and storage
    27. Enough CPU and memory is required to run Pods
    28. There are more parameters of underlying infrastructure that can influence performance and scalability of a GKE cluster
    29. GKE offers two network modes: routes-based and the newer VPC-native
    30. With a routes-based cluster, each time a node is added, a custom route is added to the routing table in the VPC network
    31. GKE clusters with routes-based networking cannot scale above 2,000 nodes
    32. In the VPC-native cluster mode, the VPC network has a secondary range for all Pod IP addresses
    33. Each node is assigned a slice of the secondary range for its own Pod IP addresses
    34. This allows the VPC network to natively understand how to route traffic to Pods without relying on custom routes
    35. VPC-native clusters are the networking default and are recommended to accommodate large workloads
    36. They scale to a larger number of nodes and allow better interaction with other Google Cloud products
    37. A VPC-native cluster uses the primary IP range for nodes and two secondary IP ranges for Pods and Services
    38. The maximum number of nodes in VPC-native clusters can be limited by available IP addresses
    39. The number of nodes is determined by both the primary range (node subnet) and the secondary range (Pod subnet)
    40. The maximum number of Pods and Services is determined by the size of the cluster's secondary ranges, Pod subnet and Service subnet, respectively
    41. The Pod secondary range defaults to /14 (262,144 IP addresses)
    42. Each node has /24 range assigned for its Pods (256 IP addresses for its Pods)
    43. The default node subnet is /20 (4,092 usable IP addresses)
    44. There must be enough addresses in both ranges (node and Pod) to provision a new node
    45. With these defaults, only 1,024 nodes can be created, because the /14 Pod range provides only 1,024 of the /24 per-node Pod blocks
    46. By default there can be a maximum of 110 Pods per node, and each node in the cluster has allocated /24 range for its Pods.
    47. This results in 256 Pod IPs per node
    48. By having approximately twice as many available IP addresses as possible Pods, Kubernetes is able to mitigate IP address reuse as Pods are added to and removed from a node
    49. For applications that schedule a smaller number of Pods per node, this is wasteful
    50. The Flexible Pod CIDR feature allows the per-node CIDR block size for Pods to be configured so that fewer IP addresses are used (see the example command after this list)
    51. By default, the secondary range for Services is set to /20 (4,096 IP addresses), limiting the number of Services in the cluster to 4096
    52. Secondary ranges cannot be changed after creation
    53. When a cluster is created, ensure the ranges chosen are large enough to accommodate anticipated growth
    54. GKE nodes are regular Google Cloud virtual machines
    55. Parameters such as the number of cores or size of disk, can influence how GKE clusters perform
    56. In Google Cloud, the number of cores allocated to the instance determines its network capacity.
    57. In Google Cloud, the size of persistent disks determines the IOPS and throughput of the disk
    58. GKE typically uses Persistent Disks as boot disks and to back Kubernetes' Persistent Volumes
    59. Increasing disk size increases both IOPS and throughput, up to certain limits
    60. Each persistent disk write operation contributes to the virtual machine instance's cumulative network egress cap
    61. IOPS performance of disks, especially SSDs, depends on the number of vCPUs in the instance in addition to disk size
    62. Lower core VMs have lower write IOPS limits due to network egress limitations on write throughput
    63. If a virtual machine instance has insufficient CPUs, the application won't be able to get close to the IOPS limits
    64. Use larger and fewer disks to achieve higher IOPS and throughput
    65. Workloads that require high capacity or large numbers of disks need to consider the limits of how many PDs can be attached to a single VM
    66. For regular VMs, that limit is 128 disks with a total size of 64 TB, while shared-core VMs have a limit of 16 PDs with a total size of 3 TB
    67. Google Cloud enforces this limit, not Kubernetes
    68. Kubernetes, like any other system, has limits which need to be taken into account while designing applications and planning their growth
    69. Kubernetes supports up to 5000 nodes in a single cluster
    70. The number of nodes is only one of many dimensions on which Kubernetes can scale
    71. Other dimensions include the total number of Pods, Services, or backends behind a Service
    72. Do not stretch more than one dimension at a time
    73. This can cause problems even in smaller clusters
    74. For example, trying to schedule 100 Pods per node in a 5k node cluster likely won't succeed because the number of Pods, the number of Pods per node, and the number of nodes would be stretched too far
    75. Extending Kubernetes clusters with webhooks or CRDs is common and can constrain the ability to scale the cluster
    76. Most limits are not enforced, so users can go above them
    77. Exceeding limits won't make the cluster instantly unusable
    78. Performance degrades (sometimes shown by failing SLOs) before failure
    79. Some of the limits are given for largest possible cluster
    80. In smaller clusters, limits are proportionally lower
    81. The performance of iptables degrades if there are too many services or if there is a high number of backends behind a Service
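    Example command for the IP-range planning and Flexible Pod CIDR points above (a sketch; the ranges and the 32-Pod limit are placeholder values, chosen to show that the per-node Pod block shrinks from /24 to /26 when the limit is lowered):

        # Secondary ranges cannot be changed later, so size them for anticipated growth
        gcloud container clusters create demo-vpc-native \
            --zone us-central1-a \
            --enable-ip-alias \
            --cluster-ipv4-cidr 10.0.0.0/16 \
            --services-ipv4-cidr 10.4.0.0/22 \
            --max-pods-per-node 32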
  6. Dashboard
    1. Cloud Console offers useful dashboards for a project's GKE clusters and their resources
    2. Dashboards can be used to view, inspect, manage, and delete resources in clusters
    3. Deployments can be created from the Workloads dashboard
    4. In conjunction with the gcloud and kubectl command-line tools, the GKE dashboards are helpful for DevOps workflows and troubleshooting issues (equivalent kubectl commands are sketched after this list)
    5. Dashboards can be used to get information about all resources in every cluster quickly and easily.
    6. Kubernetes clusters displays each cluster's name, compute zone, cluster size, total cores, total memory, node version, outstanding notifications, and labels
    7. Workloads displays the workloads (Deployments, StatefulSets, DaemonSets, Jobs, and Pods) deployed to clusters in the current project
    8. Information includes each workload's name, status, type, number of running and total desired Pods, namespace, and cluster
    9. A YAML-based text editor is available for inspecting and editing deployed resources, and a Deploy mechanism for deploying stateless applications
    10. Services display project's Service and Ingress resources, with the resource's name, status, type, endpoints, number of running and total desired Pods, namespace, and cluster
    11. Configuration displays project's Secret and ConfigMap resources
    12. Storage displays PersistentVolumeClaim and StorageClass resources associated with clusters
    13. Object Browser lists all of the objects running in every cluster in a given project
    14. Kubernetes clusters shows every Kubernetes cluster created in a project
    15. Dashboard can be used to inspect details about clusters, make changes to their settings, connect to them using Cloud Shell, and delete them
    16. Clusters and node versions can be upgraded from this dashboard
    17. When a new upgrade is available, the dashboard displays a notification for the relevant cluster
    18. Selecting a cluster to view its details displays the current settings for the cluster and its node pools
    19. Storage displays the persistent volumes and storage classes provisioned for the cluster's nodes
    20. Nodes lists all of the cluster's nodes and their requested CPU, memory, and storage resources
    21. Use the Workloads dashboard to inspect, manage, edit, and delete workloads deployed to clusters
    22. Deploy stateless applications using the menu's Deploy mechanism
    23. Selecting a workload displays the current settings for the workload, including its usage metrics, labels and selectors, update strategy, Pod specification, and active revisions
    24. Managed pods lists the Pods that are managed by the workload.
    25. Select a Pod from the list to view that Pod's details, events, logs, and YAML configuration file
    26. Revision history lists each revision of the workload, including the active revision
    27. Events lists human-readable messages for each event affecting the workload
    28. YAML displays the workload's live configuration
    29. Use the YAML-based text editor provided in this menu to make changes to the workload
    30. Copy and download the configuration from this menu
    31. Menus might appear differently depending on the type of workload you're viewing
    32. Use the dashboard's filter search to list only specific workloads
    33. By default, Kubernetes system objects are filtered out
    34. Some workloads have an Actions menu with convenient buttons for performing common operations
    35. Autoscale, update, and scale a Deployment from its Actions menu
    36. Services displays the load-balancing Service and traffic-routing Ingress objects associated with your project
    37. It also displays the default Kubernetes system objects associated with networking, such as the Kubernetes API server, HTTP backend, and DNS
    38. Select a resource from the list to display information about the resource, including its usage metrics, IP, and ports
    39. Events lists human-readable messages for each event affecting the resource
    40. YAML displays the resource's live configuration
    41. Use the YAML-based text editor provided in this menu to make changes to the resource
    42. Copy and download the configuration from this menu
    43. Configuration displays configuration files, Secrets, ConfigMaps, environment variables, and other configuration resources associated with the project
    44. It also displays Kubernetes system-level configuration resources, such as tokens used by service accounts
    45. Select a resource from this dashboard to view a detailed page about that resource
    46. Sensitive data stored in Secrets is not displayed in the console
    47. Storage lists the storage resources provisioned for your clusters
    48. PersistentVolumeClaim and StorageClass resources to be used by a cluster's nodes appear in this dashboard
    49. Persistent volume claims list all PersistentVolumeClaim resources in the clusters
    50. PersistentVolumeClaims are used with StatefulSet workloads to have those workloads claim storage space on a persistent disk in the cluster
    51. Storage classes list all StorageClass resources associated with nodes
    52. StorageClasses are used as "blueprints" for using space on a disk
    53. The disk's provisioner, parameters (such as disk type and compute zone), and reclaim policy are specified
    54. StorageClass resources can also be used for dynamic volume provisioning to create storage volumes on demand
    55. Select a resource from these dashboards to view a detailed page for that resource
    56. Object Browser lists all of the objects running in all of the clusters in the current project
    57. List and filter resources by specific API groups and Resource Kinds
    58. Preview YAML file for any resource by navigating to its details page
    59. The Kubernetes Dashboard add-on is disabled by default on GKE
    60. Cloud Console provides dashboards to manage, troubleshoot, and monitor GKE clusters, workloads, and applications
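    Equivalent kubectl commands for a few of the dashboard actions above (a sketch; the workload and Pod names are placeholders):

        kubectl get deployments,statefulsets,daemonsets,jobs --all-namespaces   # Workloads view
        kubectl get deployment my-app -o yaml                                   # YAML tab
        kubectl scale deployment my-app --replicas=5                            # Actions > Scale
        kubectl describe pod my-app-12345                                       # Pod details and events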