-
Overview
- A single-zone cluster has a single control plane (master) running in one zone
- A single-zone cluster control plane manages workloads on nodes running in the same zone
- A multi-zonal cluster has a single replica of the control plane running in a single zone, and has nodes running in multiple zones
- During an upgrade of the cluster or an outage of the zone where the control plane runs, workloads still run
- The cluster, its nodes, and its workloads cannot be configured until the control plane is available
- Multi-zonal clusters balance availability and cost for consistent workloads
- Use regional clusters to maintain availability when the number of nodes and node pools changes frequently
- A regional cluster has multiple replicas of the control plane, running in multiple zones within a given region
- Nodes also run in each zone where a replica of the control plane runs
- Because a regional cluster replicates the control plane and nodes, it consumes more Compute Engine resources than a single-zone or multi-zonal cluster
- Choose the cluster's specific Kubernetes version or make choices about its overall mix of stability and features
- Enroll the cluster in a release channel that provides the required level of stability
- Google automatically upgrades the cluster and its nodes when an update is available in that release channel.
- The Rapid channel receives multiple updates a month, while the Stable channel only receives a few updates a year
- Where neither a release channel nor a specific cluster version is selected, the current default version is used
- The default version is selected based on stability and real-world performance, and is changed regularly
- A specific supported version of Kubernetes can be specified when creating the cluster to suit a given workload's needs
- Where there is no need to control the specific patch version, enroll the cluster in a release channel instead of managing its version directly (a command sketch follows this list)
- An alpha cluster has all Kubernetes alpha APIs (feature gates) enabled
- Alpha clusters can be used for early testing and validation of Kubernetes features
- Alpha clusters are not supported for production workloads
- Alpha clusters cannot be upgraded, and expire within 30 days
- GKE clusters can be distinguished according to the way they route traffic from one Pod to another Pod
- A cluster that uses Alias IPs is called a VPC-native cluster
- A cluster that uses Google Cloud Routes is called a routes-based cluster
- VPC-native is the recommended network mode for new clusters
- The default cluster network mode depends on the way the cluster is created
- Access from public networks to the cluster's workloads can be configured
- In VPC-native clusters, routes are not created automatically
- Private clusters assign internal RFC 1918 IP addresses to Pods and nodes, and workloads are completely isolated from public networks
- Binary Authorization provides software supply-chain security to GKE workloads
- Binary Authorization works with images deployed to GKE from Container Registry or another container image registry
- Binary Authorization can be used to ensure that internal processes that safeguard the quality and integrity of software have successfully completed before an application is deployed to a production environment
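
The release-channel and VPC-native points above can be combined into a single creation command. The sketch below is illustrative only: it assumes gcloud is installed, and the cluster name, zone, and channel value are placeholder assumptions rather than values from these notes.

```python
# Minimal sketch (assumed names/values): create a cluster enrolled in a release
# channel and using VPC-native (Alias IP) networking.
import shlex
import subprocess

cmd = [
    "gcloud", "container", "clusters", "create", "demo-cluster",  # assumed cluster name
    "--zone", "us-central1-a",        # single-zone cluster; use --region for a regional one
    "--release-channel", "regular",   # rapid | regular | stable
    "--enable-ip-alias",              # VPC-native cluster (uses Alias IPs)
]
print(shlex.join(cmd))                # inspect the command first
# subprocess.run(cmd, check=True)     # uncomment to actually create the cluster
```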
-
Alpha Clusters
- Alpha clusters can be used just like normal GKE clusters
- Alpha clusters are short-lived clusters that run stable Kubernetes releases with all Kubernetes APIs and features enabled
- Alpha clusters are designed for advanced users and early adopters to experiment with workloads that take advantage of new features before those features are production-ready
- Alpha clusters default to running the current default version of Kubernetes
- Do not use Alpha clusters or alpha features for production workloads
- Alpha clusters expire after thirty days and do not receive security updates
- Users must migrate data from alpha clusters before they expire
- GKE does not automatically save data stored on alpha clusters
- Users can experiment with Kubernetes alpha features by creating an alpha cluster (see the command sketch after this list)
- Users can specify a different version during cluster creation
- Alpha clusters are not covered by the GKE SLA
- Alpha clusters cannot be upgraded
- Node auto-upgrade and auto-repair are disabled on alpha clusters
- Alpha clusters are automatically deleted after 30 days
- Alpha clusters do not receive security updates
- Alpha clusters do not necessarily run "alpha" versions of GKE
- The term alpha cluster means that alpha APIs are enabled, both for Kubernetes and GKE, regardless of the version of Kubernetes the cluster runs
- Periodically, Google offers customers the ability to test GKE versions that are not generally available, for testing and validation
- Early-access GKE versions can be run as alpha clusters or as clusters without the Kubernetes alpha APIs enabled
- Most Kubernetes releases contain new Alpha features that can be tested in alpha clusters
- New Kubernetes features are introduced as preview or generally available
- To ensure stability and production quality, normal GKE clusters only enable features that are beta or higher
- Alpha features are not enabled on normal clusters because they are not production-ready or upgradeable
- Since GKE automatically upgrades the Kubernetes control plane, enabling alpha features in production could jeopardize the reliability of the cluster if there are breaking changes in a new version
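
As a rough illustration of the constraints above (alpha APIs enabled, node auto-upgrade and auto-repair disabled, no upgrades, 30-day lifetime), an alpha cluster could be created along these lines; the cluster name, zone, and version value are assumptions, not part of these notes.

```python
# Minimal sketch (assumed names/values): create a short-lived alpha cluster.
import shlex

cmd = [
    "gcloud", "container", "clusters", "create", "alpha-test",  # assumed cluster name
    "--zone", "us-central1-a",
    "--enable-kubernetes-alpha",   # enables alpha APIs; cluster expires after 30 days
    "--no-enable-autoupgrade",     # alpha clusters run with node auto-upgrade disabled
    "--no-enable-autorepair",      # ...and node auto-repair disabled
    "--cluster-version", "latest", # or a specific supported version
]
print(shlex.join(cmd))             # review, then run where gcloud is installed
```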
-
Regional clusters
- By default, a cluster's control plane (master) and nodes all run in a single compute zone, specified when the cluster is created
- Regional clusters increase the availability of both a cluster's control plane (master) and its nodes by replicating them across multiple zones of a region
- Regional clusters provide the advantages of multi-zonal clusters
- If one or more (but not all) zones in a region experience an outage, the cluster's control plane remains accessible as long as one replica of the control plane remains available
- During cluster maintenance such as a cluster upgrade, only one replica of the control plane is unavailable at a time, and the cluster is still operational
- By default, the control plane and each node pool are replicated across three zones of a region, but users can customize the number of replicas (a creation sketch follows this list)
- It is not possible to modify whether a cluster is zonal, multi-zonal, or regional after creating the cluster
- Regional clusters replicate cluster masters and nodes across multiple zones within a single region
- In the event of an infrastructure outage, workloads continue to run, and nodes can be rebalanced manually or by using the cluster autoscaler
- Regional clusters are available across a region rather than a single zone within a region
- If a single zone becomes unavailable, the Kubernetes control plane and resources are not impacted
- Regional clusters provide zero-downtime master upgrades and resizes, and reduced downtime from master failures
- Regional clusters provide a high availability control plane, so users can access the control plane even during upgrades
- By default, regional clusters consist of nine nodes spread evenly across three zones in a region.
- This consumes nine IP addresses
- The number of nodes can be reduced down to one per zone, if desired
- Newly created Google Cloud accounts are granted only eight IP addresses per region, so it may be necessary to request an increase in quotas for regional in-use IP addresses, depending on the size of the regional cluster
- If there are too few available in-use IP addresses, cluster creation fails
- For regional clusters that run GPUs, users must choose a region or zones that have GPUs
- Node pools cannot be created in zones outside of the cluster's zones
- Changing a cluster's zones causes all new and existing nodes to span new zones
- Regional clusters are offered at no additional charge
- Using regional clusters requires more of a project's regional quotas than a similar zonal or multi-zonal cluster
- Understand quotas and Google Kubernetes Engine pricing before using regional clusters
- An "Insufficient regional quota to satisfy request for resource" error implies the request exceeds the available quota in the current region
- Node-to-node traffic across zones is charged
- If a workload running in one zone needs to communicate with a workload in a different zone, the cross-zone traffic incurs cost
- Persistent storage disks are zonal resources
- When persistent storage is added to a cluster, unless a zone is specified, GKE assigns the disk to a single zone
- GKE chooses the zone at random
- When using a StatefulSet, the provisioned persistent disks for each replica are spread across zones
- Once a persistent disk is provisioned, any Pods referencing the disk are scheduled to the same zone as the disk
- A read-write persistent disk cannot be attached to multiple nodes
- To maintain capacity in the unlikely event of zonal failure, users can allow GKE to overprovision scaling limits, to guarantee a minimum level of availability even when some zones are unavailable
- For example, in a three-zone cluster that needs four nodes per zone, overprovisioning to 150% means specifying a maximum of six nodes per zone rather than four; if one zone fails, the cluster can scale to 12 nodes across the two remaining zones and keep its full capacity
- Similarly, if a two-zone cluster is overprovisioned to 200%, 100% of traffic is rerouted if half of the cluster's capacity is lost
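
The creation and overprovisioning points above can be sketched as follows. This is illustrative only: the region, zone list, node counts, and cluster name are assumptions, and the autoscaling flags are shown simply to express the 150% overprovisioning example.

```python
# Minimal sketch (assumed names/values): build a regional-cluster creation command
# and compute an overprovisioned per-zone scaling limit that survives a zonal outage.
import math
import shlex

REGION = "us-central1"                          # assumed region
ZONES = [f"{REGION}-a", f"{REGION}-b", f"{REGION}-c"]
NODES_PER_ZONE = 4                              # assumed steady-state need per zone

# Capacity of all zones must still fit into the remaining zones if one zone fails.
max_nodes_per_zone = math.ceil(NODES_PER_ZONE * len(ZONES) / (len(ZONES) - 1))  # -> 6 (150%)

cmd = [
    "gcloud", "container", "clusters", "create", "demo-regional",  # assumed cluster name
    "--region", REGION,                          # regional cluster (not --zone)
    "--node-locations", ",".join(ZONES),         # zones the nodes run in
    "--num-nodes", str(NODES_PER_ZONE),          # per zone: 3 zones x 4 = 12 nodes total
    "--enable-autoscaling",
    "--min-nodes", str(NODES_PER_ZONE),          # per zone
    "--max-nodes", str(max_nodes_per_zone),      # per zone, overprovisioned to 150%
]
print(shlex.join(cmd))
```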
-
Private clusters
- Private clusters provide the ability to isolate nodes from inbound and outbound connectivity to the public internet
- This isolation is achieved because the nodes have internal RFC 1918 IP addresses only (a creation sketch follows this list)
- Outbound internet access for certain private nodes can be provided using Cloud NAT or a self-managed NAT gateway
- Even though the node IP addresses are private, external clients can reach Services in the cluster
- A Service of type LoadBalancer can be used to enable external clients to call the IP address of the load balancer
- A Service of type NodePort can be created and used as the backend for an Ingress
- GKE uses information in the Service and the Ingress to configure an HTTP(S) load balancer
- External clients can call the external IP address of the HTTP(S) load balancer
- By default, Private Google Access is enabled
- Private Google Access provides private nodes and their workloads with limited outbound access to Google Cloud APIs and services over Google's private network
- Private Google Access makes it possible for private nodes to pull container images from Google Container Registry, and to send logs to the Cloud Operations stack
- Every GKE cluster has a Kubernetes API server called the master
- The master is in a Google-owned project that is separate from the user project
- The master runs on a VM that is in a VPC network in the Google-owned project
- A regional cluster has multiple masters, each of which runs on its own VM
- In private clusters, the master's VPC network is connected to the cluster's VPC network with VPC Network Peering
- The user's VPC network contains the cluster nodes, and a separate Google Cloud VPC network contains the cluster's master
- The master's VPC network is located in a project controlled by Google
- The user's VPC network and the master's VPC network are connected using VPC Network Peering
- Traffic between nodes and the master is routed entirely using internal IP addresses
- All newly created private clusters automatically reuse existing VPC Network Peering connections
- The first zonal or regional private cluster generates a new VPC Network Peering connection
- Additional private clusters in the same zone or region and network can use the same peering, without the need to create any additional VPC Network Peering connections
- The master for a private cluster has a private endpoint in addition to a public endpoint
- The master for a non-private cluster only has a public endpoint
- The private endpoint is an internal IP address in the master's VPC network
- In a private cluster, nodes always communicate with the master's private endpoint
- Depending on the configuration, the cluster can be managed with tools like kubectl that connect to the private endpoint
- Any VM that uses the same subnet that the private cluster uses can also access the private endpoint
- Public endpoint is the external IP address of the master
- By default, tools like kubectl communicate with the master on its public endpoint
- Access to the master endpoints can be controlled using master authorized networks, or users can disable access to the public endpoint
- Disabling public endpoint access is the most secure option as it prevents all internet access to the master
- This is a good choice where the on-premises network has been connected to Google Cloud using Cloud Interconnect and Cloud VPN
- Cloud Interconnect and Cloud VPN connect a company network to the VPC without the traffic having to traverse the public internet
- With public endpoint access disabled, master authorized networks must be configured for the private endpoint
- Without master authorized networks, users can only connect to the private endpoint from cluster nodes or VMs in the same subnet as the cluster
- Master authorized networks must be RFC 1918 IP addresses
- Even if a customer disables access to the public endpoint, Google can use the master's public endpoint for cluster management purposes, such as scheduled maintenance and automatic master upgrades
- Another option is to enable public endpoint access together with master authorized networks
- Using private clusters with master authorized networks enabled provides restricted access to the master from a defined set of source IP addresses
- Master authorized networks are a good choice where there is no existing VPN infrastructure, or where remote users or branch offices connect over the public internet rather than through a VPN
- Public endpoint access enabled, master authorized networks disabled is the default and least restrictive option
- Since master authorized networks are not enabled, the cluster can be administered from any source IP address as long as the user is authenticated
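
Putting the private-cluster and endpoint options above together, a creation command might look like the sketch below. The cluster name, region, master CIDR, and authorized range are placeholder assumptions; disabling the public endpoint is shown only as a commented-out alternative.

```python
# Minimal sketch (assumed names/values): create a private, VPC-native cluster with
# master authorized networks restricting access to the master's public endpoint.
import shlex

cmd = [
    "gcloud", "container", "clusters", "create", "demo-private",  # assumed cluster name
    "--region", "us-central1",
    "--enable-ip-alias",                          # private clusters are VPC-native
    "--enable-private-nodes",                     # nodes get internal RFC 1918 IPs only
    "--master-ipv4-cidr", "172.16.0.0/28",        # range for the master's peered VPC network
    "--enable-master-authorized-networks",
    "--master-authorized-networks", "203.0.113.0/24",  # allowed source range (must be RFC 1918
                                                       # if the public endpoint is disabled)
    # "--enable-private-endpoint",                # optional: disable the public endpoint entirely
]
print(shlex.join(cmd))
```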
-
Scalability
- In a Kubernetes cluster, scalability refers to the ability of the cluster to grow while staying within its service-level objectives (SLOs)
- Kubernetes also has its own set of SLOs
- Regional clusters are better suited for high availability
- Regional clusters have multiple master nodes across multiple compute zones in a region
- Zonal clusters have one master node in a single compute zone
- If a zonal cluster is upgraded, the single master VM experiences downtime during which the Kubernetes API is not available until the upgrade is complete
- In regional clusters, the control plane remains available during cluster maintenance like rotating IPs, upgrading master VMs, or resizing clusters or node pools
- When upgrading a regional cluster, two out of three master VMs are always running during the rolling upgrade, so the Kubernetes API is still available
- A single-zone outage won't cause any downtime in the regional control plane
- Changes to the cluster's configuration take longer because they must propagate across all masters in a regional cluster instead of the single control plane in zonal clusters
- In regional clusters, if VMs cannot be created in one of the zones, whether from a lack of capacity or another transient problem, the cluster cannot be created or upgraded
- Use zonal clusters to create or upgrade clusters rapidly when availability is less of a concern
- Use regional clusters when availability is more important than flexibility
- Carefully select the cluster type when creating a cluster, because it cannot be changed after the cluster is created
- Migrating production traffic between clusters is possible but difficult at scale
- Use regional clusters for production workloads, as they offer higher availability than zonal clusters
- To achieve high availability, the Kubernetes control plane and its nodes need to be spread across different zones
- GKE offers zonal and multi-zonal node pools
- To deploy a highly available application, distribute workloads across multiple compute zones in a region by using multi-zonal node pools which distribute nodes uniformly across zones
- When using cluster autoscaler with multi-zonal node pools, nodes are not guaranteed to be spread equally among zones
- If all nodes are in the same zone, Pods can’t be scheduled if that zone becomes unreachable
- GPUs are available only in specific zones. It may not be possible to get them in all zones in the region
- Round-trip latency between locations within a single region is expected to stay below 1ms on the 95th percentile
- The difference in latency between inter-zonal and intra-zonal traffic should be negligible
- The price of egress traffic between zones in the same region is available on the Compute Engine pricing page
- Kubernetes workloads require networking, compute, and storage
- Enough CPU and memory is required to run Pods
- There are more parameters of underlying infrastructure that can influence performance and scalability of a GKE cluster
- GKE offers two network modes: the older routes-based mode and the newer VPC-native mode
- With routes-based cluster, each time a node is added, a custom route is added to the routing table in the VPC network
- GKE clusters with routes-based networking cannot scale above 2,000 nodes
- In the VPC-native cluster mode, the VPC network has a secondary range for all Pod IP addresses
- Each node is assigned a slice of the secondary range for its own Pod IP addresses
- This allows the VPC network to natively understand how to route traffic to Pods without relying on custom routes
- VPC-native clusters are the networking default and are recommended to accommodate large workloads
- They scale to a larger number of nodes and allow better interaction with other Google Cloud products
- A VPC-native cluster uses the primary IP range for nodes and two secondary IP ranges for Pods and Services
- The maximum number of nodes in VPC-native clusters can be limited by available IP addresses
- The number of nodes is determined by both the primary range (node subnet) and the secondary range (Pod subnet)
- The maximum number of Pods and Services is determined by the size of the cluster's secondary ranges, Pod subnet and Service subnet, respectively
- The Pod secondary range defaults to /14 (262,144 IP addresses)
- Each node has /24 range assigned for its Pods (256 IP addresses for its Pods)
- The node subnet defaults to /20 (4,092 usable IP addresses)
- There must be enough addresses in both ranges (node and Pod) to provision a new node
- With these defaults, only 1,024 nodes can be created, because the /14 Pod range provides only 1,024 of the per-node /24 blocks (see the sketch after this list)
- By default there can be a maximum of 110 Pods per node, and each node in the cluster is allocated a /24 range for its Pods
- This results in 256 Pod IPs per node
- By having approximately twice as many available IP addresses as possible Pods, Kubernetes is able to mitigate IP address reuse as Pods are added to and removed from a node
- For applications that schedule a smaller number of Pods per node, this is wasteful
- The Flexible Pod CIDR feature allows per-node CIDR block size for Pods to be configured and use fewer IP addresses
- By default, the secondary range for Services is set to /20 (4,096 IP addresses), limiting the number of Services in the cluster to 4096
- Secondary ranges cannot be changed after creation
- When a cluster is created, ensure the ranges chosen are large enough to accommodate anticipated growth
- GKE nodes are regular Google Cloud virtual machines
- Parameters such as the number of cores or the size of disks can influence how GKE clusters perform
- In Google Cloud, the number of cores allocated to the instance determines its network capacity.
- In Google Cloud, the size of persistent disks determines the IOPS and throughput of the disk
- GKE typically uses Persistent Disks as boot disks and to back Kubernetes' Persistent Volumes
- Increasing disk size increases both IOPS and throughput, up to certain limits
- Each persistent disk write operation contributes to the virtual machine instance's cumulative network egress cap
- IOPS performance of disks, especially SSDs, depends on the number of vCPUs in the instance in addition to disk size
- Lower core VMs have lower write IOPS limits due to network egress limitations on write throughput
- If a virtual machine instance has insufficient CPUs, the application won't be able to get close to the IOPS limit
- Use larger and fewer disks to achieve higher IOPS and throughput
- Workloads that require high capacity or large numbers of disks need to consider the limits of how many PDs can be attached to a single VM
- For regular VMs, that limit is 128 disks with a total size of 64 TB, while shared-core VMs have a limit of 16 PDs with a total size of 3 TB
- Google Cloud enforces this limit, not Kubernetes
- Kubernetes, like any other system, has limits which need to be taken into account when designing applications and planning their growth
- Kubernetes supports up to 5000 nodes in a single cluster
- The number of nodes is only one of many dimensions on which Kubernetes can scale
- Other dimensions include the total number of Pods, Services, or backends behind a Service
- Do not stretch more than one dimension at a time
- This can cause problems even in smaller clusters
- For example, trying to schedule 100 Pods per node in a 5k node cluster likely won't succeed because the number of Pods, the number of Pods per node, and the number of nodes would be stretched too far
- Extending Kubernetes clusters with webhooks or CRDs is common and can constrain the ability to scale the cluster
- Most limits are not enforced, so users can go above them
- Exceeding limits won't make the cluster instantly unusable
- Performance degrades (sometimes shown by failing SLOs) before failure
- Some of the limits are given for the largest possible cluster
- In smaller clusters, limits are proportionally lower
- The performance of iptables degrades if there are too many services or if there is a high number of backends behind a Service
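
The default sizing figures quoted above can be checked with a few lines of arithmetic. The prefix lengths (/14 Pod range, /24 per node, /20 Service range) come from these notes; the base addresses used below are arbitrary placeholders.

```python
# Minimal sketch: verify the default VPC-native sizing figures using the standard library.
import ipaddress

pod_range = ipaddress.ip_network("10.0.0.0/14")   # default Pod secondary range (placeholder base)
svc_range = ipaddress.ip_network("10.4.0.0/20")   # default Service secondary range (placeholder base)
per_node_prefix = 24                              # each node gets a /24 for its Pods

pod_ips = pod_range.num_addresses                 # 262,144 Pod IP addresses
pod_ips_per_node = 2 ** (32 - per_node_prefix)    # 256 Pod IPs per node (max 110 Pods)
max_nodes_by_pod_range = pod_ips // pod_ips_per_node   # 1,024 nodes
max_services = svc_range.num_addresses            # 4,096 Services

print(f"Pod IPs: {pod_ips}, per node: {pod_ips_per_node}")
print(f"Max nodes limited by Pod range: {max_nodes_by_pod_range}")
print(f"Max Services: {max_services}")
```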
-
Dashboard
- Cloud Console offers useful dashboards for a project's GKE clusters and their resources
- Dashboards can be used to view, inspect, manage, and delete resources in clusters
- Deployments can be created from the Workloads dashboard
- In conjunction with the gcloud and kubectl command-line tools, the GKE dashboards are helpful for DevOps workflows and troubleshooting issues (a command-line sketch follows this list)
- Dashboards can be used to get information about all resources in every cluster quickly and easily.
- Kubernetes clusters displays each cluster's name, compute zone, cluster size, total cores, total memory, node version, outstanding notifications, and labels
- Workloads displays workloads (Deployments, StatefulSets, DaemonSets, Jobs, and Pods) deployed to clusters in the current project
- Information includes each workload's name, status, type, number of running and total desired Pods, namespace, and cluster
- A YAML-based text editor is available for inspecting and editing deployed resources, and a Deploy mechanism for deploying stateless applications
- Services displays the project's Service and Ingress resources, with each resource's name, status, type, endpoints, number of running and total desired Pods, namespace, and cluster
- Configuration displays the project's Secret and ConfigMap resources
- Storage displays PersistentVolumeClaim and StorageClass resources associated with clusters
- Object Browser lists all of the objects running in every cluster in a given project
- Kubernetes clusters shows every Kubernetes cluster created in a project
- Dashboard can be used to inspect details about clusters, make changes to their settings, connect to them using Cloud Shell, and delete them
- Clusters and node versions can be upgraded from this dashboard
- When a new upgrade is available, the dashboard displays a notification for the relevant cluster
- Selecting a cluster to view its details displays the current settings for the cluster and its node pools
- Storage displays the persistent volumes and storage classes provisioned for the cluster's nodes
- Nodes lists all of the cluster's nodes and their requested CPU, memory, and storage resources
- Use the Workloads dashboard to inspect, manage, edit, and delete workloads deployed to clusters
- Deploy stateless applications using the menu's Deploy mechanism
- Selecting a workload displays the current settings for the workload, including its usage metrics, labels and selectors, update strategy, Pod specification, and active revisions
- Managed pods lists the Pods that are managed by the workload.
- Select a Pod from the list to view that Pod's details, events, logs, and YAML configuration file
- Revision history lists each revision of the workload, including the active revision
- Events lists human-readable messages for each event affecting the workload
- YAML displays the workload's live configuration
- Use the YAML-based text editor provided in this menu to make changes to the workload
- Copy and download the configuration from this menu
- Menus might appear differently depending on the type of workload you're viewing
- Use the dashboard's filter search to list only specific workloads
- By default, Kubernetes system objects are filtered out
- Some workloads have an Actions menu with convenient buttons for performing common operations
- Autoscale, update, and scale a Deployment from its Actions menu
- Services displays the load-balancing Service and traffic-routing Ingress objects associated with your project
- It also displays the default Kubernetes system objects associated with networking, such as the Kubernetes API server, HTTP backend, and DNS
- Select a resource from the list to display information about the resource, including its usage metrics, IP, and ports
- Events lists human-readable messages for each event affecting the resource
- YAML displays the resource's live configuration
- Use the YAML-based text editor provided in this menu to make changes to the resource
- Copy and download the configuration from this menu
- Configuration displays configuration files, Secrets, ConfigMaps, environment variables, and other configuration resources associated with the project
- It also displays Kubernetes system-level configuration resources, such as tokens used by service accounts
- Select a resource from this dashboard to view a detailed page about that resource
- Sensitive data stored in Secrets is not displayed in the console
- Storage lists the storage resources provisioned for your clusters
- PersistentVolumeClaim and StorageClass resources to be used by a cluster's nodes appear in this dashboard
- Persistent volume claims list all PersistentVolumeClaim resources in the clusters
- PersistentVolumeClaims are used with StatefulSet workloads to have those workloads claim storage space on a persistent disk in the cluster
- Storage classes list all StorageClass resources associated with nodes
- StorageClasses are used as "blueprints" for using space on a disk
- The disk's provisioner, parameters (such as disk type and compute zone), and reclaim policy are specified
- StorageClass resources can also be used for dynamic volume provisioning to create storage volumes on demand
- Select a resource from these dashboards to view a detailed page for that resource
- Object Browser lists all of the objects running in all of the clusters in the current project
- List and filter resources by specific API groups and Resource Kinds
- Preview YAML file for any resource by navigating to its details page
- The Kubernetes Dashboard add-on is disabled by default on GKE
- Cloud Console provides dashboards to manage, troubleshoot, and monitor GKE clusters, workloads, and applications
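
For completeness, much of what the dashboards present can also be pulled from the command line. The sketch below shells out to gcloud and kubectl (both assumed to be installed and authenticated); the cluster name and zone are placeholder assumptions.

```python
# Minimal sketch (assumed names/values): command-line counterparts of the GKE dashboards.
import subprocess

def run(args):
    """Run a CLI command and return its stdout."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout

print(run(["gcloud", "container", "clusters", "list"]))    # Kubernetes clusters dashboard
run(["gcloud", "container", "clusters", "get-credentials",
     "demo-cluster", "--zone", "us-central1-a"])            # fetch kubectl credentials (assumed cluster)
print(run(["kubectl", "get", "deployments,statefulsets,daemonsets,jobs",
           "--all-namespaces"]))                            # Workloads dashboard
print(run(["kubectl", "get", "services,ingresses", "--all-namespaces"]))       # Services dashboard
print(run(["kubectl", "get", "persistentvolumeclaims", "--all-namespaces"]))   # Storage dashboard
print(run(["kubectl", "get", "storageclasses"]))            # Storage classes (cluster-scoped)
```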