-
Overview
- A single-zone cluster has a single control plane (master) running in one zone
- A single-zone cluster control plane manages workloads on nodes running in the same zone
- A multi-zonal cluster has a single replica of the control plane running in a single zone, and has nodes running in multiple zones
- During an upgrade of the cluster or an outage of the zone where the control plane runs, workloads still run
- The cluster, its nodes, and its workloads cannot be configured until the control plane is available
- Multi-zonal clusters balance availability and cost for consistent workloads
- Use regional clusters to maintain availability when the number of nodes and node pools changes frequently
- A regional cluster has multiple replicas of the control plane, running in multiple zones within a given region
- Nodes also run in each zone where a replica of the control plane runs
- Because a regional cluster replicates the control plane and nodes, it consumes more Compute Engine resources than a single-zone or multi-zonal cluster
- Choose the cluster's specific Kubernetes version or make choices about its overall mix of stability and features
- Enroll the cluster in a release channel that provides the required level of stability
- Google automatically upgrades the cluster and its nodes when an update is available in that release channel.
- The Rapid channel receives multiple updates a month, while the Stable channel only receives a few updates a year
- Where neither a release channel nor a specific cluster version is selected, the current default version is used
- The default version is selected based on stability and real-world performance, and is changed regularly
- A specific supported version of Kubernetes can be specified when creating the cluster to suit a given workload's needs
- Where there is no need to control the specific patch version, enroll the cluster in a release channel instead of managing its version directly (a command sketch follows this list)
- An alpha cluster has all Kubernetes alpha APIs (feature gates) enabled
- Alpha clusters can be used for early testing and validation of Kubernetes features
- Alpha clusters are not supported for production workloads
- Alpha clusters cannot be upgraded, and expire within 30 days
- GKE clusters can be distinguished according to the way they route traffic from one Pod to another Pod
- A cluster that uses Alias IPs is called a VPC-native cluster
- A cluster that uses Google Cloud Routes is called a routes-based cluster
- VPC-native is the recommended network mode for new clusters
- The default cluster network mode depends on the way the cluster is created
- Access from public networks to the cluster's workloads can be configured
- In VPC-native clusters, routes are not created automatically
- Private clusters assign internal RFC 1918 IP addresses to Pods and nodes, and workloads are completely isolated from public networks
- Binary Authorization provides software supply-chain security to GKE workloads
- Binary Authorization works with images deployed to GKE from Container Registry or another container image registry
- Binary Authorization can be used to ensure that internal processes that safeguard the quality and integrity of software have successfully completed before an application is deployed to a production environment
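
The release-channel and VPC-native points above can be combined into a single creation command. The sketch below is illustrative only: it assumes gcloud is installed, and the cluster name, zone, and channel value are placeholder assumptions rather than values from these notes.

```python
# Minimal sketch (assumed names/values): create a cluster enrolled in a release
# channel and using VPC-native (Alias IP) networking.
import shlex
import subprocess

cmd = [
    "gcloud", "container", "clusters", "create", "demo-cluster",  # assumed cluster name
    "--zone", "us-central1-a",        # single-zone cluster; use --region for a regional one
    "--release-channel", "regular",   # rapid | regular | stable
    "--enable-ip-alias",              # VPC-native cluster (uses Alias IPs)
]
print(shlex.join(cmd))                # inspect the command first
# subprocess.run(cmd, check=True)     # uncomment to actually create the cluster
```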
-
Alpha Clusters
- Alpha clusters can be used just like normal GKE clusters
- Alpha clusters are short-lived clusters that run stable Kubernetes releases with all Kubernetes APIs and features enabled
- Alpha clusters are designed for advanced users and early adopters to experiment with workloads that take advantage of new features before those features are production-ready
- Alpha clusters default to running the current default version of Kubernetes
- Do not use Alpha clusters or alpha features for production workloads
- Alpha clusters expire after thirty days and do not receive security updates
- Users must migrate data from alpha clusters before they expire
- GKE does not automatically save data stored on alpha clusters
- Users can experiment with Kubernetes alpha features by creating an alpha cluster (see the command sketch after this list)
- Users can specify a different version during cluster creation
- Alpha clusters are not covered by the GKE SLA
- Alpha clusters cannot be upgraded
- Node auto-upgrade and auto-repair are disabled on alpha clusters
- Alpha clusters are automatically deleted after 30 days
- Alpha clusters do not receive security updates
- Alpha clusters do not necessarily run "alpha" versions of GKE
- The term alpha cluster means that alpha APIs are enabled, both for Kubernetes and GKE, regardless of the version of Kubernetes the cluster runs
- Periodically, Google offers customers the ability to test GKE versions that are not generally available, for testing and validation
- Early-access GKE versions can be run as alpha clusters or as clusters without the Kubernetes alpha APIs enabled
- Most Kubernetes releases contain new Alpha features that can be tested in alpha clusters
- New Kubernetes features are introduced as preview or generally available
- To ensure stability and production quality, normal GKE clusters only enable features that are beta or higher
- Alpha features are not enabled on normal clusters because they are not production-ready or upgradeable
- Since GKE automatically upgrades the Kubernetes control plane, enabling alpha features in production could jeopardize the reliability of the cluster if there are breaking changes in a new version
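
As a rough illustration of the constraints above (alpha APIs enabled, node auto-upgrade and auto-repair disabled, no upgrades, 30-day lifetime), an alpha cluster could be created along these lines; the cluster name, zone, and version value are assumptions, not part of these notes.

```python
# Minimal sketch (assumed names/values): create a short-lived alpha cluster.
import shlex

cmd = [
    "gcloud", "container", "clusters", "create", "alpha-test",  # assumed cluster name
    "--zone", "us-central1-a",
    "--enable-kubernetes-alpha",   # enables alpha APIs; cluster expires after 30 days
    "--no-enable-autoupgrade",     # alpha clusters run with node auto-upgrade disabled
    "--no-enable-autorepair",      # ...and node auto-repair disabled
    "--cluster-version", "latest", # or a specific supported version
]
print(shlex.join(cmd))             # review, then run where gcloud is installed
```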
-
Regional clusters
- By default, a cluster's control plane (master) and nodes all run in a single compute zone, specified when the cluster is created
- Regional clusters increase the availability of both a cluster's control plane (master) and its nodes by replicating them across multiple zones of a region
- Regional clusters provide the advantages of multi-zonal clusters
- If one or more (but not all) zones in a region experience an outage, the cluster's control plane remains accessible as long as one replica of the control plane remains available
- During cluster maintenance such as a cluster upgrade, only one replica of the control plane is unavailable at a time, and the cluster is still operational
- By default, the control plane and each node pool are replicated across three zones of a region, but users can customize the number of replicas (a creation sketch follows this list)
- It is not possible to modify whether a cluster is zonal, multi-zonal, or regional after creating the cluster
- Regional clusters replicate cluster masters and nodes across multiple zones within a single region
- In the event of an infrastructure outage, workloads continue to run, and nodes can be rebalanced manually or by using the cluster autoscaler
- Regional clusters are available across a region rather than a single zone within a region
- If a single zone becomes unavailable, the Kubernetes control plane and resources are not impacted
- Regional clusters provide zero-downtime master upgrades and resizes, and reduced downtime from master failures
- Regional clusters provide a high availability control plane, so users can access the control plane even during upgrades
- By default, regional clusters consist of nine nodes spread evenly across three zones in a region.
- This consumes nine IP addresses
- The number of nodes can be reduced down to one per zone, if desired
- Newly created Google Cloud accounts are granted only eight IP addresses per region, so it may be necessary to request an increase in quotas for regional in-use IP addresses, depending on the size of the regional cluster
- If there are too few available in-use IP addresses, cluster creation fails
- For regional clusters that run GPUs, users must choose a region or zones that have GPUs
- Node pools cannot be created in zones outside of the cluster's zones
- Changing a cluster's zones causes all new and existing nodes to span new zones
- Regional clusters are offered at no additional charge
- Using regional clusters requires more of a project's regional quotas than a similar zonal or multi-zonal cluster
- Understand quotas and Google Kubernetes Engine pricing before using regional clusters
- An "Insufficient regional quota to satisfy request for resource" error implies the request exceeds the available quota in the current region
- Node-to-node traffic across zones is charged
- If a workload running in one zone needs to communicate with a workload in a different zone, the cross-zone traffic incurs cost
- Persistent storage disks are zonal resources
- When persistent storage is added to a cluster, unless a zone is specified, GKE assigns the disk to a single zone
- GKE chooses the zone at random
- When using a StatefulSet, the provisioned persistent disks for each replica are spread across zones
- Once a persistent disk is provisioned, any Pods referencing the disk are scheduled to the same zone as the disk
- A read-write persistent disk cannot be attached to multiple nodes
- To maintain capacity in the unlikely event of zonal failure, users can allow GKE to overprovision scaling limits, to guarantee a minimum level of availability even when some zones are unavailable
- For example, in a three-zone cluster that needs four nodes per zone, overprovisioning to 150% means specifying a maximum of six nodes per zone rather than four; if one zone fails, the cluster can scale to 12 nodes across the two remaining zones and keep its full capacity
- Similarly, if a two-zone cluster is overprovisioned to 200%, 100% of traffic is rerouted if half of the cluster's capacity is lost
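
The creation and overprovisioning points above can be sketched as follows. This is illustrative only: the region, zone list, node counts, and cluster name are assumptions, and the autoscaling flags are shown simply to express the 150% overprovisioning example.

```python
# Minimal sketch (assumed names/values): build a regional-cluster creation command
# and compute an overprovisioned per-zone scaling limit that survives a zonal outage.
import math
import shlex

REGION = "us-central1"                          # assumed region
ZONES = [f"{REGION}-a", f"{REGION}-b", f"{REGION}-c"]
NODES_PER_ZONE = 4                              # assumed steady-state need per zone

# Capacity of all zones must still fit into the remaining zones if one zone fails.
max_nodes_per_zone = math.ceil(NODES_PER_ZONE * len(ZONES) / (len(ZONES) - 1))  # -> 6 (150%)

cmd = [
    "gcloud", "container", "clusters", "create", "demo-regional",  # assumed cluster name
    "--region", REGION,                          # regional cluster (not --zone)
    "--node-locations", ",".join(ZONES),         # zones the nodes run in
    "--num-nodes", str(NODES_PER_ZONE),          # per zone: 3 zones x 4 = 12 nodes total
    "--enable-autoscaling",
    "--min-nodes", str(NODES_PER_ZONE),          # per zone
    "--max-nodes", str(max_nodes_per_zone),      # per zone, overprovisioned to 150%
]
print(shlex.join(cmd))
```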
-
Private clusters
- Private clusters provide the ability to isolate nodes from inbound and outbound connectivity to the public internet
- This isolation is achieved because the nodes have internal RFC 1918 IP addresses only (a creation sketch follows this list)
- Outbound internet access for certain private nodes can be provided using Cloud NAT or a self-managed NAT gateway
- Even though the node IP addresses are private, external clients can reach Services in the cluster
- A Service of type LoadBalancer can be used to enable external clients to call the IP address of the load balancer
- A Service of type NodePort can be created and used as the backend for an Ingress
- GKE uses information in the Service and the Ingress to configure an HTTP(S) load balancer
- External clients can call the external IP address of the HTTP(S) load balancer
- By default, Private Google Access is enabled
- Private Google Access provides private nodes and their workloads with limited outbound access to Google Cloud APIs and services over Google's private network
- Private Google Access makes it possible for private nodes to pull container images from Google Container Registry, and to send logs to the Cloud Operations stack
- Every GKE cluster has a Kubernetes API server called the master
- The master is in a Google-owned project that is separate from the user project
- The master runs on a VM that is in a VPC network in the Google-owned project
- A regional cluster has multiple masters, each of which runs on its own VM
- In private clusters, the master's VPC network is connected to the cluster's VPC network with VPC Network Peering
- The user's VPC network contains the cluster nodes, and a separate Google Cloud VPC network contains the cluster's master
- The master's VPC network is located in a project controlled by Google
- The user's VPC network and the master's VPC network are connected using VPC Network Peering
- Traffic between nodes and the master is routed entirely using internal IP addresses
- All newly created private clusters automatically reuse existing VPC Network Peering connections
- The first zonal or regional private cluster generates a new VPC Network Peering connection
- Additional private clusters in the same zone or region and network can use the same peering, without the need to create any additional VPC Network Peering connections
- The master for a private cluster has a private endpoint in addition to a public endpoint
- The master for a non-private cluster only has a public endpoint
- The private endpoint is an internal IP address in the master's VPC network
- In a private cluster, nodes always communicate with the master's private endpoint
- Depending on the configuration, the cluster can be managed with tools like kubectl that connect to the private endpoint
- Any VM that uses the same subnet that the private cluster uses can also access the private endpoint
- Public endpoint is the external IP address of the master
- By default, tools like kubectl communicate with the master on its public endpoint
- Access to the master endpoints can be controlled using master authorized networks, or users can disable access to the public endpoint
- Disabling public endpoint access is the most secure option as it prevents all internet access to the master
- This is a good choice where the on-premises network has been connected to Google Cloud using Cloud Interconnect and Cloud VPN
- Cloud Interconnect and Cloud VPN connect a company network to the VPC without the traffic having to traverse the public internet
- With public endpoint access disabled, master authorized networks must be configured for the private endpoint
- Without master authorized networks, users can only connect to the private endpoint from cluster nodes or VMs in the same subnet as the cluster
- Master authorized networks must be RFC 1918 IP addresses
- Even if a customer disables access to the public endpoint, Google can use the master's public endpoint for cluster management purposes, such as scheduled maintenance and automatic master upgrades
- Another option is to enable public endpoint access together with master authorized networks
- Using private clusters with master authorized networks enabled provides restricted access to the master from a defined set of source IP addresses
- Master authorized networks are a good choice where there is no existing VPN infrastructure, or where remote users or branch offices connect over the public internet rather than through a VPN
- Public endpoint access enabled, master authorized networks disabled is the default and least restrictive option
- Since master authorized networks are not enabled, the cluster can be administered from any source IP address as long as the user is authenticated
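
Putting the private-cluster and endpoint options above together, a creation command might look like the sketch below. The cluster name, region, master CIDR, and authorized range are placeholder assumptions; disabling the public endpoint is shown only as a commented-out alternative.

```python
# Minimal sketch (assumed names/values): create a private, VPC-native cluster with
# master authorized networks restricting access to the master's public endpoint.
import shlex

cmd = [
    "gcloud", "container", "clusters", "create", "demo-private",  # assumed cluster name
    "--region", "us-central1",
    "--enable-ip-alias",                          # private clusters are VPC-native
    "--enable-private-nodes",                     # nodes get internal RFC 1918 IPs only
    "--master-ipv4-cidr", "172.16.0.0/28",        # range for the master's peered VPC network
    "--enable-master-authorized-networks",
    "--master-authorized-networks", "203.0.113.0/24",  # allowed source range (must be RFC 1918
                                                       # if the public endpoint is disabled)
    # "--enable-private-endpoint",                # optional: disable the public endpoint entirely
]
print(shlex.join(cmd))
```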
-
Scalability
- In a Kubernetes cluster, scalability refers to the ability of the cluster to grow while staying within its service-level objectives (SLOs)
- Kubernetes also has its own set of SLOs
- Regional clusters are better suited for high availability
- Regional clusters have multiple master nodes across multiple compute zones in a region
- Zonal clusters have one master node in a single compute zone
- If a zonal cluster is upgraded, the single master VM experiences downtime during which the Kubernetes API is not available until the upgrade is complete
- In regional clusters, the control plane remains available during cluster maintenance like rotating IPs, upgrading master VMs, or resizing clusters or node pools
- When upgrading a regional cluster, two out of three master VMs are always running during the rolling upgrade, so the Kubernetes API is still available
- A single-zone outage won't cause any downtime in the regional control plane
- Changes to the cluster's configuration take longer because they must propagate across all masters in a regional cluster instead of the single control plane in zonal clusters
- In regional clusters, if VMs cannot be created in one of the zones, whether from a lack of capacity or another transient problem, the cluster cannot be created or upgraded
- Use zonal clusters to create or upgrade clusters rapidly when availability is less of a concern
- Use regional clusters when availability is more important than flexibility
- Carefully select the cluster type when creating a cluster, because it cannot be changed after the cluster is created
- Migrating production traffic between clusters is possible but difficult at scale
- Use regional clusters for production workloads, as they offer higher availability than zonal clusters
- To achieve high availability, the Kubernetes control plane and its nodes need to be spread across different zones
- GKE offers zonal and multi-zonal node pools
- To deploy a highly available application, distribute workloads across multiple compute zones in a region by using multi-zonal node pools which distribute nodes uniformly across zones
- When using cluster autoscaler with multi-zonal node pools, nodes are not guaranteed to be spread equally among zones
- If all nodes are in the same zone, Pods can’t be scheduled if that zone becomes unreachable
- GPUs are available only in specific zones. It may not be possible to get them in all zones in the region
- Round-trip latency between locations within a single region is expected to stay below 1ms on the 95th percentile
- The difference in latency between inter-zonal and intra-zonal traffic should be negligible
- The price of egress traffic between zones in the same region is available on the Compute Engine pricing page
- Kubernetes workloads require networking, compute, and storage
- Enough CPU and memory is required to run Pods
- There are more parameters of underlying infrastructure that can influence performance and scalability of a GKE cluster
- GKE offers two network modes: the older routes-based mode and the newer VPC-native mode
- With routes-based cluster, each time a node is added, a custom route is added to the routing table in the VPC network
- GKE clusters with routes-based networking cannot scale above 2,000 nodes
- In the VPC-native cluster mode, the VPC network has a secondary range for all Pod IP addresses
- Each node is assigned a slice of the secondary range for its own Pod IP addresses
- This allows the VPC network to natively understand how to route traffic to Pods without relying on custom routes
- VPC-native clusters are the networking default and are recommended to accommodate large workloads
- They scale to a larger number of nodes and allow better interaction with other Google Cloud products
- A VPC-native cluster uses the primary IP range for nodes and two secondary IP ranges for Pods and Services
- The maximum number of nodes in VPC-native clusters can be limited by available IP addresses
- The number of nodes is determined by both the primary range (node subnet) and the secondary range (Pod subnet)
- The maximum number of Pods and Services is determined by the size of the cluster's secondary ranges, Pod subnet and Service subnet, respectively
- The Pod secondary range defaults to /14 (262,144 IP addresses)
- Each node has /24 range assigned for its Pods (256 IP addresses for its Pods)
- The node subnet defaults to /20 (4,092 usable IP addresses)
- There must be enough addresses in both ranges (node and Pod) to provision a new node
- With these defaults, only 1,024 nodes can be created, because the /14 Pod range provides only 1,024 of the per-node /24 blocks (see the sketch after this list)
- By default there can be a maximum of 110 Pods per node, and each node in the cluster is allocated a /24 range for its Pods
- This results in 256 Pod IPs per node
- By having approximately twice as many available IP addresses as possible Pods, Kubernetes is able to mitigate IP address reuse as Pods are added to and removed from a node
- For applications that schedule a smaller number of Pods per node, this is wasteful
- The Flexible Pod CIDR feature allows per-node CIDR block size for Pods to be configured and use fewer IP addresses
- By default, the secondary range for Services is set to /20 (4,096 IP addresses), limiting the number of Services in the cluster to 4096
- Secondary ranges cannot be changed after creation
- When a cluster is created, ensure the ranges chosen are large enough to accommodate anticipated growth
- GKE nodes are regular Google Cloud virtual machines
- Parameters such as the number of cores or the size of disks can influence how GKE clusters perform
- In Google Cloud, the number of cores allocated to the instance determines its network capacity.
- In Google Cloud, the size of persistent disks determines the IOPS and throughput of the disk
- GKE typically uses Persistent Disks as boot disks and to back Kubernetes' Persistent Volumes
- Increasing disk size increases both IOPS and throughput, up to certain limits
- Each persistent disk write operation contributes to the virtual machine instance's cumulative network egress cap
- IOPS performance of disks, especially SSDs, depends on the number of vCPUs in the instance in addition to disk size
- Lower core VMs have lower write IOPS limits due to network egress limitations on write throughput
- If a virtual machine instance has insufficient CPUs, the application won't be able to get close to the IOPS limit
- Use larger and fewer disks to achieve higher IOPS and throughput
- Workloads that require high capacity or large numbers of disks need to consider the limits of how many PDs can be attached to a single VM
- For regular VMs, that limit is 128 disks with a total size of 64 TB, while shared-core VMs have a limit of 16 PDs with a total size of 3 TB
- Google Cloud enforces this limit, not Kubernetes
- Kubernetes, like any other system, has limits which need to be taken into account when designing applications and planning their growth
- Kubernetes supports up to 5000 nodes in a single cluster
- The number of nodes is only one of many dimensions on which Kubernetes can scale
- Other dimensions include the total number of Pods, Services, or backends behind a Service
- Do not stretch more than one dimension at a time
- This can cause problems even in smaller clusters
- For example, trying to schedule 100 Pods per node in a 5k node cluster likely won't succeed because the number of Pods, the number of Pods per node, and the number of nodes would be stretched too far
- Extending Kubernetes clusters with webhooks or CRDs is common and can constrain the ability to scale the cluster
- Most limits are not enforced, so users can go above them
- Exceeding limits won't make the cluster instantly unusable
- Performance degrades (sometimes shown by failing SLOs) before failure
- Some of the limits are given for the largest possible cluster
- In smaller clusters, limits are proportionally lower
- The performance of iptables degrades if there are too many services or if there is a high number of backends behind a Service
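
The default sizing figures quoted above can be checked with a few lines of arithmetic. The prefix lengths (/14 Pod range, /24 per node, /20 Service range) come from these notes; the base addresses used below are arbitrary placeholders.

```python
# Minimal sketch: verify the default VPC-native sizing figures using the standard library.
import ipaddress

pod_range = ipaddress.ip_network("10.0.0.0/14")   # default Pod secondary range (placeholder base)
svc_range = ipaddress.ip_network("10.4.0.0/20")   # default Service secondary range (placeholder base)
per_node_prefix = 24                              # each node gets a /24 for its Pods

pod_ips = pod_range.num_addresses                 # 262,144 Pod IP addresses
pod_ips_per_node = 2 ** (32 - per_node_prefix)    # 256 Pod IPs per node (max 110 Pods)
max_nodes_by_pod_range = pod_ips // pod_ips_per_node   # 1,024 nodes
max_services = svc_range.num_addresses            # 4,096 Services

print(f"Pod IPs: {pod_ips}, per node: {pod_ips_per_node}")
print(f"Max nodes limited by Pod range: {max_nodes_by_pod_range}")
print(f"Max Services: {max_services}")
```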
-
Dashboard
- Cloud Console offers useful dashboards for a project's GKE clusters and their resources
- Dashboards can be used to view, inspect, manage, and delete resources in clusters
- Deployments can be created from the Workloads dashboard
- In conjunction with the gcloud and kubectl command-line tools, the GKE dashboards are helpful for DevOps workflows and troubleshooting issues (a command-line sketch follows this list)
- Dashboards can be used to get information about all resources in every cluster quickly and easily.
- Kubernetes clusters displays each cluster's name, compute zone, cluster size, total cores, total memory, node version, outstanding notifications, and labels
- Workloads displays workloads (Deployments, StatefulSets, DaemonSets, Jobs, and Pods) deployed to clusters in the current project
- Information includes each workload's name, status, type, number of running and total desired Pods, namespace, and cluster
- A YAML-based text editor is available for inspecting and editing deployed resources, and a Deploy mechanism for deploying stateless applications
- Services displays the project's Service and Ingress resources, with each resource's name, status, type, endpoints, number of running and total desired Pods, namespace, and cluster
- Configuration displays the project's Secret and ConfigMap resources
- Storage displays PersistentVolumeClaim and StorageClass resources associated with clusters
- Object Browser lists all of the objects running in every cluster in a given project
- Kubernetes clusters shows every Kubernetes cluster created in a project
- Dashboard can be used to inspect details about clusters, make changes to their settings, connect to them using Cloud Shell, and delete them
- Clusters and node versions can be upgraded from this dashboard
- When a new upgrade is available, the dashboard displays a notification for the relevant cluster
- Selecting a cluster to view its details displays the current settings for the cluster and its node pools
- Storage displays the persistent volumes and storage classes provisioned for the cluster's nodes
- Nodes lists all of the cluster's nodes and their requested CPU, memory, and storage resources
- Use the Workloads dashboard to inspect, manage, edit, and delete workloads deployed to clusters
- Deploy stateless applications using the menu's Deploy mechanism
- Selecting a workload displays the current settings for the workload, including its usage metrics, labels and selectors, update strategy, Pod specification, and active revisions
- Managed pods lists the Pods that are managed by the workload.
- Select a Pod from the list to view that Pod's details, events, logs, and YAML configuration file
- Revision history lists each revision of the workload, including the active revision
- Events lists human-readable messages for each event affecting the workload
- YAML displays the workload's live configuration
- Use the YAML-based text editor provided in this menu to make changes to the workload
- Copy and download the configuration from this menu
- Menus might appear differently depending on the type of workload you're viewing
- Use the dashboard's filter search to list only specific workloads
- By default, Kubernetes system objects are filtered out
- Some workloads have an Actions menu with convenient buttons for performing common operations
- Autoscale, update, and scale a Deployment from its Actions menu
- Services displays the load-balancing Service and traffic-routing Ingress objects associated with your project
- It also displays the default Kubernetes system objects associated with networking, such as the Kubernetes API server, HTTP backend, and DNS
- Select a resource from the list to display information about the resource, including its usage metrics, IP, and ports
- Events lists human-readable messages for each event affecting the resource
- YAML displays the resource's live configuration
- Use the YAML-based text editor provided in this menu to make changes to the resource
- Copy and download the configuration from this menu
- Configuration displays configuration files, Secrets, ConfigMaps, environment variables, and other configuration resources associated with the project
- It also displays Kubernetes system-level configuration resources, such as tokens used by service accounts
- Select a resource from this dashboard to view a detailed page about that resource
- Sensitive data stored in Secrets is not displayed in the console
- Storage lists the storage resources provisioned for your clusters
- PersistentVolumeClaim and StorageClass resources to be used by a cluster's nodes appear in this dashboard
- Persistent volume claims list all PersistentVolumeClaim resources in the clusters
- PersistentVolumeClaims are used with StatefulSet workloads to have those workloads claim storage space on a persistent disk in the cluster
- Storage classes list all StorageClass resources associated with nodes
- StorageClasses are used as "blueprints" for using space on a disk
- The disk's provisioner, parameters (such as disk type and compute zone), and reclaim policy are specified
- StorageClass resources can also be used for dynamic volume provisioning to create storage volumes on demand
- Select a resource from these dashboards to view a detailed page for that resource
- Object Browser lists all of the objects running in all of the clusters in the current project
- List and filter resources by specific API groups and Resource Kinds
- Preview YAML file for any resource by navigating to its details page
- The Kubernetes Dashboard add-on is disabled by default on GKE
- Cloud Console provides dashboards to manage, troubleshoot, and monitor GKE clusters, workloads, and applications
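
For completeness, much of what the dashboards present can also be pulled from the command line. The sketch below shells out to gcloud and kubectl (both assumed to be installed and authenticated); the cluster name and zone are placeholder assumptions.

```python
# Minimal sketch (assumed names/values): command-line counterparts of the GKE dashboards.
import subprocess

def run(args):
    """Run a CLI command and return its stdout."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout

print(run(["gcloud", "container", "clusters", "list"]))    # Kubernetes clusters dashboard
run(["gcloud", "container", "clusters", "get-credentials",
     "demo-cluster", "--zone", "us-central1-a"])            # fetch kubectl credentials (assumed cluster)
print(run(["kubectl", "get", "deployments,statefulsets,daemonsets,jobs",
           "--all-namespaces"]))                            # Workloads dashboard
print(run(["kubectl", "get", "services,ingresses", "--all-namespaces"]))       # Services dashboard
print(run(["kubectl", "get", "persistentvolumeclaims", "--all-namespaces"]))   # Storage dashboard
print(run(["kubectl", "get", "storageclasses"]))            # Storage classes (cluster-scoped)
```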