-
Overview
- Multi-tenancy in Google Kubernetes Engine (GKE) refers to one or more clusters that are shared between tenants
- In Kubernetes, a tenant can be defined as either a team responsible for developing and operating one or more workloads, or a set of related workloads, whether operated by one or more teams
- A tenant can be a single workload, such as a Deployment
- Cluster multi-tenancy is often implemented to reduce costs or to consistently apply administration policies across tenants
- Incorrectly configuring a GKE cluster or its associated GKE resources can result in unachieved cost savings, incorrect policy application, or destructive interactions between different tenants' workloads
- Each tenant is a single team developing a single workload
- The platform team owns the clusters and defines the amount of resources each tenant team can use; each tenant can request more
- Each tenant team should be able to deploy their application through the Kubernetes API without having to communicate with the platform team
- A tenant should not be able to affect other tenants in the shared cluster, except through explicit design decisions such as API calls or shared data sources
-
Multi-tenancy
- Cluster multi-tenancy is an alternative to managing many single-tenant clusters
- A multi-tenant cluster is shared by multiple users and/or workloads, which are referred to as "tenants"
- This includes clusters shared by different teams within a single organization and clusters shared by per-customer instances of a SaaS application
- Operators of multi-tenant clusters must isolate tenants from each other to minimize the damage a malicious tenant can do to other tenants
- Cluster resources must be fairly allocated among tenants
- When planning a multi-tenant architecture, consider the layers of resource isolation in Kubernetes: cluster, namespace, node, pod, and container
- Consider the security implications of sharing different types of resources among tenants
- There might be a need to prevent certain workloads from being colocated
- For example, it may be inadvisable to run untrusted code from outside the organization on the same node as sensitive workloads
- Although Kubernetes cannot guarantee perfectly secure isolation between tenants, it offers features that may be sufficient for specific use cases
- Kubernetes allows users to separate each tenant and their Kubernetes resources into their own namespaces
- Policies can be used to enforce tenant isolation
- Policies are usually scoped by namespace and can be used to restrict API access, constrain resource usage, and restrict container privileges
- The tenants of a multi-tenant cluster share extensions, controllers, add-ons, and custom resource definitions
- Cluster operations, security, and auditing are centralized in the cluster control plane
- Operating a multi-tenant cluster reduces management overhead and resource fragmentation
- With a multi-tenant cluster, there is no need to wait for cluster creation to create new tenants
- In an enterprise environment, the tenants of a cluster are distinct teams within the organization
- Typically, each tenant has a corresponding namespace
- Alternative models of multi-tenancy with a tenant per cluster, or a tenant per Google Cloud project, are harder to manage
- Kubernetes network policy can be used to require network traffic between namespaces to be explicitly whitelisted
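- A minimal sketch of such a policy for a hypothetical namespace tenant-a; it allows ingress only from Pods in the same namespace, so cross-namespace traffic must be explicitly allowed by additional rules:
```yaml
# Hypothetical example: allow ingress to Pods in tenant-a only from Pods
# in the same namespace; traffic from other namespaces matches no rule
# and is therefore dropped.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: tenant-a
spec:
  podSelector: {}          # applies to every Pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}      # any Pod in the same namespace
```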
- Cluster administrator role is for administrators of the entire cluster, who manage all tenants
- Cluster administrators can create, read, update, and delete any policy object
- Cluster administrators can create namespaces and assign them to namespace administrators.
- Namespace administrator role is for administrators of specific, single tenants
- A namespace administrator can manage the users in their namespace
- Developer role can create, read, update, and delete namespaced non-policy objects like Pods, Jobs, and Ingresses
- Developers only have privileges in the namespaces they have access to
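- A sketch of such a developer role, scoped to a hypothetical tenant namespace tenant-a (the resource list is illustrative):
```yaml
# Hypothetical namespaced role for tenant developers: full access to common
# non-policy workload objects, no access to policy objects such as
# ResourceQuotas or NetworkPolicies.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-developer
  namespace: tenant-a
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "secrets"]
  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets"]
  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
- apiGroups: ["batch"]
  resources: ["jobs", "cronjobs"]
  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
```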
- The tenants of a SaaS provider's cluster are the per-customer instances of the application, and the SaaS provider's control plane
- To take advantage of namespace-scoped policies, application instances should be organized into their own namespaces
-
Resource organization
- For enterprise organizations deploying multi-tenant clusters, configuration is needed to manage the additional complexity
- Project configuration is needed to isolate administrative concerns as well as to map the organization structure to cloud identities and accounts
- Controls are needed to manage additional Google Cloud resources, such as databases, logging and monitoring, storage, and networking
- Folders and projects can be used to enforce separation of concerns
- Folders allow teams to set policies that cascade across multiple projects
- Projects can be used to segregate production vs. staging environments and teams from each other
- Control access to Google Cloud resources through Cloud Identity and Access Management (Cloud IAM) policies
- Start by identifying the groups needed for the organization and their scope of operations, then assign the appropriate Cloud IAM role to the group
- Use Google Groups to efficiently assign and manage Cloud IAM for users
- If resources cannot be supported by a single cluster, create more clusters
- To ease deployments across multiple environments that are hosted in different clusters, standardize the namespace naming convention
- Avoid tying the environment name to the namespace name; instead, use the same namespace name across environments
- Using the same name avoids having to change the config files across environments
- Create a tenant-specific Google service account for each distinct workload in a tenant namespace
- This ensures that tenants can manage service accounts for the workloads that they own/deploy in their respective namespaces
- The Kubernetes service account for each namespace can be mapped to one Google service account by using Workload Identity
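- A sketch of this mapping, assuming a hypothetical Kubernetes service account tenant-a-app and Google service account tenant-a-gsa@my-project.iam.gserviceaccount.com; the corresponding roles/iam.workloadIdentityUser binding on the Google service account is assumed to exist:
```yaml
# Hypothetical Kubernetes service account in the tenant namespace, mapped to
# a tenant-owned Google service account via Workload Identity.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tenant-a-app
  namespace: tenant-a
  annotations:
    iam.gke.io/gcp-service-account: tenant-a-gsa@my-project.iam.gserviceaccount.com
```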
- To ensure all tenants that share a cluster have fair access to the cluster resources, enforce resource quotas
- Create a resource quota for each namespace based on the number of Pods deployed by each tenant, and the amount of memory and CPU required by Pods
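- A sketch of such a quota, with illustrative limits for a hypothetical namespace tenant-a:
```yaml
# Hypothetical per-tenant quota: caps the Pod count and the total CPU/memory
# that Pods in the namespace may request or consume.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    pods: "100"
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
```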
-
Networking
- To maintain centralized control over network resources, such as subnets, routes, and firewalls, use Shared VPC networks
- Resources in a Shared VPC can communicate with each other securely and efficiently across project boundaries using internal IPs
- Each Shared VPC network is defined and owned by a centralized host project, and can be used by one or more service projects
- Using Shared VPC and Cloud IAM, users can separate network administration from project administration
- Separating network administration from project administration helps implement the principle of least privilege
- When setting up a Shared VPC, configure the subnets and their secondary IP ranges in the VPC
- To determine the subnet size, consider the expected number of tenants, the number of Pods and Services they are expected to run, and the maximum and average Pod size
- Calculating the total cluster capacity needed requires an understanding of the desired instance size and total node count
- With the total number of nodes, the total IP space consumed can be calculated to determine the desired subnet size
- The Node, Pod, and Services IP ranges must all be unique.
- A subnet's primary and secondary IP address ranges cannot overlap
- The maximum number of Pods and Services for a given GKE cluster is limited by the size of the cluster's secondary ranges
- The maximum number of nodes in the cluster is limited by the size of the cluster's subnet's primary IP address range and the cluster's Pod address range
- For flexibility and control over IP address management, configure the maximum number of Pods that can run on a node
- By reducing the maximum number of Pods per node, the CIDR range allocated per node is reduced, requiring fewer IP addresses; for example, GKE reserves a /24 Pod range per node at the default maximum of 110 Pods, but only a /26 when the maximum is lowered to 32
- To calculate subnets for clusters, use the GKE IPAM calculator open source tool
- IP Address Management (IPAM) enables efficient use of IP space/subnets and avoids having overlaps in ranges
- Tenants that require further isolation for resources that run outside the shared clusters may use their own VPC, which is peered to the Shared VPC
- This provides security at the cost of increased complexity and numerous other limitations
-
Security
- Create one cluster per project to reduce the risk of project-level configurations adversely affecting many clusters ("blast radius"), and to provide separation for quota and billing
- Make the production cluster private to disable access to the nodes and manage access to the control plane
- Use private clusters for development and staging environments
- Ensure the control plane for the cluster is regional to provide high availability for multi-tenancy; any disruptions to the control plane will impact tenants
- Create an HTTP(S) load balancer to allow a single ingress per cluster, where each tenant's Services are registered with the cluster's Ingress resource
- Create a Kubernetes Ingress resource to define how traffic reaches Services and how the traffic is routed to a tenant's application
- By registering Services with the Ingress resource, the Services' naming convention becomes consistent, accessible via a single ingress
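- A sketch of host-based routing through a single Ingress on GKE (host names and Service names are illustrative):
```yaml
# Hypothetical Ingress using host-based routing to reach tenant Services.
# Note: plain Kubernetes Ingress backends must live in the same namespace
# as the Ingress object itself.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shared-ingress
  annotations:
    kubernetes.io/ingress.class: gce   # external HTTP(S) load balancer on GKE
spec:
  rules:
  - host: tenant-a.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: tenant-a-frontend
            port:
              number: 80
  - host: tenant-b.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: tenant-b-frontend
            port:
              number: 80
```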
- To control network communication between Pods in each of the cluster's namespaces, create network policies based on tenants' requirements
- As an initial recommendation, block traffic between namespaces that host different tenants' applications
- The cluster administrator can apply a default deny-all network policy that blocks all ingress traffic, so that Pods from one namespace cannot accidentally send traffic to Services or databases in other namespaces (a sketch follows)
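```yaml
# Hypothetical baseline policy for one tenant namespace: selects every Pod
# and, because it specifies no ingress rules, denies all incoming traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: tenant-a
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```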
- Clusters that run untrusted workloads are more exposed to security vulnerabilities than other clusters
- Use GKE Sandbox to harden the isolation boundaries between workloads for multi-tenant environments
- For security management, Google recommends starting with GKE Sandbox and then using Pod security policies to fill in any gaps
- GKE Sandbox is based on gVisor, an open source container sandboxing project, and provides additional isolation for multi-tenant workloads
- GKE Sandbox adds an extra layer between containers and the host OS
- Container runtimes often run as a privileged user on the node and have access to most system calls into the host kernel
- In a multi-tenant cluster, one malicious tenant can gain access to the host kernel and to other tenants' data
- GKE Sandbox mitigates these threats by reducing the need for containers to interact with the host, shrinking the attack surface of the host, and restricting the movement of malicious actors
- GKE Sandbox provides a user-space kernel, written in Go, that handles system calls and limits interaction with the host kernel
- Each Pod has its own isolated user-space kernel
- The user-space kernel also runs inside namespaces, with seccomp filtering applied to system calls
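- A sketch of opting a workload into GKE Sandbox, assuming a node pool with sandboxing enabled (names and image are illustrative):
```yaml
# Hypothetical Deployment that requests the gVisor sandbox; GKE schedules
# these Pods onto nodes in a sandbox-enabled node pool.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: untrusted-workload
  namespace: tenant-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: untrusted-workload
  template:
    metadata:
      labels:
        app: untrusted-workload
    spec:
      runtimeClassName: gvisor          # run this Pod in GKE Sandbox
      containers:
      - name: app
        image: us-docker.pkg.dev/my-project/my-repo/app:latest   # illustrative image
```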
- To restrict which Pods can run in a cluster, create Policy Controller constraints that specify the conditions Pods must meet
- Authorize the use of policies for a Pod by binding the Pod's service account to a role that has access to use those policies
- Google recommends defining the most restrictive policy bound to system:authenticated and more permissive policies bound as needed for exceptions
- To ensure that no child process of a container can gain more privileges than its parent, set the allowPrivilegeEscalation parameter to false
- To disallow privilege escalation outside of the container, disable access to the host namespaces (hostNetwork, hostIPC, and hostPID)
- This also blocks snooping on network activity of other Pods on the same node
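- A sketch of a Pod spec that applies these settings (names and image are illustrative):
```yaml
# Hypothetical Pod spec: no privilege escalation for child processes and
# no access to the host's network, IPC, or PID namespaces.
apiVersion: v1
kind: Pod
metadata:
  name: restricted-app
  namespace: tenant-a
spec:
  hostNetwork: false
  hostIPC: false
  hostPID: false
  containers:
  - name: app
    image: us-docker.pkg.dev/my-project/my-repo/app:latest   # illustrative image
    securityContext:
      allowPrivilegeEscalation: false
```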
- To securely grant workloads access to Google Cloud services, enable Workload Identity in the cluster
- Workload Identity helps administrators manage Kubernetes service accounts that Kubernetes workloads use to access Google Cloud services
- When a user creates a cluster with Workload Identity enabled, an Identity Namespace is established for the project that the cluster is housed in
- To protect the control plane, restrict access to authorized networks
- In GKE, when master authorized networks is enabled, users can allowlist a limited set of CIDR ranges so that only IP addresses in those ranges can access the control plane
- GKE uses Transport Layer Security (TLS) and authentication to provide secure access to the cluster master endpoint from the public internet
- By using authorized networks, users can further restrict access to specified sets of IP addresses
- To host a tenant's non-cluster resources, create a service project for each tenant
- These service projects contain logical resources specific to the tenant applications (for example, logs, monitoring, storage buckets, service accounts, etc.)
- All tenant service projects are connected to the Shared VPC in the tenant host project
- Define finer-grained access to cluster resources for tenants by using Kubernetes RBAC
- On top of the read-only access initially granted with Cloud IAM to tenant groups, define namespace-wide Kubernetes RBAC roles and bindings for each tenant group
- In addition to the RBAC roles and bindings that grant Google Workspace or Cloud Identity groups permissions inside their namespace, tenant admins often require the ability to manage the users in each of those groups
- To efficiently manage tenant permissions in a cluster, bind RBAC permissions to Google Groups
- The membership of those groups is maintained by Google Workspace administrators, so cluster administrators do not need detailed information about users
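- A sketch of such a binding, granting a hypothetical Google Group the tenant-developer role from the earlier sketch inside its namespace (Google Groups for RBAC is assumed to be enabled on the cluster):
```yaml
# Hypothetical RoleBinding: grants members of a tenant Google Group the
# tenant-developer Role inside their namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tenant-a-developers
  namespace: tenant-a
subjects:
- kind: Group
  name: tenant-a-developers@example.com   # Google Group, illustrative
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tenant-developer
  apiGroup: rbac.authorization.k8s.io
```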
- To provide a logical isolation between tenants that are on the same cluster, implement namespaces
- As part of the Kubernetes RBAC process, the cluster admin creates namespaces for each tenant group
- The Tenant admin manages users (tenant developers) within their respective tenant namespace
- Tenant developers are then able to use cluster and tenant specific resources to deploy their applications
-
Availability
- There are cost implications with running regional clusters
- Ensure the nodes in the cluster span at least three zones to achieve zonal reliability
- There are cost implications for egress between zones in the same region
- To accommodate the demands of tenants, automatically scale nodes in the cluster by enabling autoscaling
- Autoscaling helps systems appear responsive and healthy when heavy workloads are deployed by various tenants in their namespaces, or to respond to zonal outages
- When enabling autoscaling, specify the minimum and maximum number of nodes in a cluster based on the expected workload sizes
- By specifying the maximum number of nodes, users can ensure there is enough space for all Pods in the cluster, regardless of the namespace they run in
- Cluster autoscaling rescales node pools based on the min/max boundary, helping to reduce operational costs when the system load decreases
- Cluster autoscaling helps avoid Pods going into a pending state when there aren't enough available cluster resources
- To determine the maximum number of nodes, identify the maximum total amount of CPU and memory that each tenant requires
- Using the maximum number of nodes, users can choose instance sizes and counts, taking into consideration the IP subnet space made available to the cluster
- Use Pod autoscaling to automatically scale Pods based on resource demands
- Vertical Pod Autoscaler (VPA) scales the CPU/memory allocated to existing Pods, while Horizontal Pod Autoscaler (HPA) scales the number of Pod replicas based on CPU/memory utilization or custom metrics
- Unlike VPA, HPA does not modify the workload's configured requests; it scales only the number of replicas
- Do not use VPA with HPA on the same Pods unless scaling is based on different metrics
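- A sketch of an HPA for a hypothetical tenant Deployment (names and thresholds are illustrative):
```yaml
# Hypothetical HPA: scales the Deployment between 2 and 10 replicas to keep
# average CPU utilization around 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tenant-a-frontend
  namespace: tenant-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tenant-a-frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```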
- The sizing of the cluster is dependent on the type of workloads
- If workloads have greater density, the cost efficiency is higher but there is also a greater chance for resource contention
- The minimum size of a cluster is defined by the number of zones it spans: one node for a zonal cluster and three nodes for a regional cluster
- To reduce downtimes during cluster/node upgrades and maintenance, schedule maintenance windows to occur during off-peak hours
- During upgrades, there can be temporary disruptions when workloads are moved to recreate nodes
- To ensure minimal impact of such disruptions, schedule upgrades for off-peak hours and design application deployments to handle partial disruptions seamlessly, if possible
-
Metering
- To obtain cost breakdowns on individual namespaces and labels in a cluster, enable GKE usage metering
- GKE usage metering tracks information about resource requests and resource usage of a cluster's workloads, which can be further broken down by namespaces and labels
- With GKE usage metering, users can approximate the cost breakdown for departments/teams that are sharing a cluster.
- GKE usage metering enables users to understand the usage patterns of individual applications (or even components of a single application)
- GKE usage metering helps cluster admins triage spikes in usage and provides data for better capacity planning and budgeting
- When GKE usage metering is enabled on the multi-tenant cluster, resource usage records are written to a BigQuery table
- Tenant-specific metrics can be exported to BigQuery datasets in the corresponding tenant project, which auditors can then analyze to determine cost breakdowns
- Auditors can visualize GKE usage metering data by creating dashboards with plug-and-play Google Data Studio templates
- Tenants can be provided with logs data specific to their project workloads by using Stackdriver Kubernetes Engine Monitoring
- Cloud Monitoring manages both the Monitoring and Logging services together and provides a dashboard customized for GKE clusters
- To create tenant-specific logs, the cluster admin creates a sink to export log entries to BigQuery datasets, filtered by tenant namespace
- The exported data in BigQuery can then be accessed by the tenants
- To provide tenant-specific monitoring, the cluster admin can use a dedicated namespace that contains a Prometheus to Stackdriver adapter (prometheus-to-sd) with a per-namespace configuration
- This configuration ensures tenants can only monitor their own metrics in their projects
- However, the downside to this design is the extra cost of managing Prometheus deployment(s)
- Alternatively, teams can accept shared tenancy within the Monitoring environment and allow tenants to have visibility into all metrics in the projects
- A single Grafana instance can be deployed per tenant, which communicates with the shared Monitoring environment.
- Configure the Grafana instance to only view the metrics from a particular namespace
- The downside to this option is the cost and overhead of managing these additional deployments of Grafana
-
Implementation
-
Organizational setup
- Define resource hierarchy
- Create folders based on organizational hierarchy and environmental needs
- Create host and service projects for clusters and tenants
-
Identity and access management
- Identify and create a set of Google Groups for organization
- Assign users and Cloud IAM policies to the groups
- Refine tenant access with namespace-scoped roles and role bindings
- Grant tenant admin access to manage tenant users
-
Networking
- Create per-environment Shared VPC networks for the tenant and cluster networks
-
High availability and reliability
- Create one cluster per project to reduce the "blast radius"
- Create the cluster as a private cluster
- Ensure the control plane for the cluster is regional
- Span nodes for the cluster over at least three zones
- Enable cluster autoscaling and Pod autoscaling
- Specify maintenance windows to occur during off-peak hours
- Create an HTTP(S) load balancer to allow a single ingress per multi-tenant cluster
-
Security
- Create namespaces to provide isolation between tenants that are on the same cluster
- Create network policies to restrict communication between Pods
- Mitigate threats by running workloads on GKE Sandbox
- Create Pod Security Policies to constrain how Pods operate on clusters
- Enable Workload Identity to manage Kubernetes service accounts and access
- Enable master authorized networks to restrict access to the control plane
-
Logging and monitoring
- Enforce resource quotas for each namespace
- Track usage metrics with GKE usage metering
- Set up tenant-specific logging with Kubernetes Engine Monitoring
- Set up tenant-specific monitoring