-
Overview
- Protecting workloads in Google Kubernetes Engine involves many layers of the stack, including the contents of the container image, the container runtime, the cluster network, and access to the cluster API server
- Take a layered approach to protecting clusters and workloads
- Apply the principle of least privilege to the access granted to users and applications
- Each layer involves tradeoffs between flexibility and security that the organization must weigh in order to deploy and maintain its workloads securely
- User accounts are accounts that are known to Kubernetes, but are not managed by Kubernetes
- Service accounts are accounts that are created and managed by Kubernetes, but can only be used by Kubernetes-created entities, such as pods
- In a Google Kubernetes Engine cluster, Kubernetes user accounts are managed by Google Cloud, and may be Google Accounts or Google Cloud service accounts
- Once authenticated, authorize these identities to create, read, update or delete Kubernetes resources
- Kubernetes service accounts and Google Cloud service accounts are different entities
- Kubernetes service accounts are part of the cluster in which they are defined and are typically used within that cluster
- Google Cloud service accounts can be granted permissions both within clusters and to the Google Cloud project itself, as well as to any Google Cloud resource, using Cloud Identity and Access Management (Cloud IAM)
- Google Cloud service accounts are more powerful than Kubernetes service accounts
- To follow the security principle of least privilege, use Google Cloud service accounts only when their capabilities are required
- To configure more granular access to Kubernetes resources at the cluster level or within Kubernetes namespaces, use Role-Based Access Control (RBAC)
- RBAC allows users to create detailed policies that define which operations and resources users and service accounts are allowed to access
- With RBAC, users can control access for Google Accounts, Google Cloud service accounts, and Kubernetes service accounts
- Use Kubernetes RBAC and Cloud IAM as the sources of truth
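- A minimal sketch of a namespaced RBAC Role and RoleBinding; the namespace, resource names, and the Google Account email are illustrative assumptions, not values from these notes
```yaml
# Grants read-only access to Pods in an example "prod" namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: prod
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# Binds the Role to a Google Account; a Google Cloud or Kubernetes
# service account could be bound in the same way
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: prod
subjects:
- kind: User
  name: alice@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```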
- In Google Kubernetes Engine, the Kubernetes master components are managed and maintained by Google
- The master components host the software that runs the Kubernetes control plane, including the API server, scheduler, controller manager and the etcd database where Kubernetes configuration is persisted
- By default, the master components use a public IP address
- Protect the Kubernetes API server by using master authorized networks and private clusters, which allow users to assign a private IP address to the master and disable access via the public IP address
- Handle cluster authentication in Google Kubernetes Engine by using Cloud IAM as the identity provider
- For enhanced authentication security, disable Basic Authentication by setting an empty username and password for the MasterAuth configuration
- Disable the client certificate, which ensures that there is one less key to think about when locking down access to the cluster
- Another way to help secure a Kubernetes master is to perform credential rotation on a regular basis
- When credential rotation is initiated, the TLS certificates and cluster certificate authority are rotated
- This process is automated by Google Kubernetes Engine and also ensures that the master IP address rotates
- Google Kubernetes Engine deploys workloads on Compute Engine instances running in the Google Cloud project
- These instances are attached to the Google Kubernetes Engine cluster as nodes
- By default, Google Kubernetes Engine nodes use Google's Container-Optimized OS as the operating system on which to run Kubernetes and its components
- Container-Optimized OS implements a locked-down firewall
- Container-Optimized OS implements a read-only filesystem where possible
- Container-Optimized OS implements limited user accounts and disables root login
- A best practice is to patch the OS on a regular basis
- From time to time, security issues in the container runtime, Kubernetes itself, or the node operating system might require an upgrade to the nodes more urgently
- When a node is upgraded, the node's software components are upgraded to their latest versions
- Users can manually upgrade the nodes in the cluster, but Google Kubernetes Engine also allows users to enable automatic upgrades
- For clusters that run unknown or untrusted workloads, a good practice is to protect the operating system on the node from the untrusted workload running in a Pod
- Multi-tenant clusters such as software-as-a-service (SaaS) providers often execute unknown code submitted by their users
- Enable GKE Sandbox on clusters to isolate untrusted workloads in sandboxes on the node
- GKE Sandbox is built using gVisor, an open source project
- Google Kubernetes Engine nodes run as Compute Engine instances, and as such they have access to instance metadata by default
- Instance metadata is used to provide nodes with credentials and configurations used in bootstrapping and connecting to the Kubernetes master nodes
- A Pod running on a node does not necessarily need this information, which contains sensitive data, like the node's service account key
- Users can lock down sensitive instance metadata paths by disabling legacy APIs and by using metadata concealment
- Metadata concealment ensures that Pods running in a cluster are not able to access sensitive data by filtering requests to fields such as the kube-env
- Most workloads running in Google Kubernetes Engine need to communicate with other services that could be running either inside or outside of the cluster
- Use several different methods to control what traffic is allowed to flow through clusters and their Pods
- By default, all Pods in a cluster can be reached over the network via their Pod IP address
- By default, egress traffic allows outbound connections to any address accessible in the VPC into which the cluster was deployed
- Cluster administrators and users can lock down the ingress and egress connections created to and from the Pods in a namespace by using network policies
- By default, when there are no network policies defined, all ingress and egress traffic is allowed to flow into and out of all Pods
- Network policies allow users to use labels to define the traffic flowing to and from Pods
- Once a network policy is applied in a namespace, all traffic is dropped to and from Pods that don't match the configured labels
- As part of cluster or namespace creation, users can apply a default-deny policy to both ingress and egress of every Pod, ensuring that all new workloads added to the cluster must explicitly authorize the traffic they require (see the sketch below)
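- A minimal sketch of such a default-deny NetworkPolicy, applied per namespace; the namespace name is an example, and network policy enforcement must be enabled on the cluster for it to take effect
```yaml
# Denies all ingress and egress for every Pod in the namespace
# until other NetworkPolicies explicitly allow specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: prod
spec:
  podSelector: {}        # empty selector matches all Pods in the namespace
  policyTypes:
  - Ingress
  - Egress
```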
- To load balance Kubernetes Pods with a network load balancer, create a Service of type LoadBalancer that matches the Pods' labels
- With the Service created, there will be an external-facing IP that maps to ports on the Kubernetes Pods
- Filtering authorized traffic is achieved at the node level by kube-proxy, which filters based on IP address
- To configure filtering, use the loadBalancerSourceRanges configuration of the Service object
- With this configuration parameter, provide a list of CIDR ranges to whitelist for access to the Service
- If loadBalancerSourceRanges is not configured, all addresses are allowed to access the Service via its external IP
- For cases in which external access to the Service is not required, consider using an internal load balancer
- The internal load balancer also respects the loadBalancerSourceRanges when it is necessary to filter out traffic from inside of the VPC
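- A minimal sketch of a LoadBalancer Service restricted with loadBalancerSourceRanges; the Service name, labels, ports, and CIDR ranges are example values
```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer
  selector:
    app: web                     # must match the Pods' labels
  ports:
  - port: 80
    targetPort: 8080
  loadBalancerSourceRanges:      # only these CIDR ranges may reach the Service's external IP
  - 203.0.113.0/24
  - 198.51.100.0/24
```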
- Kubernetes allows users to quickly provision, scale, and update container-based workloads
- Limiting the privileges of containerized processes is important for the overall security of a cluster
- Google Kubernetes Engine allows users to set security-related options via the Security Context on both Pods and containers
- Security Context settings allow users to change security settings of processes, such as the user and group to run as, the available Linux capabilities, and whether privilege escalation is allowed
- To change Security Context settings at the cluster level rather than at the Pod or container level, implement a PodSecurityPolicy
- Cluster administrators can use PodSecurityPolicies to ensure that all Pods in a cluster adhere to a minimum baseline policy
- The Google Kubernetes Engine node operating systems, both Container-Optimized OS and Ubuntu, apply the default Docker AppArmor security policies to all containers started by Kubernetes
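- A minimal sketch of Pod-level and container-level Security Context settings as described above; the Pod name, image, and numeric IDs are illustrative assumptions
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
spec:
  securityContext:               # Pod-level: applies to all containers in the Pod
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: app
    image: gcr.io/example-project/app:1.0    # example image
    securityContext:             # container-level: further restricts this container
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]            # drop all Linux capabilities not explicitly needed
```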
- The simplest and most secure way to authorize Pods to access Google Cloud resources is with Workload Identity
- Workload Identity allows a Kubernetes service account to run as a Google Cloud service account
- Pods that run as the Kubernetes service account have the permissions of the Google Cloud service account
- Workload Identity can be used with GKE Sandbox
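- A minimal sketch of Workload Identity wiring; the project, namespace, and account names are example values, and an accompanying roles/iam.workloadIdentityUser binding on the Google Cloud service account is also required
```yaml
# Kubernetes service account annotated to impersonate a Google Cloud service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-ksa
  namespace: prod
  annotations:
    iam.gke.io/gcp-service-account: app-gsa@example-project.iam.gserviceaccount.com
---
# Pods that use this Kubernetes service account receive the
# permissions of the Google Cloud service account
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: prod
spec:
  serviceAccountName: app-ksa
  containers:
  - name: app
    image: gcr.io/example-project/app:1.0    # example image
```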
- Pods can also authenticate to Google Cloud using the node's service account credentials from the metadata server
- However, these credentials can be reached by any Pod running in the cluster
- Create and configure a custom service account that has the minimum Cloud IAM roles that are required by all the Pods running in the cluster
- This approach is not compatible with GKE Sandbox because GKE Sandbox blocks access to the metadata server
- A third way to grant credentials for Google Cloud resources to applications is to manually use the service account's key
- This approach is strongly discouraged because of the difficulty of securely managing account keys
- Application-specific GCP service accounts should be used to provide credentials so that applications have the minimal necessary permissions
- Each service account is assigned only the Cloud IAM roles that are needed for its paired application to operate successfully
- Keeping the service account application-specific makes it easier to revoke its access in the case of a compromise without affecting other applications
- Once a service account has been assigned the correct Cloud IAM roles, a JSON service account key can be created and then mounted into a Pod using a Kubernetes Secret
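- A minimal sketch of mounting such a key into a Pod; the Secret, path, and image names are example values, and the Secret holding the downloaded JSON key would be created separately (for example with kubectl create secret generic)
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: gcr.io/example-project/app:1.0       # example image
    env:
    - name: GOOGLE_APPLICATION_CREDENTIALS      # standard variable read by Google Cloud client libraries
      value: /var/secrets/google/key.json
    volumeMounts:
    - name: gcp-key
      mountPath: /var/secrets/google
      readOnly: true
  volumes:
  - name: gcp-key
    secret:
      secretName: app-sa-key                    # Secret containing the JSON service account key
```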
- Binary Authorization is a service on Google Cloud that provides software supply-chain security for applications that run in the Cloud
- Binary Authorization works with images that deploy to GKE from Container Registry or another container image registry
- With Binary Authorization, users can ensure that internal processes that safeguard the quality and integrity of software have successfully completed before an application is deployed to your production environment
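- A sketch of a Binary Authorization policy that requires attestations before deployment; the project and attestor names are examples, and the exact field names should be checked against the Binary Authorization documentation
```yaml
globalPolicyEvaluationMode: ENABLE
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION             # only images with required attestations may deploy
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG   # block and log non-conformant deployments
  requireAttestationsBy:
  - projects/example-project/attestors/built-by-ci
```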
- Audit logging provides a way for administrators to retain, query, process, and alert on events that occur in Google Kubernetes Engine environments
- Administrators can use the logged information to do forensic analysis, real-time alerting, or to catalog how a fleet of Google Kubernetes Engine clusters is being used and by whom
- By default, Google Kubernetes Engine logs Admin Activity logs
- Users can optionally also log Data Access events, depending on the types of operations they are interested in inspecting
-
Control Plane
- The control plane includes the Kubernetes API server, etcd, and a number of controllers
- Google is responsible for securing the control plane, though users might be able to configure certain options based on their requirements
- Users are responsible for securing their nodes, containers, and Pods.
- GKE control plane components run on Container-Optimized OS, which is a security-hardened operating system designed by Google
- In a GKE cluster, the control plane components run on Compute Engine instances owned by Google, in a Google-managed project
- Each instance runs these components for only one customer
- Authentication to the Kubernetes API server and etcd is done the same way it's done for other Google Cloud services
- Application-layer Transport Security (ALTS) protects these communications
- SSH sessions by Google Site Reliability Engineers are audit logged through Google's internal audit infrastructure, which is available for forensics and security response
- In Google Cloud, customer content is encrypted at the filesystem layer by default
- Disks that host etcd storage for GKE clusters are encrypted at the filesystem layer
- In a regional cluster, communication between etcd servers to establish a quorum is encrypted by mutual TLS
- Each cluster has its own root certificate authority (CA)
- An internal Google service manages root keys for this CA
- Each cluster also has its own CA for etcd
- Root keys for the etcd CA are distributed to the metadata of the VMs that run the Kubernetes API server
- Communication between nodes and the Kubernetes API server is protected by TLS
- GKE adheres to Google standards for testing, qualifying, and gradually rolling out changes to the control plane
- GKE control plane components are managed by a team of Google site reliability engineers, and are kept up to date with the latest security patches
- This includes patches to the host operating system, Kubernetes components, and containers running on the control plane VMs
- GKE applies new kernel, OS, and Kubernetes-level fixes promptly to control plane VMs
- When these contain fixes for known vulnerabilities, additional information is available in the GKE Security Bulletins
- GKE scans all Kubernetes system and GKE-specific containers for vulnerabilities using Container Registry Vulnerability Scanning, and keeps the containers patched, benefitting the whole Kubernetes ecosystem
- Google engineers participate in finding, fixing, and disclosing Kubernetes security bugs
- Google pays external security researchers, through the Google-wide vulnerability reward program, to look for security bugs
- In some instances, Google has been able to patch all running clusters before the vulnerability became public
- Audit Logging is enabled by default
- This provides a detailed record of calls made to the Kubernetes API server
- Users can view the log entries on the Logs page in the GCP console
- Users can also use BigQuery to view and analyze logs
- By default, the Kubernetes API server uses a public IP address
- Protect the Kubernetes API server by using master authorized networks and private clusters, which allow users to assign a private IP address to the Kubernetes API server and disable access via the public IP address
- Users can handle cluster authentication in GKE by using Cloud Identity and Access Management (Cloud IAM) as the identity provider
- Basic Authentication should be disabled by setting an empty username and password for the MasterAuth configuration
- Disable the client certificate, which ensures there is one less key to think about when locking down access to clusters
- Enhance the security of the control plane by doing credential rotation on a regular basis
- When credential rotation is initiated, the TLS certificates and cluster certificate authority are rotated automatically
- GKE also rotates the IP address of the Kubernetes API server
-
Trust
- The master communicates with a node for managing containers
- When the master sends a request to the node, for example for kubectl logs, that request is sent over an SSH tunnel and further protected with unauthenticated TLS, providing integrity and encryption
- When a node sends a request to the master, for example, kubelet to API server, that request is authenticated and encrypted using mutual TLS
- A node may communicate with another node as part of a specific workload
- When the node sends a request to another node, that request is authenticated, and will be encrypted if that connection crosses a physical boundary controlled by Google
- Note that no Kubernetes components require node-to-node communication
- A Pod may communicate with another Pod as part of a specific workload
- When the Pod sends a request to another Pod, that request is neither authenticated nor encrypted
- Note that no Kubernetes components require Pod-to-Pod communication
- Pod-to-Pod traffic can be restricted with a Network Policy, and can be encrypted using a service mesh like Istio or otherwise implementing application-layer encryption
- An instance of etcd may communicate with another instance of etcd to keep state updated
- When an instance of etcd sends a request to another instance, that request is authenticated and encrypted using mutual TLS
- The traffic never leaves a GKE-owned network protected by firewalls
- Master-to-etcd communication is entirely over localhost, and is not authenticated or encrypted
- The cluster root Certificate Authority (CA) is used to validate the API server and kubelets' client certificates; that is, masters and nodes have the same root of trust
- Any kubelet within the cluster node pool can request a certificate from this CA using the certificates.k8s.io API, by submitting a certificate signing request
- A separate per-cluster etcd CA is used to validate etcd's certificates
- The API server and kubelets rely on Kubernetes' cluster root CA for trust
- In GKE, the master API certificate is signed by the cluster root CA
- Each cluster runs its own CA, so that if one cluster's CA were to be compromised, no other cluster CA would be affected
- An internal Google service manages root keys for this CA, which are non-exportable
- This service accepts certificate signing requests, including those from the kubelets in each GKE cluster
- Even if the API server in a cluster were compromised, the CA would not be compromised, so no other clusters would be affected
- Each node in the cluster is injected with a shared Secret at creation, which it can use to submit certificate signing requests to the cluster root CA and obtain kubelet client certificates
- These certificates are then used by the kubelet to authenticate its requests to the API server
- Note that this shared Secret is reachable by Pods, unless metadata concealment is enabled.
- The API server and kubelet certs are valid for five years, but they can be manually rotated sooner by performing a credential rotation
- etcd relies on a separate per-cluster etcd CA for trust in GKE
- Root keys for the etcd CA are distributed to the metadata of each VM on which the master runs
- Any code executing on master VMs, or with access to compute metadata for these VMs, can sign certificates as this CA
- Even if etcd in a cluster were compromised, the CA is not shared between clusters, so no other clusters would be affected
- The etcd certs are valid for five years
- To rotate a cluster's API server and kubelet certificates, perform a credential rotation
- There is no way to trigger a rotation of the etcd certificates; this is managed in GKE
- Performing a credential rotation causes GKE to upgrade all node pools to the closest supported node version, and causes brief downtime for the cluster API
-
Shielded Nodes
- Shielded GKE Nodes are built on top of Compute Engine Shielded VMs
- Shielded GKE Nodes provide Node OS provenance check, a cryptographically verifiable check to make sure the node OS is running on a virtual machine in a Google data center
- Shielded GKE Nodes provide enhanced rootkit and bootkit protection against gaining persistence in the node, using secure and measured boot, a virtual Trusted Platform Module (vTPM), UEFI firmware, and integrity monitoring
- Shielded GKE Nodes can be used with GPUs
- There is no additional cost to run Shielded GKE Nodes
- Shielded GKE Nodes are available in all zones and regions
- Shielded GKE Nodes can be used with Container-Optimized OS (COS), COS with containerd, and Ubuntu node images
- After Shielded GKE Nodes is enabled for a cluster, any nodes created in a node pool without Shielded GKE Nodes enabled or created outside of any node pool aren't able to join the cluster
-
Sandbox
- GKE Sandbox provides an extra layer of security to prevent untrusted code from affecting the host kernel on cluster nodes
- A container runtime such as docker or containerd provides some degree of isolation between the container's processes and the kernel running on the node
- The container runtime can run as a privileged user on the node and has access to most system calls into the host kernel
- Multi-tenant clusters and clusters whose containers run untrusted workloads are more exposed to security vulnerabilities than other clusters
- Examples include SaaS providers, web-hosting providers, or other organizations that allow their users to upload and run code
- A flaw in the container runtime or in the host kernel could allow a process running within a container to "escape" the container and affect the node's kernel, potentially bringing down the node
- The potential also exists for a malicious tenant to gain access to and exfiltrate another tenant's data in memory or on disk, by exploiting such a defect
- An untrusted workload could potentially access other Google Cloud services or cluster metadata
- gVisor is a userspace re-implementation of the Linux kernel API that does not need elevated privileges
- In conjunction with a container runtime such as containerd, the userspace kernel re-implements the majority of system calls and services them on behalf of the host kernel
- Direct access to the host kernel is limited
- From the container's point of view, gVisor is nearly transparent, and does not require any changes to the containerized application
- When GKE Sandbox is enabled on a node pool, a sandbox is created for each Pod running on a node in that node pool
- In addition, nodes running sandboxed Pods are prevented from accessing other Google Cloud services or cluster metadata
- Pods that do not run in a sandbox are called regular Pods
- Each sandbox uses its own userspace kernel
- Decisions can be made about how to group containers into Pods, based on the level of isolation required and the characteristics of applications
- GKE Sandbox is an especially good fit for untrusted or third-party applications using runtimes such as Rust, Java, Python, PHP, Node.js, or Golang
- GKE Sandbox is a good fit for web server front-ends, caches, or proxies
- GKE Sandbox is a good fit for applications processing external media or data using CPUs
- GKE Sandbox is a good fit for machine-learning workloads using CPUs
- GKE Sandbox is a good fit for CPU-intensive or memory-intensive applications
- It is highly recommended that users specify resource limits on all containers running in a sandbox
- This protects against the risk of a defective or malicious application starving the node of resources and negatively impacting other applications or system processes running on the node
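- A minimal sketch of a sandboxed Pod with resource limits as recommended above; the gvisor RuntimeClass is provided by GKE Sandbox, while the Pod name, image, and resource values are example assumptions
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-app
spec:
  runtimeClassName: gvisor       # runs the Pod in a GKE Sandbox on a sandbox-enabled node pool
  containers:
  - name: app
    image: gcr.io/example-project/untrusted-app:1.0   # example image
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:                    # limits protect the node from a runaway or malicious workload
        cpu: 500m
        memory: 512Mi
```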
- GKE Sandbox works well with many applications, but not all
- GKE Sandbox protects the cluster from untrusted or third-party workloads
- There is generally no advantage to running trusted first-party workloads in a sandbox
- GKE Sandbox cannot be enabled on the default node pool
- When using GKE Sandbox, the cluster must have at least two node pools
- There must be at least one node pool where GKE Sandbox is disabled
- This node pool must contain at least one node, even if all workloads are sandboxed
- Nodes running sandboxed Pods are prevented from accessing cluster metadata at the level of the operating system on the node
- Regular Pods can run on a node with GKE Sandbox enabled
- By default regular Pods cannot access Google Cloud services or cluster metadata
- Use Workload Identity to grant Pods access to Google Cloud services
- gVisor nodes have Hyper-Threading disabled by default to mitigate Microarchitectural Data Sampling (MDS) vulnerabilities announced by Intel
- By default, the container is prevented from opening raw sockets, to reduce the potential for malicious attacks
- Certain network-related tools, such as ping and tcpdump, create raw sockets as part of their core functionality
- To enable raw sockets, explicitly add the NET_RAW capability to the container's security context
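- A minimal sketch of adding the NET_RAW capability for a sandboxed container that needs raw sockets; the Pod name and image are example values
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: network-debug
spec:
  runtimeClassName: gvisor
  containers:
  - name: debug
    image: gcr.io/example-project/net-tools:1.0   # example image with ping/tcpdump
    securityContext:
      capabilities:
        add: ["NET_RAW"]          # allows tools like ping and tcpdump to open raw sockets
```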
- Untrusted code running inside the sandbox may be allowed to reach external services such as database servers, APIs, other containers, and CSI drivers
- These services are running outside the sandbox boundary and need to be individually protected
- An attacker can try to exploit vulnerabilities in these services to break out of the sandbox
- Consider the risk and impact of these services being reachable by the code running inside the sandbox, and apply the necessary measures to secure them
- This includes file system implementations for container volumes such as ext4 and CSI drivers
- CSI drivers run outside the sandbox isolation and may have privileged access to the host and services
- An exploit in these drivers can affect the host kernel and compromise the entire node
- Google recommends running the CSI driver inside a container with the least amount of permissions required, to reduce the exposure in case of an exploit
- The Compute Engine Persistent Disk CSI driver is supported for use with GKE Sandbox
- Imposing an additional layer of indirection for accessing the node's kernel comes with performance trade-offs
- GKE Sandbox provides the most tangible benefit on large multi-tenant clusters where isolation is important
- Keep the following guidelines in mind when testing workloads with GKE Sandbox
- GKE Sandbox might not be a good fit where direct access to the host kernel on the node is needed
-
Risk Management
- To detect potential incidents, Google recommends setting up a process that collects and monitors workload's logs
- Set up alerts based on abnormal events detected in logs
- Alerts notify the security team when something unusual is detected
- The security team can then review the potential incident
- Alerts can be customized based on specific metrics or actions
- For example, alerting on high CPU usage on GKE nodes can reveal nodes that have been compromised for cryptomining
- Alerts should be generated where the user aggregates logs and metrics
- Use GKE's Audit Logging in combination with logs-based alerting in Cloud Logging
- After a user has been alerted to an incident, they should take action
- Fix the vulnerability if possible
- If the root cause of the vulnerability is not known or a fix is not yet available, apply mitigations
- The mitigations might depend on the severity of the incident and certainty that the issue has been identified
- A snapshot of the host VM's disk lets users perform forensics on the VM state at the time of the anomaly after the workload has been redeployed or deleted
- Connecting to the host VM or workload container can provide information about the attacker's actions
- Redeploying a container kills currently running processes in the affected container and restarts them
- Deleting the workload kills currently running processes in the affected container without a restart
- Before taking any of the actions, consider if there will be a negative reaction from the attacker if they are discovered
- The attacker may decide to delete data or destroy workloads
- If the risk is too high, consider more drastic mitigations such as deleting a workload before performing further investigation
- Creating a snapshot of the VM's disk allows forensic investigation after the workload has been redeployed or deleted
- Snapshots can be created while disks are attached to running instances
- Snapshots only capture state written to disk.
- Snapshots do not capture contents of the VM's memory
- In severe incidents, workloads on the same node or the same cluster may also be compromised
- This is known as a container escape
- Monitor all of workloads for abnormal behavior and take appropriate actions
- Consider what access an attacker may have before taking action
- If a user suspects a container has been compromised and are concerned about informing the attacker, connect to the container and inspect it
- Inspecting is useful for quick investigation before taking more disruptive actions
- Inspecting is also the least disruptive approach to the workload, but it doesn't stop the incident
- To avoid logging into a machine with a privileged credential, analyze workloads by setting up live forensics (such as GRR Rapid Response), on-node agents, or network filtering
- For more information on suggested forensics tools, see Security controls and forensic analysis for GKE apps
- By cordoning, draining, and limiting network access to the VM hosting a compromised container, partially isolate the compromised container from the rest of the cluster
- Limiting access to the VM reduces risk but does not prevent an attacker from moving laterally in an environment if they take advantage of a critical vulnerability
- Cordoning and draining a node moves workloads colocated with the compromised container to other VMs in the cluster
- Cordoning and draining reduces an attacker's ability to impact other workloads on the same node
- Cordoning and draining does not necessarily prevent them from inspecting a workload's persistent state
- Google recommends blocking both internal and external traffic from accessing the host VM
- Allow inbound connections to the quarantined VM only from a specific VM on your network or VPC
- The first step is to abandon the VM from the Managed Instance Group that owns it
- Abandoning the VM prevents the node from being marked unhealthy and auto-repaired (re-created) before the investigation is complete
- Creating a firewall between the affected container and other workloads in the same network helps prevent an attacker from moving into other parts of the environment while further analysis is conducted
- Firewalling a VM prevents new outbound connections to other VMs in your cluster using an egress rule.
- Firewalling a VM prevents inbound connections to the compromised VM using an ingress rule
- Adding firewall rules doesn't close existing connections
- Removing the VM's external IP address breaks existing connections from the external internet, although not from inside the network
- An attacker who compromises a privileged container or breaks out of an unprivileged container can access the VM's metadata
- Google recommends using Shielded GKE Nodes to remove the privileged bootstrap keys from the metadata server
- By redeploying a container, start a fresh copy of the container and delete the compromised container
- Redeploy a container by deleting the Pod that hosts it
- If the Pod is managed by a higher-level Kubernetes construct (for example, a Deployment or DaemonSet), deleting the Pod schedules a new Pod
- This Pod runs new containers
-
Audit policy
- In a Kubernetes Engine cluster, the Kubernetes API server writes audit log entries to a backend that is managed by Kubernetes Engine
- As Kubernetes Engine receives log entries from the Kubernetes API server, it writes them to the project's Admin Activity log and Data Access log
- The Kubernetes audit policy defines rules for which events are recorded as log entries, and what data the log entries should include
- The Kubernetes Engine audit policy determines which entries are written to the Admin Activity log and which are written to the Data Access log
- The Kubernetes API server follows a policy that is specified in the --audit-policy-file flag of the kube-apiserver command
- When Kubernetes Engine starts the Kubernetes API server, it supplies an audit policy file by setting the value of the --audit-policy-file flag
- The configure-helper.sh script in the open-source Kubernetes repository generates the audit policy file
- When a person or component makes a request to the Kubernetes API server, the request goes through one or more stages
- Each stage of a request generates an event, which is processed according to a policy
- The policy specifies whether the event should be recorded as a log entry and if so, what data should be included in the log entry
- The Kubernetes audit policy file contains a list of rules
- In the policy file, the first rule that matches an event sets the audit level for the event
- A rule can specify one of the audit levels: None, Metadata, Request, or RequestResponse
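- A sketch of the Kubernetes audit policy file format, showing rules at different audit levels; the rules themselves are illustrative, not GKE's actual policy
```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# The first rule that matches an event sets its audit level
- level: None                   # don't log routine watches by kube-proxy on these resources
  users: ["system:kube-proxy"]
  verbs: ["watch"]
  resources:
  - group: ""
    resources: ["endpoints", "services"]
- level: Metadata               # log request metadata only for Secrets and ConfigMaps, never their contents
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
- level: RequestResponse        # log full request and response bodies for everything else
```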
-
Metadata
- GKE uses instance metadata to configure node VMs, but some of this metadata is potentially sensitive and should be protected from workloads running on the cluster
- Because each node's service account credentials will continue to be exposed to workloads, ensure that a service account is configured with the minimal permissions it needs
- When attaching a service account to nodes, ensure it cannot be used to circumvent GKE's metadata protections by calling the Compute Engine API to access the node instances directly
- Do not use a service account that has compute.instances.get permission, the Compute Instance Admin role, or other similar permissions, as they allow potential attackers to obtain instance metadata using the Compute Engine API
- The best practice is to restrict the permissions of a node VM by using service account permissions, not access scopes
- Legacy Compute Engine metadata endpoints are disabled by default on new clusters
- GKE's metadata concealment protects some potentially sensitive system metadata from user workloads running on clusters
- Enable metadata concealment to prevent user Pods from accessing certain VM metadata for your cluster's nodes, such as Kubelet credentials and VM instance information
- Metadata concealment protects access to kube-env (which contains Kubelet credentials) and the VM's instance identity token
- Metadata concealment firewalls traffic from user Pods (Pods not running on HostNetwork) to the cluster metadata server, only allowing safe queries
- The firewall prevents user Pods from using Kubelet credentials for privilege escalation attacks, or from using VM identity for instance escalation attacks
- Metadata concealment only protects access to kube-env and the node's instance identity token
- Metadata concealment does not restrict access to the node's service account
- Metadata concealment does not restrict access to other related instance metadata
- Metadata concealment does not restrict access to other legacy metadata APIs