-
Overview
- Use Google Cloud Storage for object storage
- Store private Docker container images in Artifact Registry
- Use Filestore where the application requires managed Network Attached Storage
- For POSIX-compatible file storage, use a file server on Compute Engine
- Where application requires block storage, use Persistent Disks
- Persistent Disks can be provisioned manually, or dynamically by Kubernetes
- Kubernetes storage abstractions provide filesystem and block-based storage to Pods
- Kubernetes storage abstractions are not used with managed databases or Cloud Storage
- Volumes are a storage unit accessible to containers in a Pod
- Some Volume types are backed by ephemeral storage
- Ephemeral storage types (emptyDir, configMap, and secret) do not persist after the Pod ceases to exist
- Ephemeral storage types are useful for storing configuration information and as scratch space for applications
- Local ephemeral storage resources can be managed similarly to how CPU and memory resources are managed
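A minimal sketch of managing local ephemeral storage with the same request/limit fields used for CPU and memory; the Pod name and image are placeholders:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo                   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    resources:
      requests:
        ephemeral-storage: "1Gi"       # scheduler places the Pod on a node with this much free
      limits:
        ephemeral-storage: "2Gi"       # exceeding the limit can trigger Pod eviction
```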
- Non-ephemeral volume types are backed by durable storage
- Persistent Volumes are cluster resources that Pods can use for durable storage
- PersistentVolumeClaims can be used to dynamically provision PersistentVolumes backed by Compute Engine persistent disks for use in clusters
- PersistentVolumeClaims can be used to provision NFS backing storage
-
Volumes
- On-disk files in a container are lost when the container crashes or stops for any reason
- Files within a container are inaccessible to other containers running in the same Pod
- A Kubernetes Volume is a directory that is accessible to all of the containers in a Pod
- The volume source declared in the Pod specification determines how the directory is created, the storage medium used, and the directory's initial contents
- A Pod specifies what Volumes it contains and the path where containers mount the Volume
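A minimal sketch of how a Pod declares a volume and where a container mounts it; names and image are placeholders, and emptyDir stands in as the simplest volume source:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo          # hypothetical name
spec:
  volumes:
  - name: scratch            # the volume source is declared at the Pod level
    emptyDir: {}             # simplest ephemeral volume type (covered below)
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    volumeMounts:
    - name: scratch          # must match the volume name above
      mountPath: /cache      # path at which the container sees the volume
```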
- Ephemeral Volume types have the same lifetimes as their enclosing Pods
- These Volumes are created when the Pod is created, and they persist through container restarts
- When the Pod terminates or is deleted, its Volumes go with it
- Unlike ephemeral volumes, data in a Volume backed by durable storage is preserved when the Pod is removed
- The volume is merely unmounted and the data can be handed off to another Pod
- PersistentVolume resources should be used to manage the lifecycle of durable storage types, rather than directly specifying them
- emptyDir is an ephemeral volume type that provides an empty directory that containers in the Pod can read from and write to
- When the Pod is removed from a node for any reason, the data in the emptyDir is deleted forever
- emptyDirs are useful for scratch space and sharing data between multiple containers in a Pod
- Set the emptyDir.medium field to "Memory" on Linux node pools to tell Kubernetes to mount a tmpfs (RAM-backed filesystem)
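A sketch of an emptyDir shared by two containers as scratch space, with medium: Memory to get a tmpfs; images and commands are placeholders, and note that tmpfs usage counts against the containers' memory limits:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-demo                     # hypothetical name
spec:
  volumes:
  - name: shared-scratch
    emptyDir:
      medium: Memory                   # mount a tmpfs (RAM-backed) on Linux nodes
  containers:
  - name: writer
    image: busybox                     # placeholder image
    command: ["sh", "-c", "echo hello > /scratch/msg && sleep 3600"]
    volumeMounts:
    - name: shared-scratch
      mountPath: /scratch
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 5 && cat /scratch/msg && sleep 3600"]
    volumeMounts:
    - name: shared-scratch
      mountPath: /scratch              # both containers see the same directory
```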
- configMap is used to make configuration data accessible to applications
- Files in a configMap Volume are specified by a ConfigMap resource
- Secret is used to make sensitive data, such as passwords, OAuth tokens, and SSH keys available to applications
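A sketch of mounting a ConfigMap and a Secret as read-only volumes; the ConfigMap and Secret names are hypothetical and assumed to already exist:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-demo              # hypothetical name
spec:
  volumes:
  - name: app-config
    configMap:
      name: my-app-config        # hypothetical ConfigMap; each key becomes a file
  - name: app-creds
    secret:
      secretName: my-app-creds   # hypothetical Secret
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    volumeMounts:
    - name: app-config
      mountPath: /etc/config
      readOnly: true
    - name: app-creds
      mountPath: /etc/creds
      readOnly: true
```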
- DownwardAPI is used to make Downward API data available to applications
- This data includes information about the Pod and the container in which an application is running
- A Pod can be configured to expose a DownwardAPI Volume File to applications that includes the Pod's namespace and IP address
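A sketch exposing Pod metadata to an application: the namespace and labels arrive as files in a downwardAPI volume, while the Pod IP is read through an environment variable fieldRef in this sketch; names are placeholders:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-demo          # hypothetical name
  labels:
    app: demo
spec:
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      - path: "namespace"      # written to /etc/podinfo/namespace
        fieldRef:
          fieldPath: metadata.namespace
      - path: "labels"
        fieldRef:
          fieldPath: metadata.labels
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    env:
    - name: POD_IP             # the Pod IP, via an environment variable fieldRef
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
```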
- With persistentVolumeClaim, cluster operators can provision durable storage to be used by applications
- A Pod uses a PersistentVolumeClaim to mount a Volume that is backed by this durable storage
-
PV & PD
- PersistentVolume resources are used to manage durable storage in a cluster
- On GKE, PersistentVolumes are typically backed by Compute Engine persistent disks
- PersistentVolumes can also be used with other storage types like NFS
- Unlike Volumes, the PersistentVolume lifecycle is managed by Kubernetes
- PersistentVolumes can be dynamically provisioned; the user does not have to manually create and delete the backing storage
- PersistentVolumes are cluster resources that exist independently of Pods
- This means that the disk and data represented by a PersistentVolume continue to exist as the cluster changes and as Pods are deleted and recreated
- PersistentVolume resources can be provisioned dynamically through PersistentVolumeClaims, or they can be explicitly created by a cluster administrator
- A PersistentVolumeClaim is a request for and claim to a PersistentVolume resource
- PersistentVolumeClaim objects request a specific size, access mode, and StorageClass for the PersistentVolume
- If a PersistentVolume that satisfies the request exists or can be provisioned, the PersistentVolumeClaim is bound to that PersistentVolume
- Pods use claims as Volumes
- The cluster inspects the claim to find the bound Volume and mounts that Volume for the Pod
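A minimal sketch of the claim-as-volume flow: a PersistentVolumeClaim requesting storage, and a Pod that mounts the bound Volume; names, size, and image are placeholders:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data               # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi            # requested size
  # storageClassName omitted, so the cluster's default StorageClass applies
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo               # hypothetical name
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data      # the Pod uses the claim as a volume
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    volumeMounts:
    - name: data
      mountPath: /var/data
```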
- Portability is another advantage of using PersistentVolumes and PersistentVolumeClaims
- The same Pod specification can be used across different clusters and environments because PersistentVolume is an interface to the actual backing storage
- Volume implementations such as gcePersistentDisk are configured through StorageClass resources
- GKE creates a default StorageClass which uses the standard persistent disk type (ext4)
- The default StorageClass is used when a PersistentVolumeClaim doesn't specify a StorageClassName
- The provided default StorageClass can be replaced
- If using a cluster with Windows node pools, the StorageClassName must be provided since the default StorageClass is not supported with Windows
- Users can create StorageClass resources to describe different classes of storage
- Classes might map to quality-of-service levels, or to backup policies
- This concept is sometimes called "profiles" in other storage systems
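A sketch of a custom StorageClass mapping to a faster class of storage, assuming the Compute Engine persistent disk CSI driver (pd.csi.storage.gke.io); the class name is hypothetical:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                          # hypothetical class name
provisioner: pd.csi.storage.gke.io        # Compute Engine persistent disk CSI driver
parameters:
  type: pd-ssd                            # map this class to SSD persistent disks
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # delay binding until a Pod is scheduled
```
A PersistentVolumeClaim selects this class by setting storageClassName: fast-ssd.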
- Most of the time, there is no need to directly configure PersistentVolume objects or create Compute Engine persistent disks
- Kubernetes automatically provisions a persistent disk when a PersistentVolumeClaim is configured
- Kubernetes dynamically creates a corresponding PersistentVolume object
- Assuming the GKE default storage class has not been replaced, this PersistentVolume is backed by a new, empty Compute Engine persistent disk
- The disk is used in a Pod by using the claim as a volume
- When the PersistentVolumeClaim is deleted, the corresponding PersistentVolume object and the provisioned Compute Engine persistent disk are also deleted
- To prevent deletion of dynamically provisioned persistent disks, set the reclaim policy of the PersistentVolume resource, or its StorageClass resource, to Retain
- The user is charged for the persistent disk for as long as it exists even if there is no PersistentVolumeClaim consuming it
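A sketch of a StorageClass with a Retain reclaim policy, so dynamically provisioned disks outlive their claims; the class name and disk type are assumptions:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-retain         # hypothetical class name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
reclaimPolicy: Retain           # the underlying disk survives PVC deletion
```
An existing PersistentVolume can also be switched over with kubectl patch pv PV_NAME -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'.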
- PersistentVolumes support the following access modes:
- ReadWriteOnce: The Volume can be mounted as read-write by a single node
- ReadOnlyMany: The Volume can be mounted read-only by many nodes
- ReadWriteMany: The Volume can be mounted as read-write by many nodes
- PersistentVolumes that are backed by Compute Engine persistent disks don't support the ReadWriteMany access mode
- ReadWriteOnce is the most common use case for Persistent Disks and works as the default access mode for most applications
- Compute Engine Persistent Disks also support ReadOnlyMany mode so that many applications or many replicas of the same application can consume the same disk at the same time
- An example use case for ReadOnlyMany mode is serving static content across multiple replicas
- You can't attach Persistent Disks in write mode on multiple nodes at the same time
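A sketch of consuming a pre-populated disk read-only from many nodes: the claim binds to a manually created PersistentVolume (hypothetical name), and the Pod mounts it readOnly; all names and the image are placeholders:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-content            # hypothetical; assumes a pre-populated PersistentVolume
spec:
  accessModes:
  - ReadOnlyMany
  storageClassName: ""            # avoid dynamic provisioning; bind to an existing PV
  volumeName: static-content-pv   # hypothetical PersistentVolume name
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: web-replica               # hypothetical name
spec:
  volumes:
  - name: content
    persistentVolumeClaim:
      claimName: static-content
      readOnly: true              # mount read-only so many nodes can attach the disk
  containers:
  - name: web
    image: nginx                  # placeholder image
    volumeMounts:
    - name: content
      mountPath: /usr/share/nginx/html
      readOnly: true
```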
- Dynamically provisioned PersistentVolumes are empty when they are created
- An existing Compute Engine persistent disk populated with data can be introduced into a cluster by manually creating a corresponding PersistentVolume resource
- The persistent disk must be in the same zone as the cluster nodes
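A sketch of wrapping an existing, data-populated persistent disk in a PersistentVolume, assuming the persistent disk CSI driver; PROJECT_ID, ZONE, and DISK_NAME are placeholders:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: existing-disk-pv    # hypothetical name
spec:
  capacity:
    storage: 100Gi          # should match the size of the existing disk
  accessModes:
  - ReadWriteOnce
  storageClassName: ""      # keep dynamic provisioning out of the way
  csi:
    driver: pd.csi.storage.gke.io
    volumeHandle: projects/PROJECT_ID/zones/ZONE/disks/DISK_NAME   # placeholders
    fsType: ext4
```
A PersistentVolumeClaim can then bind to it explicitly via spec.volumeName.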
- PersistentVolumeClaims or volume claim templates can be used by higher-level controllers such as Deployments or StatefulSets, respectively
- Deployments are designed for stateless applications, so all replicas of a Deployment share the same PersistentVolumeClaim
- Since the replica Pods created will be identical to each other, only Volumes with modes ReadOnlyMany or ReadWriteMany can work in this setting
- Even Deployments with one replica using a ReadWriteOnce Volume are not recommended
- This is because the default rolling update strategy creates a second Pod before bringing down the first Pod when a Pod must be re-created
- The Deployment can deadlock: the second Pod can't start because the ReadWriteOnce Volume is already in use, and the first Pod won't be removed because the second Pod has not yet started
- Instead, use a StatefulSet with ReadWriteOnce volumes
- StatefulSets are the recommended method of deploying stateful applications that require a unique volume per replica
- By using StatefulSets with PersistentVolumeClaim templates, applications can scale up automatically with a unique PersistentVolumeClaim associated with each replica Pod
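A sketch of a StatefulSet with a volume claim template, so each replica gets its own ReadWriteOnce volume; the app, image, and sizes are placeholders, and a matching headless Service named db is assumed to exist:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                        # hypothetical stateful app
spec:
  serviceName: db                 # assumes a matching headless Service
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:16        # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:           # one PersistentVolumeClaim per replica
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce             # safe here: each replica gets its own volume
      resources:
        requests:
          storage: 50Gi
```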
- Regional persistent disks replicate data between two zones in the same region, and can be used similarly to regular persistent disks
- In the event of a zonal outage, or if cluster nodes in one zone become unschedulable, Kubernetes can fail over workloads using the volume to the other zone
- Regional persistent disks can be used to build highly available solutions for stateful workloads on GKE
- Users must ensure that both the primary and failover zones are configured with enough resource capacity to run the workload
- Regional SSD persistent disks are an option for applications such as databases that require both high availability and high performance
- As with regular persistent disks, regional persistent disks can be dynamically provisioned as needed or manually provisioned in advance by the cluster administrator
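A sketch of a StorageClass for regional persistent disks, assuming the persistent disk CSI driver; the class name and zones are placeholders:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-ssd                # hypothetical class name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd     # replicate the disk across two zones
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone       # pin the replicas to specific zones
    values:
    - us-central1-a                 # placeholder zones
    - us-central1-b
```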