-
Overview
- Use Google Cloud Storage for object storage
- Store private Docker container images in Artifact Registry
- Use Filestore where the application requires managed Network Attached Storage
- For POSIX-compatible file storage, use a file server on Compute Engine
- Where application requires block storage, use Persistent Disks
- Persistent Disks can be provisioned manually, or dynamically by Kubernetes
- Kubernetes storage abstractions provide filesystem and block-based storage to Pods
- Kubernetes storage abstractions are not used with managed databases or Cloud Storage
- Volumes are a storage unit accessible to containers in a Pod
- Some Volume types are backed by ephemeral storage
- Ephemeral storage types (emptyDir, configMap, and secret) do not persist after the Pod ceases to exist
- Ephemeral storage types are useful for storing configuration information and as scratch space for applications
- Local ephemeral storage resources can be managed similarly to how CPU and memory resources are managed
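A minimal sketch of managing local ephemeral storage with the same request/limit fields used for CPU and memory; the Pod name and image are placeholders:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo                   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    resources:
      requests:
        ephemeral-storage: "1Gi"       # scheduler places the Pod on a node with this much free
      limits:
        ephemeral-storage: "2Gi"       # exceeding the limit can trigger Pod eviction
```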
- Non-ephemeral volume types are backed by durable storage
- Persistent Volumes are cluster resources that Pods can use for durable storage
- PersistentVolumeClaims can be used to dynamically provision PersistentVolumes backed by Compute Engine persistent disks for use in clusters
- PersistentVolumeClaims can be used to provision NFS backing storage
-
Volumes
- On-disk files in a container are lost when the container crashes or stops for any reason
- Files within a container are inaccessible to other containers running in the same Pod
- A Kubernetes Volume is a directory that is accessible to all of the containers in a Pod
- The volume source declared in the Pod specification determines how the directory is created, the storage medium used, and the directory's initial contents
- A Pod specifies what Volumes it contains and the path where containers mount the Volume
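A minimal sketch of how a Pod declares a volume and where a container mounts it; names and image are placeholders, and emptyDir stands in as the simplest volume source:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo          # hypothetical name
spec:
  volumes:
  - name: scratch            # the volume source is declared at the Pod level
    emptyDir: {}             # simplest ephemeral volume type (covered below)
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    volumeMounts:
    - name: scratch          # must match the volume name above
      mountPath: /cache      # path at which the container sees the volume
```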
- Ephemeral Volume types have the same lifetimes as their enclosing Pods
- These Volumes are created when the Pod is created, and they persist through container restarts
- When the Pod terminates or is deleted, its Volumes go with it
- Unlike ephemeral volumes, data in a Volume backed by durable storage is preserved when the Pod is removed
- The volume is merely unmounted and the data can be handed off to another Pod
- PersistentVolume resources should be used to manage the lifecycle of durable storage types, rather than directly specifying them
- emptyDir is an ephemeral volume type that provides an empty directory that containers in the Pod can read from and write to
- When the Pod is removed from a node for any reason, the data in the emptyDir is deleted forever
- emptyDirs are useful for scratch space and sharing data between multiple containers in a Pod
- Set the emptyDir.medium field to "Memory" on Linux node pools to tell Kubernetes to mount a tmpfs (RAM-backed filesystem)
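A sketch of an emptyDir shared by two containers as scratch space, with medium: Memory to get a tmpfs; images and commands are placeholders, and note that tmpfs usage counts against the containers' memory limits:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-demo                     # hypothetical name
spec:
  volumes:
  - name: shared-scratch
    emptyDir:
      medium: Memory                   # mount a tmpfs (RAM-backed) on Linux nodes
  containers:
  - name: writer
    image: busybox                     # placeholder image
    command: ["sh", "-c", "echo hello > /scratch/msg && sleep 3600"]
    volumeMounts:
    - name: shared-scratch
      mountPath: /scratch
  - name: reader
    image: busybox
    command: ["sh", "-c", "sleep 5 && cat /scratch/msg && sleep 3600"]
    volumeMounts:
    - name: shared-scratch
      mountPath: /scratch              # both containers see the same directory
```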
- configMap is used to make configuration data accessible to applications
- Files in a configMap Volume are specified by a ConfigMap resource
- Secret is used to make sensitive data, such as passwords, OAuth tokens, and SSH keys available to applications
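A sketch of mounting a ConfigMap and a Secret as read-only volumes; the ConfigMap and Secret names are hypothetical and assumed to already exist:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-demo              # hypothetical name
spec:
  volumes:
  - name: app-config
    configMap:
      name: my-app-config        # hypothetical ConfigMap; each key becomes a file
  - name: app-creds
    secret:
      secretName: my-app-creds   # hypothetical Secret
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    volumeMounts:
    - name: app-config
      mountPath: /etc/config
      readOnly: true
    - name: app-creds
      mountPath: /etc/creds
      readOnly: true
```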
- DownwardAPI is used to make Downward API data available to applications
- This data includes information about the Pod and the container in which an application is running
- A Pod can be configured to expose a DownwardAPI Volume File to applications that includes the Pod's namespace and IP address
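A sketch exposing Pod metadata to an application: the namespace and labels arrive as files in a downwardAPI volume, while the Pod IP is read through an environment variable fieldRef in this sketch; names are placeholders:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-demo          # hypothetical name
  labels:
    app: demo
spec:
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      - path: "namespace"      # written to /etc/podinfo/namespace
        fieldRef:
          fieldPath: metadata.namespace
      - path: "labels"
        fieldRef:
          fieldPath: metadata.labels
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    env:
    - name: POD_IP             # the Pod IP, via an environment variable fieldRef
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo
```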
- With persistentVolumeClaim, cluster operators can provision durable storage to be used by applications
- A Pod uses a PersistentVolumeClaim to mount a Volume that is backed by this durable storage
-
PV & PD
- PersistentVolume resources are used to manage durable storage in a cluster
- On GKE, PersistentVolumes are typically backed by Compute Engine persistent disks
- PersistentVolumes can also be used with other storage types like NFS
- Unlike Volumes, the PersistentVolume lifecycle is managed by Kubernetes
- PersistentVolumes can be dynamically provisioned; the user does not have to manually create and delete the backing storage
- PersistentVolumes are cluster resources that exist independently of Pods
- This means that the disk and data represented by a PersistentVolume continue to exist as the cluster changes and as Pods are deleted and recreated
- PersistentVolume resources can be provisioned dynamically through PersistentVolumeClaims, or they can be explicitly created by a cluster administrator
- A PersistentVolumeClaim is a request for and claim to a PersistentVolume resource
- PersistentVolumeClaim objects request a specific size, access mode, and StorageClass for the PersistentVolume
- If a PersistentVolume that satisfies the request exists or can be provisioned, the PersistentVolumeClaim is bound to that PersistentVolume
- Pods use claims as Volumes
- The cluster inspects the claim to find the bound Volume and mounts that Volume for the Pod
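A minimal sketch of the claim-as-volume flow: a PersistentVolumeClaim requesting storage, and a Pod that mounts the bound Volume; names, size, and image are placeholders:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data               # hypothetical claim name
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 30Gi            # requested size
  # storageClassName omitted, so the cluster's default StorageClass applies
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo               # hypothetical name
spec:
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: app-data      # the Pod uses the claim as a volume
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    volumeMounts:
    - name: data
      mountPath: /var/data
```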
- Portability is another advantage of using PersistentVolumes and PersistentVolumeClaims
- The same Pod specification can be used across different clusters and environments because PersistentVolume is an interface to the actual backing storage
- Volume implementations such as gcePersistentDisk are configured through StorageClass resources
- GKE creates a default StorageClass which uses the standard persistent disk type (ext4)
- The default StorageClass is used when a PersistentVolumeClaim doesn't specify a StorageClassName
- The provided default StorageClass can be replaced
- If using a cluster with Windows node pools, the StorageClassName must be provided since the default StorageClass is not supported with Windows
- Users can create StorageClass resources to describe different classes of storage
- Classes might map to quality-of-service levels, or to backup policies
- This concept is sometimes called "profiles" in other storage systems
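A sketch of a custom StorageClass mapping to a faster class of storage, assuming the Compute Engine persistent disk CSI driver (pd.csi.storage.gke.io); the class name is hypothetical:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                          # hypothetical class name
provisioner: pd.csi.storage.gke.io        # Compute Engine persistent disk CSI driver
parameters:
  type: pd-ssd                            # map this class to SSD persistent disks
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # delay binding until a Pod is scheduled
```
A PersistentVolumeClaim selects this class by setting storageClassName: fast-ssd.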
- Most of the time, there is no need to directly configure PersistentVolume objects or create Compute Engine persistent disks
- Kubernetes automatically provisions a persistent disk when a PersistentVolumeClaim is configured
- Kubernetes dynamically creates a corresponding PersistentVolume object
- Assuming the GKE default storage class has not been replaced, this PersistentVolume is backed by a new, empty Compute Engine persistent disk
- The disk is used in a Pod by using the claim as a volume
- When the PersistentVolumeClaim is deleted, the corresponding PersistentVolume object and the provisioned Compute Engine persistent disk are also deleted
- To prevent deletion of dynamically provisioned persistent disks, set the reclaim policy of the PersistentVolume resource, or its StorageClass resource, to Retain
- The user is charged for the persistent disk for as long as it exists even if there is no PersistentVolumeClaim consuming it
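A sketch of a StorageClass with a Retain reclaim policy, so dynamically provisioned disks outlive their claims; the class name and disk type are assumptions:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-retain         # hypothetical class name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
reclaimPolicy: Retain           # the underlying disk survives PVC deletion
```
An existing PersistentVolume can also be switched over with kubectl patch pv PV_NAME -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'.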
- PersistentVolumes support the following access modes:
- ReadWriteOnce: The Volume can be mounted as read-write by a single node
- ReadOnlyMany: The Volume can be mounted read-only by many nodes
- ReadWriteMany: The Volume can be mounted as read-write by many nodes
- PersistentVolumes that are backed by Compute Engine persistent disks don't support the ReadWriteMany access mode
- ReadWriteOnce is the most common use case for Persistent Disks and works as the default access mode for most applications
- Compute Engine Persistent Disks also support ReadOnlyMany mode so that many applications or many replicas of the same application can consume the same disk at the same time
- An example use case for ReadOnlyMany mode is serving static content across multiple replicas
- You can't attach Persistent Disks in write mode on multiple nodes at the same time
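A sketch of consuming a pre-populated disk read-only from many nodes: the claim binds to a manually created PersistentVolume (hypothetical name), and the Pod mounts it readOnly; all names and the image are placeholders:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-content            # hypothetical; assumes a pre-populated PersistentVolume
spec:
  accessModes:
  - ReadOnlyMany
  storageClassName: ""            # avoid dynamic provisioning; bind to an existing PV
  volumeName: static-content-pv   # hypothetical PersistentVolume name
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: web-replica               # hypothetical name
spec:
  volumes:
  - name: content
    persistentVolumeClaim:
      claimName: static-content
      readOnly: true              # mount read-only so many nodes can attach the disk
  containers:
  - name: web
    image: nginx                  # placeholder image
    volumeMounts:
    - name: content
      mountPath: /usr/share/nginx/html
      readOnly: true
```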
- Dynamically provisioned PersistentVolumes are empty when they are created
- An existing Compute Engine persistent disk populated with data can be introduced into a cluster by manually creating a corresponding PersistentVolume resource
- The persistent disk must be in the same zone as the cluster nodes
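A sketch of wrapping an existing, data-populated persistent disk in a PersistentVolume, assuming the persistent disk CSI driver; PROJECT_ID, ZONE, and DISK_NAME are placeholders:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: existing-disk-pv    # hypothetical name
spec:
  capacity:
    storage: 100Gi          # should match the size of the existing disk
  accessModes:
  - ReadWriteOnce
  storageClassName: ""      # keep dynamic provisioning out of the way
  csi:
    driver: pd.csi.storage.gke.io
    volumeHandle: projects/PROJECT_ID/zones/ZONE/disks/DISK_NAME   # placeholders
    fsType: ext4
```
A PersistentVolumeClaim can then bind to it explicitly via spec.volumeName.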
- PersistentVolumeClaims or volume claim templates can be used by higher-level controllers such as Deployments or StatefulSets, respectively
- Deployments are designed for stateless applications, so all replicas of a Deployment share the same PersistentVolumeClaim
- Since the replica Pods created will be identical to each other, only Volumes with modes ReadOnlyMany or ReadWriteMany can work in this setting
- Even Deployments with one replica using a ReadWriteOnce Volume are not recommended
- This is because the default rolling update strategy creates a second Pod before bringing down the first Pod when a Pod must be re-created
- The Deployment can deadlock: the second Pod can't start because the ReadWriteOnce Volume is already in use, and the first Pod won't be removed because the second Pod has not yet started
- Instead, use a StatefulSet with ReadWriteOnce volumes
- StatefulSets are the recommended method of deploying stateful applications that require a unique volume per replica
- By using StatefulSets with PersistentVolumeClaim templates, applications can scale up automatically with a unique PersistentVolumeClaim associated with each replica Pod
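A sketch of a StatefulSet with a volume claim template, so each replica gets its own ReadWriteOnce volume; the app, image, and sizes are placeholders, and a matching headless Service named db is assumed to exist:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                        # hypothetical stateful app
spec:
  serviceName: db                 # assumes a matching headless Service
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:16        # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:           # one PersistentVolumeClaim per replica
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce             # safe here: each replica gets its own volume
      resources:
        requests:
          storage: 50Gi
```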
- Regional persistent disks replicate data between two zones in the same region, and can be used similarly to regular persistent disks
- In the event of a zonal outage, or if cluster nodes in one zone become unschedulable, Kubernetes can fail over workloads using the volume to the other zone
- Regional persistent disks can be used to build highly available solutions for stateful workloads on GKE
- Users must ensure that both the primary and failover zones are configured with enough resource capacity to run the workload
- Regional SSD persistent disks are an option for applications such as databases that require both high availability and high performance
- As with regular persistent disks, regional persistent disks can be dynamically provisioned as needed or manually provisioned in advance by the cluster administrator
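A sketch of a StorageClass for regional persistent disks, assuming the persistent disk CSI driver; the class name and zones are placeholders:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-ssd                # hypothetical class name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd     # replicate the disk across two zones
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone       # pin the replicas to specific zones
    values:
    - us-central1-a                 # placeholder zones
    - us-central1-b
```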