-
Service
- The idea of a Service is to group a set of Pod endpoints into a single resource
- By default, a Service has a stable cluster IP address that clients inside the cluster can use to contact Pods in the Service
- A client sends a request to the stable IP address, and the request is routed to one of the Pods in the Service
- A Service identifies its member Pods with a selector
- For a Pod to be a member of the Service, the Pod must have all of the labels specified in the selector
- A label is an arbitrary key/value pair that is attached to an object
- In a Kubernetes cluster, each Pod has an internal IP address
- Pods in a Deployment come and go, and their IP addresses change
- It doesn't make sense to use Pod IP addresses directly
- A Service provides a stable IP address that lasts for the life of the Service, even as the IP addresses of the member Pods change
- A Service provides load balancing
- Clients call a single, stable IP address, and their requests are balanced across the Pods that are members of the Service
- ClusterIP service type (default) enables internal clients to send requests to a stable internal IP address
- NodePort service type enables clients to send requests to the IP address of a node on one or more nodePort values that are specified by the Service
- LoadBalancer service type enables clients to send requests to the IP address of a network load balancer
- ExternalName service type enables internal clients to use the DNS name of a Service as an alias for an external DNS name
- A headless service provides a Pod grouping, without a stable IP address
- The NodePort type is an extension of the ClusterIP type
- A Service of type NodePort has a cluster IP address
- The LoadBalancer type is an extension of the NodePort type
- A Service of type LoadBalancer has a cluster IP address and one or more nodePort values
- When a Service of type ClusterIP is created, Kubernetes creates a stable IP address that is accessible from nodes in the cluster
- Clients in the cluster call the Service using the cluster IP address, and the TCP port specified in the port field of the Service manifest
- The request is forwarded to one of the member Pods on the TCP port specified in the targetPort field
- When a Service of type NodePort is created, Kubernetes allocates a nodePort value
- The Service is accessible using the IP address of any node along with the nodePort value
- External clients call the Service using the external IP address of a node along with the TCP port specified by nodePort
- The request is forwarded to one of the member Pods on the TCP port specified by the targetPort field
- The NodePort Service type is an extension of the ClusterIP Service type
- Internal clients can call a Service using clusterIP and port, or a node's internal IP address and nodePort
- An HTTP(S) load balancer is a proxy server, and is fundamentally different from a network load balancer
- A nodePort value in the 30000–32767 range can be specified
- It is best to omit the nodePort field and let Kubernetes allocate a nodePort that avoids collisions between Services
- When a Service of type LoadBalancer is created, a Google Cloud controller wakes up and configures a network load balancer.
- The load balancer has a stable IP address that is accessible from outside the project
- A network load balancer is not a proxy server
- A network load balancer forwards packets with no change to the source and destination IP addresses
- After a Service is created, kubectl get service -o yaml can be used to view its specification and see the stable external IP address
- External clients call the Service by using the load balancer's IP address and the TCP port specified by port
- The request is forwarded to one of the member Pods on the TCP port specified by targetPort
- The LoadBalancer Service type is an extension of the NodePort type, which is an extension of the ClusterIP type
- In addition to its cluster IP address and nodePort values, a Service of type LoadBalancer is given the stable external IP address of the network load balancer it provisions (see the second example manifest after this list)
- Some load balancing resources incur charges
- A Service of type ExternalName provides an internal alias for an external DNS name
- Internal clients make requests using the internal DNS name, and the requests are redirected to the external name
- When a Service is created, Kubernetes creates a DNS name that internal clients can use to call the Service
- An example DNS name is my-xn-service.default.svc.cluster.local
- When an internal client makes a request to my-xn-service.default.svc.cluster.local, the request gets redirected to an external domain, e.g. example.com (see the ExternalName example manifest after this list)
- The ExternalName Service type is fundamentally different from the other Service types
- A Service of type ExternalName is not associated with a set of Pods, and it does not have a stable IP address
- A Service of type ExternalName is a mapping from an internal DNS name to an external DNS name
- A Service is an abstraction in the sense that it is not a process that listens on some network interface
- Part of the Service abstraction is implemented in the iptables rules of the cluster nodes
- Depending on the type of the Service, other parts of the abstraction are implemented by Network Load Balancing or HTTP(S) load balancing
- The value of the port field in a Service manifest is arbitrary, but the value of targetPort is not arbitrary
- Each member Pod must have a container listening on targetPort
- The ports field of a Service is an array of ServicePort objects
- Where there is more than one ServicePort, each ServicePort must have a unique name (see the first example manifest after this list)
- When a Service is created, Kubernetes creates an Endpoints object that has the same name as the Service
- Kubernetes uses the Endpoints object to keep track of which Pods are members of the Service
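- A minimal sketch of a ClusterIP Service manifest; the name my-service, the label app: metrics, and the port numbers are hypothetical and only illustrate the selector, port, and targetPort fields, plus the naming rule for multiple ServicePorts:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service           # hypothetical name
spec:
  type: ClusterIP            # default type; can be omitted
  selector:
    app: metrics             # Pods must carry every label listed here to be members
  ports:
  - name: web                # names are required when more than one ServicePort is listed
    port: 80                 # port that internal clients call on the cluster IP
    targetPort: 8080         # port a container in each member Pod listens on
  - name: metrics
    port: 9090
    targetPort: 9090
```

- Internal clients would reach the member Pods through the cluster IP, or through the DNS name my-service.default.svc.cluster.local, on ports 80 and 9090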
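- A sketch of the same workload exposed externally; changing the type to LoadBalancer (which implies a cluster IP and a nodePort as well) is the main difference, and the names here are again hypothetical:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-lb-service        # hypothetical name
spec:
  type: LoadBalancer         # on GKE, provisions a network load balancer
  selector:
    app: metrics
  ports:
  - port: 80                 # port on the load balancer's external IP
    targetPort: 8080         # port the member Pods listen on
    # nodePort is omitted so Kubernetes allocates one in the 30000-32767 range
```

- Running kubectl get service my-lb-service -o yaml would show the allocated nodePort and, once provisioning finishes, the external IP address under status: loadBalancer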
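- A sketch of an ExternalName Service; my-xn-service and example.com follow the example above and only illustrate the DNS mapping, since this type has no selector and no cluster IP:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-xn-service
spec:
  type: ExternalName
  externalName: example.com  # internal lookups of my-xn-service resolve to this external name
```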
-
StatefulSet
- StatefulSets represent a set of Pods with unique, persistent identities and stable hostnames that are maintained regardless of where they are scheduled
- The state information and other resilient data for any given StatefulSet Pod is maintained in persistent disk storage associated with the StatefulSet
- StatefulSets use an ordinal index for the identity and ordering of their Pods
- By default, StatefulSet Pods are deployed in sequential order and are terminated in reverse ordinal order
- For example, a StatefulSet named web has its Pods named web-0, web-1, and web-2
- When the web Pod specification is changed, its Pods are gracefully stopped and recreated in an ordered way; for example, web-2 is terminated first, then web-1, and so on
- podManagementPolicy: Parallel field can be used to have a StatefulSet launch or terminate all of its Pods in parallel, rather than waiting for Pods to become Running and Ready or to be terminated prior to launching or terminating another Pod
- StatefulSets use a Pod template, which contains a specification for its Pods
- Pod specification determines how each Pod should look: what applications should run inside its containers, which volumes it should mount, its labels and selectors, and more
- StatefulSets are designed to deploy stateful applications and clustered applications that save data to persistent storage, such as Google Compute Engine persistent disks
- StatefulSets are suitable for deploying Kafka, MySQL, Redis, ZooKeeper, and other applications needing unique, persistent identities and stable hostnames
- StatefulSet can be created using kubectl apply
- StatefulSet ensures that the desired number of Pods are running and available at all times
- StatefulSet automatically replaces Pods that fail or are evicted from their nodes, and automatically associates new Pods with the storage resources, resource requests and limits, and other configurations defined in the StatefulSet's Pod specification
- To help prevent data loss, PersistentVolumes and PersistentVolumeClaims are not deleted when a StatefulSet is deleted
- They must be manually deleted using kubectl delete pv and kubectl delete pvc
- StatefulSets can be updated by making changes to their Pod specification, which includes their container images and volumes
- StatefulSet object resource requests and limits, labels, and annotations can be updated using kubectl, the Kubernetes API, or the GKE Workloads menu in Google Cloud Console
- To decide how to handle updates, StatefulSets use an update strategy defined in spec: updateStrategy
- The OnDelete strategy does not automatically delete and recreate Pods when the object's configuration is changed
- The old Pods must be deleted manually to cause the controller to create updated Pods
- The RollingUpdate strategy automatically deletes and recreates Pods when the object's configuration is changed
- New Pods must be in Running and Ready states before their predecessors are deleted
- Changing the Pod specification automatically triggers a rollout
- This is the default update strategy for StatefulSets
- StatefulSets update Pods in reverse ordinal order
- Update rollouts can be monitored by running the command kubectl rollout status statefulset [STATEFULSET_NAME]
- Rolling updates can be partitioned
- Partitioning is useful for staging an update, rolling out a canary, or performing a phased roll out
- When an update is partitioned, all Pods with an ordinal greater than or equal to the partition value are updated when the StatefulSet's Pod specification is updated (see the example manifest after this list)
- Pods with an ordinal less than the partition value are not updated; even if they are deleted, they are recreated using the previous version of the specification
- If the partition value is greater than the number of replicas, the updates are not propagated to the Pods
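- A sketch of the relevant fields of a StatefulSet manifest; the web name mirrors the example above, while the image, storage size, and partition value are hypothetical:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web                    # headless Service backing the stable hostnames
  replicas: 3                         # Pods web-0, web-1, web-2
  podManagementPolicy: OrderedReady   # default; Parallel launches/terminates Pods together
  updateStrategy:
    type: RollingUpdate               # default; OnDelete requires manual Pod deletion
    rollingUpdate:
      partition: 2                    # only web-2 is updated; web-0 and web-1 keep the old spec
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25             # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi                # hypothetical size; backed by a persistent disk on GKE
```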
-
DaemonSet
- DaemonSets manage groups of replicated Pods
- DaemonSets attempt to adhere to a one-Pod-per-node model, either across the entire cluster or a subset of nodes
- As nodes are added to a node pool, DaemonSets automatically add Pods to the new nodes as needed
- DaemonSets use a Pod template, which contains a specification for its Pods
- The Pod specification determines how each Pod should look: what applications should run inside its containers, which volumes it should mount, its labels and selectors, and more
- DaemonSets are useful for deploying ongoing background tasks needed to run on all or certain nodes, and which do not require user intervention
- Examples of such tasks include storage daemons like ceph, log collection daemons like fluentd, and node monitoring daemons like collectd
- A single DaemonSet can be configured for each type of daemon to run on all nodes, or multiple DaemonSets for a single type of daemon can use different configurations for different hardware types and resource needs (see the example manifest after this list)
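- A sketch of a DaemonSet manifest for a log collection agent; fluentd appears in the list above, but the exact image tag and mount paths here are hypothetical:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
      - name: fluentd
        image: fluentd:v1.16          # hypothetical tag
        volumeMounts:
        - name: varlog
          mountPath: /var/log         # read the node's logs
      volumes:
      - name: varlog
        hostPath:
          path: /var/log              # one Pod per node reads its host's log directory
```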
-
Pod
- Pods are the smallest, most basic deployable objects in Kubernetes
- A Pod represents a single instance of a running process in a cluster
- Pods contain one or more containers, such as Docker containers
- When a Pod runs multiple containers, the containers are managed as a single entity and share the Pod's resources
- Generally, running multiple containers in a single Pod is an advanced use case
- Pods contain shared networking and storage resources for their containers
- Pods are automatically assigned unique IP addresses
- Pod containers share the same network namespace, including IP address and network port
- Containers in a Pod communicate with each other inside the Pod on localhost
- Pods can specify a set of shared storage volumes that can be shared among the containers
- A Pod is a self-contained, isolated "logical host" that contains the systemic needs of the application it serves
- A Pod is meant to run a single instance of an application on a cluster
- It is not recommended to create individual Pods directly
- Generally, a set of identical Pods, called replicas, is created to run an application
- A replicated set of Pods are created and managed by a controller, such as a Deployment
- Controllers manage the lifecycle of their constituent Pods and can also perform horizontal scaling, changing the number of Pods as necessary
- Although users might occasionally interact with Pods directly to debug, troubleshoot, or inspect them, it is highly recommended to use a controller to manage Pods
- Pods run on nodes in a cluster
- Once created, a Pod remains on its node until its process is complete, the Pod is deleted, the Pod is evicted from the node due to lack of resources, or the node fails
- If a node fails, Pods on the node are automatically scheduled for deletion
- Pods are ephemeral
- Pods are not designed to run forever, and when a Pod is terminated it cannot be brought back
- Pods do not disappear until they are deleted by a user or by a controller
- Pods do not "heal" or repair themselves
- If a Pod is scheduled on a node which later fails, the Pod is deleted
- If a Pod is evicted from a node for any reason, the Pod does not replace itself
- Each Pod has a PodStatus API object, which is represented by a Pod's status field
- Pods publish their phase to the status: phase field
- The phase of a Pod is a high-level summary of the Pod in its current state
- Pending: Pod has been created and accepted by the cluster, but one or more of its containers are not yet running
- Pending phase includes time spent being scheduled on a node and downloading images
- Running: Pod has been bound to a node, and all of the containers have been created
- In Running state, at least one container is running, is in the process of starting, or is restarting
- Succeeded: All containers in the Pod have terminated successfully
- Terminated Pods do not restart
- Failed: All containers in the Pod have terminated, and at least one container has terminated in failure
- A container "fails" if it exits with a non-zero status
- Unknown: The state of the Pod cannot be determined
- PodStatus contains an array called PodConditions, which is represented in the Pod manifest as conditions
- Each condition has a type field and a status field
- The conditions indicate more specifically what within the Pod is causing its current status
- The type field can contain PodScheduled, Ready, Initialized, and Unschedulable
- The status field corresponds with the type field, and can contain True, False, or Unknown
- Run kubectl get pod [POD_NAME] -o yaml to view the Pod's entire manifest, including the phase and conditions fields
- Because Pods are ephemeral, it is not necessary to create Pods directly
- Because Pods cannot repair or replace themselves, it is not recommended to create Pods directly
- Use a controller, such as a Deployment, which creates and manages Pods
- Controllers are useful for rolling out updates, such as changing the version of an application running in a container, because the controller manages the whole update process for you
- When a Pod starts running, it requests an amount of CPU and memory
- Requesting memory and CPU helps Kubernetes schedule the Pod onto an appropriate node to run the workload
- A Pod will not be scheduled onto a node that doesn't have the resources to honor the Pod's request
- A request is the minimum amount of CPU or memory that Kubernetes guarantees to a Pod
- Pod requests differ from and work in conjunction with Pod limits
- Configure the CPU and memory requests for a Pod, based on the application's needs
- Requests can be specified for individual containers running in the Pod
- The default request for CPU is 100m
- The default request for CPU is too small for many applications
- There is no default request for memory
- A Pod with no default memory request could be scheduled onto a node without enough memory to run the Pod's workloads
- Setting too small a value for CPU or memory requests could cause too many Pods or a suboptimal combination of Pods to be scheduled onto a given node and reduce performance
- Setting too large a value for CPU or memory requests could cause the Pod to be unschedulable and increase the cost of the cluster's resources
- In addition to, or instead of, setting a Pod's resources, users can specify resources for individual containers running in the Pod
- Where resources are specified only for the containers, the Pod's requests are the sum of the requests specified for the containers
- Where resources are specified for both the Pod and its containers, the sum of requests for all containers must not exceed the Pod's requests
- It is strongly recommended the requests for Pods are configured based on the requirements of the actual workloads
- By default, a Pod has no upper bound on the maximum amount of CPU or memory it can use on a node
- Limits can be set to control the amount of CPU or memory a Pod can use on a node
- A limit is the maximum amount of CPU or memory that Kubernetes allows a Pod to use
- In addition to, or instead of, setting a Pod's limits, limits for individual containers running in the Pod can be specified
- Where limits are specified only for the containers, the Pod's limits are the sum of the limits specified for the containers
- Each container can only access resources up to its limit
- Where limits are specified on containers only, a limit must be specified for each container
- The sum of limits for all containers must not exceed the Pod limit
- A limit must always be greater than or equal to a request for the same type of resource
- If an attempt is made to set the limit below the request, the Pod's container cannot run and an error is logged
- Limits are not taken into consideration when scheduling Pods, but can prevent resource contention among Pods on the same node, and can prevent a Pod from causing system instability on the node by starving the underlying operating system of resources
- Pod limits differ from and work in conjunction with Pod requests
- It is strongly recommended that users configure limits for Pods, based on the requirements of the actual workloads (see the example Pod manifest after this list)
- Controller objects, such as Deployments and StatefulSets, contain a Pod template field
- Pod templates contain a Pod specification which determines how each Pod should run, including which containers should be run within the Pods and which volumes the Pods should mount
- Controller objects use Pod templates to create Pods and to manage their "desired state" within a cluster
- When a Pod template is changed, all future Pods reflect the new template, but all existing Pods do not
- By default, Pods run on nodes in the default node pool for the cluster
- Which node pool a Pod selects, either explicitly or implicitly, can be configured
- A Pod can be explicitly forced to be deployed to a specific node pool by setting a nodeSelector in the Pod manifest
- Setting a nodeSelector forces a Pod to run only on Nodes in that node pool
- Resource requests can be specified for the containers
- Pod will only run on nodes that satisfy the resource requests
- For example, if a Pod's containers together require four CPUs, the Pod will not be scheduled onto a node with only two CPUs
- The simplest and most common Pod pattern is a single container per pod, where the single container represents an entire application
- Pods with multiple containers are primarily used to support colocated, co-managed applications that share resources
- Colocated containers might form a single cohesive unit of service—one container serving files from a shared volume while another container refreshes or updates those files
- Pods wrap containers and storage resources together as a single manageable entity
- Each Pod is meant to run a single instance of a given application
- Replicated Pods are created and managed as a group by a controller, such as a Deployment
- Pods terminate gracefully when their processes are complete
- Kubernetes imposes a default graceful termination period of 30 seconds
- When deleting a Pod, the grace period can be overridden by specifying the number of seconds to wait for the Pod to terminate before it is forcibly terminated (see the example manifest after this list)
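- A sketch of a Pod manifest pulling these pieces together; the node pool name, image, and resource amounts are hypothetical and only illustrate nodeSelector, per-container requests and limits, and the termination grace period:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: high-mem-pool    # hypothetical node pool name
  terminationGracePeriodSeconds: 60                 # overrides the 30-second default
  containers:
  - name: app
    image: us-docker.pkg.dev/my-project/my-repo/app:1.0   # hypothetical image
    resources:
      requests:
        cpu: 250m             # minimum guaranteed; used when scheduling the Pod
        memory: 256Mi
      limits:
        cpu: 500m             # must be greater than or equal to the request
        memory: 512Mi
```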
-
Deployment
- Deployments represent a set of multiple, identical Pods with no unique identities
- A Deployment runs multiple replicas of an application and automatically replaces any instances that fail or become unresponsive
- Deployments help ensure that one or more instances of an application are available to serve user requests
- Deployments are managed by the Kubernetes Deployment controller
- Deployments use a Pod template, which contains a specification for its Pods
- The Pod specification determines what applications should run inside its containers, which volumes the Pods should mount, its labels, and more
- When a Deployment's Pod template is changed, new Pods are automatically created one at a time
- Deployments are well-suited for stateless applications that use ReadOnlyMany or ReadWriteMany volumes mounted on multiple replicas, but are not well-suited for workloads that use ReadWriteOnce volumes
- For stateful applications using ReadWriteOnce volumes, use StatefulSets
- StatefulSets are designed to deploy stateful applications and clustered applications that save data to persistent storage, such as Compute Engine persistent disks
- StatefulSets are suitable for deploying Kafka, MySQL, Redis, ZooKeeper, and other applications needing unique, persistent identities and stable hostnames
- You can create a Deployment using the kubectl run, kubectl apply, or kubectl create commands
- Once created, the Deployment ensures that the desired number of Pods are running and available at all times
- The Deployment automatically replaces Pods that fail or are evicted from their nodes
- A Deployment can be updated by making changes to the Deployment's Pod template specification
- Making changes to the specification field automatically triggers an update rollout
- You can use kubectl, the Kubernetes API, or the GKE Workloads menu in Google Cloud Console
- By default, when a Deployment triggers an update, the Deployment performs a rolling update: it gradually drains and terminates the old Pods
- At the same time, the Deployment uses the updated Pod template to bring up new Pods
- Old Pods are not removed until a sufficient number of new Pods are Running, and new Pods are not created until a sufficient number of old Pods have been removed
- To see in which order Pods are brought up and are removed, you can run kubectl describe deployments
- During an update, Deployments ensure that at least one less than the desired number of replicas is running, so at most one Pod is unavailable at a time
- Deployments also ensure that at most one more than the desired number of replicas is running, so at most one extra Pod is created at a time
- An update can be rolled back using the kubectl rollout undo command
- The kubectl rollout pause command can be used to temporarily halt a Deployment
- A Deployment can be in one of three states during its lifecycle: progressing, completed, or failed
- A progressing state indicates that the Deployment is in the process of performing its tasks, like bringing up or scaling its Pods
- A completed state indicates that the Deployment has successfully completed its tasks, all of its Pods are running with the latest specification and are available, and no old Pods are still running
- A failed state indicates that the Deployment has encountered one or more issues that prevent it from completing its tasks
- Some causes include insufficient quotas or permissions, image pull errors, limit ranges, or runtime errors
- To investigate what causes a Deployment to fail, you can run kubectl get deployment [DEPLOYMENT_NAME] -o yaml and examine the messages in the status: conditions field
- A Deployment's progress can be monitored using the kubectl rollout status command (see the example manifest after this list)
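- A sketch of a Deployment manifest with the rolling update behavior described above made explicit; the name, label, image, and surge/unavailability values are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate       # default strategy
    rollingUpdate:
      maxUnavailable: 1       # at most one Pod below the desired count during an update
      maxSurge: 1             # at most one Pod above the desired count during an update
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: us-docker.pkg.dev/my-project/my-repo/app:1.0   # hypothetical image
```

- Changing the image tag in the template and re-running kubectl apply would trigger a rollout that can be watched with kubectl rollout status deployment my-app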
-
Batch
- Batch on GKE (Batch) is a cloud-native solution for scheduling and managing batch workloads
- Batch can leverage the on-demand and flexible nature of cloud
- Batch is based on Kubernetes and containers so jobs are portable
- A BatchJob describes the work to be done and the compute resources, data, etc. needed to do it
- BatchQueues are the central organising construct in Batch
- A Queue specifies a BatchJobConstraint, which defines the jobs and policies acceptable in that Queue
- A single BatchJobConstraint resource may be assigned to multiple Queues
- A Queue can specify a BatchBudget that defines how many resources a Queue may use or how much it can spend at a time or over a number of days
- Several Queues may consume the same budget
- A Queue may also specify a BatchPriority
- Admins may define multiple different priority levels in the system
- A Queue can have one priority level, but the same BatchPriority resource can be attached to multiple Queues