A backend service resource contains configuration for External HTTP(S), Internal HTTP(S), SSL Proxy, TCP Proxy and Internal TCP/UDP Load Balancing services
Network Load Balancing does not use a backend service
Load balancers use configuration information in the backend service resource to direct traffic to the correct backends
Traffic is distributed according to a balancing mode defined for each backend
Backend instance health is monitored using a health check designated in the backend service
A backend service can also be used to maintain session affinity
Architecture
Each backend service contains one or more backends
All backends for a given service must either be instance groups or network endpoint groups
Managed and unmanaged instance groups can be associated with the same backend service
Instance groups and network endpoint groups cannot be associated with the same backend service
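The backend-type rule above can be sketched as a small validation check. This is a minimal illustration, not an API: the Backend class, the kind labels and the function name are hypothetical, and the empty-list case follows the statement that a backend service contains one or more backends.

```python
from dataclasses import dataclass

# Hypothetical kind labels used only for this sketch; they are not API values.
INSTANCE_GROUP_KINDS = {"managed_instance_group", "unmanaged_instance_group"}
NEG_KIND = "network_endpoint_group"


@dataclass(frozen=True)
class Backend:
    name: str
    kind: str


def backends_are_compatible(backends):
    """Return True if every backend is an instance group, or every backend is a NEG."""
    if not backends:
        return False  # the notes say a backend service contains one or more backends
    all_groups = all(b.kind in INSTANCE_GROUP_KINDS for b in backends)
    all_negs = all(b.kind == NEG_KIND for b in backends)
    return all_groups or all_negs
```

Mixing a managed and an unmanaged instance group passes this check, while mixing any instance group with a NEG fails it.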
Settings
Load balancers use a hash algorithm to distribute requests among available instances
The hash is based on the source IP address, destination IP address, source port, destination port, and protocol (a 5-tuple hash)
Session affinity adjusts the hash to send all requests from the same client to the same instance
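A 5-tuple hash can be sketched as follows. The actual hash function used by the load balancers is not described in these notes, so SHA-256 and the modulo step are stand-ins, and the function name is hypothetical.

```python
import hashlib


def pick_by_five_tuple(instances, src_ip, dst_ip, src_port, dst_port, protocol):
    """Map a connection's 5-tuple to one instance with a stable hash."""
    # Assumption: SHA-256 stands in for the undocumented hash algorithm.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]
```

The same 5-tuple always maps to the same instance, but a new connection from the same client usually has a different source port and may therefore land elsewhere. That is the gap session affinity closes.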
The backend service timeout is a request/response timeout for HTTP(S) load balancers, except for connections using the WebSocket protocol
For WebSocket traffic, the backend service timeout is the maximum amount of time that a WebSocket, idle or active, can remain open
For SSL Proxy or TCP Proxy load balancers, the backend service timeout is an idle timeout for all traffic
For internal TCP/UDP load balancers, the backend service timeout parameter is ignored
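The four timeout rules above can be summarized as a lookup. This is only a restatement of the notes in code form; the load balancer type labels and the function name are hypothetical, not API values.

```python
def timeout_meaning(lb_type, websocket=False):
    """Describe how the backend service timeout is interpreted."""
    # Hypothetical type labels: "http(s)", "ssl_proxy", "tcp_proxy", "internal_tcp_udp".
    if lb_type == "http(s)":
        if websocket:
            return "maximum lifetime of the WebSocket, idle or active"
        return "request/response timeout"
    if lb_type in ("ssl_proxy", "tcp_proxy"):
        return "idle timeout for all traffic"
    if lb_type == "internal_tcp_udp":
        return "ignored"
    raise ValueError(f"unknown load balancer type: {lb_type!r}")
```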
A health checker polls instances attached to the backend service at configured intervals
Instances that pass the health check are allowed to receive new requests
Unhealthy instances are not sent requests until they are healthy again
Traffic distribution
Overview
Balancing mode dictates how the load balancing system determines when the backend is at full usage
If all backends for the backend service in a region are at full usage, new requests are automatically routed to the nearest region that can still handle requests
The balancing mode can be based on connections, backend utilization, or requests per second (rate)
Capacity is an additional control that interacts with the balancing mode setting
For instances to operate at a maximum of 80% backend utilization, set the balancing mode to backend utilization and capacity to 80%
To cut instance utilization in half, leave the capacity at 80% backend utilization and set capacity scaler to 0.5
To drain the backend service, set capacity scaler to 0 and leave the capacity as is
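The interaction between capacity and the capacity scaler in the three cases above is plain multiplication. The sketch below works through the arithmetic; the function name is hypothetical, and the 0 to 1 range check is an assumption about valid scaler values.

```python
def effective_target(max_utilization, capacity_scaler=1.0):
    """Return the utilization target after applying the capacity scaler."""
    # Assumption: the scaler is a fraction between 0 and 1 inclusive.
    if not 0.0 <= capacity_scaler <= 1.0:
        raise ValueError("capacity_scaler must be between 0 and 1")
    return max_utilization * capacity_scaler


full = effective_target(0.8)          # capacity 80%, scaler left at 1
halved = effective_target(0.8, 0.5)   # same capacity, scaler 0.5
drained = effective_target(0.8, 0.0)  # same capacity, scaler 0
```

With a capacity of 80%, a scaler of 0.5 gives an effective target of 40% utilization, and a scaler of 0 gives a target of 0, which drains the backend.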
If the average utilization of all instances in backend instance groups connected to the same backend service is less than 10%, GCP might prefer specific zones
Zonal imbalance automatically resolves itself as more traffic is sent to the load balancer
Traffic Director uses backend services whose load balancing scheme is INTERNAL_SELF_MANAGED
For an internal self-managed backend service, traffic distribution is accomplished by using a combination of a load balancing mode and a load balancing policy
The backend service directs traffic to a backend instance group or NEG according to the backend's balancing mode
Once a backend has been selected, Traffic Director distributes traffic according to a load balancing policy
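The two-step selection described for Traffic Director can be sketched as below. This is a simplified model under stated assumptions: a rate-based balancing mode with a fixed max_rate, a round-robin load balancing policy, and a request counter that never decays. The class and function names are hypothetical.

```python
from itertools import cycle


class Backend:
    def __init__(self, name, endpoints, max_rate):
        self.name = name
        self.max_rate = max_rate               # assumed rate-based balancing mode
        self.current_rate = 0                  # simplification: never decreases
        self._round_robin = cycle(endpoints)   # assumed round-robin policy

    def has_capacity(self):
        return self.current_rate < self.max_rate

    def next_endpoint(self):
        self.current_rate += 1
        return next(self._round_robin)


def route(backends):
    """Step 1: pick a backend with capacity. Step 2: pick an endpoint by policy."""
    for backend in backends:
        if backend.has_capacity():
            return backend.name, backend.next_endpoint()
    raise RuntimeError("all backends are at full usage")
```

With one backend capped at two requests and a second capped at one, the first two calls to route go to the first backend's endpoints in turn and the third call spills over to the second backend.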
External IP addresses
For HTTP(S), SSL Proxy and TCP Proxy load balancers, clients communicate with a Google Front End using the load balancer's external IP address
The GFE communicates with backend VMs using the internal IP addresses of their primary network interface
GFE is a proxy, so the backend VMs do not require external IP addresses
Network load balancers route packets using bidirectional network address translation
When backend VMs send replies to clients, they use the external IP address of the load balancer's forwarding rule as the source IP address
Backend VMs for an internal load balancer do not need external IP addresses
Backends
Overview
Multiple backends can be added to a single backend service
Each backend is a resource to which a Google Cloud load balancer distributes traffic
An instance group, a network endpoint group and a backend bucket can be backends
An instance group can be a managed instance group with or without autoscaling or an unmanaged instance group
A backend must be added and an instance group assigned to it
The instance group must be created before adding it to the backend
Different types of backends cannot be used with the same backend service
Internal TCP/UDP load balancers support only instance group backends
If an HTTP(S) load balancer has two or more backend services, instance groups can be used as backends for one backend service and NEGs as backends for the other backend service
Protocol to the backends
A backend service can only use one protocol
The available protocols are HTTP, HTTPS, HTTP/2, SSL, TCP and UDP
Which protocol is valid depends on the type of load balancer, including its load balancing scheme
HTTP/2 is available for load balancing with Ingress
Autoscaled managed instance groups
Autoscaled managed instance groups automatically add or remove instances based on need
The autoscaling percentage works with the backend service balancing mode
New instances have a cool down period before they are considered part of the group
It is possible for traffic to exceed the backend service's backend utilization during that time
Once the instances are available, new traffic will be routed to them
If the number of instances reaches the maximum permitted by the autoscaler's settings, the autoscaler will stop adding instances no matter what the usage is
In this case, extra traffic will be load balanced to the next available region
Restrictions and guidance for instance groups
Do not put a virtual machine instance in more than one instance group
Do not delete an instance group if it is being used by a backend
Do not add the same instance group to two different backends
All instances in a managed or unmanaged instance group must be in the same VPC network and, if applicable, the same subnet
If using a managed instance group with autoscaling, do not use the maxRate balancing mode in the backend service
Use either the maxUtilization or maxRatePerInstance mode
Do not make an autoscaled managed instance group the target of two different load balancers
When resizing a managed instance group, the maximum size of the group should be smaller than or equal to the size of the subnet
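The subnet-size guidance can be checked with Python's standard ipaddress module. This is a rough sketch: the function names are hypothetical, and the figure of four reserved addresses per primary IPv4 range is an assumption that should be verified against the VPC documentation. Addresses already used by other resources in the subnet are not accounted for.

```python
import ipaddress

# Assumption: four addresses in a primary IPv4 subnet range are reserved.
RESERVED_PER_SUBNET = 4


def usable_addresses(cidr):
    """Return the number of addresses available to instances in the range."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - RESERVED_PER_SUBNET, 0)


def max_size_fits(cidr, max_group_size):
    """Return True if the group's maximum size fits in the subnet."""
    return max_group_size <= usable_addresses(cidr)
```

Under this assumption a /28 range (16 addresses) leaves room for at most 12 instances, so a managed instance group with a maximum size of 13 would not fit.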
Network endpoint groups
Network endpoints represent services by their IP address and port, rather than referring to a particular VM
If no port is specified for an endpoint, the default port for the NEG is automatically used as the port of the IP address:port pair
A network endpoint group (NEG) is a logical grouping of network endpoints
A backend service that uses network endpoint groups as its backends distributes traffic among applications or containers running within VM instances
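A NEG can be pictured as a list of IP address:port pairs with a default port. The sketch below is a toy data model, not the NEG API; the class and method names are hypothetical, and it assumes that an endpoint may name its own port and otherwise falls back to the NEG's default.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkEndpointGroup:
    name: str
    default_port: int
    endpoints: list = field(default_factory=list)

    def add_endpoint(self, ip, port=None):
        """Store an (ip, port) pair, using the default port when none is given."""
        resolved_port = self.default_port if port is None else port
        self.endpoints.append((ip, resolved_port))


neg = NetworkEndpointGroup("neg-1", default_port=8080)
neg.add_endpoint("10.0.0.5")        # uses the NEG default port
neg.add_endpoint("10.0.0.5", 9090)  # second service on the same VM
```

Two endpoints can share one VM address and differ only in port, which is how a backend service reaches separate applications or containers on the same instance.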
Session affinity
Without session affinity, load balancers distribute new requests according to the balancing mode of the backend instance group or NEG
Applications such as stateful servers or services with heavy internal caching need multiple requests from a given user to be directed to the same instance
Session affinity identifies TCP traffic from the same client based on parameters such as the client's IP address or the value of a cookie
Session affinity directs requests to the same backend instance if the backend is healthy and has capacity (according to its balancing mode)
Session affinity has little meaningful effect on UDP traffic, because a session for UDP is a single request and response
Session affinity can break if the instance becomes unhealthy or overloaded
For HTTP(S) Load Balancing, session affinity works best with the RATE balancing mode
Using client IP affinity
Client IP affinity directs requests from the same client IP address to the same backend instance based on a hash of the client's IP address
Client IP affinity is an option for every Google Cloud load balancer that uses backend services
The client IP address as seen by the load balancer might not be the originating client if it is behind NAT or makes requests through a proxy
Requests made through NAT or a proxy use the IP address of the NAT router or proxy as the client IP address
This can cause traffic from many different clients to be concentrated on the same backend instances
If a client moves from one network to another, its IP address changes, resulting in broken affinity
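The NAT effect on client IP affinity can be illustrated with a small sketch. SHA-256 is a stand-in for the real, undocumented hash, the function name is hypothetical, and the addresses come from documentation-reserved ranges.

```python
import hashlib


def backend_for_client_ip(observed_ip, backends):
    """Pick a backend from a hash of the client IP as seen by the load balancer."""
    # Assumption: SHA-256 stands in for the load balancer's actual hash.
    digest = hashlib.sha256(observed_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


backends = ["vm-a", "vm-b", "vm-c"]
NAT_ADDRESS = "203.0.113.7"  # what the load balancer sees for every client behind the NAT
private_clients = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]

# Each private client is observed as NAT_ADDRESS, so all hash to one backend.
chosen = {backend_for_client_ip(NAT_ADDRESS, backends) for _ in private_clients}
```

The set chosen contains a single backend: the load balancer cannot tell the three clients apart, so they share an affinity target. If one of them moved to another network, its observed address would change and the hash could select a different backend.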
Generated cookie affinity
When generated cookie affinity is set, the load balancer issues a cookie on the first request
For each subsequent request with the same cookie, the load balancer directs the request to the same backend VM or endpoint
For external HTTP(S) load balancers, the cookie is named GCLB
For internal HTTP(S) load balancers and Traffic Director, the cookie is named GCILB
Cookie-based affinity can more accurately identify a client to a load balancer, compared to client IP-based affinity
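Generated cookie affinity can be sketched as follows. This is a toy model only: the class name is hypothetical, and both the random cookie value and the lookup table that maps it to a backend are assumptions, since the notes do not say how the real cookie encodes its target.

```python
import secrets


class CookieAffinity:
    COOKIE_NAME = "GCLB"  # name the notes give for external HTTP(S) load balancers

    def __init__(self, backends):
        self.backends = list(backends)
        self.assignments = {}  # assumed lookup table: cookie value -> backend
        self._next = 0

    def route(self, cookies):
        """Return (backend, cookies), issuing a cookie on the first request."""
        value = cookies.get(self.COOKIE_NAME)
        if value in self.assignments:
            return self.assignments[value], cookies
        # First request (or unknown cookie): choose a backend and issue a cookie.
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1
        value = secrets.token_hex(8)
        self.assignments[value] = backend
        return backend, {**cookies, self.COOKIE_NAME: value}
```

A client that sends back the returned cookies is routed to the same backend on every later request, regardless of its IP address, while a client without the cookie is treated as new.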
Losing session affinity
A client can lose affinity with the instance if the instance group runs out of capacity, and traffic has to be routed to a different zone
This can be mitigated by ensuring instance groups have enough capacity to handle all local users
Autoscaling adds instances to, or removes instances from, the instance group
The backend service reallocates load, and the target may move
This can be mitigated by ensuring that the minimum number of instances provisioned by autoscaling is enough to handle expected load
A client can also lose affinity with the instance if the target instance fails health checks
Affinity can also be lost where the balancing mode is set to backend utilization
This may cause computed capacities across zones to change, sending some traffic to another zone within the region
This is more likely at low traffic when computed capacity is less stable
If client routing is designed so that the first and subsequent requests in a connection egress from different geographical locations, session affinity may be lost
Session affinity may be difficult to establish during periods of minimal traffic because of fluctuating computed capacities across zones
If session affinity is required during periods of low traffic, configure rate-based balancing for backend services
Also configure each backend service's set of instance groups so that each zone comprising the group has the same number of backend instances
Such a configuration is more likely to result in stable capacity estimates and allow session affinity to be established
Timeout setting
For longer-lived connections to the backend service from the load balancer, configure a timeout setting longer than the 30-second default
Named ports
For internal HTTP(S), external HTTP(S), SSL Proxy, and TCP Proxy load balancers, backend services must have an associated named port if their backends are instance groups
The backend service's named port tells the load balancer which named port to look up on the backend instance group, and the instance group translates that name to a port number
The load balancer uses the port to connect to the backend VMs
The port can be different from the port that clients use to contact the load balancer itself
Named ports are key-value pairs representing a service name and a port number on which a service is running
The key-value pair is defined on an instance group
When a backend service uses an instance group as a backend, it can "subscribe" to the named port
Each backend service for an HTTP(S), SSL Proxy, or TCP Proxy load balancer using instance group backends can only "subscribe" to a single named port
When you specify a named port for a backend service, all of the backend instance groups must have at least one named port defined that uses that same name
Named ports cannot be used for NEG backends
NEGs define ports per endpoint, and there's no named port key-value pair associated with a NEG
Named ports cannot be used for internal TCP/UDP load balancers
Because internal TCP/UDP load balancers are pass-through load balancers (not proxies), their backend services do not support setting a named port
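Named port resolution for instance group backends can be sketched as a dictionary lookup. The function name and the plain-dict representation of instance groups are hypothetical and are used only to illustrate the rule that every backend instance group must define the subscribed name.

```python
def resolve_named_port(port_name, instance_groups):
    """Return {group name: port number} for the port name the backend service subscribes to."""
    resolved = {}
    for group_name, named_ports in instance_groups.items():
        if port_name not in named_ports:
            raise ValueError(f"{group_name} has no named port {port_name!r}")
        resolved[group_name] = named_ports[port_name]
    return resolved


groups = {
    "ig-us": {"http": 8080},
    "ig-eu": {"http": 8081, "admin": 9000},
}
ports = resolve_named_port("http", groups)
```

Here both groups define http, so the lookup succeeds even though each group maps the name to a different port number. Subscribing to admin would fail because ig-us does not define it.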
Health checks
Each backend service must have a health check associated with it
The health check must exist before the backend service is created
A health check runs continuously and its results help determine which instances are able to receive new requests
Unhealthy instances do not receive new requests and continue to be polled
If an unhealthy instance passes a health check, it is deemed healthy and begins receiving new connections
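The health check behaviour described above can be modelled in a few lines. This sketch follows the notes literally, so a single probe result flips an instance's state. Real health checks are typically configured with thresholds of consecutive results, which are not modelled here, and the function names are hypothetical.

```python
def record_probe(health, instance, passed):
    """Store the latest probe result; every instance keeps being polled."""
    health[instance] = passed


def eligible_instances(health):
    """Return the instances that may receive new requests."""
    return [name for name, healthy in health.items() if healthy]


health = {"vm-a": True, "vm-b": False}
before = eligible_instances(health)   # only vm-a receives new requests
record_probe(health, "vm-b", True)    # vm-b passes a later health check
after = eligible_instances(health)    # vm-b is eligible again
```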