-
Characteristics
-
Overview
- A backend service resource contains the configuration for External HTTP(S), Internal HTTP(S), SSL Proxy, TCP Proxy, and Internal TCP/UDP Load Balancing services
- Network Load Balancing does not use a backend service
- Load balancers use configuration information in the backend service resource to direct traffic to the correct backends
- Traffic is distributed according to a balancing mode defined for each backend
- Backend instance health is monitored using a health check designated in the backend service
- Backend service can be used to maintain session affinity
-
Architecture
- Each backend service contains one or more backends
- All backends for a given service must either be instance groups or network endpoint groups
- Managed and unmanaged instance groups can be associated with the same backend service
- Instance groups and network endpoint groups cannot be associated with the same backend service
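The relationships above can be sketched with gcloud. The names web-backend-service, http-basic-check, and web-mig are hypothetical, and the flags assume an external HTTP(S) load balancer:

```
# The health check must exist before the backend service references it
gcloud compute health-checks create http http-basic-check --port=80

# Create the backend service that will hold the configuration
gcloud compute backend-services create web-backend-service \
    --protocol=HTTP \
    --health-checks=http-basic-check \
    --global

# Attach a managed or unmanaged instance group as one of its backends
gcloud compute backend-services add-backend web-backend-service \
    --instance-group=web-mig \
    --instance-group-zone=us-central1-a \
    --global
```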
-
Settings
- Load balancers use a hash algorithm to distribute requests among available instances
- The hash is based on the source IP address, destination IP address, source port, destination port, and protocol (a 5-tuple hash)
- Session affinity adjusts hash to send all requests from the same client to the same instance
- The backend service timeout is a request/response timeout for HTTP(S) load balancers, except for connections that use the WebSocket protocol
- For WebSocket traffic, the backend service timeout is the maximum amount of time that a WebSocket, idle or active, can remain open
- For SSL Proxy and TCP Proxy load balancers, the backend service timeout is an idle timeout for all traffic
- For internal TCP/UDP load balancers, the backend service timeout parameter is ignored
- Health checker polls instances attached to the backend service at configured intervals
- Instances that pass the health check are allowed to receive new requests
- Unhealthy instances are not sent requests until they are healthy again
-
Traffic distribution
-
Overview
- Balancing mode dictates how the load balancing system determines when the backend is at full usage
- If all backends for the backend service in a region are at full usage, new requests are automatically routed to the nearest region that can still handle requests
- The balancing mode can be based on connections, backend utilization, or requests per second (rate)
- Capacity is an additional control that interacts with the balancing mode setting
- For instances to operate at a maximum of 80% backend utilization, set the balancing mode to backend utilization and capacity to 80%
- To cut instance utilization in half, leave the capacity at 80% backend utilization and set capacity scaler to 0.5
- To drain the backend service, set the capacity scaler to 0 and leave the capacity as is (see the gcloud sketch at the end of this list)
- If the average utilization of all instances in backend instance groups connected to the same backend service is less than 10%, GCP might prefer specific zones
- Zonal imbalance automatically resolves itself as more traffic is sent to the load balancer
- Traffic Director uses backend services whose load balancing scheme is INTERNAL_SELF_MANAGED
- For an internal self managed backend service, traffic distribution is accomplished by using a combination of a load balancing mode and a load balancing policy
- The backend service directs traffic to a backend instance group or NEG according to the backend's balancing mode
- Once a backend has been selected, Traffic Director distributes traffic according to a load balancing policy
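A minimal gcloud sketch of the balancing mode, capacity, and capacity scaler settings described above, reusing the hypothetical web-backend-service and web-mig names; the values are illustrative:

```
# Cap each instance at 80% backend utilization
gcloud compute backend-services update-backend web-backend-service \
    --instance-group=web-mig \
    --instance-group-zone=us-central1-a \
    --balancing-mode=UTILIZATION \
    --max-utilization=0.8 \
    --global

# Cut effective utilization in half (0.8 x 0.5 = 40%)
gcloud compute backend-services update-backend web-backend-service \
    --instance-group=web-mig \
    --instance-group-zone=us-central1-a \
    --capacity-scaler=0.5 \
    --global

# Drain the backend entirely while leaving the capacity settings in place
gcloud compute backend-services update-backend web-backend-service \
    --instance-group=web-mig \
    --instance-group-zone=us-central1-a \
    --capacity-scaler=0.0 \
    --global
```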
-
External IP addresses
- For HTTP(S), SSL Proxy and TCP Proxy load balancers, clients communicate with a Google Front End using the load balancer's external IP address
- The GFE communicates with backend VMs using the internal IP addresses of their primary network interface
- GFE is a proxy, so the backend VMs do not require external IP addresses
- Network load balancers route packets using bidirectional network address translation
- When backend VMs send replies to clients, they use the external IP address of the load balancer's forwarding rule as the source IP address
- Backend VMs for an internal load balancer do not need external IP addresses.
-
Backends
-
Overview
- Multiple backends can be added to a single backend service
- Each backend is a resource to which a Google Cloud load balancer distributes traffic
- An instance group, a network endpoint group (NEG), or a backend bucket can serve as a backend, although backend buckets are referenced from the URL map rather than added to a backend service
- An instance group can be a managed instance group with or without autoscaling or an unmanaged instance group
- To use an instance group, add a backend to the backend service and assign the instance group to that backend
- The instance group must exist before it can be added to the backend
- Different types of backends cannot be used with the same backend service
- Backends for internal TCP/UDP load balancers only support instance group backends
- If an HTTP(S) load balancer has two or more backend services, instance groups can be used as backends for one backend service and NEGs as backends for another
-
Protocol to the backends
- A backend service can only use one protocol.
- The available protocols are HTTP, HTTPS, HTTP/2, SSL, TCP and UDP
- Which protocol is valid depends on the type of load balancer, including its load balancing scheme
- HTTP/2 is available for load balancing with Ingress
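As a sketch, the protocol is fixed per backend service at creation time; the HTTP/2 service and health check below are hypothetical, and HTTP2 is only valid where the load balancer type supports it:

```
# A matching health check type for HTTP/2 backends
gcloud compute health-checks create http2 http2-check --port=443

# The backend service speaks exactly one protocol to its backends
gcloud compute backend-services create http2-backend-service \
    --protocol=HTTP2 \
    --health-checks=http2-check \
    --global
```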
-
Autoscaled managed instance groups
- Autoscaled managed instance groups automatically add or remove instances based on need
- The autoscaler's target utilization works together with the serving capacity defined by the backend service's balancing mode
- New instances have a cool down period before they are considered part of the group
- It is possible for traffic to exceed the backend service's backend utilization during that time
- Once the instances are available, new traffic will be routed to them
- If the number of instances reaches the maximum permitted by the autoscaler's settings, the autoscaler will stop adding instances no matter what the usage is
- In this case, extra traffic will be load balanced to the next available region
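A hedged sketch of autoscaling the hypothetical web-mig group on load balancing serving capacity; the target utilization, replica limit, and cool down period are illustrative:

```
# Scale on load balancing serving capacity; 0.8 pairs with a
# UTILIZATION balancing mode of 80% on the backend service
gcloud compute instance-groups managed set-autoscaling web-mig \
    --zone=us-central1-a \
    --max-num-replicas=10 \
    --target-load-balancing-utilization=0.8 \
    --cool-down-period=90
```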
-
Restrictions and guidance for instance groups
- Do not put a virtual machine instance in more than one instance group
- Do not delete an instance group if it is being used by a backend
- Do not add the same instance group to two different backends
- All instances in a managed or unmanaged instance group must be in the same VPC network and, if applicable, the same subnet
- If using a managed instance group with autoscaling, do not use the maxRate balancing mode in the backend service
- Use either the maxUtilization or maxRatePerInstance mode
- Do not make an autoscaled managed instance group the target of two different load balancers
- When resizing a managed instance group, the maximum size of the group should be smaller than or equal to the size of the subnet
-
Network endpoint groups
- A network endpoint group (NEG) is a logical grouping of network endpoints
- Network endpoints represent services by their IP address and port, rather than referring to a particular VM
- If an endpoint is added without a port, the NEG's default port is used for the IP address:port pair
- A backend service that uses network endpoint groups as its backends distributes traffic among applications or containers running within VM instances
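A sketch of a zonal NEG backend; the NEG is attached to a separate hypothetical backend service (neg-backend-service) because instance groups and NEGs cannot share one, and the endpoint type, default port, and rate are illustrative:

```
# Create a zonal NEG whose endpoints are VM-internal IP:port pairs
gcloud compute network-endpoint-groups create web-neg \
    --zone=us-central1-a \
    --network-endpoint-type=gce-vm-ip-port \
    --default-port=8080

# Attach the NEG to a backend service using a per-endpoint rate
gcloud compute backend-services add-backend neg-backend-service \
    --network-endpoint-group=web-neg \
    --network-endpoint-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-endpoint=100 \
    --global
```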
-
Session affinity
-
Session affinity
- Without session affinity, load balancers distribute new requests according to the balancing mode of the backend instance group or NEG
- Applications such as stateful servers or services with heavy internal caching need multiple requests from a given user to be directed to the same instance
- Session affinity identifies TCP traffic from the same client based on parameters such as the client's IP address or the value of a cookie
- Session affinity directs requests to the same backend instance if the backend is healthy and has capacity (according to its balancing mode)
- Session affinity has little meaningful effect on UDP traffic, because a session for UDP is a single request and response
- Session affinity can break if the instance becomes unhealthy or overloaded
- For HTTP(S) Load Balancing, session affinity works best with the RATE balancing mode
-
Using client IP affinity
- Client IP affinity directs requests from the same client IP address to the same backend instance based on a hash of the client's IP address
- Client IP affinity is an option for every Google Cloud load balancer that uses backend services
- The client IP address seen by the load balancer might not be the originating client's address if the client is behind NAT or connects through a proxy
- Requests made through NAT or a proxy arrive with the IP address of the NAT router or proxy as the client IP address
- This can cause traffic from many clients to be concentrated onto the same backend instances
- If a client moves from one network to another, its IP address changes, resulting in broken affinity
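A minimal sketch of enabling client IP affinity on the hypothetical backend service:

```
# Hash on the client IP so repeat requests reach the same backend
gcloud compute backend-services update web-backend-service \
    --session-affinity=CLIENT_IP \
    --global
```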
-
Generated cookie affinity
- When generated cookie affinity is set, the load balancer issues a cookie on the first request
- For each subsequent request with the same cookie, the load balancer directs the request to the same backend VM or endpoint
- For external HTTP(S) load balancers, the cookie is named GCLB
- For internal HTTP(S) load balancers and Traffic Director, the cookie is named GCILB
- Cookie-based affinity can more accurately identify a client to a load balancer, compared to client IP-based affinity
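A sketch of switching the hypothetical backend service to generated cookie affinity; the cookie TTL is illustrative:

```
# The load balancer issues the affinity cookie on the first request
gcloud compute backend-services update web-backend-service \
    --session-affinity=GENERATED_COOKIE \
    --affinity-cookie-ttl=1800s \
    --global
```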
-
Losing session affinity
- A client can lose affinity with the instance if the instance group runs out of capacity, and traffic has to be routed to a different zone
- This can be mitigated by ensuring instance groups have enough capacity to handle all local users
- Autoscaling adds instances to, or removes instances from, the instance group
- The backend service reallocates load as the group changes, and the backend instance serving a given client may change
- This can be mitigated by ensuring that the minimum number of instances provisioned by autoscaling is enough to handle expected load
- A client can also lose affinity with the instance if the target instance fails health checks
- Affinity can also be lost when the balancing mode is set to backend utilization
- This may cause computed capacities across zones to change, sending some traffic to another zone within the region
- This is more likely at low traffic when computed capacity is less stable
- If client routing is designed so that the first and subsequent requests in a connection egress from different geographical locations, session affinity may be lost
- Session affinity may be difficult to establish during periods of minimal traffic because of fluctuating computed capacities across zones
- If session affinity is required during periods of low traffic, configure rate-based balancing for backend services
- Also configure each backend service's set of instance groups so that each zone comprising the group has the same number of backend instances
- Such a configuration is more likely to result in stable capacity estimates and allow session affinity to be established
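A sketch of the rate-based configuration suggested above, with an illustrative per-instance rate:

```
# Rate-based balancing keeps computed capacities more stable at low
# traffic, which helps session affinity hold
gcloud compute backend-services update-backend web-backend-service \
    --instance-group=web-mig \
    --instance-group-zone=us-central1-a \
    --balancing-mode=RATE \
    --max-rate-per-instance=100 \
    --global
```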
-
Timeout setting
- For longer-lived connections to the backend service from the load balancer, configure a timeout setting longer than the 30-second default
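A sketch of raising the timeout on the hypothetical backend service; 600 seconds is illustrative:

```
# Raise the request/response timeout above the 30-second default
# (for WebSocket traffic this bounds how long a connection may stay open)
gcloud compute backend-services update web-backend-service \
    --timeout=600s \
    --global
```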
-
Named ports
- For internal HTTP(S), external HTTP(S), SSL Proxy, and TCP Proxy load balancers, backend services must have an associated named port if their backends are instance groups
- The named port tells the load balancer which named port configured on the backend instance group to use; the instance group translates the name into a port number (see the sketch after this list)
- The load balancer uses the port to connect to the backend VMs
- The port can be different from the port that clients use to contact the load balancer itself
- Named ports are key-value pairs representing a service name and a port number on which a service is running
- The key-value pair is defined on an instance group
- When a backend service uses an instance group as a backend, it can "subscribe" to the named port
- Each backend service for an HTTP(S), SSL Proxy, or TCP Proxy load balancer using instance group backends can only "subscribe" to a single named port
- When you specify a named port for a backend service, all of the backend instance groups must have at least one named port defined that uses that same name
- Named ports cannot be used for NEG backends
- NEGs define ports per endpoint, and there's no named port key-value pair associated with a NEG
- Named ports cannot be used for internal TCP/UDP load balancers
- Internal TCP/UDP load balancers are pass-through load balancers (not proxies), so their backend services do not support setting a named port
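A sketch of the named-port flow described above, with a hypothetical name-to-port mapping:

```
# Define the named port (service name -> port number) on the instance group
gcloud compute instance-groups set-named-ports web-mig \
    --named-ports=http:8080 \
    --zone=us-central1-a

# Subscribe the backend service to that single named port
gcloud compute backend-services update web-backend-service \
    --port-name=http \
    --global
```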
-
Health checks
- Each backend service must have a Health Check associated with it
- The health check must exist before the backend service is created
- A health check runs continuously and its results help determine which instances are able to receive new requests
- Unhealthy instances do not receive new requests and continue to be polled
- Once an unhealthy instance passes enough consecutive health checks to meet the healthy threshold, it is deemed healthy again and begins receiving new connections
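A sketch of a tuned health check and its attachment, reusing the hypothetical http-basic-check and web-backend-service names from earlier, here with explicit tuning flags; the interval, timeout, and thresholds are illustrative:

```
# Poll each backend instance every 10 seconds
gcloud compute health-checks create http http-basic-check \
    --port=80 \
    --check-interval=10s \
    --timeout=5s \
    --healthy-threshold=2 \
    --unhealthy-threshold=3

# The backend service must reference an existing health check
gcloud compute backend-services update web-backend-service \
    --health-checks=http-basic-check \
    --global
```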