1. Characteristics
    1. Overview
      1. A backend service resource contains the configuration for External HTTP(S), Internal HTTP(S), SSL Proxy, TCP Proxy, and Internal TCP/UDP Load Balancing services
      2. Network Load Balancing does not use a backend service
      3. Load balancers use configuration information in the backend service resource to direct traffic to the correct backends
      4. Traffic is distributed according to a balancing mode defined for each backend
      5. Backend instance health is monitored using a health check designated in the backend service
      6. A backend service can also be used to maintain session affinity
    2. Architecture
      1. Each backend service contains one or more backends
      2. All backends for a given backend service must be either instance groups or network endpoint groups
      3. Managed and unmanaged instance groups can be associated with the same backend service
      4. Instance groups and network endpoint groups cannot be associated with the same backend service
    3. Settings
      1. Load balancers use a hash algorithm to distribute requests among available instances
      2. The hash is based on the source IP address, destination IP address, source port, destination port, and protocol (a 5-tuple hash)
      3. Session affinity adjusts the hash so that all requests from the same client are sent to the same instance
      4. For HTTP(S) load balancers, the backend service timeout is a request/response timeout, except for connections that use the WebSocket protocol (see the sketch after this list)
      5. For WebSocket traffic, the backend service timeout is the maximum amount of time that a WebSocket, idle or active, can remain open
      6. For SSL Proxy and TCP Proxy load balancers, the backend service timeout is an idle timeout for all traffic
      7. For internal TCP/UDP load balancers, the backend service timeout parameter is ignored
      8. The health checker polls instances attached to the backend service at configured intervals
      9. Instances that pass the health check are allowed to receive new requests
      10. Unhealthy instances are not sent requests until they are healthy again
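
A minimal sketch of how these settings come together on the backend service resource, using the google-cloud-compute Python client; the project, service, and health check names are hypothetical:

```python
from google.cloud import compute_v1

project = "my-project"  # hypothetical project ID

backend_service = compute_v1.BackendService(
    name="web-backend-service",  # hypothetical service name
    protocol="HTTP",
    load_balancing_scheme="EXTERNAL",
    # Health check designated in the backend service (must already exist).
    health_checks=[f"projects/{project}/global/healthChecks/http-basic-check"],
    timeout_sec=30,           # request/response timeout for HTTP(S)
    session_affinity="NONE",  # session affinity is covered later in these notes
)

client = compute_v1.BackendServicesClient()
client.insert(project=project, backend_service_resource=backend_service)
```
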
  2. Traffic distribution
    1. Overview
      1. The balancing mode dictates how the load balancing system determines when a backend is at full usage
      2. If all backends for the backend service in a region are at full usage, new requests are automatically routed to the nearest region that can still handle requests
      3. The balancing mode can be based on connections, backend utilization, or requests per second (rate)
      4. Capacity is an additional control that interacts with the balancing mode setting
      5. To have instances operate at a maximum of 80% backend utilization, set the balancing mode to backend utilization and its capacity to 80% (see the sketch after this list)
      6. To cut instance utilization in half, leave the capacity at 80% backend utilization and set the capacity scaler to 0.5
      7. To drain the backend service, set the capacity scaler to 0 and leave the capacity as is
      8. If the average utilization of all instances in backend instance groups connected to the same backend service is less than 10%, GCP might prefer specific zones
      9. Zonal imbalance automatically resolves itself as more traffic is sent to the load balancer
      10. Traffic Director uses backend services whose load balancing scheme is INTERNAL_SELF_MANAGED
      11. For an INTERNAL_SELF_MANAGED backend service, traffic distribution is accomplished by using a combination of a balancing mode and a load balancing policy
      12. The backend service directs traffic to a backend instance group or NEG according to the backend's balancing mode
      13. Once a backend has been selected, Traffic Director distributes traffic according to a load balancing policy
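
A sketch of the balancing mode, capacity, and capacity scaler controls described above, again with the google-cloud-compute client and hypothetical resource names:

```python
from google.cloud import compute_v1

# Full usage for this backend is defined as 80% backend utilization.
backend = compute_v1.Backend(
    group="projects/my-project/zones/us-central1-a/instanceGroups/web-ig",
    balancing_mode="UTILIZATION",
    max_utilization=0.8,  # the "capacity": full usage at 80% utilization
    capacity_scaler=1.0,  # 0.5 halves effective capacity; 0 drains the backend
)
```
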
    2. External IP addresses
      1. For HTTP(S), SSL Proxy, and TCP Proxy load balancers, clients communicate with a Google Front End (GFE) using the load balancer's external IP address
      2. The GFE communicates with backend VMs using the internal IP addresses of their primary network interface
      3. GFE is a proxy, so the backend VMs do not require external IP addresses
      4. Network load balancers route packets using bidirectional network address translation
      5. When backend VMs send replies to clients, they use the external IP address of the load balancer's forwarding rule as the source IP address
      6. Backend VMs for an internal load balancer do not need external IP addresses
  3. Backends
    1. Overview
      1. Multiple backends can be added to a single backend service
      2. Each backend is a resource to which a Google Cloud load balancer distributes traffic
      3. An instance group, a network endpoint group, or a backend bucket can be a backend
      4. An instance group can be a managed instance group (with or without autoscaling) or an unmanaged instance group
      5. To use an instance group, add a backend to the backend service and assign the instance group to it (see the sketch after this list)
      6. The instance group must be created before adding it to the backend
      7. Different types of backends cannot be used with the same backend service
      8. Internal TCP/UDP load balancers support only instance group backends
      9. If HTTP(S) load balancer has two or more backend services, instance groups can be used as backends for one backend service and NEGs as backends for the other backend service
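
A sketch of adding an instance group backend to an existing backend service, assuming the google-cloud-compute client and hypothetical names; note that every backend appended to one service must be the same type:

```python
from google.cloud import compute_v1

project = "my-project"  # hypothetical
client = compute_v1.BackendServicesClient()

# Read-modify-write: fetch the service, append an instance group backend,
# then patch the service back.
service = client.get(project=project, backend_service="web-backend-service")
service.backends.append(
    compute_v1.Backend(
        group="projects/my-project/zones/us-central1-a/instanceGroups/web-ig",
        balancing_mode="RATE",
        max_rate_per_instance=100,
    )
)
client.patch(
    project=project,
    backend_service="web-backend-service",
    backend_service_resource=service,
)
```
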
    2. Protocol to the backends
      1. A backend service can use only one protocol
      2. The available protocols are HTTP, HTTPS, HTTP/2, SSL, TCP, and UDP
      3. Which protocol is valid depends on the type of load balancer, including its load balancing scheme
      4. HTTP/2 is available for load balancing with Ingress
    3. Autoscaled managed instance groups
      1. Autoscaled managed instance groups automatically add or remove instances based on need
      2. The autoscaling percentage works together with the backend service's balancing mode (see the sketch after this list)
      3. New instances have a cool-down period before they are considered part of the group
      4. It is possible for traffic to exceed the backend service's backend utilization during that time
      5. Once the instances are available, new traffic will be routed to them
      6. If the number of instances reaches the maximum permitted by the autoscaler's settings, the autoscaler stops adding instances regardless of utilization
      7. In this case, extra traffic will be load balanced to the next available region
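
A sketch of an autoscaler whose policy is tied to load-balancing serving capacity, so it cooperates with the backend service's utilization balancing mode; names and limits are hypothetical:

```python
from google.cloud import compute_v1

autoscaler = compute_v1.Autoscaler(
    name="web-autoscaler",  # hypothetical
    target="projects/my-project/zones/us-central1-a/instanceGroupManagers/web-mig",
    autoscaling_policy=compute_v1.AutoscalingPolicy(
        min_num_replicas=2,
        max_num_replicas=10,      # the autoscaler stops adding instances here
        cool_down_period_sec=60,  # new instances are excluded during cool-down
        # Scale on the fraction of the backend service's serving capacity in use.
        load_balancing_utilization=compute_v1.AutoscalingPolicyLoadBalancingUtilization(
            utilization_target=0.8,
        ),
    ),
)
compute_v1.AutoscalersClient().insert(
    project="my-project", zone="us-central1-a", autoscaler_resource=autoscaler
)
```
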
    4. Restrictions and guidance for instance groups
      1. Do not put a virtual machine instance in more than one instance group
      2. Do not delete an instance group if it is being used by a backend
      3. Do not add the same instance group to two different backends
      4. All instances in a managed or unmanaged instance group must be in the same VPC network and, if applicable, the same subnet
      5. If using a managed instance group with autoscaling, do not use the maxRate balancing mode in the backend service
      6. Use either the maxUtilization or maxRatePerInstance mode
      7. Do not make an autoscaled managed instance group the target of two different load balancers
      8. When resizing a managed instance group, the maximum size of the group should be smaller than or equal to the size of its subnet
    5. Network endpoint groups
      1. A network endpoint group (NEG) is a logical grouping of network endpoints
      2. Network endpoints represent services by their IP address and port, rather than referring to a particular VM
      3. If an endpoint does not specify a port, the NEG's default port is used as the port of the IP address:port pair
      4. A backend service that uses NEGs as its backends distributes traffic among applications or containers running within VM instances
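
A sketch of a NEG backend; because endpoints are IP address:port pairs rather than whole VMs, rate-based capacity is expressed per endpoint (names hypothetical):

```python
from google.cloud import compute_v1

neg_backend = compute_v1.Backend(
    group="projects/my-project/zones/us-central1-a/networkEndpointGroups/web-neg",
    balancing_mode="RATE",
    max_rate_per_endpoint=100,  # per IP:port endpoint, not per VM
)
```
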
  4. Session affinity
    1. Session affinity
      1. Without session affinity, load balancers distribute new requests according to the balancing mode of the backend instance group or NEG
      2. Applications, such as stateful servers or services with heavy internal caching, need multiple requests from a given user to be directed to the same instance
      3. Session affinity identifies TCP traffic from the same client based on parameters such as the client's IP address or the value of a cookie
      4. Session affinity directs requests to the same backend instance if the backend is healthy and has capacity (according to its balancing mode)
      5. Session affinity has little meaningful effect on UDP traffic, because a session for UDP is a single request and response
      6. Session affinity can break if the instance becomes unhealthy or overloaded
      7. For HTTP(S) Load Balancing, session affinity works best with the RATE balancing mode
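
A sketch of enabling session affinity on a backend service, paired with the RATE balancing mode per the note above (hypothetical names):

```python
from google.cloud import compute_v1

service = compute_v1.BackendService(
    name="web-backend-service",    # hypothetical
    session_affinity="CLIENT_IP",  # other values include GENERATED_COOKIE and NONE
    backends=[
        compute_v1.Backend(
            group="projects/my-project/zones/us-central1-a/instanceGroups/web-ig",
            balancing_mode="RATE",  # works best with affinity for HTTP(S)
            max_rate_per_instance=100,
        )
    ],
)
```
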
    2. Using client IP affinity
      1. Client IP affinity directs requests from the same client IP address to the same backend instance based on a hash of the client's IP address
      2. Client IP affinity is an option for every Google Cloud load balancer that uses backend services
      3. The client IP address as seen by the load balancer might not be the originating client if the client is behind NAT or makes requests through a proxy
      4. Requests made through NAT or a proxy use the IP address of the NAT router or proxy as the client IP address
      5. This can cause incoming traffic from many different clients to be concentrated on the same backend instances
      6. If a client moves from one network to another, its IP address changes, resulting in broken affinity
    3. Generated cookie affinity
      1. When generated cookie affinity is set, the load balancer issues a cookie on the first request
      2. For each subsequent request with the same cookie, the load balancer directs the request to the same backend VM or endpoint
      3. For external HTTP(S) load balancers, the cookie is named GCLB
      4. For internal HTTP(S) load balancers and Traffic Director, the cookie is named GCILB
      5. Cookie-based affinity can more accurately identify a client to a load balancer, compared to client IP-based affinity
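
A sketch of generated cookie affinity; the TTL field controls how long the issued cookie remains valid (hypothetical names and values):

```python
from google.cloud import compute_v1

service = compute_v1.BackendService(
    name="web-backend-service",           # hypothetical
    session_affinity="GENERATED_COOKIE",  # load balancer issues the GCLB/GCILB cookie
    affinity_cookie_ttl_sec=3600,         # 0 would make it a session-only cookie
)
```
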
    4. Losing session affinity
      1. A client can lose affinity with the instance if the instance group runs out of capacity, and traffic has to be routed to a different zone
      2. This can be mitigated by ensuring instance groups have enough capacity to handle all local users
      3. Autoscaling adds instances to, or removes instances from, the instance group
      4. When this happens, the backend service reallocates load, and the affinity target may move
      5. This can be mitigated by ensuring that the minimum number of instances provisioned by autoscaling is enough to handle expected load
      6. A client can also lose affinity with the instance if the target instance fails health checks
      7. Affinity can also be lost when the balancing mode is set to backend utilization
      8. Changes in utilization alter the computed capacities across zones, which can send some traffic to another zone within the region
      9. This is more likely at low traffic, when computed capacity is less stable
      10. If client routing is designed so that the first and subsequent requests in a connection egress from different geographical locations, session affinity may be lost
      11. Session affinity may be difficult to establish during periods of minimal traffic because of fluctuating computed capacities across zones
      12. If session affinity is required during periods of low traffic, configure rate-based balancing for backend services
      13. Also configure each backend service's set of instance groups so that each zone comprising the group has the same number of backend instances
      14. Such a configuration is more likely to result in stable capacity estimates and allow session affinity to be established
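
A sketch of the low-traffic recommendation above: rate-based balancing with symmetric zonal instance groups, so computed capacities stay stable (hypothetical names):

```python
from google.cloud import compute_v1

def rate_backend(zone: str) -> compute_v1.Backend:
    # Identical per-instance rates in every zone; pair this with equal
    # instance counts per zone for stable capacity estimates.
    return compute_v1.Backend(
        group=f"projects/my-project/zones/{zone}/instanceGroups/web-ig-{zone}",
        balancing_mode="RATE",
        max_rate_per_instance=100,
    )

backends = [rate_backend(z) for z in ("us-central1-a", "us-central1-b")]
```
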
  5. Timeout setting
    1. For longer-lived connections to the backend service from the load balancer, configure a timeout setting longer than the 30-second default
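
A sketch of raising the timeout above the 30-second default on an existing backend service (hypothetical names):

```python
from google.cloud import compute_v1

compute_v1.BackendServicesClient().patch(
    project="my-project",
    backend_service="web-backend-service",
    # Partial update: only the timeout field is changed.
    backend_service_resource=compute_v1.BackendService(timeout_sec=120),
)
```
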
  6. Named ports
    1. For internal HTTP(S), external HTTP(S), SSL Proxy, and TCP Proxy load balancers, backend services must have an associated named port if their backends are instance groups
    2. The backend service's named port tells the load balancer which named port is configured on the backend instance group; the instance group translates that name to a port number
    3. The load balancer uses the port to connect to the backend VMs
    4. The port can be different from the port that clients use to contact the load balancer itself
    5. Named ports are key-value pairs representing a service name and a port number on which a service is running
    6. The key-value pair is defined on an instance group
    7. When a backend service uses an instance group as a backend, it can "subscribe" to the named port
    8. Each backend service for an HTTP(S), SSL Proxy, or TCP Proxy load balancer using instance group backends can only "subscribe" to a single named port
    9. When you specify a named port for a backend service, all of the backend instance groups must have at least one named port defined that uses that same name
    10. Named ports cannot be used for NEG backends
    11. NEGs define ports per endpoint, and there's no named port key-value pair associated with a NEG
    12. Named ports cannot be used for internal TCP/UDP load balancers
    13. Internal TCP/UDP load balancers are pass-through load balancers (not proxies), so their backend services do not support a named port
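
A sketch of the two halves of a named port: defining the key-value pair on the instance group and then "subscribing" the backend service to it (hypothetical names and ports):

```python
from google.cloud import compute_v1

# Define the named port (service name -> port number) on the instance group.
compute_v1.InstanceGroupsClient().set_named_ports(
    project="my-project",
    zone="us-central1-a",
    instance_group="web-ig",
    instance_groups_set_named_ports_request_resource=(
        compute_v1.InstanceGroupsSetNamedPortsRequest(
            named_ports=[compute_v1.NamedPort(name="http-app", port=8080)],
        )
    ),
)

# The backend service "subscribes" to the name; the instance group
# translates it to port 8080 when the load balancer connects.
service = compute_v1.BackendService(
    name="web-backend-service",
    port_name="http-app",
)
```
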
  7. Health checks
    1. Each backend service must have a health check associated with it
    2. The health check must exist before the backend service is created
    3. A health check runs continuously and its results help determine which instances are able to receive new requests
    4. Unhealthy instances do not receive new requests and continue to be polled
    5. If an unhealthy instance passes health checks (reaching the configured healthy threshold), it is deemed healthy again and begins receiving new connections
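
A sketch of creating the health check a backend service depends on; it must exist before the backend service that references it (hypothetical names and thresholds):

```python
from google.cloud import compute_v1

health_check = compute_v1.HealthCheck(
    name="http-basic-check",  # hypothetical
    type_="HTTP",             # the proto field "type" gets a trailing underscore
    http_health_check=compute_v1.HTTPHealthCheck(port=80, request_path="/healthz"),
    check_interval_sec=5,   # polling interval
    timeout_sec=5,
    healthy_threshold=2,    # consecutive passes before an instance is healthy
    unhealthy_threshold=2,  # consecutive failures before it stops getting requests
)
compute_v1.HealthChecksClient().insert(
    project="my-project", health_check_resource=health_check
)
```
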