1. Overview
    1. Google Cloud external TCP/UDP Network Load Balancing (Network Load Balancing) is a regional, non-proxied load balancer
    2. Network Load Balancing distributes traffic among virtual machine (VM) instances in the same region in a Virtual Private Cloud (VPC) network
    3. A network load balancer directs TCP or UDP traffic across regional backends
    4. Use Network Load Balancing to load balance UDP, TCP, and SSL traffic on ports that are not supported by the TCP proxy load balancers and SSL proxy load balancers
  2. Characteristics
    1. Network Load Balancing is a managed service
    2. Network Load Balancing is implemented by using Andromeda virtual networking and Google Maglev
    3. The network load balancers are not proxies
    4. Responses from the backend VMs go directly to the clients, not back through the load balancer
    5. The industry term for this is direct server return
    6. The load balancer preserves the source IP addresses of packets
    7. The destination IP address for packets is the regional external IP address associated with the load balancer's forwarding rule
    8. Instances that participate as backend VMs for network load balancers must be running the appropriate Linux guest environment, Windows guest environment, or other processes that provide equivalent functionality
    9. The guest OS environment (or an equivalent process) is responsible for configuring local routes on each backend VM
    10. These routes allow the VM to accept packets that have a destination that matches the IP address of the load balancer's forwarding rule
    11. On the backend instances that accept load-balanced traffic, configure the software to bind to the IP address associated with the load balancer's forwarding rule (or to any IP address, 0.0.0.0)
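The binding requirement in the last point can be sketched with a plain TCP socket. This is a minimal demo, not production configuration: port 0 picks an ephemeral port so the snippet runs anywhere; in practice the software binds the port named in the forwarding rule.

```python
import socket

# Bind to 0.0.0.0 so the server accepts packets addressed to any local IP,
# including the forwarding rule's VIP that the guest environment installs
# as a local route. (Binding to the VIP itself also works.)
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 0))  # port 0: ephemeral port, demo only
server.listen()
bound_addr, bound_port = server.getsockname()
server.close()
```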
  3. Protocols, scheme, and scope
    1. Each network load balancer supports either TCP or UDP traffic (not both)
    2. A network load balancer uses a target pool to contain the backend instances among which traffic is load balanced
    3. A network load balancer balances traffic originating from the internet
    4. You cannot use it to load balance traffic that originates within Google Cloud between instances
    5. The scope of a network load balancer is regional, not global
    6. A network load balancer cannot span multiple regions
    7. Within a single region, the load balancer services all zones
    8. Use Network Load Balancing to balance UDP traffic, or to load balance a TCP port that isn't supported by other load balancers
    9. It is acceptable to have SSL traffic decrypted by backends instead of by the load balancer, as the network load balancer cannot perform this task
    10. When the backends decrypt SSL traffic, there is a greater CPU burden on the VMs
    11. Self-managing the load balancer's SSL certificates is acceptable
    12. Google-managed SSL certificates are only available for HTTP(S) Load Balancing and SSL Proxy Load Balancing
    13. Use Network Load Balancing when the original packets must be forwarded unproxied
    14. Use it to migrate an existing setup that uses a pass-through load balancer without changes
  4. Architecture
    1. The network load balancers balance the load on systems based on incoming IP protocol data, such as address, port, and protocol type
    2. The network load balancer is a pass-through load balancer, so backends receive the original client request
    3. The network load balancer doesn't do any Transport Layer Security (TLS) offloading or proxying
    4. Traffic is directly routed to VMs
    5. When a forwarding rule is created for the load balancer, it receives an ephemeral virtual IP address (VIP), or a VIP can be reserved from a regional network block
    6. The forwarding rule is associated with the backends
    7. The VIP is anycasted from Google's global points of presence, but the backends for a network load balancer are regional
    8. The load balancer cannot have backends that span multiple regions
    9. Google Cloud firewalls can be used to control or filter access to the backend VMs
    10. The network load balancer examines the source and destination ports, IP address, and protocol to determine how to forward packets
    11. For TCP traffic, modify the forwarding behavior of the load balancer by configuring session affinity
  5. Load distribution algorithm
    1. By default, to distribute traffic to instances, the session affinity value is set to NONE
    2. Cloud Load Balancing picks an instance based on a hash of the source IP and port, destination IP and port, and protocol
    3. Incoming TCP connections are spread across instances, and each new connection may go to a different instance
    4. All packets for a connection are directed to the same instance until the connection is closed
    5. Established connections are not taken into account in the load balancing process
    6. Regardless of the session affinity setting, all packets for a connection are directed to the chosen instance until the connection is closed
    7. An existing connection has no impact on load balancing decisions for new incoming connections
    8. This can result in an imbalance among backends if long-lived TCP connections are in use
    9. A different session affinity setting can be chosen where multiple connections from a client need to go to the same instance
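The hash-based selection described above can be sketched in Python. This is an illustrative stand-in (Google's actual Maglev hashing is more sophisticated), but it shows the key property: identical 5-tuples always map to the same backend, so every packet of a connection reaches one instance.

```python
import hashlib

def pick_backend(backends, src_ip, src_port, dst_ip, dst_port, proto):
    # Toy 5-tuple hash: deterministic, so the same connection always
    # selects the same backend; a new connection (new source port) may
    # land on a different one.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}/{proto}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[digest % len(backends)]

backends = ["vm-a", "vm-b", "vm-c"]
first = pick_backend(backends, "203.0.113.7", 51000, "198.51.100.1", 80, "tcp")
# All packets of the same connection hash to the same backend:
assert pick_backend(backends, "203.0.113.7", 51000, "198.51.100.1", 80, "tcp") == first
```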
  6. Target pools
    1. A target pool resource defines a group of instances that should receive incoming traffic from forwarding rules
    2. When a forwarding rule directs traffic to a target pool, Cloud Load Balancing picks an instance from these target pools based on a hash of the source IP and port and the destination IP and port
    3. Target pools can only be used with forwarding rules that handle TCP and UDP traffic
    4. For all other protocols, create a target instance
    5. A target pool must be created before it can be used with a forwarding rule
    6. Each project can have up to 50 target pools
    7. For a target pool with a single VM instance, consider using the protocol forwarding feature instead
    8. Network Load Balancing supports Cloud Load Balancing Autoscaler, which allows users to perform autoscaling on the instance groups in a target pool based on backend utilization
  7. Forwarding rules
    1. Forwarding rules work in conjunction with target pools to support load balancing
    2. To use load balancing, create a forwarding rule that directs traffic to specific target pools
    3. It is not possible to load balance traffic without a forwarding rule
    4. Each forwarding rule matches a particular IP address, protocol, and optionally, port range to a single target pool
    5. When traffic is sent to an external IP address that is served by a forwarding rule, the forwarding rule directs that traffic to the corresponding target pool
  8. Multiple forwarding rules
    1. Users can configure multiple regional external forwarding rules for the same external TCP/UDP network load balancer
    2. Optionally, each forwarding rule can have a different regional external IP address, or multiple forwarding rules can have the same regional external IP address
    3. Configuring multiple regional external forwarding rules can be useful for configuring more than one external IP address for the same target pool
    4. Multiple forwarding rules also make it possible to configure different port ranges or different protocols by using the same external IP address for the same target pool
    5. When using multiple forwarding rules, configure the software running on backend VMs so that it binds to all necessary IP addresses
    6. This is required because the destination IP address for packets delivered through the load balancer is the regional external IP address associated with the respective regional external forwarding rule
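The bind-to-all-necessary-addresses requirement can be sketched as follows. The loopback and wildcard addresses here stand in for the forwarding rules' real VIPs, and ephemeral ports keep the demo runnable; on a backend VM the software would bind each forwarding rule's address and port.

```python
import socket

# Stand-ins for the regional external IPs of two forwarding rules.
forwarding_rule_ips = ["127.0.0.1", "0.0.0.0"]

listeners = []
for ip in forwarding_rule_ips:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((ip, 0))  # one listener per necessary address; port 0 = demo only
    s.listen()
    listeners.append(s)

bound = [s.getsockname()[0] for s in listeners]
for s in listeners:
    s.close()
```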
  9. Health checks
    1. Health checks ensure that Compute Engine forwards new connections only to instances that are up and ready to receive them
    2. Compute Engine sends health check requests to each instance at the specified frequency
    3. After an instance exceeds its allowed number of health check failures, it is no longer considered an eligible instance for receiving new traffic
    4. Existing connections are not actively terminated, which allows instances to shut down gracefully and close TCP connections
    5. The health checker continues to query unhealthy instances, and returns an instance to the pool when the specified number of successful checks occur
    6. If all instances are marked as UNHEALTHY, the load balancer directs new traffic to all existing instances
    7. Network Load Balancing relies on legacy HTTP health checks to determine instance health
    8. Even if the service does not use HTTP, a basic web server must be run on each instance that the health check system can query
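A minimal health check responder might look like this. The /health path is an assumption (legacy HTTP health checks request / by default), and the demo issues one request against itself to show the 200 response the checker expects.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Return 200 on the paths the health checker probes.
        if self.path in ("/", "/health"):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
status = urllib.request.urlopen(f"http://127.0.0.1:{port}/health").status
server.shutdown()
```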
  10. Return path
    1. Google Cloud uses special routes not defined in the VPC network for health checks
  11. Firewall rules
    1. Health checks for network load balancers are sent from specific IP ranges
    2. Create ingress allow firewall rules that permit traffic from those ranges
    3. In addition to the IP ranges for health check probes, backends might also receive health check traffic from their metadata servers, 169.254.169.254.
    4. Each backend VM can receive packets from its metadata server because that is always allowed traffic
    5. Network Load Balancing is a pass-through load balancer, which means that firewall rules must allow traffic from the client source IP addresses
    6. If service is open to the internet, it is easiest to allow traffic from all IP ranges
    7. To restrict access so that only certain source IP addresses are allowed, set up firewall rules to enforce that restriction, but allow access from the health check IP ranges
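A firewall rule that restricts client access while still admitting probes has to treat the health check ranges as always allowed. The sketch below uses the probe ranges documented for target-pool-based network load balancers at the time of writing; verify them against current Google documentation before relying on them.

```python
import ipaddress

# Documented health check probe source ranges for target-pool-based
# network load balancers (assumption -- confirm against current docs).
HEALTH_CHECK_RANGES = [
    ipaddress.ip_network("35.191.0.0/16"),
    ipaddress.ip_network("209.85.152.0/22"),
    ipaddress.ip_network("209.85.204.0/22"),
]

def is_health_check_probe(src_ip: str) -> bool:
    """Return True if src_ip falls inside a known health check range."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in HEALTH_CHECK_RANGES)
```

A restrictive ingress policy would then allow a packet when its source is in an operator-defined allowlist or `is_health_check_probe` returns True.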
  12. Session affinity
    1. Network Load Balancing doesn't use backend services session affinity
    2. Instead, network load balancers use target pools for session affinity
  13. Load balancing and fragmented UDP packets
    1. Unfragmented packets are handled normally in all configurations
    2. UDP packets may become fragmented before reaching Google Cloud
    3. Intervening networks may wait for all fragments to arrive before forwarding them, causing delay, or may drop fragments
    4. Google Cloud does not wait for all fragments; it forwards each fragment as soon as it arrives
    5. Because subsequent UDP fragments do not contain the destination port, problems can occur if the target pool's session affinity is set to NONE (5-tuple affinity)
    6. The subsequent fragments may be dropped because the load balancer cannot calculate the 5-tuple hash
    7. If there is more than one UDP forwarding rule for the same load-balanced IP address, subsequent fragments may arrive at the wrong forwarding rule
    8. For fragmented UDP packets, set session affinity to CLIENT_IP_PROTO or CLIENT_IP, not NONE (5-tuple hashing)
    9. Because CLIENT_IP_PROTO and CLIENT_IP do not use the destination port for hashing, they can calculate the same hash for subsequent fragments as for the first fragment
    10. Use only one UDP forwarding rule per load-balanced IP address, which ensures that all fragments arrive at the same forwarding rule
    11. With these settings, UDP fragments from the same packet are forwarded to the same instance for reassembly
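The difference between the affinity modes can be illustrated with a toy hash (an illustrative sketch, not Google's actual hashing). CLIENT_IP_PROTO-style keys omit the ports, so the same key can be built even for fragments that carry no UDP header; a 5-tuple key cannot be built for those fragments at all.

```python
import hashlib

def pick(backends, key):
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return backends[h % len(backends)]

backends = ["vm-a", "vm-b", "vm-c"]
src, dst = "203.0.113.7", "198.51.100.1"

# CLIENT_IP_PROTO affinity hashes only source IP, destination IP, and
# protocol. Subsequent fragments lack ports, yet the same key can still
# be built, so every fragment reaches the same backend:
first_fragment = pick(backends, f"{src}->{dst}/udp")
later_fragment = pick(backends, f"{src}->{dst}/udp")
assert first_fragment == later_fragment

# NONE (5-tuple) affinity would also need the src/dst ports --
# information only the first fragment carries, so later fragments
# cannot be hashed and may be dropped.
```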