Load Balancing

Benefits
1. Google Cloud offers server-side load balancing to distribute incoming traffic across multiple virtual machine (VM) instances
2. Load balancing scales applications and supports heavy traffic
3. Health checks detect unhealthy VM instances, which are then automatically removed from rotation
4. Instances that become healthy again are automatically re-added
5. Routes traffic to the VM instance closest to the user
6. Uses forwarding rule resources to match certain types of traffic and forward it to a load balancer
7. Google Cloud load balancing is a managed service, with redundant, highly available components
8. If a load balancing component fails, it is restarted or replaced automatically and immediately
9. Google Cloud offers several different types of load balancing that differ in capabilities, usage scenarios, and configuration
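The health check behavior in items 3 and 4 can be illustrated with a small simulation. This is only a sketch under stated assumptions: `Backend`, `BackendPool`, and the round-robin routing are hypothetical names and choices made for illustration, not part of any Google Cloud API, and a real load balancer uses its own probing and balancing logic.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True


@dataclass
class BackendPool:
    """Hypothetical pool of VM instances behind a load balancer."""
    backends: list
    _next: int = 0  # round-robin cursor (an assumed balancing strategy)

    def serving(self):
        """Backends whose most recent health check passed."""
        return [b for b in self.backends if b.healthy]

    def record_health_check(self, name, passed):
        """Store the result of one health check probe for a backend."""
        for b in self.backends:
            if b.name == name:
                b.healthy = passed

    def route(self):
        """Send the next request to a healthy backend, in round-robin order."""
        candidates = self.serving()
        if not candidates:
            raise RuntimeError("no healthy backends")
        backend = candidates[self._next % len(candidates)]
        self._next += 1
        return backend.name


pool = BackendPool([Backend("vm-a"), Backend("vm-b"), Backend("vm-c")])

pool.record_health_check("vm-b", passed=False)  # vm-b is taken out of rotation
during_outage = [pool.route() for _ in range(4)]

pool.record_health_check("vm-b", passed=True)   # vm-b is re-added automatically
after_recovery = sorted({pool.route() for _ in range(3)})
```

While `vm-b` is failing its check, no request reaches it; once it passes again, it receives traffic without any manual step.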
Autoscaling
1. Compute Engine offers autoscaling to automatically add or remove VM instances from an instance group based on increases or decreases in load
2. Autoscaling enables apps to gracefully handle increases in traffic, and reduces cost when the need for resources is lower
3. After you define the autoscaling policy, the autoscaler performs automatic scaling based on the measured load
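As a rough illustration of scaling on measured load, the sketch below sizes a group so that each instance carries about a target share of the total load. The function name, the load units, and the min/max bounds are assumptions made for the example; the real autoscaler applies additional rules that are not modeled here.

```python
import math


def recommended_size(total_load, target_per_instance, min_size, max_size):
    """Return a group size that keeps per-instance load at or below the target.

    Simplified proportional rule (an assumption, not the actual autoscaler
    algorithm): divide the total load by the target load per instance, round
    up, then clamp the result to the configured bounds.
    """
    wanted = math.ceil(total_load / target_per_instance)
    return max(min_size, min(max_size, wanted))


# Hypothetical load samples over time, in arbitrary units.
load_samples = [100, 250, 400, 120]

# Target of 50 units per instance, between 1 and 6 instances.
sizes = [recommended_size(load, 50, 1, 6) for load in load_samples]
```

The group grows as load rises, is capped at the maximum size during the peak, and shrinks again when load falls, which is where the cost saving comes from.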
Policies
1. At least one autoscaling policy must be specified when an autoscaler is created
2. An autoscaling policy can be based on CPU utilization, load balancing serving capacity, or Cloud Monitoring metrics
3. If multiple policies are used, the autoscaler scales an instance group based on the policy that provides the largest number of VM instances in the group
4. CPU utilization is the most basic form of autoscaling
5. A CPU utilization policy tells the autoscaler to watch the average CPU utilization of a group of VM instances and add or remove instances from the group to maintain the desired utilization
6. A CPU utilization policy is useful for workloads that are CPU intensive but whose CPU usage fluctuates
7. When an autoscaler is set up to scale based on load balancing serving capacity, it watches the serving capacity of an instance group and scales when the VM instances are over or under capacity
8. The serving capacity of an instance can be defined in the load balancer's backend service and can be based on either utilization or requests per second
9. Autoscaling can be set up to collect data for a specific Cloud Monitoring metric and scale based on a desired utilization level
10. Scaling can be based on standard metrics provided by Cloud Monitoring or on custom metrics
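The three policy types, and the rule in item 3 that the largest recommendation wins, can be sketched as follows. All function names, parameters, and numbers are hypothetical, and the proportional formulas are a simplification chosen for illustration rather than the autoscaler's documented algorithm.

```python
import math


def size_for_cpu(current_size, avg_cpu, target_cpu):
    """Instances needed to bring average CPU utilization back to the target."""
    return math.ceil(current_size * avg_cpu / target_cpu)


def size_for_serving_capacity(total_rps, max_rps_per_instance, target_fraction):
    """Instances needed to keep each VM at target_fraction of its RPS capacity."""
    return math.ceil(total_rps / (max_rps_per_instance * target_fraction))


def size_for_metric(total_metric_value, target_per_instance):
    """Instances needed to keep a monitored metric at its per-instance target."""
    return math.ceil(total_metric_value / target_per_instance)


def combine(recommendations):
    """With multiple policies, use the one that asks for the most instances."""
    return max(recommendations.values())


recommendations = {
    # 4 instances at 85% average CPU, target 60%
    "cpu": size_for_cpu(current_size=4, avg_cpu=0.85, target_cpu=0.6),
    # 1000 requests/s, each VM rated for 200 requests/s, target 80% of that
    "serving_capacity": size_for_serving_capacity(1000, 200, 0.8),
    # custom metric (for example, a queue depth of 250) with target 100 per VM
    "metric": size_for_metric(250, 100),
}

target_size = combine(recommendations)
```

Here the serving capacity policy asks for the most instances, so it determines the group size even though the CPU and metric policies would be satisfied with fewer.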