-
Autoscaling
- Each Cloud Run revision is automatically scaled to the number of container instances needed to handle all incoming requests
- The number of instances scheduled is affected by the CPU needed to process requests, the concurrency setting, and the maximum container instances setting
- It may be necessary to limit the total number of container instances that can be started, for cost control reasons, or for better compatibility with other resources used by the service
- For example, a Cloud Run service might interact with a database that can only handle a certain number of concurrent open connections
- The maximum container instances setting can be used to limit the total number of instances that can be started in parallel
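- A minimal sketch of capping maximum instances programmatically, assuming the google-cloud-run (run_v2) Admin API client library; the project, region, service name, and limit below are placeholders, and the same setting can also be applied at deploy time
```python
# Sketch: cap a service's maximum container instances via the Cloud Run Admin
# API (assumes `pip install google-cloud-run`; resource names are placeholders).
from google.cloud import run_v2

client = run_v2.ServicesClient()

name = "projects/my-project/locations/us-central1/services/my-service"
service = client.get_service(name=name)

# Limit how many instances may run in parallel, e.g. to stay within the
# connection limit of a backing database.
service.template.scaling.max_instance_count = 10

operation = client.update_service(service=service)
operation.result()  # wait for the new revision to roll out
```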
-
Maximum instances and traffic spikes
- A revision scales up by creating new instances to handle incoming traffic load
- When a maximum instances limit is set, there may be insufficient instances to meet traffic load
- Incoming requests queue for up to 60 seconds
- During the 60 second window, if an instance finishes processing requests, it becomes available to process queued requests
- If no instances become available during the 60 second window, the request fails with a 429 error code on Cloud Run (fully managed); a retry sketch follows this list
- The maximum instances limit is an upper limit
- Setting a high limit does not mean that a revision will scale up to the specified number of container instances
- Setting a high limit only means that the number of container instances at any point in time should not exceed the limit
- During rapid traffic surges, Cloud Run may, for a short period of time, create slightly more container instances than the specified max instances value
- If a service cannot tolerate a temporary increase in instances beyond the max instances value, factor in a safety margin and set a lower max instances value
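- Because requests that cannot be scheduled within the 60 second window are rejected with HTTP 429, callers should back off and retry; a minimal client-side sketch, assuming the `requests` library and a placeholder service URL
```python
# Sketch: retry a Cloud Run request rejected with 429 because the
# max-instances limit left no capacity (the URL is a placeholder).
import time
import requests

def call_with_backoff(url, max_attempts=5):
    delay = 1.0
    for attempt in range(max_attempts):
        response = requests.get(url, timeout=70)
        if response.status_code != 429:
            return response
        time.sleep(delay)  # too many queued requests: wait, then retry
        delay *= 2
    return response

print(call_with_backoff("https://my-service-abc123-uc.a.run.app/").status_code)
```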
-
Idle instances and minimizing cold starts
- Users are only billed when an instance is handling a request
- Cloud Run does not always immediately shut down instances once they have handled all requests
- To minimize the impact of cold starts, Cloud Run may keep some instances idle
- Idle instances are ready to handle requests in case of a sudden traffic spike
- An idle container instance may persist resources, such as open database connections
- On Cloud Run (fully managed), CPU is not available to an idle instance
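- Because a warm instance can survive between requests, state such as a database connection can be created once and reused; a minimal sketch, assuming Flask, with sqlite3 standing in for a real external database client
```python
# Sketch: create a connection once per instance and reuse it across requests,
# so a warm instance does not repeat the setup cost. sqlite3 is only a
# stand-in for a real client such as a Cloud SQL connection pool.
import sqlite3
from flask import Flask

app = Flask(__name__)
_conn = None  # lives as long as the container instance does

def get_conn():
    global _conn
    if _conn is None:
        # In a real service this would open a connection (or pool) to an
        # external database.
        _conn = sqlite3.connect(":memory:", check_same_thread=False)
    return _conn

@app.route("/")
def handler():
    value = get_conn().execute("SELECT 1").fetchone()[0]
    return str(value)
```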
-
Deployments
- When a new revision is deployed, Cloud Run gradually migrates traffic from the old revision to the new one
- The maximum instances limit set for each revision may be temporarily exceeded during the period after deployment
-
Domains
- By default, Cloud Run for Anthos on Google Cloud uses example.com as the base domain, where the fully qualified domain name for a service is formatted as http://{route}.{namespace}.example.com
- This default domain doesn't work "out of the box" as a URL for incoming requests
- During development and testing, change the default domain to use a free wildcard DNS test site such as nip.io or sslip.io
- Alternatively, change the default domain to a custom domain, and set up domain registrar records to support DNS wildcards
-
Platform
- Cloud Run provides the flexibility to run services on a fully managed environment or on Anthos
- Cloud Run can easily deploy into an Anthos GKE cluster
- Cloud users can easily switch from Cloud Run (fully managed) to Cloud Run for Anthos or vice versa, all without changing application code
- The Cloud Run (fully managed) platform allows you to deploy stateless containers without having to worry about the underlying infrastructure
- Workloads are automatically scaled up or down to zero depending on the traffic to the app
- Only pay when the app is running, billed to the nearest 100 milliseconds
- Cloud Run for Anthos abstracts away complex Kubernetes concepts, allowing developers to easily leverage the benefits of Kubernetes and serverless together
- It provides access to custom machine types, additional networking support, and Cloud Accelerators
- Supports running workloads on-premises or in the cloud
-
Concurrency
- Each revision is automatically scaled to the number of container instances needed to handle all incoming requests
- When more container instances are processing requests, more CPU and memory will be used, resulting in higher costs
- When new container instances need to be started, requests might take more time to be processed, decreasing the performance of your service
- Cloud Run provides a concurrency setting that specifies the maximum number of requests that can be processed simultaneously by a given container instance
- By default Cloud Run container instances can receive many requests at the same time (up to a maximum of 80)
- In comparison, Functions-as-a-Service (FaaS) solutions like Cloud Functions have a fixed concurrency of 1
- The concurrency setting can be changed at any time
- The specified concurrency value is a maximum and Cloud Run might not send as many requests to a given container instance if the CPU of the instance is already highly utilized
- Users can limit concurrency so that only one request at a time will be sent to each running container instance
- Consider limiting concurrency to 1 where each request uses most of the available CPU or memory
- Consider limiting concurrency to 1 where the container image is not designed to handle multiple requests at the same time, for example, when the container relies on global state that two requests cannot share (see the sketch after this list)
- A concurrency of 1 is likely to negatively affect scaling performance, because many container instances will have to start up to handle a spike in incoming requests
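- A minimal sketch of the global-state concern, assuming Flask: the counter below is shared by every request handled by an instance, so with concurrency above 1 it must be protected by a lock, otherwise concurrency should be limited to 1
```python
# Sketch: instance-level state shared by concurrent requests. With the default
# concurrency (up to 80 requests at once), unsynchronized access to this global
# would be a race; the lock makes it safe without dropping concurrency to 1.
import threading
from flask import Flask

app = Flask(__name__)
_lock = threading.Lock()
_requests_handled = 0  # shared across all requests on this instance

@app.route("/")
def handler():
    global _requests_handled
    with _lock:
        _requests_handled += 1
        count = _requests_handled
    return f"requests handled by this instance: {count}"
```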
-
Connecting
- Some Google Cloud services work well with Cloud Run, and others are not yet supported for the fully managed version of Cloud Run
- Cloud Run for Anthos on Google Cloud can use any service that Google Kubernetes Engine can use
- Cloud Run (fully managed) can be used with the supported Google Cloud services using the client libraries they provide
- No need to provide credentials manually inside Cloud Run (fully managed) container instances when using the Google Cloud client libraries
- Cloud Run (fully managed) uses a default runtime service account that has the Project > Editor role, which means it is able to call all Google Cloud APIs and has read and write access to all resources in the Google Cloud project
- Users can restrict Cloud Run (fully managed) by assigning a service account with a minimal set of permissions to Cloud Run services
- If a Cloud Run service only reads data from Firestore, assign it a service account that has only the Firestore User IAM role
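- A minimal sketch of reading Firestore from a Cloud Run (fully managed) service with the Google Cloud client library: no credentials are passed explicitly, since the client picks up the revision's runtime service account, which under least privilege would hold only the Firestore User role (collection and document names are placeholders)
```python
# Sketch: Firestore read from inside a Cloud Run container (assumes
# `pip install google-cloud-firestore`). No key file is needed: the client
# library authenticates as the revision's runtime service account.
from google.cloud import firestore

db = firestore.Client()

def read_greeting():
    # Placeholder collection/document names.
    snapshot = db.collection("greetings").document("hello").get()
    return snapshot.to_dict() if snapshot.exists else None

print(read_greeting())
```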
-
TLS
- Automatic TLS certificates feature provides TLS certificates and enables HTTPS connections using those TLS certificates
- Automatic TLS feature is turned off by default for Cloud Run for Anthos on Google Cloud
- Automatic TLS feature does not apply to Cloud Run (fully managed), which has automatic TLS certificates built in
- To use the automatic TLS feature, the service must be exposed externally
- Automatic TLS feature cannot be used for a cluster-local service
- Automatic TLS feature only works with the Istio that is automatically installed when the cluster is set up for Cloud Run
- Automatic TLS feature does not work with the Istio addon
- Automatic TLS feature uses Let's Encrypt
- Let's Encrypt has an initial quota limit of 50 TLS certificates per week per registered domain
-
Resource models
- The service is the main resource of Cloud Run
- Each service is located in a specific GCP region (Cloud Run fully managed) or in a specific cluster and namespace (Cloud Run for Anthos)
- Services are automatically replicated across multiple zones
- A GCP project can run many services in different regions or GKE clusters
- Each service exposes a unique endpoint
- Cloud Run automatically scales the underlying infrastructure to handle incoming requests
- Each deployment to a service creates a revision (see the sketch after this list)
- A revision consists of a specific container image, along with environment settings
- Environment settings include environment variables, memory limits, and the concurrency value
- Revisions are immutable: once a revision has been created, it cannot be modified
- When a container image is deployed to a new Cloud Run service, the first revision is created
- Requests are automatically routed as soon as possible to the latest healthy service revision
- Each revision receiving requests is automatically scaled to the number of container instances needed to handle all these requests
- A container instance can receive many requests at the same time
- The concurrency setting specifies the maximum number of requests that can be sent in parallel to a given container instance
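- A minimal sketch of the service/revision relationship, assuming the google-cloud-run (run_v2) client library: it lists the immutable revisions that deployments to one service have created (resource names are placeholders)
```python
# Sketch: list the revisions created by deployments to a service (assumes
# `pip install google-cloud-run`; the parent resource name is a placeholder).
from google.cloud import run_v2

client = run_v2.RevisionsClient()
parent = "projects/my-project/locations/us-central1/services/my-service"

for revision in client.list_revisions(parent=parent):
    # Each revision pins a container image plus its environment settings.
    print(revision.name, revision.containers[0].image)
```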