-
Autoscaling
- Each Cloud Run revision is automatically scaled to the number of container instances needed to handle all incoming requests
- The number of instances scheduled is affected by the CPU needed to process requests, the concurrency setting, and the maximum container instances setting
- It may be necessary to limit the total number of container instances that can be started, for cost control reasons, or for better compatibility with other resources used by the service
- For example, a Cloud Run service might interact with a database that can only handle a certain number of concurrent open connections
- The maximum container instances setting can be used to limit the total number of instances that can be started in parallel
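- A minimal sketch of capping maximum instances programmatically, assuming the google-cloud-run (run_v2) Admin API client library; the project, region, service name, and limit below are placeholders, and the same setting can also be applied at deploy time
```python
# Sketch: cap a service's maximum container instances via the Cloud Run Admin
# API (assumes `pip install google-cloud-run`; resource names are placeholders).
from google.cloud import run_v2

client = run_v2.ServicesClient()

name = "projects/my-project/locations/us-central1/services/my-service"
service = client.get_service(name=name)

# Limit how many instances may run in parallel, e.g. to stay within the
# connection limit of a backing database.
service.template.scaling.max_instance_count = 10

operation = client.update_service(service=service)
operation.result()  # wait for the new revision to roll out
```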
-
Maximum instances and traffic spikes
- A revision scales up by creating new instances to handle incoming traffic load
- When a maximum instances limit is set, there may be insufficient instances to meet traffic load
- Incoming requests queue for up to 60 seconds
- During the 60 second window, if an instance finishes processing requests, it becomes available to process queued requests
- If no instances become available during the 60 second window, the request fails with a 429 error code on Cloud Run (fully managed); a retry sketch follows this list
- The maximum instances limit is an upper limit
- Setting a high limit does not mean that a revision will scale up to the specified number of container instances
- Setting a high limit only means that the number of container instances at any point in time should not exceed the limit
- During rapid traffic surges, Cloud Run may, for a short period of time, create slightly more container instances than the specified max instances value
- If a service cannot tolerate a temporary increase in instances beyond the max instances value, factor in a safety margin and set a lower max instances value
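- Because requests that cannot be scheduled within the 60 second window are rejected with HTTP 429, callers should back off and retry; a minimal client-side sketch, assuming the `requests` library and a placeholder service URL
```python
# Sketch: retry a Cloud Run request rejected with 429 because the
# max-instances limit left no capacity (the URL is a placeholder).
import time
import requests

def call_with_backoff(url, max_attempts=5):
    delay = 1.0
    for attempt in range(max_attempts):
        response = requests.get(url, timeout=70)
        if response.status_code != 429:
            return response
        time.sleep(delay)  # too many queued requests: wait, then retry
        delay *= 2
    return response

print(call_with_backoff("https://my-service-abc123-uc.a.run.app/").status_code)
```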
-
Idle instances and minimizing cold starts
- Users are only billed when an instance is handling a request
- Cloud Run does not always immediately shut down instances once they have handled all requests
- To minimize the impact of cold starts, Cloud Run may keep some instances idle
- Idle instances are ready to handle requests in case of a sudden traffic spike
- An idle container instance may persist resources, such as open database connections
- On Cloud Run (fully managed), CPU is not available to an idle instance
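- Because a warm instance can survive between requests, state such as a database connection can be created once and reused; a minimal sketch, assuming Flask, with sqlite3 standing in for a real external database client
```python
# Sketch: create a connection once per instance and reuse it across requests,
# so a warm instance does not repeat the setup cost. sqlite3 is only a
# stand-in for a real client such as a Cloud SQL connection pool.
import sqlite3
from flask import Flask

app = Flask(__name__)
_conn = None  # lives as long as the container instance does

def get_conn():
    global _conn
    if _conn is None:
        # In a real service this would open a connection (or pool) to an
        # external database.
        _conn = sqlite3.connect(":memory:", check_same_thread=False)
    return _conn

@app.route("/")
def handler():
    value = get_conn().execute("SELECT 1").fetchone()[0]
    return str(value)
```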
-
Deployments
- When a new revision is deployed, Cloud Run gradually migrates traffic from the old revision to the new one
- The maximum instances limit set for each revision may be temporarily exceeded during the period after deployment
-
Domains
- By default, Cloud Run for Anthos on Google Cloud uses example.com as the base domain, where the fully qualified domain name for a service is formatted as http://{route}.{namespace}.example.com
- This default domain doesn't work "out of the box" as a URL for incoming requests
- During development and testing, change the default domain to use a free wildcard DNS test site such as nip.io or sslip.io
- Alternatively, change the default domain to a custom domain, and set up domain registrar records to support DNS wildcards
-
Platform
- Cloud Run provides the flexibility to run services on a fully managed environment or on Anthos
- Cloud Run can easily deploy into an Anthos GKE cluster
- Cloud users can easily switch from Cloud Run (fully managed) to Cloud Run for Anthos or vice versa, all without changing application code
- The Cloud Run (fully managed) platform allows you to deploy stateless containers without having to worry about the underlying infrastructure
- Workloads are automatically scaled up or down to zero depending on the traffic to the app
- Only pay when the app is running, billed to the nearest 100 milliseconds
- Cloud Run for Anthos abstracts away complex Kubernetes concepts, allowing developers to easily leverage the benefits of Kubernetes and serverless together
- It provides access to custom machine types, additional networking support, and Cloud Accelerators
- Supports running workloads on-premises or in the cloud
-
Concurrency
- Each revision is automatically scaled to the number of container instances needed to handle all incoming requests
- When more container instances are processing requests, more CPU and memory will be used, resulting in higher costs
- When new container instances need to be started, requests might take more time to be processed, decreasing the performance of your service
- Cloud Run provides a concurrency setting that specifies the maximum number of requests that can be processed simultaneously by a given container instance
- By default Cloud Run container instances can receive many requests at the same time (up to a maximum of 80)
- In comparison, Functions-as-a-Service (FaaS) solutions like Cloud Functions have a fixed concurrency of 1
- The concurrency setting can be changed at any time
- The specified concurrency value is a maximum and Cloud Run might not send as many requests to a given container instance if the CPU of the instance is already highly utilized
- Users can limit concurrency so that only one request at a time will be sent to each running container instance
- Consider limiting concurrency to 1 where each request uses most of the available CPU or memory
- Consider limiting concurrency to 1 where the container image is not designed to handle multiple requests at the same time, for example, when the container relies on global state that two requests cannot share (see the sketch after this list)
- A concurrency of 1 is likely to negatively affect scaling performance, because many container instances will have to start up to handle a spike in incoming requests
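- A minimal sketch of the global-state concern, assuming Flask: the counter below is shared by every request handled by an instance, so with concurrency above 1 it must be protected by a lock, otherwise concurrency should be limited to 1
```python
# Sketch: instance-level state shared by concurrent requests. With the default
# concurrency (up to 80 requests at once), unsynchronized access to this global
# would be a race; the lock makes it safe without dropping concurrency to 1.
import threading
from flask import Flask

app = Flask(__name__)
_lock = threading.Lock()
_requests_handled = 0  # shared across all requests on this instance

@app.route("/")
def handler():
    global _requests_handled
    with _lock:
        _requests_handled += 1
        count = _requests_handled
    return f"requests handled by this instance: {count}"
```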
-
Connecting
- Some Google Cloud services work well with Cloud Run, and others are not yet supported for the fully managed version of Cloud Run
- Cloud Run for Anthos on Google Cloud can use any service that Google Kubernetes Engine can use
- Cloud Run (fully managed) can be used with the supported Google Cloud services using the client libraries they provide
- No need to provide credentials manually inside Cloud Run (fully managed) container instances when using the Google Cloud client libraries
- Cloud Run (fully managed) uses a default runtime service account that has the Project > Editor role, which means it is able to call all Google Cloud APIs and has read and write access to all resources in the Google Cloud project
- Users can restrict Cloud Run (fully managed) by assigning a service account with a minimal set of permissions to Cloud Run services
- If a Cloud Run service only reads data from Firestore, assign it a service account that has only the Firestore User IAM role
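- A minimal sketch of reading Firestore from a Cloud Run (fully managed) service with the Google Cloud client library: no credentials are passed explicitly, since the client picks up the revision's runtime service account, which under least privilege would hold only the Firestore User role (collection and document names are placeholders)
```python
# Sketch: Firestore read from inside a Cloud Run container (assumes
# `pip install google-cloud-firestore`). No key file is needed: the client
# library authenticates as the revision's runtime service account.
from google.cloud import firestore

db = firestore.Client()

def read_greeting():
    # Placeholder collection/document names.
    snapshot = db.collection("greetings").document("hello").get()
    return snapshot.to_dict() if snapshot.exists else None

print(read_greeting())
```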
-
TLS
- Automatic TLS certificates feature provides TLS certificates and enables HTTPS connections using those TLS certificates
- Automatic TLS feature is turned off by default for Cloud Run for Anthos on Google Cloud
- Automatic TLS feature does not apply to Cloud Run (fully managed), which has automatic TLS certificates built in
- To use the automatic TLS feature, the service must be exposed externally
- Automatic TLS feature cannot be used for a cluster-local service
- Automatic TLS feature only works with the Istio that is automatically installed when the cluster is set up for Cloud Run
- Automatic TLS feature does not work with the Istio addon
- Automatic TLS feature uses Let's Encrypt
- Let's Encrypt has an initial quota limit of 50 TLS certificates per week per registered domain
-
Resource models
- The service is the main resource of Cloud Run
- Each service is located in a specific GCP region (Cloud Run fully managed) or in a specific cluster and namespace (Cloud Run for Anthos)
- Services are automatically replicated across multiple zones
- A GCP project can run many services in different regions or GKE clusters
- Each service exposes a unique endpoint
- Cloud Run automatically scales the underlying infrastructure to handle incoming requests
- Each deployment to a service creates a revision (see the sketch after this list)
- A revision consists of a specific container image, along with environment settings
- Environment settings include environment variables, memory limits, and the concurrency value
- Revisions are immutable: once a revision has been created, it cannot be modified
- When a container image is deployed to a new Cloud Run service, the first revision is created
- Requests are automatically routed as soon as possible to the latest healthy service revision
- Each revision receiving requests is automatically scaled to the number of container instances needed to handle all these requests
- A container instance can receive many requests at the same time
- The concurrency setting specifies the maximum number of requests that can be sent in parallel to a given container instance
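- A minimal sketch of the service/revision relationship, assuming the google-cloud-run (run_v2) client library: it lists the immutable revisions that deployments to one service have created (resource names are placeholders)
```python
# Sketch: list the revisions created by deployments to a service (assumes
# `pip install google-cloud-run`; the parent resource name is a placeholder).
from google.cloud import run_v2

client = run_v2.RevisionsClient()
parent = "projects/my-project/locations/us-central1/services/my-service"

for revision in client.list_revisions(parent=parent):
    # Each revision pins a container image plus its environment settings.
    print(revision.name, revision.containers[0].image)
```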