- Integrated managed services for monitoring, logging, and tracing applications and systems running on Google Cloud and beyond.
-
Cloud Monitoring
- Provides visibility into the performance, uptime, and overall health of cloud-powered applications
-
Platform, system, and application metrics
- Ingests data: metrics, events, metadata
- Generates insights through dashboards, charts, and alerts
-
SLI (service level indicator)
- A metric that reflects how well an SLO is being met. Defines what you want to measure (e.g., latency, throughput, error rate)
-
SLO (service level objective)
- An agreed-upon target for a measurable attribute of a service, as specified in an SLA
-
SLA (service level agreement)
- An agreement between a provider of a service and a customer using the service
- e.g., Maintain an [error rate (SLI)] of less than [0.3% (SLO)] for the billing system
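
The relationship between the three terms can be sketched in a few lines of Python. This is a minimal illustration, not a Cloud Monitoring API: the names `error_rate` and `meets_slo` and the request counts are hypothetical, and only the 0.3% target comes from the example above.

```python
# Hypothetical sketch: compute an error-rate SLI and compare it with an SLO.
# Only the 0.3% target comes from the notes; names and counts are made up.

SLO_MAX_ERROR_RATE = 0.003  # 0.3% target taken from the SLA example


def error_rate(failed_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests that failed in the measurement window."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return failed_requests / total_requests


def meets_slo(sli: float, slo: float = SLO_MAX_ERROR_RATE) -> bool:
    """The SLO is met when the measured SLI stays below the target."""
    return sli < slo


if __name__ == "__main__":
    sli = error_rate(failed_requests=12, total_requests=10_000)
    print(f"error rate = {sli:.2%}, SLO met: {meets_slo(sli)}")
```

The SLI is the measurement, the SLO is the threshold it is compared against, and the SLA is the agreement that records that threshold for a given service.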
-
Cloud Logging
- Fully managed, real-time log management with storage, search, analysis and alerting at exabyte scale
-
Collect
- Cloud events, configuration changes, and data from customer services
- Logs at various levels of the resource hierarchy
-
Analyze
- Log data in real time with the integrated Logs Explorer
- Exported logs from Cloud Storage or BigQuery
-
Export
- Export to Cloud Storage, Pub/Sub, or BigQuery
- Logs-based metrics for augmented monitoring
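
A logs-based metric is, in essence, a value derived from the log entries that match a filter, most simply a count. The sketch below illustrates that idea with plain Python dictionaries. It does not call the Cloud Logging API; the entry fields (`severity`, `resource`, `message`) and the function `count_matching` are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical entry shape: {"severity": str, "resource": str, "message": str}
LogEntry = dict


def count_matching(entries: Iterable[LogEntry],
                   matches: Callable[[LogEntry], bool]) -> Counter:
    """Count entries that satisfy `matches`, grouped by resource."""
    counts: Counter = Counter()
    for entry in entries:
        if matches(entry):
            counts[entry["resource"]] += 1
    return counts


# Made-up sample data, not real log output.
sample_entries = [
    {"severity": "ERROR", "resource": "billing", "message": "timeout"},
    {"severity": "INFO", "resource": "billing", "message": "ok"},
    {"severity": "ERROR", "resource": "checkout", "message": "bad request"},
    {"severity": "ERROR", "resource": "billing", "message": "timeout"},
]

error_counts = count_matching(sample_entries, lambda e: e["severity"] == "ERROR")
print(dict(error_counts))
```

A count like this can then be charted or alerted on in the same way as any other metric, which is what makes it useful for augmenting monitoring.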
-
Retain
- Data access and service logs for 30 days; admin activity logs for 400 days
- Longer-term retention by exporting to Cloud Storage or BigQuery
-
Error Reporting
- Counts, analyzes, and aggregates the errors in your running cloud services
- Error notifications
- Error dashboard
- Available for App Engine, Apps Script, Compute Engine, Cloud Functions, Cloud Run, GKE, and Amazon EC2
- Processes errors from Go, Java, .NET, Node.js, PHP, Python, and Ruby applications
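
The idea of counting and aggregating errors can be sketched without any cloud dependency. The example below groups exceptions by type and by the function that raised them. That grouping rule is a simplification and not necessarily how Error Reporting groups errors; `ErrorAggregator`, its methods, and `parse_amount` are hypothetical names.

```python
import traceback
from collections import Counter


class ErrorAggregator:
    """Hypothetical in-memory aggregator: groups errors by (type, function)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, exc: BaseException) -> None:
        # Use the innermost Python frame of the traceback as the location.
        frames = traceback.extract_tb(exc.__traceback__)
        location = frames[-1].name if frames else "<unknown>"
        self.counts[(type(exc).__name__, location)] += 1

    def top(self, n: int = 5):
        """Return the n most frequent error groups."""
        return self.counts.most_common(n)


def parse_amount(text: str) -> float:
    return float(text)  # raises ValueError on malformed input


aggregator = ErrorAggregator()
for raw in ["10.5", "abc", "", "7"]:
    try:
        parse_amount(raw)
    except ValueError as exc:
        aggregator.record(exc)

print(aggregator.top())
```

Grouping by a stable key keeps repeated occurrences of the same failure in one bucket, so a dashboard or notification can report one error group with a count instead of many individual events.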
-
Cloud Trace
- A distributed tracing system that collects latency data from your applications and displays it in the Google Cloud console
- Displays data in near real-time
- Latency reporting
- Per-URL latency sampling
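
Per-URL latency reporting amounts to grouping latency samples by request path and summarizing each group. The following sketch shows that aggregation on made-up samples. It is not the Cloud Trace API; `latency_by_url` and the sample values are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median


def latency_by_url(samples):
    """Group (url, latency_ms) samples and summarize each URL."""
    grouped = defaultdict(list)
    for url, latency_ms in samples:
        grouped[url].append(latency_ms)
    return {
        url: {
            "count": len(values),
            "median_ms": median(values),
            "max_ms": max(values),
        }
        for url, values in grouped.items()
    }


# Made-up latency samples, not real trace data.
samples = [
    ("/checkout", 120.0),
    ("/checkout", 340.0),
    ("/checkout", 95.0),
    ("/health", 4.0),
    ("/health", 6.0),
]

report = latency_by_url(samples)
for url, stats in sorted(report.items()):
    print(url, stats)
```

Summarizing per URL rather than across the whole service makes it easier to see which endpoint is responsible for a latency regression.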
-
Collects latency data from
- App Engine
- Google HTTP(S) load balancers
- Applications instrumented with the Cloud Trace SDKs
-
Cloud Profiler
- Continuously analyzes the performance of CPU- or memory-intensive functions executed across an application
- Uses statistical techniques and extremely low-impact instrumentation
- Runs across all production instances
- Developers can analyze applications running anywhere (Google Cloud, other cloud platforms, on-premises) with support for Java, Go, Node.js, and Python