1. Integrated monitoring, logging, and tracing managed services for applications and systems running on Google Cloud and beyond.
  2. Cloud Monitoring
    1. Provides visibility into the performance, uptime, and overall health of cloud-powered applications
    2. Platform, system, and application metrics
      1. Ingests data: metrics, events, metadata
      2. Generates insights through dashboards, charts, and alerts
    3. SLI
      1. A metric that reflects how well an SLO is being met. Defines what you want to measure (e.g., latency, throughput, error rate)
    4. SLO
      1. An agreed-upon target for a measurable attribute of a service that is specified in an SLA
    5. SLA
      1. An agreement between a provider of a service and a customer using the service
      2. e.g., Maintain an [error rate (SLI)] of less than [0.3% (SLO)] for the billing system
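
The billing example above can be worked through numerically. The following is a minimal sketch, not part of any Google Cloud API: the `Slo` class, the `error_rate` and `meets_slo` helpers, and the request counts are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO record: a named upper bound on an error-rate SLI."""
    name: str
    max_error_rate: float  # expressed as a fraction, so 0.003 means 0.3%


def error_rate(failed: int, total: int) -> float:
    """Compute the SLI: the fraction of requests that failed."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return failed / total


def meets_slo(failed: int, total: int, slo: Slo) -> bool:
    """The SLO is met only when the measured SLI is strictly below the target."""
    return error_rate(failed, total) < slo.max_error_rate


# Assumed target taken from the example: error rate of less than 0.3%.
billing_slo = Slo(name="billing-error-rate", max_error_rate=0.003)

if __name__ == "__main__":
    # Illustrative counts only: 2 failures out of 1000 requests.
    print(meets_slo(2, 1000, billing_slo))
```

The strict `<` comparison mirrors the wording "less than 0.3%", so a measured rate of exactly 0.3% does not satisfy the target in this sketch.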
  3. Cloud Logging
    1. Fully managed, real-time log management with storage, search, analysis and alerting at exabyte scale
    2. Collect
      1. Cloud events, configuration changes, and data from customer services
      2. Logs at various levels of the resource hierarchy
    3. Analyze
      1. Log data in real time with the integrated Logs Explorer
      2. Exported logs from Cloud Storage or BigQuery
    4. Export
      1. Export to Cloud Storage, Pub/Sub, or BigQuery
      2. Logs-based metrics for augmented monitoring
    5. Retain
      1. Data Access and service logs for 30 days by default; Admin Activity logs for 400 days
      2. Longer-term retention by exporting to Cloud Storage or BigQuery
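
As a rough illustration of the "Collect" step, an application can emit structured log lines that a logging agent may pick up. The sketch below uses only the Python standard library and writes one JSON object per line with `severity` and `message` fields. Whether those fields are parsed into a log entry depends on the runtime and agent configuration, which is an assumption here. `JsonFormatter` and `build_logger` are hypothetical helper names.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            # Python level names (DEBUG, INFO, WARNING, ERROR, CRITICAL)
            # are used as the severity value.
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        return json.dumps(entry)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of the root logger
    return logger


if __name__ == "__main__":
    buffer = io.StringIO()
    log = build_logger("billing", buffer)
    log.warning("charge failed for %s", "order-1")
    print(buffer.getvalue(), end="")
```

In a deployed service the stream would typically be `sys.stdout` rather than an in-memory buffer; the buffer is used here only so the output can be inspected.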
  4. Error Reporting
    1. Counts, analyzes, and aggregates the errors in your running cloud services
    2. Error notifications
    3. Error dashboard
    4. Available for App Engine, Apps Script, Compute Engine, Cloud Functions, Cloud Run, GKE, and Amazon EC2
    5. Processes errors from Go, Java, .NET, Node.js, PHP, Python, and Ruby applications
  5. Cloud Trace
    1. A distributed tracing system that collects latency data from your applications and displays it in the Google Cloud console
    2. Displays data in near real-time
    3. Latency reporting
    4. Per-URL latency sampling
    5. Collects latency data from
      1. App Engine
      2. Google HTTP(S) load balancers
      3. Applications instrumented with the Cloud Trace SDKs
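
To make per-URL latency reporting concrete, the sketch below groups latency samples by URL and reports nearest-rank percentiles. It is a plain-Python illustration of the idea, not the Cloud Trace API. The `percentile` and `latency_report` functions and the sample values are hypothetical.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_report(samples):
    """Summarize (url, latency_ms) samples into per-URL count, p50, and p95."""
    by_url = defaultdict(list)
    for url, latency_ms in samples:
        by_url[url].append(latency_ms)
    return {
        url: {
            "count": len(latencies),
            "p50": percentile(latencies, 50),
            "p95": percentile(latencies, 95),
        }
        for url, latencies in by_url.items()
    }


if __name__ == "__main__":
    # Illustrative samples only: (URL path, latency in milliseconds).
    samples = [
        ("/checkout", 120.0),
        ("/checkout", 80.0),
        ("/checkout", 400.0),
        ("/checkout", 100.0),
        ("/health", 5.0),
    ]
    for url, stats in latency_report(samples).items():
        print(url, stats)
```

Reporting a high percentile alongside the median helps surface slow outliers, such as the 400 ms request above, that an average would hide.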
  6. Cloud Profiler
    1. Continuously analyzes the performance of CPU- or memory-intensive functions executed across an application
    2. Uses statistical techniques and extremely low-impact instrumentation
    3. Runs across all production instances
    4. Developers can analyze applications running anywhere (Google Cloud, other cloud platforms, on-premises) with support for Java, Go, Node.js, and Python
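
The statistical approach mentioned above can be illustrated with a toy aggregation: given periodic call-stack samples, estimate how much time each function accounts for. This is a simplified sketch of sampling-based profiling in general, not Cloud Profiler's implementation. The `summarize` function and the sample stacks are hypothetical.

```python
from collections import Counter


def summarize(stack_samples):
    """Estimate per-function time shares from sampled call stacks.

    Each sample is a tuple of function names ordered from outermost caller
    to the function executing when the sample was taken.
    "self" is the share of samples where the function was executing;
    "total" is the share of samples where it appeared anywhere on the stack.
    """
    self_counts = Counter()
    total_counts = Counter()
    sample_count = 0
    for stack in stack_samples:
        if not stack:
            continue  # skip empty samples
        sample_count += 1
        self_counts[stack[-1]] += 1
        for function in set(stack):  # count recursive frames once per sample
            total_counts[function] += 1
    if sample_count == 0:
        return {}
    return {
        function: {
            "self": self_counts[function] / sample_count,
            "total": total_counts[function] / sample_count,
        }
        for function in total_counts
    }


if __name__ == "__main__":
    # Illustrative samples only.
    samples = [
        ("main", "handle", "parse"),
        ("main", "handle", "parse"),
        ("main", "handle", "render"),
        ("main", "idle"),
    ]
    for function, shares in sorted(summarize(samples).items()):
        print(function, shares)
```

Because only a small number of stacks are sampled rather than every call being instrumented, this style of profiling keeps overhead low, at the cost of producing estimates rather than exact timings.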