Google Cloud Managed Services

Dataflow
1. Is a managed service for executing a wide variety of data processing patterns
  1. Serverless, fully managed data processing
  2. Batch and stream processing with autoscaling
  3. Open source programming using Apache Beam
Dataprep
1. Is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning
  1. Serverless, works at any scale
  2. Suggests ideal data transformations
  3. Focus on data analysis
  4. Integrated partner service operated by Trifacta
Dataproc
1. Fully managed cloud service for running Apache Spark and Apache Hadoop clusters
  1. Low cost (per-second billing, preemptible instances)
  2. Super fast to start, scale, and shut down
  3. Integrated with other Google Cloud services (BigQuery, Cloud Storage, Cloud Bigtable)
Dataproc or Dataflow
1. Can both be used for data processing
  1. If you have dependencies on specific tools or packages in the Apache Hadoop or Spark ecosystem, use Dataproc
  2. If you prefer a hands-on or DevOps approach to operations, use Dataproc
  3. If you prefer a hands-off or serverless approach, use Dataflow
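
The rules of thumb above can be sketched as a small decision helper. This is only an illustration of the notes, not an official selection tool; the function name `choose_service` and its two boolean parameters are assumptions made up for this example.

```python
def choose_service(needs_hadoop_spark_tools: bool, prefers_hands_on_ops: bool) -> str:
    """Return "Dataproc" or "Dataflow" following the rules of thumb in these notes.

    Both parameters are hypothetical names introduced for this sketch.
    """
    # Rule 1: dependencies on Hadoop/Spark tools or packages point to Dataproc.
    if needs_hadoop_spark_tools:
        return "Dataproc"
    # Rule 2: a hands-on or DevOps approach to operations points to Dataproc.
    if prefers_hands_on_ops:
        return "Dataproc"
    # Rule 3: otherwise, a hands-off or serverless approach points to Dataflow.
    return "Dataflow"


if __name__ == "__main__":
    # An existing Spark job with custom packages.
    print(choose_service(needs_hadoop_spark_tools=True, prefers_hands_on_ops=False))
    # A new pipeline with no cluster management wanted.
    print(choose_service(needs_hadoop_spark_tools=False, prefers_hands_on_ops=False))
```

In this sketch the Hadoop/Spark dependency is checked first, since an existing tool dependency usually decides the question regardless of operational preference.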
Data Catalog
1. Automatically catalogs metadata from Google Cloud sources such as BigQuery, Vertex AI, Pub/Sub, Spanner, and Bigtable
2. Indexes table and fileset metadata from Cloud Storage
3. Three main functions
  1. Searching for data entries to which you have access
  2. Tagging data entries with metadata
  3. Providing column-level security for BigQuery tables