Is a managed service for executing a wide variety of data processing patterns
Serverless, fully managed data processing
Batch and stream processing with autoscale
Open source programming using Apache Beam
Dataprep
Is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning
Serverless, works at any scale
Suggests ideal data transformations
Focus on data analysis
Integrated partner service operated by Trifacta
Dataproc
Fully managed cloud service for running Apache Spark and Apache Hadoop clusters
Low cost (per-second billing, preemptible instances)
Super fast to start, scale, and shut down
Integrated with other Google Cloud services (BigQuery, Cloud Storage, Cloud Bigtable)
Dataproc or Dataflow
Both can be used for data processing
If you have dependencies on specific tools or packages in the Apache Hadoop or Spark ecosystem, use Dataproc
If you prefer a hands-on or DevOps approach to operations, use Dataproc
If you prefer a hands-off or serverless approach, use Dataflow
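The rules of thumb above can be sketched as a small decision helper. This is only an illustration of the notes, not an official selection tool; the `Workload` class, its field names, and `choose_service` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload (field names are illustrative)."""
    needs_hadoop_or_spark_tools: bool = False  # depends on Hadoop/Spark tools or packages
    prefers_hands_on_ops: bool = False         # team wants a hands-on / DevOps approach


def choose_service(workload: Workload) -> str:
    """Apply the rules of thumb from the notes, in order."""
    if workload.needs_hadoop_or_spark_tools:
        return "Dataproc"
    if workload.prefers_hands_on_ops:
        return "Dataproc"
    # Hands-off / serverless preference falls through to Dataflow.
    return "Dataflow"


if __name__ == "__main__":
    print(choose_service(Workload(needs_hadoop_or_spark_tools=True)))
    print(choose_service(Workload(prefers_hands_on_ops=True)))
    print(choose_service(Workload()))
```

The order of the checks mirrors the notes: a hard dependency on Hadoop or Spark tooling decides the question first, and the operational preference is only consulted afterwards.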
Data Catalog
Automatically catalogs metadata from Google Cloud sources such as BigQuery, Vertex AI, Pub/Sub, Spanner, and Bigtable
Indexes table and fileset metadata from Cloud Storage
Three main functions
Searching for data entries to which you have access
Tagging data entries with metadata
Providing column-level security for BigQuery tables
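To make the three functions concrete, here is a toy in-memory model in plain Python. It does not use or imitate the real Data Catalog API; `Entry`, `ToyCatalog`, and all method and field names are hypothetical, and the access model (a set of readers per entry, a set of allowed users per restricted column) is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical catalog entry; not the real Data Catalog data model."""
    name: str
    readers: set[str]                                   # users allowed to see the entry
    tags: dict[str, str] = field(default_factory=dict)  # free-form metadata tags
    # column name -> users allowed to read that column (assumed model)
    restricted_columns: dict[str, set[str]] = field(default_factory=dict)


class ToyCatalog:
    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}

    def add(self, entry: Entry) -> None:
        self.entries[entry.name] = entry

    def search(self, user: str, text: str) -> list[str]:
        """Function 1: return only matching entries the user can access."""
        return sorted(
            e.name for e in self.entries.values()
            if user in e.readers and text.lower() in e.name.lower()
        )

    def tag(self, name: str, key: str, value: str) -> None:
        """Function 2: attach metadata to an entry."""
        self.entries[name].tags[key] = value

    def visible_columns(self, user: str, name: str, columns: list[str]) -> list[str]:
        """Function 3: hide restricted columns from users not on their allow list."""
        entry = self.entries[name]
        return [
            c for c in columns
            if c not in entry.restricted_columns or user in entry.restricted_columns[c]
        ]


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.add(Entry("sales.orders", readers={"ana", "bo"},
                      restricted_columns={"card_number": {"ana"}}))
    catalog.tag("sales.orders", "owner", "finance")
    print(catalog.search("bo", "orders"))
    print(catalog.visible_columns("bo", "sales.orders", ["id", "card_number"]))
```

In the real service, column-level security for BigQuery is expressed through policy tags rather than per-column user lists; the allow-list here only stands in for that idea.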