Overview
- Used to configure pipelines for batch and streaming data processing
- Suitable where data arrives in real time
- Pipelines can read data from a BigQuery table
- Pipelines can transform and write output to Cloud Storage
- Pipeline transforms can be map operations or reduce operations
- Can be used to build expressive pipelines
- Each step in the pipeline can be elastically scaled
- There is no need to launch and manage a cluster
- Provides compute resources needed on demand
- Has automated, optimized work partitioning built in, which dynamically rebalances lagging work and reduces the need to worry about hot keys
- Hot keys are situations where a disproportionately large share of the input gets mapped to the same key, so a single worker receives most of the work
- There is no need to spin up a cluster or to size instances
- Fully automates the management of processing resources required
- Frees users from most manual performance tuning
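
The map and reduce transforms and the hot-key problem noted above can be illustrated with a minimal plain-Python sketch. It does not use the service's SDK. The input rows, field names (`user`, `bytes`), and helper functions are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical input rows, standing in for records read from a source table.
rows = [
    {"user": "a", "bytes": 10},
    {"user": "b", "bytes": 5},
    {"user": "a", "bytes": 7},
    {"user": "a", "bytes": 3},
]


def map_step(row):
    """Map: turn one input row into a (key, value) pair."""
    return row["user"], row["bytes"]


def reduce_step(pairs):
    """Reduce: combine all values that share a key into one total."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def key_share(pairs):
    """Fraction of all pairs that landed on each key.

    A very large share for one key hints at a hot key.
    """
    counts = defaultdict(int)
    for key, _ in pairs:
        counts[key] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


pairs = [map_step(row) for row in rows]
totals = reduce_step(pairs)
shares = key_share(pairs)

print(totals)  # per-key sums produced by the reduce step
print(shares)  # share of records per key
```

In this toy data, key `a` receives three of the four records. In a distributed run, the worker that handles `a` would lag behind the others, which is the situation dynamic work rebalancing is meant to ease.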