Cloud Dataflow

Overview
1. Used to configure pipelines for batch and streaming data processing
2. Well suited to data that arrives in real time
3. Pipelines can read data from a BigQuery table
4. Pipelines can transform the data and write the output to Cloud Storage
5. Pipeline transforms can be map operations or reduce operations
6. Can be used to build expressive pipelines
7. Each step in the pipeline can be elastically scaled
8. There is no need to launch and manage a cluster
9. Provides compute resources needed on demand
10. Has automated, optimized work partitioning built in, which dynamically rebalances lagging work and reduces the need to worry about hot keys
11. Hot keys are situations where disproportionately large chunks of input get mapped to the same key, and therefore to the same worker
12. There is no need to spin up a cluster or to size instances
13. Fully automates the management of processing resources required
14. Frees users from most manual performance tuning
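
The map and reduce transforms and the hot key problem noted above can be illustrated with a minimal sketch in plain Python. This is not the Dataflow API or its SDK; it only mimics the ideas on one machine. The record format, the names parse_record, sum_per_key and key_share, and the 0.5 threshold are all assumptions made up for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical input: one "user,bytes" line per event (illustrative data only).
LINES = [
    "alice,120", "bob,300", "alice,80",
    "alice,50", "carol,10", "alice,200",
]

def parse_record(line):
    """Map operation: turn one input line into a (key, value) pair."""
    user, num_bytes = line.split(",")
    return user, int(num_bytes)

def sum_per_key(pairs):
    """Reduce operation: combine all values that share a key into one total."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def key_share(pairs):
    """Fraction of all elements carried by each key."""
    counts = Counter(key for key, _ in pairs)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}

# Map step: each element is transformed independently of the others.
mapped = [parse_record(line) for line in LINES]

# Reduce step: every value for a given key has to be brought together.
totals = sum_per_key(mapped)

# A key that carries most of the input is a candidate hot key, because
# all of its values end up in the same place during the reduce step.
# The 0.5 threshold is an arbitrary choice for this sketch.
shares = key_share(mapped)
hot_keys = [key for key, share in shares.items() if share > 0.5]

print(totals)    # {'alice': 450, 'bob': 300, 'carol': 10}
print(hot_keys)  # ['alice']
```

Here "alice" accounts for four of the six records, so a reduce step that groups by user would concentrate most of the work on that one key. This is the kind of skew that the dynamic work rebalancing in item 10 is meant to ease.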