1. Sort
    1. Sorts the records based on the keys
    2. Important parameters
      1. keys
      2. max-core
        1. Default 100 MB
    3. Runtime behavior
      1. Reads the records from all the flows connected to the in port until it reaches the number of bytes specified in the max-core parameter
      2. Sorts the records and writes the results to a temporary file on disk
      3. Repeats this procedure until it has read all records
      4. Merges all the temporary files, maintaining the sort order
      5. Writes the result to the out port
    4. Note
      1. When connecting a fan-in or all-to-all flow to the in port of a Sort, you do not need to use a Gather because Sort can gather internally on its in port.
    5. Performance Note
      1. Sort is relatively expensive in terms of computing resources because it writes files to disk, thus breaking pipeline parallelism. Therefore, you should place it in a graph so that it processes the smallest number of records possible.
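
The runtime behavior listed above is the classic external merge sort pattern. The following is a minimal Python sketch of that pattern, not the component's actual implementation. The function names, the use of `pickle` for temporary files, and the use of `sys.getsizeof` as a rough stand-in for record size are all assumptions made for illustration.

```python
import heapq
import pickle
import sys
import tempfile


def _write_run(chunk, key):
    """Sort one in-memory chunk and spill it to a temporary file."""
    chunk.sort(key=key)
    f = tempfile.TemporaryFile()
    for rec in chunk:
        pickle.dump(rec, f)
    f.seek(0)
    return f


def _read_run(f):
    """Stream records back from one sorted temporary file."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            break
    f.close()


def external_sort(records, key, max_core=100 * 1024 * 1024):
    """Yield records in key order while holding about max_core bytes in memory."""
    runs, chunk, used = [], [], 0
    for rec in records:
        chunk.append(rec)
        used += sys.getsizeof(rec)  # assumption: approximates the record's size
        if used >= max_core:        # memory budget reached: sort and spill
            runs.append(_write_run(chunk, key))
            chunk, used = [], 0
    if chunk:                       # spill whatever is left
        runs.append(_write_run(chunk, key))
    # Merge all sorted runs, maintaining the sort order.
    yield from heapq.merge(*(_read_run(f) for f in runs), key=key)
```

Calling `list(external_sort(rows, key=lambda r: r[0], max_core=1))` forces a spill after every record, which makes the merge phase easy to observe on small inputs.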
  2. Sort within groups
    1. Refines the sorting of records that are already sorted on one key specifier: it sorts the records within the groups formed by the first sort according to a second key specifier.
    2. Parameters
      1. major-key
        1. The component assumes the input is already ordered on this key
      2. minor-key
        1. Sorts the records within each group using this key
      3. max-core
        1. Default 10 MB
      4. allow-unsorted
        1. Specifies whether the input may be unsorted on major-key (true) or must already be sorted on it (false)
    3. Runtime behavior
      1. Sort within Groups assumes input records are sorted according to the major-key parameter.
      2. Sorts the records in the group according to the minor-key parameter
      3. Writes the results to the out port
      4. Repeats this procedure with the next group
      5. If the input is not sorted on major-key and allow-unsorted is false, the component fails with an out-of-order records error
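
The group-by-group behavior above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the component's implementation: `sort_within_groups` and its arguments are hypothetical names, the major key is assumed to be ascending, a change in the major-key value is treated as a group boundary, and the sketch ignores max-core by holding each group fully in memory.

```python
from itertools import groupby


def sort_within_groups(records, major_key, minor_key, allow_unsorted=False):
    """Sort each run of records sharing a major key by the minor key."""
    previous = None
    seen_any = False
    # groupby starts a new group whenever the major-key value changes.
    for major, group in groupby(records, key=major_key):
        if not allow_unsorted and seen_any and major < previous:
            # Input violated the assumed major-key order.
            raise ValueError(f"out-of-order major key: {major!r} after {previous!r}")
        previous, seen_any = major, True
        # Sort only this group, emit it, then move on to the next group.
        yield from sorted(group, key=minor_key)
```

For example, `list(sort_within_groups(rows, lambda r: r[0], lambda r: r[1]))` leaves the major-key order untouched and reorders records only inside each group.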
  3. Checkpointed sort
    1. Sorts and merges data records, inserting a checkpoint between the sorting and merging phases.
    2. Runtime Behavior
      1. Reads records from the in port until it reaches the number of bytes specified in the max-core parameter.
      2. Sorts those records.
      3. Writes the results to a temporary file on disk.
      4. Repeats this procedure until it processes all the records.
      5. Inserts a checkpoint, saving the sorted temporary files. This checkpoint is very inexpensive in terms of computing resources, since Checkpointed Sort has already saved the temporary files to disk.
      6. Merges all temporary files, maintaining the sort order.
      7. Writes the result to the out port.
      8. Checkpointed Sort stores temporary files in the working directories specified by its layout.
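
One way to picture why the checkpoint is cheap: the sorted runs are already on disk, so recording them only requires writing a small list of file names. The Python sketch below illustrates that idea under several assumptions that do not come from the component itself: the manifest file name, the JSON-lines run format, the `run_size` limit counted in records rather than bytes, and the restart logic are all hypothetical.

```python
import heapq
import json
import os

MANIFEST = "manifest.json"  # hypothetical checkpoint marker file


def _read_run(path):
    """Stream records back from one sorted run file."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)


def sort_phase(records, key, work_dir, run_size):
    """Write sorted runs of at most run_size records into work_dir."""
    paths = []
    for start in range(0, len(records), run_size):
        run = sorted(records[start:start + run_size], key=key)
        path = os.path.join(work_dir, f"run{len(paths)}.jsonl")
        with open(path, "w") as f:
            f.writelines(json.dumps(rec) + "\n" for rec in run)
        paths.append(path)
    return paths


def checkpointed_sort(records, key, work_dir, run_size):
    """Sort, checkpoint, then merge; a rerun skips straight to the merge."""
    manifest = os.path.join(work_dir, MANIFEST)
    if not os.path.exists(manifest):
        paths = sort_phase(records, key, work_dir, run_size)
        # The checkpoint: only the list of run files is written here.
        with open(manifest, "w") as f:
            json.dump(paths, f)
    with open(manifest) as f:
        paths = json.load(f)
    # Merge phase: combine the saved runs, maintaining the sort order.
    return list(heapq.merge(*(_read_run(p) for p in paths), key=key))
```

If the merge phase is interrupted, calling `checkpointed_sort` again with the same `work_dir` reuses the saved runs instead of sorting the input a second time.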
  4. Intrinsic sort component
    1. Find Splitters
      1. Refer to the partition mindmap
    2. Sample
      1. Selects a specified number of data records at random from one or more input flows. Every record in the input flow has the same probability of appearing in the output flow, regardless of its position in the input flow.
      2. Important parameters
        1. sample-size
        2. max-skew
        3. seed
        4. serial-layout
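
The position-independent selection described above is the property that reservoir sampling provides. The sketch below uses that technique (Algorithm R) purely as an illustration; it is not claimed to be how Sample works internally, and it ignores max-skew and serial-layout. The function name `reservoir_sample` is an assumption, while `sample_size` and `seed` mirror the parameters listed above.

```python
import random


def reservoir_sample(records, sample_size, seed=None):
    """Pick sample_size records so that each input record is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    reservoir = []
    for i, rec in enumerate(records):
        if i < sample_size:
            # Fill the reservoir with the first sample_size records.
            reservoir.append(rec)
        else:
            # Keep the new record with probability sample_size / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < sample_size:
                reservoir[j] = rec
    return reservoir
```

Because the input is consumed in a single pass, the total number of records does not need to be known in advance.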
  5. Partition by Key and Sort
    1. Repartitions data records by key values and then sorts the records within each partition; the number of input and output partitions can be different.
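
A minimal Python sketch of the repartition-then-sort idea follows. It assumes hash partitioning on the key, and the function name and in-memory lists are illustrative only, not the component's implementation.

```python
def partition_by_key_and_sort(input_partitions, key, n_out):
    """Redistribute records into n_out partitions by key, then sort each one."""
    out = [[] for _ in range(n_out)]
    for part in input_partitions:
        for rec in part:
            # Records with equal keys always land in the same output partition.
            out[hash(key(rec)) % n_out].append(rec)
    for part in out:
        part.sort(key=key)  # sort within each output partition
    return out
```

Here two input partitions can be redistributed into three output partitions, or any other count, because the output partition depends only on the key value.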