1. Loop types
    1. do-all loops
      1. Their iterations are completely independent of one another
        1. They can be executed in parallel
    2. do-across loops
      1. There is data dependence between consecutive iterations
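To make the two loop types concrete, here is a minimal sketch. The function names `do_all` and `do_across` are hypothetical and chosen only for illustration. The first loop writes each element from that iteration's own inputs, so its iterations could run in any order or in parallel. The second loop reads the value produced by the previous iteration, so consecutive iterations are linked by a data dependence.

```python
def do_all(a, b):
    """Do-all loop: iteration i touches only a[i], b[i] and out[i]."""
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]  # no value from another iteration is read
    return out


def do_across(a):
    """Do-across loop: iteration i reads out[i - 1] from iteration i - 1."""
    out = [0] * len(a)
    for i in range(len(a)):
        prev = out[i - 1] if i > 0 else 0
        out[i] = prev + a[i]  # depends on the previous iteration's result
    return out
```

Running the iterations of `do_all` in reverse order gives the same result. Doing the same to `do_across` would not, because each running sum needs its predecessor first.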
  2. loop unrolling
    1. Cons
      1. Larger code size
    2. Pros
      1. Better hardware utilization
    3. Software pipelining is an optimized form of loop unrolling
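The trade-off above can be seen in a small sketch of unrolling by a factor of four. The names `sum_rolled` and `sum_unrolled4` are hypothetical. Python itself will not run the unrolled version faster. The example only shows the shape of the transformation a compiler applies: a longer loop body with four independent accumulators that a machine with several adders could update in the same cycle, plus a clean-up loop for the leftover iterations.

```python
def sum_rolled(xs):
    """Original loop: one addition per iteration."""
    total = 0
    for x in xs:
        total += x
    return total


def sum_unrolled4(xs):
    """Same loop unrolled four times, with a remainder loop."""
    n = len(xs)
    main = n - n % 4            # largest multiple of 4 not exceeding n
    s0 = s1 = s2 = s3 = 0       # independent accumulators
    for i in range(0, main, 4):
        s0 += xs[i]
        s1 += xs[i + 1]
        s2 += xs[i + 2]
        s3 += xs[i + 3]
    total = s0 + s1 + s2 + s3
    for i in range(main, n):    # leftover iterations when n % 4 != 0
        total += xs[i]
    return total
```

The unrolled body is about four times longer than the original, which is the code-size cost listed under Cons.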
  3. How to construct a software pipeline
    1. A locally optimal schedule for a single iteration does not necessarily yield optimal code for the whole loop
    2. Unroll the loop
      1. Identify each part of the software-pipelined loop
        1. Structure
          1. Prologue
          2. Steady-state
          3. Epilogue
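The three parts can be sketched on a toy loop, `out[i] = a[i] * k`, split into three stages: load (L), multiply (M) and store (S). The function name and the stage split are assumptions made for illustration. On real hardware the three statements in the steady-state body would issue together. Here they run sequentially, but they belong to three different iterations, which is the essential point.

```python
def scaled_copy_pipelined(a, k):
    """Compute out[i] = a[i] * k with an explicit prologue/steady state/epilogue."""
    n = len(a)
    out = [None] * n
    if n < 2:
        # Too few iterations to fill the pipeline: run the plain loop.
        for i in range(n):
            out[i] = a[i] * k
        return out

    # Prologue: fill the pipeline.
    loaded = a[0]              # L(0)
    product = loaded * k       # M(0)
    loaded = a[1]              # L(1)

    # Steady state: one stage from each of three consecutive iterations.
    for i in range(n - 2):
        out[i] = product       # S(i)
        product = loaded * k   # M(i + 1)
        loaded = a[i + 2]      # L(i + 2)

    # Epilogue: drain the pipeline.
    out[n - 2] = product       # S(n - 2)
    product = loaded * k       # M(n - 1)
    out[n - 1] = product       # S(n - 1)
    return out
```

Only the steady-state body repeats, so the generated code stays small no matter how many iterations the loop runs.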
  4. Register allocation
    1. Values from adjacent iterations can interfere when they are live at the same time and assigned to the same register
    2. Using more registers avoids this interference
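The interference can be reproduced with a toy simulation. Everything here is hypothetical: the function `run_pipeline`, the schedule (in each cycle the load for iteration `t` issues before the multiply for iteration `t - 1` reads its operand), and the modelling of registers as a Python list. With a single register, the new load overwrites the value the older iteration still needs. Rotating between two registers keeps both values alive.

```python
def run_pipeline(a, k, num_regs):
    """Simulate out[i] = a[i] * k with loads running one iteration ahead."""
    n = len(a)
    regs = [None] * num_regs
    out = [None] * n
    for t in range(n + 1):
        if t < n:
            # Load for iteration t.
            regs[t % num_regs] = a[t]
        if t >= 1:
            # Multiply for iteration t - 1 reads the register it was loaded into.
            out[t - 1] = regs[(t - 1) % num_regs] * k
    return out
```

For `a = [1, 2, 3]` and `k = 10`, `run_pipeline(a, 10, 1)` returns `[20, 30, 30]` because each load clobbers the previous iteration's operand, while `run_pipeline(a, 10, 2)` returns the intended `[10, 20, 30]`.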
  5. Software pipelining for do-across loop
    1. We can reorder instructions as long as the program's semantics are preserved, that is, no data dependence is violated
    2. Adding more adders or multipliers to the machine cannot make the loop run faster
      1. The throughput is limited by the chain of dependences across iterations
  6. Goals
    1. Minimize the initiation interval
      1. Maximize the throughput of the long-running loop
      2. Keep the size of the code generated reasonably small
        1. Keep the steady state of the pipeline small
  7. Constraints
    1. Resource constraints
      1. Modular resource-reservation table
        1. The initiation interval must be no smaller than the ratio of the units of each resource needed by one iteration to the units of that resource available on the machine
    2. Data dependences
      1. Data-dependence cycles
        1. For each cycle, the initiation interval is further constrained by the sum of the delays in the cycle divided by the sum of the iteration differences
    3. The largest of these quantities defines a lower bound on the initiation interval
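The lower bound described above can be worked out with a short sketch. The function names and the input encoding are assumptions: resources are given as dictionaries from a resource name to a unit count, and each dependence cycle is a list of `(delay, iteration_difference)` edges. Ratios are rounded up because the initiation interval is a whole number of clock cycles.

```python
def ceil_div(x, y):
    """Integer ceiling of x / y for positive y."""
    return -(-x // y)


def resource_bound(needed, available):
    """Largest ratio of units needed per iteration to units available."""
    return max((ceil_div(needed[r], available[r]) for r in needed), default=0)


def recurrence_bound(cycles):
    """Largest ratio of total delay to total iteration difference over all cycles."""
    return max(
        (ceil_div(sum(d for d, _ in edges), sum(k for _, k in edges))
         for edges in cycles),
        default=0,
    )


def min_initiation_interval(needed, available, cycles):
    """Lower bound on the initiation interval: the larger of the two bounds."""
    return max(resource_bound(needed, available), recurrence_bound(cycles))


# Example: 3 ALU ops and 2 memory ops per iteration on a machine with
# 2 ALUs and 1 memory unit, plus one dependence cycle with total delay 3
# that spans 1 iteration.
needed = {"alu": 3, "mem": 2}
available = {"alu": 2, "mem": 1}
cycles = [[(2, 1), (1, 0)]]
bound = min_initiation_interval(needed, available, cycles)
```

In this example the resource bound is 2 and the recurrence bound is 3, so `bound` is 3. Doubling every entry in `available` lowers the resource bound to 1 but leaves `bound` at 3, which matches the earlier observation that extra functional units cannot speed up a loop limited by a cross-iteration dependence chain.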
  8. Algorithms
    1. Acyclic
    2. Cyclic
      1. Strongly connected components
        1. A set of nodes in which every node can be reached from every other node in the set
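As an illustration of the definition, the sketch below finds strongly connected components with Tarjan's algorithm. This is one standard method and is not necessarily the one these notes have in mind. The graph encoding (a dictionary from a node to a list of its successors) and the function name are assumptions. The recursive form is fine for small dependence graphs but would hit Python's recursion limit on very deep ones.

```python
def strongly_connected_components(graph):
    """Return a list of sets, one per strongly connected component."""
    index = {}        # discovery order of each visited node
    low = {}          # smallest index reachable from the node's subtree
    stack = []        # nodes of the components still being built
    on_stack = set()
    components = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop its members off the stack.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return components


# Nodes a, b, c form a cycle; d is reachable from c but cannot reach back.
example = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
sccs = strongly_connected_components(example)
```

Here `sccs` contains two components, `{"a", "b", "c"}` and `{"d"}`. In a dependence graph, only a component that contains a cycle constrains the initiation interval through its dependence chain.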
  9. Improvement