-
loop type
-
do-all loops
-
- Their iterations are completely independent of one another
- They can be executed in parallel
-
do-across loops
- There is data dependence between consecutive iterations
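A small sketch of the difference, using two hypothetical loops. Running the iterations of a do-all loop in reverse gives the same result. Reversing a do-across loop does not, because iteration i reads a value written by iteration i-1.

```python
N = 8

def do_all(order):
    # c[i] = a[i] + b[i]: each iteration touches only index i,
    # so no value flows from one iteration to another
    a = list(range(N))
    b = [10] * N
    c = [0] * N
    for i in order:
        c[i] = a[i] + b[i]
    return c

def do_across(order):
    # a[i] = a[i-1] + b[i]: iteration i reads the value written by
    # iteration i-1 (a dependence with iteration distance 1)
    a = [1] * N
    b = list(range(N))
    for i in order:
        if i > 0:
            a[i] = a[i - 1] + b[i]
    return a

in_order = list(range(N))
reverse = in_order[::-1]
print(do_all(in_order) == do_all(reverse))        # True: order does not matter
print(do_across(in_order) == do_across(reverse))  # False: order matters
```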
-
loop unrolling
-
Cons
- Larger code size
-
Pros
- Better hardware utilization
- Software pipelining is an optimized version of loop unrolling
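A minimal sketch of unrolling by a factor of 4, using a hypothetical dot-product loop. The unrolled body is longer and needs a cleanup loop for leftover elements, which is the code-size cost. The four multiplies in one trip are independent, which gives a scheduler more work to overlap.

```python
def dot_rolled(x, y):
    # original loop: one multiply-add per trip, plus a branch and index update
    total = 0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

def dot_unrolled4(x, y):
    # body replicated 4 times: fewer branches and index updates per element,
    # and the four multiplies in one trip do not depend on each other
    n = len(x)
    total = 0
    i = 0
    while i + 4 <= n:
        total += x[i] * y[i]
        total += x[i + 1] * y[i + 1]
        total += x[i + 2] * y[i + 2]
        total += x[i + 3] * y[i + 3]
        i += 4
    while i < n:  # cleanup loop for the last n % 4 elements
        total += x[i] * y[i]
        i += 1
    return total
```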
-
How to build a software pipeline
- A locally optimized schedule for one iteration does not give optimized code for the whole loop
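A back-of-the-envelope sketch of why this matters, with hypothetical numbers. Assume one iteration's best local schedule is `length` cycles long, and the pipelined loop starts a new iteration every `ii` cycles (the initiation interval).

```python
def sequential_cycles(n, length):
    # each iteration runs its locally optimal schedule to completion
    # before the next one starts
    return n * length

def pipelined_cycles(n, length, ii):
    # a new iteration starts every ii cycles; the last one starts at
    # (n - 1) * ii and needs `length` more cycles to finish
    return (n - 1) * ii + length

# hypothetical: 100 iterations, 8-cycle iteration, initiation interval 2
print(sequential_cycles(100, 8), pipelined_cycles(100, 8, 2))  # 800 206
```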
-
Unroll the loop
-
Identify each part of the software pipeline
-
Structure
- Prologue
- Steady-state
- Epilogue
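A sketch that labels each step of a pipelined loop, assuming every iteration is split into `stages` equal stages of one initiation interval each and iteration i starts at step i. The stage names are hypothetical. While the pipeline fills, the steps are prologue; while every stage is busy, they are steady state; while it drains, they are epilogue.

```python
def pipeline_schedule(n_iters, stages):
    """Return a list of (phase, [(iteration, stage), ...]), one entry per step."""
    schedule = []
    for t in range(n_iters + stages - 1):
        # iteration i is in stage t - i if that stage exists
        active = [(i, t - i) for i in range(n_iters) if 0 <= t - i < stages]
        if len(active) == stages:
            phase = "steady-state"   # all stages busy
        elif t < stages - 1:
            phase = "prologue"       # pipeline still filling
        else:
            phase = "epilogue"       # pipeline draining
        schedule.append((phase, active))
    return schedule

stage_names = ["load", "mul", "store"]  # hypothetical 3-stage loop body
for phase, active in pipeline_schedule(5, 3):
    ops = ", ".join(f"{stage_names[s]}[{i}]" for i, s in active)
    print(f"{phase:13} {ops}")
```

With 5 iterations and 3 stages this gives 2 prologue steps, 3 steady-state steps and 2 epilogue steps.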
-
Register allocation
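- Values from adjacent iterations can interfere: a later iteration may overwrite a register before an earlier iteration has finished using it
- We can use more registers (e.g., rotating registers across iterations) to avoid the interference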
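A sketch of the "more registers" idea (often called modular variable expansion). It assumes a value is defined when its iteration starts, iteration i starts at cycle i * ii, and the value stays live for `lifetime` cycles. Then ceil(lifetime / ii) copies are live at once, so iteration i can use register i mod q. The register names and the conflict checker are hypothetical helpers.

```python
import math

def registers_needed(lifetime, ii):
    # ceil(lifetime / ii) overlapping iterations hold the value at once
    return math.ceil(lifetime / ii)

def assign_registers(n_iters, lifetime, ii, base="r"):
    q = registers_needed(lifetime, ii)
    # rotate through q registers so overlapping iterations never share one
    return [f"{base}{i % q}" for i in range(n_iters)]

def conflicts(regs, lifetime, ii):
    # iteration i's value is live in [i*ii, i*ii + lifetime); a conflict is
    # a later iteration reusing the register while that value is still live
    bad = []
    for i in range(len(regs)):
        for j in range(i + 1, len(regs)):
            if regs[i] == regs[j] and j * ii < i * ii + lifetime:
                bad.append((i, j))
    return bad

regs = assign_registers(6, lifetime=5, ii=2)
print(regs, conflicts(regs, 5, 2))          # three rotating registers, no conflicts
print(conflicts(["r0"] * 6, 5, 2)[:2])       # one register: iterations clash
```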
-
Software pipelining for do-across loop
- We can change the order of instructions as long as the change does not affect the semantics (every data dependence is still respected)
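A minimal check for whether a reordering is legal, using a hypothetical instruction list for `a[i] = a[i-1] * b[i] + c[i]`. An order is legal if every dependence (before, after) keeps its relative order. This only covers dependences inside one iteration. In a do-across loop, the loop-carried edge (the `add` of iteration i feeds the `mul` of iteration i+1) also constrains how iterations may overlap.

```python
def respects_dependences(order, deps):
    """True if every (before, after) pair in deps keeps its relative order."""
    pos = {instr: k for k, instr in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in deps)

# hypothetical body of a[i] = a[i-1] * b[i] + c[i]
deps = [("ld_b", "mul"), ("mul", "add"), ("ld_c", "add"), ("add", "st")]
original = ["ld_b", "ld_c", "mul", "add", "st"]
swapped_loads = ["ld_c", "ld_b", "mul", "add", "st"]  # same semantics
store_early = ["ld_b", "ld_c", "mul", "st", "add"]    # stores before the add

print(respects_dependences(swapped_loads, deps))  # True
print(respects_dependences(store_early, deps))    # False
```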
-
Adding more adders or multipliers cannot make the loop faster
- The throughput is limited by the chain of dependences across iterations
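A small simulation of this claim for a hypothetical reduction `s = s + a[i]`. It assumes each add needs the previous add's result and occupies one adder for `latency` cycles. Each add is placed on the earliest free adder, but it still has to wait for the previous partial sum, so the total time stays at n * latency however many adders there are.

```python
import heapq

def reduction_time(n, latency, adders):
    """Cycles to finish n chained adds on `adders` identical adder units."""
    free_at = [0] * adders  # cycle at which each adder next becomes free
    heapq.heapify(free_at)
    prev_done = 0           # cycle at which the previous partial sum is ready
    for _ in range(n):
        unit_free = heapq.heappop(free_at)
        # wait for both a free adder and the loop-carried input
        start = max(prev_done, unit_free)
        prev_done = start + latency
        heapq.heappush(free_at, prev_done)
    return prev_done

for k in (1, 2, 4, 8):
    print(k, reduction_time(100, 2, k))  # 200 for every k
```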
-
Goals
-
Minimize the initiation interval
- Maximize the throughput of long-running loops
-
Keep the size of the code generated reasonably small
- Keep the steady state of the pipeline small
-
Constraints
-
Resource dependences
-
Modular Resource Reservation
- For every resource, the initiation interval must be no smaller than the number of units one iteration needs divided by the number of units available on the machine, rounded up
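The resource bound written out: T ≥ max over resources r of ceil(n_r / R_r), where n_r is the number of units of r used by one iteration and R_r is the number available per cycle. The symbols and the example loop body are assumptions for illustration.

```python
import math

def resource_bound(needed, available):
    """Lower bound on the initiation interval from resource usage.

    needed[r]    -- units of resource r used by one iteration (n_r)
    available[r] -- units of resource r on the machine (R_r)
    """
    return max(math.ceil(needed[r] / available[r]) for r in needed)

# hypothetical loop body: 5 memory ops and 2 ALU ops per iteration
needed = {"mem": 5, "alu": 2}
available = {"mem": 2, "alu": 2}
print(resource_bound(needed, available))  # 3
```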
-
Data dependences
-
data-dependence cycles
- For each cycle, the initiation interval is further constrained: it must be at least the sum of the delays along the cycle divided by the sum of the iteration differences, rounded up
- The largest of these quantities over all cycles defines a lower bound on the initiation interval
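A sketch of the recurrence bound, T ≥ max over cycles c of ceil(Σ delays / Σ iteration differences), for a small hypothetical dependence graph. It enumerates every elementary cycle once, starting from the cycle's lowest-ordered node. It assumes every cycle has a total iteration difference of at least 1, which holds for any legal loop.

```python
import math

def recurrence_bound(nodes, edges):
    """Max over dependence cycles of ceil(sum of delays / sum of distances).

    edges: list of (src, dst, delay, distance); distance is how many
    iterations later dst uses the value (0 means the same iteration).
    """
    order = {v: k for k, v in enumerate(nodes)}
    out = {v: [] for v in nodes}
    for src, dst, delay, dist in edges:
        out[src].append((dst, delay, dist))

    best = 0

    def dfs(start, node, delay_sum, dist_sum, on_path):
        nonlocal best
        for nxt, delay, dist in out[node]:
            d, s = delay_sum + delay, dist_sum + dist
            if nxt == start:  # closed an elementary cycle
                best = max(best, math.ceil(d / s))
            elif order[nxt] > order[start] and nxt not in on_path:
                on_path.add(nxt)
                dfs(start, nxt, d, s, on_path)
                on_path.remove(nxt)

    for v in nodes:
        dfs(v, v, 0, 0, {v})
    return best

# hypothetical graph: mul->add->mul carries a value to the next iteration,
# ld->st->ld carries one two iterations ahead
nodes = ["ld", "mul", "add", "st"]
edges = [("mul", "add", 2, 0), ("add", "mul", 1, 1),
         ("ld", "st", 3, 0), ("st", "ld", 1, 2)]
print(recurrence_bound(nodes, edges))  # 3 (from the mul/add cycle)
```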
-
algorithms
- Acyclic
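A simplified sketch of modulo scheduling for an acyclic dependence graph, in the spirit of the usual textbook approach rather than any specific published version. It tries initiation intervals T upward from a lower bound. It places each op in topological order at the earliest time its dependences allow (clamped at 0), searching T consecutive cycles for a free slot in a modular reservation table indexed by time mod T. Assumptions: each op uses one unit of one resource for one cycle, every resource has at least one unit, and there is no priority function beyond topological order. The op and resource names are hypothetical.

```python
from graphlib import TopologicalSorter

def modulo_schedule(ops, deps, available, t0):
    """ops: op -> resource; deps: (src, dst, delay, distance) edges, acyclic.
    available: resource -> units per cycle; t0: first T to try (>= 1).
    Returns (T, {op: start cycle within one iteration})."""
    preds = {op: [] for op in ops}
    for src, dst, delay, dist in deps:
        preds[dst].append((src, delay, dist))
    graph = {op: [p for p, _, _ in preds[op]] for op in ops}
    order = list(TopologicalSorter(graph).static_order())

    t = t0
    while True:
        table = [dict.fromkeys(available, 0) for _ in range(t)]  # slot = time mod t
        start = {}
        for op in order:
            # earliest start allowed by already-scheduled predecessors
            earliest = max((start[p] + d - dist * t for p, d, dist in preds[op]),
                           default=0)
            earliest = max(earliest, 0)
            res = ops[op]
            for s in range(earliest, earliest + t):
                if table[s % t][res] < available[res]:
                    table[s % t][res] += 1
                    start[op] = s
                    break
            else:
                break  # no free slot in t consecutive cycles: try a larger t
        else:
            return t, start  # every op placed
        t += 1

ops = {"ld": "mem", "mul": "alu", "add": "alu", "st": "mem"}
deps = [("ld", "mul", 2, 0), ("mul", "add", 2, 0), ("add", "st", 1, 0)]
available = {"mem": 1, "alu": 1}
print(modulo_schedule(ops, deps, available, t0=2))
```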
-
Cyclic
-
Strongly connected components
- A maximal set of nodes in which every node can be reached from every other node in the component
- Improvement
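Every dependence cycle lies entirely inside one strongly connected component, which is why the cyclic case works per component. Below is a sketch of Tarjan's algorithm for finding the components, applied to the same kind of hypothetical dependence graph (node -> list of successors).

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph: dict node -> list of successor nodes."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:  # back edge into the current component
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.remove(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs

graph = {"ld": ["st"], "st": ["ld"], "mul": ["add"], "add": ["mul", "st"]}
print(strongly_connected_components(graph))  # two components: {ld, st} and {mul, add}
```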