Software pipelining is a compile-time scheduling technique that overlaps consecutive loop iterations to expose operation-level parallelism. An important issue in the development of effective software pipelining algorithms is how to deal with loops that contain conditional branches.
Conditional branches raise the complexity and reduce the performance of software pipelining algorithms by introducing several possible execution paths into the scheduling scope. Software pipelining is implemented either by techniques based on unrolling or by modulo scheduling, as shown in the figure.
The basic idea of the techniques based on unrolling is quite simple − unroll the loop several times and arrange the unrolled code in the most parallel fashion. Then look for a repetitive pattern in the arranged code and reroll that pattern, creating a new loop. This new loop, along with the start-up (prologue) and finishing (epilogue) code sections, represents the software-pipelined instruction schedule.
The unrolling method can be illustrated by one of its representatives, URPR (Unrolling, Pipelining, and Rerolling), as shown in the figure.
As shown in the figure, this variant of the unrolling technique first schedules the original loop body, then unrolls it k times, arranges it for the most parallel execution, and searches for a repetitive pattern. Finally, this pattern is rerolled, resulting in the new body. The outcome of the schedule is the new loop body preceded by the prologue code and succeeded by the epilogue code section.
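The prologue, new loop body, and epilogue structure can be made concrete with a small hand-written sketch. It is not the output of URPR or of any particular compiler; the function names and the three-stage loop (load, multiply, store) are assumptions chosen only for illustration. On a machine with parallel issue, the three statements in the new loop body belong to three different original iterations and could execute in the same cycle.

```python
def plain_loop(a, c):
    """Original loop: each iteration does load -> multiply -> store."""
    b = [0] * len(a)
    for i in range(len(a)):
        x = a[i]      # stage 1: load
        y = x * c     # stage 2: multiply
        b[i] = y      # stage 3: store
    return b


def pipelined_loop(a, c):
    """Hand-pipelined version of plain_loop (illustrative only)."""
    n = len(a)
    if n < 2:
        return plain_loop(a, c)  # too short to fill the pipeline
    b = [0] * n

    # Prologue: start iterations 0 and 1 to fill the pipeline.
    x = a[0]          # load for iteration 0
    y = x * c         # multiply for iteration 0
    x = a[1]          # load for iteration 1

    # New loop body: one stage from each of three overlapping iterations.
    for i in range(n - 2):
        b[i] = y      # store for iteration i
        y = x * c     # multiply for iteration i + 1
        x = a[i + 2]  # load for iteration i + 2

    # Epilogue: drain the two iterations still in flight.
    b[n - 2] = y
    y = x * c
    b[n - 1] = y
    return b
```

Both functions compute the same result; only the order in which the stages of neighbouring iterations are issued differs.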
The other major method of software pipelining is modulo scheduling. This technique was originally proposed by Rau and Glaeser in 1981 and has been implemented in several compilers for VLIW machines, such as the FPS-164 (Touzeau, 1984), CYDRA-5 (Rau et al., 1989), and iWARP (Lam, 1988).
Modulo scheduling is based on a type of list scheduling. In general, this method has an iterative character, and consists of three steps −
First, a guess is made concerning the minimal required length of the new loop body, commonly called in these methods the minimum initiation interval.
Next, a schedule for this interval is attempted, taking into account data and resource dependencies.
If the attempted schedule is not feasible, the length of the new loop body is increased and a new schedule for the enlarged interval is prepared.
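The three steps above can be sketched as a small retry loop. This is a simplified illustration under stated assumptions, not the algorithm of Rau and Glaeser or of any compiler: the names `try_schedule` and `modulo_schedule`, the operation and edge encodings, and the `max_ii` cutoff are all hypothetical. Operations are assumed to be listed in topological order of their same-iteration dependencies, placement is greedy with no backtracking, and the initial guess uses only the resource-based lower bound, so limits imposed by loop-carried dependencies are discovered by retrying.

```python
from math import ceil


def try_schedule(ops, edges, units, ii):
    """Attempt a schedule for initiation interval `ii`.

    ops:   {name: (resource, latency)}, listed in topological order
    edges: [(src, dst, distance)], distance = iterations between src and dst
    units: {resource: number of identical units}
    Returns {name: start_cycle}, or None if this greedy sketch fails.
    """
    table = {r: [0] * ii for r in units}  # resource usage per slot (cycle mod ii)
    start = {}
    for name, (res, _latency) in ops.items():
        earliest = 0
        for src, dst, dist in edges:
            if dst == name and src in start:
                earliest = max(earliest, start[src] + ops[src][1] - ii * dist)
        # ii consecutive cycles cover every slot of the table exactly once.
        for t in range(earliest, earliest + ii):
            if table[res][t % ii] < units[res]:
                table[res][t % ii] += 1
                start[name] = t
                break
        else:
            return None  # no free slot for this resource
    # Check every dependency, including loop-carried ones skipped above.
    for src, dst, dist in edges:
        if start[dst] < start[src] + ops[src][1] - ii * dist:
            return None
    return start


def modulo_schedule(ops, edges, units, max_ii=64):
    """Step 1: guess the interval; steps 2 and 3: try, then enlarge on failure."""
    uses = {r: 0 for r in units}
    for res, _latency in ops.values():
        uses[res] += 1
    ii = max(1, max(ceil(uses[r] / units[r]) for r in units))
    while ii <= max_ii:
        schedule = try_schedule(ops, edges, units, ii)
        if schedule is not None:
            return ii, schedule
        ii += 1
    raise ValueError("no schedule found up to max_ii")


units = {"mem": 1, "alu": 1}

# Two memory operations share one memory unit, so the first guess is 2.
ops_a = {"load": ("mem", 2), "add": ("alu", 1), "store": ("mem", 1)}
edges_a = [("load", "add", 0), ("add", "store", 0)]
result_a = modulo_schedule(ops_a, edges_a, units)

# A 3-cycle accumulation feeding itself in the next iteration forces retries.
ops_b = {"load": ("mem", 1), "acc": ("alu", 3)}
edges_b = [("load", "acc", 0), ("acc", "acc", 1)]
result_b = modulo_schedule(ops_b, edges_b, units)
```

In the second example the resource bound alone suggests an interval of 1, but the attempts at 1 and 2 fail the dependency check, so the interval is enlarged until 3 succeeds.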
The designation ‘modulo scheduling’ emphasizes the repetitive character of the schedule with respect to data and resource dependencies. Descriptions of proposed or already implemented algorithms can be found in Rau and Glaeser (1981), Touzeau (1984), and Lam (1988).
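The repetitive character can be written down compactly. The notation below is an assumption introduced here for illustration and is not taken from the cited papers: $II$ is the initiation interval, $\sigma(u)$ the cycle at which operation $u$ starts within its own iteration, $\lambda(u)$ its latency, and $d_{uv}$ the number of iterations separating a dependent pair $u \to v$.

```latex
% Dependency constraint for every edge u -> v:
\sigma(v) \;\ge\; \sigma(u) + \lambda(u) - d_{uv} \cdot II
% Resource constraint: u occupies its functional unit in slot
\operatorname{slot}(u) \;=\; \sigma(u) \bmod II
```

Because a new iteration starts every $II$ cycles, two operations that use the same unit conflict exactly when their slots coincide, which is why resource usage is tracked modulo $II$.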