What is delayed branching?

When a pipeline processes branches in a straightforward way, at least one cycle goes unused after each taken branch. This follows from the assembly-line nature of pipelining: the instructions behind the branch have already entered the pipeline by the time the branch is resolved. The instruction slots following a branch are known as branch delay slots.

Delay slots can also appear after load instructions; these are called load delay slots. In traditional execution, branch delay slots are wasted. With delayed branching, however, these slots can be at least partly used.

Principle of Delayed Branching





In the figure, the add instruction of our program segment, which originally preceded the branch, is moved into the branch delay slot. With delayed branching, the processor executes the add instruction in the delay slot before the branch takes effect. Thus, in this example, delayed branching preserves the original execution sequence −

add r1, r2, r3;
b anywhere;
anywhere: sub

This example shows an unconditional branch. Conditional branches cause the same or longer delays in straightforward pipelined execution, because the branch condition must additionally be checked.

Accordingly, the instruction in the delay slot is executed even if the branch is not taken. Branching to the target instruction (sub) is performed with one pipeline cycle of delay, and this cycle is used to execute the instruction in the delay slot (add). Thus, delayed branching results in the following execution sequence −

a, add
b, b
c, sub
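The execution order described above can be reproduced with a toy model. The following Python sketch is only an illustration under simplifying assumptions: the mini instruction format, the `run` function, and the `(bubble)` marker are all hypothetical, there is exactly one delay slot, and every branch is unconditional and taken. It is not a model of any particular processor.

```python
# Toy model of a one-slot delayed branch (hypothetical mini-ISA, for illustration only).

def run(program, delayed):
    """Return the sequence of labels that occupy execution slots, in order.

    program: list of (label, op, target) tuples, where op is "alu" or "b"
             and target is the index to jump to (None for non-branches).
    delayed: if True, a branch takes effect only after the next instruction
             (the delay slot) has executed; if False, the slot after a taken
             branch is wasted and recorded as "(bubble)".
    """
    trace = []
    pc = 0
    pending_target = None  # target of a branch whose delay slot is still to run
    while pc < len(program):
        label, op, target = program[pc]
        trace.append(label)
        next_pc = pc + 1
        if pending_target is not None:
            # The instruction just executed sat in the delay slot;
            # the earlier branch takes effect now.
            next_pc = pending_target
            pending_target = None
        elif op == "b":
            if delayed:
                pending_target = target
            else:
                trace.append("(bubble)")  # unused cycle after the taken branch
                next_pc = target
        pc = next_pc
    return trace


# Original order: add precedes the branch.
TRADITIONAL = [("add", "alu", None), ("b", "b", 2), ("sub", "alu", None)]

# Reordered for delayed branching: add is moved into the delay slot.
DELAYED = [("b", "b", 2), ("add", "alu", None), ("sub", "alu", None)]

print(run(TRADITIONAL, delayed=False))  # ['add', 'b', '(bubble)', 'sub']
print(run(DELAYED, delayed=True))       # ['b', 'add', 'sub']
```

In this sketch the traditional run needs four slots for three useful instructions, while the delayed run needs only three, because add occupies the slot that would otherwise be wasted.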

Delayed branching was first introduced in the MANIAC I in 1952 and was later commonly used in microprogramming (Patterson and Sequin, 1981). At the beginning of the 1980s, the scheme was ‘reinvented’ in the RISC-I (Patterson and Sequin, 1981) and subsequently used in several RISC architectures emerging at that time, such as the MIPS (1982p), RISC-II (1983), MIPS-R-line (from 1987 on) and AMD 29000 (1987).

Disadvantages of Delayed Branching

Delayed branching has several disadvantages, which are as follows −

  • Delayed branching requires a redefinition of the architecture.

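  • Delayed branching causes slight code expansion because NOPs must be inserted into delay slots that cannot be filled with useful instructions. For instance, taking fb as the branch frequency and ff as the fraction of delay slots that can be filled, 100 ∗ fb ∗ (1 − ff) = 100 ∗ 0.2 ∗ (1 − 0.6) = 8 NOPs would have to be inserted per 100 instructions, making the code 8% longer than without delayed branching.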

  • Interrupt processing becomes more difficult. This is because interrupt requests caused by instructions in the delay slot have to be processed differently from those arising from ‘normal’ instructions. When a delay slot instruction initiates an interrupt, the preceding instruction, namely the branch, has already been fetched but not yet processed. This situation is quite different from traditional instruction processing, where all instructions preceding an instruction that causes an interrupt have already been completed.

  • Additional hardware is required to implement delayed branching.
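The code expansion estimate in the list above can be checked with a few lines of Python. This is a minimal sketch: the function name `nops_per_100` and its parameter names are hypothetical, and it assumes one delay slot per branch, with `branch_freq` standing for fb and `fill_ratio` for ff.

```python
def nops_per_100(branch_freq, fill_ratio):
    """Estimate NOPs inserted per 100 instructions, assuming one delay slot per branch.

    branch_freq: fraction of instructions that are branches (fb)
    fill_ratio:  fraction of delay slots filled with useful instructions (ff)
    """
    if not (0.0 <= branch_freq <= 1.0 and 0.0 <= fill_ratio <= 1.0):
        raise ValueError("both arguments must lie between 0 and 1")
    # Every unfilled delay slot needs one NOP.
    return 100 * branch_freq * (1 - fill_ratio)


# Values from the text: 20% branches, 60% of delay slots filled.
print(round(nops_per_100(0.2, 0.6), 2))
```

With the values from the text this gives about 8 NOPs per 100 instructions. The expansion shrinks as the fill ratio rises and disappears when every slot can be filled.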

Updated on: 23-Jul-2021
