# What is delayed branching?


When a pipeline processes branches in the straightforward way, at least one cycle remains unused after each taken branch. This follows from the assembly-line nature of pipelining: by the time the branch is resolved, the instruction after it has already entered the pipeline. The instruction slots following a branch are known as branch delay slots.

Delay slots can also appear after load instructions; these are called load delay slots. In traditional execution, branch delay slots are wasted. With delayed branching, however, these slots can be at least partly used.
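As a rough illustration of why filling delay slots matters, the following sketch estimates the number of unused cycles. The function name `unused_branch_cycles` and the parameters `slots_per_branch` and `fill_rate` are assumptions made for this example, not part of any real tool, and the model deliberately ignores every other source of pipeline stalls.

```python
def unused_branch_cycles(taken_branches: int,
                         slots_per_branch: int = 1,
                         fill_rate: float = 0.0) -> float:
    """Estimate the cycles left unused in branch delay slots.

    taken_branches   -- number of taken branches in the instruction stream
    slots_per_branch -- delay slots after each branch (assumed: 1)
    fill_rate        -- assumed fraction of slots holding a useful instruction
                        (0.0 models traditional execution with no filling)
    """
    if not 0.0 <= fill_rate <= 1.0:
        raise ValueError("fill_rate must be between 0 and 1")
    return taken_branches * slots_per_branch * (1.0 - fill_rate)


# Traditional execution: every delay slot is wasted.
print(unused_branch_cycles(1000))
# Delayed branching with three quarters of the slots filled (hypothetical rate).
print(unused_branch_cycles(1000, fill_rate=0.75))
```

With the hypothetical numbers above, filling three out of four slots cuts the unused cycles to a quarter of the traditional figure.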

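[Figure: Principle of delayed branching. Pipeline timing diagram over cycles t_i to t_{i+4}, showing the stages F, D, E and WB of the branch (b), of the instruction in its delay slot, and of the branch target (sub).]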

In the figure, the add instruction of our program segment, which originally preceded the branch, has been moved into the branch delay slot. With delayed branching, the processor executes the add instruction first, and the branch only takes effect afterwards. Thus, in this example, delayed branching preserves the original execution sequence:

add r1, r2, r3;
b anywhere;
anywhere: sub
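The move described above is usually made by the compiler or assembler. The following is a minimal sketch of that step, assuming a toy program format of `(label, text)` pairs and handling only unconditional branches written as `b target`. The function `fill_delay_slots` and this representation are hypothetical. For conditional branches, a real scheduler would also have to check that the moved instruction does not compute the branch condition.

```python
def opcode(text):
    """Return the mnemonic of an instruction such as 'add r1, r2, r3'."""
    return text.split()[0]


def fill_delay_slots(program):
    """Return a copy of `program` with one delay slot after each branch.

    program -- list of (label, text) pairs; label is None if absent.
    The instruction preceding an unconditional branch is moved into the
    delay slot when that is safe; otherwise a nop is inserted.
    """
    out = []
    for label, text in program:
        if opcode(text) != "b":
            out.append((label, text))
            continue
        prev = out[-1] if out else None
        # An instruction that already sits in a delay slot must stay there.
        prev_in_slot = len(out) >= 2 and opcode(out[-2][1]) == "b"
        movable = (
            prev is not None
            and label is None         # nothing jumps straight to the branch
            and prev[0] is None       # nothing jumps straight to prev
            and opcode(prev[1]) != "b"
            and not prev_in_slot
        )
        if movable:
            out.pop()
            out.append((label, text))
            out.append(prev)            # prev now fills the delay slot
        else:
            out.append((label, text))
            out.append((None, "nop"))   # no useful candidate: slot is wasted
    return out


original = [
    (None, "add r1, r2, r3"),
    (None, "b anywhere"),
    ("anywhere", "sub"),   # operands are not given in the source example
]
for label, text in fill_delay_slots(original):
    print(f"{label + ':' if label else '':<10}{text}")
```

When no safe candidate exists, the sketch falls back to a `nop`, which corresponds to a delay slot that remains unused.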

The add/b/sub example uses an unconditional branch. Conditional branches cause the same or longer delays in straightforward pipelined execution, because the condition must additionally be checked.

Accordingly, the instruction in the delay slot is always executed, whether the branch is taken or not. Branching to the target instruction (sub) is executed with one pipeline cycle of delay. This cycle is used to execute the instruction in the delay slot (add). Thus delayed branching results in the following execution sequence:

a: add
b: b
c: sub
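The delay-slot semantics can be sketched with a toy interpreter. This is only an illustration under stated assumptions: the `run` function, the `(label, text)` program format and the `mul` filler instruction are hypothetical, only unconditional branches of the form `b target` are modelled, and no branch is expected inside a delay slot. The returned trace lists instructions in the order they are issued; the transfer of control takes effect only after the delay-slot instruction.

```python
def run(program, delayed=True, max_steps=100):
    """Return the instructions of a toy program in the order they are issued.

    program -- list of (label, text) pairs; 'b target' is an unconditional branch
    delayed -- if True, the instruction right after a branch (its delay slot)
               is issued before control transfers to the branch target
    """
    labels = {label: i for i, (label, _) in enumerate(program) if label}
    trace = []
    pc = 0
    pending = None  # branch target waiting for the delay slot to be issued
    while pc < len(program) and len(trace) < max_steps:
        _, text = program[pc]
        trace.append(text)
        next_pc = pc + 1
        if pending is not None:
            # This instruction was the delay slot: the branch takes effect now.
            next_pc, pending = pending, None
        parts = text.split()
        if parts[0] == "b":
            target = labels[parts[1]]
            if delayed:
                if next_pc >= len(program):
                    raise ValueError("branch has no delay-slot instruction")
                pending = target
            else:
                next_pc = target
        pc = next_pc
    return trace


program = [
    (None, "b anywhere"),
    (None, "add r1, r2, r3"),   # sits in the branch delay slot
    (None, "mul r4, r5, r6"),   # hypothetical filler that is jumped over
    ("anywhere", "sub"),
]
print(run(program, delayed=True))
print(run(program, delayed=False))
```

With `delayed=True` the `add` in the delay slot is issued before `sub`, whereas with `delayed=False` the same instruction layout would skip it, which is why code must be scheduled specifically for a machine with delayed branches.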

Delayed branching was first introduced in the MANIAC I in 1952, and was later commonly used in microprogramming (Patterson and Sequin, 1981). At the beginning of the 1980s, this scheme was ‘reinvented’ in the RISC-I (Patterson and Sequin, 1981), and was subsequently used in several RISC architectures emerging at that time, such as the MIPS (1982p), RISC-II (1983), MIPS-R-line (from 1987 on) and AMD 29000 (1987).