Branch processing has two aspects: its layout and its microarchitectural implementation, as shown in the figure. The layout consists of three major subtasks: detecting branches, handling unresolved conditional branches, and accessing the branch target path.
The first aspect is branch detection. Early processors detected branches during instruction decoding. However, the earlier a processor detects a branch, the earlier branch processing can start and the smaller the penalty. Newer schemes therefore try to detect branches as early as possible.
The most advanced branch detection technique avoids explicit decoding altogether. Instead, branch detection is integrated into the instruction fetch mechanism. This scheme is known as integrated instruction fetch and branch detection.
The instruction fetch mechanism is extended so that it can tell whether the next instruction to be fetched is a branch. Each detected branch is guessed to be taken, and instead of, or in addition to, the next sequential instruction, the branch target address or even the target instruction itself is fetched in advance.
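The fetch behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of any particular processor: the `PROGRAM` contents, the `KNOWN_BRANCHES` table, the 4-byte instruction size, and the `fetch` function are all hypothetical names introduced here.

```python
# Hypothetical program memory: address -> instruction text (4-byte instructions assumed).
PROGRAM = {
    0: "add r1, r2, r3",
    4: "beq r1, r0, 16",
    8: "sub r4, r4, r1",
    12: "or r5, r5, r4",
    16: "mul r6, r1, r1",
}

# Hypothetical fetch-side table, assumed to be filled in when a branch is first seen:
# branch address -> branch target address.
KNOWN_BRANCHES = {4: 16}


def fetch(pc):
    """Fetch the instruction at pc and decide what to fetch next, without decoding it."""
    instruction = PROGRAM[pc]
    sequential = pc + 4
    target = KNOWN_BRANCHES.get(pc)
    if target is None:
        # Not recognised as a branch: continue on the sequential path only.
        return instruction, [sequential]
    # Recognised as a branch: guess "taken", so the target is fetched first and
    # the sequential address is kept as the fallback path.
    return instruction, [target, sequential]
```

Calling `fetch(4)` on this toy program yields the branch together with both candidate addresses, target first, while `fetch(0)` yields only the sequential address.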
The next aspect of the layout is the handling of unresolved conditional branches. A conditional branch is called unresolved if the specified condition is not yet available when it is evaluated during branch processing. A conditional branch cannot be evaluated before the referenced condition is known.
For instance, if the specified condition refers to the sign of the result of the preceding instruction, that instruction must have been executed before the condition can be evaluated. Until the referenced condition becomes known, the conditional branch is unresolved.
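The sign-flag example can be made concrete with a small sketch. It assumes a hypothetical condition register whose `negative` flag stays unset until the producing instruction has executed; `ConditionFlags`, `execute_sub`, and `evaluate_branch_if_negative` are illustrative names, not part of any real instruction set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConditionFlags:
    """Hypothetical condition register; negative is None until the producer has executed."""
    negative: Optional[bool] = None


def execute_sub(flags, a, b):
    """Model the preceding instruction: compute a - b and set the sign flag."""
    result = a - b
    flags.negative = result < 0
    return result


def evaluate_branch_if_negative(flags):
    """Return 'unresolved', 'taken', or 'not taken' for a branch-if-negative instruction."""
    if flags.negative is None:
        # The referenced condition is not known yet, so the branch cannot be evaluated.
        return "unresolved"
    return "taken" if flags.negative else "not taken"
```

Evaluating the branch on a fresh `ConditionFlags()` reports it as unresolved. Once `execute_sub` has run, the same call returns a definite outcome.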
The final aspect of the layout of branch processing is how the branch target path is accessed. The branch penalty for ‘taken’ guesses depends heavily on how the branch target path is accessed. Current processors use one of four basic techniques: the compute/fetch scheme, the BTAC (Branch Target Address Cache) scheme, the BTIC (Branch Target Instruction Cache) scheme, and the successor index in the I-cache scheme.
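To illustrate how the first three schemes differ, here is a minimal sketch. It assumes PC-relative branches whose target is the branch address plus an offset, and it models each cache as a plain dictionary keyed by branch address; `MEMORY`, `compute_fetch`, `BTAC`, and `BTIC` are hypothetical names, and real designs differ in indexing, tagging, and entry contents.

```python
# Hypothetical instruction memory: address -> instruction text.
MEMORY = {0x40: "load r1, 0(r2)", 0x44: "add r3, r1, r1"}


def compute_fetch(branch_pc, offset):
    """Compute/fetch: compute the target address first, then fetch the target instruction."""
    target = branch_pc + offset        # BTA computation (PC-relative branch assumed)
    return target, MEMORY[target]      # followed by a separate instruction fetch


class BTAC:
    """Branch target address cache sketch: branch address -> target address."""

    def __init__(self):
        self.entries = {}

    def lookup(self, branch_pc):
        target = self.entries.get(branch_pc)
        if target is None:
            return None                # miss: the target must be computed as usual
        # Hit: the address comes from the cache; the instruction is still fetched.
        return target, MEMORY[target]


class BTIC:
    """Branch target instruction cache sketch: branch address -> target instruction(s)."""

    def __init__(self):
        self.entries = {}

    def lookup(self, branch_pc):
        # Hit: the target instructions come straight from the cache; miss returns None.
        return self.entries.get(branch_pc)
```

In this sketch the compute/fetch path performs both the address computation and the fetch, a BTAC hit skips the computation, and a BTIC hit skips the fetch of the target instruction as well.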
The second aspect of branch processing is its microarchitectural implementation. Branch processing involves basic functions, including instruction fetch, decode, and branch target address (BTA) computation, and possibly additional dedicated functions that speed up branch processing.
These dedicated functions can be early branch detection, branch prediction, or an advanced scheme for accessing the branch target path. Each is implemented in dedicated hardware, such as a BTAC, BTIC, or BHT (Branch History Table).
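As an example of such dedicated hardware, a BHT used for branch prediction can be sketched as a table of small counters. The text does not say how the BHT is organised, so the 2-bit saturating counters, the 16-entry size, the indexing by low-order address bits, and the `BranchHistoryTable` class below are all assumptions chosen for illustration.

```python
class BranchHistoryTable:
    """BHT sketch: one 2-bit saturating counter per entry (an assumed organisation)."""

    def __init__(self, entries=16):
        self.entries = entries
        # Counter values 0-1 predict not taken, 2-3 predict taken.
        # Every entry starts at 1 (weakly not taken).
        self.counters = [1] * entries

    def _index(self, branch_pc):
        # Drop the two low bits (4-byte instructions assumed), then wrap to the table size.
        return (branch_pc >> 2) % self.entries

    def predict(self, branch_pc):
        """Return True if the branch at branch_pc is predicted taken."""
        return self.counters[self._index(branch_pc)] >= 2

    def update(self, branch_pc, taken):
        """Move the counter one step towards the actual outcome, saturating at 0 and 3."""
        i = self._index(branch_pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

With this organisation a single outcome that goes against a strongly established history does not flip the prediction, which is the usual motivation for using two bits rather than one.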