What is Multiway Branching?


Multiway branching is another way to reduce branch penalties. With multiway branching, both the sequential and the taken paths of an unresolved conditional branch are pursued, as shown in the figure. This requires multiple program counters (PCs), referred to as IFA1 and IFA2 in the figure.

Once the specified condition is resolved, it becomes evident which path is correct. If it is the sequential path, its execution is confirmed and the taken-path execution is discarded; IFA1 then contains the correct continuation address. In the opposite case, the taken path is confirmed, the sequential path is discarded, and IFA2 contains the correct continuation address.
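
The behaviour described above can be illustrated with a small simulation. This is only a minimal sketch, not a model of any real processor: the names `Path` and `pursue_both` are hypothetical, the addresses 100 and 200 are arbitrary, and every instruction is assumed to occupy one address unit so that a fetch simply advances the path's private counter by one.

```python
from dataclasses import dataclass, field


@dataclass
class Path:
    """One speculatively pursued instruction stream (hypothetical model)."""
    name: str
    ifa: int                                  # this path's own fetch address
    fetched: list = field(default_factory=list)

    def step(self):
        # Fetch one instruction and advance this path's private counter.
        # Assumption: every instruction occupies exactly one address unit.
        self.fetched.append(self.ifa)
        self.ifa += 1


def pursue_both(branch_addr, target_addr, cycles_until_resolved, taken):
    """Pursue both successors of an unresolved branch, then keep one."""
    sequential = Path("IFA1", branch_addr + 1)   # fall-through path
    taken_path = Path("IFA2", target_addr)       # branch-target path

    # While the condition is unresolved, both paths make progress.
    for _ in range(cycles_until_resolved):
        sequential.step()
        taken_path.step()

    # Resolution: confirm the correct path and discard the other one.
    if taken:
        return taken_path, sequential
    return sequential, taken_path


keep, drop = pursue_both(branch_addr=100, target_addr=200,
                         cycles_until_resolved=3, taken=False)
print("confirmed:", keep.name, keep.fetched, "continues at", keep.ifa)
print("discarded:", drop.name, drop.fetched)
```

In this run the branch turns out not to be taken, so the work fetched through IFA1 is confirmed and fetching simply continues from its current address, while everything fetched through IFA2 is thrown away. No refetch is needed after resolution, which is where the reduction in branch penalty comes from in this simplified picture.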

During speculative execution of a conditional branch, a second unresolved conditional branch instruction may occur. Consequently, a more advanced multiway branching scheme should allow multiple unresolved branches.

The figure shows threefold multiway branching. In this case, four instruction fetch addresses (IFA1-IFA4) are maintained concurrently, only one of which belongs to the correct path. After all conditional branches have been resolved, the single correct path is identified and all computations belonging to incorrect paths are cancelled.
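
To see why three unresolved branches lead to four fetch addresses, the following sketch tracks each live path as the list of branch outcomes it assumes. The helper names `fork` and `resolve`, the branch labels B1 to B3, and the particular tree shape (B2 met on the sequential side of B1, B3 on its taken side) are assumptions made for illustration; the original figure may arrange the branches differently.

```python
def fork(paths, index, branch):
    """Replace paths[index] by its two successors at an unresolved branch.

    Each path is the list of (branch, assumed_taken) guesses it depends on.
    """
    history = paths.pop(index)
    paths.insert(index, history + [(branch, False)])      # sequential path
    paths.insert(index + 1, history + [(branch, True)])   # taken path


def resolve(paths, outcomes):
    """Return the paths whose assumptions all match the actual outcomes."""
    return [history for history in paths
            if all(outcomes.get(b) == assumed for b, assumed in history)]


paths = [[]]              # start with a single, non-speculative path
fork(paths, 0, "B1")      # first unresolved branch
fork(paths, 0, "B2")      # second branch, met on B1's sequential path
fork(paths, 2, "B3")      # third branch, met on B1's taken path

# Name the live paths IFA1..IFA4 in order.
ifas = {f"IFA{i + 1}": history for i, history in enumerate(paths)}
print(len(ifas), "fetch addresses are live")

# Assumed outcomes: B1 not taken, B2 taken. B3 lies on a wrong path,
# so it never receives a real outcome.
outcomes = {"B1": False, "B2": True}
survivors = resolve(paths, outcomes)
correct = [name for name, history in ifas.items() if history in survivors]
print("correct path:", correct)
```

Each fork replaces one live path with two, so every additional unresolved branch adds exactly one fetch address regardless of where in the tree it occurs. Three branches therefore give four concurrent paths, of which only one survives resolution; the work done on the other three is discarded.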

Although multiway branching seems attractive as a means of increasing performance, especially for highly parallel ILP processors (such as VLIWs), it also has significant drawbacks. First, multiway branching demands substantially more hardware resources (primarily execution units) than speculative branch processing.

Furthermore, with multiple multiway branching, preserving sequential consistency and discarding superfluously executed computations become increasingly complex and time-consuming tasks. Multiway branching has been used in only a few processors. For instance, in the Multiflow TRACE 500 VLIW architecture proposal (Wolfe and Shen, 1991), two paths can be pursued, and two sets of 14 functional units were planned.

A second example is the experimental URPR-2 machine, which has nine functional units with separate IFAs, allowing multiple multiway branching. The last example is a novel computational model called XIMD and its first experimental implementation, the XIMD-1 (Wolfe and Shen, 1991).

This model enables concurrent execution of multiple paths (threads) and can operate in several different modes, one of which supports multiway branching. The experimental implementation has eight identical 32-bit RISC processors, each equipped with its own IFA.

There are some expectations that this brute-force approach to speeding up the processing of unresolved conditional branches still has a future (Brian, 1994). This opinion is based on the rapid development of technology allowing increasingly complex microarchitectures to be realized for approximately the same cost.

Updated on: 23-Jul-2021
