History bits are used to record branch history. Processors employ one of four different schemes to implement history bits, as shown in the figure. In the most straightforward scheme, the history bits are placed in the I-cache.
For instance, the Alpha processors provide one (21064) or two (21064A) history bits in the I-cache for each instruction. In contrast, the UltraSparc maintains only two 2-bit entries for each cache line, although a line contains four instructions. Two entries suffice because the SPARC architecture uses delayed branches: each branch is followed by a delay slot, so no more than two branches (and their two delay slots) can occur in four consecutive instructions.
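The delay-slot argument for the UltraSparc can be checked by brute force. The sketch below assumes, as the argument implicitly does, that a branch never occupies another branch's delay slot, so no two branches are adjacent within a line. It enumerates every branch/non-branch layout of a four-instruction line and reports the largest number of branches. The names `valid_layout` and `max_branches_per_line` are hypothetical, chosen for illustration.

```python
from itertools import product

LINE_SIZE = 4  # instructions per UltraSparc I-cache line, per the text


def valid_layout(is_branch):
    # Assumption: a branch is never placed in the delay slot of the
    # preceding branch, so two branches can never be adjacent.
    return not any(a and b for a, b in zip(is_branch, is_branch[1:]))


def max_branches_per_line(line_size=LINE_SIZE):
    # Try every layout of the line and keep the largest branch count
    # among the layouts that respect the delay-slot rule.
    return max(
        sum(layout)
        for layout in product([False, True], repeat=line_size)
        if valid_layout(layout)
    )


print(max_branches_per_line())
```

A branch in the last position of a line has its delay slot in the next line, which is why layouts such as branch, slot, branch, (slot in next line) are still counted as holding two branches.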
In the PowerPC 604, there is a 512-entry BHT organized as 128 × 4 entries, with two history bits per entry. The BHT is indexed by the instruction fetch address and delivers the four entries belonging to the four instructions that are fetched from the I-cache at the same time and loaded into the decode queue. The prediction logic then evaluates the history bits corresponding to the branch instruction.
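A minimal sketch of this organization follows: one access with the fetch address returns a row of four entries, and the prediction logic looks only at the entry of the branch. Several details are assumptions, not taken from the source: instructions are 4 bytes and fetched in 16-byte-aligned groups, the row index comes from the address bits just above the group offset, the two history bits are treated as a saturating counter in which 2 and 3 mean ‘taken’, and entries start at 1. The class and method names are hypothetical.

```python
class BranchHistoryTable:
    """512-entry BHT organized as 128 rows x 4 entries of two history bits."""

    ROWS, ENTRIES_PER_ROW = 128, 4

    def __init__(self) -> None:
        # Every entry starts at 1 ("weakly not taken"); an arbitrary choice.
        self.rows = [[1] * self.ENTRIES_PER_ROW for _ in range(self.ROWS)]

    def _row_index(self, fetch_addr: int) -> int:
        # Assumed indexing: one row per 16-byte fetch group (4 x 4-byte
        # instructions), using the address bits just above the group offset.
        return (fetch_addr >> 4) % self.ROWS

    def lookup(self, fetch_addr: int) -> list[int]:
        # One access delivers the four entries for the four fetched instructions.
        return list(self.rows[self._row_index(fetch_addr)])

    def predict(self, fetch_addr: int, slot: int) -> bool:
        # Evaluate only the entry of the branch at position `slot` (0..3);
        # values 2 and 3 are read as 'taken' under the assumed encoding.
        return self.lookup(fetch_addr)[slot] >= 2

    def update(self, fetch_addr: int, slot: int, taken: bool) -> None:
        # Saturating 2-bit update once the branch outcome is known.
        row = self.rows[self._row_index(fetch_addr)]
        row[slot] = min(row[slot] + 1, 3) if taken else max(row[slot] - 1, 0)


bht = BranchHistoryTable()
branch_addr = 0x1008               # hypothetical branch address
fetch_addr = branch_addr & ~0xF    # start of its 16-byte fetch group
slot = (branch_addr >> 2) & 0x3    # position of the branch within the group
bht.update(fetch_addr, slot, taken=True)
print(bht.lookup(fetch_addr), bht.predict(fetch_addr, slot))
```

Returning the whole row mirrors the point made above: the table is read once per fetch group, and the choice of which entry matters is left to the prediction logic.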
The PowerPC 604 employs both implicit prediction and explicit 2-bit prediction. In implicit prediction, a hit in the branch target address cache (BTAC) by itself implies that the branch is predicted ‘taken’. Implicit prediction is a favored technique because it is easy to implement and allows a correctly guessed taken path to be accessed without any penalty.
The problem with implicit prediction is that it requires a fully associative implementation. The high cost of a fully associative structure restricts the size of the BTAC or BTIC in most current implementations to 32–256 entries. Because of this limited capacity, processors that use implicit prediction are also equipped with a more efficient but ‘slower’ prediction technique.
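The sketch below illustrates what "fully associative" means for a BTAC: any fetch address may occupy any entry, so a lookup has to compare against every stored address, and a small capacity forces evictions. The dictionary stands in for the parallel tag comparison that hardware performs across all entries. LRU replacement and the class and method names are assumptions for illustration; the default capacity of 64 matches the PowerPC 604 BTAC size given in this section.

```python
from collections import OrderedDict


class FullyAssociativeBTAC:
    """Any fetch address may be stored in any entry (no index bits)."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        # fetch address -> branch target address, ordered from LRU to MRU
        self.entries: OrderedDict[int, int] = OrderedDict()

    def lookup(self, fetch_addr: int):
        # Hardware compares fetch_addr with all stored tags in parallel;
        # a hit implicitly means 'taken' and supplies the target at once.
        if fetch_addr in self.entries:
            self.entries.move_to_end(fetch_addr)  # mark as most recently used
            return self.entries[fetch_addr]
        return None  # miss: no implicit prediction is available

    def insert(self, fetch_addr: int, target: int) -> None:
        # Record a taken branch; evict the least recently used entry if full
        # (assumed LRU policy).
        self.entries[fetch_addr] = target
        self.entries.move_to_end(fetch_addr)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
```

With only a few dozen entries, a working set with more taken branches than the capacity keeps evicting entries, which is why a BTAC miss has to fall back on a second predictor.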
As an example, consider how multiple predictions are combined in the PowerPC 604 and PowerPC 620 processors. In these processors, implicit prediction is combined with 2-bit prediction, and the resulting guess is derived as shown in the table.
Combining implicit and 2-bit prediction, as implemented in the PowerPC 604 (1995) and 620 (1996) processors
| BTAC | Outcome of the 2-bit prediction | Overall prediction |
|---|---|---|
| Hit | Taken | Taken |
| Hit | Not taken | Taken |
| Miss | Taken | Taken |
| Miss | Not taken | Not taken |
The PowerPC 604 has a 64-entry BTAC and a 512-entry BHT, whereas the corresponding values for the PowerPC 620 are 256 entries in the BTAC and 2K entries in the BHT.
When the BTAC contains an entry for the referenced fetch address (a BTAC hit), the overall prediction is ‘taken’, regardless of the outcome of the 2-bit prediction. If there is no corresponding entry in the BTAC (a BTAC miss), the outcome of the 2-bit prediction is used as the overall guess.
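The combining rule stated above reduces to a logical OR of the two predictions. The sketch below encodes it and prints all four combinations; the function name `overall_prediction` is hypothetical.

```python
def overall_prediction(btac_hit: bool, two_bit_taken: bool) -> bool:
    # A BTAC hit implies 'taken' (implicit prediction) and overrides the
    # 2-bit predictor; on a BTAC miss, the 2-bit outcome is used as is.
    return btac_hit or two_bit_taken


for btac_hit in (True, False):
    for two_bit_taken in (True, False):
        overall = overall_prediction(btac_hit, two_bit_taken)
        print(f"{'Hit' if btac_hit else 'Miss':<5} "
              f"{'Taken' if two_bit_taken else 'Not taken':<10} -> "
              f"{'Taken' if overall else 'Not taken'}")
```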
When the overall ‘taken’ prediction results from a BTAC hit, the taken penalty is zero in both the 604 and the 620. In contrast, if the ‘taken’ prediction comes from the 2-bit prediction after a BTAC miss, the PowerPC 604 incurs a taken penalty of 1–2 cycles, whereas the PowerPC 620 incurs a penalty of only 1 cycle.
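These penalties can be folded into a simple average. If a fraction h of correctly predicted taken branches is predicted through a BTAC hit (zero penalty) and the rest through the 2-bit scheme after a miss (penalty p), the average taken penalty is (1 − h) × p. The sketch below computes this; the hit fraction 0.75 is a hypothetical value chosen for illustration, not a measured figure for either processor, and the function name is likewise hypothetical.

```python
def avg_taken_penalty(btac_hit_fraction: float, miss_penalty_cycles: float) -> float:
    """Average penalty per correctly predicted taken branch.

    btac_hit_fraction: share of such branches predicted via a BTAC hit
    (0-cycle penalty); the remainder are predicted by the 2-bit scheme after
    a BTAC miss and pay miss_penalty_cycles each.
    """
    if not 0.0 <= btac_hit_fraction <= 1.0:
        raise ValueError("btac_hit_fraction must lie in [0, 1]")
    return (1.0 - btac_hit_fraction) * miss_penalty_cycles


h = 0.75  # hypothetical BTAC hit fraction, not a measured value
# Penalties after a BTAC miss, from the text: 1-2 cycles (604), 1 cycle (620).
for name, penalties in (("PowerPC 604", (1, 2)), ("PowerPC 620", (1,))):
    print(name, [avg_taken_penalty(h, p) for p in penalties])
```

The model makes the role of BTAC size visible: a larger BTAC raises h, and the 620's shorter miss penalty shrinks the cost of the cases the BTAC cannot cover.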