This is the most recently introduced scheme for accessing branch targets, employed in a few recently announced processors such as the Am29000 superscalar, the K5, and the UltraSparc. The basic idea is to append to each line of the I-cache a successor index that points to the next line to be fetched, as shown in the figure. In all of these processors, each cache line holds 16 bytes of instructions.
This means that a cache line holds four instructions in the Am29000 superscalar and the UltraSparc, whereas in the x86-compatible K5 it holds a variable number of CISC instructions. The successor index is fetched in parallel with the instructions of the same line. It points either to the next sequential line or, if the present line contains a branch that is guessed to be taken, to the line that contains the first instruction of the taken path.
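The fetch behaviour described above can be sketched as a small model. This is only an illustration under stated assumptions: the names CacheLine and fetch_trace, the four-line cache, and the instruction labels are hypothetical and do not come from any of the processors mentioned.

```python
from dataclasses import dataclass

LINE_BYTES = 16  # line size stated in the text


@dataclass
class CacheLine:
    instructions: list  # opaque instruction payload of this line
    successor: int      # index of the next line to fetch


def fetch_trace(cache, start, steps):
    """Follow successor indices from `start`; return the line indices visited."""
    visited = []
    index = start
    for _ in range(steps):
        visited.append(index)
        # The successor index is read together with the line itself,
        # so the next fetch address is available without decoding.
        index = cache[index].successor
    return visited


# Hypothetical contents: line 1 holds a branch guessed taken whose
# target lies in line 3; all other lines point where the comments say.
cache = [
    CacheLine(["i0", "i1", "i2", "i3"], successor=1),    # sequential
    CacheLine(["i4", "br", "i6", "i7"], successor=3),    # taken branch
    CacheLine(["i8", "i9", "i10", "i11"], successor=3),  # sequential
    CacheLine(["t0", "t1", "t2", "t3"], successor=0),    # wraps (assumed)
]
trace = fetch_trace(cache, 0, 4)
```

In this model line 2 is skipped, because the successor index of line 1 redirects the fetch straight to the line holding the taken path.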
The I-cache contains a successor index that points to the next instruction cache entry to be fetched. The successor index points either to the next sequential line or, in the case of a branch that is guessed to be taken, to the line that contains the first instruction of the taken path. Examples of processors using this scheme are the Am29000 superscalar (1995), the K5 (1995), and the UltraSparc (1995).
The UltraSparc is the first announced processor to implement the SPARC V9 ISA. It is a superscalar, full 64-bit design with an issue rate of four. The UltraSparc includes a predecode unit. The main task of the predecode unit is to partially decode the instructions and to label them accordingly using 4-bit tags. These tags are stored along with each instruction in the I-cache and allow quick decoding. As instructions are loaded into the I-cache, the predecode unit detects branches, determines the corresponding BTAs, and makes a prediction using a hint bit delivered by the compiler.
This prediction is used to initialize the prediction bits, of which two are provided for every two instructions. The prediction bits are updated according to the branch history. The successor indices and prediction bits are held in an extra 2K buffer (called the Next Field RAM), with one successor index for every four instructions. If the prediction is taken, the successor index is set to the determined BTA; otherwise, the next sequential address is used as the successor index.
The successor index is then used as the next instruction fetch address. Thus, for a taken prediction, the successor index redirects execution to the taken path. The UltraSparc employs 2-bit dynamic prediction, with the history bits placed in the Next Field RAM.
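The interplay of hint bit, 2-bit history, and successor index can be sketched as follows. The text does not give the state encoding or the transition rule used by the UltraSparc, so this sketch assumes the textbook 2-bit saturating counter and assumes that the compiler hint selects a weak state; all function names and addresses are hypothetical.

```python
# Assumed encoding of the two history bits (textbook saturating counter).
STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = range(4)


def initial_state(hint_taken: bool) -> int:
    """Initialise the history from the compiler hint bit (assumed mapping)."""
    return WEAK_TAKEN if hint_taken else WEAK_NOT_TAKEN


def update_state(state: int, taken: bool) -> int:
    """Move one step toward the observed outcome, saturating at both ends."""
    if taken:
        return min(state + 1, STRONG_TAKEN)
    return max(state - 1, STRONG_NOT_TAKEN)


def choose_successor(state: int, branch_target: int, next_sequential: int) -> int:
    """Taken prediction selects the BTA, otherwise the next sequential address."""
    return branch_target if state >= WEAK_TAKEN else next_sequential


# Hypothetical addresses: BTA 0x40, next sequential address 0x10.
state = initial_state(hint_taken=True)
fetch_next = choose_successor(state, branch_target=0x40, next_sequential=0x10)

# The branch turns out not taken, so the history is updated
# and the successor index falls back to the sequential address.
state = update_state(state, taken=False)
fetch_after_update = choose_successor(state, branch_target=0x40, next_sequential=0x10)
```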
In contrast, the K5 initializes all successor indices to the next sequential value and rewrites an index only if execution reveals that the stored value is incorrect. The R8000 also employs the successor index scheme. In the R8000, however, the next sequential address is not stored in the successor index field of the I-cache but is always computed on the fly.
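The K5 policy of starting with sequential indices and correcting them only on a mismatch can be sketched as below. This is a minimal model, not the K5 implementation: the function names are hypothetical, and the wrap-around of the last line to line 0 is an assumption made only to keep the example closed.

```python
def init_successors(num_lines: int) -> list:
    """Every line initially points to the next sequential line (wrap assumed)."""
    return [(i + 1) % num_lines for i in range(num_lines)]


def resolve(successors: list, line: int, actual_next: int) -> bool:
    """Rewrite the stored index only if execution shows it was wrong.

    Returns True when the index had to be rewritten.
    """
    if successors[line] != actual_next:
        successors[line] = actual_next
        return True
    return False


successors = init_successors(4)

# Execution falls through from line 1 to line 2: stored index was correct.
rewritten_sequential = resolve(successors, 1, 2)

# Execution branches from line 1 to line 3: stored index is corrected.
rewritten_taken = resolve(successors, 1, 3)
```

After the second call the index of line 1 points to line 3, so a later fetch of line 1 is redirected to the taken path without any further rewrite.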