What is the BTAC scheme?


This scheme employs an extra cache, known as the branch target address cache (BTAC), to speed up access to branch targets, as shown in the figure. The BTAC holds pairs of recently used branch addresses and branch target addresses and is accessed associatively.

When the current instruction fetch address is a branch address and there is a matching entry in the BTAC, the branch target address (BTA) is read along with the branch instruction in the same cycle. This BTA is then used to fetch the branch target instruction in the next cycle.

The Branch Target Address Cache (BTAC) includes branch target addresses (BTAs). These BTAs are read from the BTAC at the same time as the branch instruction is fetched.
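The lookup described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class `BTAC`, the helper `next_fetch_address`, the 4-byte instruction size, and the example addresses are all hypothetical, and a BTAC hit is treated as "branch taken" without any further prediction logic.

```python
class BTAC:
    """Toy fully associative BTAC: maps a branch address to its target address."""

    def __init__(self):
        self.entries = {}  # branch address -> branch target address (BTA)

    def record(self, branch_addr, target_addr):
        # Store the (branch address, target address) pair.
        self.entries[branch_addr] = target_addr

    def lookup(self, fetch_addr):
        # Associative lookup by fetch address; returns None on a miss.
        return self.entries.get(fetch_addr)


def next_fetch_address(btac, fetch_addr, instr_size=4):
    """Return the address to fetch in the next cycle.

    On a BTAC hit the BTA is available in the same cycle as the branch,
    so the next fetch goes to the target; otherwise fetch sequentially.
    """
    target = btac.lookup(fetch_addr)
    return target if target is not None else fetch_addr + instr_size


btac = BTAC()
btac.record(0x1008, 0x2000)  # hypothetical branch at 0x1008 jumping to 0x2000

trace = []
pc = 0x1000
for _ in range(5):
    trace.append(pc)
    pc = next_fetch_address(btac, pc)

print([hex(a) for a in trace])
```

In this sketch the target instruction at `0x2000` is fetched in the cycle right after the branch at `0x1008`, with no idle cycle in between.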

In this way, branch target instructions (BTIs) can be fetched immediately after the branch instructions, that is, without any idle cycles. Furthermore, the BTAC scheme even has the potential to implement zero-cycle branching. With zero-cycle branching, the first target instruction can be fetched immediately after the last sequential instruction preceding a branch, without any delay.

For zero-cycle branching, the branch target address (BTA) must be accessed along with the instruction preceding the branch. The BTAC must therefore contain, instead of the branch address (BA), the fetch address of the instruction preceding the branch. For a scalar processor with 4-byte instructions, this would be the address BA – 4.
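The BA – 4 indexing can be illustrated with a short sketch. It assumes a scalar fetch of one 4-byte instruction per cycle and a branch that needs no fetch slot of its own (for example, an unconditional branch); the names `zero_cycle_key` and `fetch_trace` and the addresses are hypothetical.

```python
INSTR_SIZE = 4  # assumed fixed 4-byte instructions


def zero_cycle_key(branch_addr, instr_size=INSTR_SIZE):
    """Address under which the BTA is stored: the instruction before the branch."""
    return branch_addr - instr_size


def fetch_trace(btac, start, count, instr_size=INSTR_SIZE):
    """Return the sequence of fetch addresses, one per cycle."""
    trace, pc = [], start
    for _ in range(count):
        trace.append(pc)
        # A hit on the current fetch address redirects the next fetch to the BTA.
        pc = btac.get(pc, pc + instr_size)
    return trace


branch_addr, target_addr = 0x1008, 0x2000
btac = {zero_cycle_key(branch_addr): target_addr}  # entry stored under BA - 4

trace = fetch_trace(btac, 0x1000, 4)
print([hex(a) for a in trace])
```

Here the target at `0x2000` follows directly after the instruction at `0x1004`, the last sequential instruction before the branch, so the branch itself occupies no cycle in the fetch stream.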

The BTAC scheme was proposed by Lee and Smith (1984) and is also known as the branch target buffer design. This scheme is implemented in several processors of the early to mid 1990s, as shown in the table. The number of BTAC entries varies from 32 to 4K.

Example of processors using the BTAC scheme

Processor                            | Number of BTAC entries | Implementation of the BTAC
-------------------------------------|------------------------|---------------------------
ES/9000 520-based processors (1992p) | 4K                     | 2-way associative
Pentium (1994)                       | 256                    | Fully associative
MC 68060 (1993)                      | 256                    | 4-way associative
PA 8000 (1995)                       | 32                     | Fully associative
PowerPC 604 (1994)                   | 64                     | Fully associative
PowerPC 620 (1995)                   | 256                    | Fully associative

Implementations of the BTAC scheme differ, especially on the following issues:

  • Whether the BTAC is implemented as a 2-way, 4-way, or fully associative cache.

  • How the BTAC is initialized.

  • Whether entries are retained in the BTAC for all recent branches or only for recently taken branches (in the latter case the BTAC scheme also performs implicit dynamic prediction).

  • How to select the entry to be overwritten, if there is no room in the BTAC for a new entry.

  • If the processor uses prediction bits, whether they are kept in the BTAC or in a separate branch history table (BHT).
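To make a few of these design choices concrete, the sketch below combines one possible set of answers: a set-associative organization, least-recently-used (LRU) replacement, and entries kept only for taken branches. These are assumptions chosen for illustration, not the policy of any processor in the table; the class `SetAssociativeBTAC` and its parameters are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeBTAC:
    """Toy set-associative BTAC with LRU replacement and taken-only entries."""

    def __init__(self, num_sets=4, ways=2, instr_size=4):
        self.num_sets = num_sets
        self.ways = ways
        self.instr_size = instr_size
        # Each set is ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set_for(self, branch_addr):
        # Select a set from the low-order bits of the instruction index.
        return self.sets[(branch_addr // self.instr_size) % self.num_sets]

    def lookup(self, branch_addr):
        s = self._set_for(branch_addr)
        if branch_addr in s:
            s.move_to_end(branch_addr)  # mark as most recently used
            return s[branch_addr]
        return None

    def update(self, branch_addr, target_addr, taken):
        s = self._set_for(branch_addr)
        if not taken:
            # Taken-only policy: a not-taken branch loses its entry, so a
            # later hit implicitly predicts "taken".
            s.pop(branch_addr, None)
            return
        if branch_addr not in s and len(s) >= self.ways:
            s.popitem(last=False)  # evict the least recently used entry
        s[branch_addr] = target_addr
        s.move_to_end(branch_addr)


btac = SetAssociativeBTAC(num_sets=1, ways=2)
btac.update(0x10, 0x100, taken=True)
btac.update(0x20, 0x200, taken=True)
hit_before = btac.lookup(0x10)         # touching 0x10 makes 0x20 the LRU entry
btac.update(0x30, 0x300, taken=True)   # set is full, so 0x20 is evicted
evicted = btac.lookup(0x20)
btac.update(0x10, 0x100, taken=False)  # branch not taken: entry is dropped
dropped = btac.lookup(0x10)
```

Setting `ways` equal to the total number of entries with `num_sets=1` gives a fully associative BTAC, while keeping prediction bits in a separate BHT would replace the taken-only rule in `update`.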

Updated on: 23-Jul-2021
