What is the performance measure of branch processing in computer architecture?


To evaluate and compare different branch processing techniques, we need a performance measure. Let us consider the execution of a branch instruction in a four-stage pipeline, as shown in the figure. If the branch is processed in a straightforward way, the branch target address (BTA) will be computed in cycle t_i+3.

Then the branch target instruction can be fetched in cycle t_i+4. Thus, the branch target instruction is fetched with a 3-cycle delay compared with the fetching of the branch instruction. Because in sequential processing the next instruction would be fetched one cycle after the branch, this means a 2-cycle penalty compared with sequential processing.

Let us first consider the performance of branch processing in a typical situation. Denote the penalties of ‘taken’ and ‘not taken’ branches by P_t and P_nt, and the corresponding probabilities (frequencies) of ‘taken’ and ‘not taken’ branches by f_t and f_nt. Then the effective penalty of branch processing, P, is

P = f_t ∗ P_t + f_nt ∗ P_nt

For example, we can calculate the effective penalties of the 80386 and i486 processors. For the 80386, the taken and not-taken penalties are 8 and 2 cycles, respectively. Assuming that the probability of taken branches is 0.75 (f_t = 0.75, and hence f_nt = 0.25), we get the effective penalty of branches in the 80386 −

P_80386 = 0.75 ∗ 8 + 0.25 ∗ 2 = 6.5 cycles

This means that the 80386 requires, on average, 6.5 additional cycles for each branch. In contrast, the i486 has a substantially enhanced branch mechanism. Its effective branch penalty is

P_i486 = 0.75 ∗ 2 + 0.25 ∗ 0 = 1.5 cycles

which is considerably less than the penalty of the 80386.
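The two calculations above can be reproduced with a short Python sketch. The function name `effective_penalty` and its parameter names are illustrative choices, not part of the original text; the penalties and the taken-branch probability are the values quoted above.

```python
def effective_penalty(f_taken, p_taken, p_not_taken):
    """Effective branch penalty P = f_t * P_t + f_nt * P_nt, in cycles."""
    if not 0.0 <= f_taken <= 1.0:
        raise ValueError("f_taken must be a probability in [0, 1]")
    # Every branch is either taken or not taken, so f_nt = 1 - f_t.
    f_not_taken = 1.0 - f_taken
    return f_taken * p_taken + f_not_taken * p_not_taken


# Values quoted in the text: f_t = 0.75 for both processors.
p_80386 = effective_penalty(0.75, p_taken=8, p_not_taken=2)
p_i486 = effective_penalty(0.75, p_taken=2, p_not_taken=0)

print(p_80386, p_i486)  # prints: 6.5 1.5
```

Deriving f_nt from f_t inside the function keeps the two probabilities consistent, so a caller cannot pass frequencies that fail to sum to 1.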

A further typical situation is when branch processing uses branch prediction. In this case, a prediction is made for each branch by guessing whether the branch in question will be taken or not. Let us consider the following notation −

P_tc − penalty for correctly predicted taken branches

P_tm − penalty for mispredicted taken branches

P_ntc − penalty for correctly predicted not-taken branches

P_ntm − penalty for mispredicted not-taken branches

f_tc − probability for correctly predicted taken branches

f_tm − probability for mispredicted taken branches

f_ntc − probability for correctly predicted not-taken branches

f_ntm − probability for mispredicted not-taken branches

Then, the effective penalty of branch processing can be expressed as −

P = f_tc ∗ P_tc + f_tm ∗ P_tm + f_ntc ∗ P_ntc + f_ntm ∗ P_ntm

We can assume a simple case in which the penalties for correctly predicted taken and not-taken branches are equal, and the penalties for mispredicted taken and not-taken branches are also equal. Denoting these common values by P_c and P_m, that is

P_tc = P_ntc = P_c and P_tm = P_ntm = P_m

Furthermore, let us designate the total probability of correctly predicted branches as f_c and that of mispredicted branches as f_m, that is

f_c = f_tc + f_ntc and f_m = f_tm + f_ntm

In this straightforward case, the effective branch penalty can be calculated as

P=f_c∗P_c+f_m∗P_m
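As a quick check that the simplified formula agrees with the four-term expression, the sketch below evaluates both. All numbers in it are hypothetical, chosen only for illustration and not taken from any processor; the function and variable names are likewise assumptions.

```python
import math


def effective_penalty_predicted(freq, penalty):
    """Four-term effective penalty: sum of f_x * P_x over tc, tm, ntc, ntm."""
    cases = ("tc", "tm", "ntc", "ntm")
    if not math.isclose(sum(freq[c] for c in cases), 1.0):
        raise ValueError("the four frequencies must sum to 1")
    return sum(freq[c] * penalty[c] for c in cases)


# Hypothetical frequencies and penalties, for illustration only.
freq = {"tc": 0.70, "tm": 0.05, "ntc": 0.20, "ntm": 0.05}
p_c, p_m = 1.0, 5.0  # P_tc = P_ntc = P_c and P_tm = P_ntm = P_m
penalty = {"tc": p_c, "ntc": p_c, "tm": p_m, "ntm": p_m}

# Totals of correctly predicted and mispredicted branches.
f_c = freq["tc"] + freq["ntc"]
f_m = freq["tm"] + freq["ntm"]

full = effective_penalty_predicted(freq, penalty)
simplified = f_c * p_c + f_m * p_m

# The two results should agree up to floating-point rounding.
print(math.isclose(full, simplified))
```

The comparison uses `math.isclose` rather than `==` because the two expressions add the same terms in a different order, which can differ in the last floating-point digit.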

Let us consider the Pentium processor, which uses branch prediction. In this case, the penalty for correctly predicted branches is 0 cycles, whereas that for mispredicted branches equals either 3 cycles (if the branch is executed in the U pipe) or 4 cycles (if the branch is executed in the V pipe).

For the calculation, let us suppose an average misprediction penalty of 3.5 cycles. If we assume a branch prediction accuracy of 0.9 (that is, f_c = 0.9 and f_m = 0.1), we get the effective branch penalty of this processor

P_Pentium=0.9∗0+0.1∗3.5=0.35

That is, the Pentium requires, on average, only 0.35 additional cycles for each branch.
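To see how strongly this result depends on prediction accuracy, the following sketch repeats the calculation for a few accuracy values. It assumes that mispredicted branches are split evenly between the U and V pipes, which is one way to arrive at the 3.5-cycle average used above; that split, the function name, and the accuracy values other than 0.9 are assumptions made for illustration.

```python
def pentium_style_penalty(accuracy, p_u=3.0, p_v=4.0, share_u=0.5):
    """Effective penalty P = f_c * P_c + f_m * P_m with P_c = 0 cycles.

    share_u is the assumed fraction of mispredicted branches handled by
    the U pipe (0.5 gives the 3.5-cycle average used in the text).
    """
    p_m = share_u * p_u + (1.0 - share_u) * p_v  # average misprediction penalty
    f_c = accuracy
    f_m = 1.0 - accuracy
    return f_c * 0.0 + f_m * p_m


# 0.90 is the accuracy assumed in the text; 0.80 and 0.95 are illustrative.
for accuracy in (0.80, 0.90, 0.95):
    print(f"accuracy {accuracy:.2f}: {pentium_style_penalty(accuracy):.3f} cycles")
```

Since correctly predicted branches cost nothing in this model, the effective penalty scales linearly with the misprediction rate f_m.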

Ginni

Updated on: 23-Jul-2021
