# What is the performance measure of branch processing in computer architecture?

To evaluate and compare different branch processing techniques, we need a performance measure. Let us consider the execution of a branch instruction in a four-stage pipeline as shown in the figure. If the branch is processed straightforwardly, the branch target address (BTA) will be computed in cycle t_{i+3}.

Then the branch target instruction can be fetched in cycle t_{i+4}. Thus, the branch target instruction is fetched with a 3-cycle delay relative to the fetching of the branch instruction. Since the next sequential instruction would have been fetched one cycle after the branch, this amounts to a 2-cycle penalty compared to sequential processing.

Consider first the performance of branch processing in a typical situation. Let us denote the penalties of ‘taken’ and ‘not taken’ branches by **P_{t} and P_{nt}**, and the corresponding probabilities (frequencies) of ‘taken’ and ‘not taken’ branches by **f_{t} and f_{nt}**. Then the effective penalty of branch processing P is

P = f_{t} ∗ P_{t} + f_{nt} ∗ P_{nt}

For example, we can calculate the effective penalties of the 80386 and i486 processors. For the 80386, the taken and not-taken penalties are 8 and 2 cycles, respectively. Assuming a probability of taken branches of 0.75 (f_{t}=0.75, and hence f_{nt}=0.25), the effective penalty of branches in the 80386 is −

P_{80386}=0.75 ∗ 8+0.25 ∗ 2=6.5 cycles

This means that the 80386 requires, on average, 6.5 additional cycles for each branch. In contrast, the i486 has a substantially enhanced branch mechanism. Its effective branch penalty is

P_{i486}=0.75 ∗ 2+0.25 ∗ 0=1.5 cycles

which is considerably less than the penalty of the 80386.
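The two calculations above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `effective_penalty` and its parameter names are illustrative choices, not part of the original text, and it assumes every branch is either taken or not taken, so that f_{nt} = 1 − f_{t}.

```python
def effective_penalty(f_taken, p_taken, p_not_taken):
    """Effective branch penalty P = f_t * P_t + f_nt * P_nt, in cycles.

    Assumption: each branch is either taken or not taken, so f_nt = 1 - f_t.
    """
    if not 0.0 <= f_taken <= 1.0:
        raise ValueError("f_taken must be a probability in [0, 1]")
    f_not_taken = 1.0 - f_taken
    return f_taken * p_taken + f_not_taken * p_not_taken


# Penalty values and the taken-branch probability are the ones quoted in the text.
p_80386 = effective_penalty(0.75, p_taken=8, p_not_taken=2)
p_i486 = effective_penalty(0.75, p_taken=2, p_not_taken=0)

print(f"80386: {p_80386:.2f} cycles per branch")  # 6.50
print(f"i486:  {p_i486:.2f} cycles per branch")   # 1.50
```

With these inputs the i486 figure is less than a quarter of the 80386 figure, which is the comparison the text draws.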

A further typical situation is when branch processing uses branch prediction. In this case, a prediction is made for each branch by guessing whether the branch in question will be taken or not. Let us consider the following notation −

P_{tc} − penalty for correctly predicted taken branches

P_{tm} − penalty for mispredicted taken branches

P_{ntc} − penalty for correctly predicted not-taken branches

P_{ntm} − penalty for mispredicted not-taken branches

f_{tc} − probability for correctly predicted taken branches

f_{tm} − probability for mispredicted taken branches

f_{ntc} − probability for correctly predicted not-taken branches

f_{ntm} − probability for mispredicted not-taken branches

Then, the effective penalty of branch processing can be expressed as −

P=f_{tc}∗P_{tc}+f_{tm}∗P_{tm}+f_{ntc}∗P_{ntc}+f_{ntm}∗P_{ntm}
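As a rough illustration, the four-term sum can be written as a weighted sum over the four prediction outcomes. The function name, the dictionary keys, and all numeric values below are hypothetical and chosen only to show the arithmetic; they do not describe any real processor. The sketch also assumes the four probabilities cover all branches and therefore sum to 1.

```python
CASES = ("tc", "tm", "ntc", "ntm")  # same subscripts as in the notation above


def effective_penalty_with_prediction(freq, penalty):
    """Return the sum of f_x * P_x over x in {tc, tm, ntc, ntm}, in cycles."""
    total = sum(freq[case] for case in CASES)
    if abs(total - 1.0) > 1e-9:
        # Assumption: the four outcomes partition all executed branches.
        raise ValueError("the four probabilities must sum to 1")
    return sum(freq[case] * penalty[case] for case in CASES)


# Hypothetical example values, not measurements.
freq = {"tc": 0.65, "tm": 0.10, "ntc": 0.20, "ntm": 0.05}
penalty = {"tc": 1, "tm": 4, "ntc": 0, "ntm": 4}

p = effective_penalty_with_prediction(freq, penalty)
print(f"effective penalty: {p:.2f} cycles per branch")  # 1.25
```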

We can assume a simple case in which the penalties for correctly predicted taken and not-taken branches are equal, and likewise for mispredicted taken and not-taken branches. Denoting these common penalties by P_{c} and P_{m}, that is

P_{tc}=P_{ntc}=P_{c} and P_{tm}=P_{ntm}=P_{m}

Furthermore, let us designate the total probability of correctly predicted branches as f_{c} and that of mispredicted branches as f_{m}, that is

f_{c}=f_{tc}+f_{ntc} and f_{m}=f_{tm}+f_{ntm}

In this straightforward case, the effective branch penalty can be calculated as

P=f_{c}∗P_{c}+f_{m}∗P_{m}
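The reduction from four terms to two can be written out step by step. Here P_{c} and P_{m} stand for the shared penalties of correctly predicted and mispredicted branches, which is how the symbols are used in the formula above.

```latex
\begin{aligned}
P &= f_{tc} P_{tc} + f_{tm} P_{tm} + f_{ntc} P_{ntc} + f_{ntm} P_{ntm} \\
  &= f_{tc} P_{c} + f_{tm} P_{m} + f_{ntc} P_{c} + f_{ntm} P_{m}
     && \text{since } P_{tc} = P_{ntc} = P_{c},\; P_{tm} = P_{ntm} = P_{m} \\
  &= (f_{tc} + f_{ntc})\, P_{c} + (f_{tm} + f_{ntm})\, P_{m} \\
  &= f_{c} P_{c} + f_{m} P_{m}
\end{aligned}
```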

Let us consider the Pentium processor, which uses branch prediction. In this case, the penalty for correctly predicted branches is 0 cycles, whereas that for mispredicted branches equals either 3 cycles (if the branch is processed by the U pipe) or 4 cycles (if the branch is executed in the V pipe).

For the calculation, let us suppose an average misprediction penalty of 3.5 cycles. Assuming a branch prediction accuracy of 0.9 (that is, f_{c}=0.9 and f_{m}=0.1), we get the effective branch penalty of this processor

P_{Pentium}=0.9∗0+0.1∗3.5=0.35

That is, the Pentium requires, on average, only 0.35 additional cycles for each branch.
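To see how sensitive this figure is to prediction accuracy, the two-term formula can be evaluated for a few accuracy values. This is a sketch only: the function name is illustrative, the default penalties (0 and 3.5 cycles) are the ones used in the text, and the accuracies other than 0.9 are hypothetical inputs rather than measured values.

```python
def penalty_with_prediction(accuracy, p_correct=0.0, p_mispredict=3.5):
    """Effective branch penalty P = f_c * P_c + f_m * P_m, in cycles.

    accuracy is f_c; f_m is taken to be 1 - f_c.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a probability in [0, 1]")
    return accuracy * p_correct + (1.0 - accuracy) * p_mispredict


# 0.90 is the accuracy assumed in the text; 0.80 and 0.95 are hypothetical.
for accuracy in (0.80, 0.90, 0.95):
    p = penalty_with_prediction(accuracy)
    print(f"accuracy {accuracy:.2f}: {p:.3f} cycles per branch")
```

Because the penalty for correct predictions is 0 here, the effective penalty is simply the misprediction rate times 3.5 cycles, so halving the misprediction rate halves the effective penalty.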
