The Pentium Pro is the flagship of Intel’s x86 line of processors. It implements a dynamic execution microarchitecture, a combination of multiple branch prediction, data flow analysis, and speculative execution. The Pentium Pro processor has a decoupled, 12-stage, superpipelined implementation that trades less work per pipestage for more stages.
The Pentium Pro processor also has a pipestage time 33 percent shorter than that of the Pentium processor, which helps it reach a higher clock rate on any given process. The approach used by the Pentium Pro processor removes the constraint of linear instruction sequencing between the traditional “fetch” and “execute” phases, and opens up a wide instruction window by using an instruction pool.
This approach gives the “execute” phase of the Pentium Pro processor much greater visibility into the program’s instruction stream, so that better scheduling can take place.
It requires the instruction “fetch/decode” phase of the Pentium Pro processor to be much smarter at predicting program flow. Optimized scheduling requires the basic “execute” phase to be replaced by decoupled “dispatch/execute” and “retire” phases. This allows instructions to be started in any order but always completed in the original program order.
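The idea of starting instructions in any order while completing them in program order can be illustrated with a small model. This is a minimal sketch, not the processor's actual logic: the function name `retire_in_order` and the instruction labels are hypothetical, and each instruction is assumed to have a unique name.

```python
from collections import deque

def retire_in_order(program, completion_order):
    """Return the order in which instructions retire.

    program: instruction names in original program order (assumed unique).
    completion_order: the same names in the order execution finishes,
    which may differ from program order.
    """
    pending = deque(program)   # oldest not-yet-retired instruction at the left
    done = set()               # instructions whose execution has finished
    retired = []
    for name in completion_order:
        done.add(name)
        # Retire from the head only while the oldest instruction has finished.
        while pending and pending[0] in done:
            retired.append(pending.popleft())
    return retired

program = ["load", "add", "mul", "store"]
completion = ["mul", "load", "store", "add"]
print(retire_in_order(program, completion))  # ['load', 'add', 'mul', 'store']
```

Although `mul` finishes first in this example, it cannot retire until `load` and `add` ahead of it have finished, so the retirement order always matches the program order.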
The main features of the Pentium Pro are as follows:
It is a superscalar CISC processor with a RISC core.
It issues up to three RISC operations per cycle and dispatches up to five RISC operations per cycle.
It has a unified central reservation station with 20 entries, shared by all instruction types, such as fixed-point (FX) and floating-point (FP) instructions.
Strict sequential consistency is retained using a reorder buffer.
Renaming is implemented in the reorder buffer.
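Renaming in the reorder buffer can be sketched as follows. This is a simplified illustration under stated assumptions rather than the actual hardware mechanism: the function `rename`, the `rob<N>` slot labels, and the three-register uop format are all hypothetical, and the sketch ignores the buffer's finite size and the freeing of slots at retirement.

```python
def rename(uops):
    """Rename (dest, src1, src2) register triples onto reorder-buffer slots."""
    alias = {}     # architectural register -> slot of its newest in-flight value
    renamed = []
    for slot, (dest, src1, src2) in enumerate(uops):
        # A source reads the slot of its latest producer, or the architectural
        # register itself if no in-flight uop writes it.
        s1 = f"rob{alias[src1]}" if src1 in alias else src1
        s2 = f"rob{alias[src2]}" if src2 in alias else src2
        alias[dest] = slot   # the buffer slot doubles as the rename register
        renamed.append((f"rob{slot}", s1, s2))
    return renamed

uops = [
    ("eax", "ebx", "ecx"),
    ("edx", "eax", "ebx"),
    ("eax", "esi", "edi"),   # writes eax again, but gets a fresh slot
    ("ebx", "eax", "edx"),
]
for entry in rename(uops):
    print(entry)
```

In this example the two writes to `eax` land in different slots (`rob0` and `rob2`), so the third uop no longer has to wait for the first two, while the last uop still reads the newest `eax` value from `rob2`.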
The Pentium Pro has an extremely long pipeline of at least 14 stages for FX instructions. Like other superscalar CISC processors, the Pentium Pro first converts the fetched CISC instructions internally into RISC operations, known as uops. A superscalar RISC core then executes the uops. Finally, the back end of the processor ensures the logical consistency of the execution.
Instructions are fetched in 128-bit chunks from the I-cache into the I-buffer. Because CISC instructions vary in length, instructions taken from the I-buffer must first be aligned.
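To see why alignment is needed, the following sketch locates variable-length instructions within 16-byte (128-bit) fetch chunks. It assumes the instruction lengths are already known; the function `locate` and the example lengths are hypothetical and only illustrate the bookkeeping.

```python
CHUNK_BYTES = 16  # one 128-bit fetch chunk

def locate(lengths):
    """Return (start_offset, length, straddles_chunk) for each instruction."""
    located = []
    offset = 0
    for n in lengths:
        first_chunk = offset // CHUNK_BYTES
        last_chunk = (offset + n - 1) // CHUNK_BYTES
        # An instruction straddles if its first and last bytes fall in
        # different fetch chunks.
        located.append((offset, n, first_chunk != last_chunk))
        offset += n
    return located

for entry in locate([3, 7, 5, 2, 6]):
    print(entry)
# (0, 3, False)
# (3, 7, False)
# (10, 5, False)
# (15, 2, True)
# (17, 6, False)
```

The fourth instruction starts at byte 15 and ends at byte 16, so it spans two chunks. Instruction boundaries generally do not coincide with chunk boundaries, which is why the bytes must be aligned before decoding.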
Then, up to three CISC instructions are decoded and converted to RISC operations in each cycle. The conversion is carried out by two simple decoders (D1 and D2), a general decoder (D3), and a microinstruction sequencer (MIS).
Each simple decoder accepts only instructions that translate into a single uop. More complex instructions, which translate into at most four uops, are handled by the general decoder (D3), and instructions that produce more than four uops are converted by the MIS.
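The division of work among the decoders can be summarized in a short sketch. The function name `translator_for` and the returned labels are hypothetical. The sketch only picks the most restricted unit able to handle an instruction with a given uop count and does not model how instructions are grouped within a cycle.

```python
def translator_for(uop_count):
    """Pick the unit that converts an instruction producing uop_count uops."""
    if uop_count < 1:
        raise ValueError("an instruction produces at least one uop")
    if uop_count == 1:
        return "simple decoder (D1 or D2)"
    if uop_count <= 4:
        return "general decoder (D3)"
    return "microinstruction sequencer (MIS)"

for count in (1, 2, 4, 5):
    print(count, "->", translator_for(count))
# 1 -> simple decoder (D1 or D2)
# 2 -> general decoder (D3)
# 4 -> general decoder (D3)
# 5 -> microinstruction sequencer (MIS)
```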