Pipelining is the temporal overlapping of processing. Pipelines are nothing more than assembly lines in computing; they can be used for instruction processing or, more generally, for performing any complex operation. Like an assembly line, a pipeline is efficient only when it processes a sequence of identical tasks.
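The benefit of overlapping can be made concrete with a small calculation. The sketch below is illustrative only and assumes an idealized pipeline in which every stage takes exactly one clock cycle and no stalls occur; the function names are hypothetical and do not come from the text.

```python
def sequential_cycles(num_tasks: int, num_stages: int) -> int:
    """Cycles needed when each task finishes all stages before the next starts."""
    return num_tasks * num_stages


def pipelined_cycles(num_tasks: int, num_stages: int) -> int:
    """Cycles needed in an ideal pipeline (assumed: one cycle per stage, no stalls).

    The first task needs num_stages cycles to pass through the pipeline;
    every further task completes one cycle after its predecessor.
    """
    if num_tasks == 0:
        return 0
    return num_stages + num_tasks - 1


if __name__ == "__main__":
    tasks, stages = 100, 4
    print(sequential_cycles(tasks, stages))  # 400
    print(pipelined_cycles(tasks, stages))   # 103
```

For a long sequence of identical tasks the speed-up approaches the number of stages, which is why a pipeline pays off only when it is kept full.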
A pipeline consists of several stages, one for each subtask, as shown in the figure. The stages are decoupled from each other by registers known as latches. At the end of each clock cycle, the latches gate in their inputs and forward them to the associated stage, where the required operation is performed.
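The interplay of stages and latches can be sketched as a small simulation. This is a minimal model under stated assumptions, not a description of any real hardware: each stage is a plain Python function, each stage has one input latch, and `run_pipeline` and `EMPTY` are hypothetical names introduced for illustration.

```python
from collections import deque
from typing import Any, Callable, Iterable, List, Sequence, Tuple

EMPTY = object()  # marker for a latch that currently holds no data


def run_pipeline(
    stages: Sequence[Callable[[Any], Any]], inputs: Iterable[Any]
) -> Tuple[List[Any], int]:
    """Simulate a linear pipeline; return (results, number of clock cycles)."""
    if not stages:
        raise ValueError("a pipeline needs at least one stage")
    pending = deque(inputs)
    latches = [EMPTY] * len(stages)  # latches[i] feeds stages[i]
    if pending:
        latches[0] = pending.popleft()  # the first item is already latched
    results: List[Any] = []
    cycles = 0
    while any(value is not EMPTY for value in latches):
        # During the cycle, every stage works on the content of its latch.
        outputs = [
            stage(value) if value is not EMPTY else EMPTY
            for stage, value in zip(stages, latches)
        ]
        if outputs[-1] is not EMPTY:
            results.append(outputs[-1])  # the last stage delivers a result
        # End of the cycle: each latch gates in the output of the
        # preceding stage; the first latch takes the next input item.
        next_input = pending.popleft() if pending else EMPTY
        latches = [next_input] + outputs[:-1]
        cycles += 1
    return results, cycles


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_pipeline(stages, [1, 2, 3, 4]))  # ([1, 3, 5, 7], 6)
```

With three stages and four inputs the simulation finishes in six cycles, because from the third cycle onward one result leaves the pipeline in every cycle.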
Each stage is implemented by several different execution units (EUs), which cooperate in performing the required operations, as shown in the figure. The latches are extended with multiplexers that select and transfer data from the outputs of preceding EUs to the inputs of subsequent EUs.
As an example, consider the structure and pipelined operation of the FX (fixed-point) unit of the IBM Power1 (RS/6000) (Grohoski, 1990). This unit executes FX and logical instructions, typically in four cycles. In the fetch cycle, instructions are fetched from the on-chip instruction cache via the instruction buses into the instruction buffer.
The decode cycle decodes the fetched instruction and accesses the referenced register values. In the next cycle (the execute cycle), the specified data manipulation is performed. This stage provides three EUs that perform the required data manipulation: an adder, a logic unit, and a multiply/divide unit.
The first two units need only one cycle to produce a result. The multiply/divide unit does not operate in a pipelined mode and requires a considerable number of cycles to complete a multiplication or a division. The result is then written back into the register file via the T-latch during the following write-back cycle.
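The behaviour of an execute stage with three EUs of different latency can be sketched as follows. This is a simplified model, not the Power1 implementation: the opcode names are hypothetical, and `MULDIV_LATENCY` is a placeholder value, since the text says only that multiplication and division take "a considerable number of cycles".

```python
from typing import Callable, Dict, Iterable, Tuple

BinaryOp = Callable[[int, int], int]

# Placeholder latency: an assumption for illustration, not a Power1 figure.
MULDIV_LATENCY = 5

ADDER_OPS: Dict[str, BinaryOp] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}
LOGIC_OPS: Dict[str, BinaryOp] = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}
MULDIV_OPS: Dict[str, BinaryOp] = {
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a // b,  # Python floor division, for illustration only
}

# Each EU is modelled as (operation table, cycles per operation).
EXECUTION_UNITS = (
    (ADDER_OPS, 1),
    (LOGIC_OPS, 1),
    (MULDIV_OPS, MULDIV_LATENCY),
)


def execute(op: str, a: int, b: int) -> Tuple[int, int]:
    """Select the EU for `op`; return (result, cycles spent in the execute stage)."""
    for unit_ops, latency in EXECUTION_UNITS:
        if op in unit_ops:
            return unit_ops[op](a, b), latency
    raise ValueError(f"unknown operation: {op}")


def occupied_cycles(ops: Iterable[str]) -> int:
    """Cycles the execute stage is busy for a sequence of opcodes.

    The multiply/divide unit is not pipelined in this model, so a
    multi-cycle operation holds the stage for its full latency.
    """
    return sum(execute(op, 1, 1)[1] for op in ops)


if __name__ == "__main__":
    print(execute("add", 2, 3))
    print(execute("mul", 4, 5))
    print(occupied_cycles(["add", "mul", "and"]))
```

The loop over `EXECUTION_UNITS` plays the role of the selection logic: exactly one EU handles each instruction, and only the multiply/divide unit keeps the stage busy for more than one cycle.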
To shorten or eliminate define-use delays, the result of the execute stage can be returned directly (bypassed) to the inputs of the execute stage (the A-, B- and S-latches) through the result bus.
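The effect of bypassing on the define-use delay can be estimated with a simple timing model. The sketch below rests on assumptions that are not stated in the text: a four-stage fetch, decode, execute, write-back sequence with one cycle per stage, register operands read in the decode cycle, and a register value that becomes readable only in the cycle after its write-back. The function name is hypothetical.

```python
STAGES = ["fetch", "decode", "execute", "write_back"]


def define_use_delay(bypass: bool) -> int:
    """Stall cycles for an instruction that uses the result of its predecessor.

    Cycle numbering: the producer occupies stage number s in cycle s
    (counting from 0); a consumer issued directly after it would, without
    stalls, occupy stage number s in cycle s + 1.
    """
    decode = STAGES.index("decode")
    execute = STAGES.index("execute")
    write_back = STAGES.index("write_back")

    if bypass:
        # The result leaves the execute stage at the end of cycle `execute`
        # and is fed straight back to the execute-stage input latches.
        earliest = execute + 1   # first cycle the consumer may execute
        unstalled = execute + 1  # cycle it would execute without stalls
    else:
        # Assumption: the register written in cycle `write_back` can be
        # read by a decoding instruction only in the following cycle.
        earliest = write_back + 1  # first cycle the consumer may decode
        unstalled = decode + 1     # cycle it would decode without stalls
    return earliest - unstalled


if __name__ == "__main__":
    print(define_use_delay(bypass=False))  # 2
    print(define_use_delay(bypass=True))   # 0
```

Under these assumptions, a dependent instruction loses two cycles without the bypass and none with it. A register file that allows a write and a read of the same register within one cycle would reduce the first figure to one.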
The FX unit provides an additional stage for processing load and store instructions, which operates in the cache-access cycle. In this cycle, data can be written into or read from the cache.