There are two main methods of exploiting parallelism in parallel computer architecture: pipelining and replication.
In pipelining, several functional units work in sequence to implement a single computation. These functional units form an assembly line, or pipeline. Each functional unit performs a specific phase of the computation, and each computation goes through the entire pipeline.
If there is only a single computation to execute, the pipeline cannot extract any parallelism. But when the same computation must be executed many times, these computations can be overlapped across the functional units.
Assume that the pipeline consists of N functional units (stages) and that the slowest stage requires time T to execute its function. Under such circumstances, a new computation can be started every T time units. The pipeline is full when every functional unit is working on a different computation; once the pipeline is full, a new computation finishes every T time units.
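This timing model can be sketched in a few lines of Python. The function names below are illustrative, not from any real library; the formulas follow directly from the description above: the first result appears once the pipeline fills, and one result completes every T time units thereafter.

```python
def pipeline_time(n_stages, stage_time, n_computations):
    """Total time for n_computations through an n_stages pipeline.

    The first computation needs n_stages * stage_time to traverse the
    pipeline; each of the remaining computations finishes one stage_time
    after the previous one.
    """
    return (n_stages + n_computations - 1) * stage_time


def sequential_time(n_stages, stage_time, n_computations):
    """Time without pipelining: each computation occupies all stages in turn."""
    return n_stages * stage_time * n_computations


# 5-stage pipeline, slowest stage takes 2 time units, 100 computations
t_pipe = pipeline_time(5, 2, 100)   # (5 + 100 - 1) * 2 = 208
t_seq = sequential_time(5, 2, 100)  # 5 * 2 * 100 = 1000
print(t_pipe, t_seq)
```

As the number of computations grows, the speedup t_seq / t_pipe approaches the number of stages N, which is the ideal pipeline speedup.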
An example of pipelining can be found in systolic arrays, where the processors of the array form a pipeline in one or two dimensions. The wavefront array is an asynchronous version of the systolic array: data is transferred according to the dataflow principle, but the pipelining mechanism of systolic systems is preserved.
A common method of introducing parallelism to a computer is the replication of functional units such as processors. Replicated functional units can execute the same operation simultaneously on as many data components as there are replicated computational resources available. The classical example is the array processor, which employs a large number of identical processors executing the same operation on multiple data components.
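The lockstep behavior of an array processor can be modeled with a minimal sketch: one broadcast operation applied to every data element, as if each element had its own processing element. The function name is hypothetical; on real hardware the elements would be updated in parallel, whereas here the comprehension merely models the single broadcast instruction.

```python
def simd_apply(op, data):
    """Model an array (SIMD) processor: every processing element applies
    the same operation 'op' to its own data element in lockstep."""
    return [op(x) for x in data]


# One broadcast "multiply by 2" instruction across four data elements
doubled = simd_apply(lambda x: 2 * x, [1, 2, 3, 4])
print(doubled)  # [2, 4, 6, 8]
```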
Wavefront arrays and two-dimensional systolic arrays also use replication alongside pipelining. All MIMD architectures employ replication as their main parallel technique. Within a single processor, VLIW and superscalar designs apply it as well, and some multi-threaded processors are also designed to exploit replication.
However, not only processor units but also memory banks can be replicated. Interleaved memory design is a well-known technique to reduce memory latency and improve performance.
Similarly, I/O units can be advantageously replicated, resulting in higher I/O throughput. The simplest form of replication is increasing the number of address and data lines in processor buses. Microprocessor buses have evolved from 8-bit to 64-bit widths, and this trend is unlikely to stop soon.