Convex was the first vendor to commercialize a CC-NUMA machine, known as the SPP1000. SPP stands for Scalable Parallel Processor. The goal of the SPP Exemplar series is to build a family of high-performance computers in which the number of processors can range from 10 to 1000 and the peak performance can reach the TeraFLOPS range.
The nodes of the SPP1000 are symmetric multiprocessors, called hypernodes. Each hypernode includes four functional blocks and an I/O subsystem. Each functional block contains two CPUs (HP PA-RISCs) sharing a single CPU agent, and a memory unit holding hypernode-private memory data, global memory data, and network cache data. A five-port crossbar interconnects the four functional blocks and the I/O subsystem.
The hypernodes are connected by four SCI (Scalable Coherent Interface) point-to-point, unidirectional rings. SCI can support several kinds of interconnection networks; the unidirectional ring is one of them. Sequential memory references to global memory are interleaved across the four rings.
This is accomplished by using the ring in the same functional block as the target memory, since the memories are interleaved on a 64-byte basis. The four SCI rings are interleaved on the same 64-byte basis, which is also the network cache line size.
The global shared memory of the SPP1000 is distributed among the hypernodes. The Convex Exemplar series is built on a hierarchical memory architecture containing four types of allocatable memory, differentiated by the way data is allocated and shared.
The existence of four memories differing in allocation, sharing, or latency does not imply that there must be four distinct physical memories. All four memories, as well as the network cache, may be implemented by the same physical memory on each hypernode.
In the Exemplar, the following types of memory are provided, listed in order of increasing memory latency −
CPU-private memory serves data accessed only by a single CPU. CPU-private memory is not physically implemented; rather, the operating system partitions hypernode-private memory to be used as CPU-private memory for each of the CPUs.
Hypernode private memory is provided for data shared only by CPUs within a single hypernode.
Near-shared memory is universally accessible from all hypernodes. This memory is allocated from the global memory of a single hypernode, the home hypernode. Accessing near-shared memory from the home hypernode incurs lower latency than accessing it from other hypernodes.
Far-shared memory is universally accessible from all hypernodes with the same latency. It is allocated from the global memories of several hypernodes. Coherent memory that is designated far-shared (shared by and distributed across multiple hypernodes) is interleaved across the hypernodes on a per-page basis (4 Kbytes) by operating system allocation of the page table entries.