What is CC-NUMA?


CC-NUMA stands for cache-coherent non-uniform memory access. A CC-NUMA machine consists of several processing nodes linked through a high-bandwidth, low-latency interconnection network. Each processing node includes a high-performance processor, its associated cache, and a portion of the global shared memory.

Cache coherence is maintained by a directory-based, write-invalidate cache coherence protocol. To keep all caches consistent, every processing node has a directory memory corresponding to its portion of the shared physical memory.

For each memory line, the directory memory records which remote nodes are caching that line. Using the directory, a node writing to a location can therefore send point-to-point messages that invalidate the other nodes' private copies of the corresponding cache line, instead of broadcasting to every node.
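The directory lookup described above can be sketched in a few lines of Python. This is a simplified illustration, not the protocol of any particular machine: the `Directory` class, its `read` and `write` methods, and the node and line numbers are all hypothetical names chosen for the example, and real protocols also track line states, acknowledgements, and write-backs.

```python
from collections import defaultdict


class Directory:
    """Toy directory for the memory lines homed at one node (hypothetical model)."""

    def __init__(self):
        # line address -> set of node ids currently caching that line
        self.sharers = defaultdict(set)

    def read(self, node, line):
        """Record that `node` now holds a copy of `line`."""
        self.sharers[line].add(node)

    def write(self, node, line):
        """Return the nodes that need a point-to-point invalidation,
        then leave the writer as the only holder of the line."""
        to_invalidate = sorted(self.sharers[line] - {node})
        self.sharers[line] = {node}
        return to_invalidate


directory = Directory()
for reader in (1, 2, 3):
    directory.read(reader, 0x40)

# Node 2 writes line 0x40: only the other recorded sharers are contacted.
invalidated = directory.write(2, 0x40)
print(invalidated)  # [1, 3]
```

Only nodes 1 and 3 receive a message; a node that never cached the line is not contacted, which is what lets the scheme avoid broadcast.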

Another essential attribute of the directory-based protocol is that it does not depend on any particular interconnection network topology. Thus, any scalable network, such as a mesh, a hypercube, or a multi-stage network, can be used to link the processing nodes.

All the CC-NUMA machines share the common goal of building a scalable shared-memory multiprocessor. The main difference among them is in the way the memory and cache coherence mechanisms are distributed among the processing nodes.

Another design issue is the choice of interconnection network between the nodes. The machines described below show a progression from bus-based networks towards more general interconnection networks, and from snoopy cache coherence protocols towards directory schemes.

The Wisconsin multicube architecture is the closest generalization of a single bus-based multiprocessor. It relies entirely on the snoopy cache protocol, applied in a hierarchical way. The Aquarius Multi-Multi architecture combines the snoopy cache protocol with a directory scheme, but its interconnection network relies strictly on the shared multibus. Both the Wisconsin multicube and the Aquarius Multi-Multi have single-processor nodes.

The nodes of the Stanford Dash architecture are more complex. They are realized as single bus-based multiprocessors called clusters. The Dash architecture also combines the snoopy cache protocol and the directory scheme. A snooping scheme ensures the consistency of caches inside the clusters, while the directory scheme maintains consistency across clusters.
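The two-level arrangement can be illustrated with a small Python sketch. It is an assumption-laden model rather than the actual Dash protocol: the function `invalidations_for_write`, the `(cluster, cpu)` tuples, and the idea of one message per remote cluster are illustrative choices made for this example.

```python
def invalidations_for_write(holders, writer):
    """Split the invalidations for one write into two levels.

    holders: set of (cluster, cpu) pairs caching the line (hypothetical layout)
    writer:  the (cluster, cpu) pair performing the write

    Returns (snooped, remote_clusters):
      snooped         - caches in the writer's own cluster, handled by bus snooping
      remote_clusters - other clusters, each reached through the directory
    """
    w_cluster, _ = writer
    snooped = sorted(h for h in holders if h[0] == w_cluster and h != writer)
    remote_clusters = sorted({c for c, _ in holders if c != w_cluster})
    return snooped, remote_clusters


holders = {(0, 0), (0, 1), (1, 0), (1, 3), (2, 2)}
snooped, remote = invalidations_for_write(holders, writer=(0, 0))
print(snooped)  # [(0, 1)]
print(remote)   # [1, 2]
```

In this model the directory only needs to know which clusters hold a copy; which processors inside a remote cluster hold it is left to that cluster's own snooping bus.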

In Dash, the directory protocol is independent of the type of interconnection network, and hence any of the low-latency networks originally developed for multicomputers, such as the mesh, can be employed. The Stanford FLASH architecture is a further development of the Dash machine by the same research group.

The main goal of the FLASH design was the efficient integration of cache-coherent shared memory with high-performance message passing. Since the cluster concept of the Dash is replaced with one-processor nodes, FLASH applies only a directory scheme for maintaining cache coherence.

Updated on: 23-Jul-2021
