What is Cray T3D?


At the time this description was written, the Cray T3D was the most recent NUMA machine. It was designed as a highly scalable parallel supercomputer that supports both the shared-memory and the message-passing programming paradigms. As in other NUMA machines, the shared memory is distributed among the processing elements (PEs) to avoid the memory-access bottleneck, and there is no hardware support for cache coherence. Instead, a special software package and programming model called CRAFT manages coherence and guarantees the integrity of the data.

The Cray T3D hardware structure is divided into two parts as follows −

  • Microarchitecture
  • Macroarchitecture

The microarchitecture is based on Digital’s 21064 Alpha AXP microprocessor, which, like other contemporary microprocessors, has three main weaknesses as the node processor of a large parallel machine −

  • Limited address space
  • Little or no latency-hiding capability
  • Few or no synchronization primitives

The Cray T3D provides four hardware synchronization mechanisms: barriers, fetch-and-increment, messaging, and atomic swap. The barrier hardware comprises 16 parallel logical AND trees, which allow several barriers to be pipelined. When a processor reaches a barrier, it sets the associated barrier bit to one. When all participating processors have reached the barrier, the AND function is satisfied and the hardware clears the barrier bit of each participating processor, signalling them to continue.
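The barrier mechanism can be illustrated with a small software model. This is a hypothetical simulation, not Cray code: the names `BarrierNetwork`, `and_tree`, and `NUM_CHANNELS` are illustrative, each channel stands for one of the 16 AND trees, every PE is assumed to participate, and the tree is evaluated in software rather than in wiring.

```python
# Hypothetical software model of the T3D barrier hardware (illustrative only).
NUM_CHANNELS = 16  # 16 parallel AND trees, one per barrier channel


def and_tree(bits):
    """Reduce 0/1 bits pairwise, level by level, like a hardware AND tree."""
    level = list(bits)
    while len(level) > 1:
        if len(level) % 2:
            level.append(1)  # pad odd levels with 1, the identity for AND
        level = [level[i] & level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


class BarrierNetwork:
    def __init__(self, num_pes):
        self.num_pes = num_pes
        # bits[c][pe] is the barrier bit of PE `pe` on channel `c`
        self.bits = [[0] * num_pes for _ in range(NUM_CHANNELS)]

    def arrive(self, channel, pe):
        """A PE reaching the barrier sets its bit; returns True if all PEs are released."""
        self.bits[channel][pe] = 1
        if and_tree(self.bits[channel]) == 1:
            # AND satisfied: the hardware clears every participant's bit
            self.bits[channel] = [0] * self.num_pes
            return True
        return False

    def waiting(self, channel, pe):
        """A PE must keep waiting while its own barrier bit is still set."""
        return self.bits[channel][pe] == 1
```

Because each channel keeps its own bits, PEs can already arrive at a barrier on channel 1 while the barrier on channel 0 is still pending, which is roughly what allows separate barriers to be pipelined.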

The Cray T3D provides a specialized register set that implements fetch-and-increment in hardware: the contents of these registers are automatically incremented whenever they are read.

Messaging is supported by a predefined queue area in the memory of each processing node. Sending a message means performing a special cache-line-sized write to the queue area of the destination node.
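Both mechanisms can be sketched as a single-process software model. The names `FetchIncRegister`, `MessageQueue`, and `send`, the queue capacity, and the 32-byte message size are assumptions made for illustration. A typical use of fetch-and-increment is dynamic work distribution, where every PE reads a shared register to claim the next unclaimed loop iteration.

```python
# Hypothetical single-process model of two T3D synchronization aids.
CACHE_LINE_BYTES = 32  # assumption: one message = one 32-byte cache line


class FetchIncRegister:
    """A read returns the current value and increments the register.
    In hardware the read and increment are indivisible; this single-threaded
    model needs no lock."""

    def __init__(self, value=0):
        self._value = value

    def read(self):
        old = self._value
        self._value += 1
        return old


class MessageQueue:
    """Predefined queue area in one node's memory (capacity is hypothetical)."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = []

    def deliver(self, payload: bytes):
        if len(payload) > CACHE_LINE_BYTES:
            raise ValueError("message larger than one cache line")
        if len(self.lines) >= self.capacity:
            raise OverflowError("queue area full")
        # the sender writes a whole cache line, so pad the payload to that size
        self.lines.append(payload.ljust(CACHE_LINE_BYTES, b"\0"))

    def receive(self):
        """Remove and return the oldest message, or None if the queue is empty."""
        return self.lines.pop(0) if self.lines else None


def send(queues, dest_pe, payload):
    """Sending = one cache-line-sized write into the destination node's queue."""
    queues[dest_pe].deliver(payload)


# Usage: PEs claim loop iterations from a shared counter (self-scheduling).
counter = FetchIncRegister()
claims = {pe: counter.read() for pe in range(4)}  # each PE gets a distinct index
```

Since every read both returns and advances the counter, no two PEs can claim the same iteration, without any separate lock.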

Atomic swap registers are provided to exchange data between a register and a remote memory cell as a single, indivisible operation. The latency of an atomic swap can be hidden by prefetching.
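A classic use of atomic swap is a spin lock. The sketch below assumes a hypothetical `RemoteMemory` class whose Python lock merely stands in for the indivisibility the hardware guarantees, and builds an illustrative `SwapLock` on top of `atomic_swap`. Prefetching to hide the swap latency is not modeled.

```python
import threading
import time


class RemoteMemory:
    """Hypothetical model of one PE's memory that supports atomic swap."""

    def __init__(self, size):
        self.cells = [0] * size
        self._guard = threading.Lock()  # stands in for hardware indivisibility

    def atomic_swap(self, addr, new_value):
        """Store new_value at addr and return the previous value, indivisibly."""
        with self._guard:
            old = self.cells[addr]
            self.cells[addr] = new_value
            return old


class SwapLock:
    """Spin lock on one memory cell: 0 means free, 1 means held."""

    def __init__(self, memory, addr):
        self.memory = memory
        self.addr = addr

    def acquire(self):
        # Swap in 1 until the old value was 0, i.e. we took a free lock.
        while self.memory.atomic_swap(self.addr, 1) == 1:
            time.sleep(0)  # yield to other threads while spinning

    def release(self):
        self.memory.atomic_swap(self.addr, 0)
```

The lock works because the read of the old value and the write of 1 cannot be separated: of several PEs swapping at the same moment, exactly one sees 0.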

The macroarchitecture defines how the nodes of a parallel computer are connected and integrated, while the microarchitecture specifies the organization of each node. One of the main design objectives was to keep the same macroarchitecture across successive microarchitectures, each built around a state-of-the-art commodity microprocessor.

The two main elements of the macroarchitecture are the memory system and the interconnection network. The memory system implements a distributed shared memory in which any PE can directly address the memory of any other PE. A physical address therefore has two components: a PE number and an offset within that PE's memory.
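The two-component address can be illustrated by packing a PE number and an offset into one integer. The 26-bit offset field (enough for 64 Mbytes) and the function names are assumptions for this sketch; they do not describe the T3D's actual physical-address format.

```python
OFFSET_BITS = 26  # assumption: enough to address 64 Mbytes of local DRAM
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_global_address(pe, offset):
    """Pack a PE number and a local offset into one address (hypothetical layout)."""
    if pe < 0:
        raise ValueError("PE number must be non-negative")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset outside the PE's local memory")
    return (pe << OFFSET_BITS) | offset


def split_global_address(addr):
    """Recover (PE number, offset) from a packed address."""
    return addr >> OFFSET_BITS, addr & OFFSET_MASK


def is_local(addr, my_pe):
    """True if the address refers to the calling PE's own memory."""
    return split_global_address(addr)[0] == my_pe
```

A PE can use a check like `is_local` to distinguish a cheap local access from a remote one that crosses the interconnection network.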

Each PE includes 16 or 64 Mbytes of local DRAM. The latency of a remote memory access is between 1 and 2 microseconds. The data cache resides on the 21064 Alpha AXP chip itself and is write-through, direct-mapped, and read-allocate.
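The cache policy named above can be illustrated with a tiny simulator. It assumes an 8 KB data cache with 32-byte lines (256 lines) as illustrative parameters and models memory as a Python dict. Read misses allocate a line; writes always go through to memory and never allocate on a miss.

```python
# Illustrative parameters (assumed): 8 KB data cache, 32-byte lines.
LINE_BYTES = 32
NUM_LINES = 256


class DirectMappedCache:
    """Write-through, direct-mapped, read-allocate cache over a dict memory."""

    def __init__(self, memory):
        self.memory = memory                # backing store: address -> value
        self.tags = [None] * NUM_LINES      # one tag per cache line
        self.lines = [None] * NUM_LINES     # cached copy of each line's contents
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _index_and_tag(addr):
        line_addr = addr // LINE_BYTES
        # direct-mapped: each memory line can live in exactly one cache slot
        return line_addr % NUM_LINES, line_addr // NUM_LINES

    def read(self, addr):
        idx, tag = self._index_and_tag(addr)
        if self.tags[idx] == tag:
            self.hits += 1
        else:
            # read-allocate: a read miss loads the whole line, evicting the old one
            self.misses += 1
            base = addr - addr % LINE_BYTES
            self.tags[idx] = tag
            self.lines[idx] = {a: self.memory.get(a, 0)
                               for a in range(base, base + LINE_BYTES)}
        return self.lines[idx][addr]

    def write(self, addr, value):
        self.memory[addr] = value           # write-through: memory always updated
        idx, tag = self._index_and_tag(addr)
        if self.tags[idx] == tag:
            self.lines[idx][addr] = value   # keep a cached copy consistent
        # on a write miss nothing is allocated
```

Two addresses exactly `LINE_BYTES * NUM_LINES` bytes apart map to the same slot, so alternating reads between them miss every time; this conflict behaviour is the price of the simple direct-mapped design.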

Updated on: 23-Jul-2021
