DirectX - Compute Shaders

In 3D programming, a compute shader is a programmable shader stage that extends Microsoft Direct3D 11 beyond graphics programming. The compute shader technology is also known as DirectCompute.

Like the vertex and geometry shaders discussed in the last chapter, a compute shader is written in HLSL, so many similarities can be seen. A compute shader provides high-speed general-purpose computing and takes advantage of the large number of parallel processors on the graphics processing unit (GPU).

The compute shader provides memory sharing and thread synchronization features that allow more effective parallel programming methods. The developer calls the ID3D11DeviceContext::Dispatch or ID3D11DeviceContext::DispatchIndirect method to execute commands in a compute shader.
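Dispatch takes the number of thread groups to launch in each dimension, not the number of threads. The minimal sketch below assumes a 2D image workload and a shader declared with [numthreads(16, 16, 1)], and rounds the image size up to whole groups. GroupCount, DispatchArgs and ArgsForImage are hypothetical names; the three resulting values are what would be passed as ThreadGroupCountX, ThreadGroupCountY and ThreadGroupCountZ to ID3D11DeviceContext::Dispatch.

```cpp
#include <cstdint>

// Hypothetical helper: number of thread groups needed to cover `items`
// work items when each group handles `threadsPerGroup` of them (rounds up).
// Written without (items + threadsPerGroup - 1) so it cannot overflow.
constexpr std::uint32_t GroupCount(std::uint32_t items, std::uint32_t threadsPerGroup) {
    return items / threadsPerGroup + (items % threadsPerGroup != 0 ? 1u : 0u);
}

// The three values that would be passed to
// ID3D11DeviceContext::Dispatch(ThreadGroupCountX, ThreadGroupCountY, ThreadGroupCountZ).
struct DispatchArgs {
    std::uint32_t x, y, z;
};

// Assumed shader group size: [numthreads(16, 16, 1)].
constexpr std::uint32_t kGroupSizeX = 16;
constexpr std::uint32_t kGroupSizeY = 16;

// Hypothetical: group counts for a width x height image, one thread per pixel.
constexpr DispatchArgs ArgsForImage(std::uint32_t width, std::uint32_t height) {
    return DispatchArgs{GroupCount(width, kGroupSizeX), GroupCount(height, kGroupSizeY), 1u};
}
```

For a 1920 × 1080 image this gives 120 × 68 × 1 groups. Because 1080 is not a multiple of 16, the last row of groups covers rows past the image, so the shader would typically skip threads whose SV_DispatchThreadID lies outside the image.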

Implementation of Compute Shader on Direct3D 10.x Hardware

A compute shader on Direct3D 10.x hardware is also known as DirectCompute 4.x.

When a user works with the Direct3D 11 API and updated drivers, Direct3D hardware at feature levels 10 and 10.1 can optionally support a limited form of DirectCompute that uses the cs_4_0 and cs_4_1 profiles.

When a user runs DirectCompute on this hardware, keep the following points in mind −

The maximum number of threads is limited to 768 per group.
The X and Y dimensions of numthreads are each limited to 768.
The Z dimension of numthreads is limited to 1.
The Z dimension of Dispatch is limited to D3D11_CS_4_X_DISPATCH_MAX_THREAD_GROUPS_IN_Z_DIMENSION (1).
Only one unordered-access view can be bound to the shader (D3D11_CS_4_X_UAV_REGISTER_COUNT is 1).
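The limits above can be checked before a shader is compiled for cs_4_0 or cs_4_1. The sketch below is a hypothetical CPU-side helper, FitsCs4x, that takes the three [numthreads(x, y, z)] values plus the Z group count that would be given to Dispatch, and reports whether they stay within the cs_4_x rules listed above. It only encodes the numbers from this list; it is not part of the Direct3D API.

```cpp
#include <cstdint>

// Limits for cs_4_0 / cs_4_1 on Direct3D 10.x hardware, as listed above.
constexpr std::uint32_t kCs4xMaxThreadsPerGroup = 768;
constexpr std::uint32_t kCs4xMaxX = 768;
constexpr std::uint32_t kCs4xMaxY = 768;
constexpr std::uint32_t kCs4xMaxZ = 1;          // numthreads Z
constexpr std::uint32_t kCs4xMaxDispatchZ = 1;  // Dispatch ThreadGroupCountZ

// Hypothetical check: true if [numthreads(x, y, z)] together with a
// Dispatch(..., ..., groupsZ) call stays within the cs_4_x limits.
bool FitsCs4x(std::uint32_t x, std::uint32_t y, std::uint32_t z, std::uint32_t groupsZ) {
    if (x == 0 || y == 0 || z == 0) return false;  // each dimension needs at least one thread
    if (x > kCs4xMaxX || y > kCs4xMaxY || z > kCs4xMaxZ) return false;
    // Multiply in 64 bits so the thread count cannot overflow.
    const std::uint64_t total = std::uint64_t(x) * y * z;
    if (total > kCs4xMaxThreadsPerGroup) return false;
    return groupsZ <= kCs4xMaxDispatchZ;
}
```

A 16 × 16 × 1 group (256 threads) passes, while 32 × 32 × 1 (1024 threads) fails the 768-thread limit even though each dimension is individually allowed.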

Only RWStructuredBuffers and RWByteAddressBuffers are available as unordered-access views. A thread can only write to its own region of groupshared memory, although it can read from any location.

SV_GroupIndex or SV_DispatchThreadID must be used when accessing groupshared memory for writing. Groupshared memory is limited to 16 KB per group.
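The system values named above are derived from a thread's position. Following the HLSL semantics documentation, SV_DispatchThreadID equals SV_GroupID × numthreads + SV_GroupThreadID (per component), and SV_GroupIndex flattens SV_GroupThreadID as z·X·Y + y·X + x, where X and Y are the numthreads dimensions. The CPU-side sketch below only evaluates those formulas; UInt3, DispatchThreadID and GroupIndex are hypothetical names, and on the GPU these values are supplied to the shader rather than computed by it.

```cpp
#include <cstdint>

// Hypothetical stand-in for HLSL's uint3.
struct UInt3 {
    std::uint32_t x, y, z;
};

// SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID, per component.
constexpr UInt3 DispatchThreadID(UInt3 groupID, UInt3 numThreads, UInt3 groupThreadID) {
    return UInt3{groupID.x * numThreads.x + groupThreadID.x,
                 groupID.y * numThreads.y + groupThreadID.y,
                 groupID.z * numThreads.z + groupThreadID.z};
}

// SV_GroupIndex: SV_GroupThreadID flattened into one index within the group.
constexpr std::uint32_t GroupIndex(UInt3 numThreads, UInt3 groupThreadID) {
    return groupThreadID.z * numThreads.x * numThreads.y +
           groupThreadID.y * numThreads.x +
           groupThreadID.x;
}
```

With [numthreads(10, 8, 3)], the thread at SV_GroupThreadID (7, 5, 0) in group (2, 1, 0) has SV_DispatchThreadID (27, 13, 0) and SV_GroupIndex 57. Because SV_GroupIndex is unique within a group, it is a natural index for the thread's own slot in groupshared memory.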

A single thread is limited to a 256-byte region of groupshared memory for writing.
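Both groupshared-memory limits for cs_4_x apply at the same time: 16 KB in total per group and a 256-byte writable region per thread. The sketch below assumes a simple layout in which each thread writes a fixed number of bytes into its own slot, starting at SV_GroupIndex × bytesPerThread. The layout and the function names are hypothetical; the helper only checks a proposed layout against the two limits stated above.

```cpp
#include <cstdint>

constexpr std::uint32_t kCs4xGroupsharedBytes = 16 * 1024;  // total per thread group
constexpr std::uint32_t kCs4xWritableBytesPerThread = 256;  // writable region per thread

// Hypothetical layout: thread i (its SV_GroupIndex) writes bytesPerThread
// bytes starting at offset i * bytesPerThread.
bool GroupsharedLayoutFitsCs4x(std::uint32_t threadsPerGroup, std::uint32_t bytesPerThread) {
    if (bytesPerThread > kCs4xWritableBytesPerThread) return false;
    // Multiply in 64 bits so the total cannot overflow.
    const std::uint64_t total = std::uint64_t(threadsPerGroup) * bytesPerThread;
    return total <= kCs4xGroupsharedBytes;
}

// Start of a thread's writable region under the same hypothetical layout.
constexpr std::uint32_t RegionOffset(std::uint32_t groupIndex, std::uint32_t bytesPerThread) {
    return groupIndex * bytesPerThread;
}
```

For example, 256 threads writing 64 bytes each fill exactly 16 KB, and 768 threads writing one 16-byte float4 each use 12 KB. A thread that writes 512 bytes is rejected regardless of the group size, because it exceeds the per-thread region.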

Compute Shader on Direct3D 11.x Hardware

A compute shader on Direct3D 11 hardware is termed DirectCompute 5.0.

When a user works with DirectCompute through the cs_5_0 profile, keep the following points in mind −

The maximum number of threads is limited to 1024 per group, up from 768 on Direct3D 10.x hardware.
The X and Y dimensions of numthreads are each limited to 1024.
The Z dimension of numthreads is limited to 64.
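One way to keep a thread-group size within the cs_5_0 limits is to define it once on the C++ side and check it at compile time. The sketch below assumes the same numbers are also used in the shader's [numthreads(...)] attribute (for example through a shared header or a shader macro); kThreadsX, kThreadsY, kThreadsZ and kThreadsPerGroup are hypothetical names, and the bounds are the ones listed above.

```cpp
#include <cstdint>

// Hypothetical thread-group size, intended to match [numthreads(32, 32, 1)]
// in the HLSL source and to be reused when computing Dispatch arguments.
constexpr std::uint32_t kThreadsX = 32;
constexpr std::uint32_t kThreadsY = 32;
constexpr std::uint32_t kThreadsZ = 1;
constexpr std::uint32_t kThreadsPerGroup = kThreadsX * kThreadsY * kThreadsZ;

// cs_5_0 limits: compilation fails if the chosen size breaks any of them.
static_assert(kThreadsX >= 1 && kThreadsX <= 1024, "numthreads X must be 1..1024");
static_assert(kThreadsY >= 1 && kThreadsY <= 1024, "numthreads Y must be 1..1024");
static_assert(kThreadsZ >= 1 && kThreadsZ <= 64, "numthreads Z must be 1..64");
static_assert(kThreadsPerGroup <= 1024, "cs_5_0 allows at most 1024 threads per group");
```

A 32 × 32 × 1 group uses exactly 1024 threads, so it compiles for cs_5_0, but it would break the 768-thread limit of cs_4_x. A shader meant to run on both kinds of hardware would need a smaller group, such as 16 × 16 × 1.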

The basic code snippet for creating a compute shader is given below −

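ID3D11ComputeShader* g_pFinalPassCS = nullptr;
// pBlobFinalPassCS holds the compiled compute shader bytecode (for example, an ID3DBlob)
HRESULT hr = pd3dDevice->CreateComputeShader( pBlobFinalPassCS->GetBufferPointer(),
   pBlobFinalPassCS->GetBufferSize(), nullptr, &g_pFinalPassCS );   // nullptr: no class linkage
if ( FAILED( hr ) ) { /* handle the error, e.g. bytecode for an unsupported profile */ }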