Teradata - Hashing Algorithm
A row is assigned to a particular AMP based on the primary index value. Teradata uses hashing algorithm to determine which AMP gets the row.
Following is a high level diagram on hashing algorithm.
Following are the steps to insert the data.
The client submits a query.
The parser receives the query and passes the PI value of the record to the hashing algorithm.
The hashing algorithm hashes the primary index value and returns a 32 bit number, called Row Hash.
The higher order bits of the row hash (first 16 bits) is used to identify the hash map entry. The hash map contains one AMP #. Hash map is an array of buckets which contains specific AMP #.
BYNET sends the data to the identified AMP.
AMP uses the 32 bit Row hash to locate the row within its disk.
If there is any record with same row hash, then it increments the uniqueness ID which is a 32 bit number. For new row hash, uniqueness ID is assigned as 1 and incremented whenever a record with same row hash is inserted.
The combination of Row hash and Uniqueness ID is called as Row ID.
Row ID prefixes each record in the disk.
Each table row in the AMP is logically sorted by their Row IDs.
How Tables are Stored
Tables are sorted by their Row ID (Row hash + uniqueness id) and then stored within the AMPs. Row ID is stored with each data row.
|Row Hash||Uniqueness ID||EmployeeNo||FirstName||LastName|
|2A01 2611||0000 0001||101||Mike||James|
|2A01 2612||0000 0001||104||Alex||Stuart|
|2A01 2613||0000 0001||102||Robert||Williams|
|2A01 2614||0000 0001||105||Robert||James|
|2A01 2615||0000 0001||103||Peter||Paul|