What are collision avoidance techniques (DBMS)?


A collision occurs when the hash function maps two different keys to the same location in the hash table.

There are two techniques that are used to resolve collisions. They are −
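As a small illustration of a collision, the sketch below assumes a table with 5 locations and a simple remainder-based hash function. The name `h` and the table size are chosen here for illustration only.

```python
def h(key: int, table_size: int = 5) -> int:
    """Hypothetical hash function: remainder of the key divided by the table size."""
    return key % table_size


# Two different keys that land on the same location collide.
first, second = h(26), h(16)
print(first, second, first == second)
```

Any hash function that maps a large set of keys onto a small table will produce such collisions sooner or later, which is why a resolution technique is needed.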

• Linear probing.
• Chaining.

Let us discuss each technique in detail.

Linear probing

Linear probing is a strategy for resolving collisions in which the new key is placed in the closest following empty cell.

Each element is stored at the location that the hash function maps it to. If that cell is already filled, the following consecutive locations are searched until an empty one is found, and the value is stored there. The table is generally implemented as an array.

Step 1 − Let us take a table T that stores all the records in memory.

Step 2 − If the memory location T(h) is already filled, then we store the record in the next empty location.

Step 3 − To find that empty location, we apply a linear search in table T over the locations T(h), T(h+1), T(h+2), …, wrapping around to the start of the table when the end is reached.

Record: A, B, C, D, E, X, Y, Z

H(k) : 4, 8, 2, 11, 4, 11, 5, 1

The table for linear probing is given below −

Location   Record
1          X
2          C
3          Z
4          A
5          E
6          Y
7          (empty)
8          B
9          (empty)
10         (empty)
11         D
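The steps and the example above can be sketched in code as follows. This is a minimal sketch, not a definitive implementation: it assumes a table with 11 locations numbered 1 to 11 and a search that wraps from location 11 back to location 1, which is what the position of X in the example suggests. The function names are hypothetical.

```python
def linear_probe_insert(table, home, record):
    """Store record at its home location, or at the next empty location after it.

    table is a list whose index 0 is unused, so locations run from 1 to n
    as in the example. Returns the location that was actually used.
    """
    n = len(table) - 1
    for step in range(n):
        loc = (home - 1 + step) % n + 1  # wrap from location n back to 1
        if table[loc] is None:
            table[loc] = record
            return loc
    raise RuntimeError("hash table is full")


def build_example_table():
    """Insert the records of the example in order and return the table."""
    records = ["A", "B", "C", "D", "E", "X", "Y", "Z"]
    homes = [4, 8, 2, 11, 4, 11, 5, 1]  # H(k) for each record
    table = [None] * 12  # locations 1..11; index 0 is unused (assumption)
    for record, home in zip(records, homes):
        linear_probe_insert(table, home, record)
    return table


if __name__ == "__main__":
    for loc, record in enumerate(build_example_table()[1:], start=1):
        print(loc, record if record is not None else "(empty)")
```

In this run E collides with A at location 4 and moves to 5, X collides with D at location 11 and wraps to 1, Y moves from 5 to 6, and Z moves past 1 and 2 to location 3.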

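The advantage of linear probing is that it is very fast, because consecutive probes touch adjacent memory locations and so benefit from locality of reference.

The disadvantage of linear probing is that it is prone to clustering: filled cells build up into long runs, which lengthens later searches. Guaranteeing constant expected time per operation needs five-way independence in the hash function.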

Methods to minimize clustering

There are two methods which are used to minimize clustering. These methods are as follows −

• Quadratic probing

Suppose the hash address h of a record is already filled. We then search the memory locations with addresses h, h+1, h+4, h+9, h+16, …, h+i², … so that colliding records are spread out instead of forming one long run.
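The probe sequence h, h+1, h+4, h+9, … can be sketched as below. The sketch assumes a 0-indexed table and takes each address modulo the table size; the function name is hypothetical.

```python
def quadratic_probe_insert(table, home, record):
    """Try locations home, home+1, home+4, home+9, ... (modulo the table size).

    Returns the location used. The probe sequence does not always visit
    every location, so the search can fail even if the table is not full.
    """
    n = len(table)
    for i in range(n):
        loc = (home + i * i) % n
        if table[loc] is None:
            table[loc] = record
            return loc
    raise RuntimeError("no empty location found on the probe sequence")


if __name__ == "__main__":
    table = [None] * 11  # table size chosen for illustration
    # Three records with the same hash address 4 end up at 4, 5 and 8.
    print([quadratic_probe_insert(table, 4, r) for r in ("A", "B", "C")])
```

A prime table size that is kept at most half full is a common way to make sure an empty location is always reached.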

• Double hashing

The collision is resolved by hashing a second time: a second hash function is applied to the key to obtain a step size h'. We then search the memory locations with addresses h, h+h', h+2h', h+3h', …, wrapping around the end of the table.

The advantages of double hashing are as follows −

• Double hashing drastically reduces clustering.

• Double hashing requires fewer comparisons.

• Smaller hash tables can be used.

• Double hashing minimizes repeated collisions, because records that collide at one location usually continue along different probe sequences.

The disadvantage of double hashing is as follows −

• As the hash table fills up, probe sequences get longer, which slows down processing and degrades performance.
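A minimal sketch of double hashing follows. It assumes that the step h' comes from a second hash function of the key, which is a common formulation, and that the table size is prime so that every step size visits all locations. The functions `h1` and `h2` are hypothetical choices made for illustration.

```python
def h1(key, n):
    """First hash function: gives the home address h."""
    return key % n


def h2(key, n):
    """Second hash function: gives the step h' (never 0, so the probe always moves)."""
    return 1 + key % (n - 1)


def double_hash_insert(table, key):
    """Try locations h, h+h', h+2h', ... (modulo the table size)."""
    n = len(table)
    home, step = h1(key, n), h2(key, n)
    for i in range(n):
        loc = (home + i * step) % n
        if table[loc] is None:
            table[loc] = key
            return loc
    raise RuntimeError("hash table is full")


if __name__ == "__main__":
    table = [None] * 11  # prime table size (assumption)
    # 15, 26 and 37 all have home address 4 but use different steps.
    print([double_hash_insert(table, k) for k in (15, 26, 37)])
```

Because the three keys get the steps 6, 7 and 8, they leave the shared home address in different directions instead of queuing up behind each other.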

Chaining

Chaining is also known as the chained hash table mechanism. As the name suggests, each index of the hash table holds a pointer to the head of a linked list.

Here a linked list is used. Each record has two parts, which are as follows −

• Data part, which stores the data.

• Next part, which links the records having the same hash address.

Example

The keys 26, 44, 38, 29, 16 are stored in the hash table using the chaining method.

Here,

H(k) = k % 5

H(26) = 26 % 5 = 1

H(44) = 44 % 5 = 4

H(38) = 38 % 5 = 3

H(29) = 29 % 5 = 4

H(16) = 16 % 5 = 1

The table for chaining will be as shown below −

Index   Chain
0       (empty)
1       26 → 16 → NULL
2       (empty)
3       38 → NULL
4       44 → 29 → NULL
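The chaining example above can be sketched as follows. This is an illustrative sketch with hypothetical class and method names: each node has a data part and a next part, and a new key is appended at the end of its chain so that the order matches the table.

```python
class Node:
    """One record in a chain: a data part and a next part."""

    def __init__(self, data):
        self.data = data
        self.next = None  # link to the next record with the same hash address


class ChainedHashTable:
    def __init__(self, size=5):
        self.size = size
        self.heads = [None] * size  # each index points to the head of a chain

    def insert(self, key):
        index = key % self.size  # H(k) = k % 5 in the example
        node = Node(key)
        if self.heads[index] is None:
            self.heads[index] = node
            return
        current = self.heads[index]
        while current.next is not None:  # walk to the end of the chain
            current = current.next
        current.next = node

    def chain(self, index):
        """Return the keys stored at index, in chain order."""
        keys, current = [], self.heads[index]
        while current is not None:
            keys.append(current.data)
            current = current.next
        return keys


def build_chaining_example():
    table = ChainedHashTable(5)
    for key in (26, 44, 38, 29, 16):
        table.insert(key)
    return table


if __name__ == "__main__":
    example = build_chaining_example()
    for index in range(example.size):
        print(index, example.chain(index))
```

Appending at the tail keeps the insertion order visible. Inserting at the head of the chain is a common alternative that avoids walking the list.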

The advantages of chaining are as follows −

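• Chained hash tables remain effective even when many keys share the same location, because each location can hold any number of records.

• Collisions are resolved inside the chain of the affected location, so they do not cause clustering in neighboring locations.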