Lisp - Set using List vs Set using Hashtable



In this chapter, we compare two approaches to implementing a Set in Lisp, one using a List and one using a Hashtable, along with their pros and cons.

Set using List

We can implement a Set using a List and enforce uniqueness by checking whether an element is already present in the list before adding it.

Set Implementation using List

; function to check element in a Set
(defun contains (element set)
   (member element set))

; function to add element to set while checking for duplicate
(defun add (element set)
   (if (contains element set)
      set
      (cons element set)))

; create a Set as list
(defvar my-set '(1 2 3 4 5))

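; check if 3 is present; member returns the tail of the list starting at 3,
; so this prints (3 4 5), which counts as true in Lisp
(print (contains 3 my-set))
; prints NIL, which is false
(print (contains 6 my-set))
; add returns a new list and leaves my-set itself unchanged
; prints (6 1 2 3 4 5)
(print (add 6 my-set))
; 3 is already present, so the set is returned as is; prints (1 2 3 4 5)
(print (add 3 my-set))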

Output

When you execute the code, it returns the following result −

(3 4 5)
NIL
(6 1 2 3 4 5)
(1 2 3 4 5)

Characteristics of Set as List

  • Simple to Implement and Easy to Learn − We can use the List manipulation functions directly for most Set operations without much added complexity.

  • Preserves Insertion Order − As we're inserting into a list, the order of insertion is maintained, although a Set does not require it.

  • Inefficient Membership Check − To check whether an element is in the Set, we may need to scan the complete list. On average, this leads to a time complexity of O(n).

  • Inefficient Addition/Deletion Operations − While adding an element to the set, we must check for a duplicate entry, which means we may need to scan the entire list. Similarly, a delete operation has to scan the list to find the element, so both operations have an average time complexity of O(n).

  • Space Overhead for Large Sets − Every element of the set occupies one cons cell. This is compact for a small set, but the overhead adds up when the Set holds a large number of elements.
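
The characteristics above mention deletion, which the example does not show. The following is a minimal sketch of it. The helper name remove-element and the variable *numbers* are assumptions made for illustration, while remove and adjoin are standard Common Lisp functions. It also shows that the built-in adjoin behaves like the add function defined earlier.

```lisp
; hypothetical helper: returns a new list without the element,
; the original list is left unchanged
(defun remove-element (element set)
   (remove element set))

; illustrative set, not part of the original example
(defparameter *numbers* '(1 2 3 4 5))

; prints (1 2 4 5)
(print (remove-element 3 *numbers*))
; adjoin adds the element only when it is absent
; prints (6 1 2 3 4 5)
(print (adjoin 6 *numbers*))
; 3 is already present, prints (1 2 3 4 5)
(print (adjoin 3 *numbers*))
```

Like member, both remove and adjoin walk the list, so they share the O(n) cost described above.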

Set using HashTable

We can also implement a Set using a hashtable, which is far more efficient for lookups. The elements of the Set are stored as the keys of the hashtable, and the values stored against them are irrelevant. A membership test then becomes a key lookup, and most operations run in constant time, O(1), on average.

Set Implementation using HashTable

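; Creating an empty hash set
; defparameter is used so that my-set is rebound even if it was
; already defined as a list in the same session (defvar would keep the old value)
(defparameter my-set (make-hash-table))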

; Adding elements to the hash set
(setf (gethash 1 my-set) t)
(setf (gethash 2 my-set) t)
(setf (gethash 3 my-set) t)
(setf (gethash 4 my-set) t)
(setf (gethash 5 my-set) t)

; Checking if an element is present
; prints T
(print (gethash 3 my-set))
; prints NIL
(print (gethash 6 my-set))

; Adding an element that already exists just overwrites the key, so no duplicate is created
(setf (gethash 3 my-set) t)

; prints 5, the size of the set
(print (hash-table-count my-set))

; remove an element from the set
(remhash 1 my-set)
; prints 4, the size of the set after removal
(print (hash-table-count my-set))
; check the removed entry
; prints NIL
(print (gethash 1 my-set))

Output

When you execute the code, it returns the following result −

T
NIL
5
4
NIL
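
The raw hashtable calls above can be wrapped in small functions so that the hash set is used the same way as the list version. This is only a sketch: the names hash-set-add, hash-set-contains-p, hash-set-remove, hash-set-elements and *colors* are hypothetical, while gethash, remhash, hash-table-count and the loop hash-keys clause are standard Common Lisp.

```lisp
; hypothetical wrappers around the hashtable operations
(defun hash-set-add (element set)
   (setf (gethash element set) t)
   set)

(defun hash-set-contains-p (element set)
   ; the second value returned by gethash tells whether the key exists;
   ; (and ... t) normalizes it to exactly T or NIL
   (and (nth-value 1 (gethash element set)) t))

(defun hash-set-remove (element set)
   (remhash element set)
   set)

(defun hash-set-elements (set)
   ; collects the keys into a list, the order is unspecified
   (loop for key being the hash-keys of set collect key))

; illustrative set, not part of the original example
(defparameter *colors* (make-hash-table))
(hash-set-add 'red *colors*)
(hash-set-add 'green *colors*)
; duplicate, the count stays at 2
(hash-set-add 'red *colors*)

; prints T
(print (hash-set-contains-p 'red *colors*))
; prints NIL
(print (hash-set-contains-p 'blue *colors*))
; prints 2
(print (hash-table-count *colors*))
(hash-set-remove 'red *colors*)
; prints (GREEN)
(print (hash-set-elements *colors*))
```

Using the second return value of gethash keeps the membership test correct even if some other code stores NIL as the value for a key.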

Characteristics of Set as HashTable

  • Efficient Membership Check − To check whether an element is in the Set, we only need to check whether the key is present in the hashtable, which has an average time complexity of O(1). In the worst case, hash collisions can degrade this to O(n).

  • Efficient Addition/Deletion − Adding and deleting elements in a Set backed by a hashtable are highly efficient, with an average time complexity of O(1).

  • No Order Preservation − Hashtables do not preserve insertion order. As a Set is not required to maintain order either, no extra logic is needed for it.

  • Space Overhead − The hashtable allocates its own internal storage in addition to the elements, so it carries some space overhead and may not be space efficient for sets with a small number of elements.

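  • Hashing and Equality Test − Hashtable operations depend heavily on how keys are hashed and compared. A standard Common Lisp hashtable accepts only the eq, eql, equal and equalp tests, with eql as the default, so we should choose the test that matches the element type, for example equal for strings. Custom hash functions for user-defined objects are available only through implementation-specific extensions.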
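
In standard Common Lisp, how keys are hashed and compared is selected with the :test argument of make-hash-table. The sketch below shows why this matters for a set of strings. The variable names *eql-set* and *equal-set* are assumptions made for illustration, and copy-seq is used so that every string is a distinct object.

```lisp
; default test is eql: two separately created strings with the
; same characters are treated as different keys
(defparameter *eql-set* (make-hash-table))
(setf (gethash (copy-seq "apple") *eql-set*) t)

; with :test 'equal, strings are compared by their content
(defparameter *equal-set* (make-hash-table :test 'equal))
(setf (gethash (copy-seq "apple") *equal-set*) t)

; prints NIL, the lookup string is a different object
(print (gethash (copy-seq "apple") *eql-set*))
; prints T
(print (gethash (copy-seq "apple") *equal-set*))
```

For numbers and symbols, as in the earlier example, the default eql test is sufficient.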

Set as List vs Set as Hashtable

The following table summarizes the key differences.

Feature                                     | Set as List           | Set as Hashtable
Membership Check (Average Time Complexity)  | O(n)                  | O(1)
Addition (Average Time Complexity)          | O(n)                  | O(1)
Deletion (Average Time Complexity)          | O(n)                  | O(1)
Insertion Order Preserving                  | Yes                   | No
Space Efficiency                            | Good for smaller sets | Some overhead
Complexity                                  | Easy and Simple       | Complex