- DBMS - Home
- DBMS - Overview
- DBMS - Architecture
- DBMS - Data Models
- DBMS - Data Schemas
- DBMS - Data Independence
- DBMS - System Environment
- Centralized and Client/Server Architecture
- DBMS - Classification
- Relational Model
- DBMS - Codd's Rules
- DBMS - Relational Data Model
- DBMS - Relational Model Constraints
- DBMS - Relational Database Schemas
- DBMS - Handling Constraint Violations
- Entity Relationship Model
- DBMS - ER Model Basic Concepts
- DBMS - ER Diagram Representation
- Relationship Types and Relationship Sets
- DBMS - Weak Entity Types
- DBMS - Generalization, Aggregation
- DBMS - Drawing an ER Diagram
- DBMS - Enhanced ER Model
- Subclass, Superclass and Inheritance in EER
- Specialization and Generalization in Extended ER Model
- Data Abstraction and Knowledge Representation
- Relational Algebra
- DBMS - Relational Algebra
- Unary Relational Operation
- Set Theory Operations
- DBMS - Database Joins
- DBMS - Division Operation
- DBMS - ER to Relational Model
- Examples of Query in Relational Algebra
- Relational Calculus
- Tuple Relational Calculus
- Domain Relational Calculus
- Relational Database Design
- DBMS - Functional Dependency
- DBMS - Inference Rules
- DBMS - Minimal Cover
- Equivalence of Functional Dependency
- Finding Attribute Closure and Candidate Keys
- Relational Database Design
- DBMS - Keys
- Super keys and candidate keys
- DBMS - Foreign Key
- Finding Candidate Keys
- Normalization in Database Designing
- Database Normalization
- First Normal Form
- Second Normal Form
- Third Normal Form
- Boyce Codd Normal Form
- Difference Between 4NF and 5NF
- Structured Query Language
- Types of Languages in SQL
- Querying in SQL
- CRUD Operations in SQL
- Aggregation Function in SQL
- Join and Subquery in SQL
- Views in SQL
- Trigger and Schema Modification
- Storage and File Structure
- DBMS - Storage System
- DBMS - File Structure
- DBMS - Secondary Storage Devices
- DBMS - Buffer and Disk Blocks
- DBMS - Placing File Records on Disk
- DBMS - Ordered and Unordered Records
- Indexing and Hashing
- DBMS - Indexing
- DBMS - Single-Level Ordered Indexing
- DBMS - Multi-level Indexing
- Dynamic B- Tree and B+ Tree
- DBMS - Hashing
- Query Processing and Optimization
- Heuristics in Query Processing
- Transaction and Concurrency
- DBMS - Transaction
- DBMS - Scheduling Transactions
- DBMS - Testing Serializability
- DBMS - Conflict Serializability
- DBMS - View Serializability
- DBMS - Concurrency Control
- DBMS - Lock Based Protocol
- DBMS - Timestamping based Protocol
- DBMS - Phantom Read Problem
- DBMS - Dirty Read Problem
- DBMS - Thomas Write Rule
- DBMS - Deadlock
- Backup and Recovery
- DBMS - Data Backup
- DBMS - Data Recovery
- DBMS Useful Resources
- DBMS - Quick Guide
- DBMS - Useful Resources
- DBMS - Discussion
DBMS - Algorithms for Selection and Join Operations
When there are multiple tables in a database, we might need to join them sometimes for some operations. The selection operation extracts specific records based on given conditions. The join operation combines the records from different tables using matching attributes.
In this chapter, we will explain in detail the fundamental algorithms for selection and join operations with examples. In addition, we will also cover the real-world applications and performance considerations for these operations.
Selection and Join Operations in DBMS
Selection operations filter the records based on a condition. For example, retrieving employees records with salaries over Rs. 50,000.
Join operations, on the other hand, combine the records from two or more tables. Join operations link related data through common attributes. These operations form the backbone of SQL queries like WHERE, JOIN, and SELECT.
Selection and join are critical operations and they require efficient algorithms. Else, searching through millions of records sequentially would be too slow. Therefore, database systems implement optimized algorithms to reduce disk I/O and computation time.
Algorithms for Selection Operations
There are various algorithms that work behind the selection operations that fetch data from a table based on a specific set of criteria. Depending on the data structure, indexing, and file organization, databases choose appropriate selection methods.
Simple Selection Methods
Let us see the very basic common selection methods using real-world examples.
Linear Search (Brute Force)
Here the database scans every record in the file to see if it meets the condition. This is the simplest method, but the least efficient too.
- Example − Searching for an employee with SSN '123456789' requires checking every record.
- Use Case − This method is useful when no index exists or when working with small datasets.
Binary Search
If the file is sorted, we can apply binary searches. It divides the dataset into halves until the record is found.
- Example − Searching for an employee by SSN in a sorted list takes fewer steps because the search narrows down quickly.
- Best For − It works well for sorted files with unique keys like SSNs or employee IDs.
Primary Index Search
In this method, the system uses a primary index to jump directly to the relevant record.
- Example − Finding an employee with SSN '123456789' using a primary index is instant.
- Use Case − Works best when the search attribute is a unique key.
Hash Key Search
This hashing method uses a unique hash key to calculate the record's location.
- Example − An employee's SSN can be hashed to a specific memory location. It allows bypassing the need for a full search.
- Best For − When exact key-based retrieval is needed.
Advanced Selection Methods
Let us now take a look at some advanced selection methods and when they are used.
Using a Primary Index for Ranges
This is used for range queries. Range queries use a primary index to retrieve records with values greater than or less than a given number.
Example − Fetching departments with Dnumber > 5 using a primary index.
Clustering Index Search
If a clustering index exists for the tables on a non-key attribute, the database retrieves all the matching records.
Example − Finding all employees in department 5 using Dno = 5.
Secondary Index Search
This advanced searching method uses a secondary index. It allows searching based on non-unique fields. It works for both unique and non-unique attributes.
Example − Searching for employees by job title or last name, even if multiple employees share the same values.
Algorithms for Join Operations
The join operation combines the records from two or more tables using matching attributes. These algorithms are a little complex to represent in query execution.
Common Join Algorithms
Let us see some of join operations. We will explain different join algorithms using the EMPLOYEE and DEPARTMENT tables.
Nested-Loop Join
The simplest version of join algorithm is the nested loop join. Here the system reads each record from one table and compares it with every record in the second table. It is done by checking the matching conditions.
Example − Joining EMPLOYEE with DEPARTMENT by matching Dno with Dnumber.
Performance Insight − If the EMPLOYEE table has 2000 blocks and the DEPARTMENT table has 10 blocks, then choosing the smaller table as the outer loop reduces the number of read operations.
Access Cost Example
- If DEPARTMENT is the outer file, total reads = 4010
- If EMPLOYEE is the outer file, total reads = 6000
So, selecting DEPARTMENT as the outer file saves 1990 disk reads!
Single-Loop Join with Index
If an index exists, the system only searches only those records that match the join condition. It skips the unrelated data very efficiently.
Example − Joining DEPARTMENT and EMPLOYEE by matching Mgr_ssn with Ssn using an index on Ssn.
Calculation Example − Consider two tables with 6000 employees and 50 departments. Using the index only requires 260 block accesses compared to 20,000 without the index.
Sort-Merge Join
In this method, both the tables are sorted by the join attribute, and the records are merged efficiently in one pass.
Example − Joining sorted EMPLOYEE and DEPARTMENT tables by Dno and Dnumber.
This method is best for large datasets already sorted or when sorting can be done quickly.
Partition-Hash Join
In the partition-hash method, the records from both the tables are partitioned using a hash function. Matching partitions are compared to find the matching records.
- Example − Partitioning the EMPLOYEE table and the DEPARTMENT table by Dno and Dnumber and then merging the matching partitions.
- Best For − When the datasets are too large to fit into memory, and hashing speeds up comparisons.
The Join Selection Factor
The Join Selection Factor is a metric that indicates how many records in one table will match those in another. A higher selection factor means fewer records are checked, which makes the join operation more efficient.
Example − Consider the operation for joining the DEPARTMENT (50 records) table with the EMPLOYEE (6000 records) table. It results in a selection factor of 1, which essentially means every department has one matching employee.
Real-World Applications of Selection and Join Algorithms
Selection and join algorithms are used in different real-world applications as listed below −
- DBMS − SQL engines rely on selection and join algorithms for query optimization.
- Data Warehousing − Processing large-scale business data requires efficient joins and selections.
- Big Data Platforms − Technologies like Hadoop and Spark use similar concepts for processing massive datasets.
Conclusion
In this chapter, we explored various algorithms for selection and join operations. We covered simple selection methods like linear search, binary search, and index-based searches. We also covered join algorithms including nested-loop, single-loop with an index, sort-merge, and partition-hash joins. We presented a set of real-world examples to highlight performance calculations and how the databases combine and retrieve the records efficiently.