SQL - Self Join



The SQL Self Join

A SELF JOIN is a type of join in SQL where a table is joined with itself. This is used when you need to compare rows within the same table or find relationships among data stored in a single table. You can use aliases to distinguish between the two instances of the same table.

Self Join Explained with Diagram

Suppose an organization, while organizing a Christmas party, is choosing a Secret Santa among its employees based on some colors. It is designed to be done by assigning one color to each of its employees and having them pick a color from the pool of various colors. In the end, they will become the Secret Santa of an employee this color is assigned to.

As we can see in the figure below, the information regarding the colors assigned and a color each employee picked is entered into a table. The table is joined to itself using self join over the color columns to match employees with their Secret Santa.

Self Join

Syntax

Following is the basic syntax of SQL Self Join:

SELECT A.column1,B.column2
FROM table_name A
JOIN table_name B
ON A.common_column = B.common_column;

Example

Self Join only requires one table, so, let us create a CUSTOMERS table containing the customer details like their names, age, address and the salary they earn:

CREATE TABLE CUSTOMERS (
   ID INT NOT NULL,
   NAME VARCHAR (20) NOT NULL,
   AGE INT NOT NULL,
   ADDRESS CHAR (25),
   SALARY DECIMAL (18, 2),       
   PRIMARY KEY (ID)
);

Now, insert values into this table using the INSERT statement as follows:

INSERT INTO CUSTOMERS VALUES
(1, 'Ramesh', 32, 'Ahmedabad', 2000.00 ),
(2, 'Khilan', 25, 'Delhi', 1500.00 ),
(3, 'Kaushik', 23, 'Kota', 2000.00 ),
(4, 'Chaitali', 25, 'Mumbai', 6500.00 ),
(5, 'Hardik', 27, 'Bhopal', 8500.00 ),
(6, 'Komal', 22, 'Hyderabad', 4500.00 ),
(7, 'Muffy', 24, 'Indore', 10000.00 );

The table will be created as:

ID NAME AGE ADDRESS SALARY
1 Ramesh 32 Ahmedabad 2000.00
2 Khilan 25 Delhi 1500.00
3 Kaushik 23 Kota 2000.00
4 Chaitali 25 Mumbai 6500.00
5 Hardik 27 Bhopal 8500.00
6 Komal 22 Hyderabad 4500.00
7 Muffy 24 Indore 10000.00

Now, let us join this table using the following Self Join query. Our aim is to establish a relationship among the said Customers on the basis of their earnings. We are doing this with the help of the WHERE clause:

SELECT a.ID, b.NAME as EARNS_HIGHER, a.NAME 
as EARNS_LESS, a.SALARY as LOWER_SALARY
FROM CUSTOMERS a, CUSTOMERS b
WHERE a.SALARY < b.SALARY;

The resultant table displayed will list out all the customers that earn lesser than other customers:

ID EARNS_HIGHER EARNS_LESS LOWER_SALARY
2 Ramesh Khilan 1500.00
2 Kaushik Khilan 1500.00
6 Chaitali Komal 4500.00
3 Chaitali Kaushik 2000.00
2 Chaitali Khilan 1500.00
1 Chaitali Ramesh 2000.00
6 Hardik Komal 4500.00
4 Hardik Chaitali 6500.00
3 Hardik Kaushik 2000.00
2 Hardik Khilan 1500.00
1 Hardik Ramesh 2000.00
3 Komal Kaushik 2000.00
2 Komal Khilan 1500.00
1 Komal Ramesh 2000.00
6 Muffy Komal 4500.00
5 Muffy Hardik 8500.00
4 Muffy Chaitali 6500.00
3 Muffy Kaushik 2000.00
2 Muffy Khilan 1500.00
1 Muffy Ramesh 2000.00

Self Join with ORDER BY Clause

A SELF JOIN can be combined with the ORDER BY clause to sort the results after joining a table with itself. This is used when you want to organize hierarchical or relational data, such as listing employees along with their managers in a sorted order.

Syntax

Following is the syntax to use the ORDER BY clause in self join:

SELECT column_name(s)
FROM table1 a, table1 b
WHERE a.common_field = b.common_field
ORDER BY column_name;

Example

Let us join the CUSTOMERS table with itself using self join on a WHERE clause; then, arrange the records in an ascending order using the ORDER BY clause with respect to a specified column, as shown in the following query:

SELECT  a.ID, b.NAME as EARNS_HIGHER, a.NAME 
as EARNS_LESS, a.SALARY as LOWER_SALARY
FROM CUSTOMERS a, CUSTOMERS b
WHERE a.SALARY < b.SALARY
ORDER BY a.SALARY;

The resultant table is displayed as follows:

ID EARNS_HIGHER EARNS_LESS LOWER_SALARY
2 Ramesh Khilan 1500.00
2 Kaushik Khilan 1500.00
2 Chaitali Khilan 1500.00
2 Hardik Khilan 1500.00
2 Komal Khilan 1500.00
2 Muffy Khilan 1500.00
3 Chaitali Kaushik 2000.00
1 Chaitali Ramesh 2000.00
3 Hardik Kaushik 2000.00
1 Hardik Ramesh 2000.00
3 Komal Kaushik 2000.00
1 Komal Ramesh 2000.00
3 Muffy Kaushik 2000.00
1 Muffy Ramesh 2000.00
6 Chaitali Komal 4500.00
6 Hardik Komal 4500.00
6 Muffy Komal 4500.00
4 Hardik Chaitali 6500.00
4 Muffy Chaitali 6500.00
5 Muffy Hardik 8500.00

Not just the salary column, the records can be sorted based on the alphabetical order of names, numerical order of Customer IDs etc.

SQL Self Join for Hierarchical Data

A SELF JOIN can also be used to represent hierarchical data such as a manager-employee relationship, even when the information is stored in the same table. By joining the table with itself, we can identify the relationship between two rows in the same dataset.

Example

Let us assume that the CUSTOMERS table stores information about employees, and we use the ID column as the employee ID and add a new column MANAGER_ID to specify who manages them. For demonstration, we update the table as follows:

ALTER TABLE CUSTOMERS ADD MANAGER_ID INT;

UPDATE CUSTOMERS SET MANAGER_ID = NULL WHERE ID = 1; -- Ramesh is top-level
UPDATE CUSTOMERS SET MANAGER_ID = 1 WHERE ID IN (2,3); -- Khilan, Kaushik report to Ramesh
UPDATE CUSTOMERS SET MANAGER_ID = 2 WHERE ID = 4; -- Chaitali reports to Khilan
UPDATE CUSTOMERS SET MANAGER_ID = 2 WHERE ID = 5; -- Hardik reports to Khilan
UPDATE CUSTOMERS SET MANAGER_ID = 3 WHERE ID = 6; -- Komal reports to Kaushik
UPDATE CUSTOMERS SET MANAGER_ID = 4 WHERE ID = 7; -- Muffy reports to Chaitali

Now, let us write a Self Join query to display employees along with their managers:

SELECT a.NAME AS EMPLOYEE, b.NAME AS MANAGER
FROM CUSTOMERS a
LEFT JOIN CUSTOMERS b
ON a.MANAGER_ID = b.ID;

The resultant table is displayed as follows:

EMPLOYEE MANAGER
Ramesh NULL
Khilan Ramesh
Kaushik Ramesh
Chaitali Khilan
Hardik Khilan
Komal Kaushik
Muffy Chaitali

SQL Self Join vs. Subquery

A SELF JOIN and a subquery can sometimes be used to achieve similar results, but they work differently and are used for different scenarios.

  • SELF JOIN: A table is joined with itself to compare rows within the same table. It is useful for hierarchical data, such as finding employees and their managers or comparing rows based on certain conditions.
  • Subquery: A query inside another query that returns a single value or a set of values. It is used for filtering, aggregation, or checking conditions without directly joining the same table.

In short, a SELF JOIN is more helpful for comparing rows directly, while a subquery is preferred for filtering or calculating values before applying conditions in the main query.

Example

In this example, we use Self Join to find customers earning less than others:

SELECT a.NAME AS EARNS_LESS, a.SALARY, b.NAME AS EARNS_MORE
FROM CUSTOMERS a
JOIN CUSTOMERS b
ON a.SALARY < b.SALARY;

It produces the following table:

EARNS_LESS SALARY EARNS_MORE
Khilan 1500.00 Ramesh
Khilan 1500.00 Kaushik
Komal 4500.00 Chaitali
Kaushik 2000.00 Chaitali
Khilan 1500.00 Chaitali
Ramesh 2000.00 Chaitali
Komal 4500.00 Hardik
Chaitali 6500.00 Hardik
Kaushik 2000.00 Hardik
Khilan 1500.00 Hardik
Ramesh 2000.00 Hardik
Kaushik 2000.00 Komal
Khilan 1500.00 Komal
Ramesh 2000.00 Komal
Komal 4500.00 Muffy
Hardik 8500.00 Muffy
Chaitali 6500.00 Muffy
Kaushik 2000.00 Muffy
Khilan 1500.00 Muffy
Ramesh 2000.00 Muffy

Here, we use subquery for the same logic:

SELECT NAME AS EARNS_LESS, SALARY
FROM CUSTOMERS
WHERE SALARY < (SELECT MAX(SALARY) FROM CUSTOMERS);

Following is the table produced:

EARNS_LESS SALARY
Ramesh 2000.00
Khilan 1500.00
Kaushik 2000.00
Chaitali 6500.00
Hardik 8500.00
Komal 4500.00

We can see that the SELF JOIN explicitly compares each row with others, producing detailed pairwise results. Whereas, the subquery gives a summarized or filtered view without duplicating rows. In general, Self Join is better for comparisons between multiple rows, while Subqueries are useful for aggregated checks.

Why Do We Use Self Join in SQL?

The SELF JOIN is useful in the following scenarios:

  • Finding relationships in the same table: For example, identifying which customers earn less compared to others using salary values.
  • Working with hierarchical data: Representing manager-employee relationships within the CUSTOMERS table using the MANAGER_ID column.
  • Alternative to subqueries: When we want to compare rows directly instead of relying on aggregate functions.
  • Data exploration: Helps in checking dependencies, overlaps, or comparisons within a single table without needing multiple tables.
Advertisements