MySQL - Find Duplicate Records



Duplicate records in a table decrease the efficiency of a MySQL database (by increasing the execution time, using unnecessary space, etc.). Thus, locating duplicates becomes necessary to efficiently use the database.

We can, however, also prevent users from entering duplicate values into a table, by adding constraints on the desired column(s), such as PRIMARY KEY and UNIQUE constraints.

But, due to various reasons like, human error, an application bug or data extracted from external resources, if duplicates are still entered into the database, there are various ways to find the records. Using SQL GROUP BY and HAVING clauses is one of the common ways to filter records containing duplicates.

Finding Duplicate Records

Before finding the duplicate records in a table we need to define the criteria for which we need the duplicate records for. You can do this in two steps −

  • First of all, we need to group all the rows by the columns on which you want to check the duplicity on, using the GROUPBY clause.

  • Then Using the Having clause and the count function then, we need to verify whether any of the above formed groups have more than 1 entity.

Example

First of all, let us create a table with the name CUSTOMERS using the following query −

CREATE TABLE CUSTOMERS (
   ID INT NOT NULL,
   NAME VARCHAR (20) NOT NULL,
   AGE INT NOT NULL,
   ADDRESS CHAR (25),
   SALARY DECIMAL (18, 2),
   PRIMARY KEY (ID)
);

Now, let us insert some duplicate records into the above-created table using the INSERT IGNORE INTO statement as shown below −

INSERT INTO CUSTOMERS VALUES
(1, 'Ramesh', 32, 'Ahmedabad', 2000.00),
(2, 'Khilan', 25, 'Delhi', 1500.00),
(3, 'Kaushik', 23, 'Kota', 2000.00),
(4, 'Chaitali', 25, 'Mumbai', 6500.00),
(5, 'Hardik', 27, 'Bhopal', 8500.00),
(6, 'Komal', 22, 'Hyderabad', 4500.00),
(7, 'Muffy', 24, 'Indore', 10000.00);

The table is created as −

ID NAME AGE ADDRESS SALARY
1 Ramesh 32 Ahmedabad 2000.00
2 Khilan 25 Delhi 1500.00
3 Kaushik 23 Kota 2000.00
4 Chaitali 25 Mumbai 6500.00
5 Hardik 27 Bhopal 8500.00
6 Komal 22 Hyderabad 4500.00
7 Muffy 24 Indore 10000.00

On the following query, we are trying to return the count of duplicate records using the MySQL COUNT() function −

SELECT SALARY, COUNT(SALARY) 
AS "COUNT" FROM CUSTOMERS
GROUP BY SALARY 
ORDER BY SALARY;

Output

The output for the query above is produced as given below −

SALARY COUNT
1500.00 1
2000.00 2
4500.00 1
6500.00 1
8500.00 1
10000.00 1

With Having Clause

The HAVING clause in MySQL can be used to filter conditions for a group of rows in a table. Here, we are going to use the HAVING clause with the COUNT() function to find the duplicate values in one or more columns of a table.

Duplicates values in single column

Following are the steps to find the duplicate values in a single column of a table:

Step-1: Firstly, we need to use the GROUP BY clause to group all rows in the column that we want to check the duplicates.

Step-2: Then , to find duplicate groups, use COUNT() function in the HAVING clause to check if any group has more than one element.

Example

Using the following query, we can find all rows that have duplicate DOG_NAMES in the PETS table −

SELECT SALARY, COUNT(SALARY) 
FROM CUSTOMERS
GROUP BY SALARY
HAVING COUNT(SALARY) > 1;

Output

The output is as follows −

SALARY COUNT
2000.00 2

Duplicate Values in Multiple Columns

We can use the AND operator in the HAVING clause to find the duplicate rows in multiple columns. The rows are considered duplicate only when the combination of columns are duplicate.

Example

In the following query, we are finding rows in the PETS table with duplicate records in DOG_NAME, AGE, OWNER_NAME columns −

SELECT SALARY, COUNT(SALARY),
AGE, COUNT(AGE)
FROM CUSTOMERS
GROUP BY SALARY, AGE
HAVING  COUNT(SALARY) > 1
AND COUNT(AGE) > 1;

Output

The output is as follows −

SALARY COUNT AGE COUNT
2000.00 2 23 2

The ROW_NUMBER() function with PARTITION BY

In MySQL, the ROW_NUMBER() function and PARTITION BY clause can be used to find duplicate records in a table. The partition clause divides the table based on a specific column or multiple columns, then the ROW_NUMBER() function assigns a unique row number to each row within each partition. Rows with the same partition and row number are considered duplicates rows.

Example

In the following query, we are assigning a

SELECT *, ROW_NUMBER() OVER (
   PARTITION BY SALARY, AGE
   ORDER BY SALARY, AGE
) AS row_numbers
FROM CUSTOMERS;

Output

The output for the query above as follows −

ID NAME AGE ADDRESS SALARY row_numbers
2 Khilan 25 Delhi 1500.00 1
1 Ramesh 23 Ahmedabad 2000.00 1
3 Kaushik 23 Kota 2000.00 2
4 Chaitali 25 Mumbai 6500.00 1
5 Hardik 27 Bhopal 8500.00 1
6 Komal 22 Hyderabad 4500.00 1
7 Muffy 24 Indore 10000.00 1

Find Duplicate Records Using Client Program

We can also find duplicates records using Client Program.

Syntax

To find the duplicate records through a PHP program, we need to group all the rows by column using the GROUPBY clause and then use the COUNT function to count the duplicates. For this, we need to execute the SELECT statement using the mysqli function query() as follows −

$sql = "SELECT SALARY, COUNT(SALARY) AS "COUNT" FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY";
$mysqli->query($sql);

To find the duplicate records through a JavaScript program, we need to group all the rows by column using the GROUPBY clause and then use the COUNT function to count the duplicates. For this, we need to execute the SELECT statement using the query() function of mysql2 library as follows −

sql = "SELECT SALARY, COUNT(SALARY) AS "COUNT" FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY";
con.query(sql)

To find the duplicate records through a Java program, we need to group all the rows by column using the GROUPBY clause and then use the COUNT function to count the duplicates. For this, we need to execute the SELECT statement using the JDBC function executeQuery() as follows −

String sql = "SELECT SALARY, COUNT(SALARY) AS "COUNT" FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY";
statement.executeQuery(sql);

To find the duplicate records through a Python program, we need to group all the rows by column using the GROUPBY clause and then use the COUNT function to count the duplicates. For this, we need to execute the SELECT statement using the execute() function of the MySQL Connector/Python as follows −

duplicate_records_query = "SELECT SALARY, COUNT(SALARY) AS "COUNT" FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY"
cursorObj.execute(duplicate_records_query)

Example

Following are the programs −

$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$db = 'TUTORIALS';
$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db);
if ($mysqli->connect_errno) {
    printf("Connect failed: %s
", $mysqli->connect_error); exit(); } //printf('Connected successfully.
'); //let's create a table $sql = "CREATE TABLE Pets (ID int,DOG_NAME varchar(30) not null,AGE int not null,OWNER_NAME varchar(30) not null)"; if($mysqli->query($sql)){ printf("Pets table created successfully...!\n"); } //now lets insert some duplicate records; $sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1, 'Fluffy', 1, 'Micheal')"; if($mysqli->query($sql)){ printf("First record inserted successfully...!\n"); } $sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1, 'Fluffy', 1, 'Micheal')"; if($mysqli->query($sql)){ printf("Second record inserted successfully...!\n"); } $sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(2, 'Harry', 2, 'Jack')"; if($mysqli->query($sql)){ printf("Third records inserted successfully...!\n"); } $sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(3, 'Sheero', 1, 'Rose')"; if($mysqli->query($sql)){ printf("Fourth record inserted successfully...!\n"); } $sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(4, 'Simba', 2, 'Rahul')"; if($mysqli->query($sql)){ printf("Fifth record inserted successfully...!\n"); } //display the table records $sql = "SELECT * FROM PETS"; if($result = $mysqli->query($sql)){ printf("Table records: \n"); while($row = mysqli_fetch_array($result)){ printf("ID: %d, DOG_NAME %s, AGE: %d,OWNER_NAME: %s ", $row['ID'], $row['DOG_NAME'], $row['AGE'], $row['OWNER_NAME']); printf("\n"); } } //now lets group the all rows to find duplicate records... $sql = "SELECT ID, DOG_NAME, AGE, OWNER_NAME, COUNT(*) AS 'Count' FROM PETS GROUP BY ID, DOG_NAME, OWNER_NAME ORDER BY ID"; if($result = $mysqli->query($sql)){ printf("Table duplicate records: \n"); while($row = mysqli_fetch_array($result)){ printf("ID: %d, DOG_NAME %s, AGE: %d, OWNER_NAME: %s ", $row['ID'], $row['DOG_NAME'], $row['AGE'], $row['OWNER_NAME'], $row['Count']); printf("\n"); } } if($mysqli->error){ printf("Error message: ", $mysqli->error); } $mysqli->close();

Output

The output obtained is as shown below −

Pets table created successfully...!
First record inserted successfully...!
Second record inserted successfully...!
Third records inserted successfully...!
Fourth record inserted successfully...!
Fifth record inserted successfully...!
Table records:
ID: 1, DOG_NAME Fluffy, AGE: 1,OWNER_NAME: Micheal
ID: 1, DOG_NAME Fluffy, AGE: 1,OWNER_NAME: Micheal
ID: 2, DOG_NAME Harry, AGE: 2,OWNER_NAME: Jack
ID: 3, DOG_NAME Sheero, AGE: 1,OWNER_NAME: Rose
ID: 4, DOG_NAME Simba, AGE: 2,OWNER_NAME: Rahul
Table duplicate records:
ID: 1, DOG_NAME Fluffy, AGE: 1,OWNER_NAME: Micheal
ID: 2, DOG_NAME Harry, AGE: 2,OWNER_NAME: Jack
ID: 3, DOG_NAME Sheero, AGE: 1,OWNER_NAME: Rose
ID: 4, DOG_NAME Simba, AGE: 2,OWNER_NAME: Rahul     

var mysql = require('mysql2');
var con = mysql.createConnection({
    host: "localhost",
    user: "root",
    password: "Nr5a0204@123"
});

// Connecting to MySQL
con.connect(function (err) {
    if (err) throw err;
    console.log("Connected!");
    console.log("--------------------------");

    // Create a new database
    sql = "Create Database TUTORIALS";
    con.query(sql);

    sql = "USE TUTORIALS";
    con.query(sql);

    //Creating TABLE table
    sql = "CREATE TABLE Pets (ID int,DOG_NAME varchar(30) not null,AGE int not null,OWNER_NAME varchar(30) not null);"
    con.query(sql);

    sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1,'Fluffy', 1, 'Micheal'),(1,'Fluffy', 1, 'Micheal'),(2,'Harry', 2, 'Jack'),(3,'Sheero', 1, 'Rose'),(4,'Simba', 2, 'Rahul'),(3,'Sheero', 1, 'Rose'),(3,'Sheero', 1, 'Rose');"
    con.query(sql);

    sql = "SELECT * FROM Pets;"
    con.query(sql, function(err, result){
      if (err) throw err
      console.log("**Records in Pets Table**");
      console.log(result);
      console.log("--------------------------");
    });

    sql = "SELECT ID, DOG_NAME, OWNER_NAME, COUNT(*) AS 'Count' FROM PETS GROUP BY ID, DOG_NAME, OWNER_NAME ORDER BY ID";
    con.query(sql, function(err, result){
      if (err) throw err
      console.log("**Count of duplicate records:**");
      console.log(result);
    });
});  

Output

The output obtained is as shown below −

 
Connected!
--------------------------
**Records in Pets Table**
[
  { ID: 1, DOG_NAME: 'Fluffy', AGE: 1, OWNER_NAME: 'Micheal' },
  { ID: 1, DOG_NAME: 'Fluffy', AGE: 1, OWNER_NAME: 'Micheal' },
  { ID: 2, DOG_NAME: 'Harry', AGE: 2, OWNER_NAME: 'Jack' },
  { ID: 3, DOG_NAME: 'Sheero', AGE: 1, OWNER_NAME: 'Rose' },
  { ID: 4, DOG_NAME: 'Simba', AGE: 2, OWNER_NAME: 'Rahul' },
  { ID: 3, DOG_NAME: 'Sheero', AGE: 1, OWNER_NAME: 'Rose' },
  { ID: 3, DOG_NAME: 'Sheero', AGE: 1, OWNER_NAME: 'Rose' }
]
--------------------------
**Count of duplicate records:**
[
  { ID: 1, DOG_NAME: 'Fluffy', OWNER_NAME: 'Micheal', Count: 2 },
  { ID: 2, DOG_NAME: 'Harry', OWNER_NAME: 'Jack', Count: 1 },
  { ID: 3, DOG_NAME: 'Sheero', OWNER_NAME: 'Rose', Count: 3 },
  { ID: 4, DOG_NAME: 'Simba', OWNER_NAME: 'Rahul', Count: 1 }
]
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class FindDuplicates {
  public static void main(String[] args) {
    String url = "jdbc:mysql://localhost:3306/TUTORIALS";
    String user = "root";
    String password = "password";
    ResultSet rs;
    try {
      Class.forName("com.mysql.cj.jdbc.Driver");
            Connection con = DriverManager.getConnection(url, user, password);
            Statement st = con.createStatement();
            //System.out.println("Database connected successfully...!");
            String sql = "CREATE TABLE Pets (ID int,DOG_NAME varchar(30) not null,AGE int not null,OWNER_NAME varchar(30) not null)";
            st.execute(sql);
            System.out.println("Table Pets created successfully...!");
            //let's insert some records into it...
            String sql1 = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1, 'Fluffy', 1, 'Micheal'), (1, 'Fluffy', 1, 'Micheal'),  (3, 'Sheero', 1, 'Rose'), (4, 'Simba', 2, 'Rahul')";
            st.execute(sql1);
            System.out.println("Records inserted successfully....!");
            String sql2 = "SELECT * FROM PETS";
            rs = st.executeQuery(sql2);
            System.out.println("Table records: ");
            while(rs.next()) {
              String id = rs.getString("ID");
              String dog_name = rs.getString("DOG_NAME");
              String age = rs.getString("AGE");
              String owner_name = rs.getString("OWNER_NAME");
              System.out.println("Id: " + id + ", Dog_name: " + dog_name + ", Age: " + age + ", Owner_name: " + owner_name);
            }
            //lets find duplicate records
            String sql3 = "SELECT ID, DOG_NAME, AGE, OWNER_NAME, COUNT(*) AS 'Count' FROM PETS GROUP BY ID, DOG_NAME, OWNER_NAME ORDER BY ID";
            rs = st.executeQuery(sql3);
            System.out.println("Table records are(with duplicate counts): ");
            while(rs.next()) {
              String id = rs.getString("ID");
              String dog_name = rs.getString("DOG_NAME");
              String age = rs.getString("AGE");
              String owner_name = rs.getString("OWNER_NAME");
              String t_count = rs.getString("Count");
              System.out.println("Id: " + id + ", Dog_name: " + dog_name + ", Age: " + age + ", Owner_name: " + owner_name + ", T_count: " + t_count);
            }
    }catch(Exception e) {
      e.printStackTrace();
    }
  }
}

Output

The output obtained is as shown below −

Table Pets created successfully...!
Records inserted successfully....!
Table records: 
Id: 1, Dog_name: Fluffy, Age: 1, Owner_name: Micheal
Id: 1, Dog_name: Fluffy, Age: 1, Owner_name: Micheal
Id: 3, Dog_name: Sheero, Age: 1, Owner_name: Rose
Id: 4, Dog_name: Simba, Age: 2, Owner_name: Rahul
Table records are(with duplicate counts): 
Id: 1, Dog_name: Fluffy, Age: 1, Owner_name: Micheal, T_count: 2
Id: 3, Dog_name: Sheero, Age: 1, Owner_name: Rose, T_count: 1
Id: 4, Dog_name: Simba, Age: 2, Owner_name: Rahul, T_count: 1  
import mysql.connector
# Establishing the connection
connection = mysql.connector.connect(
    host='localhost',
    user='root',
    password='password',
    database='tut'
)
# Creating a cursor object
cursorObj = connection.cursor()
# Creating the table 'Pets'
create_table_query = '''
CREATE TABLE Pets (
ID int,
DOG_NAME varchar(30) not null,
AGE int not null,
OWNER_NAME varchar(30) not null
);
'''
cursorObj.execute(create_table_query)
print("Table 'Pets' is created successfully!")
# Inserting records into 'Pets' table
sql = "INSERT IGNORE INTO Pets (ID, DOG_NAME, AGE, OWNER_NAME) VALUES (%s, %s, %s, %s);"
values = [
    (1, 'Fluffy', 1, 'Micheal'),
    (1, 'Fluffy', 1, 'Micheal'),
    (2, 'Harry', 2, 'Jack'),
    (3, 'Sheero', 1, 'Rose'),
    (4, 'Simba', 2, 'Rahul'),
    (3, 'Sheero', 1, 'Rose'),
    (3, 'Sheero', 1, 'Rose')
]
cursorObj.executemany(sql, values)
print("Values inserted successfully")
# Display table
display_table = "SELECT * FROM Pets;"
cursorObj.execute(display_table)
# Printing the table 'Pets'
results = cursorObj.fetchall()
print("\nPets Table:")
for result in results:
    print(result)
# Return the count of duplicate records
duplicate_records_query = """
SELECT ID, DOG_NAME, OWNER_NAME, COUNT(*) AS Count FROM Pets
GROUP BY ID, DOG_NAME, OWNER_NAME
ORDER BY ID;
"""
cursorObj.execute(duplicate_records_query)
dup_rec = cursorObj.fetchall()
print("\nDuplicate records:")
for record in dup_rec:
    print(record)
# Closing the cursor and connection
cursorObj.close()
connection.close()

Output

The output obtained is as shown below −

Table 'Pets' is created successfully!
Values inserted successfully

Pets Table:
(1, 'Fluffy', 1, 'Micheal')
(1, 'Fluffy', 1, 'Micheal')
(2, 'Harry', 2, 'Jack')
(3, 'Sheero', 1, 'Rose')
(4, 'Simba', 2, 'Rahul')
(3, 'Sheero', 1, 'Rose')
(3, 'Sheero', 1, 'Rose')

Duplicate records:
(1, 'Fluffy', 'Micheal', 2)
(2, 'Harry', 'Jack', 1)
(3, 'Sheero', 'Rose', 3)
(4, 'Simba', 'Rahul', 1)
Advertisements