MySQL - Handling Duplicates



Tables or result sets in a database usually contain duplicate records. While duplicates are generally allowed, there are situations where it is necessary to prevent them. In such cases, it becomes essential to identify and remove duplicate records from a database table.

Importance of Handling MySQL Duplicates

There are various reasons why handling duplicates in a database becomes necessary. One of the main reasons is that the existence of duplicates in an organizational database will lead to logical errors. In addition to it, we need to handle redundant data to prevent the following consequences −

  • Duplicate data occupies storage space, reducing the efficiency of database usage and increasing storage costs.
  • Dealing with duplicate records consumes additional resources, driving up the overall cost of maintaining the database.
  • Duplicates in a database can lead to logical errors in data, affecting the integrity and reliability of the information stored.

Preventing Duplicate Entries

You can use a PRIMARY KEY or a UNIQUE Index on a table with the appropriate fields to prevent duplicate record entries into a table.

Example

The following table contains no such index or primary key, so it would allow duplicate records for first_name and last_name.

CREATE TABLE CUSTOMERS (
   first_name CHAR(20),
   last_name CHAR(20),
   sex CHAR(10)
);

To prevent multiple records with the same first and last name values from being created in this table, add a PRIMARY KEY to its definition. When you do this, it is also necessary to declare the indexed columns to be NOT NULL, because a PRIMARY KEY does not allow NULL values −

CREATE TABLE CUSTOMERS (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

Using INSERT IGNORE Query −

The existence of a unique index in a table normally causes an error when attempting to insert a record that duplicates an existing record in the indexed column(s).

To handle this situation without generating an error, you can use the "INSERT IGNORE" command. When a record is not a duplicate, MySQL inserts it as usual. However, if the record is duplicate, the "IGNORE" keyword instructs MySQL to discard it without producing an error.

The provided example does not result in an error, and it also ensures that duplicate records are not inserted −

INSERT IGNORE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) 
VALUES ('Jay', 'Thomas'), ('Jay', 'Thomas');

We get the following output −

Query OK, 1 row affected, 1 warning (0.01 sec)
Records: 2  Duplicates: 1  Warnings: 1

Using REPLACE Query −

Instead of using the INSERT command, consider using the REPLACE command. When dealing with a new record, it is inserted just as with INSERT. However, if it is a duplicate, the new record replaces the old one.

REPLACE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) 
VALUES ( 'Ajay', 'Kumar'), ( 'Ajay', 'Kumar');

Following is the output of the above code −

Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

Your choice between the INSERT IGNORE and REPLACE commands should depend on the specific duplicate-handling behaviour you wish to achieve. The INSERT IGNORE command retains the first set of duplicated records and discards the remaining. On the other hand, the REPLACE command keeps the last set of duplicates and removes any earlier instances.

Using UNIQUE Constraint −

Another way to enforce uniqueness is to add a UNIQUE index rather than a PRIMARY KEY to a table −

CREATE TABLE CUSTOMERS (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);

Counting and Identifying Duplicates

You can use the COUNT function and GROUP BY clause to count and identify duplicate records based on specific columns.

Example

Following is the query to count duplicate records with first_name and last_name in a table −

SELECT COUNT(*) as repetitions, last_name, first_name
FROM CUSTOMERS
GROUP BY last_name, first_name
HAVING repetitions > 1;

This query will return a list of all the duplicate records in the CUSTOMERS table. In general, to identify sets of values that are duplicated, follow the steps given below.

  • Determine which columns may contain duplicated values.

  • Include those columns in the column selection list, along with COUNT(*).

  • List the columns in the GROUP BY clause as well.

  • Apply a HAVING clause to filter unique values by requiring the group counts to be greater than one.

Eliminating Duplicates from a Query Result

You can use the DISTINCT command along with the SELECT statement to find out unique records available in a table.

SELECT DISTINCT last_name, first_name
FROM CUSTOMERS
ORDER BY last_name;

An alternative to the DISTINCT command is to add a GROUP BY clause that specifies the columns you are selecting. This approach eliminates duplicates and retrieves only the unique combinations of values from the specified columns.

SELECT last_name, first_name
FROM CUSTOMERS
GROUP BY (last_name, first_name);

Removing Duplicates Using Table Replacement

If you have duplicate records in a table and you want to remove all the duplicate records from that table, then follow the procedure given below −

CREATE TABLE tmp AS 
SELECT DISTINCT last_name, first_name, sex
FROM CUSTOMERS;

DROP TABLE CUSTOMERS;
ALTER TABLE tmp RENAME TO CUSTOMERS;

Handling Duplicates Using a Client Program

We can also handle duplicate using the client program.

Syntax

To handle duplicates value through a PHP program, we need to execute the "INSERT IGNORE" statement using the mysqli function query() as follows −

$sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
$mysqli->query($sql);

To handle duplicates value through a JavaScript program, we need to execute the "INSERT IGNORE" statement using the query() function of mysql2 library as follows −

sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
con.query(sql);

To handle duplicates value through a Java program, we need to execute the "INSERT IGNORE" statement using the JDBC function execute() as follows −

String sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
statement.execute(sql);

To handle duplicates value through a Python program, we need to execute the "INSERT IGNORE" statement using the execute() function of the MySQL Connector/Python as follows −

sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"
cursorObj.execute(sql)

Example

Following are the programs −

$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$db = 'TUTORIALS';
$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db);
if ($mysqli->connect_errno) {
    printf("Connect failed: %s
", $mysqli->connect_error); exit(); } //printf('Connected successfully.
'); $sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))"; if($mysqli->query($sql)){ printf("Table created successfully...!\n"); } //let's insert some records $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; if($mysqli->query($sql)){ printf("First record inserted successfully...!\n"); } $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; if($mysqli->query($sql)){ printf("Second record inserted successfully...!\n"); } //now lets insert duplicate record with IGNORE keyword $sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; if($mysqli->query($sql)){ printf("Duplicate record inserted successfully using IGNORE keyword...!\n"); } $sql = "SELECT * from person_tbl"; if($result = $mysqli->query($sql)){ printf("Table records: \n"); while($row = mysqli_fetch_array($result)){ printf("First Name: %s, Last name: %s, Sex: %s", $row['first_name'], $row['last_name'], $row['sex']); printf("\n"); } } //lets insert a duplicate record $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; if(!$mysqli->query($sql)){ printf("You can't insert any duplicate records...!\n"); } if($mysqli->error){ printf("Error message: ", $mysqli->error); } $mysqli->close();

Output

The output obtained is as shown below −

Table created successfully...!
First record inserted successfully...!
Second record inserted successfully...!
Duplicate record inserted successfully using IGNORE keyword...!
Table records:
First Name: Thomas, Last name: Jay, Sex:
First Name: Smith, Last name: John, Sex:
PHP Fatal error:  Uncaught mysqli_sql_exception: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY' in D:\test\handlingduplicates.php:48    
var mysql = require('mysql2');
var con = mysql.createConnection({
host:"localhost",
user:"root",
password:"password"
});

 //Connecting to MySQL
 con.connect(function(err) {
 if (err) throw err;
//   console.log("Connected successfully...!");
//   console.log("--------------------------");
 sql = "USE TUTORIALS";
 con.query(sql);
 //create table
 sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))";
 con.query(sql, function(err, result){
    if (err) throw err;
    console.log("Table created successfully....!");
    });
//now let's insert some records
sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
con.query(sql, function(err, result){
    if (err) throw err;
    console.log("First record inserted successfully...!");
    });
sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')";
con.query(sql, function(err, result){
    if (err) throw err;
    console.log("Second record inserted successfully...!");
    });
//now lets insert duplicate record with IGNORE keyword
sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
    con.query(sql, function(err, result){
        console.log("Insert duplicate record with IGNORE keyword")
    if (err) throw err;
    console.log("Duplicate record inserted successfully with the help of IGNORE keyword");
    });
//lets insert a duplicate record
sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')";
    con.query(sql, function(err, result){
        console.log("Insert duplicate record");
    if (err) throw err;
    console.log("You can't insert the duplicate record because columns are primary key");
    });
sql = "SELECT * FROM person_tbl";
con.query(sql, function(err, result){
    console.log("Table records(with ID auto_increment sequence).");
    if (err) throw err;
    console.log(result);
    });
});   

Output

The output obtained is as shown below −

Table created successfully....!
First record inserted successfully...!
Second record inserted successfully...!
Insert duplicate record with IGNORE keyword
Duplicate record inserted successfully with the help of IGNORE keyword
Insert duplicate record
Insert duplicate record
D:\test1\duplicate.js:43
    if (err) throw err;              ^

Error: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY'
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class HandleDuplicates {
   public static void main(String[] args) {
      String url = "jdbc:mysql://localhost:3306/TUTORIALS";
      String user = "root";
      String password = "password";
      ResultSet rs;
      try {
         Class.forName("com.mysql.cj.jdbc.Driver");
            Connection con = DriverManager.getConnection(url, user, password);
            Statement st = con.createStatement();
            //System.out.println("Database connected successfully...!");
            //create table
            String sql = "CREATE TABLE person_tbl ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))";
            st.execute(sql);
            System.out.println("Table created successfully....!");
            //let's insert some records
            String sql1 = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
            st.execute(sql1);
            System.out.println("Record inserted successfully...!");
            String sql2 = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
            st.execute(sql2);
            System.out.println("Duplicate record inserted successfully...!");
            //lets print the table records
            String sql3 = "SELECT * FROM PERSON_TBL";
            rs = st.executeQuery(sql3);
            System.out.println("Table records: ");
            while(rs.next()) {
               String fname = rs.getString("first_name");
               String lname = rs.getString("last_name");
               System.out.println("First name: " + fname + ", Last name: " + lname);
            }
      }catch(Exception e) {
         e.printStackTrace();
      }
   }
}   

Output

The output obtained is as shown below −

Table created successfully....!
Record inserted successfully...!
Duplicate record inserted successfully...!
Table records: 
First name: Thomas, Last name: Jay
import mysql.connector
# Connecting to MySQL
con = mysql.connector.connect(
    host="localhost",
    user="root",
    password="password",
    database="tut"
)
# Creating a cursor object
cursorObj = con.cursor()
# Creating the table
create_table_query = """
CREATE TABLE person_tbl (
    first_name CHAR(20) NOT NULL,
    last_name CHAR(20) NOT NULL,
    sex CHAR(10),
    PRIMARY KEY (last_name, first_name)
)
"""
cursorObj.execute(create_table_query)
print("Table 'person_tbl' is created successfully!")
# Inserting some records
first_record = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"
print("First record inserted successfully!")
cursorObj.execute(first_record)

second_record = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"
print("Second record inserted successfully!")
cursorObj.execute(second_record)
# Insert duplicate record with IGNORE keyword
sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"
print("Duplicate record inserted successfully with the help of IGNORE keyword")
cursorObj.execute(sql)
# Insert a duplicate record (this will throw an error)
try:
    cursorObj.execute("INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')")
    print("Insert duplicate record")
except mysql.connector.Error as err:
    print("Insert duplicate record error:", err)
con.commit()
# Retrieving records
cursorObj.execute("SELECT * FROM person_tbl")
records = cursorObj.fetchall()
# Printing the records
print("Table records.")
for record in records:
    print(record)
# Closing the connection
cursorObj.close()
con.close()   

Output

The output obtained is as shown below −

First record inserted successfully!
Second record inserted successfully!
Duplicate record inserted successfully with the help of IGNORE keyword
Insert duplicate record error: 1062 (23000): Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY'
Table records.
('Thomas', 'Jay', None)
('Smith', 'John', None)
Advertisements