Find all duplicate documents in a MongoDB collection by a key field?

To find all duplicate documents in a MongoDB collection by a key field, use the aggregation framework with $group and $match stages to group by the field and filter groups with count greater than 1.

Syntax

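db.collection.aggregate([
    // Group documents that share the same value of fieldName
    { $group: {
        _id: { fieldName: "$fieldName" },
        documents: { $addToSet: "$_id" },   // _id values of the documents in the group
        count: { $sum: 1 }                  // number of documents in the group
    }},
    // Keep only values that occur more than once
    { $match: { count: { $gte: 2 } }},
    // Most frequent duplicates first
    { $sort: { count: -1 }}
]);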

Sample Data

db.findDuplicateByKeyDemo.insertMany([
    {"StudentId": 1, "StudentName": "John"},
    {"StudentId": 2, "StudentName": "Carol"},
    {"StudentId": 3, "StudentName": "Carol"},
    {"StudentId": 4, "StudentName": "John"},
    {"StudentId": 5, "StudentName": "Sam"},
    {"StudentId": 6, "StudentName": "Carol"}
]);

Display all documents to see the data:

db.findDuplicateByKeyDemo.find().pretty();

{
    "_id": ObjectId("..."),
    "StudentId": 1,
    "StudentName": "John"
}
{
    "_id": ObjectId("..."),
    "StudentId": 2,
    "StudentName": "Carol"
}
{
    "_id": ObjectId("..."),
    "StudentId": 3,
    "StudentName": "Carol"
}
{
    "_id": ObjectId("..."),
    "StudentId": 4,
    "StudentName": "John"
}
{
    "_id": ObjectId("..."),
    "StudentId": 5,
    "StudentName": "Sam"
}
{
    "_id": ObjectId("..."),
    "StudentId": 6,
    "StudentName": "Carol"
}

Find Duplicate Documents

db.findDuplicateByKeyDemo.aggregate([
    { $group: {
        _id: { StudentName: "$StudentName" },
        UIDS: { $addToSet: "$_id" },
        COUNTER: { $sum: 1 }
    }},
    { $match: {
        COUNTER: { $gte: 2 }
    }},
    { $sort: { COUNTER: -1 }},
    { $limit: 10 }
]).pretty();

The output shows the duplicated values: Carol appears 3 times and John appears 2 times. Sam appears only once, so it is filtered out:

{
    "_id": {
        "StudentName": "Carol"
    },
    "UIDS": [
        ObjectId("5c7f5b248d10a061296a3c3c"),
        ObjectId("5c7f5b438d10a061296a3c3f"),
        ObjectId("5c7f5b1f8d10a061296a3c3b")
    ],
    "COUNTER": 3
}
{
    "_id": {
        "StudentName": "John"
    },
    "UIDS": [
        ObjectId("5c7f5b2d8d10a061296a3c3d"),
        ObjectId("5c7f5b168d10a061296a3c3a")
    ],
    "COUNTER": 2
}

How It Works

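$group groups documents by the key field (StudentName)
$addToSet collects the _id of every document in each group (the order of the resulting array is not guaranteed)
$sum: 1 counts the documents in each group
$match keeps only the groups with a count ≥ 2
$sort orders the results by duplicate count (descending)
$limit caps the output at the first 10 groups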
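
To make the stages above more concrete, here is a rough in-memory sketch of the same logic in plain JavaScript. It is not MongoDB code and not how the server implements aggregation. The helper name findDuplicatesByKey and the placeholder _id strings are assumptions made for illustration only.

```javascript
// In-memory sketch of the $group -> $match -> $sort stages (not MongoDB code).
// The _id values here are made-up placeholders, not real ObjectIds.
const docs = [
  { _id: "a1", StudentId: 1, StudentName: "John" },
  { _id: "a2", StudentId: 2, StudentName: "Carol" },
  { _id: "a3", StudentId: 3, StudentName: "Carol" },
  { _id: "a4", StudentId: 4, StudentName: "John" },
  { _id: "a5", StudentId: 5, StudentName: "Sam" },
  { _id: "a6", StudentId: 6, StudentName: "Carol" },
];

// findDuplicatesByKey is a hypothetical helper name used only for this illustration.
function findDuplicatesByKey(documents, key) {
  const groups = new Map();
  for (const doc of documents) {
    const value = doc[key];
    // "$group": create one bucket per distinct key value
    if (!groups.has(value)) {
      groups.set(value, { _id: { [key]: value }, UIDS: [], COUNTER: 0 });
    }
    const group = groups.get(value);
    group.UIDS.push(doc._id); // like "$addToSet" on the unique _id field
    group.COUNTER += 1;       // like "$sum: 1"
  }
  return [...groups.values()]
    .filter((g) => g.COUNTER >= 2)          // "$match": COUNTER >= 2
    .sort((a, b) => b.COUNTER - a.COUNTER); // "$sort": COUNTER descending
}

const duplicates = findDuplicatesByKey(docs, "StudentName");
console.log(JSON.stringify(duplicates, null, 2));
```

With the sample data, this yields a Carol group with a count of 3 followed by a John group with a count of 2, matching the shape of the aggregation output.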

Conclusion

Use MongoDB aggregation with $group and $match stages to efficiently identify duplicate documents by any key field. This approach provides both the duplicate values and their document IDs for further processing.
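
As one possible form of further processing, the collected IDs could be used to keep a single document per group and remove the rest. The following is a minimal sketch in plain JavaScript, assuming the aggregation result has the UIDS and COUNTER shape shown earlier. The helper name collectIdsToRemove and the placeholder ID strings are hypothetical. Because $addToSet does not guarantee array order, which document counts as "first" is arbitrary here.

```javascript
// Sketch only: decide which _id values to delete so one document per group remains.
// duplicateGroups mimics the shape of the aggregation output;
// the ID strings are placeholders, not real ObjectIds.
const duplicateGroups = [
  { _id: { StudentName: "Carol" }, UIDS: ["id2", "id3", "id6"], COUNTER: 3 },
  { _id: { StudentName: "John" }, UIDS: ["id1", "id4"], COUNTER: 2 },
];

// collectIdsToRemove is a hypothetical helper:
// keep the first ID of each group and return all the others.
function collectIdsToRemove(groups) {
  return groups.flatMap((group) => group.UIDS.slice(1));
}

const idsToRemove = collectIdsToRemove(duplicateGroups);
console.log(idsToRemove);

// In the shell, the resulting list could then be passed to a delete, for example:
// db.findDuplicateByKeyDemo.deleteMany({ _id: { $in: idsToRemove } });
```

Review the list before running any delete, since removing documents cannot be undone without a backup.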

Smita Kapse

Updated on: 2026-03-15T00:05:06+05:30
